May 7, 2019

2799 words 14 mins read

Paper Group AWR 67

Agnostic Estimation of Mean and Covariance. PlaNet - Photo Geolocation with Convolutional Neural Networks. DropNeuron: Simplifying the Structure of Deep Neural Networks. End-to-End Relation Extraction using LSTMs on Sequences and Tree Structures. Lexical Query Modeling in Session Search. Top-down Visual Saliency Guided by Captions. Detect, Replace, …

Agnostic Estimation of Mean and Covariance

Title Agnostic Estimation of Mean and Covariance
Authors Kevin A. Lai, Anup B. Rao, Santosh Vempala
Abstract We consider the problem of estimating the mean and covariance of a distribution from iid samples in $\mathbb{R}^n$, in the presence of an $\eta$ fraction of malicious noise; this is in contrast to much recent work where the noise itself is assumed to be from a distribution of known type. The agnostic problem includes many interesting special cases, e.g., learning the parameters of a single Gaussian (or finding the best-fit Gaussian) when $\eta$ fraction of data is adversarially corrupted, agnostically learning a mixture of Gaussians, agnostic ICA, etc. We present polynomial-time algorithms to estimate the mean and covariance with error guarantees in terms of information-theoretic lower bounds. As a corollary, we also obtain an agnostic algorithm for Singular Value Decomposition.
Tasks
Published 2016-04-24
URL http://arxiv.org/abs/1604.06968v2
PDF http://arxiv.org/pdf/1604.06968v2.pdf
PWC https://paperswithcode.com/paper/agnostic-estimation-of-mean-and-covariance
Repo https://github.com/kal2000/AgnosticMeanAndCovarianceCode
Framework none
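The abstract above concerns estimating a mean under an $\eta$ fraction of adversarial corruption. The toy sketch below is not the paper's algorithm; it only illustrates the setting by repeatedly discarding the sample that projects furthest from the current mean along the top principal direction, then averaging what remains.

```python
import numpy as np

def robust_mean(samples, eta=0.1):
    """Crude illustrative estimator: drop roughly an eta fraction of points,
    one at a time, choosing the point that projects furthest from the current
    mean along the top principal direction. A toy stand-in for the paper's
    agnostic estimators, not their algorithm."""
    X = np.asarray(samples, dtype=float)
    for _ in range(int(eta * len(X))):
        centered = X - X.mean(axis=0)
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        proj = centered @ vt[0]                     # direction of largest variance
        X = np.delete(X, np.argmax(np.abs(proj)), axis=0)
    return X.mean(axis=0)

# toy check: 10% of points are adversarially shifted away from the true mean (zero)
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(size=(900, 5)), rng.normal(loc=20.0, size=(100, 5))])
print(np.linalg.norm(data.mean(axis=0)))        # naive mean is pulled far off zero
print(np.linalg.norm(robust_mean(data, 0.1)))   # filtered mean is much closer
```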

PlaNet - Photo Geolocation with Convolutional Neural Networks

Title PlaNet - Photo Geolocation with Convolutional Neural Networks
Authors Tobias Weyand, Ilya Kostrikov, James Philbin
Abstract Is it possible to build a system to determine the location where a photo was taken using just its pixels? In general, the problem seems exceptionally difficult: it is trivial to construct situations where no location can be inferred. Yet images often contain informative cues such as landmarks, weather patterns, vegetation, road markings, and architectural details, which in combination may allow one to determine an approximate location and occasionally an exact location. Websites such as GeoGuessr and View from your Window suggest that humans are relatively good at integrating these cues to geolocate images, especially en-masse. In computer vision, the photo geolocation problem is usually approached using image retrieval methods. In contrast, we pose the problem as one of classification by subdividing the surface of the earth into thousands of multi-scale geographic cells, and train a deep network using millions of geotagged images. While previous approaches only recognize landmarks or perform approximate matching using global image descriptors, our model is able to use and integrate multiple visible cues. We show that the resulting model, called PlaNet, outperforms previous approaches and even attains superhuman levels of accuracy in some cases. Moreover, we extend our model to photo albums by combining it with a long short-term memory (LSTM) architecture. By learning to exploit temporal coherence to geolocate uncertain photos, we demonstrate that this model achieves a 50% performance improvement over the single-image model.
Tasks Image Retrieval
Published 2016-02-17
URL http://arxiv.org/abs/1602.05314v1
PDF http://arxiv.org/pdf/1602.05314v1.pdf
PWC https://paperswithcode.com/paper/planet-photo-geolocation-with-convolutional
Repo https://github.com/gjacopo/poppysite
Framework none
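PlaNet turns geolocation into classification over geographic cells. The sketch below uses a fixed lat/lon grid as a stand-in for the paper's adaptive multi-scale S2 cells (an assumption made purely for illustration), showing how coordinates become class labels and back.

```python
def latlon_to_cell(lat, lon, cells_per_side=36):
    """Map a coordinate to a class index on a fixed lat/lon grid. PlaNet instead
    uses adaptive multi-scale S2 cells split by photo density; the fixed grid
    here is a simplification for illustration."""
    row = min(int((lat + 90.0) / 180.0 * cells_per_side), cells_per_side - 1)
    col = min(int((lon + 180.0) / 360.0 * cells_per_side), cells_per_side - 1)
    return row * cells_per_side + col

def cell_to_center(cell, cells_per_side=36):
    """Inverse mapping: class index -> centre coordinate of the cell."""
    row, col = divmod(cell, cells_per_side)
    return ((row + 0.5) / cells_per_side * 180.0 - 90.0,
            (col + 0.5) / cells_per_side * 360.0 - 180.0)

# Geolocation then becomes ordinary classification: a CNN predicts a softmax over
# the cells, and the predicted location is the centre of the argmax cell.
cell = latlon_to_cell(48.8566, 2.3522)   # Paris -> some cell id
print(cell, cell_to_center(cell))
```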

DropNeuron: Simplifying the Structure of Deep Neural Networks

Title DropNeuron: Simplifying the Structure of Deep Neural Networks
Authors Wei Pan, Hao Dong, Yike Guo
Abstract Deep learning using multi-layer neural network (NN) architectures manifests superb power in modern machine learning systems. The trained Deep Neural Networks (DNNs) are typically large. The question we would like to address is whether it is possible to simplify the NN during the training process so as to achieve reasonable performance within an acceptable computational time. We present a novel approach to optimising a deep neural network through regularisation of the network architecture. We propose regularisers which support a simple mechanism for dropping neurons during the network training process. The method supports the construction of a simpler deep neural network with performance comparable to that of the original, unsimplified network. As a proof of concept, we evaluate the proposed method on examples including sparse linear regression, a deep autoencoder and a convolutional neural network. The evaluations demonstrate excellent performance. The code for this work can be found at http://www.github.com/panweihit/DropNeuron
Tasks
Published 2016-06-23
URL http://arxiv.org/abs/1606.07326v3
PDF http://arxiv.org/pdf/1606.07326v3.pdf
PWC https://paperswithcode.com/paper/dropneuron-simplifying-the-structure-of-deep
Repo https://github.com/panweihit/DropNeuron
Framework tf
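The regularisers described above drop whole neurons by driving all of their connections to zero together. Below is a minimal sketch of that idea as a group-lasso penalty over the rows and columns of a layer's weight matrix, written in PyTorch rather than the repository's TensorFlow code; coefficient names and values are illustrative.

```python
import torch

def dropneuron_penalty(weight, lam_in=1e-3, lam_out=1e-3):
    """Group-sparsity sketch in the spirit of DropNeuron: penalise the l2 norm of
    each column (all weights entering an input neuron) and each row (all weights
    leaving toward an output neuron), so entire neurons can be zeroed and pruned."""
    in_groups = weight.norm(p=2, dim=0).sum()    # one group per input neuron
    out_groups = weight.norm(p=2, dim=1).sum()   # one group per output neuron
    return lam_in * in_groups + lam_out * out_groups

# added to the task loss during training:
layer = torch.nn.Linear(100, 50)
penalty = dropneuron_penalty(layer.weight)       # + task_loss in a real training loop
penalty.backward()
```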

End-to-End Relation Extraction using LSTMs on Sequences and Tree Structures

Title End-to-End Relation Extraction using LSTMs on Sequences and Tree Structures
Authors Makoto Miwa, Mohit Bansal
Abstract We present a novel end-to-end neural model to extract entities and relations between them. Our recurrent neural network based model captures both word sequence and dependency tree substructure information by stacking bidirectional tree-structured LSTM-RNNs on bidirectional sequential LSTM-RNNs. This allows our model to jointly represent both entities and relations with shared parameters in a single model. We further encourage detection of entities during training and use of entity information in relation extraction via entity pretraining and scheduled sampling. Our model improves over the state-of-the-art feature-based model on end-to-end relation extraction, achieving 12.1% and 5.7% relative error reductions in F1-score on ACE2005 and ACE2004, respectively. We also show that our LSTM-RNN based model compares favorably to the state-of-the-art CNN based model (in F1-score) on nominal relation classification (SemEval-2010 Task 8). Finally, we present an extensive ablation analysis of several model components.
Tasks Relation Classification, Relation Extraction
Published 2016-01-05
URL http://arxiv.org/abs/1601.00770v3
PDF http://arxiv.org/pdf/1601.00770v3.pdf
PWC https://paperswithcode.com/paper/end-to-end-relation-extraction-using-lstms-on
Repo https://github.com/tticoin/LSTM-ER
Framework none
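A minimal sketch of the joint setup follows, keeping only the sequential BiLSTM with parameters shared between a per-token entity tagger and a relation classifier over two entity positions; the paper's bidirectional tree-LSTM layer, entity pretraining and scheduled sampling are omitted, and all sizes are illustrative.

```python
import torch
import torch.nn as nn

class JointERE(nn.Module):
    """Sketch of joint entity/relation extraction with a shared sequence encoder."""
    def __init__(self, vocab, emb=100, hidden=100, n_tags=9, n_rels=7):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.bilstm = nn.LSTM(emb, hidden, bidirectional=True, batch_first=True)
        self.entity_head = nn.Linear(2 * hidden, n_tags)    # per-token BIO tags
        self.relation_head = nn.Linear(4 * hidden, n_rels)  # pair of token states

    def forward(self, tokens, e1_pos, e2_pos):
        h, _ = self.bilstm(self.embed(tokens))               # (B, T, 2*hidden)
        tag_logits = self.entity_head(h)
        batch = torch.arange(h.size(0))
        pair = torch.cat([h[batch, e1_pos], h[batch, e2_pos]], dim=-1)
        return tag_logits, self.relation_head(pair)

model = JointERE(vocab=1000)
tags, rels = model(torch.randint(0, 1000, (2, 12)),
                   torch.tensor([3, 1]), torch.tensor([7, 9]))
print(tags.shape, rels.shape)   # torch.Size([2, 12, 9]) torch.Size([2, 7])
```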

Lexical Query Modeling in Session Search

Title Lexical Query Modeling in Session Search
Authors Christophe Van Gysel, Evangelos Kanoulas, Maarten de Rijke
Abstract Lexical query modeling has been the leading paradigm for session search. In this paper, we analyze TREC session query logs and compare the performance of different lexical matching approaches for session search. Naive methods based on term frequency weighting perform on par with specialized session models. In addition, we investigate the viability of lexical query models in the setting of session search. We give important insights into the potential and limitations of lexical query modeling for session search and propose future directions for the field of session search.
Tasks
Published 2016-08-23
URL http://arxiv.org/abs/1608.06656v1
PDF http://arxiv.org/pdf/1608.06656v1.pdf
PWC https://paperswithcode.com/paper/lexical-query-modeling-in-session-search
Repo https://github.com/cvangysel/sesh
Framework none
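A naive term-frequency session query model of the kind the paper benchmarks can be sketched in a few lines; the decay parameter and the exact weighting scheme here are hypothetical simplifications.

```python
from collections import Counter

def session_query_model(session_queries, decay=1.0):
    """Aggregate all queries in a session into one term-frequency-weighted query
    model, optionally down-weighting earlier queries. A simplified stand-in for
    the lexical matching schemes compared in the paper."""
    weights = Counter()
    for age, query in enumerate(reversed(session_queries)):
        for term in query.lower().split():
            weights[term] += decay ** age
    total = sum(weights.values())
    return {term: w / total for term, w in weights.items()}

print(session_query_model(["cheap flights", "cheap flights to rome"], decay=0.8))
```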

Top-down Visual Saliency Guided by Captions

Title Top-down Visual Saliency Guided by Captions
Authors Vasili Ramanishka, Abir Das, Jianming Zhang, Kate Saenko
Abstract Neural image/video captioning models can generate accurate descriptions, but their internal process of mapping regions to words is a black box and therefore difficult to explain. Top-down neural saliency methods can find important regions given a high-level semantic task such as object classification, but cannot use a natural language sentence as the top-down input for the task. In this paper, we propose Caption-Guided Visual Saliency to expose the region-to-word mapping in modern encoder-decoder networks and demonstrate that it is learned implicitly from caption training data, without any pixel-level annotations. Our approach can produce spatial or spatiotemporal heatmaps both for predicted captions and for arbitrary query sentences. It recovers saliency without the overhead of introducing explicit attention layers, and can be used to analyze a variety of existing model architectures and improve their design. Evaluation on large-scale video and image datasets demonstrates that our approach achieves comparable captioning performance with existing methods while providing more accurate saliency heatmaps. Our code is available at visionlearninggroup.github.io/caption-guided-saliency/.
Tasks Object Classification, Video Captioning
Published 2016-12-21
URL http://arxiv.org/abs/1612.07360v2
PDF http://arxiv.org/pdf/1612.07360v2.pdf
PWC https://paperswithcode.com/paper/top-down-visual-saliency-guided-by-captions
Repo https://github.com/indigo-dc/seeds-classification-theano
Framework none
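The paper reads the region-to-word mapping directly out of the encoder-decoder, with no extra annotations or attention layers. As a rough, explicitly swapped-in alternative, the occlusion-style sketch below estimates a region's saliency for a word by zeroing that region's features and measuring the drop in the word's score; `word_score` is a hypothetical callable wrapping a captioning model, not part of the paper's code.

```python
import numpy as np

def occlusion_saliency(region_features, word_score):
    """Occlusion-style approximation of caption-guided saliency: the saliency of
    region i for a word is the drop in that word's score when region i is zeroed.
    `word_score(features) -> float` is a hypothetical scoring callable."""
    base = word_score(region_features)
    saliency = np.zeros(len(region_features))
    for i in range(len(region_features)):
        masked = region_features.copy()
        masked[i] = 0.0
        saliency[i] = base - word_score(masked)
    return saliency

# toy check with a dummy scorer that only "looks at" region 2
feats = np.random.rand(5, 8)
print(occlusion_saliency(feats, lambda f: float(f[2].sum())))
```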

Detect, Replace, Refine: Deep Structured Prediction For Pixel Wise Labeling

Title Detect, Replace, Refine: Deep Structured Prediction For Pixel Wise Labeling
Authors Spyros Gidaris, Nikos Komodakis
Abstract Pixel-wise image labeling is an interesting and challenging problem of great significance in the computer vision community. For a dense labeling algorithm to achieve accurate and precise results, it has to consider the dependencies that exist in the joint space of both the input and the output variables. An implicit approach to modeling those dependencies is to train a deep neural network that, given as input an initial estimate of the output labels and the input image, predicts a new refined estimate for the labels. In this context, our work is concerned with finding the optimal architecture for performing this label improvement task. We argue that the prior approaches of either directly predicting new label estimates or predicting residual corrections w.r.t. the initial labels with feed-forward deep network architectures are sub-optimal. Instead, we propose a generic architecture that decomposes the label improvement task into three steps: 1) detecting the initial label estimates that are incorrect, 2) replacing the incorrect labels with new ones, and finally 3) refining the renewed labels by predicting residual corrections w.r.t. them. Furthermore, we explore and compare various alternative architectures built from the Detect, Replace, and Refine components. We extensively evaluate the examined architectures on the challenging task of dense disparity estimation (stereo matching) and report both quantitative and qualitative results on three different datasets. Finally, our dense disparity estimation network, which implements the proposed generic architecture, achieves state-of-the-art results on the KITTI 2015 test set, surpassing prior approaches by a significant margin.
Tasks Disparity Estimation, Stereo Matching, Stereo Matching Hand, Structured Prediction
Published 2016-12-14
URL http://arxiv.org/abs/1612.04770v1
PDF http://arxiv.org/pdf/1612.04770v1.pdf
PWC https://paperswithcode.com/paper/detect-replace-refine-deep-structured
Repo https://github.com/gidariss/DRR_struct_pred
Framework none
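The three-step decomposition can be written down compactly: a detection network produces a soft error map, a replacement network proposes new labels where the map flags errors, and a refinement network predicts a residual correction. The sketch below follows that structure with tiny illustrative convolutional components (the paper's networks are far deeper).

```python
import torch
import torch.nn as nn

def block(c_in, c_out):
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU())

class DetectReplaceRefine(nn.Module):
    """Minimal sketch of the three-step decomposition. Input: image x and an
    initial per-pixel estimate y0 (e.g. a disparity map). Channel sizes are
    illustrative only."""
    def __init__(self, img_ch=3, c=16):
        super().__init__()
        self.detect  = nn.Sequential(block(img_ch + 1, c), nn.Conv2d(c, 1, 1), nn.Sigmoid())
        self.replace = nn.Sequential(block(img_ch + 2, c), nn.Conv2d(c, 1, 1))
        self.refine  = nn.Sequential(block(img_ch + 1, c), nn.Conv2d(c, 1, 1))

    def forward(self, x, y0):
        e = self.detect(torch.cat([x, y0], dim=1))           # soft error map
        u = self.replace(torch.cat([x, y0, e], dim=1))       # candidate new labels
        y1 = e * u + (1.0 - e) * y0                          # keep labels judged correct
        return y1 + self.refine(torch.cat([x, y1], dim=1))   # residual refinement

y = DetectReplaceRefine()(torch.randn(1, 3, 64, 64), torch.randn(1, 1, 64, 64))
print(y.shape)  # torch.Size([1, 1, 64, 64])
```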

Sequence-to-Sequence Generation for Spoken Dialogue via Deep Syntax Trees and Strings

Title Sequence-to-Sequence Generation for Spoken Dialogue via Deep Syntax Trees and Strings
Authors Ondřej Dušek, Filip Jurčíček
Abstract We present a natural language generator based on the sequence-to-sequence approach that can be trained to produce natural language strings as well as deep syntax dependency trees from input dialogue acts, and we use it to directly compare two-step generation with separate sentence planning and surface realization stages to a joint, one-step approach. We were able to train both setups successfully using very little training data. The joint setup offers better performance, surpassing state-of-the-art with regards to n-gram-based scores while providing more relevant outputs.
Tasks
Published 2016-06-17
URL http://arxiv.org/abs/1606.05491v1
PDF http://arxiv.org/pdf/1606.05491v1.pdf
PWC https://paperswithcode.com/paper/sequence-to-sequence-generation-for-spoken
Repo https://github.com/UFAL-DSG/tgen
Framework tf

SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive Summarization of Documents

Title SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive Summarization of Documents
Authors Ramesh Nallapati, Feifei Zhai, Bowen Zhou
Abstract We present SummaRuNNer, a Recurrent Neural Network (RNN) based sequence model for extractive summarization of documents and show that it achieves performance better than or comparable to state-of-the-art. Our model has the additional advantage of being very interpretable, since it allows visualization of its predictions broken up by abstract features such as information content, salience and novelty. Another novel contribution of our work is abstractive training of our extractive model that can train on human generated reference summaries alone, eliminating the need for sentence-level extractive labels.
Tasks Document Summarization
Published 2016-11-14
URL http://arxiv.org/abs/1611.04230v1
PDF http://arxiv.org/pdf/1611.04230v1.pdf
PWC https://paperswithcode.com/paper/summarunner-a-recurrent-neural-network-based
Repo https://github.com/amagooda/SummaRuNNer_coattention
Framework pytorch
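The interpretable scoring the abstract mentions can be sketched as a per-sentence logistic model combining content, salience and novelty terms, where novelty is measured against a running summary representation. Dimensions and the omitted position terms make this a simplified sketch, not the authors' exact parameterisation.

```python
import torch
import torch.nn as nn

class SummaRuNNerScorer(nn.Module):
    """Minimal sketch of the SummaRuNNer sentence scorer: extraction probability
    from content, salience (vs. the document vector) and novelty (vs. the running
    summary so far). Position terms are omitted and sizes are illustrative."""
    def __init__(self, hidden=200):
        super().__init__()
        self.content = nn.Linear(hidden, 1)
        self.salience = nn.Bilinear(hidden, hidden, 1)
        self.novelty = nn.Bilinear(hidden, hidden, 1)
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, sent_vecs, doc_vec):
        probs, summary = [], torch.zeros_like(doc_vec)
        for h in sent_vecs:                                   # h: (hidden,)
            score = (self.content(h)
                     + self.salience(h.unsqueeze(0), doc_vec.unsqueeze(0)).squeeze(0)
                     - self.novelty(h.unsqueeze(0), torch.tanh(summary).unsqueeze(0)).squeeze(0)
                     + self.bias)
            p = torch.sigmoid(score)
            probs.append(p)
            summary = summary + p * h                         # running summary representation
        return torch.stack(probs).squeeze(-1)

sents = torch.randn(6, 200)                 # one vector per sentence
doc = torch.tanh(sents.mean(dim=0))         # document representation
print(SummaRuNNerScorer()(sents, doc))      # extraction probability per sentence
```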

C-RNN-GAN: Continuous recurrent neural networks with adversarial training

Title C-RNN-GAN: Continuous recurrent neural networks with adversarial training
Authors Olof Mogren
Abstract Generative adversarial networks have been proposed as a way of efficiently training deep generative neural networks. We propose a generative adversarial model that works on continuous sequential data, and apply it by training it on a collection of classical music. We conclude that it generates music that sounds better and better as the model is trained, report statistics on generated music, and let the reader judge the quality by downloading the generated songs.
Tasks
Published 2016-11-29
URL http://arxiv.org/abs/1611.09904v1
PDF http://arxiv.org/pdf/1611.09904v1.pdf
PWC https://paperswithcode.com/paper/c-rnn-gan-continuous-recurrent-neural
Repo https://github.com/lblakely/ganProject
Framework none
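A minimal sketch of the adversarial pair on continuous sequences: an LSTM generator maps noise sequences to continuous feature vectors (e.g. per-tone pitch/duration/velocity) and a bidirectional LSTM discriminator scores each step as real or fake. All sizes are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """LSTM generator: noise sequence -> continuous feature sequence."""
    def __init__(self, noise_dim=32, hidden=64, feat_dim=4):
        super().__init__()
        self.lstm = nn.LSTM(noise_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, feat_dim)

    def forward(self, z):
        h, _ = self.lstm(z)
        return self.out(h)

class Discriminator(nn.Module):
    """Bidirectional LSTM that scores each step of a sequence as real/fake."""
    def __init__(self, feat_dim=4, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, 1)

    def forward(self, x):
        h, _ = self.lstm(x)
        return self.out(h)   # per-step logits

z = torch.randn(2, 16, 32)              # batch of noise sequences
fake = Generator()(z)                   # (2, 16, 4) continuous sequence
print(Discriminator()(fake).shape)      # torch.Size([2, 16, 1])
```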

ContextLocNet: Context-Aware Deep Network Models for Weakly Supervised Localization

Title ContextLocNet: Context-Aware Deep Network Models for Weakly Supervised Localization
Authors Vadim Kantorov, Maxime Oquab, Minsu Cho, Ivan Laptev
Abstract We aim to localize objects in images using image-level supervision only. Previous approaches to this problem mainly focus on discriminative object regions and often fail to locate precise object boundaries. We address this problem by introducing two types of context-aware guidance models, additive and contrastive models, that leverage their surrounding context regions to improve localization. The additive model encourages the predicted object region to be supported by its surrounding context region. The contrastive model encourages the predicted object region to be outstanding from its surrounding context region. Our approach benefits from the recent success of convolutional neural networks for object recognition and extends Fast R-CNN to weakly supervised object localization. Extensive experimental evaluation on the PASCAL VOC 2007 and 2012 benchmarks shows that our context-aware approach significantly improves weakly supervised localization and detection.
Tasks Object Localization, Object Recognition, Weakly Supervised Object Detection, Weakly-Supervised Object Localization
Published 2016-09-14
URL http://arxiv.org/abs/1609.04331v1
PDF http://arxiv.org/pdf/1609.04331v1.pdf
PWC https://paperswithcode.com/paper/contextlocnet-context-aware-deep-network
Repo https://github.com/vadimkantorov/contextlocnet
Framework torch
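The additive and contrastive guidance models can be summarised as two ways of combining an ROI score with the score of its surrounding context region. The sketch below uses plain linear heads over pooled features as an illustrative stand-in for the paper's Fast R-CNN-based heads; names and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class ContextAwareScorer(nn.Module):
    """Sketch of the two guidance models: the ROI and its surrounding context
    region are scored by separate heads; the additive model adds the context
    score (region supported by context), the contrastive model subtracts it
    (region stands out from context)."""
    def __init__(self, feat_dim=512, n_classes=20, mode="contrastive"):
        super().__init__()
        self.roi_head = nn.Linear(feat_dim, n_classes)
        self.ctx_head = nn.Linear(feat_dim, n_classes)
        self.sign = 1.0 if mode == "additive" else -1.0

    def forward(self, roi_feat, ctx_feat):
        return self.roi_head(roi_feat) + self.sign * self.ctx_head(ctx_feat)

scores = ContextAwareScorer(mode="additive")(torch.randn(8, 512), torch.randn(8, 512))
print(scores.shape)   # torch.Size([8, 20]) -- per-ROI class scores
```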

Very Deep Convolutional Networks for Text Classification

Title Very Deep Convolutional Networks for Text Classification
Authors Alexis Conneau, Holger Schwenk, Loïc Barrault, Yann Lecun
Abstract The dominant approaches for many NLP tasks are recurrent neural networks, in particular LSTMs, and convolutional neural networks. However, these architectures are rather shallow in comparison to the deep convolutional networks which have pushed the state-of-the-art in computer vision. We present a new architecture (VDCNN) for text processing which operates directly at the character level and uses only small convolutions and pooling operations. We are able to show that the performance of this model increases with depth: using up to 29 convolutional layers, we report improvements over the state-of-the-art on several public text classification tasks. To the best of our knowledge, this is the first time that very deep convolutional nets have been applied to text processing.
Tasks Text Classification
Published 2016-06-06
URL http://arxiv.org/abs/1606.01781v2
PDF http://arxiv.org/pdf/1606.01781v2.pdf
PWC https://paperswithcode.com/paper/very-deep-convolutional-networks-for-text
Repo https://github.com/yeahshow/word2vec_medical_record
Framework none
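The VDCNN building block is a pair of size-3 character-level convolutions, each followed by batch normalisation and ReLU; stacking such blocks with pooling yields the 9/17/29-layer variants. A minimal sketch of one block and a pooling step (channel counts are illustrative):

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """VDCNN-style block: two size-3 1D convolutions, each with batch norm + ReLU."""
    def __init__(self, channels):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm1d(channels), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm1d(channels), nn.ReLU())

    def forward(self, x):
        return self.net(x)

# characters -> embedding -> stacked conv blocks, halving the sequence with pooling
x = torch.randn(2, 64, 1024)              # (batch, channels, characters)
x = nn.MaxPool1d(2)(ConvBlock(64)(x))     # (2, 64, 512)
print(x.shape)
```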

AID: A Benchmark Dataset for Performance Evaluation of Aerial Scene Classification

Title AID: A Benchmark Dataset for Performance Evaluation of Aerial Scene Classification
Authors Gui-Song Xia, Jingwen Hu, Fan Hu, Baoguang Shi, Xiang Bai, Yanfei Zhong, Liangpei Zhang
Abstract Aerial scene classification, which aims to automatically label an aerial image with a specific semantic category, is a fundamental problem for understanding high-resolution remote sensing imagery. In recent years, it has become an active task in the remote sensing area and numerous algorithms have been proposed for it, including many machine learning and data-driven approaches. However, the existing datasets for aerial scene classification, such as the UC-Merced dataset and WHU-RS19, are relatively small, and results on them are already saturated. This largely limits the development of scene classification algorithms. This paper describes the Aerial Image Dataset (AID): a large-scale dataset for aerial scene classification. The goal of AID is to advance the state of the art in scene classification of remote sensing images. To create AID, we collect and annotate more than ten thousand aerial scene images. In addition, a comprehensive review of existing aerial scene classification techniques as well as recent widely-used deep learning methods is given. Finally, we provide a performance analysis of typical aerial scene classification and deep learning approaches on AID, which can serve as baseline results on this benchmark.
Tasks Scene Classification
Published 2016-08-18
URL http://arxiv.org/abs/1608.05167v1
PDF http://arxiv.org/pdf/1608.05167v1.pdf
PWC https://paperswithcode.com/paper/aid-a-benchmark-dataset-for-performance
Repo https://github.com/MLEnthusiast/MHCLN
Framework tf

A Fully Convolutional Neural Network for Cardiac Segmentation in Short-Axis MRI

Title A Fully Convolutional Neural Network for Cardiac Segmentation in Short-Axis MRI
Authors Phi Vu Tran
Abstract Automated cardiac segmentation from magnetic resonance imaging datasets is an essential step in the timely diagnosis and management of cardiac pathologies. We propose to tackle the problem of automated left and right ventricle segmentation through the application of a deep fully convolutional neural network architecture. Our model is efficiently trained end-to-end in a single learning stage from whole-image inputs and ground truths to make inference at every pixel. To our knowledge, this is the first application of a fully convolutional neural network architecture for pixel-wise labeling in cardiac magnetic resonance imaging. Numerical experiments demonstrate that our model is robust to outperform previous fully automated methods across multiple evaluation measures on a range of cardiac datasets. Moreover, our model is fast and can leverage commodity compute resources such as the graphics processing unit to enable state-of-the-art cardiac segmentation at massive scales. The models and code are available at https://github.com/vuptran/cardiac-segmentation
Tasks Cardiac Segmentation
Published 2016-04-02
URL http://arxiv.org/abs/1604.00494v3
PDF http://arxiv.org/pdf/1604.00494v3.pdf
PWC https://paperswithcode.com/paper/a-fully-convolutional-neural-network-for-1
Repo https://github.com/modelhub-ai/cardiac-fcn
Framework none
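A fully convolutional network for this kind of pixel-wise labelling can be sketched as a convolutional encoder followed by a 1x1 classification layer and upsampling back to the input resolution. The toy network below is only a structural illustration and is far shallower than the paper's model.

```python
import torch
import torch.nn as nn

class TinyFCN(nn.Module):
    """Minimal fully convolutional sketch for per-pixel labelling of a
    single-channel MR image; layer sizes are illustrative."""
    def __init__(self, n_classes=2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.head = nn.Conv2d(32, n_classes, 1)
        self.upsample = nn.Upsample(scale_factor=4, mode='bilinear', align_corners=False)

    def forward(self, x):
        return self.upsample(self.head(self.encoder(x)))  # per-pixel class logits

logits = TinyFCN()(torch.randn(1, 1, 128, 128))
print(logits.shape)  # torch.Size([1, 2, 128, 128])
```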

A Fast Unified Model for Parsing and Sentence Understanding

Title A Fast Unified Model for Parsing and Sentence Understanding
Authors Samuel R. Bowman, Jon Gauthier, Abhinav Rastogi, Raghav Gupta, Christopher D. Manning, Christopher Potts
Abstract Tree-structured neural networks exploit valuable syntactic parse information as they interpret the meanings of sentences. However, they suffer from two key technical problems that make them slow and unwieldy for large-scale NLP tasks: they usually operate on parsed sentences and they do not directly support batched computation. We address these issues by introducing the Stack-augmented Parser-Interpreter Neural Network (SPINN), which combines parsing and interpretation within a single tree-sequence hybrid model by integrating tree-structured sentence interpretation into the linear sequential structure of a shift-reduce parser. Our model supports batched computation for a speedup of up to 25 times over other tree-structured models, and its integrated parser can operate on unparsed data with little loss in accuracy. We evaluate it on the Stanford NLI entailment task and show that it significantly outperforms other sentence-encoding models.
Tasks
Published 2016-03-19
URL http://arxiv.org/abs/1603.06021v3
PDF http://arxiv.org/pdf/1603.06021v3.pdf
PWC https://paperswithcode.com/paper/a-fast-unified-model-for-parsing-and-sentence
Repo https://github.com/NYU-MLL/spinn
Framework pytorch
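The shift-reduce mechanism SPINN builds on can be illustrated with a toy interpreter: SHIFT pushes the next word vector onto a stack, REDUCE pops the top two vectors and pushes their composition. The composition below is a placeholder tanh of the sum; SPINN learns the composition function and batches the whole procedure with a thin-stack trick.

```python
import numpy as np

def shift_reduce_encode(word_vecs, transitions, compose=None):
    """Toy shift-reduce interpreter in the spirit of SPINN: SHIFT pushes the next
    word vector, REDUCE composes the top two stack elements. The default
    composition is an illustrative placeholder, not SPINN's learned one."""
    compose = compose or (lambda left, right: np.tanh(left + right))
    buffer = list(word_vecs)[::-1]          # next word at the end, popped first
    stack = []
    for op in transitions:                  # "shift" / "reduce"
        if op == "shift":
            stack.append(buffer.pop())
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(compose(left, right))
    return stack[-1]                        # sentence encoding

vecs = [np.random.rand(8) for _ in range(3)]                      # "the cat sat"
print(shift_reduce_encode(vecs, ["shift", "shift", "reduce",      # ((the cat) sat)
                                 "shift", "reduce"]))
```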