May 7, 2019

2799 words 14 mins read

Paper Group AWR 67

Agnostic Estimation of Mean and Covariance. PlaNet - Photo Geolocation with Convolutional Neural Networks. DropNeuron: Simplifying the Structure of Deep Neural Networks. End-to-End Relation Extraction using LSTMs on Sequences and Tree Structures. Lexical Query Modeling in Session Search. Top-down Visual Saliency Guided by Captions. Detect, Replace, …

Agnostic Estimation of Mean and Covariance

Title Agnostic Estimation of Mean and Covariance
Authors Kevin A. Lai, Anup B. Rao, Santosh Vempala
Abstract We consider the problem of estimating the mean and covariance of a distribution from iid samples in $\mathbb{R}^n$, in the presence of an $\eta$ fraction of malicious noise; this is in contrast to much recent work where the noise itself is assumed to be from a distribution of known type. The agnostic problem includes many interesting special cases, e.g., learning the parameters of a single Gaussian (or finding the best-fit Gaussian) when $\eta$ fraction of data is adversarially corrupted, agnostically learning a mixture of Gaussians, agnostic ICA, etc. We present polynomial-time algorithms to estimate the mean and covariance with error guarantees in terms of information-theoretic lower bounds. As a corollary, we also obtain an agnostic algorithm for Singular Value Decomposition.
Tasks
Published 2016-04-24
URL http://arxiv.org/abs/1604.06968v2
PDF http://arxiv.org/pdf/1604.06968v2.pdf
PWC https://paperswithcode.com/paper/agnostic-estimation-of-mean-and-covariance
Repo https://github.com/kal2000/AgnosticMeanAndCovarianceCode
Framework none
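The abstract above concerns estimating a mean under an $\eta$ fraction of adversarial corruption. The toy sketch below is not the paper's algorithm; it only illustrates the setting by repeatedly discarding the sample that projects furthest from the current mean along the top principal direction, then averaging what remains.

```python
import numpy as np

def robust_mean(samples, eta=0.1):
    """Crude illustrative estimator: drop roughly an eta fraction of points,
    one at a time, choosing the point that projects furthest from the current
    mean along the top principal direction. A toy stand-in for the paper's
    agnostic estimators, not their algorithm."""
    X = np.asarray(samples, dtype=float)
    for _ in range(int(eta * len(X))):
        centered = X - X.mean(axis=0)
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        proj = centered @ vt[0]                     # direction of largest variance
        X = np.delete(X, np.argmax(np.abs(proj)), axis=0)
    return X.mean(axis=0)

# toy check: 10% of points are adversarially shifted away from the true mean (zero)
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(size=(900, 5)), rng.normal(loc=20.0, size=(100, 5))])
print(np.linalg.norm(data.mean(axis=0)))        # naive mean is pulled far off zero
print(np.linalg.norm(robust_mean(data, 0.1)))   # filtered mean is much closer
```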

PlaNet - Photo Geolocation with Convolutional Neural Networks

Title PlaNet - Photo Geolocation with Convolutional Neural Networks
Authors Tobias Weyand, Ilya Kostrikov, James Philbin
Abstract Is it possible to build a system to determine the location where a photo was taken using just its pixels? In general, the problem seems exceptionally difficult: it is trivial to construct situations where no location can be inferred. Yet images often contain informative cues such as landmarks, weather patterns, vegetation, road markings, and architectural details, which in combination may allow one to determine an approximate location and occasionally an exact location. Websites such as GeoGuessr and View from your Window suggest that humans are relatively good at integrating these cues to geolocate images, especially en-masse. In computer vision, the photo geolocation problem is usually approached using image retrieval methods. In contrast, we pose the problem as one of classification by subdividing the surface of the earth into thousands of multi-scale geographic cells, and train a deep network using millions of geotagged images. While previous approaches only recognize landmarks or perform approximate matching using global image descriptors, our model is able to use and integrate multiple visible cues. We show that the resulting model, called PlaNet, outperforms previous approaches and even attains superhuman levels of accuracy in some cases. Moreover, we extend our model to photo albums by combining it with a long short-term memory (LSTM) architecture. By learning to exploit temporal coherence to geolocate uncertain photos, we demonstrate that this model achieves a 50% performance improvement over the single-image model.
Tasks Image Retrieval
Published 2016-02-17
URL http://arxiv.org/abs/1602.05314v1
PDF http://arxiv.org/pdf/1602.05314v1.pdf
PWC https://paperswithcode.com/paper/planet-photo-geolocation-with-convolutional
Repo https://github.com/gjacopo/poppysite
Framework none
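PlaNet turns geolocation into classification over geographic cells. The sketch below uses a fixed lat/lon grid as a stand-in for the paper's adaptive multi-scale S2 cells (an assumption made purely for illustration), showing how coordinates become class labels and back.

```python
def latlon_to_cell(lat, lon, cells_per_side=36):
    """Map a coordinate to a class index on a fixed lat/lon grid. PlaNet instead
    uses adaptive multi-scale S2 cells split by photo density; the fixed grid
    here is a simplification for illustration."""
    row = min(int((lat + 90.0) / 180.0 * cells_per_side), cells_per_side - 1)
    col = min(int((lon + 180.0) / 360.0 * cells_per_side), cells_per_side - 1)
    return row * cells_per_side + col

def cell_to_center(cell, cells_per_side=36):
    """Inverse mapping: class index -> centre coordinate of the cell."""
    row, col = divmod(cell, cells_per_side)
    return ((row + 0.5) / cells_per_side * 180.0 - 90.0,
            (col + 0.5) / cells_per_side * 360.0 - 180.0)

# Geolocation then becomes ordinary classification: a CNN predicts a softmax over
# the cells, and the predicted location is the centre of the argmax cell.
cell = latlon_to_cell(48.8566, 2.3522)   # Paris -> some cell id
print(cell, cell_to_center(cell))
```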

DropNeuron: Simplifying the Structure of Deep Neural Networks

Title DropNeuron: Simplifying the Structure of Deep Neural Networks
Authors Wei Pan, Hao Dong, Yike Guo
Abstract Deep learning using multi-layer neural network (NN) architectures manifests superb power in modern machine learning systems. The trained Deep Neural Networks (DNNs) are typically large. The question we would like to address is whether it is possible to simplify the NN during the training process so as to achieve reasonable performance within an acceptable computational time. We present a novel approach to optimising a deep neural network through regularisation of the network architecture. We propose regularisers which support a simple mechanism for dropping neurons during the network training process. The method supports the construction of a simpler deep neural network with performance comparable to that of the original, unsimplified network. As a proof of concept, we evaluate the proposed method on examples including sparse linear regression, a deep autoencoder and a convolutional neural network. The evaluations demonstrate excellent performance. The code for this work can be found at http://www.github.com/panweihit/DropNeuron
Tasks
Published 2016-06-23
URL http://arxiv.org/abs/1606.07326v3
PDF http://arxiv.org/pdf/1606.07326v3.pdf
PWC https://paperswithcode.com/paper/dropneuron-simplifying-the-structure-of-deep
Repo https://github.com/panweihit/DropNeuron
Framework tf
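The regularisers described above drop whole neurons by driving all of their connections to zero together. Below is a minimal sketch of that idea as a group-lasso penalty over the rows and columns of a layer's weight matrix, written in PyTorch rather than the repository's TensorFlow code; coefficient names and values are illustrative.

```python
import torch

def dropneuron_penalty(weight, lam_in=1e-3, lam_out=1e-3):
    """Group-sparsity sketch in the spirit of DropNeuron: penalise the l2 norm of
    each column (all weights entering an input neuron) and each row (all weights
    leaving toward an output neuron), so entire neurons can be zeroed and pruned."""
    in_groups = weight.norm(p=2, dim=0).sum()    # one group per input neuron
    out_groups = weight.norm(p=2, dim=1).sum()   # one group per output neuron
    return lam_in * in_groups + lam_out * out_groups

# added to the task loss during training:
layer = torch.nn.Linear(100, 50)
penalty = dropneuron_penalty(layer.weight)       # + task_loss in a real training loop
penalty.backward()
```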

End-to-End Relation Extraction using LSTMs on Sequences and Tree Structures

Title End-to-End Relation Extraction using LSTMs on Sequences and Tree Structures
Authors Makoto Miwa, Mohit Bansal
Abstract We present a novel end-to-end neural model to extract entities and relations between them. Our recurrent neural network based model captures both word sequence and dependency tree substructure information by stacking bidirectional tree-structured LSTM-RNNs on bidirectional sequential LSTM-RNNs. This allows our model to jointly represent both entities and relations with shared parameters in a single model. We further encourage detection of entities during training and use of entity information in relation extraction via entity pretraining and scheduled sampling. Our model improves over the state-of-the-art feature-based model on end-to-end relation extraction, achieving 12.1% and 5.7% relative error reductions in F1-score on ACE2005 and ACE2004, respectively. We also show that our LSTM-RNN based model compares favorably to the state-of-the-art CNN based model (in F1-score) on nominal relation classification (SemEval-2010 Task 8). Finally, we present an extensive ablation analysis of several model components.
Tasks Relation Classification, Relation Extraction
Published 2016-01-05
URL http://arxiv.org/abs/1601.00770v3
PDF http://arxiv.org/pdf/1601.00770v3.pdf
PWC https://paperswithcode.com/paper/end-to-end-relation-extraction-using-lstms-on
Repo https://github.com/tticoin/LSTM-ER
Framework none
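A minimal sketch of the joint setup follows, keeping only the sequential BiLSTM with parameters shared between a per-token entity tagger and a relation classifier over two entity positions; the paper's bidirectional tree-LSTM layer, entity pretraining and scheduled sampling are omitted, and all sizes are illustrative.

```python
import torch
import torch.nn as nn

class JointERE(nn.Module):
    """Sketch of joint entity/relation extraction with a shared sequence encoder."""
    def __init__(self, vocab, emb=100, hidden=100, n_tags=9, n_rels=7):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.bilstm = nn.LSTM(emb, hidden, bidirectional=True, batch_first=True)
        self.entity_head = nn.Linear(2 * hidden, n_tags)    # per-token BIO tags
        self.relation_head = nn.Linear(4 * hidden, n_rels)  # pair of token states

    def forward(self, tokens, e1_pos, e2_pos):
        h, _ = self.bilstm(self.embed(tokens))               # (B, T, 2*hidden)
        tag_logits = self.entity_head(h)
        batch = torch.arange(h.size(0))
        pair = torch.cat([h[batch, e1_pos], h[batch, e2_pos]], dim=-1)
        return tag_logits, self.relation_head(pair)

model = JointERE(vocab=1000)
tags, rels = model(torch.randint(0, 1000, (2, 12)),
                   torch.tensor([3, 1]), torch.tensor([7, 9]))
print(tags.shape, rels.shape)   # torch.Size([2, 12, 9]) torch.Size([2, 7])
```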

Lexical Query Modeling in Session Search

Title Lexical Query Modeling in Session Search
Authors Christophe Van Gysel, Evangelos Kanoulas, Maarten de Rijke
Abstract Lexical query modeling has been the leading paradigm for session search. In this paper, we analyze TREC session query logs and compare the performance of different lexical matching approaches for session search. Naive methods based on term frequency weighting perform on par with specialized session models. In addition, we investigate the viability of lexical query models in the setting of session search. We give important insights into the potential and limitations of lexical query modeling for session search and propose future directions for the field of session search.
Tasks
Published 2016-08-23
URL http://arxiv.org/abs/1608.06656v1
PDF http://arxiv.org/pdf/1608.06656v1.pdf
PWC https://paperswithcode.com/paper/lexical-query-modeling-in-session-search
Repo https://github.com/cvangysel/sesh
Framework none
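A naive term-frequency session query model of the kind the paper benchmarks can be sketched in a few lines; the decay parameter and the exact weighting scheme here are hypothetical simplifications.

```python
from collections import Counter

def session_query_model(session_queries, decay=1.0):
    """Aggregate all queries in a session into one term-frequency-weighted query
    model, optionally down-weighting earlier queries. A simplified stand-in for
    the lexical matching schemes compared in the paper."""
    weights = Counter()
    for age, query in enumerate(reversed(session_queries)):
        for term in query.lower().split():
            weights[term] += decay ** age
    total = sum(weights.values())
    return {term: w / total for term, w in weights.items()}

print(session_query_model(["cheap flights", "cheap flights to rome"], decay=0.8))
```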

Top-down Visual Saliency Guided by Captions

Title Top-down Visual Saliency Guided by Captions
Authors Vasili Ramanishka, Abir Das, Jianming Zhang, Kate Saenko
Abstract Neural image/video captioning models can generate accurate descriptions, but their internal process of mapping regions to words is a black box and therefore difficult to explain. Top-down neural saliency methods can find important regions given a high-level semantic task such as object classification, but cannot use a natural language sentence as the top-down input for the task. In this paper, we propose Caption-Guided Visual Saliency to expose the region-to-word mapping in modern encoder-decoder networks and demonstrate that it is learned implicitly from caption training data, without any pixel-level annotations. Our approach can produce spatial or spatiotemporal heatmaps both for predicted captions and for arbitrary query sentences. It recovers saliency without the overhead of introducing explicit attention layers, and can be used to analyze a variety of existing model architectures and improve their design. Evaluation on large-scale video and image datasets demonstrates that our approach achieves comparable captioning performance with existing methods while providing more accurate saliency heatmaps. Our code is available at visionlearninggroup.github.io/caption-guided-saliency/.
Tasks Object Classification, Video Captioning
Published 2016-12-21
URL http://arxiv.org/abs/1612.07360v2
PDF http://arxiv.org/pdf/1612.07360v2.pdf
PWC https://paperswithcode.com/paper/top-down-visual-saliency-guided-by-captions
Repo https://github.com/indigo-dc/seeds-classification-theano
Framework none
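The paper reads the region-to-word mapping directly out of the encoder-decoder, with no extra annotations or attention layers. As a rough, explicitly swapped-in alternative, the occlusion-style sketch below estimates a region's saliency for a word by zeroing that region's features and measuring the drop in the word's score; `word_score` is a hypothetical callable wrapping a captioning model, not part of the paper's code.

```python
import numpy as np

def occlusion_saliency(region_features, word_score):
    """Occlusion-style approximation of caption-guided saliency: the saliency of
    region i for a word is the drop in that word's score when region i is zeroed.
    `word_score(features) -> float` is a hypothetical scoring callable."""
    base = word_score(region_features)
    saliency = np.zeros(len(region_features))
    for i in range(len(region_features)):
        masked = region_features.copy()
        masked[i] = 0.0
        saliency[i] = base - word_score(masked)
    return saliency

# toy check with a dummy scorer that only "looks at" region 2
feats = np.random.rand(5, 8)
print(occlusion_saliency(feats, lambda f: float(f[2].sum())))
```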

Detect, Replace, Refine: Deep Structured Prediction For Pixel Wise Labeling

Title Detect, Replace, Refine: Deep Structured Prediction For Pixel Wise Labeling
Authors Spyros Gidaris, Nikos Komodakis
Abstract Pixel-wise image labeling is an interesting and challenging problem of great significance in the computer vision community. For a dense labeling algorithm to achieve accurate and precise results, it has to consider the dependencies that exist in the joint space of both the input and the output variables. An implicit approach to modeling those dependencies is to train a deep neural network that, given as input an initial estimate of the output labels and the input image, predicts a new refined estimate for the labels. In this context, our work is concerned with finding the optimal architecture for performing this label improvement task. We argue that the prior approaches of either directly predicting new label estimates or predicting residual corrections w.r.t. the initial labels with feed-forward deep network architectures are sub-optimal. Instead, we propose a generic architecture that decomposes the label improvement task into three steps: 1) detecting the initial label estimates that are incorrect, 2) replacing the incorrect labels with new ones, and finally 3) refining the renewed labels by predicting residual corrections w.r.t. them. Furthermore, we explore and compare various alternative architectures built from the Detect, Replace, and Refine components. We extensively evaluate the examined architectures on the challenging task of dense disparity estimation (stereo matching) and report both quantitative and qualitative results on three different datasets. Finally, our dense disparity estimation network, which implements the proposed generic architecture, achieves state-of-the-art results on the KITTI 2015 test set, surpassing prior approaches by a significant margin.
Tasks Disparity Estimation, Stereo Matching, Stereo Matching Hand, Structured Prediction
Published 2016-12-14
URL http://arxiv.org/abs/1612.04770v1
PDF http://arxiv.org/pdf/1612.04770v1.pdf
PWC https://paperswithcode.com/paper/detect-replace-refine-deep-structured
Repo https://github.com/gidariss/DRR_struct_pred
Framework none
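The three-step decomposition can be written down compactly: a detection network produces a soft error map, a replacement network proposes new labels where the map flags errors, and a refinement network predicts a residual correction. The sketch below follows that structure with tiny illustrative convolutional components (the paper's networks are far deeper).

```python
import torch
import torch.nn as nn

def block(c_in, c_out):
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU())

class DetectReplaceRefine(nn.Module):
    """Minimal sketch of the three-step decomposition. Input: image x and an
    initial per-pixel estimate y0 (e.g. a disparity map). Channel sizes are
    illustrative only."""
    def __init__(self, img_ch=3, c=16):
        super().__init__()
        self.detect  = nn.Sequential(block(img_ch + 1, c), nn.Conv2d(c, 1, 1), nn.Sigmoid())
        self.replace = nn.Sequential(block(img_ch + 2, c), nn.Conv2d(c, 1, 1))
        self.refine  = nn.Sequential(block(img_ch + 1, c), nn.Conv2d(c, 1, 1))

    def forward(self, x, y0):
        e = self.detect(torch.cat([x, y0], dim=1))           # soft error map
        u = self.replace(torch.cat([x, y0, e], dim=1))       # candidate new labels
        y1 = e * u + (1.0 - e) * y0                          # keep labels judged correct
        return y1 + self.refine(torch.cat([x, y1], dim=1))   # residual refinement

y = DetectReplaceRefine()(torch.randn(1, 3, 64, 64), torch.randn(1, 1, 64, 64))
print(y.shape)  # torch.Size([1, 1, 64, 64])
```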

Sequence-to-Sequence Generation for Spoken Dialogue via Deep Syntax Trees and Strings

Title Sequence-to-Sequence Generation for Spoken Dialogue via Deep Syntax Trees and Strings
Authors Ondřej Dušek, Filip Jurčíček
Abstract We present a natural language generator based on the sequence-to-sequence approach that can be trained to produce natural language strings as well as deep syntax dependency trees from input dialogue acts, and we use it to directly compare two-step generation with separate sentence planning and surface realization stages to a joint, one-step approach. We were able to train both setups successfully using very little training data. The joint setup offers better performance, surpassing state-of-the-art with regards to n-gram-based scores while providing more relevant outputs.
Tasks
Published 2016-06-17
URL http://arxiv.org/abs/1606.05491v1
PDF http://arxiv.org/pdf/1606.05491v1.pdf
PWC https://paperswithcode.com/paper/sequence-to-sequence-generation-for-spoken
Repo https://github.com/UFAL-DSG/tgen
Framework tf

SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive Summarization of Documents

Title SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive Summarization of Documents
Authors Ramesh Nallapati, Feifei Zhai, Bowen Zhou
Abstract We present SummaRuNNer, a Recurrent Neural Network (RNN) based sequence model for extractive summarization of documents and show that it achieves performance better than or comparable to state-of-the-art. Our model has the additional advantage of being very interpretable, since it allows visualization of its predictions broken up by abstract features such as information content, salience and novelty. Another novel contribution of our work is abstractive training of our extractive model that can train on human generated reference summaries alone, eliminating the need for sentence-level extractive labels.
Tasks Document Summarization
Published 2016-11-14
URL http://arxiv.org/abs/1611.04230v1
PDF http://arxiv.org/pdf/1611.04230v1.pdf
PWC https://paperswithcode.com/paper/summarunner-a-recurrent-neural-network-based
Repo https://github.com/amagooda/SummaRuNNer_coattention
Framework pytorch
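The interpretable scoring the abstract mentions can be sketched as a per-sentence logistic model combining content, salience and novelty terms, where novelty is measured against a running summary representation. Dimensions and the omitted position terms make this a simplified sketch, not the authors' exact parameterisation.

```python
import torch
import torch.nn as nn

class SummaRuNNerScorer(nn.Module):
    """Minimal sketch of the SummaRuNNer sentence scorer: extraction probability
    from content, salience (vs. the document vector) and novelty (vs. the running
    summary so far). Position terms are omitted and sizes are illustrative."""
    def __init__(self, hidden=200):
        super().__init__()
        self.content = nn.Linear(hidden, 1)
        self.salience = nn.Bilinear(hidden, hidden, 1)
        self.novelty = nn.Bilinear(hidden, hidden, 1)
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, sent_vecs, doc_vec):
        probs, summary = [], torch.zeros_like(doc_vec)
        for h in sent_vecs:                                   # h: (hidden,)
            score = (self.content(h)
                     + self.salience(h.unsqueeze(0), doc_vec.unsqueeze(0)).squeeze(0)
                     - self.novelty(h.unsqueeze(0), torch.tanh(summary).unsqueeze(0)).squeeze(0)
                     + self.bias)
            p = torch.sigmoid(score)
            probs.append(p)
            summary = summary + p * h                         # running summary representation
        return torch.stack(probs).squeeze(-1)

sents = torch.randn(6, 200)                 # one vector per sentence
doc = torch.tanh(sents.mean(dim=0))         # document representation
print(SummaRuNNerScorer()(sents, doc))      # extraction probability per sentence
```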

C-RNN-GAN: Continuous recurrent neural networks with adversarial training

Title C-RNN-GAN: Continuous recurrent neural networks with adversarial training
Authors Olof Mogren
Abstract Generative adversarial networks have been proposed as a way of efficiently training deep generative neural networks. We propose a generative adversarial model that works on continuous sequential data, and apply it by training it on a collection of classical music. We conclude that it generates music that sounds better and better as the model is trained, report statistics on generated music, and let the reader judge the quality by downloading the generated songs.
Tasks
Published 2016-11-29
URL http://arxiv.org/abs/1611.09904v1
PDF http://arxiv.org/pdf/1611.09904v1.pdf
PWC https://paperswithcode.com/paper/c-rnn-gan-continuous-recurrent-neural
Repo https://github.com/lblakely/ganProject
Framework none
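A minimal sketch of the adversarial pair on continuous sequences: an LSTM generator maps noise sequences to continuous feature vectors (e.g. per-tone pitch/duration/velocity) and a bidirectional LSTM discriminator scores each step as real or fake. All sizes are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """LSTM generator: noise sequence -> continuous feature sequence."""
    def __init__(self, noise_dim=32, hidden=64, feat_dim=4):
        super().__init__()
        self.lstm = nn.LSTM(noise_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, feat_dim)

    def forward(self, z):
        h, _ = self.lstm(z)
        return self.out(h)

class Discriminator(nn.Module):
    """Bidirectional LSTM that scores each step of a sequence as real/fake."""
    def __init__(self, feat_dim=4, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, 1)

    def forward(self, x):
        h, _ = self.lstm(x)
        return self.out(h)   # per-step logits

z = torch.randn(2, 16, 32)              # batch of noise sequences
fake = Generator()(z)                   # (2, 16, 4) continuous sequence
print(Discriminator()(fake).shape)      # torch.Size([2, 16, 1])
```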

ContextLocNet: Context-Aware Deep Network Models for Weakly Supervised Localization

Title ContextLocNet: Context-Aware Deep Network Models for Weakly Supervised Localization
Authors Vadim Kantorov, Maxime Oquab, Minsu Cho, Ivan Laptev
Abstract We aim to localize objects in images using image-level supervision only. Previous approaches to this problem mainly focus on discriminative object regions and often fail to locate precise object boundaries. We address this problem by introducing two types of context-aware guidance models, additive and contrastive models, that leverage their surrounding context regions to improve localization. The additive model encourages the predicted object region to be supported by its surrounding context region. The contrastive model encourages the predicted object region to be outstanding from its surrounding context region. Our approach benefits from the recent success of convolutional neural networks for object recognition and extends Fast R-CNN to weakly supervised object localization. Extensive experimental evaluation on the PASCAL VOC 2007 and 2012 benchmarks shows that our context-aware approach significantly improves weakly supervised localization and detection.
Tasks Object Localization, Object Recognition, Weakly Supervised Object Detection, Weakly-Supervised Object Localization
Published 2016-09-14
URL http://arxiv.org/abs/1609.04331v1
PDF http://arxiv.org/pdf/1609.04331v1.pdf
PWC https://paperswithcode.com/paper/contextlocnet-context-aware-deep-network
Repo https://github.com/vadimkantorov/contextlocnet
Framework torch
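The additive and contrastive guidance models can be summarised as two ways of combining an ROI score with the score of its surrounding context region. The sketch below uses plain linear heads over pooled features as an illustrative stand-in for the paper's Fast R-CNN-based heads; names and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class ContextAwareScorer(nn.Module):
    """Sketch of the two guidance models: the ROI and its surrounding context
    region are scored by separate heads; the additive model adds the context
    score (region supported by context), the contrastive model subtracts it
    (region stands out from context)."""
    def __init__(self, feat_dim=512, n_classes=20, mode="contrastive"):
        super().__init__()
        self.roi_head = nn.Linear(feat_dim, n_classes)
        self.ctx_head = nn.Linear(feat_dim, n_classes)
        self.sign = 1.0 if mode == "additive" else -1.0

    def forward(self, roi_feat, ctx_feat):
        return self.roi_head(roi_feat) + self.sign * self.ctx_head(ctx_feat)

scores = ContextAwareScorer(mode="additive")(torch.randn(8, 512), torch.randn(8, 512))
print(scores.shape)   # torch.Size([8, 20]) -- per-ROI class scores
```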

Very Deep Convolutional Networks for Text Classification

Title Very Deep Convolutional Networks for Text Classification
Authors Alexis Conneau, Holger Schwenk, Loïc Barrault, Yann Lecun
Abstract The dominant approaches for many NLP tasks are recurrent neural networks, in particular LSTMs, and convolutional neural networks. However, these architectures are rather shallow in comparison to the deep convolutional networks which have pushed the state-of-the-art in computer vision. We present a new architecture (VDCNN) for text processing which operates directly at the character level and uses only small convolutions and pooling operations. We are able to show that the performance of this model increases with depth: using up to 29 convolutional layers, we report improvements over the state-of-the-art on several public text classification tasks. To the best of our knowledge, this is the first time that very deep convolutional nets have been applied to text processing.
Tasks Text Classification
Published 2016-06-06
URL http://arxiv.org/abs/1606.01781v2
PDF http://arxiv.org/pdf/1606.01781v2.pdf
PWC https://paperswithcode.com/paper/very-deep-convolutional-networks-for-text
Repo https://github.com/yeahshow/word2vec_medical_record
Framework none
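The VDCNN building block is a pair of size-3 character-level convolutions, each followed by batch normalisation and ReLU; stacking such blocks with pooling yields the 9/17/29-layer variants. A minimal sketch of one block and a pooling step (channel counts are illustrative):

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """VDCNN-style block: two size-3 1D convolutions, each with batch norm + ReLU."""
    def __init__(self, channels):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm1d(channels), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm1d(channels), nn.ReLU())

    def forward(self, x):
        return self.net(x)

# characters -> embedding -> stacked conv blocks, halving the sequence with pooling
x = torch.randn(2, 64, 1024)              # (batch, channels, characters)
x = nn.MaxPool1d(2)(ConvBlock(64)(x))     # (2, 64, 512)
print(x.shape)
```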

AID: A Benchmark Dataset for Performance Evaluation of Aerial Scene Classification

Title AID: A Benchmark Dataset for Performance Evaluation of Aerial Scene Classification
Authors Gui-Song Xia, Jingwen Hu, Fan Hu, Baoguang Shi, Xiang Bai, Yanfei Zhong, Liangpei Zhang
Abstract Aerial scene classification, which aims to automatically label an aerial image with a specific semantic category, is a fundamental problem for understanding high-resolution remote sensing imagery. In recent years, it has become an active task in the remote sensing area and numerous algorithms have been proposed for it, including many machine learning and data-driven approaches. However, the existing datasets for aerial scene classification, such as the UC-Merced dataset and WHU-RS19, are relatively small, and results on them are already saturated. This largely limits the development of scene classification algorithms. This paper describes the Aerial Image Dataset (AID): a large-scale dataset for aerial scene classification. The goal of AID is to advance the state of the art in scene classification of remote sensing images. To create AID, we collect and annotate more than ten thousand aerial scene images. In addition, a comprehensive review of existing aerial scene classification techniques as well as recent widely-used deep learning methods is given. Finally, we provide a performance analysis of typical aerial scene classification and deep learning approaches on AID, which can serve as baseline results on this benchmark.
Tasks Scene Classification
Published 2016-08-18
URL http://arxiv.org/abs/1608.05167v1
PDF http://arxiv.org/pdf/1608.05167v1.pdf
PWC https://paperswithcode.com/paper/aid-a-benchmark-dataset-for-performance
Repo https://github.com/MLEnthusiast/MHCLN
Framework tf

A Fully Convolutional Neural Network for Cardiac Segmentation in Short-Axis MRI

Title A Fully Convolutional Neural Network for Cardiac Segmentation in Short-Axis MRI
Authors Phi Vu Tran
Abstract Automated cardiac segmentation from magnetic resonance imaging datasets is an essential step in the timely diagnosis and management of cardiac pathologies. We propose to tackle the problem of automated left and right ventricle segmentation through the application of a deep fully convolutional neural network architecture. Our model is efficiently trained end-to-end in a single learning stage from whole-image inputs and ground truths to make inference at every pixel. To our knowledge, this is the first application of a fully convolutional neural network architecture for pixel-wise labeling in cardiac magnetic resonance imaging. Numerical experiments demonstrate that our model is robust to outperform previous fully automated methods across multiple evaluation measures on a range of cardiac datasets. Moreover, our model is fast and can leverage commodity compute resources such as the graphics processing unit to enable state-of-the-art cardiac segmentation at massive scales. The models and code are available at https://github.com/vuptran/cardiac-segmentation
Tasks Cardiac Segmentation
Published 2016-04-02
URL http://arxiv.org/abs/1604.00494v3
PDF http://arxiv.org/pdf/1604.00494v3.pdf
PWC https://paperswithcode.com/paper/a-fully-convolutional-neural-network-for-1
Repo https://github.com/modelhub-ai/cardiac-fcn
Framework none
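A fully convolutional network for this kind of pixel-wise labelling can be sketched as a convolutional encoder followed by a 1x1 classification layer and upsampling back to the input resolution. The toy network below is only a structural illustration and is far shallower than the paper's model.

```python
import torch
import torch.nn as nn

class TinyFCN(nn.Module):
    """Minimal fully convolutional sketch for per-pixel labelling of a
    single-channel MR image; layer sizes are illustrative."""
    def __init__(self, n_classes=2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.head = nn.Conv2d(32, n_classes, 1)
        self.upsample = nn.Upsample(scale_factor=4, mode='bilinear', align_corners=False)

    def forward(self, x):
        return self.upsample(self.head(self.encoder(x)))  # per-pixel class logits

logits = TinyFCN()(torch.randn(1, 1, 128, 128))
print(logits.shape)  # torch.Size([1, 2, 128, 128])
```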

A Fast Unified Model for Parsing and Sentence Understanding

Title A Fast Unified Model for Parsing and Sentence Understanding
Authors Samuel R. Bowman, Jon Gauthier, Abhinav Rastogi, Raghav Gupta, Christopher D. Manning, Christopher Potts
Abstract Tree-structured neural networks exploit valuable syntactic parse information as they interpret the meanings of sentences. However, they suffer from two key technical problems that make them slow and unwieldy for large-scale NLP tasks: they usually operate on parsed sentences and they do not directly support batched computation. We address these issues by introducing the Stack-augmented Parser-Interpreter Neural Network (SPINN), which combines parsing and interpretation within a single tree-sequence hybrid model by integrating tree-structured sentence interpretation into the linear sequential structure of a shift-reduce parser. Our model supports batched computation for a speedup of up to 25 times over other tree-structured models, and its integrated parser can operate on unparsed data with little loss in accuracy. We evaluate it on the Stanford NLI entailment task and show that it significantly outperforms other sentence-encoding models.
Tasks
Published 2016-03-19
URL http://arxiv.org/abs/1603.06021v3
PDF http://arxiv.org/pdf/1603.06021v3.pdf
PWC https://paperswithcode.com/paper/a-fast-unified-model-for-parsing-and-sentence
Repo https://github.com/NYU-MLL/spinn
Framework pytorch
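The shift-reduce mechanism SPINN builds on can be illustrated with a toy interpreter: SHIFT pushes the next word vector onto a stack, REDUCE pops the top two vectors and pushes their composition. The composition below is a placeholder tanh of the sum; SPINN learns the composition function and batches the whole procedure with a thin-stack trick.

```python
import numpy as np

def shift_reduce_encode(word_vecs, transitions, compose=None):
    """Toy shift-reduce interpreter in the spirit of SPINN: SHIFT pushes the next
    word vector, REDUCE composes the top two stack elements. The default
    composition is an illustrative placeholder, not SPINN's learned one."""
    compose = compose or (lambda left, right: np.tanh(left + right))
    buffer = list(word_vecs)[::-1]          # next word at the end, popped first
    stack = []
    for op in transitions:                  # "shift" / "reduce"
        if op == "shift":
            stack.append(buffer.pop())
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(compose(left, right))
    return stack[-1]                        # sentence encoding

vecs = [np.random.rand(8) for _ in range(3)]                      # "the cat sat"
print(shift_reduce_encode(vecs, ["shift", "shift", "reduce",      # ((the cat) sat)
                                 "shift", "reduce"]))
```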