Paper Group AWR 67
Agnostic Estimation of Mean and Covariance. PlaNet - Photo Geolocation with Convolutional Neural Networks. DropNeuron: Simplifying the Structure of Deep Neural Networks. End-to-End Relation Extraction using LSTMs on Sequences and Tree Structures. Lexical Query Modeling in Session Search. Top-down Visual Saliency Guided by Captions. Detect, Replace, Refine: Deep Structured Prediction For Pixel Wise Labeling. Sequence-to-Sequence Generation for Spoken Dialogue via Deep Syntax Trees and Strings. SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive Summarization of Documents. C-RNN-GAN: Continuous recurrent neural networks with adversarial training. ContextLocNet: Context-Aware Deep Network Models for Weakly Supervised Localization. Very Deep Convolutional Networks for Text Classification. AID: A Benchmark Dataset for Performance Evaluation of Aerial Scene Classification. A Fully Convolutional Neural Network for Cardiac Segmentation in Short-Axis MRI. A Fast Unified Model for Parsing and Sentence Understanding.
Agnostic Estimation of Mean and Covariance
Title | Agnostic Estimation of Mean and Covariance |
Authors | Kevin A. Lai, Anup B. Rao, Santosh Vempala |
Abstract | We consider the problem of estimating the mean and covariance of a distribution from iid samples in $\mathbb{R}^n$, in the presence of an $\eta$ fraction of malicious noise; this is in contrast to much recent work where the noise itself is assumed to be from a distribution of known type. The agnostic problem includes many interesting special cases, e.g., learning the parameters of a single Gaussian (or finding the best-fit Gaussian) when $\eta$ fraction of data is adversarially corrupted, agnostically learning a mixture of Gaussians, agnostic ICA, etc. We present polynomial-time algorithms to estimate the mean and covariance with error guarantees in terms of information-theoretic lower bounds. As a corollary, we also obtain an agnostic algorithm for Singular Value Decomposition. |
Tasks | |
Published | 2016-04-24 |
URL | http://arxiv.org/abs/1604.06968v2 |
PDF | http://arxiv.org/pdf/1604.06968v2.pdf |
PWC | https://paperswithcode.com/paper/agnostic-estimation-of-mean-and-covariance |
Repo | https://github.com/kal2000/AgnosticMeanAndCovarianceCode |
Framework | none |
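To make the agnostic setting concrete, here is a minimal toy sketch: an $\eta$ fraction of Gaussian samples is adversarially shifted, and a naive coordinate-wise trimmed mean is compared with the plain empirical mean. This illustrates the problem setup only; it is not the paper's polynomial-time algorithm.

```python
# Toy illustration of agnostic mean estimation under an eta fraction of
# adversarial corruption. The trimmed mean is a naive baseline sketch,
# NOT the algorithm from the paper.
import numpy as np

rng = np.random.default_rng(0)
n, d, eta = 1000, 10, 0.1

samples = rng.normal(size=(n, d))   # true mean is the zero vector
k = int(eta * n)
samples[:k] += 50.0                 # adversary shifts an eta fraction

def trimmed_mean(x, trim=0.1):
    """Coordinate-wise mean after discarding the extreme `trim` fraction."""
    lo, hi = np.quantile(x, [trim, 1.0 - trim], axis=0)
    kept = np.where((x >= lo) & (x <= hi), x, np.nan)
    return np.nanmean(kept, axis=0)

print("empirical mean error:", np.linalg.norm(samples.mean(axis=0)))
print("trimmed mean error:  ", np.linalg.norm(trimmed_mean(samples)))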
PlaNet - Photo Geolocation with Convolutional Neural Networks
Title | PlaNet - Photo Geolocation with Convolutional Neural Networks |
Authors | Tobias Weyand, Ilya Kostrikov, James Philbin |
Abstract | Is it possible to build a system to determine the location where a photo was taken using just its pixels? In general, the problem seems exceptionally difficult: it is trivial to construct situations where no location can be inferred. Yet images often contain informative cues such as landmarks, weather patterns, vegetation, road markings, and architectural details, which in combination may allow one to determine an approximate location and occasionally an exact location. Websites such as GeoGuessr and View from your Window suggest that humans are relatively good at integrating these cues to geolocate images, especially en masse. In computer vision, the photo geolocation problem is usually approached using image retrieval methods. In contrast, we pose the problem as one of classification by subdividing the surface of the earth into thousands of multi-scale geographic cells, and train a deep network using millions of geotagged images. While previous approaches only recognize landmarks or perform approximate matching using global image descriptors, our model is able to use and integrate multiple visible cues. We show that the resulting model, called PlaNet, outperforms previous approaches and even attains superhuman levels of accuracy in some cases. Moreover, we extend our model to photo albums by combining it with a long short-term memory (LSTM) architecture. By learning to exploit temporal coherence to geolocate uncertain photos, we demonstrate that this model achieves a 50% performance improvement over the single-image model. |
Tasks | Image Retrieval |
Published | 2016-02-17 |
URL | http://arxiv.org/abs/1602.05314v1 |
PDF | http://arxiv.org/pdf/1602.05314v1.pdf |
PWC | https://paperswithcode.com/paper/planet-photo-geolocation-with-convolutional |
Repo | https://github.com/gjacopo/poppysite |
Framework | none |
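PlaNet's key move is to cast geolocation as classification over geographic cells. Below is a minimal sketch of that reduction using a fixed uniform lat/lng grid in place of the paper's adaptive multi-scale S2 cells; the grid resolution is an illustrative choice.

```python
# Bucket a coordinate into a coarse grid cell; training then reduces to
# ordinary softmax classification over cell indices.
def latlng_to_cell(lat, lng, rows=90, cols=180):
    """Map a coordinate to a grid-cell index (fixed grid, not S2 cells)."""
    r = min(int((lat + 90.0) / 180.0 * rows), rows - 1)
    c = min(int((lng + 180.0) / 360.0 * cols), cols - 1)
    return r * cols + c

# Schematically:
#   logits = cnn(image)          # shape: (rows * cols,)
#   loss   = cross_entropy(logits, latlng_to_cell(lat, lng))
print(latlng_to_cell(48.8566, 2.3522))  # grid cell containing Paris
```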
DropNeuron: Simplifying the Structure of Deep Neural Networks
Title | DropNeuron: Simplifying the Structure of Deep Neural Networks |
Authors | Wei Pan, Hao Dong, Yike Guo |
Abstract | Deep learning using multi-layer neural network (NN) architectures manifests superb power in modern machine learning systems. The trained Deep Neural Networks (DNNs) are typically large. The question we would like to address is whether it is possible to simplify the NN during the training process to achieve a reasonable performance within an acceptable computational time. We present a novel approach to optimising a deep neural network through regularisation of the network architecture. We propose regularisers that support a simple mechanism for dropping neurons during the network training process. The method supports the construction of a simpler deep neural network with performance comparable to the original. As a proof of concept, we evaluate the proposed method on examples including sparse linear regression, a deep autoencoder and a convolutional neural network. The evaluations demonstrate excellent performance. The code for this work can be found at http://www.github.com/panweihit/DropNeuron |
Tasks | |
Published | 2016-06-23 |
URL | http://arxiv.org/abs/1606.07326v3 |
PDF | http://arxiv.org/pdf/1606.07326v3.pdf |
PWC | https://paperswithcode.com/paper/dropneuron-simplifying-the-structure-of-deep |
Repo | https://github.com/panweihit/DropNeuron |
Framework | tf |
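The regularisers implement a group-sparsity (group-lasso) penalty over a neuron's incoming and outgoing weights, so that entire neurons can be driven to zero and dropped. A minimal numpy sketch of the two penalties follows; the exact formulation in the paper and repo may differ.

```python
# Group-lasso penalties over a dense layer's weight matrix W (inputs x
# outputs): a sum of per-neuron L2 norms. This is a sketch of the idea,
# not the repo's TensorFlow implementation.
import numpy as np

def li_regularizer(W):
    """Group lasso over incoming connections: one group per output neuron."""
    return np.sqrt((W ** 2).sum(axis=0)).sum()  # column-wise L2 norms

def lo_regularizer(W):
    """Group lasso over outgoing connections: one group per input neuron."""
    return np.sqrt((W ** 2).sum(axis=1)).sum()  # row-wise L2 norms

W = np.random.randn(784, 128)  # dense-layer weights
penalty = 1e-4 * (li_regularizer(W) + lo_regularizer(W))
print(penalty)
```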
End-to-End Relation Extraction using LSTMs on Sequences and Tree Structures
Title | End-to-End Relation Extraction using LSTMs on Sequences and Tree Structures |
Authors | Makoto Miwa, Mohit Bansal |
Abstract | We present a novel end-to-end neural model to extract entities and relations between them. Our recurrent neural network based model captures both word sequence and dependency tree substructure information by stacking bidirectional tree-structured LSTM-RNNs on bidirectional sequential LSTM-RNNs. This allows our model to jointly represent both entities and relations with shared parameters in a single model. We further encourage detection of entities during training and use of entity information in relation extraction via entity pretraining and scheduled sampling. Our model improves over the state-of-the-art feature-based model on end-to-end relation extraction, achieving 12.1% and 5.7% relative error reductions in F1-score on ACE2005 and ACE2004, respectively. We also show that our LSTM-RNN based model compares favorably to the state-of-the-art CNN based model (in F1-score) on nominal relation classification (SemEval-2010 Task 8). Finally, we present an extensive ablation analysis of several model components. |
Tasks | Relation Classification, Relation Extraction |
Published | 2016-01-05 |
URL | http://arxiv.org/abs/1601.00770v3 |
PDF | http://arxiv.org/pdf/1601.00770v3.pdf |
PWC | https://paperswithcode.com/paper/end-to-end-relation-extraction-using-lstms-on |
Repo | https://github.com/tticoin/LSTM-ER |
Framework | none |
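One training device the abstract mentions is scheduled sampling for entity labels: with growing probability the model conditions on its own predictions rather than the gold labels. A minimal sketch of that sampling decision, assuming a linear ramp (the paper's exact schedule may differ):

```python
# Scheduled sampling: during training, sometimes feed the model its own
# predicted entity label instead of the gold one. The linear ramp is an
# assumption for illustration.
import random

def choose_label(gold_label, predicted_label, epoch, total_epochs):
    """With growing probability, feed the model its own prediction."""
    use_prediction_prob = epoch / total_epochs  # assumed linear ramp
    if random.random() < use_prediction_prob:
        return predicted_label
    return gold_label

print(choose_label("B-PER", "O", epoch=5, total_epochs=10))
```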
Lexical Query Modeling in Session Search
Title | Lexical Query Modeling in Session Search |
Authors | Christophe Van Gysel, Evangelos Kanoulas, Maarten de Rijke |
Abstract | Lexical query modeling has been the leading paradigm for session search. In this paper, we analyze TREC session query logs and compare the performance of different lexical matching approaches for session search. Naive methods based on term frequency weighting perform on par with specialized session models. In addition, we investigate the viability of lexical query models in the setting of session search. We give important insights into the potential and limitations of lexical query modeling for session search and propose future directions for the field. |
Tasks | |
Published | 2016-08-23 |
URL | http://arxiv.org/abs/1608.06656v1 |
PDF | http://arxiv.org/pdf/1608.06656v1.pdf |
PWC | https://paperswithcode.com/paper/lexical-query-modeling-in-session-search |
Repo | https://github.com/cvangysel/sesh |
Framework | none |
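A naive term-frequency session model of the kind the paper benchmarks can be written in a few lines: concatenate the session's queries and estimate a maximum-likelihood term distribution. This is a minimal sketch, not the authors' implementation (their code lives in the sesh repo).

```python
# Session-wide maximum-likelihood term distribution: the simplest lexical
# query model for a search session.
from collections import Counter

def session_query_model(queries):
    """Term-frequency distribution over all queries in a session."""
    counts = Counter(term for q in queries for term in q.lower().split())
    total = sum(counts.values())
    return {term: c / total for term, c in counts.items()}

print(session_query_model(["jaguar speed", "jaguar car top speed"]))
```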
Top-down Visual Saliency Guided by Captions
Title | Top-down Visual Saliency Guided by Captions |
Authors | Vasili Ramanishka, Abir Das, Jianming Zhang, Kate Saenko |
Abstract | Neural image/video captioning models can generate accurate descriptions, but their internal process of mapping regions to words is a black box and therefore difficult to explain. Top-down neural saliency methods can find important regions given a high-level semantic task such as object classification, but cannot use a natural language sentence as the top-down input for the task. In this paper, we propose Caption-Guided Visual Saliency to expose the region-to-word mapping in modern encoder-decoder networks and demonstrate that it is learned implicitly from caption training data, without any pixel-level annotations. Our approach can produce spatial or spatiotemporal heatmaps for both predicted captions and arbitrary query sentences. It recovers saliency without the overhead of introducing explicit attention layers, and can be used to analyze a variety of existing model architectures and improve their design. Evaluation on large-scale video and image datasets demonstrates that our approach achieves comparable captioning performance with existing methods while providing more accurate saliency heatmaps. Our code is available at visionlearninggroup.github.io/caption-guided-saliency/. |
Tasks | Object Classification, Video Captioning |
Published | 2016-12-21 |
URL | http://arxiv.org/abs/1612.07360v2 |
PDF | http://arxiv.org/pdf/1612.07360v2.pdf |
PWC | https://paperswithcode.com/paper/top-down-visual-saliency-guided-by-captions |
Repo | https://github.com/indigo-dc/seeds-classification-theano |
Framework | none |
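The saliency signal is derived from how much a word's predicted probability changes when a spatial region is withheld. Here is a schematic leave-one-out sketch of that probability-drop measure; `word_prob` stands in for a trained encoder-decoder and is not code from the repo.

```python
# Score each region by the drop in P(word | regions) when that region is
# removed: a leave-one-out variant of the paper's probability-based
# relevance measure.
import numpy as np

def saliency_scores(region_feats, word, word_prob):
    """Per-region relevance: probability loss when region i is withheld."""
    base = word_prob(region_feats, word)
    scores = np.empty(len(region_feats))
    for i in range(len(region_feats)):
        reduced = np.delete(region_feats, i, axis=0)
        scores[i] = base - word_prob(reduced, word)
    return scores

# Toy demo: a fake "model" whose word probability grows with feature mass,
# so the region carrying the most mass gets the highest saliency.
feats = np.array([[0.1], [0.7], [0.2]])
toy_prob = lambda f, w: f.sum() / (1.0 + f.sum())
print(saliency_scores(feats, "dog", toy_prob))
```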
Detect, Replace, Refine: Deep Structured Prediction For Pixel Wise Labeling
Title | Detect, Replace, Refine: Deep Structured Prediction For Pixel Wise Labeling |
Authors | Spyros Gidaris, Nikos Komodakis |
Abstract | Pixel-wise image labeling is an interesting and challenging problem of great significance to the computer vision community. For a dense labeling algorithm to achieve accurate and precise results, it has to consider the dependencies that exist in the joint space of both the input and the output variables. An implicit approach to modeling those dependencies is to train a deep neural network that, given an initial estimate of the output labels and the input image, predicts a new refined estimate for the labels. In this context, our work is concerned with finding the optimal architecture for performing this label improvement task. We argue that the prior approaches of either directly predicting new label estimates or predicting residual corrections w.r.t. the initial labels with feed-forward deep network architectures are sub-optimal. Instead, we propose a generic architecture that decomposes the label improvement task into three steps: 1) detecting the initial label estimates that are incorrect, 2) replacing the incorrect labels with new ones, and finally 3) refining the renewed labels by predicting residual corrections w.r.t. them. Furthermore, we explore and compare various alternative architectures that consist of the aforementioned Detect, Replace, and Refine components. We extensively evaluate the examined architectures on the challenging task of dense disparity estimation (stereo matching) and report both quantitative and qualitative results on three different datasets. Finally, our dense disparity estimation network, which implements the proposed generic architecture, achieves state-of-the-art results on the KITTI 2015 test set, surpassing prior approaches by a significant margin. |
Tasks | Disparity Estimation, Stereo Matching, Stereo Matching Hand, Structured Prediction |
Published | 2016-12-14 |
URL | http://arxiv.org/abs/1612.04770v1 |
PDF | http://arxiv.org/pdf/1612.04770v1.pdf |
PWC | https://paperswithcode.com/paper/detect-replace-refine-deep-structured |
Repo | https://github.com/gidariss/DRR_struct_pred |
Framework | none |
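The proposed decomposition composes three learned components. Below is a schematic numpy sketch of the composition; `detect`, `replace`, and `refine` are stand-in callables, not the paper's network architectures.

```python
# Detect-Replace-Refine on an initial dense label map: flag likely errors,
# renew the flagged labels, then add a residual correction.
import numpy as np

def detect_replace_refine(image, labels, detect, replace, refine):
    """Compose the three learned steps on an initial label estimate."""
    errors = detect(image, labels)            # 1. per-pixel error probability
    renewed = np.where(errors > 0.5,          # 2. replace flagged labels
                       replace(image, labels), labels)
    return renewed + refine(image, renewed)   # 3. residual refinement

labels = np.array([[1.0, 5.0], [1.0, 1.0]])  # one implausible disparity
image = np.zeros_like(labels)
detect = lambda im, lb: (lb > 2.0).astype(float)  # stub error detector
replace = lambda im, lb: np.ones_like(lb)         # stub label renewal
refine = lambda im, lb: np.zeros_like(lb)         # stub residual net
print(detect_replace_refine(image, labels, detect, replace, refine))
```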
Sequence-to-Sequence Generation for Spoken Dialogue via Deep Syntax Trees and Strings
Title | Sequence-to-Sequence Generation for Spoken Dialogue via Deep Syntax Trees and Strings |
Authors | Ondřej Dušek, Filip Jurčíček |
Abstract | We present a natural language generator based on the sequence-to-sequence approach that can be trained to produce natural language strings as well as deep syntax dependency trees from input dialogue acts, and we use it to directly compare two-step generation, with separate sentence planning and surface realization stages, to a joint, one-step approach. We were able to train both setups successfully using very little training data. The joint setup offers better performance, surpassing the state of the art with regard to n-gram-based scores while providing more relevant outputs. |
Tasks | |
Published | 2016-06-17 |
URL | http://arxiv.org/abs/1606.05491v1 |
PDF | http://arxiv.org/pdf/1606.05491v1.pdf |
PWC | https://paperswithcode.com/paper/sequence-to-sequence-generation-for-spoken |
Repo | https://github.com/UFAL-DSG/tgen |
Framework | tf |
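Seq2seq generators of this kind consume a linearized encoding of the input dialogue act. A minimal sketch of one plausible linearization of (act type, slot, value) triples; the actual input format used by TGen may differ.

```python
# Linearize a dialogue act into encoder tokens for a seq2seq model.
def linearize_da(da):
    """Flatten (act_type, slot, value) triples into a token sequence."""
    tokens = []
    for act_type, slot, value in da:
        tokens += [act_type, slot, value]
    return tokens

da = [("inform", "name", "X"), ("inform", "food", "Italian")]
print(linearize_da(da))  # ['inform', 'name', 'X', 'inform', 'food', 'Italian']
```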
SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive Summarization of Documents
Title | SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive Summarization of Documents |
Authors | Ramesh Nallapati, Feifei Zhai, Bowen Zhou |
Abstract | We present SummaRuNNer, a Recurrent Neural Network (RNN) based sequence model for extractive summarization of documents and show that it achieves performance better than or comparable to state-of-the-art. Our model has the additional advantage of being very interpretable, since it allows visualization of its predictions broken up by abstract features such as information content, salience and novelty. Another novel contribution of our work is abstractive training of our extractive model that can train on human generated reference summaries alone, eliminating the need for sentence-level extractive labels. |
Tasks | Document Summarization |
Published | 2016-11-14 |
URL | http://arxiv.org/abs/1611.04230v1 |
PDF | http://arxiv.org/pdf/1611.04230v1.pdf |
PWC | https://paperswithcode.com/paper/summarunner-a-recurrent-neural-network-based |
Repo | https://github.com/amagooda/SummaRuNNer_coattention |
Framework | pytorch |
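The interpretability claim rests on an additive logit: each sentence's inclusion probability decomposes into content, salience, and (negative) redundancy terms. A numpy sketch of that decomposition, with the paper's position terms omitted and all weight shapes illustrative:

```python
# Additive sentence-scoring logit in the spirit of SummaRuNNer; a sketch,
# not the paper's exact parameterization.
import numpy as np

def sentence_logit(h, d, s, Wc, Ws, Wr, b):
    """content + salience - redundancy + bias (h: sentence, d: document,
    s: running summary representation; all length-k vectors here)."""
    content = Wc @ h                  # how informative the sentence is
    salience = h @ Ws @ d             # relevance to the whole document
    redundancy = h @ Wr @ np.tanh(s)  # overlap with the summary so far
    return content + salience - redundancy + b

def p_include(h, d, s, Wc, Ws, Wr, b):
    """Probability that the sentence joins the extractive summary."""
    return 1.0 / (1.0 + np.exp(-sentence_logit(h, d, s, Wc, Ws, Wr, b)))

k = 8
h, d, s = np.ones(k), np.ones(k), np.zeros(k)
Wc, Ws, Wr, b = np.ones(k), np.eye(k), np.eye(k), 0.0
print(p_include(h, d, s, Wc, Ws, Wr, b))  # near 1: salient, no overlap yet
```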
C-RNN-GAN: Continuous recurrent neural networks with adversarial training
Title | C-RNN-GAN: Continuous recurrent neural networks with adversarial training |
Authors | Olof Mogren |
Abstract | Generative adversarial networks have been proposed as a way of efficiently training deep generative neural networks. We propose a generative adversarial model that works on continuous sequential data, and apply it by training it on a collection of classical music. We conclude that it generates music that sounds better and better as the model is trained, report statistics on generated music, and let the reader judge the quality by downloading the generated songs. |
Tasks | |
Published | 2016-11-29 |
URL | http://arxiv.org/abs/1611.09904v1 |
PDF | http://arxiv.org/pdf/1611.09904v1.pdf |
PWC | https://paperswithcode.com/paper/c-rnn-gan-continuous-recurrent-neural |
Repo | https://github.com/lblakely/ganProject |
Framework | none |
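The adversarial setup is the standard generator/discriminator game applied to continuous sequences. A schematic numpy sketch of the two losses; `G` and `D` are stand-ins for the recurrent networks, and the loss form is the generic GAN objective rather than anything specific to this paper.

```python
# Generic GAN losses on sequence batches; a sketch of the training signal,
# with stub networks in place of the recurrent G and D.
import numpy as np

def gan_losses(real_seq, z, G, D, eps=1e-8):
    """G maps noise to a sequence batch; D scores realness per sequence."""
    fake_seq = G(z)
    d_loss = -np.mean(np.log(D(real_seq) + eps)
                      + np.log(1.0 - D(fake_seq) + eps))
    g_loss = -np.mean(np.log(D(fake_seq) + eps))
    return d_loss, g_loss

G = lambda z: z                        # stub recurrent generator
D = lambda x: np.full(len(x), 0.5)     # stub recurrent discriminator
print(gan_losses(np.zeros((4, 8)), np.ones((4, 8)), G, D))
```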
ContextLocNet: Context-Aware Deep Network Models for Weakly Supervised Localization
Title | ContextLocNet: Context-Aware Deep Network Models for Weakly Supervised Localization |
Authors | Vadim Kantorov, Maxime Oquab, Minsu Cho, Ivan Laptev |
Abstract | We aim to localize objects in images using image-level supervision only. Previous approaches to this problem mainly focus on discriminative object regions and often fail to locate precise object boundaries. We address this problem by introducing two types of context-aware guidance models, additive and contrastive models, that leverage their surrounding context regions to improve localization. The additive model encourages the predicted object region to be supported by its surrounding context region. The contrastive model encourages the predicted object region to be outstanding from its surrounding context region. Our approach benefits from the recent success of convolutional neural networks for object recognition and extends Fast R-CNN to weakly supervised object localization. Extensive experimental evaluation on the PASCAL VOC 2007 and 2012 benchmarks shows that our context-aware approach significantly improves weakly supervised localization and detection. |
Tasks | Object Localization, Object Recognition, Weakly Supervised Object Detection, Weakly-Supervised Object Localization |
Published | 2016-09-14 |
URL | http://arxiv.org/abs/1609.04331v1 |
PDF | http://arxiv.org/pdf/1609.04331v1.pdf |
PWC | https://paperswithcode.com/paper/contextlocnet-context-aware-deep-network |
Repo | https://github.com/vadimkantorov/contextlocnet |
Framework | torch |
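At the scoring level, the two guidance models differ mainly in the sign with which the context score enters. A heavily simplified sketch of that contrast; in the actual model these scores come from ROI-pooled features inside a two-stream Fast R-CNN head.

```python
# Additive vs. contrastive context guidance over candidate boxes: a
# simplified scalar sketch, not the paper's network.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

box_scores = np.array([1.2, 0.3, 2.0])  # scores of candidate boxes
ctx_scores = np.array([1.0, 0.2, 0.4])  # scores of their context regions

additive = softmax(box_scores + ctx_scores)      # context supports object
contrastive = softmax(box_scores - ctx_scores)   # object outshines context
print(additive, contrastive)
```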
Very Deep Convolutional Networks for Text Classification
Title | Very Deep Convolutional Networks for Text Classification |
Authors | Alexis Conneau, Holger Schwenk, Loïc Barrault, Yann Lecun |
Abstract | The dominant approaches for many NLP tasks are recurrent neural networks, in particular LSTMs, and convolutional neural networks. However, these architectures are rather shallow in comparison to the deep convolutional networks which have pushed the state of the art in computer vision. We present a new architecture (VDCNN) for text processing which operates directly at the character level and uses only small convolutions and pooling operations. We are able to show that the performance of this model increases with depth: using up to 29 convolutional layers, we report improvements over the state of the art on several public text classification tasks. To the best of our knowledge, this is the first time that very deep convolutional nets have been applied to text processing. |
Tasks | Text Classification |
Published | 2016-06-06 |
URL | http://arxiv.org/abs/1606.01781v2 |
PDF | http://arxiv.org/pdf/1606.01781v2.pdf |
PWC | https://paperswithcode.com/paper/very-deep-convolutional-networks-for-text |
Repo | https://github.com/yeahshow/word2vec_medical_record |
Framework | none |
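Operating "directly at the character level" means the input is a fixed-length sequence of character indices fed to an embedding layer. A minimal quantization sketch; the alphabet and sequence length here are illustrative choices, not necessarily the paper's.

```python
# Turn raw text into a padded sequence of character indices for a
# character-level convolutional network.
import numpy as np

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-,;.!?:'\"/\\|_@#$%^&*~`+=<>()[]{} "
CHAR2IDX = {c: i + 1 for i, c in enumerate(ALPHABET)}  # 0 = padding/unknown

def quantize(text, max_len=1024):
    """Index characters, truncate to max_len, and pad with zeros."""
    idx = [CHAR2IDX.get(c, 0) for c in text.lower()[:max_len]]
    return np.array(idx + [0] * (max_len - len(idx)))

print(quantize("Very deep convnets for text!")[:10])
```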
AID: A Benchmark Dataset for Performance Evaluation of Aerial Scene Classification
Title | AID: A Benchmark Dataset for Performance Evaluation of Aerial Scene Classification |
Authors | Gui-Song Xia, Jingwen Hu, Fan Hu, Baoguang Shi, Xiang Bai, Yanfei Zhong, Liangpei Zhang |
Abstract | Aerial scene classification, which aims to automatically label an aerial image with a specific semantic category, is a fundamental problem for understanding high-resolution remote sensing imagery. In recent years, it has become an active task in the remote sensing area, and numerous algorithms have been proposed for it, including many machine learning and data-driven approaches. However, the existing datasets for aerial scene classification, such as the UC-Merced dataset and WHU-RS19, are relatively small, and the results on them are already saturated. This largely limits the development of scene classification algorithms. This paper describes the Aerial Image Dataset (AID): a large-scale dataset for aerial scene classification. The goal of AID is to advance the state of the art in scene classification of remote sensing images. For creating AID, we collect and annotate more than ten thousand aerial scene images. In addition, a comprehensive review of the existing aerial scene classification techniques as well as recent widely-used deep learning methods is given. Finally, we provide a performance analysis of typical aerial scene classification and deep learning approaches on AID, which can serve as baseline results on this benchmark. |
Tasks | Scene Classification |
Published | 2016-08-18 |
URL | http://arxiv.org/abs/1608.05167v1 |
PDF | http://arxiv.org/pdf/1608.05167v1.pdf |
PWC | https://paperswithcode.com/paper/aid-a-benchmark-dataset-for-performance |
Repo | https://github.com/MLEnthusiast/MHCLN |
Framework | tf |
A Fully Convolutional Neural Network for Cardiac Segmentation in Short-Axis MRI
Title | A Fully Convolutional Neural Network for Cardiac Segmentation in Short-Axis MRI |
Authors | Phi Vu Tran |
Abstract | Automated cardiac segmentation from magnetic resonance imaging datasets is an essential step in the timely diagnosis and management of cardiac pathologies. We propose to tackle the problem of automated left and right ventricle segmentation through the application of a deep fully convolutional neural network architecture. Our model is efficiently trained end-to-end in a single learning stage from whole-image inputs and ground truths to make inference at every pixel. To our knowledge, this is the first application of a fully convolutional neural network architecture for pixel-wise labeling in cardiac magnetic resonance imaging. Numerical experiments demonstrate that our model robustly outperforms previous fully automated methods across multiple evaluation measures on a range of cardiac datasets. Moreover, our model is fast and can leverage commodity compute resources such as the graphics processing unit to enable state-of-the-art cardiac segmentation at massive scales. The models and code are available at https://github.com/vuptran/cardiac-segmentation |
Tasks | Cardiac Segmentation |
Published | 2016-04-02 |
URL | http://arxiv.org/abs/1604.00494v3 |
PDF | http://arxiv.org/pdf/1604.00494v3.pdf |
PWC | https://paperswithcode.com/paper/a-fully-convolutional-neural-network-for-1 |
Repo | https://github.com/modelhub-ai/cardiac-fcn |
Framework | none |
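Pixel-wise cardiac segmentation is commonly scored with the Dice coefficient, one plausible choice among the "multiple evaluation measures" the abstract mentions. A minimal numpy sketch on binary masks:

```python
# Dice overlap between a predicted and a ground-truth binary mask.
import numpy as np

def dice(pred, truth, eps=1e-7):
    """2 * |A ∩ B| / (|A| + |B|), with eps guarding empty masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    return (2.0 * inter + eps) / (pred.sum() + truth.sum() + eps)

a = np.array([[1, 1], [0, 0]])
b = np.array([[1, 0], [0, 0]])
print(dice(a, b))  # 2*1 / (2 + 1) ~= 0.667
```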
A Fast Unified Model for Parsing and Sentence Understanding
Title | A Fast Unified Model for Parsing and Sentence Understanding |
Authors | Samuel R. Bowman, Jon Gauthier, Abhinav Rastogi, Raghav Gupta, Christopher D. Manning, Christopher Potts |
Abstract | Tree-structured neural networks exploit valuable syntactic parse information as they interpret the meanings of sentences. However, they suffer from two key technical problems that make them slow and unwieldy for large-scale NLP tasks: they usually operate on parsed sentences and they do not directly support batched computation. We address these issues by introducing the Stack-augmented Parser-Interpreter Neural Network (SPINN), which combines parsing and interpretation within a single tree-sequence hybrid model by integrating tree-structured sentence interpretation into the linear sequential structure of a shift-reduce parser. Our model supports batched computation for a speedup of up to 25 times over other tree-structured models, and its integrated parser can operate on unparsed data with little loss in accuracy. We evaluate it on the Stanford NLI entailment task and show that it significantly outperforms other sentence-encoding models. |
Tasks | |
Published | 2016-03-19 |
URL | http://arxiv.org/abs/1603.06021v3 |
PDF | http://arxiv.org/pdf/1603.06021v3.pdf |
PWC | https://paperswithcode.com/paper/a-fast-unified-model-for-parsing-and-sentence |
Repo | https://github.com/NYU-MLL/spinn |
Framework | pytorch |
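SPINN's shift-reduce core is easy to sketch: SHIFT pushes a word vector onto the stack, REDUCE composes the top two stack entries. A toy sketch with an averaging composition standing in for SPINN's learned TreeLSTM-style composition and transition classifier:

```python
# Shift-reduce sentence encoding: consume word vectors from a buffer,
# composing subtrees on a stack until a single root representation remains.
import numpy as np

def spinn_encode(word_vecs, transitions, compose=lambda l, r: (l + r) / 2):
    """Run shift-reduce transitions; return the root representation."""
    stack, buffer = [], list(word_vecs)
    for t in transitions:                       # t is "shift" or "reduce"
        if t == "shift":
            stack.append(buffer.pop(0))         # push next word vector
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(compose(left, right))  # compose top two entries
    return stack[-1]

vecs = [np.ones(4), 2 * np.ones(4), 3 * np.ones(4)]
print(spinn_encode(vecs, ["shift", "shift", "reduce", "shift", "reduce"]))
```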