Paper Group ANR 159
Global Deconvolutional Networks for Semantic Segmentation. Syntax-based Attention Model for Natural Language Inference. Investigating the influence of noise and distractors on the interpretation of neural networks. Empirical study of PROXTONE and PROXTONE$^+$ for Fast Learning of Large Scale Sparse Models. End-to-End Answer Chunk Extraction and Ran …
Global Deconvolutional Networks for Semantic Segmentation
Title | Global Deconvolutional Networks for Semantic Segmentation |
Authors | Vladimir Nekrasov, Janghoon Ju, Jaesik Choi |
Abstract | Semantic image segmentation is a principal problem in computer vision, where the aim is to correctly classify each individual pixel of an image into a semantic label. Its widespread use in many areas, including medical imaging and autonomous driving, has fostered extensive research in recent years. Empirical improvements in tackling this task have primarily been motivated by successful exploitation of Convolutional Neural Networks (CNNs) pre-trained for image classification and object recognition. However, the pixel-wise labelling with CNNs has its own unique challenges: (1) an accurate deconvolution, or upsampling, of low-resolution output into a higher-resolution segmentation mask and (2) an inclusion of global information, or context, within locally extracted features. To address these issues, we propose a novel architecture to conduct the equivalent of the deconvolution operation globally and acquire dense predictions. We demonstrate that it leads to improved performance of state-of-the-art semantic segmentation models on the PASCAL VOC 2012 benchmark, reaching 74.0% mean IU accuracy on the test set. |
Tasks | Autonomous Driving, Image Classification, Object Recognition, Semantic Segmentation |
Published | 2016-02-12 |
URL | http://arxiv.org/abs/1602.03930v2 |
http://arxiv.org/pdf/1602.03930v2.pdf | |
PWC | https://paperswithcode.com/paper/global-deconvolutional-networks-for-semantic |
Repo | |
Framework | |
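The entry above describes the usual coarse-to-fine segmentation pipeline that the paper improves: a pre-trained CNN emits low-resolution class scores that must be upsampled to a dense mask. Below is a minimal PyTorch sketch of that baseline pipeline only; the paper's global deconvolution operator is not reproduced, the upsampling here is a standard local transposed convolution, and all layer sizes are illustrative assumptions.

```python
# Minimal sketch: CNN backbone -> coarse class scores -> learned upsampling.
# The paper replaces this local deconvolution with a global formulation.
import torch
import torch.nn as nn

n_classes = 21  # PASCAL VOC 2012: 20 object classes + background

backbone = nn.Sequential(           # stand-in for a pre-trained classification CNN
    nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(128, n_classes, 1),   # coarse score map at 1/4 of the input resolution
)
upsample = nn.ConvTranspose2d(n_classes, n_classes, kernel_size=8, stride=4, padding=2)

x = torch.randn(1, 3, 128, 128)     # dummy image
coarse = backbone(x)                # (1, 21, 32, 32)
dense = upsample(coarse)            # (1, 21, 128, 128): per-pixel class scores
pred = dense.argmax(dim=1)          # segmentation mask
print(coarse.shape, dense.shape, pred.shape)
```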
Syntax-based Attention Model for Natural Language Inference
Title | Syntax-based Attention Model for Natural Language Inference |
Authors | PengFei Liu, Xipeng Qiu, Xuanjing Huang |
Abstract | Introducing an attentional mechanism into neural networks is a powerful concept and has achieved impressive results in many natural language processing tasks. However, most existing models impose the attentional distribution on a flat topology, namely the entire input representation sequence. Clearly, any well-formed sentence has an accompanying syntactic tree structure, which is a much richer topology. Applying attention to such a topology not only exploits the underlying syntax, but also makes attention more interpretable. In this paper, we explore this direction in the context of natural language inference. The results demonstrate its efficacy. We also perform extensive qualitative analysis, deriving insights and intuitions about why and how our model works. |
Tasks | Natural Language Inference |
Published | 2016-07-22 |
URL | http://arxiv.org/abs/1607.06556v1 |
http://arxiv.org/pdf/1607.06556v1.pdf | |
PWC | https://paperswithcode.com/paper/syntax-based-attention-model-for-natural |
Repo | |
Framework | |
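The core contrast in the abstract above is attending over parse-tree nodes rather than a flat token sequence. The PyTorch sketch below shows only that contrast in its simplest form; the tree, the node encoder, and the premise/hypothesis vectors are toy assumptions, not the authors' architecture.

```python
# Attention over syntactic-tree nodes (leaves plus internal constituents).
import torch
import torch.nn.functional as F

d = 16
node_reprs = torch.randn(7, d)      # e.g. 4 leaves + 3 internal nodes of a premise parse
hyp_vec = torch.randn(d)            # encoding of the hypothesis

scores = node_reprs @ hyp_vec                  # one score per tree node
alpha = F.softmax(scores, dim=0)               # attention distribution over the syntax tree
context = alpha @ node_reprs                   # hypothesis-aware premise summary

# `alpha` can be inspected per constituent, which is what makes tree-structured
# attention more interpretable than attention over a flat token sequence.
print(alpha, context.shape)
```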
Investigating the influence of noise and distractors on the interpretation of neural networks
Title | Investigating the influence of noise and distractors on the interpretation of neural networks |
Authors | Pieter-Jan Kindermans, Kristof Schütt, Klaus-Robert Müller, Sven Dähne |
Abstract | Understanding neural networks is becoming increasingly important. Over the last few years, different types of visualisation and explanation methods have been proposed. However, none of them explicitly considered behaviour in the presence of noise and distracting elements. In this work, we show how noise and distracting dimensions can influence the result of an explanation model. This gives new theoretical insights that aid selection of the most appropriate explanation model within the deep Taylor decomposition framework. |
Tasks | |
Published | 2016-11-22 |
URL | http://arxiv.org/abs/1611.07270v1 |
http://arxiv.org/pdf/1611.07270v1.pdf | |
PWC | https://paperswithcode.com/paper/investigating-the-influence-of-noise-and |
Repo | |
Framework | |
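A small numpy example makes the abstract's point concrete for the simplest case, a linear model: the optimal readout weights must cancel a distractor direction, so a gradient-based "explanation" (which for a linear model is just the weight vector) highlights the cancellation rather than the signal pattern. The data-generating setup below is an illustrative assumption, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
a_s = np.array([1.0, 0.0])      # signal pattern
a_d = np.array([1.0, 1.0])      # distractor pattern, overlapping with the signal
s = rng.normal(size=n)          # target signal
d = rng.normal(size=n)          # distractor activity
X = np.outer(s, a_s) + np.outer(d, a_d) + 0.01 * rng.normal(size=(n, 2))

# Least-squares readout of s from X: the weights must *cancel* the distractor.
w, *_ = np.linalg.lstsq(X, s, rcond=None)

# Gradient-based saliency of a linear model equals w itself, so the explanation
# shows the cancellation direction (roughly [1, -1]) rather than the pattern a_s.
print("signal pattern a_s :", a_s)
print("learned weights w  :", np.round(w, 3))
```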
Empirical study of PROXTONE and PROXTONE$^+$ for Fast Learning of Large Scale Sparse Models
Title | Empirical study of PROXTONE and PROXTONE$^+$ for Fast Learning of Large Scale Sparse Models |
Authors | Ziqiang Shi, Rujie Liu |
Abstract | PROXTONE is a novel and fast method for optimization of large scale non-smooth convex problems \cite{shi2015large}. In this work, we investigate the use of the PROXTONE method for solving large scale \emph{non-smooth non-convex} problems, for example the training of sparse deep neural networks (sparse DNNs) or sparse convolutional neural networks (sparse CNNs) for embedded or mobile devices. PROXTONE converges much faster than first order methods, while first order methods make it easy to derive and control the sparseness of the solutions. Thus, in some applications, in order to train sparse models fast, we propose to combine the merits of both methods: we use PROXTONE in the first several epochs to reach the neighborhood of an optimal solution, and then use a first order method to explore the possibility of sparsity in the subsequent training. We call this method PROXTONE plus (PROXTONE$^+$). Both PROXTONE and PROXTONE$^+$ are tested in our experiments, which demonstrate that both methods improve convergence speed by at least a factor of two on diverse sparse model learning problems, while at the same time reducing the size of DNN models to 0.5%. The source of all the algorithms is available upon request. |
Tasks | |
Published | 2016-04-18 |
URL | http://arxiv.org/abs/1604.05024v1 |
http://arxiv.org/pdf/1604.05024v1.pdf | |
PWC | https://paperswithcode.com/paper/empirical-study-of-proxtone-and-proxtone-for |
Repo | |
Framework | |
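The two-phase schedule described above can be sketched on an l1-regularised least-squares problem. PROXTONE itself is a proximal second-order method and is not reproduced here; phase 1 is approximated by plain gradient steps (an assumption), and phase 2 uses first-order proximal steps with soft-thresholding, which is what drives weights to exact zeros.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 50))
x_true = np.zeros(50)
x_true[:5] = rng.normal(size=5)
b = A @ x_true + 0.01 * rng.normal(size=200)
lam = 0.1
step = 1.0 / np.linalg.norm(A, 2) ** 2      # standard step size for proximal gradient

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

x = np.zeros(50)
for it in range(300):
    grad = A.T @ (A @ x - b)
    if it < 50:                             # phase 1: reach a good neighbourhood fast
        x = x - step * grad
    else:                                   # phase 2: first-order proximal steps -> sparsity
        x = soft_threshold(x - step * grad, step * lam)

print("non-zero coefficients:", np.count_nonzero(np.abs(x) > 1e-8))
```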
End-to-End Answer Chunk Extraction and Ranking for Reading Comprehension
Title | End-to-End Answer Chunk Extraction and Ranking for Reading Comprehension |
Authors | Yang Yu, Wei Zhang, Kazi Hasan, Mo Yu, Bing Xiang, Bowen Zhou |
Abstract | This paper proposes dynamic chunk reader (DCR), an end-to-end neural reading comprehension (RC) model that is able to extract and rank a set of answer candidates from a given document to answer questions. DCR is able to predict answers of variable lengths, whereas previous neural RC models primarily focused on predicting single tokens or entities. DCR encodes a document and an input question with recurrent neural networks, and then applies a word-by-word attention mechanism to acquire question-aware representations for the document, followed by the generation of chunk representations and a ranking module to propose the top-ranked chunk as the answer. Experimental results show that DCR achieves state-of-the-art exact match and F1 scores on the SQuAD dataset. |
Tasks | Question Answering, Reading Comprehension |
Published | 2016-10-31 |
URL | http://arxiv.org/abs/1610.09996v2 |
http://arxiv.org/pdf/1610.09996v2.pdf | |
PWC | https://paperswithcode.com/paper/end-to-end-answer-chunk-extraction-and |
Repo | |
Framework | |
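The abstract above outlines a three-stage pattern: word-by-word attention to build question-aware passage representations, chunk (span) representation, and ranking. The PyTorch sketch below shows that pattern only; the encoders, dimensions, and span scorer are toy assumptions rather than the DCR architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d, P, Q = 32, 20, 6                       # hidden size, passage length, question length
passage = torch.randn(P, d)               # stand-ins for RNN-encoded token states
question = torch.randn(Q, d)

# Word-by-word attention: every passage token attends over the question.
attn = F.softmax(passage @ question.t(), dim=1)          # (P, Q)
q_aware = torch.cat([passage, attn @ question], dim=1)   # (P, 2d) question-aware states

# Score every candidate chunk (span up to length 5) from its boundary representations.
scorer = nn.Linear(4 * d, 1)
candidates = [(i, j) for i in range(P) for j in range(i, min(i + 5, P))]
scores = torch.stack([scorer(torch.cat([q_aware[i], q_aware[j]])).squeeze()
                      for i, j in candidates])
best = candidates[scores.argmax().item()]
print("top-ranked chunk (start, end):", best)
```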
Neural Network Models for Implicit Discourse Relation Classification in English and Chinese without Surface Features
Title | Neural Network Models for Implicit Discourse Relation Classification in English and Chinese without Surface Features |
Authors | Attapol T. Rutherford, Vera Demberg, Nianwen Xue |
Abstract | Inferring implicit discourse relations in natural language text is the most difficult subtask in discourse parsing. Surface features achieve good performance, but they are not readily applicable to other languages without semantic lexicons. Previous neural models require parses, surface features, or a small label set to work well. Here, we propose neural network models that are based on feedforward and long short-term memory architectures and use no surface features. To our surprise, our best-configured feedforward architecture outperforms the LSTM-based model in most cases despite thorough tuning. Under various fine-grained label sets and in a cross-linguistic setting, our feedforward models perform consistently better than, or at least as well as, systems that require hand-crafted surface features. Our models constitute the first neural Chinese discourse parser in the style of the Chinese Discourse Treebank, showing that our results hold cross-linguistically. |
Tasks | Implicit Discourse Relation Classification, Relation Classification |
Published | 2016-06-07 |
URL | http://arxiv.org/abs/1606.01990v1 |
http://arxiv.org/pdf/1606.01990v1.pdf | |
PWC | https://paperswithcode.com/paper/neural-network-models-for-implicit-discourse |
Repo | |
Framework | |
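Since the abstract emphasises a surface-feature-free feedforward architecture over the two discourse arguments, here is a minimal PyTorch sketch of that idea: each argument is summarised by averaging its word embeddings and the pair is classified by an MLP. Vocabulary size, dimensions, and the label set are illustrative assumptions.

```python
import torch
import torch.nn as nn

vocab, d, n_labels = 5000, 100, 4         # e.g. four top-level implicit senses
emb = nn.Embedding(vocab, d)
mlp = nn.Sequential(nn.Linear(2 * d, 128), nn.ReLU(), nn.Linear(128, n_labels))

def encode(token_ids):
    return emb(token_ids).mean(dim=0)     # average the word embeddings of one argument

arg1 = torch.randint(0, vocab, (12,))     # dummy token ids of the two arguments
arg2 = torch.randint(0, vocab, (9,))
logits = mlp(torch.cat([encode(arg1), encode(arg2)]))
print(logits.softmax(dim=-1))             # distribution over implicit discourse senses
```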
City-Identification of Flickr Videos Using Semantic Acoustic Features
Title | City-Identification of Flickr Videos Using Semantic Acoustic Features |
Authors | Benjamin Elizalde, Guan-Lin Chao, Ming Zeng, Ian Lane |
Abstract | City-identification of videos aims to determine the likelihood of a video belonging to a set of cities. In this paper, we present an approach using only audio; thus we do not use any additional modality such as images, user-tags or geo-tags. In this manner, we show to what extent the city-location of videos correlates with their acoustic information. Success in this task suggests improvements can be made to complement the other modalities. In particular, we present a method to compute and use semantic acoustic features to perform city-identification, and the features show semantic evidence of the identification. The semantic evidence is given by a taxonomy of urban sounds and expresses the potential presence of these sounds in the city soundtracks. We used the MediaEval Placing Task set, which contains Flickr videos labeled by city. In addition, we used the UrbanSound8K set containing audio clips labeled by sound type. Our method improved the state-of-the-art performance and provides a novel semantic approach to this task. |
Tasks | |
Published | 2016-07-12 |
URL | http://arxiv.org/abs/1607.03257v1 |
http://arxiv.org/pdf/1607.03257v1.pdf | |
PWC | https://paperswithcode.com/paper/city-identification-of-flickr-videos-using |
Repo | |
Framework | |
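The two-stage idea in the abstract (sound-type posteriors as semantic acoustic features, then a city classifier on top) can be sketched with sklearn on synthetic data. The feature extraction from audio, the actual taxonomy, and the datasets are not reproduced; everything below is a placeholder for the structure only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_sound_types, n_cities = 10, 5

# Stage 1: sound-type model (stand-in for one trained on UrbanSound8K-style clips).
X_clips = rng.normal(size=(800, 40))
y_types = rng.integers(0, n_sound_types, 800)
sound_model = LogisticRegression(max_iter=1000).fit(X_clips, y_types)

# Stage 2: each video soundtrack -> sound-type posterior ("semantic acoustic
# features") -> city label.
X_videos = rng.normal(size=(300, 40))
y_city = rng.integers(0, n_cities, 300)
semantic_feats = sound_model.predict_proba(X_videos)        # (300, n_sound_types)
city_model = LogisticRegression(max_iter=1000).fit(semantic_feats, y_city)
print(city_model.predict(semantic_feats[:5]))
```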
Combining multiple resolutions into hierarchical representations for kernel-based image classification
Title | Combining multiple resolutions into hierarchical representations for kernel-based image classification |
Authors | Yanwei Cui, Sébastien Lefevre, Laetitia Chapel, Anne Puissant |
Abstract | The geographic object-based image analysis (GEOBIA) framework has gained increasing interest recently. Following this popular paradigm, we propose a novel multiscale classification approach operating on a hierarchical image representation built from two images at different resolutions. They capture the same scene with different sensors and are naturally fused together through the hierarchical representation, where coarser levels are built from a Low Spatial Resolution (LSR) or Medium Spatial Resolution (MSR) image while finer levels are generated from a High Spatial Resolution (HSR) or Very High Spatial Resolution (VHSR) image. Such a representation allows one to benefit from context information through the coarser levels and from the spatial arrangement of subregions through the finer levels. Two dedicated structured kernels are then used to perform machine learning directly on the constructed hierarchical representation. This strategy overcomes the limits of conventional GEOBIA classification procedures that can handle only one or very few pre-selected scales. Experiments run on an urban classification task show that the proposed approach can substantially improve classification accuracy compared with conventional approaches working on a single scale. |
Tasks | Image Classification |
Published | 2016-07-09 |
URL | http://arxiv.org/abs/1607.02654v2 |
http://arxiv.org/pdf/1607.02654v2.pdf | |
PWC | https://paperswithcode.com/paper/combining-multiple-resolutions-into |
Repo | |
Framework | |
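To make the idea of a kernel over a multi-resolution hierarchy concrete, here is a generic multi-level kernel in numpy: each region is described by one feature vector per hierarchy level (coarse levels coming from the LSR/MSR image, fine levels from the HSR/VHSR image), and the kernel sums per-level RBF similarities. This is a simple stand-in, not the dedicated structured kernels used in the paper.

```python
import numpy as np

def rbf(a, b, gamma=0.5):
    return np.exp(-gamma * np.sum((a - b) ** 2))

def hierarchical_kernel(region_a, region_b):
    # region_x: list of per-level feature vectors, ordered coarse -> fine
    return sum(rbf(fa, fb) for fa, fb in zip(region_a, region_b))

rng = np.random.default_rng(0)
region_a = [rng.normal(size=4) for _ in range(3)]   # 3 hierarchy levels
region_b = [rng.normal(size=4) for _ in range(3)]
print(hierarchical_kernel(region_a, region_b))      # usable inside an SVM Gram matrix
```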
Long-term Planning by Short-term Prediction
Title | Long-term Planning by Short-term Prediction |
Authors | Shai Shalev-Shwartz, Nir Ben-Zrihem, Aviad Cohen, Amnon Shashua |
Abstract | We consider planning problems, which often arise in autonomous driving applications, in which an agent should decide on immediate actions so as to optimize a long term objective. For example, when a car tries to merge into a roundabout it should decide on an immediate acceleration/braking command, while the long term effect of the command is the success/failure of the merge. Such problems are characterized by continuous state and action spaces, and by interaction with multiple agents, whose behavior can be adversarial. We argue that dual versions of the MDP framework (that depend on the value function and the $Q$ function) are problematic for autonomous driving applications due to the non-Markovian nature of the natural state space representation, and due to the continuous state and action spaces. We propose to tackle the planning task by decomposing the problem into two phases: First, we apply supervised learning for predicting the near future based on the present. We require that the predictor be differentiable with respect to the representation of the present. Second, we model a full trajectory of the agent using a recurrent neural network, where unexplained factors are modeled as (additive) input nodes. This allows us to solve the long-term planning problem using supervised learning techniques and direct optimization over the recurrent neural network. Our approach enables us to learn robust policies by incorporating adversarial elements into the environment. |
Tasks | Autonomous Driving |
Published | 2016-02-04 |
URL | http://arxiv.org/abs/1602.01580v1 |
http://arxiv.org/pdf/1602.01580v1.pdf | |
PWC | https://paperswithcode.com/paper/long-term-planning-by-short-term-prediction |
Repo | |
Framework | |
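The decomposition described above, a differentiable short-term predictor unrolled into a long trajectory so the policy can be trained by direct gradient descent, can be sketched in PyTorch. The predictor here is random and frozen as a stand-in for one trained by supervised learning; state/action sizes and the quadratic cost are toy assumptions.

```python
import torch
import torch.nn as nn

state_dim, action_dim, T = 8, 2, 20
predictor = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.Tanh(),
                          nn.Linear(64, state_dim))
for p in predictor.parameters():
    p.requires_grad_(False)            # treat the learned short-term predictor as fixed

policy = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh(),
                       nn.Linear(64, action_dim), nn.Tanh())
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
goal = torch.zeros(state_dim)

for step in range(200):
    s = torch.randn(state_dim)
    cost = 0.0
    for t in range(T):                 # unroll the trajectory through the predictor
        a = policy(s)
        s = predictor(torch.cat([s, a]))
        cost = cost + ((s - goal) ** 2).sum()
    opt.zero_grad()
    cost.backward()                    # gradients flow through the unrolled dynamics
    opt.step()
print("final trajectory cost:", cost.item())
```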
Spatio-temporal Gaussian processes modeling of dynamical systems in systems biology
Title | Spatio-temporal Gaussian processes modeling of dynamical systems in systems biology |
Authors | Mu Niu, Zhenwen Dai, Neil Lawrence, Kolja Becker |
Abstract | Quantitative modeling of the post-transcriptional regulation process is a challenging problem in systems biology. A mechanistic model of the regulatory process needs to be able to describe the available spatio-temporal protein concentration and mRNA expression data and recover the continuous spatio-temporal fields. Rigorous methods are required to identify model parameters. A promising approach to deal with these difficulties is to use a Gaussian process as a prior distribution over the latent functions of protein concentration and mRNA expression. In this study, we consider a partial differential equation mechanistic model with differential operators and a latent function. Since the operators at stake are linear, the information from the physical model can be encoded into the kernel function. Hybrid Monte Carlo methods are employed to carry out Bayesian inference of the partial differential equation parameters and Gaussian process kernel parameters. The spatio-temporal fields of protein concentration and mRNA expression are reconstructed without explicitly solving the partial differential equation. |
Tasks | Bayesian Inference, Gaussian Processes |
Published | 2016-10-17 |
URL | http://arxiv.org/abs/1610.05163v1 |
http://arxiv.org/pdf/1610.05163v1.pdf | |
PWC | https://paperswithcode.com/paper/spatio-temporal-gaussian-processes-modeling |
Repo | |
Framework | |
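A short numpy sketch of Gaussian-process regression over (space, time) inputs shows the basic machinery the abstract builds on. In the paper the linear PDE operators are encoded into the kernel and the parameters are inferred with hybrid Monte Carlo; neither is reproduced here, and the data below are synthetic.

```python
import numpy as np

def rbf(X1, X2, ls=0.5, var=1.0):
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return var * np.exp(-0.5 * d2 / ls ** 2)

rng = np.random.default_rng(0)
# Training data: noisy "concentration" observations at (x, t) locations.
X = rng.uniform(0, 1, size=(40, 2))
y = np.sin(3 * X[:, 0]) * np.exp(-X[:, 1]) + 0.05 * rng.normal(size=40)

# Dense spatio-temporal grid on which to reconstruct the latent field.
g = np.linspace(0, 1, 20)
Xs = np.array([(xi, ti) for xi in g for ti in g])

K = rbf(X, X) + 1e-4 * np.eye(len(X))       # kernel matrix plus noise jitter
Ks = rbf(Xs, X)
mean = Ks @ np.linalg.solve(K, y)           # GP posterior mean of the latent field
print(mean.reshape(20, 20).shape)
```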
Outlier absorbing based on a Bayesian approach
Title | Outlier absorbing based on a Bayesian approach |
Authors | Parsa Bagherzadeh, Hadi Sadoghi Yazdi |
Abstract | The presence of outliers is prevalent in machine learning applications and may produce misleading results. In this paper, a new method for dealing with outliers and anomalous samples is proposed. To overcome the outlier issue, the proposed method combines the global and local views of the samples. By combining these views, our algorithm performs in a robust manner. The experimental results show the capabilities of the proposed method. |
Tasks | |
Published | 2016-07-02 |
URL | http://arxiv.org/abs/1607.00466v1 |
http://arxiv.org/pdf/1607.00466v1.pdf | |
PWC | https://paperswithcode.com/paper/outlier-absorbing-based-on-a-bayesian |
Repo | |
Framework | |
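The abstract gives no algorithmic detail, so the sketch below only illustrates the general notion of combining a global view (distance to the overall data distribution) with a local view (distance to nearest neighbours) into a single outlier score. The scoring rule and the equal weighting are assumptions, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(200, 2)), [[8.0, 8.0]]])   # one obvious outlier

# Global view: Mahalanobis-style distance to the data centre.
mu, cov = X.mean(0), np.cov(X.T)
global_score = np.sqrt(np.einsum("ij,jk,ik->i", X - mu, np.linalg.inv(cov), X - mu))

# Local view: mean distance to the k nearest neighbours.
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
local_score = np.sort(D, axis=1)[:, 1:6].mean(axis=1)      # skip the zero self-distance

score = 0.5 * global_score / global_score.max() + 0.5 * local_score / local_score.max()
print("most outlying point:", X[score.argmax()])
```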
Continuation semantics for multi-quantifier sentences: operation-based approaches
Title | Continuation semantics for multi-quantifier sentences: operation-based approaches |
Authors | Justyna Grudzinska, Marek Zawadowski |
Abstract | Classical scope-assignment strategies for multi-quantifier sentences involve quantifier phrase (QP) movement. More recent continuation-based approaches provide a compelling alternative, for they interpret QPs in situ - without resorting to Logical Forms or any structures beyond the overt syntax. The continuation-based strategies can be divided into two groups: those that locate the source of scope-ambiguity in the rules of semantic composition and those that attribute it to the lexical entries for the quantifier words. In this paper, we focus on the former, operation-based approaches and the nature of the semantic operations involved. More specifically, we discuss three such possible operation-based strategies for multi-quantifier sentences, together with their relative merits and costs. |
Tasks | Semantic Composition |
Published | 2016-07-31 |
URL | http://arxiv.org/abs/1608.00255v2 |
http://arxiv.org/pdf/1608.00255v2.pdf | |
PWC | https://paperswithcode.com/paper/continuation-semantics-for-multi-quantifier |
Repo | |
Framework | |
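A toy Python illustration of the scope ambiguity at stake: quantifiers are modelled as operations that take a continuation (the remainder of the sentence), and the two readings of "every student read some book" come from the order in which the two operations are applied. This is only a model-theoretic toy, not the formal operation-based strategies analysed in the paper.

```python
# A tiny model: two students, two books, and a reading relation.
students = {"s1", "s2"}
books = {"b1", "b2"}
read = {("s1", "b1"), ("s2", "b2")}     # each student read a different book

# Generalized quantifiers as operations over a continuation k.
every = lambda dom: lambda k: all(k(x) for x in dom)
some = lambda dom: lambda k: any(k(x) for x in dom)

# Surface scope (every > some): for every student there is some book they read.
surface = every(students)(lambda s: some(books)(lambda b: (s, b) in read))
# Inverse scope (some > every): there is one book that every student read.
inverse = some(books)(lambda b: every(students)(lambda s: (s, b) in read))
print(surface, inverse)                 # True False: the readings genuinely differ
```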
Visual Compiler: Synthesizing a Scene-Specific Pedestrian Detector and Pose Estimator
Title | Visual Compiler: Synthesizing a Scene-Specific Pedestrian Detector and Pose Estimator |
Authors | Namhoon Lee, Xinshuo Weng, Vishnu Naresh Boddeti, Yu Zhang, Fares Beainy, Kris Kitani, Takeo Kanade |
Abstract | We introduce the concept of a Visual Compiler that generates a scene-specific pedestrian detector and pose estimator without any pedestrian observations. Given a single image and auxiliary scene information in the form of camera parameters and the geometric layout of the scene, the Visual Compiler first infers geometrically and photometrically accurate images of humans in that scene through the use of computer graphics rendering. Using these renders, we learn a scene- and region-specific spatially-varying fully convolutional neural network for simultaneous detection, pose estimation and segmentation of pedestrians. We demonstrate that when real human-annotated data is scarce or non-existent, our data generation strategy can provide an excellent solution for bootstrapping human detection and pose estimation. Experimental results show that our approach outperforms off-the-shelf state-of-the-art pedestrian detectors and pose estimators that are trained on real data. |
Tasks | Human Detection, Pose Estimation |
Published | 2016-12-15 |
URL | http://arxiv.org/abs/1612.05234v1 |
http://arxiv.org/pdf/1612.05234v1.pdf | |
PWC | https://paperswithcode.com/paper/visual-compiler-synthesizing-a-scene-specific |
Repo | |
Framework | |
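A heavily simplified sketch of the bootstrapping idea only: with no real pedestrian annotations, synthetic renders supply positive examples for each scene region, and a region-specific classifier is fit per region. The rendering step and the spatially-varying fully convolutional network are not reproduced; features and data here are random placeholders.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_regions, d = 4, 64
region_detectors = []
for r in range(n_regions):
    pos = rng.normal(loc=1.0, size=(100, d))   # features of rendered pedestrians in region r
    neg = rng.normal(loc=0.0, size=(100, d))   # features of background patches from region r
    X = np.vstack([pos, neg])
    y = np.array([1] * 100 + [0] * 100)
    region_detectors.append(LogisticRegression(max_iter=1000).fit(X, y))

patch = rng.normal(loc=1.0, size=(1, d))       # a test patch from region 0
print("pedestrian probability:", region_detectors[0].predict_proba(patch)[0, 1])
```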
Learning to Hash-tag Videos with Tag2Vec
Title | Learning to Hash-tag Videos with Tag2Vec |
Authors | Aditya Singh, Saurabh Saini, Rajvi Shah, PJ Narayanan |
Abstract | User-given tags or labels are valuable resources for semantic understanding of visual media such as images and videos. Recently, a new type of labeling mechanism known as hash-tags has become increasingly popular on social media sites. In this paper, we study the problem of generating relevant and useful hash-tags for short video clips. Traditional data-driven approaches for tag enrichment and recommendation use direct visual similarity for label transfer and propagation. We attempt to learn a direct low-cost mapping from video to hash-tags using a two-step training process. We first employ a natural language processing (NLP) technique, skip-gram models with neural network training, to learn a low-dimensional vector representation of hash-tags (Tag2Vec) using a corpus of 10 million hash-tags. We then train an embedding function to map video features to the low-dimensional Tag2Vec space. We learn this embedding for 29 categories of short video clips with hash-tags. A query video without any tag information can then be directly mapped to the vector space of tags using the learned embedding, and relevant tags can be found by performing a simple nearest-neighbor retrieval in the Tag2Vec space. We validate the relevance of the tags suggested by our system qualitatively and quantitatively with a user study. |
Tasks | |
Published | 2016-12-13 |
URL | http://arxiv.org/abs/1612.04061v1 |
http://arxiv.org/pdf/1612.04061v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-hash-tag-videos-with-tag2vec |
Repo | |
Framework | |
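The two-step pipeline in the abstract can be sketched in numpy: tag vectors are assumed to come from a pre-trained skip-gram model (random here), a linear embedding from video features into the tag space is fit by least squares, and tags for a new video are retrieved by nearest-neighbour search in that space. Dimensions and data are placeholders, and the paper's embedding function need not be linear.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tags, d_tag, d_video, n_train = 500, 50, 128, 2000

tag_vecs = rng.normal(size=(n_tags, d_tag))            # stand-in for Tag2Vec vectors
video_feats = rng.normal(size=(n_train, d_video))      # training video features
target = tag_vecs[rng.integers(0, n_tags, n_train)]    # tag vector of each video's hash-tag

# Step 2: fit the video -> tag-space embedding (here a linear map, by least squares).
W, *_ = np.linalg.lstsq(video_feats, target, rcond=None)

# Query: map an unseen video into Tag2Vec space and retrieve nearest tags by cosine.
q = rng.normal(size=d_video) @ W
sims = (tag_vecs @ q) / (np.linalg.norm(tag_vecs, axis=1) * np.linalg.norm(q))
print("suggested tag ids:", np.argsort(-sims)[:5])
```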
Maximum entropy models for generation of expressive music
Title | Maximum entropy models for generation of expressive music |
Authors | Simon Moulieras, François Pachet |
Abstract | In the context of contemporary monophonic music, expression can be seen as the difference between a musical performance and its symbolic representation, i.e. a musical score. In this paper, we show how Maximum Entropy (MaxEnt) models can be used to generate musical expression in order to mimic a human performance. As a training corpus, we had a professional pianist play about 150 melodies of jazz, pop, and Latin jazz. The results show good predictive power, validating the choice of our model. Additionally, we set up a listening test whose results reveal that, on average, people significantly prefer the melodies generated by the MaxEnt model to the ones without any expression, or with fully random expression. Furthermore, in some cases, MaxEnt melodies are almost as popular as the human-performed ones. |
Tasks | |
Published | 2016-10-12 |
URL | http://arxiv.org/abs/1610.03606v1 |
http://arxiv.org/pdf/1610.03606v1.pdf | |
PWC | https://paperswithcode.com/paper/maximum-entropy-models-for-generation-of |
Repo | |
Framework | |
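A minimal sketch of a maximum-entropy (multinomial logistic) model for expression: given context features of a note (random placeholders here, standing in for e.g. surrounding pitches and metrical position), predict a discretised expressive parameter such as a velocity bin, then sample from the fitted distribution to generate expression. The feature set and training data are assumptions, not the paper's corpus or model structure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_notes, n_feats, n_velocity_bins = 2000, 12, 8

X = rng.normal(size=(n_notes, n_feats))                 # per-note context features
y = rng.integers(0, n_velocity_bins, n_notes)           # discretised expression labels

# Multinomial logistic regression is the conditional maximum-entropy model.
maxent = LogisticRegression(max_iter=1000).fit(X, y)

# Generation: sample an expressive value for each note of a new melody.
new_melody = rng.normal(size=(16, n_feats))
probs = maxent.predict_proba(new_melody)
sampled_bins = np.array([rng.choice(n_velocity_bins, p=p) for p in probs])
print(sampled_bins)
```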