Paper Group ANR 159
Global Deconvolutional Networks for Semantic Segmentation. Syntax-based Attention Model for Natural Language Inference. Investigating the influence of noise and distractors on the interpretation of neural networks. Empirical study of PROXTONE and PROXTONE$^+$ for Fast Learning of Large Scale Sparse Models. End-to-End Answer Chunk Extraction and Ran …
Global Deconvolutional Networks for Semantic Segmentation
Title | Global Deconvolutional Networks for Semantic Segmentation |
Authors | Vladimir Nekrasov, Janghoon Ju, Jaesik Choi |
Abstract | Semantic image segmentation is a principal problem in computer vision, where the aim is to correctly classify each individual pixel of an image into a semantic label. Its widespread use in many areas, including medical imaging and autonomous driving, has fostered extensive research in recent years. Empirical improvements in tackling this task have primarily been motivated by successful exploitation of Convolutional Neural Networks (CNNs) pre-trained for image classification and object recognition. However, the pixel-wise labelling with CNNs has its own unique challenges: (1) an accurate deconvolution, or upsampling, of low-resolution output into a higher-resolution segmentation mask and (2) an inclusion of global information, or context, within locally extracted features. To address these issues, we propose a novel architecture to conduct the equivalent of the deconvolution operation globally and acquire dense predictions. We demonstrate that it leads to improved performance of state-of-the-art semantic segmentation models on the PASCAL VOC 2012 benchmark, reaching 74.0% mean IU accuracy on the test set. |
Tasks | Autonomous Driving, Image Classification, Object Recognition, Semantic Segmentation |
Published | 2016-02-12 |
URL | http://arxiv.org/abs/1602.03930v2 |
http://arxiv.org/pdf/1602.03930v2.pdf | |
PWC | https://paperswithcode.com/paper/global-deconvolutional-networks-for-semantic |
Repo | |
Framework | |
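The entry above describes the usual coarse-to-fine segmentation pipeline that the paper improves: a pre-trained CNN emits low-resolution class scores that must be upsampled to a dense mask. Below is a minimal PyTorch sketch of that baseline pipeline only; the paper's global deconvolution operator is not reproduced, the upsampling here is a standard local transposed convolution, and all layer sizes are illustrative assumptions.

```python
# Minimal sketch: CNN backbone -> coarse class scores -> learned upsampling.
# The paper replaces this local deconvolution with a global formulation.
import torch
import torch.nn as nn

n_classes = 21  # PASCAL VOC 2012: 20 object classes + background

backbone = nn.Sequential(           # stand-in for a pre-trained classification CNN
    nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(128, n_classes, 1),   # coarse score map at 1/4 of the input resolution
)
upsample = nn.ConvTranspose2d(n_classes, n_classes, kernel_size=8, stride=4, padding=2)

x = torch.randn(1, 3, 128, 128)     # dummy image
coarse = backbone(x)                # (1, 21, 32, 32)
dense = upsample(coarse)            # (1, 21, 128, 128): per-pixel class scores
pred = dense.argmax(dim=1)          # segmentation mask
print(coarse.shape, dense.shape, pred.shape)
```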
Syntax-based Attention Model for Natural Language Inference
Title | Syntax-based Attention Model for Natural Language Inference |
Authors | PengFei Liu, Xipeng Qiu, Xuanjing Huang |
Abstract | Introducing an attentional mechanism into neural networks is a powerful concept and has achieved impressive results in many natural language processing tasks. However, most existing models impose the attentional distribution on a flat topology, namely the entire input representation sequence. Clearly, any well-formed sentence has an accompanying syntactic tree structure, which is a much richer topology. Applying attention to such a topology not only exploits the underlying syntax, but also makes attention more interpretable. In this paper, we explore this direction in the context of natural language inference. The results demonstrate its efficacy. We also perform extensive qualitative analysis, deriving insights and intuitions about why and how our model works. |
Tasks | Natural Language Inference |
Published | 2016-07-22 |
URL | http://arxiv.org/abs/1607.06556v1 |
http://arxiv.org/pdf/1607.06556v1.pdf | |
PWC | https://paperswithcode.com/paper/syntax-based-attention-model-for-natural |
Repo | |
Framework | |
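The core contrast in the abstract above is attending over parse-tree nodes rather than a flat token sequence. The PyTorch sketch below shows only that contrast in its simplest form; the tree, the node encoder, and the premise/hypothesis vectors are toy assumptions, not the authors' architecture.

```python
# Attention over syntactic-tree nodes (leaves plus internal constituents).
import torch
import torch.nn.functional as F

d = 16
node_reprs = torch.randn(7, d)      # e.g. 4 leaves + 3 internal nodes of a premise parse
hyp_vec = torch.randn(d)            # encoding of the hypothesis

scores = node_reprs @ hyp_vec                  # one score per tree node
alpha = F.softmax(scores, dim=0)               # attention distribution over the syntax tree
context = alpha @ node_reprs                   # hypothesis-aware premise summary

# `alpha` can be inspected per constituent, which is what makes tree-structured
# attention more interpretable than attention over a flat token sequence.
print(alpha, context.shape)
```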
Investigating the influence of noise and distractors on the interpretation of neural networks
Title | Investigating the influence of noise and distractors on the interpretation of neural networks |
Authors | Pieter-Jan Kindermans, Kristof Schütt, Klaus-Robert Müller, Sven Dähne |
Abstract | Understanding neural networks is becoming increasingly important. Over the last few years, different types of visualisation and explanation methods have been proposed. However, none of them explicitly considered behaviour in the presence of noise and distracting elements. In this work, we show how noise and distracting dimensions can influence the result of an explanation model. This gives new theoretical insights that aid selection of the most appropriate explanation model within the deep Taylor decomposition framework. |
Tasks | |
Published | 2016-11-22 |
URL | http://arxiv.org/abs/1611.07270v1 |
http://arxiv.org/pdf/1611.07270v1.pdf | |
PWC | https://paperswithcode.com/paper/investigating-the-influence-of-noise-and |
Repo | |
Framework | |
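A small numpy example makes the abstract's point concrete for the simplest case, a linear model: the optimal readout weights must cancel a distractor direction, so a gradient-based "explanation" (which for a linear model is just the weight vector) highlights the cancellation rather than the signal pattern. The data-generating setup below is an illustrative assumption, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
a_s = np.array([1.0, 0.0])      # signal pattern
a_d = np.array([1.0, 1.0])      # distractor pattern, overlapping with the signal
s = rng.normal(size=n)          # target signal
d = rng.normal(size=n)          # distractor activity
X = np.outer(s, a_s) + np.outer(d, a_d) + 0.01 * rng.normal(size=(n, 2))

# Least-squares readout of s from X: the weights must *cancel* the distractor.
w, *_ = np.linalg.lstsq(X, s, rcond=None)

# Gradient-based saliency of a linear model equals w itself, so the explanation
# shows the cancellation direction (roughly [1, -1]) rather than the pattern a_s.
print("signal pattern a_s :", a_s)
print("learned weights w  :", np.round(w, 3))
```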
Empirical study of PROXTONE and PROXTONE$^+$ for Fast Learning of Large Scale Sparse Models
Title | Empirical study of PROXTONE and PROXTONE$^+$ for Fast Learning of Large Scale Sparse Models |
Authors | Ziqiang Shi, Rujie Liu |
Abstract | PROXTONE is a novel and fast method for optimization of large scale non-smooth convex problems \cite{shi2015large}. In this work, we investigate the use of the PROXTONE method for solving large scale \emph{non-smooth non-convex} problems, for example the training of sparse deep neural networks (sparse DNNs) or sparse convolutional neural networks (sparse CNNs) for embedded or mobile devices. PROXTONE converges much faster than first order methods, while first order methods make it easy to derive and control the sparseness of the solutions. Thus, in some applications, in order to train sparse models fast, we propose to combine the merits of both methods: we use PROXTONE in the first several epochs to reach the neighborhood of an optimal solution, and then use a first order method to explore the possibility of sparsity in the subsequent training. We call this method PROXTONE plus (PROXTONE$^+$). Both PROXTONE and PROXTONE$^+$ are tested in our experiments, which demonstrate that both methods improve convergence speed by at least a factor of two on diverse sparse model learning problems, while at the same time reducing the size of DNN models to 0.5%. The source of all the algorithms is available upon request. |
Tasks | |
Published | 2016-04-18 |
URL | http://arxiv.org/abs/1604.05024v1 |
http://arxiv.org/pdf/1604.05024v1.pdf | |
PWC | https://paperswithcode.com/paper/empirical-study-of-proxtone-and-proxtone-for |
Repo | |
Framework | |
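The two-phase schedule described above can be sketched on an l1-regularised least-squares problem. PROXTONE itself is a proximal second-order method and is not reproduced here; phase 1 is approximated by plain gradient steps (an assumption), and phase 2 uses first-order proximal steps with soft-thresholding, which is what drives weights to exact zeros.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 50))
x_true = np.zeros(50)
x_true[:5] = rng.normal(size=5)
b = A @ x_true + 0.01 * rng.normal(size=200)
lam = 0.1
step = 1.0 / np.linalg.norm(A, 2) ** 2      # standard step size for proximal gradient

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

x = np.zeros(50)
for it in range(300):
    grad = A.T @ (A @ x - b)
    if it < 50:                             # phase 1: reach a good neighbourhood fast
        x = x - step * grad
    else:                                   # phase 2: first-order proximal steps -> sparsity
        x = soft_threshold(x - step * grad, step * lam)

print("non-zero coefficients:", np.count_nonzero(np.abs(x) > 1e-8))
```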
End-to-End Answer Chunk Extraction and Ranking for Reading Comprehension
Title | End-to-End Answer Chunk Extraction and Ranking for Reading Comprehension |
Authors | Yang Yu, Wei Zhang, Kazi Hasan, Mo Yu, Bing Xiang, Bowen Zhou |
Abstract | This paper proposes dynamic chunk reader (DCR), an end-to-end neural reading comprehension (RC) model that is able to extract and rank a set of answer candidates from a given document to answer questions. DCR is able to predict answers of variable lengths, whereas previous neural RC models primarily focused on predicting single tokens or entities. DCR encodes a document and an input question with recurrent neural networks, and then applies a word-by-word attention mechanism to acquire question-aware representations for the document, followed by the generation of chunk representations and a ranking module to propose the top-ranked chunk as the answer. Experimental results show that DCR achieves state-of-the-art exact match and F1 scores on the SQuAD dataset. |
Tasks | Question Answering, Reading Comprehension |
Published | 2016-10-31 |
URL | http://arxiv.org/abs/1610.09996v2 |
http://arxiv.org/pdf/1610.09996v2.pdf | |
PWC | https://paperswithcode.com/paper/end-to-end-answer-chunk-extraction-and |
Repo | |
Framework | |
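The abstract above outlines a three-stage pattern: word-by-word attention to build question-aware passage representations, chunk (span) representation, and ranking. The PyTorch sketch below shows that pattern only; the encoders, dimensions, and span scorer are toy assumptions rather than the DCR architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d, P, Q = 32, 20, 6                       # hidden size, passage length, question length
passage = torch.randn(P, d)               # stand-ins for RNN-encoded token states
question = torch.randn(Q, d)

# Word-by-word attention: every passage token attends over the question.
attn = F.softmax(passage @ question.t(), dim=1)          # (P, Q)
q_aware = torch.cat([passage, attn @ question], dim=1)   # (P, 2d) question-aware states

# Score every candidate chunk (span up to length 5) from its boundary representations.
scorer = nn.Linear(4 * d, 1)
candidates = [(i, j) for i in range(P) for j in range(i, min(i + 5, P))]
scores = torch.stack([scorer(torch.cat([q_aware[i], q_aware[j]])).squeeze()
                      for i, j in candidates])
best = candidates[scores.argmax().item()]
print("top-ranked chunk (start, end):", best)
```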
Neural Network Models for Implicit Discourse Relation Classification in English and Chinese without Surface Features
Title | Neural Network Models for Implicit Discourse Relation Classification in English and Chinese without Surface Features |
Authors | Attapol T. Rutherford, Vera Demberg, Nianwen Xue |
Abstract | Inferring implicit discourse relations in natural language text is the most difficult subtask in discourse parsing. Surface features achieve good performance, but they are not readily applicable to other languages without semantic lexicons. Previous neural models require parses, surface features, or a small label set to work well. Here, we propose neural network models that are based on feedforward and long short-term memory architectures and use no surface features. To our surprise, our best-configured feedforward architecture outperforms the LSTM-based model in most cases despite thorough tuning. Under various fine-grained label sets and in a cross-linguistic setting, our feedforward models perform consistently better than, or at least as well as, systems that require hand-crafted surface features. Our models constitute the first neural Chinese discourse parser in the style of the Chinese Discourse Treebank, showing that our results hold cross-linguistically. |
Tasks | Implicit Discourse Relation Classification, Relation Classification |
Published | 2016-06-07 |
URL | http://arxiv.org/abs/1606.01990v1 |
http://arxiv.org/pdf/1606.01990v1.pdf | |
PWC | https://paperswithcode.com/paper/neural-network-models-for-implicit-discourse |
Repo | |
Framework | |
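Since the abstract emphasises a surface-feature-free feedforward architecture over the two discourse arguments, here is a minimal PyTorch sketch of that idea: each argument is summarised by averaging its word embeddings and the pair is classified by an MLP. Vocabulary size, dimensions, and the label set are illustrative assumptions.

```python
import torch
import torch.nn as nn

vocab, d, n_labels = 5000, 100, 4         # e.g. four top-level implicit senses
emb = nn.Embedding(vocab, d)
mlp = nn.Sequential(nn.Linear(2 * d, 128), nn.ReLU(), nn.Linear(128, n_labels))

def encode(token_ids):
    return emb(token_ids).mean(dim=0)     # average the word embeddings of one argument

arg1 = torch.randint(0, vocab, (12,))     # dummy token ids of the two arguments
arg2 = torch.randint(0, vocab, (9,))
logits = mlp(torch.cat([encode(arg1), encode(arg2)]))
print(logits.softmax(dim=-1))             # distribution over implicit discourse senses
```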
City-Identification of Flickr Videos Using Semantic Acoustic Features
Title | City-Identification of Flickr Videos Using Semantic Acoustic Features |
Authors | Benjamin Elizalde, Guan-Lin Chao, Ming Zeng, Ian Lane |
Abstract | City-identification of videos aims to determine the likelihood of a video belonging to a set of cities. In this paper, we present an approach using only audio; thus we do not use any additional modality such as images, user-tags or geo-tags. In this manner, we show to what extent the city-location of videos correlates with their acoustic information. Success in this task suggests improvements can be made to complement the other modalities. In particular, we present a method to compute and use semantic acoustic features to perform city-identification, and the features show semantic evidence of the identification. The semantic evidence is given by a taxonomy of urban sounds and expresses the potential presence of these sounds in the city soundtracks. We used the MediaEval Placing Task set, which contains Flickr videos labeled by city. In addition, we used the UrbanSound8K set containing audio clips labeled by sound type. Our method improved the state-of-the-art performance and provides a novel semantic approach to this task. |
Tasks | |
Published | 2016-07-12 |
URL | http://arxiv.org/abs/1607.03257v1 |
http://arxiv.org/pdf/1607.03257v1.pdf | |
PWC | https://paperswithcode.com/paper/city-identification-of-flickr-videos-using |
Repo | |
Framework | |
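The two-stage idea in the abstract (sound-type posteriors as semantic acoustic features, then a city classifier on top) can be sketched with sklearn on synthetic data. The feature extraction from audio, the actual taxonomy, and the datasets are not reproduced; everything below is a placeholder for the structure only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_sound_types, n_cities = 10, 5

# Stage 1: sound-type model (stand-in for one trained on UrbanSound8K-style clips).
X_clips = rng.normal(size=(800, 40))
y_types = rng.integers(0, n_sound_types, 800)
sound_model = LogisticRegression(max_iter=1000).fit(X_clips, y_types)

# Stage 2: each video soundtrack -> sound-type posterior ("semantic acoustic
# features") -> city label.
X_videos = rng.normal(size=(300, 40))
y_city = rng.integers(0, n_cities, 300)
semantic_feats = sound_model.predict_proba(X_videos)        # (300, n_sound_types)
city_model = LogisticRegression(max_iter=1000).fit(semantic_feats, y_city)
print(city_model.predict(semantic_feats[:5]))
```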
Combining multiple resolutions into hierarchical representations for kernel-based image classification
Title | Combining multiple resolutions into hierarchical representations for kernel-based image classification |
Authors | Yanwei Cui, Sébastien Lefevre, Laetitia Chapel, Anne Puissant |
Abstract | The geographic object-based image analysis (GEOBIA) framework has gained increasing interest recently. Following this popular paradigm, we propose a novel multiscale classification approach operating on a hierarchical image representation built from two images at different resolutions. They capture the same scene with different sensors and are naturally fused together through the hierarchical representation, where coarser levels are built from a Low Spatial Resolution (LSR) or Medium Spatial Resolution (MSR) image while finer levels are generated from a High Spatial Resolution (HSR) or Very High Spatial Resolution (VHSR) image. Such a representation allows one to benefit from context information through the coarser levels and from the spatial arrangement of subregions through the finer levels. Two dedicated structured kernels are then used to perform machine learning directly on the constructed hierarchical representation. This strategy overcomes the limits of conventional GEOBIA classification procedures that can handle only one or very few pre-selected scales. Experiments run on an urban classification task show that the proposed approach can substantially improve classification accuracy compared with conventional approaches working on a single scale. |
Tasks | Image Classification |
Published | 2016-07-09 |
URL | http://arxiv.org/abs/1607.02654v2 |
http://arxiv.org/pdf/1607.02654v2.pdf | |
PWC | https://paperswithcode.com/paper/combining-multiple-resolutions-into |
Repo | |
Framework | |
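To make the idea of a kernel over a multi-resolution hierarchy concrete, here is a generic multi-level kernel in numpy: each region is described by one feature vector per hierarchy level (coarse levels coming from the LSR/MSR image, fine levels from the HSR/VHSR image), and the kernel sums per-level RBF similarities. This is a simple stand-in, not the dedicated structured kernels used in the paper.

```python
import numpy as np

def rbf(a, b, gamma=0.5):
    return np.exp(-gamma * np.sum((a - b) ** 2))

def hierarchical_kernel(region_a, region_b):
    # region_x: list of per-level feature vectors, ordered coarse -> fine
    return sum(rbf(fa, fb) for fa, fb in zip(region_a, region_b))

rng = np.random.default_rng(0)
region_a = [rng.normal(size=4) for _ in range(3)]   # 3 hierarchy levels
region_b = [rng.normal(size=4) for _ in range(3)]
print(hierarchical_kernel(region_a, region_b))      # usable inside an SVM Gram matrix
```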
Long-term Planning by Short-term Prediction
Title | Long-term Planning by Short-term Prediction |
Authors | Shai Shalev-Shwartz, Nir Ben-Zrihem, Aviad Cohen, Amnon Shashua |
Abstract | We consider planning problems, which often arise in autonomous driving applications, in which an agent should decide on immediate actions so as to optimize a long term objective. For example, when a car tries to merge into a roundabout it should decide on an immediate acceleration/braking command, while the long term effect of the command is the success/failure of the merge. Such problems are characterized by continuous state and action spaces, and by interaction with multiple agents, whose behavior can be adversarial. We argue that dual versions of the MDP framework (that depend on the value function and the $Q$ function) are problematic for autonomous driving applications due to the non-Markovian nature of the natural state space representation, and due to the continuous state and action spaces. We propose to tackle the planning task by decomposing the problem into two phases: First, we apply supervised learning for predicting the near future based on the present. We require that the predictor be differentiable with respect to the representation of the present. Second, we model a full trajectory of the agent using a recurrent neural network, where unexplained factors are modeled as (additive) input nodes. This allows us to solve the long-term planning problem using supervised learning techniques and direct optimization over the recurrent neural network. Our approach enables us to learn robust policies by incorporating adversarial elements into the environment. |
Tasks | Autonomous Driving |
Published | 2016-02-04 |
URL | http://arxiv.org/abs/1602.01580v1 |
http://arxiv.org/pdf/1602.01580v1.pdf | |
PWC | https://paperswithcode.com/paper/long-term-planning-by-short-term-prediction |
Repo | |
Framework | |
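The decomposition described above, a differentiable short-term predictor unrolled into a long trajectory so the policy can be trained by direct gradient descent, can be sketched in PyTorch. The predictor here is random and frozen as a stand-in for one trained by supervised learning; state/action sizes and the quadratic cost are toy assumptions.

```python
import torch
import torch.nn as nn

state_dim, action_dim, T = 8, 2, 20
predictor = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.Tanh(),
                          nn.Linear(64, state_dim))
for p in predictor.parameters():
    p.requires_grad_(False)            # treat the learned short-term predictor as fixed

policy = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh(),
                       nn.Linear(64, action_dim), nn.Tanh())
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
goal = torch.zeros(state_dim)

for step in range(200):
    s = torch.randn(state_dim)
    cost = 0.0
    for t in range(T):                 # unroll the trajectory through the predictor
        a = policy(s)
        s = predictor(torch.cat([s, a]))
        cost = cost + ((s - goal) ** 2).sum()
    opt.zero_grad()
    cost.backward()                    # gradients flow through the unrolled dynamics
    opt.step()
print("final trajectory cost:", cost.item())
```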
Spatio-temporal Gaussian processes modeling of dynamical systems in systems biology
Title | Spatio-temporal Gaussian processes modeling of dynamical systems in systems biology |
Authors | Mu Niu, Zhenwen Dai, Neil Lawrence, Kolja Becker |
Abstract | Quantitative modeling of the post-transcriptional regulation process is a challenging problem in systems biology. A mechanistic model of the regulatory process needs to be able to describe the available spatio-temporal protein concentration and mRNA expression data and recover the continuous spatio-temporal fields. Rigorous methods are required to identify model parameters. A promising approach to deal with these difficulties is to use a Gaussian process as a prior distribution over the latent functions of protein concentration and mRNA expression. In this study, we consider a partial differential equation mechanistic model with differential operators and a latent function. Since the operators at stake are linear, the information from the physical model can be encoded into the kernel function. Hybrid Monte Carlo methods are employed to carry out Bayesian inference of the partial differential equation parameters and Gaussian process kernel parameters. The spatio-temporal fields of protein concentration and mRNA expression are reconstructed without explicitly solving the partial differential equation. |
Tasks | Bayesian Inference, Gaussian Processes |
Published | 2016-10-17 |
URL | http://arxiv.org/abs/1610.05163v1 |
http://arxiv.org/pdf/1610.05163v1.pdf | |
PWC | https://paperswithcode.com/paper/spatio-temporal-gaussian-processes-modeling |
Repo | |
Framework | |
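A short numpy sketch of Gaussian-process regression over (space, time) inputs shows the basic machinery the abstract builds on. In the paper the linear PDE operators are encoded into the kernel and the parameters are inferred with hybrid Monte Carlo; neither is reproduced here, and the data below are synthetic.

```python
import numpy as np

def rbf(X1, X2, ls=0.5, var=1.0):
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return var * np.exp(-0.5 * d2 / ls ** 2)

rng = np.random.default_rng(0)
# Training data: noisy "concentration" observations at (x, t) locations.
X = rng.uniform(0, 1, size=(40, 2))
y = np.sin(3 * X[:, 0]) * np.exp(-X[:, 1]) + 0.05 * rng.normal(size=40)

# Dense spatio-temporal grid on which to reconstruct the latent field.
g = np.linspace(0, 1, 20)
Xs = np.array([(xi, ti) for xi in g for ti in g])

K = rbf(X, X) + 1e-4 * np.eye(len(X))       # kernel matrix plus noise jitter
Ks = rbf(Xs, X)
mean = Ks @ np.linalg.solve(K, y)           # GP posterior mean of the latent field
print(mean.reshape(20, 20).shape)
```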
Outlier absorbing based on a Bayesian approach
Title | Outlier absorbing based on a Bayesian approach |
Authors | Parsa Bagherzadeh, Hadi Sadoghi Yazdi |
Abstract | The presence of outliers is prevalent in machine learning applications and may produce misleading results. In this paper, a new method for dealing with outliers and anomalous samples is proposed. To overcome the outlier issue, the proposed method combines the global and local views of the samples. By combining these views, our algorithm performs in a robust manner. The experimental results show the capabilities of the proposed method. |
Tasks | |
Published | 2016-07-02 |
URL | http://arxiv.org/abs/1607.00466v1 |
http://arxiv.org/pdf/1607.00466v1.pdf | |
PWC | https://paperswithcode.com/paper/outlier-absorbing-based-on-a-bayesian |
Repo | |
Framework | |
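The abstract gives no algorithmic detail, so the sketch below only illustrates the general notion of combining a global view (distance to the overall data distribution) with a local view (distance to nearest neighbours) into a single outlier score. The scoring rule and the equal weighting are assumptions, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(200, 2)), [[8.0, 8.0]]])   # one obvious outlier

# Global view: Mahalanobis-style distance to the data centre.
mu, cov = X.mean(0), np.cov(X.T)
global_score = np.sqrt(np.einsum("ij,jk,ik->i", X - mu, np.linalg.inv(cov), X - mu))

# Local view: mean distance to the k nearest neighbours.
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
local_score = np.sort(D, axis=1)[:, 1:6].mean(axis=1)      # skip the zero self-distance

score = 0.5 * global_score / global_score.max() + 0.5 * local_score / local_score.max()
print("most outlying point:", X[score.argmax()])
```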
Continuation semantics for multi-quantifier sentences: operation-based approaches
Title | Continuation semantics for multi-quantifier sentences: operation-based approaches |
Authors | Justyna Grudzinska, Marek Zawadowski |
Abstract | Classical scope-assignment strategies for multi-quantifier sentences involve quantifier phrase (QP) movement. More recent continuation-based approaches provide a compelling alternative, for they interpret QPs in situ - without resorting to Logical Forms or any structures beyond the overt syntax. The continuation-based strategies can be divided into two groups: those that locate the source of scope-ambiguity in the rules of semantic composition and those that attribute it to the lexical entries for the quantifier words. In this paper, we focus on the former, operation-based approaches and the nature of the semantic operations involved. More specifically, we discuss three such possible operation-based strategies for multi-quantifier sentences, together with their relative merits and costs. |
Tasks | Semantic Composition |
Published | 2016-07-31 |
URL | http://arxiv.org/abs/1608.00255v2 |
http://arxiv.org/pdf/1608.00255v2.pdf | |
PWC | https://paperswithcode.com/paper/continuation-semantics-for-multi-quantifier |
Repo | |
Framework | |
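A toy Python illustration of the scope ambiguity at stake: quantifiers are modelled as operations that take a continuation (the remainder of the sentence), and the two readings of "every student read some book" come from the order in which the two operations are applied. This is only a model-theoretic toy, not the formal operation-based strategies analysed in the paper.

```python
# A tiny model: two students, two books, and a reading relation.
students = {"s1", "s2"}
books = {"b1", "b2"}
read = {("s1", "b1"), ("s2", "b2")}     # each student read a different book

# Generalized quantifiers as operations over a continuation k.
every = lambda dom: lambda k: all(k(x) for x in dom)
some = lambda dom: lambda k: any(k(x) for x in dom)

# Surface scope (every > some): for every student there is some book they read.
surface = every(students)(lambda s: some(books)(lambda b: (s, b) in read))
# Inverse scope (some > every): there is one book that every student read.
inverse = some(books)(lambda b: every(students)(lambda s: (s, b) in read))
print(surface, inverse)                 # True False: the readings genuinely differ
```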
Visual Compiler: Synthesizing a Scene-Specific Pedestrian Detector and Pose Estimator
Title | Visual Compiler: Synthesizing a Scene-Specific Pedestrian Detector and Pose Estimator |
Authors | Namhoon Lee, Xinshuo Weng, Vishnu Naresh Boddeti, Yu Zhang, Fares Beainy, Kris Kitani, Takeo Kanade |
Abstract | We introduce the concept of a Visual Compiler that generates a scene-specific pedestrian detector and pose estimator without any pedestrian observations. Given a single image and auxiliary scene information in the form of camera parameters and the geometric layout of the scene, the Visual Compiler first infers geometrically and photometrically accurate images of humans in that scene through the use of computer graphics rendering. Using these renders, we learn a scene- and region-specific spatially-varying fully convolutional neural network for simultaneous detection, pose estimation and segmentation of pedestrians. We demonstrate that when real human-annotated data is scarce or non-existent, our data generation strategy can provide an excellent solution for bootstrapping human detection and pose estimation. Experimental results show that our approach outperforms off-the-shelf state-of-the-art pedestrian detectors and pose estimators that are trained on real data. |
Tasks | Human Detection, Pose Estimation |
Published | 2016-12-15 |
URL | http://arxiv.org/abs/1612.05234v1 |
http://arxiv.org/pdf/1612.05234v1.pdf | |
PWC | https://paperswithcode.com/paper/visual-compiler-synthesizing-a-scene-specific |
Repo | |
Framework | |
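A heavily simplified sketch of the bootstrapping idea only: with no real pedestrian annotations, synthetic renders supply positive examples for each scene region, and a region-specific classifier is fit per region. The rendering step and the spatially-varying fully convolutional network are not reproduced; features and data here are random placeholders.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_regions, d = 4, 64
region_detectors = []
for r in range(n_regions):
    pos = rng.normal(loc=1.0, size=(100, d))   # features of rendered pedestrians in region r
    neg = rng.normal(loc=0.0, size=(100, d))   # features of background patches from region r
    X = np.vstack([pos, neg])
    y = np.array([1] * 100 + [0] * 100)
    region_detectors.append(LogisticRegression(max_iter=1000).fit(X, y))

patch = rng.normal(loc=1.0, size=(1, d))       # a test patch from region 0
print("pedestrian probability:", region_detectors[0].predict_proba(patch)[0, 1])
```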
Learning to Hash-tag Videos with Tag2Vec
Title | Learning to Hash-tag Videos with Tag2Vec |
Authors | Aditya Singh, Saurabh Saini, Rajvi Shah, PJ Narayanan |
Abstract | User-given tags or labels are valuable resources for semantic understanding of visual media such as images and videos. Recently, a new type of labeling mechanism known as hash-tags has become increasingly popular on social media sites. In this paper, we study the problem of generating relevant and useful hash-tags for short video clips. Traditional data-driven approaches for tag enrichment and recommendation use direct visual similarity for label transfer and propagation. We attempt to learn a direct low-cost mapping from video to hash-tags using a two-step training process. We first employ a natural language processing (NLP) technique, skip-gram models with neural network training, to learn a low-dimensional vector representation of hash-tags (Tag2Vec) using a corpus of 10 million hash-tags. We then train an embedding function to map video features to the low-dimensional Tag2Vec space. We learn this embedding for 29 categories of short video clips with hash-tags. A query video without any tag information can then be directly mapped to the vector space of tags using the learned embedding, and relevant tags can be found by performing a simple nearest-neighbor retrieval in the Tag2Vec space. We validate the relevance of the tags suggested by our system qualitatively and quantitatively with a user study. |
Tasks | |
Published | 2016-12-13 |
URL | http://arxiv.org/abs/1612.04061v1 |
http://arxiv.org/pdf/1612.04061v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-hash-tag-videos-with-tag2vec |
Repo | |
Framework | |
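The two-step pipeline in the abstract can be sketched in numpy: tag vectors are assumed to come from a pre-trained skip-gram model (random here), a linear embedding from video features into the tag space is fit by least squares, and tags for a new video are retrieved by nearest-neighbour search in that space. Dimensions and data are placeholders, and the paper's embedding function need not be linear.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tags, d_tag, d_video, n_train = 500, 50, 128, 2000

tag_vecs = rng.normal(size=(n_tags, d_tag))            # stand-in for Tag2Vec vectors
video_feats = rng.normal(size=(n_train, d_video))      # training video features
target = tag_vecs[rng.integers(0, n_tags, n_train)]    # tag vector of each video's hash-tag

# Step 2: fit the video -> tag-space embedding (here a linear map, by least squares).
W, *_ = np.linalg.lstsq(video_feats, target, rcond=None)

# Query: map an unseen video into Tag2Vec space and retrieve nearest tags by cosine.
q = rng.normal(size=d_video) @ W
sims = (tag_vecs @ q) / (np.linalg.norm(tag_vecs, axis=1) * np.linalg.norm(q))
print("suggested tag ids:", np.argsort(-sims)[:5])
```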
Maximum entropy models for generation of expressive music
Title | Maximum entropy models for generation of expressive music |
Authors | Simon Moulieras, François Pachet |
Abstract | In the context of contemporary monophonic music, expression can be seen as the difference between a musical performance and its symbolic representation, i.e. a musical score. In this paper, we show how Maximum Entropy (MaxEnt) models can be used to generate musical expression in order to mimic a human performance. As a training corpus, we had a professional pianist play about 150 melodies of jazz, pop, and Latin jazz. The results show good predictive power, validating the choice of our model. Additionally, we set up a listening test whose results reveal that, on average, people significantly prefer the melodies generated by the MaxEnt model to the ones without any expression, or with fully random expression. Furthermore, in some cases, MaxEnt melodies are almost as popular as the human-performed ones. |
Tasks | |
Published | 2016-10-12 |
URL | http://arxiv.org/abs/1610.03606v1 |
http://arxiv.org/pdf/1610.03606v1.pdf | |
PWC | https://paperswithcode.com/paper/maximum-entropy-models-for-generation-of |
Repo | |
Framework | |
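A minimal sketch of a maximum-entropy (multinomial logistic) model for expression: given context features of a note (random placeholders here, standing in for e.g. surrounding pitches and metrical position), predict a discretised expressive parameter such as a velocity bin, then sample from the fitted distribution to generate expression. The feature set and training data are assumptions, not the paper's corpus or model structure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_notes, n_feats, n_velocity_bins = 2000, 12, 8

X = rng.normal(size=(n_notes, n_feats))                 # per-note context features
y = rng.integers(0, n_velocity_bins, n_notes)           # discretised expression labels

# Multinomial logistic regression is the conditional maximum-entropy model.
maxent = LogisticRegression(max_iter=1000).fit(X, y)

# Generation: sample an expressive value for each note of a new melody.
new_melody = rng.normal(size=(16, n_feats))
probs = maxent.predict_proba(new_melody)
sampled_bins = np.array([rng.choice(n_velocity_bins, p=p) for p in probs])
print(sampled_bins)
```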