October 17, 2019


Paper Group ANR 716

High SNR Consistent Compressive Sensing Without Signal and Noise Statistics. Analyzing and Interpreting Convolutional Neural Networks in NLP. DSNet: Deep and Shallow Feature Learning for Efficient Visual Tracking. Question Answering over Freebase via Attentive RNN with Similarity Matrix based CNN. Underwater Fish Species Classification using Convol …

High SNR Consistent Compressive Sensing Without Signal and Noise Statistics

Title High SNR Consistent Compressive Sensing Without Signal and Noise Statistics
Authors Sreejith Kallummil, Sheetal Kalyani
Abstract Recovering the support of sparse vectors in underdetermined linear regression models, \textit{aka} compressive sensing, is important in many signal processing applications. High SNR consistency (HSC), i.e., the ability of a support recovery technique to correctly identify the support with increasing signal-to-noise ratio (SNR), is an increasingly popular criterion for qualifying the high SNR optimality of support recovery techniques. The HSC results available in the literature for support recovery techniques applicable to underdetermined linear regression models, such as the least absolute shrinkage and selection operator (LASSO) and orthogonal matching pursuit (OMP), assume \textit{a priori} knowledge of the noise variance or signal sparsity. However, both of these parameters are unavailable in most practical applications. Further, it is extremely difficult to estimate noise variance or signal sparsity in underdetermined regression models. This limits the utility of existing HSC results. In this article, we propose two techniques, \textit{viz.}, residual ratio minimization (RRM) and residual ratio thresholding with adaptation (RRTA), to operate the OMP algorithm without \textit{a priori} knowledge of the noise variance and signal sparsity, and we establish their HSC analytically and numerically. To the best of our knowledge, these are the first and only noise-statistics-oblivious algorithms to report HSC in underdetermined regression models.
Tasks Compressive Sensing
Published 2018-11-17
URL http://arxiv.org/abs/1811.07131v1
PDF http://arxiv.org/pdf/1811.07131v1.pdf
PWC https://paperswithcode.com/paper/high-snr-consistent-compressive-sensing-1
Repo
Framework
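The residual-ratio idea can be illustrated in a few lines: run OMP, record the ratio of successive residual norms at each step, and pick the support size that minimizes that ratio. The following NumPy sketch is an illustration of this idea, not the authors' implementation; the selection rule and constants are our assumptions.

```python
import numpy as np

def omp_residual_ratios(A, y, k_max):
    """Run orthogonal matching pursuit for up to k_max steps and record the
    residual ratios ||r_k|| / ||r_{k-1}|| used by RRM-style support selection.
    Returns the support minimizing the ratio, plus the ratio sequence."""
    support, residual = [], y.copy()
    ratios, supports = [], []
    for _ in range(k_max):
        # pick the column most correlated with the current residual
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j in support:
            break
        support.append(j)
        # least-squares fit on the current support
        x_s, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        new_residual = y - A[:, support] @ x_s
        ratios.append(np.linalg.norm(new_residual) /
                      max(np.linalg.norm(residual), 1e-12))
        supports.append(list(support))
        residual = new_residual
    # residual ratio minimization: choose the step with the smallest ratio
    k_best = int(np.argmin(ratios))
    return supports[k_best], ratios
```

At high SNR the ratio drops sharply once the true support is covered, which is the intuition behind using it as a noise-statistics-free stopping criterion.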

Analyzing and Interpreting Convolutional Neural Networks in NLP

Title Analyzing and Interpreting Convolutional Neural Networks in NLP
Authors Mahnaz Koupaee, William Yang Wang
Abstract Convolutional neural networks have been successfully applied to various NLP tasks. However, it is not obvious whether they model different linguistic patterns, such as negation, intensification, and clause compositionality, to aid the decision-making process. In this paper, we apply visualization techniques to observe how the model captures different linguistic features and how these features affect its performance. We then identify the model's errors and their sources. We believe that interpreting CNNs is the first step toward understanding the underlying semantic features, which can raise awareness and further improve the performance and explainability of CNN models.
Tasks Decision Making
Published 2018-10-18
URL http://arxiv.org/abs/1810.09312v1
PDF http://arxiv.org/pdf/1810.09312v1.pdf
PWC https://paperswithcode.com/paper/analyzing-and-interpreting-convolutional
Repo
Framework

DSNet: Deep and Shallow Feature Learning for Efficient Visual Tracking

Title DSNet: Deep and Shallow Feature Learning for Efficient Visual Tracking
Authors Qiangqiang Wu, Yan Yan, Yanjie Liang, Yi Liu, Hanzi Wang
Abstract In recent years, Discriminative Correlation Filter (DCF) based tracking methods have achieved great success in visual tracking. However, multi-resolution convolutional feature maps trained on other tasks, such as image classification, cannot be naturally used in the conventional DCF formulation. Furthermore, these high-dimensional feature maps significantly increase the tracking complexity and thus limit the tracking speed. In this paper, we present a deep and shallow feature learning network, namely DSNet, to learn multi-level same-resolution compressed (MSC) features for efficient online tracking, in an end-to-end offline manner. Specifically, the proposed DSNet compresses multi-level convolutional features to uniform spatial resolution. The learned MSC features effectively encode both appearance and semantic information of objects in same-resolution feature maps, thus enabling an elegant combination of the MSC features with any DCF-based method. Additionally, a channel reliability measurement (CRM) method is presented to further refine the learned MSC features. We demonstrate the effectiveness of the MSC features learned from the proposed DSNet on two DCF tracking frameworks: the basic DCF framework and the continuous convolution operator framework. Extensive experiments show that the learned MSC features have the appealing advantage of allowing the equipped DCF-based tracking methods to perform favorably against state-of-the-art methods while running at high frame rates.
Tasks Image Classification, Visual Tracking
Published 2018-11-06
URL http://arxiv.org/abs/1811.02208v1
PDF http://arxiv.org/pdf/1811.02208v1.pdf
PWC https://paperswithcode.com/paper/dsnet-deep-and-shallow-feature-learning-for
Repo
Framework
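The core operation, pooling multi-level feature maps to one shared resolution and compressing channels before concatenation, can be illustrated with a toy NumPy sketch. The average pooling, random 1x1 projection, and shapes below are our assumptions for illustration, not the trained DSNet.

```python
import numpy as np

def block_pool(feat, out_hw):
    """Average-pool a (C, H, W) feature map down to (C, out_hw, out_hw).
    Assumes H and W are divisible by out_hw."""
    c, h, w = feat.shape
    fh, fw = h // out_hw, w // out_hw
    return feat.reshape(c, out_hw, fh, out_hw, fw).mean(axis=(2, 4))

def msc_features(feature_maps, out_hw, proj_dims, rng):
    """Toy 'multi-level same-resolution compressed' features: pool each
    level to a shared spatial resolution, project channels down with a
    random 1x1 'convolution' (a matrix), then concatenate along channels."""
    compressed = []
    for feat, d in zip(feature_maps, proj_dims):
        pooled = block_pool(feat, out_hw)             # (C, s, s)
        w = rng.standard_normal((d, feat.shape[0]))   # stand-in 1x1 conv
        compressed.append(np.einsum('dc,chw->dhw', w, pooled))
    return np.concatenate(compressed, axis=0)
```

In the paper the 1x1 compression is learned end-to-end offline; the random matrix here only demonstrates the shape bookkeeping that makes shallow and deep features concatenable.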

Question Answering over Freebase via Attentive RNN with Similarity Matrix based CNN

Title Question Answering over Freebase via Attentive RNN with Similarity Matrix based CNN
Authors Yingqi Qu, Jie Liu, Liangyi Kang, Qinfeng Shi, Dan Ye
Abstract With the rapid growth of knowledge bases (KBs), question answering over knowledge bases, a.k.a. KBQA, has drawn huge attention in recent years. Most existing KBQA methods follow the so-called encoder-compare framework: they map the question and the KB facts to a common embedding space, in which the similarity between the question vector and the fact vectors can be conveniently computed. This, however, inevitably loses the original word-interaction information. To preserve more of the original information, we propose an attentive recurrent neural network with a similarity matrix based convolutional neural network (AR-SMCNN) model, which captures comprehensive hierarchical information by utilizing the advantages of both RNNs and CNNs. We use an RNN to capture semantic-level correlation through its sequential modeling nature, with an attention mechanism to keep track of entities and relations simultaneously. Meanwhile, we use a similarity matrix based CNN with two-direction pooling to extract literal-level word-interaction matching, utilizing the CNN's strength in modeling spatial correlation among data. Moreover, we have developed a new heuristic extension method for entity detection, which significantly decreases the effect of noise. Our method outperforms the state of the art on the SimpleQuestions benchmark in both accuracy and efficiency.
Tasks Question Answering
Published 2018-04-10
URL http://arxiv.org/abs/1804.03317v3
PDF http://arxiv.org/pdf/1804.03317v3.pdf
PWC https://paperswithcode.com/paper/question-answering-over-freebase-via
Repo
Framework
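The literal-level branch is easy to picture: build a word-by-word similarity matrix between question and fact, then pool it along both directions. A minimal NumPy sketch, assuming cosine similarity and max pooling (the paper's exact similarity and pooling choices may differ):

```python
import numpy as np

def similarity_matrix(q_emb, f_emb):
    """Cosine-similarity matrix between question word embeddings (n, d)
    and fact word embeddings (m, d), the literal-level interaction grid
    the similarity-matrix CNN branch operates on."""
    qn = q_emb / np.linalg.norm(q_emb, axis=1, keepdims=True)
    fn = f_emb / np.linalg.norm(f_emb, axis=1, keepdims=True)
    return qn @ fn.T

def two_direction_pool(sim):
    """Pool along both directions: the best fact match per question word
    and the best question match per fact word, concatenated."""
    return np.concatenate([sim.max(axis=1), sim.max(axis=0)])
```

Unlike the encoder-compare framework, the full n-by-m matrix keeps every word-pair interaction until pooling, which is the information-preservation argument in the abstract.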

Underwater Fish Species Classification using Convolutional Neural Network and Deep Learning

Title Underwater Fish Species Classification using Convolutional Neural Network and Deep Learning
Authors Dhruv Rathi, Sushant Jain, Dr. S. Indu
Abstract The goal of this paper is to propose a method for automated classification of fish species. High-accuracy fish classification is needed for a greater understanding of fish behavior in ichthyology and by marine biologists. Concerned institutions must maintain a ledger of the number of fish per species and mark endangered species in large and small water bodies. The majority of available methods focus on classifying fish outside of water, because underwater classification poses challenges such as background noise, image distortion, the presence of other water bodies in images, image quality, and occlusion. This method uses a novel technique based on convolutional neural networks, deep learning, and image processing to achieve an accuracy of 96.29%, a considerable improvement in discrimination accuracy over previously proposed methods.
Tasks
Published 2018-05-25
URL http://arxiv.org/abs/1805.10106v1
PDF http://arxiv.org/pdf/1805.10106v1.pdf
PWC https://paperswithcode.com/paper/underwater-fish-species-classification-using
Repo
Framework

Identifying Relationships Among Sentences in Court Case Transcripts Using Discourse Relations

Title Identifying Relationships Among Sentences in Court Case Transcripts Using Discourse Relations
Authors Gathika Ratnayaka, Thejan Rupasinghe, Nisansa de Silva, Menuka Warushavithana, Viraj Gamage, Amal Shehan Perera
Abstract Case Law has a significant impact on the proceedings of legal cases. Therefore, the information that can be obtained from previous court cases is valuable to lawyers and other legal officials when performing their duties. This paper describes a methodology of applying discourse relations between sentences when processing text documents related to the legal domain. In this study, we developed a mechanism to classify the relationships that can be observed among sentences in transcripts of United States court cases. First, we defined relationship types that can be observed between sentences in court case transcripts. Then we classified pairs of sentences according to the relationship type by combining a machine learning model and a rule-based approach. The results obtained through our system were evaluated using human judges. To the best of our knowledge, this is the first study where discourse relationships between sentences have been used to determine relationships among sentences in legal court case transcripts.
Tasks
Published 2018-09-10
URL http://arxiv.org/abs/1809.03416v2
PDF http://arxiv.org/pdf/1809.03416v2.pdf
PWC https://paperswithcode.com/paper/identifying-relationships-among-sentences-in
Repo
Framework

Advancing Connectionist Temporal Classification With Attention Modeling

Title Advancing Connectionist Temporal Classification With Attention Modeling
Authors Amit Das, Jinyu Li, Rui Zhao, Yifan Gong
Abstract In this study, we propose advancing all-neural speech recognition by directly incorporating attention modeling within the Connectionist Temporal Classification (CTC) framework. In particular, we derive new context vectors using time convolution features to model attention as part of the CTC network. To further improve attention modeling, we utilize content information extracted from a network representing an implicit language model. Finally, we introduce vector-based attention weights that are applied to context vectors across both time and their individual components. We evaluate our system on a 3,400-hour Microsoft Cortana voice assistant task and demonstrate that our proposed model consistently outperforms the baseline, achieving about a 20% relative reduction in word error rate.
Tasks Language Modelling, Speech Recognition
Published 2018-03-15
URL http://arxiv.org/abs/1803.05563v1
PDF http://arxiv.org/pdf/1803.05563v1.pdf
PWC https://paperswithcode.com/paper/advancing-connectionist-temporal
Repo
Framework

Integrating Flexible Normalization into Mid-Level Representations of Deep Convolutional Neural Networks

Title Integrating Flexible Normalization into Mid-Level Representations of Deep Convolutional Neural Networks
Authors Luis Gonzalo Sanchez Giraldo, Odelia Schwartz
Abstract Deep convolutional neural networks (CNNs) are becoming increasingly popular models to predict neural responses in visual cortex. However, contextual effects, which are prevalent in neural processing and in perception, are not explicitly handled by current CNNs, including those used for neural prediction. In primary visual cortex, neural responses are modulated by stimuli spatially surrounding the classical receptive field in rich ways. These effects have been modeled with divisive normalization approaches, including flexible models, where spatial normalization is recruited only to the degree responses from center and surround locations are deemed statistically dependent. We propose a flexible normalization model applied to mid-level representations of deep CNNs as a tractable way to study contextual normalization mechanisms in mid-level cortical areas. This approach captures non-trivial spatial dependencies among mid-level features in CNNs, such as those present in textures and other visual stimuli, that arise from tiling high order features, geometrically. We expect that the proposed approach can make predictions about when spatial normalization might be recruited in mid-level cortical areas. We also expect this approach to be useful as part of the CNN toolkit, therefore going beyond more restrictive fixed forms of normalization.
Tasks
Published 2018-06-05
URL http://arxiv.org/abs/1806.01823v3
PDF http://arxiv.org/pdf/1806.01823v3.pdf
PWC https://paperswithcode.com/paper/integrating-flexible-normalization-into-mid
Repo
Framework
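The "flexible" part of flexible normalization is that divisive normalization by the surround is gated by how statistically dependent center and surround responses are deemed to be. The sketch below is a drastic simplification of that idea; the divisive form and the scalar dependence gate are our assumptions, not the paper's fitted model.

```python
import numpy as np

def flexible_normalization(center, surround, dependence, sigma=1.0):
    """Toy flexible divisive normalization: the center response is divided
    by a pooled surround signal, but only to the degree `dependence`
    (in [0, 1]) says center and surround are statistically dependent."""
    pooled = np.sqrt(sigma ** 2 + np.mean(np.asarray(surround) ** 2))
    normalized = center / pooled
    # dependence = 1 -> full surround normalization; 0 -> center untouched
    return dependence * normalized + (1.0 - dependence) * center
```

In the paper the gating is inferred from a statistical model of center-surround dependency rather than supplied as a number, but the limiting behaviors (full normalization vs. none) are the same.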

Gradient-Coherent Strong Regularization for Deep Neural Networks

Title Gradient-Coherent Strong Regularization for Deep Neural Networks
Authors Dae Hoon Park, Chiu Man Ho, Yi Chang, Huaqing Zhang
Abstract Regularization plays an important role in the generalization of deep neural networks, which are often prone to overfitting with their numerous parameters. L1 and L2 regularizers are common regularization tools in machine learning owing to their simplicity and effectiveness. However, we observe that imposing strong L1 or L2 regularization with stochastic gradient descent on deep neural networks easily fails, which limits the generalization ability of the underlying neural networks. To understand this phenomenon, we first investigate how and why learning fails when strong regularization is imposed on deep neural networks. We then propose a novel method, gradient-coherent strong regularization, which imposes regularization only when the gradients remain coherent in the presence of strong regularization. Experiments are performed with multiple deep architectures on three benchmark data sets for image recognition. Experimental results show that our proposed approach indeed endures strong regularization and significantly improves both accuracy and compression (up to 9.9x), which could not be achieved otherwise.
Tasks L2 Regularization
Published 2018-11-20
URL https://arxiv.org/abs/1811.08056v2
PDF https://arxiv.org/pdf/1811.08056v2.pdf
PWC https://paperswithcode.com/paper/gradient-coherent-strong-regularization-for
Repo
Framework
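One way to picture "imposing regularization only when the gradients are coherent" is a per-weight gate on the L1 penalty. The sketch below is our reading of the idea, not the paper's exact coherence test: the penalty gradient is applied only where it does not reverse the direction of the data-driven update.

```python
import numpy as np

def gradient_coherent_l1_step(w, data_grad, lam, lr):
    """One SGD step with a gated strong L1 penalty: the regularization
    gradient lam * sign(w) is kept per-weight only when the combined
    update stays coherent with (same sign as) the data gradient."""
    reg_grad = lam * np.sign(w)
    total = data_grad + reg_grad
    # keep the penalty only where it does not flip the update direction
    coherent = np.sign(total) == np.sign(data_grad)
    grad = np.where(coherent, total, data_grad)
    return w - lr * grad
```

With a large `lam`, plain SGD would let the penalty dominate and thrash the weights; the gate is what lets strong regularization coexist with useful learning signals.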

Multi-layer Pruning Framework for Compressing Single Shot MultiBox Detector

Title Multi-layer Pruning Framework for Compressing Single Shot MultiBox Detector
Authors Pravendra Singh, Manikandan R, Neeraj Matiyali, Vinay P. Namboodiri
Abstract We propose a framework for compressing the state-of-the-art Single Shot MultiBox Detector (SSD). The framework addresses compression in three stages: sparsity induction, filter selection, and filter pruning. In the sparsity induction stage, the object detector model is sparsified via an improved global threshold. In the filter selection and pruning stages, we select and remove filters using the sparsity statistics of filter weights in two consecutive convolutional layers. This yields a model smaller than most existing compact architectures. We evaluate the performance of our framework on multiple datasets and compare it against multiple methods. Experimental results show that our method achieves state-of-the-art compression of 6.7x and 4.9x on the PASCAL VOC dataset for the SSD300 and SSD512 models respectively. We further show that the method achieves a maximum compression of 26x with SSD512 on the German Traffic Sign Detection Benchmark (GTSDB). Additionally, we empirically demonstrate our method's adaptability to the classification architecture VGG16 on the CIFAR and German Traffic Sign Recognition Benchmark (GTSRB) datasets, achieving compression rates of 125x and 200x with FLOP reductions of 90.50% and 96.6% respectively, with no loss of accuracy. Moreover, our method does not require any special libraries or hardware support for the resulting compressed models.
Tasks Traffic Sign Recognition
Published 2018-11-20
URL http://arxiv.org/abs/1811.08342v1
PDF http://arxiv.org/pdf/1811.08342v1.pdf
PWC https://paperswithcode.com/paper/multi-layer-pruning-framework-for-compressing
Repo
Framework
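The first two stages are straightforward to sketch: a single magnitude threshold computed over all layers at once, then a per-filter sparsity statistic to decide which filters survive. The thresholding rule and ranking statistic below are a minimal guess at such a pipeline, not the paper's improved global threshold.

```python
import numpy as np

def sparsify_global(weights, keep_ratio):
    """Sparsity induction with a global magnitude threshold: zero out the
    smallest (1 - keep_ratio) fraction of weights across ALL layers, so
    the threshold is shared rather than computed per layer."""
    flat = np.concatenate([w.ravel() for w in weights])
    thresh = np.quantile(np.abs(flat), 1.0 - keep_ratio)
    return [np.where(np.abs(w) >= thresh, w, 0.0) for w in weights]

def select_filters(conv_w, keep):
    """Filter selection: rank the output filters of an (out, in, k, k)
    conv layer by the density of their (already sparsified) weights and
    keep the `keep` densest ones."""
    density = (conv_w != 0).mean(axis=(1, 2, 3))
    return np.argsort(density)[::-1][:keep]
```

Pruning the unselected filters then also removes the matching input channels of the next layer, which is where the actual size reduction comes from.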

Multiple Interactions Made Easy (MIME): Large Scale Demonstrations Data for Imitation

Title Multiple Interactions Made Easy (MIME): Large Scale Demonstrations Data for Imitation
Authors Pratyusha Sharma, Lekha Mohan, Lerrel Pinto, Abhinav Gupta
Abstract In recent years, we have seen an emergence of data-driven approaches in robotics. However, most existing efforts and datasets are either in simulation or focus on a single task in isolation such as grasping, pushing or poking. In order to make progress and capture the space of manipulation, we would need to collect a large-scale dataset of diverse tasks such as pouring, opening bottles, stacking objects etc. But how does one collect such a dataset? In this paper, we present the largest available robotic-demonstration dataset (MIME) that contains 8260 human-robot demonstrations over 20 different robotic tasks (https://sites.google.com/view/mimedataset). These tasks range from the simple task of pushing objects to the difficult task of stacking household objects. Our dataset consists of videos of human demonstrations and kinesthetic trajectories of robot demonstrations. We also propose to use this dataset for the task of mapping 3rd person video features to robot trajectories. Furthermore, we present two different approaches using this dataset and evaluate the predicted robot trajectories against ground-truth trajectories. We hope our dataset inspires research in multiple areas including visual imitation, trajectory prediction, and multi-task robotic learning.
Tasks Trajectory Prediction
Published 2018-10-16
URL http://arxiv.org/abs/1810.07121v1
PDF http://arxiv.org/pdf/1810.07121v1.pdf
PWC https://paperswithcode.com/paper/multiple-interactions-made-easy-mime-large
Repo
Framework

Building a Telescope to Look Into High-Dimensional Image Spaces

Title Building a Telescope to Look Into High-Dimensional Image Spaces
Authors Mitch Hill, Erik Nijkamp, Song-Chun Zhu
Abstract An image pattern can be represented by a probability distribution whose density is concentrated on different low-dimensional subspaces in the high-dimensional image space. Such probability densities have an astronomical number of local modes corresponding to typical pattern appearances. Related groups of modes can join to form macroscopic image basins that represent pattern concepts. Recent works use neural networks that capture high-order image statistics to learn Gibbs models capable of synthesizing realistic images of many patterns. However, characterizing a learned probability density to uncover the Hopfield memories of the model, encoded by the structure of the local modes, remains an open challenge. In this work, we present novel computational experiments that map and visualize the local mode structure of Gibbs densities. Efficient mapping requires identifying the global basins without enumerating the countless modes. Inspired by Grenander’s jump-diffusion method, we propose a new MCMC tool called Attraction-Diffusion (AD) that can capture the macroscopic structure of highly non-convex densities by measuring metastability of local modes. AD involves altering the target density with a magnetization potential penalizing distance from a known mode and running an MCMC sample of the altered density to measure the stability of the initial chain state. Using a low-dimensional generator network to facilitate exploration, we map image spaces with up to 12,288 dimensions (64 $\times$ 64 pixels in RGB). Our work shows: (1) AD can efficiently map highly non-convex probability densities, (2) metastable regions of pattern probability densities contain coherent groups of images, and (3) the perceptibility of differences between training images influences the metastability of image basins.
Tasks
Published 2018-03-02
URL http://arxiv.org/abs/1803.01043v1
PDF http://arxiv.org/pdf/1803.01043v1.pdf
PWC https://paperswithcode.com/paper/building-a-telescope-to-look-into-high
Repo
Framework
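The Attraction-Diffusion probe described in the abstract can be reduced to a toy one-dimensional version: add a magnetization penalty on the distance to a known mode, run a Metropolis chain on the altered energy, and ask whether the chain reaches the mode. The constants and the acceptance rule below are illustrative; the paper works in image space with Langevin-style dynamics.

```python
import numpy as np

def attraction_diffusion(energy, start, mode, alpha=8.0, steps=4000,
                         step_size=0.1, tol=0.3, seed=0):
    """Toy Attraction-Diffusion: run Metropolis MCMC on the target energy
    plus a magnetization term alpha * ||x - mode|| pulling the chain
    toward a known mode, and report whether the chain reaches it. If so,
    `start` is deemed to lie in the same metastable basin as `mode`."""
    rng = np.random.default_rng(seed)
    mode = np.asarray(mode, dtype=float)
    total = lambda x: energy(x) + alpha * np.linalg.norm(x - mode)
    x = np.asarray(start, dtype=float)
    e = total(x)
    for _ in range(steps):
        prop = x + step_size * rng.standard_normal(x.shape)
        e_prop = total(prop)
        # Metropolis acceptance on the magnetized energy
        if np.log(rng.random()) < e - e_prop:
            x, e = prop, e_prop
        if np.linalg.norm(x - mode) < tol:
            return True
    return False
```

The key measurement is metastability: if even a strong pull cannot drag the chain over the intervening energy barrier within the step budget, the two states are mapped to different basins.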

NeuroNet: Fast and Robust Reproduction of Multiple Brain Image Segmentation Pipelines

Title NeuroNet: Fast and Robust Reproduction of Multiple Brain Image Segmentation Pipelines
Authors Martin Rajchl, Nick Pawlowski, Daniel Rueckert, Paul M. Matthews, Ben Glocker
Abstract NeuroNet is a deep convolutional neural network mimicking multiple popular and state-of-the-art brain segmentation tools including FSL, SPM, and MALPEM. The network is trained on 5,000 T1-weighted brain MRI scans from the UK Biobank Imaging Study that have been automatically segmented into brain tissue and cortical and sub-cortical structures using the standard neuroimaging pipelines. Training a single model from these complementary and partially overlapping label maps yields a new powerful “all-in-one”, multi-output segmentation tool. The processing time for a single subject is reduced by an order of magnitude compared to running each individual software package. We demonstrate very good reproducibility of the original outputs while increasing robustness to variations in the input data. We believe NeuroNet could be an important tool in large-scale population imaging studies and serve as a new standard in neuroscience by reducing the risk of introducing bias when choosing a specific software package.
Tasks Brain Image Segmentation, Brain Segmentation, Semantic Segmentation
Published 2018-06-11
URL http://arxiv.org/abs/1806.04224v1
PDF http://arxiv.org/pdf/1806.04224v1.pdf
PWC https://paperswithcode.com/paper/neuronet-fast-and-robust-reproduction-of
Repo
Framework

Domain-Invariant Adversarial Learning for Unsupervised Domain Adaption

Title Domain-Invariant Adversarial Learning for Unsupervised Domain Adaption
Authors Yexun Zhang, Ya Zhang, Yanfeng Wang, Qi Tian
Abstract Unsupervised domain adaptation aims to learn a powerful classifier for the target domain given a labeled source data set and an unlabeled target data set. To alleviate the effect of 'domain shift', the major challenge in domain adaptation, studies have attempted to align the distributions of the two domains. Recent research has suggested that generative adversarial networks (GANs) have the capability of implicitly capturing the data distribution. In this paper, we thus propose a simple but effective model for unsupervised domain adaptation that leverages adversarial learning. The same encoder is shared between the source and target domains and is expected to extract domain-invariant representations with the help of an adversarial discriminator. With the labeled source data, we introduce the center loss to increase the discriminative power of the learned features. We further align the conditional distributions of the two domains to enforce the discrimination of the features in the target domain. Unlike previous studies where the source features are extracted with a fixed pre-trained encoder, our method jointly learns feature representations of the two domains. Moreover, by sharing the encoder, the model does not need to know the source of images during testing and hence is more widely applicable. We evaluate the proposed method on several unsupervised domain adaptation benchmarks and achieve superior or comparable performance to state-of-the-art results.
Tasks Domain Adaptation
Published 2018-11-30
URL http://arxiv.org/abs/1811.12751v1
PDF http://arxiv.org/pdf/1811.12751v1.pdf
PWC https://paperswithcode.com/paper/domain-invariant-adversarial-learning-for
Repo
Framework
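The center loss used on the labeled source features is a standard construction (Wen et al.): each feature is pulled toward a running center for its class, sharpening class discrimination. A minimal NumPy version, with the usual running-mean center update:

```python
import numpy as np

def center_loss(features, labels, centers):
    """Center loss: half the mean squared distance of each feature
    (rows of `features`) to its class center `centers[labels[i]]`."""
    diffs = features - centers[labels]
    return 0.5 * np.mean(np.sum(diffs ** 2, axis=1))

def update_centers(features, labels, centers, lr=0.5):
    """Move each class center toward the mean of the features assigned
    to it in the current batch (the standard running update)."""
    new_centers = centers.copy()
    for c in np.unique(labels):
        batch_mean = features[labels == c].mean(axis=0)
        new_centers[c] += lr * (batch_mean - centers[c])
    return new_centers
```

In the paper this term is combined with the classification and adversarial losses on the shared encoder; here it is shown in isolation.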

Path-Invariant Map Networks

Title Path-Invariant Map Networks
Authors Zaiwei Zhang, Zhenxiao Liang, Lemeng Wu, Xiaowei Zhou, Qixing Huang
Abstract Optimizing a network of maps among a collection of objects/domains (or map synchronization) is a central problem across computer vision and many other relevant fields. Compared to optimizing pairwise maps in isolation, the benefit of map synchronization is that there are natural constraints among a map network that can improve the quality of individual maps. While such self-supervision constraints are well-understood for undirected map networks (e.g., the cycle-consistency constraint), they are under-explored for directed map networks, which naturally arise when maps are given by parametric maps (e.g., a feed-forward neural network). In this paper, we study a natural self-supervision constraint for directed map networks called path-invariance, which enforces that composite maps along different paths between a fixed pair of source and target domains are identical. We introduce path-invariance bases for efficient encoding of the path-invariance constraint and present an algorithm that outputs a path-invariance basis with polynomial time and space complexities. We demonstrate the effectiveness of our approach on optimizing object correspondences, estimating dense image maps via neural networks, and semantic segmentation of 3D scenes via map networks of diverse 3D representations. In particular, for 3D semantic segmentation, our approach only requires 8% labeled data from ScanNet to achieve the same performance as training a single 3D segmentation network with 30% to 100% labeled data.
Tasks 3D Semantic Segmentation, Scene Segmentation, Semantic Segmentation
Published 2018-12-31
URL https://arxiv.org/abs/1812.11647v3
PDF https://arxiv.org/pdf/1812.11647v3.pdf
PWC https://paperswithcode.com/paper/path-invariant-map-networks
Repo
Framework
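The path-invariance constraint itself is easy to state concretely: composing the maps along two different paths between the same source and target should give the same map. In the sketch below, linear maps (matrices) stand in for the paper's neural maps, and the residual of the constraint is measured in Frobenius norm; both choices are ours for illustration.

```python
import numpy as np

def compose(path, maps):
    """Compose the maps along `path`, a list of (src, dst) edges applied
    in order; maps[(src, dst)] holds the matrix for that directed edge."""
    out = None
    for edge in path:
        m = maps[edge]
        out = m if out is None else m @ out
    return out

def path_invariance_residual(path_a, path_b, maps):
    """Residual of the path-invariance constraint: composite maps along
    two different paths between the same endpoints should agree."""
    return np.linalg.norm(compose(path_a, maps) - compose(path_b, maps))
```

During training such residuals act as a self-supervision loss, which is how the map network improves individual maps without extra labels.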