Paper Group ANR 271
Graph Distillation for Action Detection with Privileged Modalities
Title | Graph Distillation for Action Detection with Privileged Modalities |
Authors | Zelun Luo, Jun-Ting Hsieh, Lu Jiang, Juan Carlos Niebles, Li Fei-Fei |
Abstract | We propose a technique that tackles action detection in multimodal videos under a realistic and challenging condition in which only limited training data and partially observed modalities are available. Common methods in transfer learning do not take advantage of the extra modalities potentially available in the source domain. On the other hand, previous work on multimodal learning only focuses on a single domain or task and does not handle the modality discrepancy between training and testing. In this work, we propose a method termed graph distillation that incorporates rich privileged information from a large-scale multimodal dataset in the source domain, and improves the learning in the target domain where training data and modalities are scarce. We evaluate our approach on action classification and detection tasks in multimodal videos, and show that our model outperforms the state-of-the-art by a large margin on the NTU RGB+D and PKU-MMD benchmarks. The code is released at http://alan.vision/eccv18_graph/. |
Tasks | Action Classification, Action Detection, Transfer Learning |
Published | 2017-11-30 |
URL | http://arxiv.org/abs/1712.00108v2 |
PDF | http://arxiv.org/pdf/1712.00108v2.pdf |
PWC | https://paperswithcode.com/paper/graph-distillation-for-action-detection-with |
Repo | |
Framework | |
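The paper's core idea, distilling soft predictions from privileged source modalities into a target-modality model through learned graph edge weights, can be sketched as a loss term. The snippet below is a minimal PyTorch illustration, not the released implementation; the modality names, dimensions, and the way the edge weights are created are assumptions made for the example.

```python
import torch
import torch.nn.functional as F

def graph_distillation_loss(logits_by_modality, target_modality, temperature=2.0):
    """Distill soft predictions from privileged source modalities into the
    target modality, weighting each source by a learnable graph edge weight."""
    names = [m for m in logits_by_modality if m != target_modality]
    # Learnable edge weights from each source modality to the target
    # (kept as a module parameter in a real model; created here for brevity).
    edge_logits = torch.zeros(len(names), requires_grad=True)
    edge_w = F.softmax(edge_logits, dim=0)

    student = F.log_softmax(logits_by_modality[target_modality] / temperature, dim=1)
    loss = 0.0
    for w, name in zip(edge_w, names):
        teacher = F.softmax(logits_by_modality[name].detach() / temperature, dim=1)
        loss = loss + w * F.kl_div(student, teacher, reduction="batchmean")
    return loss

# Toy usage: RGB is the target modality; depth and skeleton act as privileged sources.
logits = {m: torch.randn(8, 60) for m in ("rgb", "depth", "skeleton")}
print(graph_distillation_loss(logits, "rgb"))
```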
Auto-context Convolutional Neural Network (Auto-Net) for Brain Extraction in Magnetic Resonance Imaging
Title | Auto-context Convolutional Neural Network (Auto-Net) for Brain Extraction in Magnetic Resonance Imaging |
Authors | Seyed Sadegh Mohseni Salehi, Deniz Erdogmus, Ali Gholipour |
Abstract | Brain extraction or whole brain segmentation is an important first step in many of the neuroimage analysis pipelines. The accuracy and robustness of brain extraction, therefore, is crucial for the accuracy of the entire brain analysis process. With the aim of designing a learning-based, geometry-independent and registration-free brain extraction tool in this study, we present a technique based on an auto-context convolutional neural network (CNN), in which intrinsic local and global image features are learned through 2D patches of different window sizes. In this architecture three parallel 2D convolutional pathways for three different directions (axial, coronal, and sagittal) implicitly learn 3D image information without the need for computationally expensive 3D convolutions. Posterior probability maps generated by the network are used iteratively as context information along with the original image patches to learn the local shape and connectedness of the brain, to extract it from non-brain tissue. The brain extraction results we have obtained from our algorithm are superior to the recently reported results in the literature on two publicly available benchmark datasets, namely LPBA40 and OASIS, in which we obtained Dice overlap coefficients of 97.42% and 95.40%, respectively. Furthermore, we evaluated the performance of our algorithm in the challenging problem of extracting arbitrarily-oriented fetal brains in reconstructed fetal brain magnetic resonance imaging (MRI) datasets. In this application our algorithm performed much better than the other methods (Dice coefficient: 95.98%), where the other methods performed poorly due to the non-standard orientation and geometry of the fetal brain in MRI. Our CNN-based method can provide accurate, geometry-independent brain extraction in challenging applications. |
Tasks | Brain Segmentation |
Published | 2017-03-06 |
URL | http://arxiv.org/abs/1703.02083v2 |
PDF | http://arxiv.org/pdf/1703.02083v2.pdf |
PWC | https://paperswithcode.com/paper/auto-context-convolutional-neural-network |
Repo | |
Framework | |
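A rough sketch of the auto-context loop described above: the posterior map from one pass is fed back as an extra input for the next pass. The CNN itself is replaced here by a trivial stand-in predictor and only axial slices are processed, so everything beyond the loop structure is a placeholder assumption.

```python
import numpy as np

def predict_posterior(image, prior):
    """Stand-in for the paper's three-pathway 2D CNN: returns a brain-probability
    map from the intensity image plus the previous posterior (the 'context')."""
    # A crude intensity threshold blended with the prior, purely illustrative.
    intensity_prob = (image > image.mean()).astype(float)
    return 0.5 * intensity_prob + 0.5 * prior

def auto_context_extraction(volume, n_steps=3):
    """Iterate the auto-context loop slice by slice: each step feeds the previous
    posterior map back in as extra context alongside the image."""
    posterior = np.full(volume.shape, 0.5)            # uninformative initial context
    for _ in range(n_steps):
        for z in range(volume.shape[0]):              # axial slices only, for brevity
            posterior[z] = predict_posterior(volume[z], posterior[z])
    return posterior > 0.5                            # final brain mask

mask = auto_context_extraction(np.random.rand(8, 64, 64))
print(mask.shape, mask.mean())
```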
AWE-CM Vectors: Augmenting Word Embeddings with a Clinical Metathesaurus
Title | AWE-CM Vectors: Augmenting Word Embeddings with a Clinical Metathesaurus |
Authors | Willie Boag, Hassan Kané |
Abstract | In recent years, word embeddings have been surprisingly effective at capturing intuitive characteristics of the words they represent. These vectors achieve the best results when training corpora are extremely large, sometimes billions of words. Clinical natural language processing datasets, however, tend to be much smaller. Even the largest publicly-available dataset of medical notes is three orders of magnitude smaller than the dataset of the oft-used “Google News” word vectors. In order to make up for limited training data sizes, we encode expert domain knowledge into our embeddings. Building on a previous extension of word2vec, we show that generalizing the notion of a word’s “context” to include arbitrary features creates an avenue for encoding domain knowledge into word embeddings. We show that the word vectors produced by this method outperform their text-only counterparts across the board in correlation with clinical experts. |
Tasks | Word Embeddings |
Published | 2017-12-05 |
URL | http://arxiv.org/abs/1712.01460v1 |
PDF | http://arxiv.org/pdf/1712.01460v1.pdf |
PWC | https://paperswithcode.com/paper/awe-cm-vectors-augmenting-word-embeddings |
Repo | |
Framework | |
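The abstract's key move, generalizing a word's "context" to include arbitrary features such as metathesaurus concepts, can be illustrated by how training pairs are generated. The concept lookup table below is hypothetical; a real system would query UMLS and feed the resulting pairs to any skip-gram trainer.

```python
from itertools import chain

# Hypothetical mapping from clinical terms to metathesaurus concept IDs (CUIs);
# in practice this would come from UMLS or a similar resource.
TERM_TO_CUIS = {"hypertension": ["C0020538"], "metoprolol": ["C0025859"]}

def augmented_skipgram_pairs(tokens, window=2):
    """Generalize word2vec 'context' to arbitrary features: each target word is
    paired with its neighbouring words AND the concept codes attached to them."""
    pairs = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        neighbours = tokens[lo:i] + tokens[i + 1:hi]
        concepts = list(chain.from_iterable(TERM_TO_CUIS.get(w, []) for w in neighbours))
        pairs.extend((target, ctx) for ctx in neighbours + concepts)
    return pairs

# The resulting (target, context) pairs can be fed to any skip-gram implementation.
print(augmented_skipgram_pairs("patient with hypertension given metoprolol".split()))
```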
Action Classification and Highlighting in Videos
Title | Action Classification and Highlighting in Videos |
Authors | Atousa Torabi, Leonid Sigal |
Abstract | Inspired by recent advances in neural machine translation that jointly align and translate using encoder-decoder networks equipped with attention, we propose an attention-based LSTM model for human activity recognition. Our model jointly learns to classify actions and highlight frames associated with the action, by attending to salient visual information through a jointly learned soft-attention network. We explore attention informed by various forms of visual semantic features, including those encoding actions, objects and scenes. We qualitatively show that soft-attention can learn to effectively attend to important objects and scene information correlated with specific human actions. Further, we show that, quantitatively, our attention-based LSTM outperforms the vanilla LSTM and CNN models used by state-of-the-art methods. On a large-scale YouTube video dataset, ActivityNet, our model outperforms competing methods in action classification. |
Tasks | Action Classification, Activity Recognition, Human Activity Recognition, Machine Translation |
Published | 2017-08-31 |
URL | http://arxiv.org/abs/1708.09522v1 |
PDF | http://arxiv.org/pdf/1708.09522v1.pdf |
PWC | https://paperswithcode.com/paper/action-classification-and-highlighting-in |
Repo | |
Framework | |
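A compact PyTorch sketch of a soft-attention LSTM of the kind described above: at every step the model attends over per-frame features and finally classifies from the last hidden state, with the attention weights doubling as frame highlights. Feature and class dimensions are illustrative assumptions, and this is not the authors' exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionLSTM(nn.Module):
    """Soft-attention LSTM: attend over per-frame features at each step,
    classify the action from the final hidden state."""
    def __init__(self, feat_dim=2048, hidden=512, n_classes=200):
        super().__init__()
        self.cell = nn.LSTMCell(feat_dim, hidden)
        self.att = nn.Linear(feat_dim + hidden, 1)
        self.cls = nn.Linear(hidden, n_classes)

    def forward(self, feats):                       # feats: (B, T, feat_dim)
        B, T, _ = feats.shape
        h = feats.new_zeros(B, self.cell.hidden_size)
        c = feats.new_zeros(B, self.cell.hidden_size)
        alphas = []
        for _ in range(T):
            # attention scores over all frames, conditioned on the current state
            cond = torch.cat([feats, h.unsqueeze(1).expand(B, T, h.size(1))], dim=-1)
            alpha = F.softmax(self.att(cond).squeeze(-1), dim=1)     # (B, T)
            context = (alpha.unsqueeze(-1) * feats).sum(dim=1)       # (B, feat_dim)
            h, c = self.cell(context, (h, c))
            alphas.append(alpha)
        return self.cls(h), torch.stack(alphas, 1)   # logits, per-step frame highlights

logits, attn = AttentionLSTM()(torch.randn(2, 16, 2048))
print(logits.shape, attn.shape)
```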
Lexical-semantic resources: yet powerful resources for automatic personality classification
Title | Lexical-semantic resources: yet powerful resources for automatic personality classification |
Authors | Xuan-Son Vu, Lucie Flekova, Lili Jiang, Iryna Gurevych |
Abstract | In this paper, we aim to reveal the impact of lexical-semantic resources, used in particular for word sense disambiguation and sense-level semantic categorization, on the automatic personality classification task. While stylistic features (e.g., part-of-speech counts) have been shown to be powerful in this task, the impact of semantics beyond targeted word lists is relatively unexplored. We propose and extract three types of lexical-semantic features, which capture high-level concepts and emotions, overcoming the lexical gap of word n-grams. Our experimental results are comparable to state-of-the-art methods, while no personality-specific resources are required. |
Tasks | Word Sense Disambiguation |
Published | 2017-11-27 |
URL | http://arxiv.org/abs/1711.09824v1 |
PDF | http://arxiv.org/pdf/1711.09824v1.pdf |
PWC | https://paperswithcode.com/paper/lexical-semantic-resources-yet-powerful |
Repo | |
Framework | |
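A toy illustration of the feature pipeline suggested by the abstract: tokens are mapped to high-level semantic categories via a lexical-semantic resource, and the resulting counts feed a standard classifier. The tiny lexicon, texts, and labels below are invented for the example; a real setup would use WordNet supersenses or an emotion lexicon.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative stand-in for a lexical-semantic resource mapping words to
# high-level categories (supersenses / emotions).
LEXICON = {"happy": "emotion.joy", "angry": "emotion.anger",
           "run": "verb.motion", "think": "verb.cognition"}
CATEGORIES = sorted(set(LEXICON.values()))

def semantic_features(text):
    """Count how often each high-level semantic category occurs in a text."""
    counts = np.zeros(len(CATEGORIES))
    for tok in text.lower().split():
        if tok in LEXICON:
            counts[CATEGORIES.index(LEXICON[tok])] += 1
    return counts

texts = ["I run and run and feel happy", "I think too much and get angry"]
labels = [1, 0]                       # e.g., high vs. low extraversion (toy labels)
X = np.vstack([semantic_features(t) for t in texts])
clf = LogisticRegression().fit(X, labels)
print(clf.predict(X))
```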
Lose The Views: Limited Angle CT Reconstruction via Implicit Sinogram Completion
Title | Lose The Views: Limited Angle CT Reconstruction via Implicit Sinogram Completion |
Authors | Rushil Anirudh, Hyojin Kim, Jayaraman J. Thiagarajan, K. Aditya Mohan, Kyle Champley, Timo Bremer |
Abstract | Computed Tomography (CT) reconstruction is a fundamental component of a wide variety of applications ranging from security to healthcare. The classical techniques require measuring projections, called sinograms, from a full 180$^\circ$ view of the object. This is impractical in a limited angle scenario, when the viewing angle is less than 180$^\circ$, which can occur due to different factors including restrictions on scanning time, limited flexibility of scanner rotation, etc. The sinograms obtained as a result cause existing techniques to produce highly artifact-laden reconstructions. In this paper, we propose to address this problem through implicit sinogram completion, on a challenging real world dataset containing scans of common checked-in luggage. We propose a system, consisting of 1D and 2D convolutional neural networks, that operates on a limited angle sinogram to directly produce the best estimate of a reconstruction. Next, we use the x-ray transform on this reconstruction to obtain a “completed” sinogram, as if it came from a full 180$^\circ$ measurement. We feed this to standard analytical and iterative reconstruction techniques to obtain the final reconstruction. We show with extensive experimentation that this combined strategy outperforms many competitive baselines. We also propose a measure of confidence for the reconstruction that enables a practitioner to gauge the reliability of a prediction made by our network. We show that this measure is a strong indicator of quality as measured by the PSNR, while not requiring ground truth at test time. Finally, using a segmentation experiment, we show that our reconstruction preserves the 3D structure of objects effectively. |
Tasks | Computed Tomography (CT) |
Published | 2017-11-28 |
URL | http://arxiv.org/abs/1711.10388v3 |
PDF | http://arxiv.org/pdf/1711.10388v3.pdf |
PWC | https://paperswithcode.com/paper/lose-the-views-limited-angle-ct |
Repo | |
Framework | |
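The pipeline in the abstract can be mimicked end to end with scikit-image primitives: reconstruct from the limited-angle sinogram, re-project over the full angular range to "complete" the sinogram, then reconstruct analytically. The learned 1D/2D CNN is replaced here by a naive limited-angle reconstruction, so this is only a structural sketch under that assumption.

```python
import numpy as np
from skimage.data import shepp_logan_phantom
from skimage.transform import radon, iradon, resize

img = resize(shepp_logan_phantom(), (128, 128))
full_angles = np.linspace(0.0, 180.0, 180, endpoint=False)
seen_angles = full_angles[full_angles < 120]        # limited-angle acquisition

sino_limited = radon(img, theta=seen_angles)

# Stand-in for the paper's 1D/2D CNN: the intermediate estimate here is just a
# naive limited-angle reconstruction; the learned model would do far better.
estimate = iradon(sino_limited, theta=seen_angles)

# "Implicit sinogram completion": re-project the estimate over all 180 degrees,
# as if it came from a full measurement, then reconstruct analytically.
sino_completed = radon(estimate, theta=full_angles)
recon = iradon(sino_completed, theta=full_angles)
print("MSE vs. ground truth:", float(np.mean((recon - img) ** 2)))
```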
CINet: A Learning Based Approach to Incremental Context Modeling in Robots
Title | CINet: A Learning Based Approach to Incremental Context Modeling in Robots |
Authors | Fethiye Irmak Doğan, İlker Bozcan, Mehmet Çelik, Sinan Kalkan |
Abstract | There have been several attempts at modeling context in robots. However, either these attempts assume a fixed number of contexts or use a rule-based approach to determine when to increment the number of contexts. In this paper, we pose the task of when to increment as a learning problem, which we solve using a Recurrent Neural Network. We show that the network successfully (with 98% testing accuracy) learns to predict when to increment, and demonstrate, in a scene modeling problem (where the correct number of contexts is not known), that the robot increments the number of contexts in an expected manner (i.e., the entropy of the system is reduced). We also present how the incremental model can be used for various scene reasoning tasks. |
Tasks | |
Published | 2017-10-13 |
URL | http://arxiv.org/abs/1710.04981v3 |
PDF | http://arxiv.org/pdf/1710.04981v3.pdf |
PWC | https://paperswithcode.com/paper/cinet-a-learning-based-approach-to |
Repo | |
Framework | |
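A minimal sketch of the learning formulation described above: an LSTM reads a sequence of scene observations and predicts, at every step, the probability that the number of contexts should be incremented. The observation dimensionality and the 0.5 decision threshold are assumptions of the sketch, not the paper's exact setup.

```python
import torch
import torch.nn as nn

class IncrementNet(nn.Module):
    """LSTM that reads scene observations and predicts, at each step,
    whether a new context should be added."""
    def __init__(self, obs_dim=32, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, obs):                  # obs: (B, T, obs_dim)
        out, _ = self.lstm(obs)
        return torch.sigmoid(self.head(out)).squeeze(-1)   # (B, T) increment probs

model = IncrementNet()
probs = model(torch.randn(4, 20, 32))
# A robot would add a context whenever the predicted probability crosses 0.5.
print((probs > 0.5).sum(dim=1))
```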
Deep linear neural networks with arbitrary loss: All local minima are global
Title | Deep linear neural networks with arbitrary loss: All local minima are global |
Authors | Thomas Laurent, James von Brecht |
Abstract | We consider deep linear networks with arbitrary convex differentiable loss. We provide a short and elementary proof of the fact that all local minima are global minima if the hidden layers are either 1) at least as wide as the input layer, or 2) at least as wide as the output layer. This result is the strongest possible in the following sense: If the loss is convex and Lipschitz but not differentiable then deep linear networks can have sub-optimal local minima. |
Tasks | |
Published | 2017-12-05 |
URL | http://arxiv.org/abs/1712.01473v2 |
PDF | http://arxiv.org/pdf/1712.01473v2.pdf |
PWC | https://paperswithcode.com/paper/deep-linear-neural-networks-with-arbitrary |
Repo | |
Framework | |
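A small numerical illustration of the statement (not a proof): with a hidden layer at least as wide as the input, plain gradient descent on a two-layer linear network's squared loss lands at essentially the same loss as the global least-squares optimum over all linear maps.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out, n = 3, 4, 2, 200       # hidden layer at least as wide as input
X = rng.standard_normal((d_in, n))
W_true = rng.standard_normal((d_out, d_in))
Y = W_true @ X + 0.1 * rng.standard_normal((d_out, n))

W1 = 0.1 * rng.standard_normal((d_hidden, d_in))
W2 = 0.1 * rng.standard_normal((d_out, d_hidden))

lr = 0.05
for _ in range(20000):                        # plain gradient descent on squared loss
    E = W2 @ W1 @ X - Y
    gW2 = (2 / n) * E @ (W1 @ X).T
    gW1 = (2 / n) * W2.T @ E @ X.T
    W1 -= lr * gW1
    W2 -= lr * gW2

# Global optimum of the (convex) squared loss over all linear maps W = W2 W1.
W_star = np.linalg.lstsq(X.T, Y.T, rcond=None)[0].T
loss_gd = np.mean((W2 @ W1 @ X - Y) ** 2)
loss_opt = np.mean((W_star @ X - Y) ** 2)
print(loss_gd, loss_opt)                      # the two losses should nearly coincide
```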
Compression of Deep Neural Networks for Image Instance Retrieval
Title | Compression of Deep Neural Networks for Image Instance Retrieval |
Authors | Vijay Chandrasekhar, Jie Lin, Qianli Liao, Olivier Morère, Antoine Veillard, Lingyu Duan, Tomaso Poggio |
Abstract | Image instance retrieval is the problem of retrieving images from a database which contain the same object. Convolutional Neural Network (CNN) based descriptors are becoming the dominant approach for generating global image descriptors for the instance retrieval problem. One major drawback of CNN-based global descriptors is that uncompressed deep neural network models require hundreds of megabytes of storage making them inconvenient to deploy in mobile applications or in custom hardware. In this work, we study the problem of neural network model compression focusing on the image instance retrieval task. We study quantization, coding, pruning and weight sharing techniques for reducing model size for the instance retrieval problem. We provide extensive experimental results on the trade-off between retrieval performance and model size for different types of networks on several data sets providing the most comprehensive study on this topic. We compress models to the order of a few MBs: two orders of magnitude smaller than the uncompressed models while achieving negligible loss in retrieval performance. |
Tasks | Image Instance Retrieval, Model Compression, Quantization |
Published | 2017-01-18 |
URL | http://arxiv.org/abs/1701.04923v1 |
PDF | http://arxiv.org/pdf/1701.04923v1.pdf |
PWC | https://paperswithcode.com/paper/compression-of-deep-neural-networks-for-image |
Repo | |
Framework | |
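A back-of-the-envelope sketch of two of the techniques studied above, magnitude pruning followed by 8-bit quantization, applied to a single random weight matrix; the storage estimate is deliberately rough and the matrix is not a trained network.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4096, 4096)).astype(np.float32)   # a large FC layer

# 1) Magnitude pruning: drop the 90% smallest-magnitude weights.
threshold = np.quantile(np.abs(W), 0.90)
W_pruned = np.where(np.abs(W) >= threshold, W, 0.0)

# 2) 8-bit uniform quantization of the surviving weights.
scale = np.abs(W_pruned).max() / 127.0
W_q = np.round(W_pruned / scale).astype(np.int8)
W_hat = W_q.astype(np.float32) * scale                      # dequantized approximation

dense_bytes = W.size * 4
sparse_bytes = int((W_q != 0).sum()) * (1 + 4)              # rough: value + index
print("compression ratio ~", dense_bytes / sparse_bytes)
print("relative approximation error:", np.linalg.norm(W - W_hat) / np.linalg.norm(W))
```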
Annotating and Modeling Empathy in Spoken Conversations
Title | Annotating and Modeling Empathy in Spoken Conversations |
Authors | Firoj Alam, Morena Danieli, Giuseppe Riccardi |
Abstract | Empathy, as defined in behavioral sciences, expresses the ability of human beings to recognize, understand and react to emotions, attitudes and beliefs of others. The lack of an operational definition of empathy makes it difficult to measure it. In this paper, we address two related problems in automatic affective behavior analysis: the design of the annotation protocol and the automatic recognition of empathy from spoken conversations. We propose and evaluate an annotation scheme for empathy inspired by the modal model of emotions. The annotation scheme was evaluated on a corpus of real-life, dyadic spoken conversations. In the context of behavioral analysis, we designed an automatic segmentation and classification system for empathy. Given the different speech and language levels of representation where empathy may be communicated, we investigated features derived from the lexical and acoustic spaces. The feature development process was designed to support both the fusion and automatic selection of relevant features from high dimensional space. The automatic classification system was evaluated on call center conversations where it showed significantly better performance than the baseline. |
Tasks | |
Published | 2017-05-13 |
URL | http://arxiv.org/abs/1705.04839v3 |
PDF | http://arxiv.org/pdf/1705.04839v3.pdf |
PWC | https://paperswithcode.com/paper/annotating-and-modeling-empathy-in-spoken |
Repo | |
Framework | |
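A schematic version of the classification system described above: lexical and acoustic feature spaces are fused, the most relevant dimensions are selected automatically, and a linear classifier is trained. All data below is synthetic; the feature dimensions and the choice of a linear SVM are assumptions, not the paper's configuration.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_segments = 200
lexical = rng.standard_normal((n_segments, 300))    # e.g., bag-of-words / embeddings
acoustic = rng.standard_normal((n_segments, 120))   # e.g., pitch / energy statistics
y = rng.integers(0, 2, n_segments)                  # empathic vs. non-empathic segment

# Early fusion of the two feature spaces, then automatic selection of the most
# discriminative dimensions before the classifier.
X = np.hstack([lexical, acoustic])
model = make_pipeline(SelectKBest(f_classif, k=50), LinearSVC())
model.fit(X, y)
print(model.score(X, y))
```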
Spotting the Difference: Context Retrieval and Analysis for Improved Forgery Detection and Localization
Title | Spotting the Difference: Context Retrieval and Analysis for Improved Forgery Detection and Localization |
Authors | Joel Brogan, Paolo Bestagini, Aparna Bharati, Allan Pinto, Daniel Moreira, Kevin Bowyer, Patrick Flynn, Anderson Rocha, Walter Scheirer |
Abstract | As image tampering becomes ever more sophisticated and commonplace, the need for image forensics algorithms that can accurately and quickly detect forgeries grows. In this paper, we revisit the ideas of image querying and retrieval to provide clues to better localize forgeries. We propose a method to perform large-scale image forensics on the order of one million images using the help of an image search algorithm and database to gather contextual clues as to where tampering may have taken place. In this vein, we introduce five new strongly invariant image comparison methods and test their effectiveness under heavy noise, rotation, and color space changes. Lastly, we show the effectiveness of these methods compared to passive image forensics using Nimble [https://www.nist.gov/itl/iad/mig/nimble-challenge], a new, state-of-the-art dataset from the National Institute of Standards and Technology (NIST). |
Tasks | Image Retrieval |
Published | 2017-05-01 |
URL | http://arxiv.org/abs/1705.00604v1 |
PDF | http://arxiv.org/pdf/1705.00604v1.pdf |
PWC | https://paperswithcode.com/paper/spotting-the-difference-context-retrieval-and |
Repo | |
Framework | |
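A toy rendering of the retrieval-then-compare idea: a simple global descriptor retrieves visually similar context images, and the closest match is differenced against the query as a crude localization cue. The histogram descriptor and the random "database" are stand-ins, far simpler than the five invariant comparison methods the paper introduces.

```python
import numpy as np

def global_descriptor(img):
    """Very simple descriptor for retrieval: a normalized intensity histogram."""
    hist, _ = np.histogram(img, bins=32, range=(0.0, 1.0))
    return hist / max(hist.sum(), 1)

rng = np.random.default_rng(0)
database = rng.random((1000, 64, 64))                 # stand-in for a large image index
query = database[42].copy()
query[20:30, 20:30] = 1.0                             # a small spliced-in region

# Retrieve the context image that looks most like the query.
db_desc = np.stack([global_descriptor(im) for im in database])
q_desc = global_descriptor(query)
sims = db_desc @ q_desc / (np.linalg.norm(db_desc, axis=1) * np.linalg.norm(q_desc) + 1e-9)
best = int(np.argmax(sims))

# Compare the query against its nearest context image to localize the tampering.
diff = np.abs(query - database[best])
print("retrieved image:", best, "suspicious pixels:", int((diff > 0.5).sum()))
```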
Using Frame Theoretic Convolutional Gridding for Robust Synthetic Aperture Sonar Imaging
Title | Using Frame Theoretic Convolutional Gridding for Robust Synthetic Aperture Sonar Imaging |
Authors | John McKay, Anne Gelb, Vishal Monga, Raghu Raj |
Abstract | Recent progress in synthetic aperture sonar (SAS) technology and processing has led to significant advances in underwater imaging, outperforming previously common approaches in both accuracy and efficiency. There are, however, inherent limitations to current SAS reconstruction methodology. In particular, popular and efficient Fourier domain SAS methods require a 2D interpolation which is often ill-conditioned and inaccurate, inevitably reducing robustness with regard to speckle and inaccurate sound-speed estimation. To overcome these issues, we propose using the frame theoretic convolution gridding (FTCG) algorithm to handle the non-uniform Fourier data. FTCG extends upon non-uniform fast Fourier transform (NUFFT) algorithms by casting the NUFFT as an approximation problem given Fourier frame data. The FTCG has been shown to yield improved accuracy at little additional computational cost. Using simulated data, we outline how the FTCG can be used to enhance current SAS processing. |
Tasks | |
Published | 2017-06-26 |
URL | http://arxiv.org/abs/1706.08575v1 |
PDF | http://arxiv.org/pdf/1706.08575v1.pdf |
PWC | https://paperswithcode.com/paper/using-frame-theoretic-convolutional-gridding |
Repo | |
Framework | |
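A tiny numerical illustration of the frame-theoretic view behind FTCG: off-grid Fourier samples form a frame, so the underlying signal is recoverable from them. The direct least-squares inversion below is what convolutional gridding approximates quickly with a Kaiser-Bessel kernel and an FFT; it is shown only for clarity and is not the FTCG algorithm itself.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 128
signal = rng.standard_normal(N)

# Non-uniform Fourier measurements (off-grid frequencies), standing in for the
# irregular k-space data that SAS geometries produce.
freqs = rng.uniform(-N / 2, N / 2, 3 * N)
n = np.arange(N)
A = np.exp(-2j * np.pi * np.outer(freqs, n) / N)     # Fourier "frame" matrix
samples = A @ signal

# Frame-theoretic view: the off-grid samples form a frame, so the signal is
# recoverable by inversion. FTCG approximates this inversion with fast
# kernel-based gridding + FFT; here we solve it directly for clarity.
recovered = np.linalg.lstsq(A, samples, rcond=None)[0].real
print("max reconstruction error:", float(np.max(np.abs(recovered - signal))))
```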
Sparse Representation based Multi-sensor Image Fusion: A Review
Title | Sparse Representation based Multi-sensor Image Fusion: A Review |
Authors | Qiang Zhang, Yi Liu, Rick S. Blum, Jungong Han, Dacheng Tao |
Abstract | As a result of several successful applications in computer vision and image processing, sparse representation (SR) has attracted significant attention in multi-sensor image fusion. Unlike the traditional multiscale transforms (MSTs) that presume the basis functions, SR learns an over-complete dictionary from a set of training images for image fusion, and it achieves more stable and meaningful representations of the source images. By doing so, the SR-based fusion methods generally outperform the traditional MST-based image fusion methods in both subjective and objective tests. In addition, they are less susceptible to mis-registration among the source images, thus facilitating practical applications. This survey paper proposes a systematic review of the SR-based multi-sensor image fusion literature, highlighting the pros and cons of each category of approaches. Specifically, we start by performing a theoretical investigation of the entire system from three key algorithmic aspects: (1) sparse representation models; (2) dictionary learning methods; and (3) activity levels and fusion rules. Subsequently, we show how the existing works address these scientific problems and design the appropriate fusion rules for each application, such as multi-focus image fusion and multi-modality (e.g., infrared and visible) image fusion. Finally, we carry out experiments to evaluate the impact of these three algorithmic components on the fusion performance when dealing with different applications. This article is expected to serve as a tutorial and source of reference for researchers preparing to enter the field or who desire to employ sparse representation theory in other fields. |
Tasks | Dictionary Learning, Infrared And Visible Image Fusion |
Published | 2017-02-12 |
URL | http://arxiv.org/abs/1702.03515v1 |
PDF | http://arxiv.org/pdf/1702.03515v1.pdf |
PWC | https://paperswithcode.com/paper/sparse-representation-based-multi-sensor |
Repo | |
Framework | |
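A minimal sketch of SR-based fusion for a single pair of registered patches: sparse-code both patches over a shared dictionary, keep the code with the larger L1 activity level, and reconstruct. The random dictionary (normally learned, e.g., with K-SVD), the greedy matching-pursuit coder, and the max-L1 rule are illustrative choices, not a specific method from the survey.

```python
import numpy as np

def matching_pursuit(D, x, n_atoms=4):
    """Greedy sparse coding of vector x over a unit-norm dictionary D."""
    r, coef = x.copy(), np.zeros(D.shape[1])
    for _ in range(n_atoms):
        j = int(np.argmax(np.abs(D.T @ r)))
        c = D[:, j] @ r
        coef[j] += c
        r -= c * D[:, j]
    return coef

rng = np.random.default_rng(0)
patch_dim, n_atoms = 64, 256                      # 8x8 patches, overcomplete dictionary
D = rng.standard_normal((patch_dim, n_atoms))
D /= np.linalg.norm(D, axis=0)                    # unit-norm atoms

# Two registered source patches, e.g., one infrared and one visible.
patch_a = rng.standard_normal(patch_dim)
patch_b = rng.standard_normal(patch_dim)

coef_a = matching_pursuit(D, patch_a)
coef_b = matching_pursuit(D, patch_b)

# "Max-L1" activity rule: keep, per patch, the code with the larger activity level.
fused_coef = coef_a if np.abs(coef_a).sum() >= np.abs(coef_b).sum() else coef_b
fused_patch = D @ fused_coef
print(fused_patch.shape)
```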
Viraliency: Pooling Local Virality
Title | Viraliency: Pooling Local Virality |
Authors | Xavier Alameda-Pineda, Andrea Pilzer, Dan Xu, Nicu Sebe, Elisa Ricci |
Abstract | In our overly-connected world, the automatic recognition of virality - the quality of an image or video to be rapidly and widely spread in social networks - is of crucial importance, and has recently awakened the interest of the computer vision community. Concurrently, recent progress in deep learning architectures showed that global pooling strategies allow the extraction of activation maps, which highlight the parts of the image most likely to contain instances of a certain class. We extend this concept by introducing a pooling layer that learns the size of the support area to be averaged: the learned top-N average (LENA) pooling. We hypothesize that the latent concepts (feature maps) describing virality may require such a rich pooling strategy. We assess the effectiveness of the LENA layer by appending it on top of a convolutional siamese architecture and evaluate its performance on the task of predicting and localizing virality. We report experiments on two publicly available datasets annotated for virality and show that our method outperforms state-of-the-art approaches. |
Tasks | |
Published | 2017-03-11 |
URL | http://arxiv.org/abs/1703.03937v2 |
PDF | http://arxiv.org/pdf/1703.03937v2.pdf |
PWC | https://paperswithcode.com/paper/viraliency-pooling-local-virality |
Repo | |
Framework | |
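A smooth stand-in for the LENA layer described above: per-channel activations are sorted and combined with learned, rank-dependent weights, so how many of the top activations effectively contribute is learned from data. This softmax-over-ranks relaxation is an assumption of the sketch, not the paper's exact parameterization.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LenaPool(nn.Module):
    """Learned top-N-average-style pooling: sort activations within each channel
    and average them with learned weights over the ranks."""
    def __init__(self, spatial_size):
        super().__init__()
        self.rank_logits = nn.Parameter(torch.zeros(spatial_size))  # one logit per rank

    def forward(self, x):                         # x: (B, C, H, W)
        flat = x.flatten(2)                       # (B, C, H*W)
        ranked, _ = torch.sort(flat, dim=-1, descending=True)
        w = F.softmax(self.rank_logits, dim=0)    # learned weights over ranks
        return (ranked * w).sum(dim=-1)           # (B, C)

pool = LenaPool(spatial_size=7 * 7)
print(pool(torch.randn(2, 512, 7, 7)).shape)      # -> torch.Size([2, 512])
```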
An OpenCL(TM) Deep Learning Accelerator on Arria 10
Title | An OpenCL(TM) Deep Learning Accelerator on Arria 10 |
Authors | Utku Aydonat, Shane O’Connell, Davor Capalija, Andrew C. Ling, Gordon R. Chiu |
Abstract | Convolutional neural nets (CNNs) have become a practical means to perform vision tasks, particularly in the area of image classification. FPGAs are well known to be able to perform convolutions efficiently, however, most recent efforts to run CNNs on FPGAs have shown limited advantages over other devices such as GPUs. Previous approaches on FPGAs have often been memory bound due to the limited external memory bandwidth on the FPGA device. We show a novel architecture written in OpenCL(TM), which we refer to as a Deep Learning Accelerator (DLA), that maximizes data reuse and minimizes external memory bandwidth. Furthermore, we show how we can use the Winograd transform to significantly boost the performance of the FPGA. As a result, when running our DLA on Intel’s Arria 10 device we can achieve a performance of 1020 img/s, or 23 img/s/W when running the AlexNet CNN benchmark. This comes to 1382 GFLOPs and is 10x faster with 8.4x more GFLOPS and 5.8x better efficiency than the state-of-the-art on FPGAs. Additionally, 23 img/s/W is competitive against the best publicly known implementation of AlexNet on nVidia’s TitanX GPU. |
Tasks | Image Classification |
Published | 2017-01-13 |
URL | http://arxiv.org/abs/1701.03534v1 |
PDF | http://arxiv.org/pdf/1701.03534v1.pdf |
PWC | https://paperswithcode.com/paper/an-opencltm-deep-learning-accelerator-on |
Repo | |
Framework | |
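The Winograd transform that the DLA uses to boost arithmetic efficiency can be checked numerically in a few lines: F(2,3) produces two outputs of a 3-tap correlation with four multiplications instead of six. These are the standard transform matrices, independent of the paper's OpenCL implementation.

```python
import numpy as np

# Winograd F(2,3): two outputs of a 1D 3-tap correlation from a 4-sample input
# tile using only 4 element-wise multiplications instead of 6.
Bt = np.array([[1, 0, -1, 0],
               [0, 1,  1, 0],
               [0, -1, 1, 0],
               [0, 1,  0, -1]], dtype=float)
G  = np.array([[1,   0,   0],
               [0.5, 0.5, 0.5],
               [0.5, -0.5, 0.5],
               [0,   0,   1]], dtype=float)
At = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)

d = np.array([1.0, 2.0, 3.0, 4.0])      # input tile
g = np.array([0.5, -1.0, 2.0])          # 3-tap filter

winograd = At @ ((G @ g) * (Bt @ d))    # element-wise product in transform domain
direct = np.array([d[0:3] @ g, d[1:4] @ g])
print(winograd, direct)                 # the two results match
```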