Paper Group ANR 271
Graph Distillation for Action Detection with Privileged Modalities
Title | Graph Distillation for Action Detection with Privileged Modalities |
Authors | Zelun Luo, Jun-Ting Hsieh, Lu Jiang, Juan Carlos Niebles, Li Fei-Fei |
Abstract | We propose a technique that tackles action detection in multimodal videos under a realistic and challenging condition in which only limited training data and partially observed modalities are available. Common methods in transfer learning do not take advantage of the extra modalities potentially available in the source domain. On the other hand, previous work on multimodal learning only focuses on a single domain or task and does not handle the modality discrepancy between training and testing. In this work, we propose a method termed graph distillation that incorporates rich privileged information from a large-scale multimodal dataset in the source domain, and improves the learning in the target domain where training data and modalities are scarce. We evaluate our approach on action classification and detection tasks in multimodal videos, and show that our model outperforms the state-of-the-art by a large margin on the NTU RGB+D and PKU-MMD benchmarks. The code is released at http://alan.vision/eccv18_graph/. |
Tasks | Action Classification, Action Detection, Transfer Learning |
Published | 2017-11-30 |
URL | http://arxiv.org/abs/1712.00108v2 |
PDF | http://arxiv.org/pdf/1712.00108v2.pdf |
PWC | https://paperswithcode.com/paper/graph-distillation-for-action-detection-with |
Repo | |
Framework | |
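The paper's core idea, distilling soft predictions from privileged source modalities into a target-modality model through learned graph edge weights, can be sketched as a loss term. The snippet below is a minimal PyTorch illustration, not the released implementation; the modality names, dimensions, and the way the edge weights are created are assumptions made for the example.

```python
import torch
import torch.nn.functional as F

def graph_distillation_loss(logits_by_modality, target_modality, temperature=2.0):
    """Distill soft predictions from privileged source modalities into the
    target modality, weighting each source by a learnable graph edge weight."""
    names = [m for m in logits_by_modality if m != target_modality]
    # Learnable edge weights from each source modality to the target
    # (kept as a module parameter in a real model; created here for brevity).
    edge_logits = torch.zeros(len(names), requires_grad=True)
    edge_w = F.softmax(edge_logits, dim=0)

    student = F.log_softmax(logits_by_modality[target_modality] / temperature, dim=1)
    loss = 0.0
    for w, name in zip(edge_w, names):
        teacher = F.softmax(logits_by_modality[name].detach() / temperature, dim=1)
        loss = loss + w * F.kl_div(student, teacher, reduction="batchmean")
    return loss

# Toy usage: RGB is the target modality; depth and skeleton act as privileged sources.
logits = {m: torch.randn(8, 60) for m in ("rgb", "depth", "skeleton")}
print(graph_distillation_loss(logits, "rgb"))
```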
Auto-context Convolutional Neural Network (Auto-Net) for Brain Extraction in Magnetic Resonance Imaging
Title | Auto-context Convolutional Neural Network (Auto-Net) for Brain Extraction in Magnetic Resonance Imaging |
Authors | Seyed Sadegh Mohseni Salehi, Deniz Erdogmus, Ali Gholipour |
Abstract | Brain extraction or whole brain segmentation is an important first step in many of the neuroimage analysis pipelines. The accuracy and robustness of brain extraction, therefore, is crucial for the accuracy of the entire brain analysis process. With the aim of designing a learning-based, geometry-independent and registration-free brain extraction tool in this study, we present a technique based on an auto-context convolutional neural network (CNN), in which intrinsic local and global image features are learned through 2D patches of different window sizes. In this architecture three parallel 2D convolutional pathways for three different directions (axial, coronal, and sagittal) implicitly learn 3D image information without the need for computationally expensive 3D convolutions. Posterior probability maps generated by the network are used iteratively as context information along with the original image patches to learn the local shape and connectedness of the brain, to extract it from non-brain tissue. The brain extraction results we have obtained from our algorithm are superior to the recently reported results in the literature on two publicly available benchmark datasets, namely LPBA40 and OASIS, in which we obtained Dice overlap coefficients of 97.42% and 95.40%, respectively. Furthermore, we evaluated the performance of our algorithm in the challenging problem of extracting arbitrarily-oriented fetal brains in reconstructed fetal brain magnetic resonance imaging (MRI) datasets. In this application our algorithm performed much better than the other methods (Dice coefficient: 95.98%), where the other methods performed poorly due to the non-standard orientation and geometry of the fetal brain in MRI. Our CNN-based method can provide accurate, geometry-independent brain extraction in challenging applications. |
Tasks | Brain Segmentation |
Published | 2017-03-06 |
URL | http://arxiv.org/abs/1703.02083v2 |
PDF | http://arxiv.org/pdf/1703.02083v2.pdf |
PWC | https://paperswithcode.com/paper/auto-context-convolutional-neural-network |
Repo | |
Framework | |
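A rough sketch of the auto-context loop described above: the posterior map from one pass is fed back as an extra input for the next pass. The CNN itself is replaced here by a trivial stand-in predictor and only axial slices are processed, so everything beyond the loop structure is a placeholder assumption.

```python
import numpy as np

def predict_posterior(image, prior):
    """Stand-in for the paper's three-pathway 2D CNN: returns a brain-probability
    map from the intensity image plus the previous posterior (the 'context')."""
    # A crude intensity threshold blended with the prior, purely illustrative.
    intensity_prob = (image > image.mean()).astype(float)
    return 0.5 * intensity_prob + 0.5 * prior

def auto_context_extraction(volume, n_steps=3):
    """Iterate the auto-context loop slice by slice: each step feeds the previous
    posterior map back in as extra context alongside the image."""
    posterior = np.full(volume.shape, 0.5)            # uninformative initial context
    for _ in range(n_steps):
        for z in range(volume.shape[0]):              # axial slices only, for brevity
            posterior[z] = predict_posterior(volume[z], posterior[z])
    return posterior > 0.5                            # final brain mask

mask = auto_context_extraction(np.random.rand(8, 64, 64))
print(mask.shape, mask.mean())
```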
AWE-CM Vectors: Augmenting Word Embeddings with a Clinical Metathesaurus
Title | AWE-CM Vectors: Augmenting Word Embeddings with a Clinical Metathesaurus |
Authors | Willie Boag, Hassan Kané |
Abstract | In recent years, word embeddings have been surprisingly effective at capturing intuitive characteristics of the words they represent. These vectors achieve the best results when training corpora are extremely large, sometimes billions of words. Clinical natural language processing datasets, however, tend to be much smaller. Even the largest publicly-available dataset of medical notes is three orders of magnitude smaller than the dataset of the oft-used “Google News” word vectors. In order to make up for limited training data sizes, we encode expert domain knowledge into our embeddings. Building on a previous extension of word2vec, we show that generalizing the notion of a word’s “context” to include arbitrary features creates an avenue for encoding domain knowledge into word embeddings. We show that the word vectors produced by this method outperform their text-only counterparts across the board in correlation with clinical experts. |
Tasks | Word Embeddings |
Published | 2017-12-05 |
URL | http://arxiv.org/abs/1712.01460v1 |
PDF | http://arxiv.org/pdf/1712.01460v1.pdf |
PWC | https://paperswithcode.com/paper/awe-cm-vectors-augmenting-word-embeddings |
Repo | |
Framework | |
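The abstract's key move, generalizing a word's "context" to include arbitrary features such as metathesaurus concepts, can be illustrated by how training pairs are generated. The concept lookup table below is hypothetical; a real system would query UMLS and feed the resulting pairs to any skip-gram trainer.

```python
from itertools import chain

# Hypothetical mapping from clinical terms to metathesaurus concept IDs (CUIs);
# in practice this would come from UMLS or a similar resource.
TERM_TO_CUIS = {"hypertension": ["C0020538"], "metoprolol": ["C0025859"]}

def augmented_skipgram_pairs(tokens, window=2):
    """Generalize word2vec 'context' to arbitrary features: each target word is
    paired with its neighbouring words AND the concept codes attached to them."""
    pairs = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        neighbours = tokens[lo:i] + tokens[i + 1:hi]
        concepts = list(chain.from_iterable(TERM_TO_CUIS.get(w, []) for w in neighbours))
        pairs.extend((target, ctx) for ctx in neighbours + concepts)
    return pairs

# The resulting (target, context) pairs can be fed to any skip-gram implementation.
print(augmented_skipgram_pairs("patient with hypertension given metoprolol".split()))
```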
Action Classification and Highlighting in Videos
Title | Action Classification and Highlighting in Videos |
Authors | Atousa Torabi, Leonid Sigal |
Abstract | Inspired by recent advances in neural machine translation that jointly align and translate using encoder-decoder networks equipped with attention, we propose an attention-based LSTM model for human activity recognition. Our model jointly learns to classify actions and highlight frames associated with the action, by attending to salient visual information through a jointly learned soft-attention network. We explore attention informed by various forms of visual semantic features, including those encoding actions, objects and scenes. We qualitatively show that soft-attention can learn to effectively attend to important objects and scene information correlated with specific human actions. Further, we show that, quantitatively, our attention-based LSTM outperforms the vanilla LSTM and CNN models used by state-of-the-art methods. On a large-scale YouTube video dataset, ActivityNet, our model outperforms competing methods in action classification. |
Tasks | Action Classification, Activity Recognition, Human Activity Recognition, Machine Translation |
Published | 2017-08-31 |
URL | http://arxiv.org/abs/1708.09522v1 |
PDF | http://arxiv.org/pdf/1708.09522v1.pdf |
PWC | https://paperswithcode.com/paper/action-classification-and-highlighting-in |
Repo | |
Framework | |
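A compact PyTorch sketch of a soft-attention LSTM of the kind described above: at every step the model attends over per-frame features and finally classifies from the last hidden state, with the attention weights doubling as frame highlights. Feature and class dimensions are illustrative assumptions, and this is not the authors' exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionLSTM(nn.Module):
    """Soft-attention LSTM: attend over per-frame features at each step,
    classify the action from the final hidden state."""
    def __init__(self, feat_dim=2048, hidden=512, n_classes=200):
        super().__init__()
        self.cell = nn.LSTMCell(feat_dim, hidden)
        self.att = nn.Linear(feat_dim + hidden, 1)
        self.cls = nn.Linear(hidden, n_classes)

    def forward(self, feats):                       # feats: (B, T, feat_dim)
        B, T, _ = feats.shape
        h = feats.new_zeros(B, self.cell.hidden_size)
        c = feats.new_zeros(B, self.cell.hidden_size)
        alphas = []
        for _ in range(T):
            # attention scores over all frames, conditioned on the current state
            cond = torch.cat([feats, h.unsqueeze(1).expand(B, T, h.size(1))], dim=-1)
            alpha = F.softmax(self.att(cond).squeeze(-1), dim=1)     # (B, T)
            context = (alpha.unsqueeze(-1) * feats).sum(dim=1)       # (B, feat_dim)
            h, c = self.cell(context, (h, c))
            alphas.append(alpha)
        return self.cls(h), torch.stack(alphas, 1)   # logits, per-step frame highlights

logits, attn = AttentionLSTM()(torch.randn(2, 16, 2048))
print(logits.shape, attn.shape)
```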
Lexical-semantic resources: yet powerful resources for automatic personality classification
Title | Lexical-semantic resources: yet powerful resources for automatic personality classification |
Authors | Xuan-Son Vu, Lucie Flekova, Lili Jiang, Iryna Gurevych |
Abstract | In this paper, we aim to reveal the impact of lexical-semantic resources, used in particular for word sense disambiguation and sense-level semantic categorization, on the automatic personality classification task. While stylistic features (e.g., part-of-speech counts) have been shown to be powerful in this task, the impact of semantics beyond targeted word lists is relatively unexplored. We propose and extract three types of lexical-semantic features, which capture high-level concepts and emotions, overcoming the lexical gap of word n-grams. Our experimental results are comparable to state-of-the-art methods, while no personality-specific resources are required. |
Tasks | Word Sense Disambiguation |
Published | 2017-11-27 |
URL | http://arxiv.org/abs/1711.09824v1 |
PDF | http://arxiv.org/pdf/1711.09824v1.pdf |
PWC | https://paperswithcode.com/paper/lexical-semantic-resources-yet-powerful |
Repo | |
Framework | |
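A toy illustration of the feature pipeline suggested by the abstract: tokens are mapped to high-level semantic categories via a lexical-semantic resource, and the resulting counts feed a standard classifier. The tiny lexicon, texts, and labels below are invented for the example; a real setup would use WordNet supersenses or an emotion lexicon.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative stand-in for a lexical-semantic resource mapping words to
# high-level categories (supersenses / emotions).
LEXICON = {"happy": "emotion.joy", "angry": "emotion.anger",
           "run": "verb.motion", "think": "verb.cognition"}
CATEGORIES = sorted(set(LEXICON.values()))

def semantic_features(text):
    """Count how often each high-level semantic category occurs in a text."""
    counts = np.zeros(len(CATEGORIES))
    for tok in text.lower().split():
        if tok in LEXICON:
            counts[CATEGORIES.index(LEXICON[tok])] += 1
    return counts

texts = ["I run and run and feel happy", "I think too much and get angry"]
labels = [1, 0]                       # e.g., high vs. low extraversion (toy labels)
X = np.vstack([semantic_features(t) for t in texts])
clf = LogisticRegression().fit(X, labels)
print(clf.predict(X))
```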
Lose The Views: Limited Angle CT Reconstruction via Implicit Sinogram Completion
Title | Lose The Views: Limited Angle CT Reconstruction via Implicit Sinogram Completion |
Authors | Rushil Anirudh, Hyojin Kim, Jayaraman J. Thiagarajan, K. Aditya Mohan, Kyle Champley, Timo Bremer |
Abstract | Computed Tomography (CT) reconstruction is a fundamental component of a wide variety of applications ranging from security to healthcare. The classical techniques require measuring projections, called sinograms, from a full 180$^\circ$ view of the object. This is impractical in a limited angle scenario, when the viewing angle is less than 180$^\circ$, which can occur due to different factors including restrictions on scanning time, limited flexibility of scanner rotation, etc. The sinograms obtained as a result cause existing techniques to produce highly artifact-laden reconstructions. In this paper, we propose to address this problem through implicit sinogram completion, on a challenging real world dataset containing scans of common checked-in luggage. We propose a system, consisting of 1D and 2D convolutional neural networks, that operates on a limited angle sinogram to directly produce the best estimate of a reconstruction. Next, we use the x-ray transform on this reconstruction to obtain a “completed” sinogram, as if it came from a full 180$^\circ$ measurement. We feed this to standard analytical and iterative reconstruction techniques to obtain the final reconstruction. We show with extensive experimentation that this combined strategy outperforms many competitive baselines. We also propose a measure of confidence for the reconstruction that enables a practitioner to gauge the reliability of a prediction made by our network. We show that this measure is a strong indicator of quality as measured by the PSNR, while not requiring ground truth at test time. Finally, using a segmentation experiment, we show that our reconstruction preserves the 3D structure of objects effectively. |
Tasks | Computed Tomography (CT) |
Published | 2017-11-28 |
URL | http://arxiv.org/abs/1711.10388v3 |
PDF | http://arxiv.org/pdf/1711.10388v3.pdf |
PWC | https://paperswithcode.com/paper/lose-the-views-limited-angle-ct |
Repo | |
Framework | |
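The pipeline in the abstract can be mimicked end to end with scikit-image primitives: reconstruct from the limited-angle sinogram, re-project over the full angular range to "complete" the sinogram, then reconstruct analytically. The learned 1D/2D CNN is replaced here by a naive limited-angle reconstruction, so this is only a structural sketch under that assumption.

```python
import numpy as np
from skimage.data import shepp_logan_phantom
from skimage.transform import radon, iradon, resize

img = resize(shepp_logan_phantom(), (128, 128))
full_angles = np.linspace(0.0, 180.0, 180, endpoint=False)
seen_angles = full_angles[full_angles < 120]        # limited-angle acquisition

sino_limited = radon(img, theta=seen_angles)

# Stand-in for the paper's 1D/2D CNN: the intermediate estimate here is just a
# naive limited-angle reconstruction; the learned model would do far better.
estimate = iradon(sino_limited, theta=seen_angles)

# "Implicit sinogram completion": re-project the estimate over all 180 degrees,
# as if it came from a full measurement, then reconstruct analytically.
sino_completed = radon(estimate, theta=full_angles)
recon = iradon(sino_completed, theta=full_angles)
print("MSE vs. ground truth:", float(np.mean((recon - img) ** 2)))
```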
CINet: A Learning Based Approach to Incremental Context Modeling in Robots
Title | CINet: A Learning Based Approach to Incremental Context Modeling in Robots |
Authors | Fethiye Irmak Doğan, İlker Bozcan, Mehmet Çelik, Sinan Kalkan |
Abstract | There have been several attempts at modeling context in robots. However, either these attempts assume a fixed number of contexts or use a rule-based approach to determine when to increment the number of contexts. In this paper, we pose the task of when to increment as a learning problem, which we solve using a Recurrent Neural Network. We show that the network successfully (with 98% testing accuracy) learns to predict when to increment, and demonstrate, in a scene modeling problem (where the correct number of contexts is not known), that the robot increments the number of contexts in an expected manner (i.e., the entropy of the system is reduced). We also present how the incremental model can be used for various scene reasoning tasks. |
Tasks | |
Published | 2017-10-13 |
URL | http://arxiv.org/abs/1710.04981v3 |
PDF | http://arxiv.org/pdf/1710.04981v3.pdf |
PWC | https://paperswithcode.com/paper/cinet-a-learning-based-approach-to |
Repo | |
Framework | |
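A minimal sketch of the learning formulation described above: an LSTM reads a sequence of scene observations and predicts, at every step, the probability that the number of contexts should be incremented. The observation dimensionality and the 0.5 decision threshold are assumptions of the sketch, not the paper's exact setup.

```python
import torch
import torch.nn as nn

class IncrementNet(nn.Module):
    """LSTM that reads scene observations and predicts, at each step,
    whether a new context should be added."""
    def __init__(self, obs_dim=32, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, obs):                  # obs: (B, T, obs_dim)
        out, _ = self.lstm(obs)
        return torch.sigmoid(self.head(out)).squeeze(-1)   # (B, T) increment probs

model = IncrementNet()
probs = model(torch.randn(4, 20, 32))
# A robot would add a context whenever the predicted probability crosses 0.5.
print((probs > 0.5).sum(dim=1))
```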
Deep linear neural networks with arbitrary loss: All local minima are global
Title | Deep linear neural networks with arbitrary loss: All local minima are global |
Authors | Thomas Laurent, James von Brecht |
Abstract | We consider deep linear networks with arbitrary convex differentiable loss. We provide a short and elementary proof of the fact that all local minima are global minima if the hidden layers are either 1) at least as wide as the input layer, or 2) at least as wide as the output layer. This result is the strongest possible in the following sense: If the loss is convex and Lipschitz but not differentiable then deep linear networks can have sub-optimal local minima. |
Tasks | |
Published | 2017-12-05 |
URL | http://arxiv.org/abs/1712.01473v2 |
PDF | http://arxiv.org/pdf/1712.01473v2.pdf |
PWC | https://paperswithcode.com/paper/deep-linear-neural-networks-with-arbitrary |
Repo | |
Framework | |
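A small numerical illustration of the statement (not a proof): with a hidden layer at least as wide as the input, plain gradient descent on a two-layer linear network's squared loss lands at essentially the same loss as the global least-squares optimum over all linear maps.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out, n = 3, 4, 2, 200       # hidden layer at least as wide as input
X = rng.standard_normal((d_in, n))
W_true = rng.standard_normal((d_out, d_in))
Y = W_true @ X + 0.1 * rng.standard_normal((d_out, n))

W1 = 0.1 * rng.standard_normal((d_hidden, d_in))
W2 = 0.1 * rng.standard_normal((d_out, d_hidden))

lr = 0.05
for _ in range(20000):                        # plain gradient descent on squared loss
    E = W2 @ W1 @ X - Y
    gW2 = (2 / n) * E @ (W1 @ X).T
    gW1 = (2 / n) * W2.T @ E @ X.T
    W1 -= lr * gW1
    W2 -= lr * gW2

# Global optimum of the (convex) squared loss over all linear maps W = W2 W1.
W_star = np.linalg.lstsq(X.T, Y.T, rcond=None)[0].T
loss_gd = np.mean((W2 @ W1 @ X - Y) ** 2)
loss_opt = np.mean((W_star @ X - Y) ** 2)
print(loss_gd, loss_opt)                      # the two losses should nearly coincide
```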
Compression of Deep Neural Networks for Image Instance Retrieval
Title | Compression of Deep Neural Networks for Image Instance Retrieval |
Authors | Vijay Chandrasekhar, Jie Lin, Qianli Liao, Olivier Morère, Antoine Veillard, Lingyu Duan, Tomaso Poggio |
Abstract | Image instance retrieval is the problem of retrieving images from a database which contain the same object. Convolutional Neural Network (CNN) based descriptors are becoming the dominant approach for generating global image descriptors for the instance retrieval problem. One major drawback of CNN-based global descriptors is that uncompressed deep neural network models require hundreds of megabytes of storage making them inconvenient to deploy in mobile applications or in custom hardware. In this work, we study the problem of neural network model compression focusing on the image instance retrieval task. We study quantization, coding, pruning and weight sharing techniques for reducing model size for the instance retrieval problem. We provide extensive experimental results on the trade-off between retrieval performance and model size for different types of networks on several data sets providing the most comprehensive study on this topic. We compress models to the order of a few MBs: two orders of magnitude smaller than the uncompressed models while achieving negligible loss in retrieval performance. |
Tasks | Image Instance Retrieval, Model Compression, Quantization |
Published | 2017-01-18 |
URL | http://arxiv.org/abs/1701.04923v1 |
PDF | http://arxiv.org/pdf/1701.04923v1.pdf |
PWC | https://paperswithcode.com/paper/compression-of-deep-neural-networks-for-image |
Repo | |
Framework | |
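A back-of-the-envelope sketch of two of the techniques studied above, magnitude pruning followed by 8-bit quantization, applied to a single random weight matrix; the storage estimate is deliberately rough and the matrix is not a trained network.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4096, 4096)).astype(np.float32)   # a large FC layer

# 1) Magnitude pruning: drop the 90% smallest-magnitude weights.
threshold = np.quantile(np.abs(W), 0.90)
W_pruned = np.where(np.abs(W) >= threshold, W, 0.0)

# 2) 8-bit uniform quantization of the surviving weights.
scale = np.abs(W_pruned).max() / 127.0
W_q = np.round(W_pruned / scale).astype(np.int8)
W_hat = W_q.astype(np.float32) * scale                      # dequantized approximation

dense_bytes = W.size * 4
sparse_bytes = int((W_q != 0).sum()) * (1 + 4)              # rough: value + index
print("compression ratio ~", dense_bytes / sparse_bytes)
print("relative approximation error:", np.linalg.norm(W - W_hat) / np.linalg.norm(W))
```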
Annotating and Modeling Empathy in Spoken Conversations
Title | Annotating and Modeling Empathy in Spoken Conversations |
Authors | Firoj Alam, Morena Danieli, Giuseppe Riccardi |
Abstract | Empathy, as defined in behavioral sciences, expresses the ability of human beings to recognize, understand and react to emotions, attitudes and beliefs of others. The lack of an operational definition of empathy makes it difficult to measure it. In this paper, we address two related problems in automatic affective behavior analysis: the design of the annotation protocol and the automatic recognition of empathy from spoken conversations. We propose and evaluate an annotation scheme for empathy inspired by the modal model of emotions. The annotation scheme was evaluated on a corpus of real-life, dyadic spoken conversations. In the context of behavioral analysis, we designed an automatic segmentation and classification system for empathy. Given the different speech and language levels of representation where empathy may be communicated, we investigated features derived from the lexical and acoustic spaces. The feature development process was designed to support both the fusion and automatic selection of relevant features from high dimensional space. The automatic classification system was evaluated on call center conversations where it showed significantly better performance than the baseline. |
Tasks | |
Published | 2017-05-13 |
URL | http://arxiv.org/abs/1705.04839v3 |
PDF | http://arxiv.org/pdf/1705.04839v3.pdf |
PWC | https://paperswithcode.com/paper/annotating-and-modeling-empathy-in-spoken |
Repo | |
Framework | |
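A schematic version of the classification system described above: lexical and acoustic feature spaces are fused, the most relevant dimensions are selected automatically, and a linear classifier is trained. All data below is synthetic; the feature dimensions and the choice of a linear SVM are assumptions, not the paper's configuration.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_segments = 200
lexical = rng.standard_normal((n_segments, 300))    # e.g., bag-of-words / embeddings
acoustic = rng.standard_normal((n_segments, 120))   # e.g., pitch / energy statistics
y = rng.integers(0, 2, n_segments)                  # empathic vs. non-empathic segment

# Early fusion of the two feature spaces, then automatic selection of the most
# discriminative dimensions before the classifier.
X = np.hstack([lexical, acoustic])
model = make_pipeline(SelectKBest(f_classif, k=50), LinearSVC())
model.fit(X, y)
print(model.score(X, y))
```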
Spotting the Difference: Context Retrieval and Analysis for Improved Forgery Detection and Localization
Title | Spotting the Difference: Context Retrieval and Analysis for Improved Forgery Detection and Localization |
Authors | Joel Brogan, Paolo Bestagini, Aparna Bharati, Allan Pinto, Daniel Moreira, Kevin Bowyer, Patrick Flynn, Anderson Rocha, Walter Scheirer |
Abstract | As image tampering becomes ever more sophisticated and commonplace, the need for image forensics algorithms that can accurately and quickly detect forgeries grows. In this paper, we revisit the ideas of image querying and retrieval to provide clues to better localize forgeries. We propose a method to perform large-scale image forensics on the order of one million images using the help of an image search algorithm and database to gather contextual clues as to where tampering may have taken place. In this vein, we introduce five new strongly invariant image comparison methods and test their effectiveness under heavy noise, rotation, and color space changes. Lastly, we show the effectiveness of these methods compared to passive image forensics using Nimble [https://www.nist.gov/itl/iad/mig/nimble-challenge], a new, state-of-the-art dataset from the National Institute of Standards and Technology (NIST). |
Tasks | Image Retrieval |
Published | 2017-05-01 |
URL | http://arxiv.org/abs/1705.00604v1 |
PDF | http://arxiv.org/pdf/1705.00604v1.pdf |
PWC | https://paperswithcode.com/paper/spotting-the-difference-context-retrieval-and |
Repo | |
Framework | |
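A toy rendering of the retrieval-then-compare idea: a simple global descriptor retrieves visually similar context images, and the closest match is differenced against the query as a crude localization cue. The histogram descriptor and the random "database" are stand-ins, far simpler than the five invariant comparison methods the paper introduces.

```python
import numpy as np

def global_descriptor(img):
    """Very simple descriptor for retrieval: a normalized intensity histogram."""
    hist, _ = np.histogram(img, bins=32, range=(0.0, 1.0))
    return hist / max(hist.sum(), 1)

rng = np.random.default_rng(0)
database = rng.random((1000, 64, 64))                 # stand-in for a large image index
query = database[42].copy()
query[20:30, 20:30] = 1.0                             # a small spliced-in region

# Retrieve the context image that looks most like the query.
db_desc = np.stack([global_descriptor(im) for im in database])
q_desc = global_descriptor(query)
sims = db_desc @ q_desc / (np.linalg.norm(db_desc, axis=1) * np.linalg.norm(q_desc) + 1e-9)
best = int(np.argmax(sims))

# Compare the query against its nearest context image to localize the tampering.
diff = np.abs(query - database[best])
print("retrieved image:", best, "suspicious pixels:", int((diff > 0.5).sum()))
```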
Using Frame Theoretic Convolutional Gridding for Robust Synthetic Aperture Sonar Imaging
Title | Using Frame Theoretic Convolutional Gridding for Robust Synthetic Aperture Sonar Imaging |
Authors | John McKay, Anne Gelb, Vishal Monga, Raghu Raj |
Abstract | Recent progress in synthetic aperture sonar (SAS) technology and processing has led to significant advances in underwater imaging, outperforming previously common approaches in both accuracy and efficiency. There are, however, inherent limitations to current SAS reconstruction methodology. In particular, popular and efficient Fourier domain SAS methods require a 2D interpolation which is often ill-conditioned and inaccurate, inevitably reducing robustness with regard to speckle and inaccurate sound-speed estimation. To overcome these issues, we propose using the frame theoretic convolution gridding (FTCG) algorithm to handle the non-uniform Fourier data. FTCG extends upon non-uniform fast Fourier transform (NUFFT) algorithms by casting the NUFFT as an approximation problem given Fourier frame data. The FTCG has been shown to yield improved accuracy at little additional computational cost. Using simulated data, we outline how the FTCG can be used to enhance current SAS processing. |
Tasks | |
Published | 2017-06-26 |
URL | http://arxiv.org/abs/1706.08575v1 |
PDF | http://arxiv.org/pdf/1706.08575v1.pdf |
PWC | https://paperswithcode.com/paper/using-frame-theoretic-convolutional-gridding |
Repo | |
Framework | |
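A tiny numerical illustration of the frame-theoretic view behind FTCG: off-grid Fourier samples form a frame, so the underlying signal is recoverable from them. The direct least-squares inversion below is what convolutional gridding approximates quickly with a Kaiser-Bessel kernel and an FFT; it is shown only for clarity and is not the FTCG algorithm itself.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 128
signal = rng.standard_normal(N)

# Non-uniform Fourier measurements (off-grid frequencies), standing in for the
# irregular k-space data that SAS geometries produce.
freqs = rng.uniform(-N / 2, N / 2, 3 * N)
n = np.arange(N)
A = np.exp(-2j * np.pi * np.outer(freqs, n) / N)     # Fourier "frame" matrix
samples = A @ signal

# Frame-theoretic view: the off-grid samples form a frame, so the signal is
# recoverable by inversion. FTCG approximates this inversion with fast
# kernel-based gridding + FFT; here we solve it directly for clarity.
recovered = np.linalg.lstsq(A, samples, rcond=None)[0].real
print("max reconstruction error:", float(np.max(np.abs(recovered - signal))))
```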
Sparse Representation based Multi-sensor Image Fusion: A Review
Title | Sparse Representation based Multi-sensor Image Fusion: A Review |
Authors | Qiang Zhang, Yi Liu, Rick S. Blum, Jungong Han, Dacheng Tao |
Abstract | As a result of several successful applications in computer vision and image processing, sparse representation (SR) has attracted significant attention in multi-sensor image fusion. Unlike the traditional multiscale transforms (MSTs) that presume the basis functions, SR learns an over-complete dictionary from a set of training images for image fusion, and it achieves more stable and meaningful representations of the source images. By doing so, the SR-based fusion methods generally outperform the traditional MST-based image fusion methods in both subjective and objective tests. In addition, they are less susceptible to mis-registration among the source images, thus facilitating practical applications. This survey paper proposes a systematic review of the SR-based multi-sensor image fusion literature, highlighting the pros and cons of each category of approaches. Specifically, we start by performing a theoretical investigation of the entire system from three key algorithmic aspects: (1) sparse representation models; (2) dictionary learning methods; and (3) activity levels and fusion rules. Subsequently, we show how the existing works address these scientific problems and design the appropriate fusion rules for each application, such as multi-focus image fusion and multi-modality (e.g., infrared and visible) image fusion. Finally, we carry out experiments to evaluate the impact of these three algorithmic components on the fusion performance when dealing with different applications. This article is expected to serve as a tutorial and source of reference for researchers preparing to enter the field or who desire to employ sparse representation theory in other fields. |
Tasks | Dictionary Learning, Infrared And Visible Image Fusion |
Published | 2017-02-12 |
URL | http://arxiv.org/abs/1702.03515v1 |
PDF | http://arxiv.org/pdf/1702.03515v1.pdf |
PWC | https://paperswithcode.com/paper/sparse-representation-based-multi-sensor |
Repo | |
Framework | |
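A minimal sketch of SR-based fusion for a single pair of registered patches: sparse-code both patches over a shared dictionary, keep the code with the larger L1 activity level, and reconstruct. The random dictionary (normally learned, e.g., with K-SVD), the greedy matching-pursuit coder, and the max-L1 rule are illustrative choices, not a specific method from the survey.

```python
import numpy as np

def matching_pursuit(D, x, n_atoms=4):
    """Greedy sparse coding of vector x over a unit-norm dictionary D."""
    r, coef = x.copy(), np.zeros(D.shape[1])
    for _ in range(n_atoms):
        j = int(np.argmax(np.abs(D.T @ r)))
        c = D[:, j] @ r
        coef[j] += c
        r -= c * D[:, j]
    return coef

rng = np.random.default_rng(0)
patch_dim, n_atoms = 64, 256                      # 8x8 patches, overcomplete dictionary
D = rng.standard_normal((patch_dim, n_atoms))
D /= np.linalg.norm(D, axis=0)                    # unit-norm atoms

# Two registered source patches, e.g., one infrared and one visible.
patch_a = rng.standard_normal(patch_dim)
patch_b = rng.standard_normal(patch_dim)

coef_a = matching_pursuit(D, patch_a)
coef_b = matching_pursuit(D, patch_b)

# "Max-L1" activity rule: keep, per patch, the code with the larger activity level.
fused_coef = coef_a if np.abs(coef_a).sum() >= np.abs(coef_b).sum() else coef_b
fused_patch = D @ fused_coef
print(fused_patch.shape)
```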
Viraliency: Pooling Local Virality
Title | Viraliency: Pooling Local Virality |
Authors | Xavier Alameda-Pineda, Andrea Pilzer, Dan Xu, Nicu Sebe, Elisa Ricci |
Abstract | In our overly-connected world, the automatic recognition of virality - the quality of an image or video to be rapidly and widely spread in social networks - is of crucial importance, and has recently awakened the interest of the computer vision community. Concurrently, recent progress in deep learning architectures showed that global pooling strategies allow the extraction of activation maps, which highlight the parts of the image most likely to contain instances of a certain class. We extend this concept by introducing a pooling layer that learns the size of the support area to be averaged: the learned top-N average (LENA) pooling. We hypothesize that the latent concepts (feature maps) describing virality may require such a rich pooling strategy. We assess the effectiveness of the LENA layer by appending it on top of a convolutional siamese architecture and evaluate its performance on the task of predicting and localizing virality. We report experiments on two publicly available datasets annotated for virality and show that our method outperforms state-of-the-art approaches. |
Tasks | |
Published | 2017-03-11 |
URL | http://arxiv.org/abs/1703.03937v2 |
PDF | http://arxiv.org/pdf/1703.03937v2.pdf |
PWC | https://paperswithcode.com/paper/viraliency-pooling-local-virality |
Repo | |
Framework | |
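A smooth stand-in for the LENA layer described above: per-channel activations are sorted and combined with learned, rank-dependent weights, so how many of the top activations effectively contribute is learned from data. This softmax-over-ranks relaxation is an assumption of the sketch, not the paper's exact parameterization.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LenaPool(nn.Module):
    """Learned top-N-average-style pooling: sort activations within each channel
    and average them with learned weights over the ranks."""
    def __init__(self, spatial_size):
        super().__init__()
        self.rank_logits = nn.Parameter(torch.zeros(spatial_size))  # one logit per rank

    def forward(self, x):                         # x: (B, C, H, W)
        flat = x.flatten(2)                       # (B, C, H*W)
        ranked, _ = torch.sort(flat, dim=-1, descending=True)
        w = F.softmax(self.rank_logits, dim=0)    # learned weights over ranks
        return (ranked * w).sum(dim=-1)           # (B, C)

pool = LenaPool(spatial_size=7 * 7)
print(pool(torch.randn(2, 512, 7, 7)).shape)      # -> torch.Size([2, 512])
```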
An OpenCL(TM) Deep Learning Accelerator on Arria 10
Title | An OpenCL(TM) Deep Learning Accelerator on Arria 10 |
Authors | Utku Aydonat, Shane O’Connell, Davor Capalija, Andrew C. Ling, Gordon R. Chiu |
Abstract | Convolutional neural nets (CNNs) have become a practical means to perform vision tasks, particularly in the area of image classification. FPGAs are well known to be able to perform convolutions efficiently, however, most recent efforts to run CNNs on FPGAs have shown limited advantages over other devices such as GPUs. Previous approaches on FPGAs have often been memory bound due to the limited external memory bandwidth on the FPGA device. We show a novel architecture written in OpenCL(TM), which we refer to as a Deep Learning Accelerator (DLA), that maximizes data reuse and minimizes external memory bandwidth. Furthermore, we show how we can use the Winograd transform to significantly boost the performance of the FPGA. As a result, when running our DLA on Intel’s Arria 10 device we can achieve a performance of 1020 img/s, or 23 img/s/W when running the AlexNet CNN benchmark. This comes to 1382 GFLOPs and is 10x faster with 8.4x more GFLOPS and 5.8x better efficiency than the state-of-the-art on FPGAs. Additionally, 23 img/s/W is competitive against the best publicly known implementation of AlexNet on nVidia’s TitanX GPU. |
Tasks | Image Classification |
Published | 2017-01-13 |
URL | http://arxiv.org/abs/1701.03534v1 |
PDF | http://arxiv.org/pdf/1701.03534v1.pdf |
PWC | https://paperswithcode.com/paper/an-opencltm-deep-learning-accelerator-on |
Repo | |
Framework | |
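The Winograd transform that the DLA uses to boost arithmetic efficiency can be checked numerically in a few lines: F(2,3) produces two outputs of a 3-tap correlation with four multiplications instead of six. These are the standard transform matrices, independent of the paper's OpenCL implementation.

```python
import numpy as np

# Winograd F(2,3): two outputs of a 1D 3-tap correlation from a 4-sample input
# tile using only 4 element-wise multiplications instead of 6.
Bt = np.array([[1, 0, -1, 0],
               [0, 1,  1, 0],
               [0, -1, 1, 0],
               [0, 1,  0, -1]], dtype=float)
G  = np.array([[1,   0,   0],
               [0.5, 0.5, 0.5],
               [0.5, -0.5, 0.5],
               [0,   0,   1]], dtype=float)
At = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)

d = np.array([1.0, 2.0, 3.0, 4.0])      # input tile
g = np.array([0.5, -1.0, 2.0])          # 3-tap filter

winograd = At @ ((G @ g) * (Bt @ d))    # element-wise product in transform domain
direct = np.array([d[0:3] @ g, d[1:4] @ g])
print(winograd, direct)                 # the two results match
```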