July 30, 2019

3119 words 15 mins read

Paper Group AWR 40

CNN Fixations: An unraveling approach to visualize the discriminative image regions. Measurement Context Extraction from Text: Discovering Opportunities and Gaps in Earth Science. An Equivalence of Fully Connected Layer and Convolutional Layer. TS-LSTM and Temporal-Inception: Exploiting Spatiotemporal Dynamics for Activity Recognition. Deep Extreme …

CNN Fixations: An unraveling approach to visualize the discriminative image regions

Title CNN Fixations: An unraveling approach to visualize the discriminative image regions
Authors Konda Reddy Mopuri, Utsav Garg, R. Venkatesh Babu
Abstract Deep convolutional neural networks (CNN) have revolutionized various fields of vision research and have seen unprecedented adoption for multiple tasks such as classification, detection, captioning, etc. However, they offer little transparency into their inner workings and are often treated as black boxes that deliver excellent performance. In this work, we aim at alleviating this opaqueness of CNNs by providing visual explanations for the network’s predictions. Our approach can analyze a variety of CNN-based models trained for vision applications such as object recognition and caption generation. Unlike existing methods, we achieve this by unraveling the forward pass operation. The proposed method exploits feature dependencies across the layer hierarchy and uncovers the discriminative image locations that guide the network’s predictions. We name these locations CNN Fixations, loosely analogous to human eye fixations. Our approach is generic, requires no architectural changes, additional training, or gradient computation, and computes the important image locations (CNN Fixations). We demonstrate through a variety of applications that our approach is able to localize the discriminative image locations across different network architectures, diverse vision tasks, and data modalities.
Tasks Image Captioning, Object Recognition
Published 2017-08-22
URL http://arxiv.org/abs/1708.06670v3
PDF http://arxiv.org/pdf/1708.06670v3.pdf
PWC https://paperswithcode.com/paper/cnn-fixations-an-unraveling-approach-to
Repo https://github.com/val-iisc/cnn-fixations
Framework tf
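
The paper gives the full unraveling procedure; below is a minimal sketch of the idea for a single fully connected layer, assuming we simply keep the inputs with the largest positive contributions to the selected output neuron and repeat layer by layer down to the pixels. The top-k heuristic and all names are illustrative, not the authors' exact rules.

```python
import numpy as np

def trace_fc_layer(x, W, out_indices, k=5):
    """One illustrative 'unraveling' step through a fully connected layer:
    for each selected output neuron, keep the k inputs with the largest
    positive evidence w[i, j] * x[j]. Repeating this down the hierarchy
    would end at input pixel locations (the fixations)."""
    selected = set()
    for i in out_indices:
        contrib = W[i] * x                    # per-input contribution to unit i
        for j in np.argsort(contrib)[::-1][:k]:
            if contrib[j] > 0:
                selected.add(int(j))
    return sorted(selected)

rng = np.random.default_rng(0)
x = rng.random(128)                  # activations entering the classifier
W = rng.standard_normal((10, 128))   # classifier weights (10 classes)
pred = int(np.argmax(W @ x))         # start from the predicted class neuron
print(trace_fc_layer(x, W, [pred]))  # the units that drove the prediction
```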

Measurement Context Extraction from Text: Discovering Opportunities and Gaps in Earth Science

Title Measurement Context Extraction from Text: Discovering Opportunities and Gaps in Earth Science
Authors Kyle Hundman, Chris A. Mattmann
Abstract We propose Marve, a system for extracting measurement values, units, and related words from natural language text. Marve uses conditional random fields (CRF) to identify measurement values and units, followed by a rule-based system to find related entities, descriptors and modifiers within a sentence. Sentence tokens are represented by an undirected graphical model, and rules are based on part-of-speech and word dependency patterns connecting values and units to contextual words. Marve is unique in its focus on measurement context, and early experimentation demonstrates Marve’s ability to generate high-precision extractions with strong recall. We also discuss Marve’s role in refining measurement requirements for NASA’s proposed HyspIRI mission, a hyperspectral infrared imaging satellite that will study the world’s ecosystems. In general, our work with HyspIRI demonstrates the value of semantic measurement extractions in characterizing quantitative discussion contained in large corpora of natural language text. These extractions accelerate broad, cross-cutting research and expose scientists to new algorithmic approaches and experimental nuances. They also facilitate identification of scientific opportunities enabled by HyspIRI, leading to more efficient scientific investment and research.
Tasks
Published 2017-10-11
URL http://arxiv.org/abs/1710.04312v1
PDF http://arxiv.org/pdf/1710.04312v1.pdf
PWC https://paperswithcode.com/paper/measurement-context-extraction-from-text
Repo https://github.com/khundman/marve
Framework none
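
Marve's first stage is a CRF tagger, which is too heavy to reproduce here; as a hedged stand-in, the toy below uses a regex to find value-unit pairs and a naive positional rule for context, just to make the extraction target concrete. The unit list and context rule are invented for illustration.

```python
import re

# Toy stand-in for Marve's pipeline (the paper's first stage is a CRF, not a
# regex): find numeric value-unit pairs, then apply a crude positional rule
# for context. The unit list and context rule are invented for illustration.
UNIT = r"(?:nm|km|kg|Hz|m|g|s|K|%)"
PATTERN = re.compile(rf"(\d+(?:\.\d+)?)\s*({UNIT})(?!\w)")

def extract_measurements(sentence):
    out = []
    for match in PATTERN.finditer(sentence):
        value, unit = match.groups()
        words = sentence[:match.start()].split()
        context = words[-1] if words else None  # naive: nearest left token
        out.append({"value": float(value), "unit": unit, "context": context})
    return out

print(extract_measurements("The spectrometer samples at 10 nm resolution."))
# [{'value': 10.0, 'unit': 'nm', 'context': 'at'}]
```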

An Equivalence of Fully Connected Layer and Convolutional Layer

Title An Equivalence of Fully Connected Layer and Convolutional Layer
Authors Wei Ma, Jun Lu
Abstract This article demonstrates that the convolution operation can be converted to matrix multiplication, which is the same computation performed by a fully connected layer. The article should help neural network beginners understand how fully connected and convolutional layers work behind the scenes. To be concise and readable, we only consider the linear case; it can easily be extended to the non-linear case by plugging a non-linear function into the values, e.g. $\sigma(x)$, denoted $x^{\prime}$.
Tasks
Published 2017-12-04
URL http://arxiv.org/abs/1712.01252v1
PDF http://arxiv.org/pdf/1712.01252v1.pdf
PWC https://paperswithcode.com/paper/an-equivalence-of-fully-connected-layer-and
Repo https://github.com/statsml/Equiv-FCL-CONVL
Framework tf
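
The equivalence is easy to verify numerically: unrolling every input patch into a row (im2col) turns the convolution into a single matrix multiplication, i.e. a fully connected layer whose weight vector is the flattened kernel. A minimal sketch for the linear case the paper discusses:

```python
import numpy as np

def im2col(x, k):
    """Unroll all k-by-k patches of a 2D input into the rows of a matrix."""
    h, w = x.shape
    rows = [x[i:i + k, j:j + k].ravel()
            for i in range(h - k + 1) for j in range(w - k + 1)]
    return np.array(rows)

rng = np.random.default_rng(0)
x = rng.random((5, 5))
kernel = rng.random((3, 3))

# Direct convolution (correlation) with a 3x3 kernel, no padding.
direct = np.array([[np.sum(x[i:i + 3, j:j + 3] * kernel)
                    for j in range(3)] for i in range(3)])

# Same result as one matrix multiplication -- i.e. a fully connected layer
# whose (shared, sparse) weights are the unrolled kernel.
as_fc = im2col(x, 3) @ kernel.ravel()

print("conv output == FC output:", np.allclose(direct.ravel(), as_fc))
```

The converse holds as well: a fully connected layer over an h-by-w input is a convolution whose kernel covers the entire input.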

TS-LSTM and Temporal-Inception: Exploiting Spatiotemporal Dynamics for Activity Recognition

Title TS-LSTM and Temporal-Inception: Exploiting Spatiotemporal Dynamics for Activity Recognition
Authors Chih-Yao Ma, Min-Hung Chen, Zsolt Kira, Ghassan AlRegib
Abstract Recent two-stream deep Convolutional Neural Networks (ConvNets) have made significant progress in recognizing human actions in videos. Despite their success, methods extending the basic two-stream ConvNet have not systematically explored possible network architectures to further exploit spatiotemporal dynamics within video sequences. Further, such networks often use different baseline two-stream networks. Therefore, the differences and the distinguishing factors between various methods using Recurrent Neural Networks (RNN) or convolutional networks on temporally-constructed feature vectors (Temporal-ConvNet) are unclear. In this work, we first demonstrate a strong baseline two-stream ConvNet using ResNet-101. We use this baseline to thoroughly examine the use of both RNNs and Temporal-ConvNets for extracting spatiotemporal information. Building upon our experimental results, we then propose and investigate two different networks to further integrate spatiotemporal information: 1) temporal segment RNN and 2) Inception-style Temporal-ConvNet. We demonstrate that both RNNs (using LSTMs) and Temporal-ConvNets applied to spatiotemporal feature matrices are able to exploit spatiotemporal dynamics to improve the overall performance. However, each of these methods requires proper care to achieve state-of-the-art performance; for example, LSTMs require pre-segmented data or else they cannot fully exploit temporal information. Our analysis identifies specific limitations for each method that could form the basis of future work. Our experimental results on the UCF101 and HMDB51 datasets achieve state-of-the-art performance, 94.1% and 69.0% respectively, without requiring extensive temporal augmentation.
Tasks Action Classification, Action Recognition In Videos, Activity Recognition, Temporal Action Localization, Video Classification, Video Understanding
Published 2017-03-30
URL http://arxiv.org/abs/1703.10667v1
PDF http://arxiv.org/pdf/1703.10667v1.pdf
PWC https://paperswithcode.com/paper/ts-lstm-and-temporal-inception-exploiting
Repo https://github.com/chihyaoma/Activity-Recognition-with-CNN-and-RNN
Framework torch
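
As a rough illustration of the temporal-segment idea (layer sizes, pooling choice, and segment count here are guesses, not the paper's configuration), a clip's frame features can be pre-segmented, summarized by an LSTM within each segment, and pooled for classification:

```python
import torch
import torch.nn as nn

class TSLSTMSketch(nn.Module):
    """Minimal sketch: chunk frame features into segments, run an LSTM per
    segment, average the segment summaries, then classify."""
    def __init__(self, feat_dim=2048, hidden=512, num_classes=101, segments=3):
        super().__init__()
        self.segments = segments
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, x):                               # x: (batch, frames, feat)
        summaries = []
        for chunk in torch.chunk(x, self.segments, dim=1):  # pre-segmented clip
            _, (h, _) = self.lstm(chunk)                # last hidden per segment
            summaries.append(h[-1])
        pooled = torch.stack(summaries, dim=0).mean(dim=0)
        return self.fc(pooled)

model = TSLSTMSketch()
logits = model(torch.randn(2, 30, 2048))   # 2 clips of 30 frame features
print(logits.shape)                        # torch.Size([2, 101])
```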

Deep Extreme Cut: From Extreme Points to Object Segmentation

Title Deep Extreme Cut: From Extreme Points to Object Segmentation
Authors Kevis-Kokitsi Maninis, Sergi Caelles, Jordi Pont-Tuset, Luc Van Gool
Abstract This paper explores the use of extreme points in an object (left-most, right-most, top, bottom pixels) as input to obtain precise object segmentation for images and videos. We do so by adding an extra channel to the image in the input of a convolutional neural network (CNN), which contains a Gaussian centered in each of the extreme points. The CNN learns to transform this information into a segmentation of an object that matches those extreme points. We demonstrate the usefulness of this approach for guided segmentation (grabcut-style), interactive segmentation, video object segmentation, and dense segmentation annotation. We show that we obtain the most precise results to date, also with less user input, in an extensive and varied selection of benchmarks and datasets. All our models and code are publicly available on http://www.vision.ee.ethz.ch/~cvlsegmentation/dextr/.
Tasks Instance Segmentation, Interactive Segmentation, Semantic Segmentation, Video Object Segmentation
Published 2017-11-24
URL http://arxiv.org/abs/1711.09081v2
PDF http://arxiv.org/pdf/1711.09081v2.pdf
PWC https://paperswithcode.com/paper/deep-extreme-cut-from-extreme-points-to
Repo https://github.com/scaelles/DEXTR-KerasTensorflow
Framework tf
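
The input encoding is simple to reproduce: add a fourth channel containing a Gaussian centered at each extreme point. A minimal sketch, with hypothetical extreme-point coordinates and an arbitrary sigma:

```python
import numpy as np

def extreme_point_heatmap(shape, points, sigma=10.0):
    """Build the extra input channel used by DEXTR-style models: a map with
    a 2D Gaussian centered at each of the four extreme points."""
    h, w = shape
    yy, xx = np.mgrid[0:h, 0:w]
    heat = np.zeros(shape, dtype=np.float32)
    for (py, px) in points:
        heat = np.maximum(heat, np.exp(-((yy - py) ** 2 + (xx - px) ** 2)
                                       / (2 * sigma ** 2)))
    return heat

# Hypothetical extreme points (top, bottom, left-most, right-most pixels).
points = [(10, 50), (90, 55), (48, 5), (52, 95)]
extra = extreme_point_heatmap((100, 100), points)

rgb = np.zeros((100, 100, 3), dtype=np.float32)          # placeholder image
inp = np.concatenate([rgb, extra[..., None]], axis=-1)   # 4-channel CNN input
print(inp.shape)  # (100, 100, 4)
```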

Turing at SemEval-2017 Task 8: Sequential Approach to Rumour Stance Classification with Branch-LSTM

Title Turing at SemEval-2017 Task 8: Sequential Approach to Rumour Stance Classification with Branch-LSTM
Authors Elena Kochkina, Maria Liakata, Isabelle Augenstein
Abstract This paper describes team Turing’s submission to SemEval 2017 RumourEval: Determining rumour veracity and support for rumours (SemEval 2017 Task 8, Subtask A). Subtask A addresses the challenge of rumour stance classification, which involves identifying the attitude of Twitter users towards the truthfulness of the rumour they are discussing. Stance classification is considered to be an important step towards rumour verification, therefore performing well in this task is expected to be useful in debunking false rumours. In this work we classify a set of Twitter posts discussing rumours into either supporting, denying, questioning or commenting on the underlying rumours. We propose an LSTM-based sequential model that, through modelling the conversational structure of tweets, achieves an accuracy of 0.784 on the RumourEval test set, outperforming all other systems in Subtask A.
Tasks Rumour Detection, Stance Detection
Published 2017-04-24
URL http://arxiv.org/abs/1704.07221v1
PDF http://arxiv.org/pdf/1704.07221v1.pdf
PWC https://paperswithcode.com/paper/turing-at-semeval-2017-task-8-sequential
Repo https://github.com/seongjinpark-88/RumorEval2019
Framework none
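
The model's name suggests the conversation tree is decomposed into branches (root-to-leaf paths) that the LSTM consumes as sequences. A minimal sketch of that decomposition, with a hypothetical tree layout and field names:

```python
# Sketch of a "branch" decomposition: flatten a tweet conversation tree into
# root-to-leaf paths, each of which becomes one LSTM training sequence.
def branches(tree, path=()):
    """Yield every root-to-leaf path of a nested {'id', 'replies'} tree."""
    path = path + (tree["id"],)
    if not tree.get("replies"):
        yield list(path)
    for child in tree.get("replies", []):
        yield from branches(child, path)

conversation = {
    "id": "rumour", "replies": [
        {"id": "support-1", "replies": [{"id": "deny-1", "replies": []}]},
        {"id": "question-1", "replies": []},
    ],
}
for b in branches(conversation):
    print(b)
# ['rumour', 'support-1', 'deny-1']
# ['rumour', 'question-1']
```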

Video Frame Interpolation via Adaptive Separable Convolution

Title Video Frame Interpolation via Adaptive Separable Convolution
Authors Simon Niklaus, Long Mai, Feng Liu
Abstract Standard video frame interpolation methods first estimate optical flow between input frames and then synthesize an intermediate frame guided by motion. Recent approaches merge these two steps into a single convolution process by convolving input frames with spatially adaptive kernels that account for motion and re-sampling simultaneously. These methods require large kernels to handle large motion, which limits the number of pixels whose kernels can be estimated at once due to the large memory demand. To address this problem, this paper formulates frame interpolation as local separable convolution over input frames using pairs of 1D kernels. Compared to regular 2D kernels, the 1D kernels require significantly fewer parameters to be estimated. Our method develops a deep fully convolutional neural network that takes two input frames and estimates pairs of 1D kernels for all pixels simultaneously. Since our method is able to estimate kernels and synthesize the whole video frame at once, it allows for the incorporation of a perceptual loss to train the neural network to produce visually pleasing frames. This deep neural network is trained end-to-end using widely available video data without any human annotation. Both qualitative and quantitative experiments show that our method provides a practical solution to high-quality video frame interpolation.
Tasks Optical Flow Estimation, Video Frame Interpolation
Published 2017-08-05
URL http://arxiv.org/abs/1708.01692v1
PDF http://arxiv.org/pdf/1708.01692v1.pdf
PWC https://paperswithcode.com/paper/video-frame-interpolation-via-adaptive
Repo https://github.com/carlo-/sepconv-ios
Framework pytorch
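
The parameter saving is easy to see: the outer product of a vertical and a horizontal 1D kernel stands in for a full 2D kernel, so each output pixel needs 2n estimated values instead of n². A small numerical check (kernel size and inputs are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 51                       # illustrative kernel size for large motion
patch = rng.random((n, n))   # receptive field around one output pixel

kv = rng.random(n); kv /= kv.sum()   # estimated vertical 1D kernel
kh = rng.random(n); kh /= kh.sum()   # estimated horizontal 1D kernel

full_2d = np.outer(kv, kh)            # implied 2D kernel (never stored)
pixel_a = np.sum(patch * full_2d)     # 2D convolution at this pixel
pixel_b = kv @ patch @ kh             # same value via two 1D passes
assert np.isclose(pixel_a, pixel_b)
print(f"params per pixel: separable={2 * n}, full={n * n}")
```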

Stochastic Neural Networks for Hierarchical Reinforcement Learning

Title Stochastic Neural Networks for Hierarchical Reinforcement Learning
Authors Carlos Florensa, Yan Duan, Pieter Abbeel
Abstract Deep reinforcement learning has achieved many impressive results in recent years. However, tasks with sparse rewards or long horizons continue to pose significant challenges. To tackle these important problems, we propose a general framework that first learns useful skills in a pre-training environment, and then leverages the acquired skills for learning faster in downstream tasks. Our approach brings together some of the strengths of intrinsic motivation and hierarchical methods: the learning of useful skills is guided by a single proxy reward, the design of which requires very minimal domain knowledge about the downstream tasks. A high-level policy is then trained on top of these skills, significantly improving exploration and making it possible to tackle sparse rewards in the downstream tasks. To efficiently pre-train a large span of skills, we use Stochastic Neural Networks combined with an information-theoretic regularizer. Our experiments show that this combination is effective in learning a wide span of interpretable skills in a sample-efficient way, and can significantly boost the learning performance uniformly across a wide range of downstream tasks.
Tasks Hierarchical Reinforcement Learning
Published 2017-04-10
URL http://arxiv.org/abs/1704.03012v1
PDF http://arxiv.org/pdf/1704.03012v1.pdf
PWC https://paperswithcode.com/paper/stochastic-neural-networks-for-hierarchical
Repo https://github.com/florensacc/snn4hrl
Framework none
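
The information-theoretic regularizer encourages skills to be identifiable from the states they visit. As a loose sketch, assuming a count-based estimate of log p(skill | state cell) over a discretized (x, y) grid (the discretization and bonus form here are illustrative, not the paper's exact estimator):

```python
import numpy as np
from collections import defaultdict

NUM_SKILLS = 4
visits = defaultdict(lambda: np.zeros(NUM_SKILLS))  # grid cell -> skill counts

def mi_bonus(state_xy, skill, cell_size=1.0, eps=1e-8):
    """Empirical log p(skill | cell): large when this cell is visited mostly
    under this skill, so each skill is pushed toward distinguishable states."""
    cell = (int(state_xy[0] // cell_size), int(state_xy[1] // cell_size))
    visits[cell][skill] += 1.0
    return float(np.log(visits[cell][skill] / visits[cell].sum() + eps))

print(mi_bonus((0.3, 0.7), skill=0))  # ~0.0: only skill 0 has been here
print(mi_bonus((0.3, 0.7), skill=1))  # ~-0.69: the cell is now shared
print(mi_bonus((0.3, 0.7), skill=0))  # ~-0.41: skill 0 regains dominance
```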

A Structured Self-attentive Sentence Embedding

Title A Structured Self-attentive Sentence Embedding
Authors Zhouhan Lin, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou, Yoshua Bengio
Abstract This paper proposes a new model for extracting an interpretable sentence embedding by introducing self-attention. Instead of using a vector, we use a 2-D matrix to represent the embedding, with each row of the matrix attending on a different part of the sentence. We also propose a self-attention mechanism and a special regularization term for the model. As a side effect, the embedding comes with an easy way of visualizing what specific parts of the sentence are encoded into the embedding. We evaluate our model on 3 different tasks: author profiling, sentiment classification, and textual entailment. Results show that our model yields a significant performance gain compared to other sentence embedding methods in all of the 3 tasks.
Tasks Natural Language Inference, Sentence Embedding, Sentiment Analysis
Published 2017-03-09
URL http://arxiv.org/abs/1703.03130v1
PDF http://arxiv.org/pdf/1703.03130v1.pdf
PWC https://paperswithcode.com/paper/a-structured-self-attentive-sentence
Repo https://github.com/yanqi1811/attention-is-all-you-need
Framework pytorch
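
The model's core equations are compact enough to sketch directly: with hidden states $H$, the attention matrix is $A = \mathrm{softmax}(W_{s2}\tanh(W_{s1}H^{\top}))$, the matrix embedding is $M = AH$, and the regularizer penalizes $\lVert AA^{\top} - I\rVert_F^2$ so the attention hops focus on different parts of the sentence. A minimal numpy version with illustrative dimensions:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
n, u, da, r = 12, 64, 32, 4      # tokens, hidden size, attention dim, hops

H = rng.standard_normal((n, u))          # LSTM hidden states, one per token
W_s1 = rng.standard_normal((da, u))
W_s2 = rng.standard_normal((r, da))

A = softmax(W_s2 @ np.tanh(W_s1 @ H.T), axis=1)  # (r, n): r attention hops
M = A @ H                                        # (r, u): matrix embedding

# Regularizer from the paper: push the r hops toward different tokens.
P = np.linalg.norm(A @ A.T - np.eye(r), ord="fro") ** 2
print(M.shape, round(P, 3))
```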

Deep Hashing with Category Mask for Fast Video Retrieval

Title Deep Hashing with Category Mask for Fast Video Retrieval
Authors Xu Liu, Lili Zhao, Dajun Ding, Yajiao Dong
Abstract This paper proposes an end-to-end deep hashing framework with a category mask for fast video retrieval. We train our network in a supervised way by fully exploiting inter-class diversity and intra-class identity. A classification loss is optimized to maximize inter-class diversity, while an intra-pair loss is introduced to learn representative intra-class identity. We investigate the distribution of binary bits across categories and find that the effectiveness of binary bits is highly correlated with data categories, and that some bits may degrade classification performance for some categories. We then design a hash code generation scheme with a category mask to filter out bits with negative contribution. Experimental results demonstrate that the proposed method outperforms several state-of-the-art approaches under various evaluation metrics on public datasets.
Tasks Code Generation, Video Retrieval
Published 2017-12-22
URL http://arxiv.org/abs/1712.08315v2
PDF http://arxiv.org/pdf/1712.08315v2.pdf
PWC https://paperswithcode.com/paper/deep-hashing-with-category-mask-for-fast
Repo https://github.com/willard-yuan/hashing-baseline-for-image-retrieval
Framework none
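
The paper learns its mask from each bit's impact on classification; the toy below only gestures at the idea, keeping bits that fire consistently within a category and ignoring the rest when computing Hamming distance. Code length, thresholds, and data are invented:

```python
import numpy as np

rng = np.random.default_rng(0)
codes = np.sign(rng.standard_normal((200, 16)))   # 200 videos, 16-bit codes
labels = rng.integers(0, 5, size=200)             # 5 categories

def category_mask(codes, labels, category, thresh=0.2):
    """Toy mask: keep bits that are consistent within one category
    (not the paper's learned scheme, just an illustration of masking)."""
    cls = codes[labels == category]
    consistency = np.abs(cls.mean(axis=0))   # 1.0 = bit identical across class
    return consistency >= thresh             # keep only informative bits

mask = category_mask(codes, labels, category=int(labels[0]))
query, database = codes[0], codes[1:]
dist = ((query != database) & mask).sum(axis=1)   # masked Hamming distance
print(f"{mask.sum()} bits kept; nearest video index: {int(np.argmin(dist))}")
```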

LangPro: Natural Language Theorem Prover

Title LangPro: Natural Language Theorem Prover
Authors Lasha Abzianidze
Abstract LangPro is an automated theorem prover for natural language (https://github.com/kovvalsky/LangPro). Given a set of premises and a hypothesis, it is able to prove semantic relations between them. The prover is based on a version of the analytic tableau method specially designed for natural logic. The proof procedure operates on logical forms that preserve linguistic expressions to a large extent, which makes the logical forms easily obtainable from syntactic trees, in particular Combinatory Categorial Grammar derivation trees. The nature of proofs is deductive and transparent. On the FraCaS and SICK textual entailment datasets, the prover achieves high results, comparable to the state of the art.
Tasks Natural Language Inference
Published 2017-08-30
URL http://arxiv.org/abs/1708.09417v1
PDF http://arxiv.org/pdf/1708.09417v1.pdf
PWC https://paperswithcode.com/paper/langpro-natural-language-theorem-prover
Repo https://github.com/kovvalsky/LangPro
Framework none
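
LangPro's tableau operates on natural-logic terms, which is well beyond a short snippet, but the refutation skeleton is the classical one: negate the hypothesis and try to close every branch with a contradiction. A propositional toy of that skeleton (not LangPro's actual rules):

```python
# Propositional toy of the analytic tableau skeleton: to prove that premises
# entail a hypothesis, add the negated hypothesis and close every branch.
def entails(premises, hypothesis):
    return closes(set(premises) | {("not", hypothesis)})

def closes(branch):
    # a branch closes when some formula appears both plainly and negated
    for f in branch:
        if ("not", f) in branch:
            return True
    # expand an implication: p -> q splits the branch into [not p] and [q]
    for f in branch:
        if isinstance(f, tuple) and f[0] == "imp":
            _, p, q = f
            rest = branch - {f}
            return closes(rest | {("not", p)}) and closes(rest | {q})
    return False

# "every dog barks" (dog -> barks) and "Rex is a dog" entail "Rex barks"
print(entails([("imp", "dog", "barks"), "dog"], "barks"))  # True
```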

Decomposition Strategies for Constructive Preference Elicitation

Title Decomposition Strategies for Constructive Preference Elicitation
Authors Paolo Dragone, Stefano Teso, Mohit Kumar, Andrea Passerini
Abstract We tackle the problem of constructive preference elicitation, that is, the problem of learning user preferences over very large decision problems involving a combinatorial space of possible outcomes. In this setting, the suggested configuration is synthesized on-the-fly by solving a constrained optimization problem, while the preferences are learned iteratively by interacting with the user. Previous work has shown that Coactive Learning is a suitable method for learning user preferences in constructive scenarios. In Coactive Learning the user provides feedback to the algorithm in the form of an improvement to a suggested configuration. When the problem involves many decision variables and constraints, this type of interaction poses a significant cognitive burden on the user. We propose a decomposition technique for large preference-based decision problems relying exclusively on inference and feedback over partial configurations. This has the clear advantage of drastically reducing the user’s cognitive load. Additionally, part-wise inference can be (up to exponentially) less computationally demanding than inference over full configurations. We discuss the theoretical implications of working with parts and present promising empirical results on one synthetic and two realistic constructive problems.
Tasks
Published 2017-11-22
URL http://arxiv.org/abs/1711.08247v2
PDF http://arxiv.org/pdf/1711.08247v2.pdf
PWC https://paperswithcode.com/paper/decomposition-strategies-for-constructive
Repo https://github.com/unitn-sml/pcl
Framework none
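
The Coactive Learning protocol the paper builds on has a very small core: utility is linear in a feature map, and each user improvement moves the weights toward the improved configuration. A sketch with hypothetical feature vectors; the paper's contribution, part-wise inference, would apply the same update to features of partial configurations:

```python
import numpy as np

def coactive_update(w, phi_suggested, phi_improved, eta=1.0):
    """Perceptron-style Coactive Learning update:
    w <- w + eta * (phi(y_improved) - phi(y_suggested))."""
    return w + eta * (phi_improved - phi_suggested)

w = np.zeros(3)
phi_suggested = np.array([1.0, 0.0, 1.0])   # features of the proposed config
phi_improved  = np.array([1.0, 1.0, 0.0])   # features after the user's fix
w = coactive_update(w, phi_suggested, phi_improved)
print(w)  # [ 0.  1. -1.]: utility now prefers the user's direction
```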

Efficient Learning of Harmonic Priors for Pitch Detection in Polyphonic Music

Title Efficient Learning of Harmonic Priors for Pitch Detection in Polyphonic Music
Authors Pablo A. Alvarado, Dan Stowell
Abstract Automatic music transcription (AMT) aims to infer a latent symbolic representation of a piece of music (piano-roll), given a corresponding observed audio recording. Transcribing polyphonic music (when multiple notes are played simultaneously) is a challenging problem, due to highly structured overlapping between harmonics. We study whether the introduction of physically inspired Gaussian process (GP) priors into audio content analysis models improves the extraction of patterns required for AMT. Audio signals are described as a linear combination of sources. Each source is decomposed into the product of an amplitude envelope and a quasi-periodic component process. We introduce the Matérn spectral mixture (MSM) kernel for describing the frequency content of single notes. We consider two different regression approaches. In the sigmoid model every pitch activation is independently non-linearly transformed. In the softmax model several activation GPs are jointly non-linearly transformed, which introduces cross-correlation between activations. We use variational Bayes for approximate inference. We empirically evaluate how these models work in practice for transcribing polyphonic music. We demonstrate that, rather than encouraging dependency between activations, what is relevant for improving pitch detection is to learn priors that fit the frequency content of the sound events to be detected.
Tasks
Published 2017-05-19
URL http://arxiv.org/abs/1705.07104v2
PDF http://arxiv.org/pdf/1705.07104v2.pdf
PWC https://paperswithcode.com/paper/efficient-learning-of-harmonic-priors-for
Repo https://github.com/PabloAlvarado/MSMK
Framework tf
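
As a hedged illustration of what a Matérn spectral mixture kernel for a pitched note could look like, one plausible form is a sum of Matérn-1/2 envelopes modulated by cosines at the note's harmonics (the exact parameterization in the paper may differ; all parameters below are hypothetical):

```python
import numpy as np

def matern12(tau, lengthscale):
    """Matern-1/2 (exponential) kernel on time lags."""
    return np.exp(-np.abs(tau) / lengthscale)

def msm_kernel(tau, weights, freqs, lengthscales):
    """A plausible Matern spectral mixture form: Matern envelopes modulated
    by cosines, one component per harmonic of the note."""
    k = np.zeros_like(tau, dtype=float)
    for w2, f, ell in zip(weights, freqs, lengthscales):
        k += w2 * matern12(tau, ell) * np.cos(2 * np.pi * f * tau)
    return k

# Hypothetical note: an f0 of 220 Hz plus two harmonics with decaying
# weights, mimicking the instrument's spectral envelope.
tau = np.linspace(-0.02, 0.02, 401)
k = msm_kernel(tau, weights=[1.0, 0.5, 0.25],
               freqs=[220.0, 440.0, 660.0],
               lengthscales=[0.01, 0.01, 0.01])
print(k[200])  # k(0) = 1.0 + 0.5 + 0.25 = 1.75
```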

ImageNet Training in Minutes

Title ImageNet Training in Minutes
Authors Yang You, Zhao Zhang, Cho-Jui Hsieh, James Demmel, Kurt Keutzer
Abstract Finishing 90-epoch ImageNet-1k training with ResNet-50 on a NVIDIA M40 GPU takes 14 days. This training requires 10^18 single precision operations in total. On the other hand, the world’s current fastest supercomputer can finish 2 * 10^17 single precision operations per second (Dongarra et al 2017, https://www.top500.org/lists/2017/06/). If we could make full use of the supercomputer for DNN training, we should be able to finish the 90-epoch ResNet-50 training in one minute. However, the current bottleneck for fast DNN training is at the algorithm level. Specifically, the current batch size (e.g. 512) is too small to make efficient use of many processors. For large-scale DNN training, we focus on using large-batch data-parallel synchronous SGD without losing accuracy in a fixed number of epochs. The LARS algorithm (You, Gitman, Ginsburg, 2017, arXiv:1708.03888) enables us to scale the batch size to extremely large values (e.g. 32K). We finish the 100-epoch ImageNet training with AlexNet in 11 minutes on 1024 CPUs. We also finish the 90-epoch ImageNet training with ResNet-50 in 20 minutes on 2048 KNLs without losing accuracy, about three times faster than Facebook’s result (Goyal et al 2017, arXiv:1706.02677). The state-of-the-art ImageNet training speed with ResNet-50 is 74.9% top-1 test accuracy in 15 minutes; we reach 74.9% top-1 test accuracy in 64 epochs, which needs only 14 minutes. Furthermore, when we increase the batch size above 16K, our accuracy is much higher than Facebook’s at corresponding batch sizes. Our source code is available upon request.
Tasks
Published 2017-09-14
URL http://arxiv.org/abs/1709.05011v10
PDF http://arxiv.org/pdf/1709.05011v10.pdf
PWC https://paperswithcode.com/paper/imagenet-training-in-minutes
Repo https://github.com/fuentesdt/livermask
Framework none
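
The LARS rule referenced above is simple to state: each layer gets a local learning rate proportional to the ratio of its weight norm to its gradient norm, which keeps update magnitudes in scale with the weights at very large batch sizes. A sketch of one common formulation (hyperparameters are illustrative):

```python
import numpy as np

def lars_step(w, grad, lr=0.1, trust=0.001, weight_decay=1e-4):
    """One LARS update: scale each layer's step by a local learning rate
    trust * ||w|| / ||g||, where g is the gradient plus weight decay."""
    g = grad + weight_decay * w
    local_lr = trust * np.linalg.norm(w) / (np.linalg.norm(g) + 1e-12)
    return w - lr * local_lr * g

rng = np.random.default_rng(0)
w = rng.standard_normal(1000) * 0.05     # one layer's weights
grad = rng.standard_normal(1000) * 5.0   # large-batch gradient, huge scale
w_new = lars_step(w, grad)
# the relative update stays ~lr * trust regardless of the gradient's scale
print(np.linalg.norm(w_new - w) / np.linalg.norm(w))  # ~1e-4
```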

Few-Example Object Detection with Model Communication

Title Few-Example Object Detection with Model Communication
Authors Xuanyi Dong, Liang Zheng, Fan Ma, Yi Yang, Deyu Meng
Abstract In this paper, we study object detection using a large pool of unlabeled images and only a few labeled images per category, a setting we name “few-example object detection”. The key challenge is to generate as many trustworthy training samples as possible from the pool. Using the few training examples as seeds, our method iterates between model training and high-confidence sample selection. In training, easy samples are generated first, and the poorly initialized model is then improved. As the model becomes more discriminative, challenging but reliable samples are selected. After that, another round of model improvement takes place. To further improve the precision and recall of the generated training samples, we embed multiple detection models in our framework, which has proven to outperform the single-model baseline and the model ensemble method. Experiments on PASCAL VOC’07, MS COCO’14, and ILSVRC’13 indicate that with as few as three or four samples selected per category, our method produces very competitive results compared to state-of-the-art weakly-supervised approaches that use a large number of image-level labels.
Tasks Object Detection
Published 2017-06-26
URL http://arxiv.org/abs/1706.08249v8
PDF http://arxiv.org/pdf/1706.08249v8.pdf
PWC https://paperswithcode.com/paper/few-example-object-detection-with-model
Repo https://github.com/D-X-Y/DXY-Projects/tree/master/MSPLD
Framework none
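
The training loop in the abstract alternates between fitting a model on the current labeled set and harvesting high-confidence pseudo-labels, admitting harder samples each round. A runnable 1D toy of that self-paced alternation, with a class-mean classifier standing in for the detection models:

```python
import numpy as np

# Toy self-paced pseudo-labeling: train on a few seeds, label only the most
# confident unlabeled points, retrain, and relax the threshold each round.
rng = np.random.default_rng(0)
unlabeled = np.concatenate([rng.normal(+2.0, 1.0, 100),   # class 1
                            rng.normal(-2.0, 1.0, 100)])  # class 0

labeled_x = np.array([+2.0, +2.5, -2.0, -2.5])   # two seeds per class
labeled_y = np.array([1, 1, 0, 0])

for round_ in range(4):
    mu_pos = labeled_x[labeled_y == 1].mean()
    mu_neg = labeled_x[labeled_y == 0].mean()
    # confidence = margin between distances to the two class means
    d = np.abs(unlabeled - mu_neg) - np.abs(unlabeled - mu_pos)
    thresh = 3.0 - 0.5 * round_          # self-paced: easy first, harder later
    keep = np.abs(d) >= thresh
    labeled_x = np.concatenate([labeled_x, unlabeled[keep]])
    labeled_y = np.concatenate([labeled_y, (d[keep] > 0).astype(int)])
    unlabeled = unlabeled[~keep]
    print(f"round {round_}: +{keep.sum()} pseudo-labels, {unlabeled.size} left")
```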