Paper Group ANR 761
Learning Visually Grounded Sentence Representations
Title | Learning Visually Grounded Sentence Representations |
Authors | Douwe Kiela, Alexis Conneau, Allan Jabri, Maximilian Nickel |
Abstract | We introduce a variety of models, trained on a supervised image captioning corpus to predict the image features for a given caption, to perform sentence representation grounding. We train a grounded sentence encoder that achieves good performance on COCO caption and image retrieval and subsequently show that this encoder can successfully be transferred to various NLP tasks, with improved performance over text-only models. Lastly, we analyze the contribution of grounding, and show that word embeddings learned by this system outperform non-grounded ones. |
Tasks | Image Captioning, Image Retrieval, Word Embeddings |
Published | 2017-07-19 |
URL | http://arxiv.org/abs/1707.06320v2 |
PDF | http://arxiv.org/pdf/1707.06320v2.pdf |
PWC | https://paperswithcode.com/paper/learning-visually-grounded-sentence |
Repo | |
Framework | |
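The grounding objective in the abstract, predicting an image's CNN features from its caption, is straightforward to sketch. Below is a minimal, hypothetical PyTorch version: a GRU caption encoder regressed onto image features with a Euclidean loss. The architecture, dimensions, and loss choice are illustrative assumptions rather than the authors' exact setup (the paper considers several model variants).

```python
import torch
import torch.nn as nn

class GroundedSentenceEncoder(nn.Module):
    """Encode a caption and project it into image-feature space.

    Hypothetical sketch: the vocabulary size, dimensions, and single-layer
    GRU are assumptions, not the paper's exact configuration.
    """

    def __init__(self, vocab_size=10000, embed_dim=300, hidden_dim=1024,
                 image_feat_dim=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, image_feat_dim)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer caption tokens
        _, h_n = self.gru(self.embed(token_ids))
        return self.proj(h_n.squeeze(0))   # (batch, image_feat_dim)

# One training step: regress caption encodings onto CNN image features.
encoder = GroundedSentenceEncoder()
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)
captions = torch.randint(1, 10000, (32, 20))   # dummy caption batch
image_feats = torch.randn(32, 2048)            # dummy CNN image features

optimizer.zero_grad()
loss = nn.functional.mse_loss(encoder(captions), image_feats)
loss.backward()
optimizer.step()
```

Once trained this way, the encoder can be frozen and its sentence vectors reused as features for downstream NLP tasks, which is the transfer the abstract reports.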
Spatio-Temporal Backpropagation for Training High-performance Spiking Neural Networks
Title | Spatio-Temporal Backpropagation for Training High-performance Spiking Neural Networks |
Authors | Yujie Wu, Lei Deng, Guoqi Li, Jun Zhu, Luping Shi |
Abstract | Compared with artificial neural networks (ANNs), spiking neural networks (SNNs) are promising for exploring brain-like behaviors, since spikes can encode more spatio-temporal information. Although pre-training from an ANN or direct training based on backpropagation (BP) makes supervised training of SNNs possible, these methods exploit only the networks' spatial-domain information, which leads to a performance bottleneck and requires many complicated training tricks. Another fundamental issue is that spike activity is inherently non-differentiable, which makes training SNNs difficult. To this end, we build an iterative LIF model that is better suited to gradient-descent training. By simultaneously considering the layer-by-layer spatial domain (SD) and the timing-dependent temporal domain (TD) in the training phase, together with an approximated derivative for the spike activity, we propose a spatio-temporal backpropagation (STBP) training framework that requires no complicated machinery. We achieve the best performance of a multi-layer perceptron (MLP) compared with existing state-of-the-art algorithms on the static MNIST and dynamic N-MNIST datasets, as well as a custom object detection dataset. This work provides a new perspective on exploring high-performance SNNs for future brain-like computing paradigms with rich spatio-temporal dynamics. |
Tasks | Object Detection |
Published | 2017-06-08 |
URL | http://arxiv.org/abs/1706.02609v3 |
PDF | http://arxiv.org/pdf/1706.02609v3.pdf |
PWC | https://paperswithcode.com/paper/spatio-temporal-backpropagation-for-training |
Repo | |
Framework | |
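The key trick the abstract alludes to, an approximated derivative for the non-differentiable spike activity, is easy to sketch. Below is a minimal, hypothetical PyTorch rendering: a Heaviside spike function with a rectangular surrogate gradient, unrolled through an iterative LIF update so gradients flow through both the spatial and temporal domains. The threshold, window width, and leak constant are illustrative assumptions, not the paper's settings.

```python
import torch

class RectSurrogateSpike(torch.autograd.Function):
    """Heaviside spike with a rectangular surrogate derivative.

    Forward: s = 1 if membrane potential u >= V_TH else 0.
    Backward: ds/du approximated by 1/A on |u - V_TH| < A/2 (one of the
    approximation families STBP considers; V_TH and A are assumed values).
    """
    V_TH, A = 0.5, 1.0

    @staticmethod
    def forward(ctx, u):
        ctx.save_for_backward(u)
        return (u >= RectSurrogateSpike.V_TH).float()

    @staticmethod
    def backward(ctx, grad_output):
        (u,) = ctx.saved_tensors
        window = (u - RectSurrogateSpike.V_TH).abs() < RectSurrogateSpike.A / 2
        return grad_output * window.float() / RectSurrogateSpike.A

def lif_step(u, spike, x, tau=2.0):
    """One iterative LIF update: leak, soft reset on spike, integrate input."""
    u = u * (1.0 - spike) / tau + x
    s = RectSurrogateSpike.apply(u)
    return u, s

# Unroll T timesteps so backprop traverses space (layers) and time (steps).
T, batch, dim = 8, 4, 16
u = torch.zeros(batch, dim)
s = torch.zeros(batch, dim)
x_seq = torch.randn(T, batch, dim, requires_grad=True)
spikes = []
for t in range(T):
    u, s = lif_step(u, s, x_seq[t])
    spikes.append(s)
loss = torch.stack(spikes).mean()
loss.backward()   # BPTT through the surrogate derivative
```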
Compositional Falsification of Cyber-Physical Systems with Machine Learning Components
Title | Compositional Falsification of Cyber-Physical Systems with Machine Learning Components |
Authors | Tommaso Dreossi, Alexandre Donzé, Sanjit A. Seshia |
Abstract | Cyber-physical systems (CPS), such as automotive systems, are starting to include sophisticated machine learning (ML) components. Their correctness, therefore, depends on properties of the inner ML modules. While learning algorithms aim to generalize from examples, they are only as good as the examples provided, and recent efforts have shown that they can produce inconsistent output under small adversarial perturbations. This raises the question: can the output from learning components lead to a failure of the entire CPS? In this work, we address this question by formulating it as a problem of falsifying signal temporal logic (STL) specifications for CPS with ML components. We propose a compositional falsification framework where a temporal logic falsifier and a machine learning analyzer cooperate with the aim of finding falsifying executions of the considered model. The efficacy of the proposed technique is shown on an automatic emergency braking system model with a perception component based on deep neural networks. |
Tasks | |
Published | 2017-03-02 |
URL | http://arxiv.org/abs/1703.00978v3 |
PDF | http://arxiv.org/pdf/1703.00978v3.pdf |
PWC | https://paperswithcode.com/paper/compositional-falsification-of-cyber-physical |
Repo | |
Framework | |
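To make the framework concrete, here is a toy sketch of the two cooperating pieces: an STL robustness monitor for a safety property, and a falsifier that searches scenario parameters, with a perception error standing in for the ML component's misbehavior. The braking model, property, and parameter ranges are all invented for illustration; the paper's actual setup uses a full AEBS model and a dedicated ML analyzer.

```python
import random

def robustness_always_positive(trace):
    """Quantitative semantics of the STL property G (gap_t > 0): the worst
    margin over the trace; a negative value means the property is falsified."""
    return min(trace)

def simulate_aebs(initial_gap, perception_error):
    """Toy automatic emergency braking model (an illustrative assumption, not
    the paper's benchmark): the gap to an obstacle shrinks at the car's speed;
    an imperfect perception component triggers braking only once the gap falls
    below a threshold shifted by its error."""
    gap, speed, trace = initial_gap, 2.0, []
    for _ in range(100):
        if gap < 15.0 + perception_error:    # ML-dependent brake trigger
            speed = max(0.0, speed - 0.5)    # brake
        gap -= speed * 0.1                   # 0.1 s simulation step
        trace.append(gap)
    return trace

def falsify(trials=1000, seed=0):
    """Temporal-logic falsifier: sample scenario parameters (the perception
    error range plays the role of the ML analyzer's region of interest) and
    return any counterexample with negative robustness."""
    rng = random.Random(seed)
    for _ in range(trials):
        gap0 = rng.uniform(5.0, 40.0)
        err = rng.uniform(-20.0, 0.0)
        rho = robustness_always_positive(simulate_aebs(gap0, err))
        if rho < 0:
            return {"initial_gap": gap0, "perception_error": err,
                    "robustness": rho}
    return None

print(falsify())   # a falsifying scenario, or None if none was found
```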
Generating High-Quality Crowd Density Maps using Contextual Pyramid CNNs
Title | Generating High-Quality Crowd Density Maps using Contextual Pyramid CNNs |
Authors | Vishwanath A. Sindagi, Vishal M. Patel |
Abstract | We present a novel method called Contextual Pyramid CNN (CP-CNN) for generating high-quality crowd density maps and count estimates by explicitly incorporating global and local contextual information of crowd images. The proposed CP-CNN consists of four modules: a Global Context Estimator (GCE), a Local Context Estimator (LCE), a Density Map Estimator (DME) and a Fusion-CNN (F-CNN). The GCE is a VGG-16-based CNN that encodes global context, trained to classify input images into different density classes, whereas the LCE is another CNN that encodes local context, trained to perform patch-wise classification of input images into density classes. The DME is a multi-column CNN that generates high-dimensional feature maps from the input image, which are fused with the contextual information estimated by the GCE and LCE using the F-CNN. To generate high-resolution, high-quality density maps, the F-CNN uses a set of convolutional and fractionally-strided convolutional layers and is trained along with the DME in an end-to-end fashion using a combination of adversarial loss and pixel-level Euclidean loss. Extensive experiments on highly challenging datasets show that the proposed method achieves significant improvements over state-of-the-art methods. |
Tasks | |
Published | 2017-08-02 |
URL | http://arxiv.org/abs/1708.00953v1 |
PDF | http://arxiv.org/pdf/1708.00953v1.pdf |
PWC | https://paperswithcode.com/paper/generating-high-quality-crowd-density-maps |
Repo | |
Framework | |
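A minimal sketch of the fusion step described above: tile the GCE's image-level class probabilities into a spatial map, concatenate it with the LCE's patch-wise scores and the DME's feature maps, and let fractionally-strided convolutions produce a higher-resolution density map. All channel counts and layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class FusionCNN(nn.Module):
    """Sketch of CP-CNN's F-CNN: fuse DME feature maps with global (GCE) and
    local (LCE) context, then upsample to a full-resolution density map with
    fractionally-strided convolutions. Channel sizes are assumed."""

    def __init__(self, dme_channels=32, context_classes=5):
        super().__init__()
        in_ch = dme_channels + 2 * context_classes   # DME + GCE + LCE maps
        self.fuse = nn.Sequential(
            nn.Conv2d(in_ch, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(32, 1, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, dme_feats, gce_probs, lce_map):
        # dme_feats: (B, 32, H, W) high-dimensional DME feature maps
        # gce_probs: (B, 5)        image-level density-class probabilities
        # lce_map:   (B, 5, H, W)  patch-wise density-class scores
        b, _, h, w = dme_feats.shape
        gce_map = gce_probs.view(b, -1, 1, 1).expand(b, gce_probs.size(1), h, w)
        fused = torch.cat([dme_feats, gce_map, lce_map], dim=1)
        return self.fuse(fused)   # (B, 1, 4H, 4W) density map

density = FusionCNN()(torch.randn(2, 32, 16, 16),
                      torch.softmax(torch.randn(2, 5), dim=1),
                      torch.randn(2, 5, 16, 16))
print(density.shape)   # torch.Size([2, 1, 64, 64])
```

In the paper this output is supervised with the pixel-level Euclidean loss plus an adversarial loss, which is what sharpens the upsampled maps.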
EAC-Net: A Region-based Deep Enhancing and Cropping Approach for Facial Action Unit Detection
Title | EAC-Net: A Region-based Deep Enhancing and Cropping Approach for Facial Action Unit Detection |
Authors | Wei Li, Farnaz Abtahi, Zhigang Zhu, Lijun Yin |
Abstract | In this paper, we propose a deep learning based approach for facial action unit detection by enhancing and cropping regions of interest. The approach is implemented by adding two novel nets (layers), the enhancing layers and the cropping layers, to a pretrained CNN model. For the enhancing layers, we design an attention map based on facial landmark features and apply it to a pretrained neural network to conduct enhanced learning (the E-Net). For the cropping layers, we crop facial regions around the detected landmarks and design convolutional layers to learn deeper features for each facial region (the C-Net). We then fuse the E-Net and the C-Net to obtain our Enhancing and Cropping (EAC) Net, which can learn both feature-enhancing and region-cropping functions. Our approach shows significant improvement in performance over state-of-the-art methods on the BP4D and DISFA AU datasets. |
Tasks | Action Unit Detection, Facial Action Unit Detection |
Published | 2017-02-09 |
URL | http://arxiv.org/abs/1702.02925v1 |
PDF | http://arxiv.org/pdf/1702.02925v1.pdf |
PWC | https://paperswithcode.com/paper/eac-net-a-region-based-deep-enhancing-and |
Repo | |
Framework | |
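The enhancing idea is easy to illustrate: build an attention map centered on facial landmarks and use it to re-weight a pretrained CNN's feature maps. The Gaussian bump per landmark and the residual-style application below are assumed stand-ins for the paper's exact attention design.

```python
import torch

def landmark_attention_map(landmarks, height, width, sigma=0.1):
    """Build a (height, width) attention map that peaks at each facial
    landmark; one Gaussian bump per landmark is an assumption standing in
    for the paper's exact attention formula.

    landmarks: (K, 2) tensor of (x, y) in [0, 1] normalized coordinates.
    """
    ys = torch.linspace(0, 1, height).view(-1, 1, 1)   # (H, 1, 1)
    xs = torch.linspace(0, 1, width).view(1, -1, 1)    # (1, W, 1)
    lx = landmarks[:, 0].view(1, 1, -1)                # (1, 1, K)
    ly = landmarks[:, 1].view(1, 1, -1)
    d2 = (xs - lx) ** 2 + (ys - ly) ** 2               # (H, W, K)
    return torch.exp(-d2 / (2 * sigma ** 2)).amax(dim=-1)   # (H, W)

# Enhance pretrained CNN features: scale each spatial location by the map.
feats = torch.randn(1, 256, 28, 28)    # features from a pretrained CNN
lmks = torch.rand(20, 2)               # 20 dummy landmark coordinates
attn = landmark_attention_map(lmks, 28, 28)
enhanced = feats * (1.0 + attn)        # residual-style enhancement
```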
Exploration in Feature Space for Reinforcement Learning
Title | Exploration in Feature Space for Reinforcement Learning |
Authors | Suraj Narayanan Sasikumar |
Abstract | The infamous exploration-exploitation dilemma is one of the oldest and most important problems in reinforcement learning (RL). Deliberate and effective exploration is necessary for RL agents to succeed in most environments. However, until very recently even sophisticated RL algorithms employed simple, undirected exploration strategies in large-scale tasks. We introduce a new optimistic count-based exploration algorithm for RL that is feasible in high-dimensional MDPs. The success of RL algorithms in these domains depends crucially on generalization from limited training experience. Function approximation techniques enable RL agents to generalize in order to estimate the value of unvisited states, but at present few methods generalize the agent's uncertainty estimates to unvisited states. We present a new method for computing a generalized state visit-count, which allows the agent to estimate the uncertainty associated with any state. In contrast to existing exploration techniques, our $\phi$-$\textit{pseudocount}$ achieves generalization by exploiting the feature representation of the state space that is used for value function approximation. States with less frequently observed features are deemed more uncertain. The resulting $\phi$-$\textit{Exploration-Bonus}$ algorithm rewards the agent for exploring in feature space rather than in the original state space. This method is simpler and less computationally expensive than some previous proposals, and achieves near state-of-the-art results on high-dimensional RL benchmarks, including strong results on several notoriously difficult Atari 2600 video games such as Montezuma’s Revenge. |
Tasks | Montezuma’s Revenge |
Published | 2017-10-05 |
URL | http://arxiv.org/abs/1710.02210v1 |
PDF | http://arxiv.org/pdf/1710.02210v1.pdf |
PWC | https://paperswithcode.com/paper/exploration-in-feature-space-for |
Repo | |
Framework | |
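A rough sketch of the exploration bonus: track how often each feature of the value-function representation has been active, turn that into a generalized pseudocount for the current state, and add an optimistic bonus proportional to 1/sqrt(count) to the reward. The specific pseudocount formula below (mean visitation of the active features) is an illustrative simplification of the paper's construction.

```python
import numpy as np

class PhiExplorationBonus:
    """Sketch of a generalized visit-count bonus in feature space.

    We track how often each feature has been active and score a state by the
    average visitation of its active features; the paper's exact pseudocount
    construction differs, so treat this as an illustrative stand-in.
    """

    def __init__(self, num_features, beta=0.05):
        self.counts = np.zeros(num_features)
        self.beta = beta

    def bonus(self, phi):
        # phi: binary feature vector of the current state (the same features
        # used for value function approximation).
        self.counts += phi
        active = phi > 0
        pseudocount = self.counts[active].mean() if active.any() else 0.0
        return self.beta / np.sqrt(pseudocount + 1e-8)

bonus_fn = PhiExplorationBonus(num_features=1024)
phi = (np.random.rand(1024) < 0.05).astype(float)
shaped_reward = -1.0 + bonus_fn.bonus(phi)   # env reward plus novelty bonus
```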
VAIN: Attentional Multi-agent Predictive Modeling
Title | VAIN: Attentional Multi-agent Predictive Modeling |
Authors | Yedid Hoshen |
Abstract | Multi-agent predictive modeling is an essential step for understanding physical, social and team-play systems. Recently, Interaction Networks (INs) were proposed for the task of modeling multi-agent physical systems; however, INs scale with the number of interactions in the system (typically quadratic or higher order in the number of agents). In this paper we introduce VAIN, a novel attentional architecture for multi-agent predictive modeling that scales linearly with the number of agents. We show that VAIN is effective for multi-agent predictive modeling. Our method is evaluated on tasks from challenging multi-agent prediction domains, chess and soccer, and outperforms competing multi-agent approaches. |
Tasks | |
Published | 2017-06-19 |
URL | http://arxiv.org/abs/1706.06122v2 |
PDF | http://arxiv.org/pdf/1706.06122v2.pdf |
PWC | https://paperswithcode.com/paper/vain-attentional-multi-agent-predictive |
Repo | |
Framework | |
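The attentional mechanism can be sketched compactly: each agent emits an encoding e_i and an attention vector a_i, and agent i's communication vector is a softmax-weighted sum of the other agents' encodings with kernel exp(-||a_i - a_j||^2). This is a hypothetical rendering of that factorized interaction, not the paper's code.

```python
import torch

def vain_pooling(e, a):
    """VAIN-style attentional pooling (sketch): each agent i aggregates the
    other agents' encodings e_j with weights softmax_j(-||a_i - a_j||^2).

    e: (N, D) per-agent encodings; a: (N, K) per-agent attention vectors.
    The pairwise kernel here is O(N^2) for clarity; the paper's point is
    that the factorized form avoids quadratic interaction *networks*, i.e.
    one encoder pass per agent rather than per agent pair.
    """
    d2 = torch.cdist(a, a).pow(2)           # (N, N) squared distances
    logits = -d2
    logits.fill_diagonal_(float("-inf"))    # exclude self-interaction
    w = torch.softmax(logits, dim=1)        # (N, N) attention weights
    return w @ e                            # (N, D) pooled context per agent

e = torch.randn(8, 32)    # encodings for 8 agents
a = torch.randn(8, 10)    # their attention vectors
context = vain_pooling(e, a)
```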
Deep Abstract Q-Networks
Title | Deep Abstract Q-Networks |
Authors | Melrose Roderick, Christopher Grimm, Stefanie Tellex |
Abstract | We examine the problem of learning and planning on high-dimensional domains with long horizons and sparse rewards. Recent approaches have shown great successes in many Atari 2600 domains. However, domains with long horizons and sparse rewards, such as Montezuma’s Revenge and Venture, remain challenging for existing methods. Methods using abstraction (Dietterich 2000; Sutton, Precup, and Singh 1999) have been shown to be useful in tackling long-horizon problems. We combine recent techniques of deep reinforcement learning with existing model-based approaches using an expert-provided state abstraction. We construct toy domains that elucidate the problem of long horizons, sparse rewards and high-dimensional inputs, and show that our algorithm significantly outperforms previous methods on these domains. Our abstraction-based approach outperforms Deep Q-Networks (Mnih et al. 2015) on Montezuma’s Revenge and Venture, and exhibits backtracking behavior that is absent from previous methods. |
Tasks | Montezuma’s Revenge |
Published | 2017-10-02 |
URL | http://arxiv.org/abs/1710.00459v2 |
PDF | http://arxiv.org/pdf/1710.00459v2.pdf |
PWC | https://paperswithcode.com/paper/deep-abstract-q-networks |
Repo | |
Framework | |
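One way to picture the abstraction layer: the expert provides a mapping from raw states to abstract states (for instance, room identity plus inventory in Montezuma’s Revenge), a tabular model accumulates abstract transitions, and value iteration plans over them while learned sub-policies handle the pixel-level control. The sketch below covers only the tabular planning half and is an assumption-laden simplification of the paper's method.

```python
from collections import defaultdict

class AbstractPlanner:
    """Sketch of the model-based layer: count transitions between
    expert-defined abstract states and plan with value iteration. The neural
    sub-policies that travel between abstract states are omitted."""

    def __init__(self, gamma=0.99):
        self.gamma = gamma
        self.transitions = defaultdict(lambda: defaultdict(int))  # (s,a)->s' counts
        self.rewards = defaultdict(float)                         # (s,a)->reward

    def record(self, abs_s, abs_a, abs_s2, reward):
        self.transitions[(abs_s, abs_a)][abs_s2] += 1
        self.rewards[(abs_s, abs_a)] = reward

    def plan(self, sweeps=100):
        V = defaultdict(float)
        for _ in range(sweeps):
            Q = defaultdict(list)
            for (s, a), succ in self.transitions.items():
                total = sum(succ.values())
                q = self.rewards[(s, a)] + self.gamma * sum(
                    (n / total) * V[s2] for s2, n in succ.items())
                Q[s].append(q)
            for s, qs in Q.items():
                V[s] = max(qs)   # greedy backup over recorded actions
        return V

# Toy abstract transitions for a Montezuma-like (room, has_key) abstraction.
planner = AbstractPlanner()
planner.record(("room0", False), "get_key", ("room0", True), 1.0)
planner.record(("room0", True), "exit", ("room1", True), 10.0)
print(planner.plan()[("room0", False)])   # ~10.9
```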
Hierarchical Cross Network for Person Re-identification
Title | Hierarchical Cross Network for Person Re-identification |
Authors | Huan-Cheng Hsu, Ching-Hang Chen, Hsiao-Rong Tyan, Hong-Yuan Mark Liao |
Abstract | Person re-identification (person re-ID) aims at matching target person(s) captured from different, non-overlapping camera views. It plays an important role in public safety and has applications in various tasks such as human retrieval, human tracking, and activity analysis. In this paper, we propose a new network architecture called Hierarchical Cross Network (HCN) to perform person re-ID. In addition to the backbone model of a conventional CNN, HCN is equipped with two additional maps called hierarchical cross feature maps. The maps of an HCN are formed by merging layers with different resolutions and semantic levels. With the hierarchical cross feature maps, an HCN can effectively uncover additional semantic features which could not be discovered by a conventional CNN. Although the proposed HCN can discover features with higher semantics, its representation power is still limited. To derive more general representations, we augment the data during the training process by combining multiple datasets. Experimental results show that the proposed method outperformed several state-of-the-art methods. |
Tasks | Person Re-Identification |
Published | 2017-12-19 |
URL | http://arxiv.org/abs/1712.06820v1 |
PDF | http://arxiv.org/pdf/1712.06820v1.pdf |
PWC | https://paperswithcode.com/paper/hierarchical-cross-network-for-person-re |
Repo | |
Framework | |
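A hierarchical cross feature map, as described, merges layers with different resolutions and semantic levels. The sketch below upsamples the deeper map, projects both maps to a common width with 1x1 convolutions, and merges by addition; the projection and merge-by-addition choices are assumptions, not necessarily the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HierarchicalCross(nn.Module):
    """Sketch of a hierarchical cross feature map: merge a shallow,
    high-resolution feature map with a deep, semantically stronger one."""

    def __init__(self, shallow_ch=256, deep_ch=512, out_ch=256):
        super().__init__()
        self.proj_shallow = nn.Conv2d(shallow_ch, out_ch, kernel_size=1)
        self.proj_deep = nn.Conv2d(deep_ch, out_ch, kernel_size=1)

    def forward(self, shallow, deep):
        # Upsample the deep map to the shallow map's spatial size, then add.
        deep_up = F.interpolate(self.proj_deep(deep),
                                size=shallow.shape[-2:],
                                mode="bilinear", align_corners=False)
        return self.proj_shallow(shallow) + deep_up

cross = HierarchicalCross()
fmap = cross(torch.randn(1, 256, 28, 28), torch.randn(1, 512, 14, 14))
print(fmap.shape)   # torch.Size([1, 256, 28, 28])
```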
Analysis of $p$-Laplacian Regularization in Semi-Supervised Learning
Title | Analysis of $p$-Laplacian Regularization in Semi-Supervised Learning |
Authors | Dejan Slepčev, Matthew Thorpe |
Abstract | We investigate a family of regression problems in a semi-supervised setting. The task is to assign real-valued labels to a set of $n$ sample points, provided a small training subset of $N$ labeled points. A goal of semi-supervised learning is to take advantage of the (geometric) structure provided by the large number of unlabeled data when assigning labels. We consider random geometric graphs, with connection radius $\epsilon(n)$, to represent the geometry of the data set. Functionals which model the task reward the regularity of the estimator function and impose or reward the agreement with the training data. Here we consider the discrete $p$-Laplacian regularization. We investigate asymptotic behavior when the number of unlabeled points increases, while the number of training points remains fixed. We uncover a delicate interplay between the regularizing nature of the functionals considered and the nonlocality inherent to the graph constructions. We rigorously obtain almost optimal ranges on the scaling of $\epsilon(n)$ for the asymptotic consistency to hold. We prove that the minimizers of the discrete functionals in the random setting converge uniformly to the desired continuum limit. Furthermore, we discover that for the standard model used there is a restrictive upper bound on how quickly $\epsilon(n)$ must converge to zero as $n \to \infty$. We introduce a new model which is as simple as the original model, but overcomes this restriction. |
Tasks | |
Published | 2017-07-19 |
URL | http://arxiv.org/abs/1707.06213v2 |
PDF | http://arxiv.org/pdf/1707.06213v2.pdf |
PWC | https://paperswithcode.com/paper/analysis-of-p-laplacian-regularization-in |
Repo | |
Framework | |
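The discrete functional at the center of the analysis is easy to write down: on the random geometric graph with connection radius $\epsilon(n)$, penalize $\sum_{i \sim j} |u_i - u_j|^p$ subject to the labels as hard constraints. A minimal NumPy rendering follows (unit edge weights are an assumed simplification):

```python
import numpy as np

def p_laplacian_energy(u, points, eps=0.3, p=4.0):
    """Discrete p-Laplacian regularization (sketch): on the random geometric
    graph with connection radius eps, sum |u_i - u_j|^p over connected pairs.
    Labeled points enter as hard constraints on the entries of u."""
    diff = points[:, None, :] - points[None, :, :]
    adj = np.linalg.norm(diff, axis=-1) < eps   # random geometric graph
    np.fill_diagonal(adj, False)
    return np.sum(adj * np.abs(u[:, None] - u[None, :]) ** p)

rng = np.random.default_rng(0)
pts = rng.uniform(size=(200, 2))       # n = 200 sample points in [0,1]^2
u = rng.standard_normal(200)
u[:5] = [0.0, 1.0, 0.0, 1.0, 0.0]      # N = 5 labels held fixed (constraint)
print(p_laplacian_energy(u, pts))
```

Minimizing this energy over the unlabeled entries, with the $N$ labeled values held fixed, yields the estimator whose $n \to \infty$ consistency the paper characterizes in terms of the scaling of $\epsilon(n)$.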
Predicting Foreground Object Ambiguity and Efficiently Crowdsourcing the Segmentation(s)
Title | Predicting Foreground Object Ambiguity and Efficiently Crowdsourcing the Segmentation(s) |
Authors | Danna Gurari, Kun He, Bo Xiong, Jianming Zhang, Mehrnoosh Sameki, Suyog Dutt Jain, Stan Sclaroff, Margrit Betke, Kristen Grauman |
Abstract | We propose the ambiguity problem for the foreground object segmentation task and motivate the importance of estimating and accounting for this ambiguity when designing vision systems. Specifically, we distinguish between images which lead multiple annotators to segment different foreground objects (ambiguous) versus minor inter-annotator differences of the same object. Taking images from eight widely used datasets, we crowdsource labeling the images as “ambiguous” or “not ambiguous” to segment in order to construct a new dataset we call STATIC. Using STATIC, we develop a system that automatically predicts which images are ambiguous. Experiments demonstrate the advantage of our prediction system over existing saliency-based methods on images from vision benchmarks and images taken by blind people who are trying to recognize objects in their environment. Finally, we introduce a crowdsourcing system to achieve cost savings for collecting the diversity of all valid “ground truth” foreground object segmentations by collecting extra segmentations only when ambiguity is expected. Experiments show our system eliminates up to 47% of human effort compared to existing crowdsourcing methods with no loss in capturing the diversity of ground truths. |
Tasks | Semantic Segmentation |
Published | 2017-04-30 |
URL | http://arxiv.org/abs/1705.00366v1 |
PDF | http://arxiv.org/pdf/1705.00366v1.pdf |
PWC | https://paperswithcode.com/paper/predicting-foreground-object-ambiguity-and |
Repo | |
Framework | |
Spaceprint: a Mobility-based Fingerprinting Scheme for Public Spaces
Title | Spaceprint: a Mobility-based Fingerprinting Scheme for Public Spaces |
Authors | Mitra Baratchi, Geert Heijenk, Maarten van Steen |
Abstract | In this paper, we address the problem of how automated situation-awareness can be achieved by learning real-world situations from ubiquitously generated mobility data. Without semantic input about the time and space where situations take place, this turns out to be a fundamentally challenging problem. Uncertainties also introduce technical challenges when data is generated at irregular time intervals and is mixed with noise and errors. Relying purely on temporal patterns observable in mobility data, we propose Spaceprint, a fully automated algorithm for finding the repetitive pattern of similar situations in spaces. We evaluate this technique by showing how the latent variables describing the category, and even the actual identity, of a space can be discovered from the extracted situation patterns. To do so, we use several real-world mobility datasets with data about the presence of mobile entities in a variety of spaces. We also evaluate the performance of this technique by showing its robustness against uncertainties. |
Tasks | |
Published | 2017-03-29 |
URL | http://arxiv.org/abs/1703.09962v1 |
PDF | http://arxiv.org/pdf/1703.09962v1.pdf |
PWC | https://paperswithcode.com/paper/spaceprint-a-mobility-based-fingerprinting |
Repo | |
Framework | |
Learning Neural Word Salience Scores
Title | Learning Neural Word Salience Scores |
Authors | Krasen Samardzhiev, Andrew Gargett, Danushka Bollegala |
Abstract | Measuring the salience of a word is an essential step in numerous NLP tasks. Heuristic approaches such as tf-idf have been used so far to estimate the salience of words. We propose \emph{Neural Word Salience} (NWS) scores which, unlike heuristics, are learnt from a corpus. Specifically, we learn word salience scores such that, using pre-trained word embeddings as the input, they can accurately predict the words that appear in a sentence, given the words that appear in the sentences preceding or succeeding it. Experimental results on sentence similarity prediction show that the learnt word salience scores perform comparably or better than some of the state-of-the-art approaches for representing sentences on benchmark datasets for sentence similarity, while using only a fraction of the training and prediction times required by prior methods. Moreover, our NWS scores positively correlate with psycholinguistic measures such as concreteness and imageability, implying a close connection to salience as perceived by humans. |
Tasks | Word Embeddings |
Published | 2017-09-04 |
URL | http://arxiv.org/abs/1709.01186v1 |
PDF | http://arxiv.org/pdf/1709.01186v1.pdf |
PWC | https://paperswithcode.com/paper/learning-neural-word-salience-scores |
Repo | |
Framework | |
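A minimal sketch of the model implied by the abstract: freeze pre-trained embeddings, learn one salience scalar per word, represent a sentence as the salience-weighted sum of its word vectors, and train the whole thing to predict the words of an adjacent sentence. The full-softmax prediction head below is an assumed simplification of the paper's training objective.

```python
import torch
import torch.nn as nn

class NeuralWordSalience(nn.Module):
    """Sketch of NWS: the only trainable sentence-encoder parameters are one
    salience scalar per word; sentence vectors are salience-weighted sums of
    frozen pre-trained embeddings."""

    def __init__(self, pretrained: torch.Tensor):
        super().__init__()
        vocab, dim = pretrained.shape
        self.embed = nn.Embedding.from_pretrained(pretrained, freeze=True)
        self.salience = nn.Embedding(vocab, 1)   # learnt salience scores
        self.out = nn.Linear(dim, vocab)         # context-word predictor

    def forward(self, sent_ids):
        # sent_ids: (batch, seq_len) token ids of a sentence
        vecs = self.embed(sent_ids)          # (B, L, D) frozen embeddings
        s = self.salience(sent_ids)          # (B, L, 1) salience per token
        sent_vec = (s * vecs).sum(dim=1)     # (B, D) weighted sentence vector
        return self.out(sent_vec)            # (B, vocab) logits

model = NeuralWordSalience(torch.randn(5000, 300))
logits = model(torch.randint(0, 5000, (8, 12)))
# Train with cross-entropy against words of the adjacent sentence.
target = torch.randint(0, 5000, (8,))
loss = nn.functional.cross_entropy(logits, target)
```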
What’s Mine is Yours: Pretrained CNNs for Limited Training Sonar ATR
Title | What’s Mine is Yours: Pretrained CNNs for Limited Training Sonar ATR |
Authors | John McKay, Isaac Gerg, Vishal Monga, Raghu Raj |
Abstract | Finding mines in Sonar imagery is a significant problem with a great deal of relevance for seafaring military and commercial endeavors. Unfortunately, the lack of enormous Sonar image data sets has prevented automatic target recognition (ATR) algorithms from seeing some of the same advances as other computer vision fields. Namely, the boom in convolutional neural nets (CNNs), which have been able to achieve impressive results - even surpassing human performance - has not been an easily feasible route for many practitioners of Sonar ATR. We demonstrate the power of one avenue for incorporating CNNs into Sonar ATR: transfer learning. We first show how a straightforward, flexible CNN feature-extraction strategy can be used to obtain impressive, if not state-of-the-art, results. Secondly, we propose a way to utilize this transfer learning approach for multiple-instance target detection and identification within a provided synthetic aperture Sonar data set. |
Tasks | Transfer Learning |
Published | 2017-06-29 |
URL | http://arxiv.org/abs/1706.09858v1 |
PDF | http://arxiv.org/pdf/1706.09858v1.pdf |
PWC | https://paperswithcode.com/paper/whats-mine-is-yours-pretrained-cnns-for |
Repo | |
Framework | |
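The feature-extraction strategy is a standard transfer-learning recipe and is easy to sketch: strip the classification head from an ImageNet-pretrained CNN, extract pooled features for sonar image chips, and fit a light classifier on top. ResNet-18 and the SVM settings below are assumptions; the paper's choice of network and classifier may differ.

```python
import torch
import torchvision.models as models
from sklearn.svm import SVC

# Sketch of the feature-extraction route: take a CNN pretrained on natural
# images, remove its classifier, and train a lightweight SVM on sonar chips.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()     # expose the 512-d pooled features
backbone.eval()

with torch.no_grad():
    sonar_chips = torch.randn(40, 3, 224, 224)   # dummy preprocessed chips
    feats = backbone(sonar_chips).numpy()        # (40, 512) feature matrix

labels = [0, 1] * 20                              # mine vs. clutter (dummy)
clf = SVC(kernel="rbf").fit(feats, labels)
print(clf.predict(feats[:4]))
```

The appeal of this route, as the abstract notes, is that only the small SVM is fit on sonar data, sidestepping the need for an enormous labeled Sonar corpus.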
Tracing a Loose Wordhood for Chinese Input Method Engine
Title | Tracing a Loose Wordhood for Chinese Input Method Engine |
Authors | Xihu Zhang, Chu Wei, Hai Zhao |
Abstract | Chinese input methods are used to convert pinyin sequences or other Latin encodings into Chinese character sentences. For effective pinyin-to-character conversion, typical Input Method Engines (IMEs) rely on a predefined vocabulary that demands regular manual maintenance. To remove this inconvenient vocabulary requirement, this work focuses on automatic wordhood acquisition, fully exploiting the fact that Chinese input is a free human-computer interaction procedure. Instead of strictly defining words, a loose word likelihood is introduced to measure how likely a character sequence is to be a user-recognized word in the context of IME use. An online algorithm is then proposed to adjust the word likelihood or generate new words by comparing the user's actual input choice with the algorithm's prediction. The experimental results show that the proposed solution adapts agilely to diverse typing behavior and achieves performance approaching that of a highly optimized IME with a fixed vocabulary. |
Tasks | |
Published | 2017-12-12 |
URL | http://arxiv.org/abs/1712.04158v1 |
PDF | http://arxiv.org/pdf/1712.04158v1.pdf |
PWC | https://paperswithcode.com/paper/tracing-a-loose-wordhood-for-chinese-input |
Repo | |
Framework | |
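The online adjustment loop can be sketched in a few lines: keep a likelihood per character sequence, and whenever the engine's top candidate disagrees with what the user actually commits, boost the user's choice and decay the mistaken prediction. The multiplicative update constants below are arbitrary illustrative values, not the paper's.

```python
from collections import defaultdict

class LooseWordhood:
    """Sketch of online word-likelihood adjustment: compare the engine's
    prediction with the user's committed choice and update accordingly."""

    def __init__(self):
        self.likelihood = defaultdict(lambda: 1.0)

    def predict(self, candidates):
        # Rank conversion candidates by their current word likelihood.
        return max(candidates, key=lambda w: self.likelihood[w])

    def observe(self, candidates, user_choice):
        predicted = self.predict(candidates)
        if predicted != user_choice:
            self.likelihood[user_choice] *= 1.5   # user-recognized word
            self.likelihood[predicted] *= 0.8     # over-weighted candidate
        else:
            self.likelihood[user_choice] *= 1.05  # reinforce a correct hit

ime = LooseWordhood()
ime.observe(["词语", "此雨"], user_choice="词语")
print(ime.predict(["词语", "此雨"]))
```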