July 29, 2019


Paper Group ANR 46

Investigating the feature collection for semantic segmentation via single skip connection

Title Investigating the feature collection for semantic segmentation via single skip connection
Authors Jonghwa Yim, Kyung-Ah Sohn
Abstract Since the study of deep convolutional neural networks became prevalent, one important discovery has been that a feature map from a convolutional network can be extracted before the fully connected layer and used as a saliency map for object detection. Furthermore, a model can use features from different layers for accurate object detection, since features from different layers have different properties. As the model goes deeper, it offers many latent skip connections and feature maps that can refine object detection. Although there are many intermediate layers that can be used for semantic segmentation through skip connections, the characteristics of each skip connection and the best skip connection for this task remain uncertain. Therefore, in this study, we exhaustively examine the skip connections of state-of-the-art deep convolutional networks and investigate the characteristics of the features from each intermediate layer. In addition, this study suggests how to use recent deep neural network models for semantic segmentation and can therefore serve as a cornerstone for later studies with state-of-the-art network models.
Tasks Object Detection, Semantic Segmentation
Published 2017-10-23
URL http://arxiv.org/abs/1710.08192v1
PDF http://arxiv.org/pdf/1710.08192v1.pdf
PWC https://paperswithcode.com/paper/investigating-the-feature-collection-for
Repo
Framework
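
To make the idea concrete, here is a minimal sketch of probing a single skip connection for segmentation, assuming a torchvision ResNet-50 backbone; the tapped layer name, channel count, and decoder (a 1x1 classifier plus bilinear upsampling) are illustrative choices, not the authors' exact setup.

```python
# Minimal sketch (not the authors' code): decode one intermediate layer
# of a backbone directly to per-class segmentation logits, FCN-style.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50

class SingleSkipSegmenter(nn.Module):
    """Probes one skip connection by decoding it straight to class logits."""
    def __init__(self, num_classes=21, skip_layer="layer2", skip_channels=512):
        super().__init__()
        backbone = resnet50(weights=None)  # load pretrained weights in practice
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1,
                                  backbone.relu, backbone.maxpool)
        self.blocks = nn.ModuleDict({
            "layer1": backbone.layer1, "layer2": backbone.layer2,
            "layer3": backbone.layer3, "layer4": backbone.layer4,
        })
        self.skip_layer = skip_layer
        self.classifier = nn.Conv2d(skip_channels, num_classes, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[-2:]
        x = self.stem(x)
        for name, block in self.blocks.items():
            x = block(x)
            if name == self.skip_layer:   # tap the chosen skip connection
                break
        logits = self.classifier(x)
        # upsample the coarse per-class scores back to the input resolution
        return F.interpolate(logits, size=(h, w), mode="bilinear",
                             align_corners=False)

model = SingleSkipSegmenter()
out = model(torch.randn(1, 3, 224, 224))   # (1, 21, 224, 224)
```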

Learning Policies for Adaptive Tracking with Deep Feature Cascades

Title Learning Policies for Adaptive Tracking with Deep Feature Cascades
Authors Chen Huang, Simon Lucey, Deva Ramanan
Abstract Visual object tracking is a fundamental and time-critical vision task. Recent years have seen many shallow tracking methods based on real-time pixel-based correlation filters, as well as deep methods that have top performance but need a high-end GPU. In this paper, we learn to improve the speed of deep trackers without losing accuracy. Our fundamental insight is to take an adaptive approach, where easy frames are processed with cheap features (such as pixel values), while challenging frames are processed with invariant but expensive deep features. We formulate adaptive tracking as a decision-making process and train an agent to decide whether to locate objects with high confidence on an early layer or to continue processing subsequent layers of the network. This significantly reduces the feed-forward cost for easy frames with distinct or slow-moving objects. We train the agent offline in a reinforcement learning fashion and further demonstrate that learning all deep layers (so as to provide good features for adaptive tracking) can lead to a near real-time average tracking speed of 23 fps on a single CPU while achieving state-of-the-art performance. Perhaps most tellingly, our approach provides a 100X speedup for almost 50% of the time, indicating the power of an adaptive approach.
Tasks Decision Making, Object Tracking, Visual Object Tracking
Published 2017-08-09
URL http://arxiv.org/abs/1708.02973v2
PDF http://arxiv.org/pdf/1708.02973v2.pdf
PWC https://paperswithcode.com/paper/learning-policies-for-adaptive-tracking-with
Repo
Framework
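
A toy sketch of the adaptive cascade: an agent inspects the response map of each successively deeper (and more expensive) layer and stops as soon as it is confident. The threshold rule below is a hypothetical stand-in for the policy the paper learns with reinforcement learning.

```python
# Illustrative early-exit cascade (assumptions, not the paper's code).
import numpy as np

def track_frame(response_maps, confidence=0.9):
    """response_maps: list of 2-D arrays, ordered cheap to expensive layers.
    Returns (peak location, number of layers actually evaluated)."""
    for depth, response in enumerate(response_maps, start=1):
        # crude stand-in for the learned policy: stop once the peak score
        # clears a confidence threshold, or when no deeper layer remains
        if response.max() >= confidence or depth == len(response_maps):
            loc = np.unravel_index(response.argmax(), response.shape)
            return loc, depth

rng = np.random.default_rng(0)
maps = [rng.random((17, 17)) * 0.5 for _ in range(4)]  # all peaks < 0.5
maps[0][8, 8] = 0.95            # pretend the cheap layer is already confident
loc, depth = track_frame(maps)
print(loc, depth)               # (8, 8) 1 -> early exit on the first layer
```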

Patchnet: Interpretable Neural Networks for Image Classification

Title Patchnet: Interpretable Neural Networks for Image Classification
Authors Adityanarayanan Radhakrishnan, Charles Durham, Ali Soylemezoglu, Caroline Uhler
Abstract Understanding how a complex machine learning model makes a classification decision is essential for its acceptance in sensitive areas such as health care. Towards this end, we present PatchNet, a method that provides the features indicative of each class in an image using a tradeoff between restricting global image context and classification error. We mathematically analyze this tradeoff, demonstrate PatchNet’s ability to construct sharp visual heatmap representations of the learned features, and quantitatively compare these features with features selected by domain experts by applying PatchNet to the classification of benign/malignant skin lesions from the ISBI-ISIC 2017 melanoma classification challenge.
Tasks Image Classification
Published 2017-05-23
URL http://arxiv.org/abs/1705.08078v4
PDF http://arxiv.org/pdf/1705.08078v4.pdf
PWC https://paperswithcode.com/paper/patchnet-interpretable-neural-networks-for
Repo
Framework
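
A rough sketch of the patch-aggregation idea: classify small patches independently to restrict global context, then average patch scores for the image-level decision while keeping the per-patch scores as a heatmap. The patch size, stride, and stand-in classifier are assumptions.

```python
# Hedged sketch of restricted-context classification with a heatmap output.
import numpy as np

def patchnet_predict(image, patch_classifier, patch=32, stride=32):
    """Classify each patch independently; no patch sees global context."""
    h, w = image.shape[:2]
    rows, cols = (h - patch) // stride + 1, (w - patch) // stride + 1
    heatmap = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            crop = image[i*stride:i*stride+patch, j*stride:j*stride+patch]
            heatmap[i, j] = patch_classifier(crop)  # e.g. P(malignant | patch)
    return heatmap.mean(), heatmap                  # image score + heatmap

# toy stand-in classifier: mean brightness as a "lesion" score
score, heat = patchnet_predict(np.random.rand(224, 224),
                               lambda p: float(p.mean()))
```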

Sentiment analysis based on rhetorical structure theory: Learning deep neural networks from discourse trees

Title Sentiment analysis based on rhetorical structure theory: Learning deep neural networks from discourse trees
Authors Mathias Kraus, Stefan Feuerriegel
Abstract Prominent applications of sentiment analysis are countless, covering areas such as marketing, customer service and communication. The conventional bag-of-words approach for measuring sentiment merely counts term frequencies; however, it neglects the position of the terms within the discourse. As a remedy, we develop a discourse-aware method that builds upon the discourse structure of documents. For this purpose, we utilize rhetorical structure theory to label (sub-)clauses according to their hierarchical relationships and then assign polarity scores to individual leaves. To learn from the resulting rhetorical structure, we propose a tensor-based, tree-structured deep neural network (named Discourse-LSTM) in order to process the complete discourse tree. The underlying tensors infer the salient passages of narrative materials. In addition, we suggest two algorithms for data augmentation (node reordering and artificial leaf insertion) that increase our training set and reduce overfitting. Our benchmarks demonstrate the superior performance of our approach. Moreover, our tensor structure reveals the salient text passages and thereby provides explanatory insights.
Tasks Data Augmentation, Sentiment Analysis
Published 2017-04-18
URL http://arxiv.org/abs/1704.05228v3
PDF http://arxiv.org/pdf/1704.05228v3.pdf
PWC https://paperswithcode.com/paper/sentiment-analysis-based-on-rhetorical
Repo
Framework
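
A simplified sketch of bottom-up processing over a discourse tree. The paper's Discourse-LSTM uses tensor-based tree-structured cells; the toy recursive composition below only illustrates the control flow, and the leaf polarity function and weight matrix are placeholders.

```python
# Toy recursion over a rhetorical-structure tree (not the paper's model).
import numpy as np

def compose(node, leaf_polarity, W):
    """node: ('leaf', text) or ('rel', relation_label, [children])."""
    if node[0] == "leaf":
        return leaf_polarity(node[1])            # polarity vector of a clause
    child_states = [compose(c, leaf_polarity, W) for c in node[2]]
    return np.tanh(W @ np.mean(child_states, axis=0))   # merge the children

tree = ("rel", "Contrast",
        [("leaf", "the food was great"), ("leaf", "but service was slow")])
rng = np.random.default_rng(0)
doc_vec = compose(tree, lambda text: rng.standard_normal(8),
                  rng.standard_normal((8, 8)))  # feed this to a classifier
```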

Bridging the Gap Between Value and Policy Based Reinforcement Learning

Title Bridging the Gap Between Value and Policy Based Reinforcement Learning
Authors Ofir Nachum, Mohammad Norouzi, Kelvin Xu, Dale Schuurmans
Abstract We establish a new connection between value and policy based reinforcement learning (RL) based on a relationship between softmax temporal value consistency and policy optimality under entropy regularization. Specifically, we show that softmax consistent action values correspond to optimal entropy regularized policy probabilities along any action sequence, regardless of provenance. From this observation, we develop a new RL algorithm, Path Consistency Learning (PCL), that minimizes a notion of soft consistency error along multi-step action sequences extracted from both on- and off-policy traces. We examine the behavior of PCL in different scenarios and show that PCL can be interpreted as generalizing both actor-critic and Q-learning algorithms. We subsequently deepen the relationship by showing how a single model can be used to represent both a policy and the corresponding softmax state values, eliminating the need for a separate critic. The experimental evaluation demonstrates that PCL significantly outperforms strong actor-critic and Q-learning baselines across several benchmarks.
Tasks Q-Learning
Published 2017-02-28
URL http://arxiv.org/abs/1702.08892v3
PDF http://arxiv.org/pdf/1702.08892v3.pdf
PWC https://paperswithcode.com/paper/bridging-the-gap-between-value-and-policy
Repo
Framework
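
The soft consistency error at the heart of PCL can be written down compactly. The sketch below follows the abstract's formulation for a single sub-trajectory: the discounted rewards minus the entropy-scaled log-policy terms must match the difference in softmax state values, and PCL minimizes the squared error.

```python
# Path consistency error on one d-step sub-trajectory (parameters are toy).
import numpy as np

def pcl_consistency(values, rewards, log_pis, gamma=0.99, tau=0.01):
    """values: V(s_t)..V(s_{t+d}), length d+1; rewards, log_pis: length d.
    Returns the soft consistency error C; PCL minimizes C**2 w.r.t. both
    the policy and the value parameters, on- and off-policy alike."""
    d = len(rewards)
    discounts = gamma ** np.arange(d)
    path = np.sum(discounts * (rewards - tau * log_pis))
    return -values[0] + gamma**d * values[-1] + path

err = pcl_consistency(values=np.array([1.0, 0.9, 0.7]),
                      rewards=np.array([0.1, 0.2]),
                      log_pis=np.array([-0.5, -0.3]))
loss = err ** 2
```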

Understanding and Visualizing the District of Columbia Capital Bikeshare System Using Data Analysis for Balancing Purposes

Title Understanding and Visualizing the District of Columbia Capital Bikeshare System Using Data Analysis for Balancing Purposes
Authors Kiana Roshan Zamir, Ali Shafahi, Ali Haghani
Abstract The popularity of bike-sharing systems has risen steadily in recent years, and managing and maintaining these emerging systems is an indispensable part of their operation. Visualizing current operations can help provide a better grasp of system performance. In this paper, a data-mining approach is used to identify and visualize important factors related to bike-share operations and management. To consolidate the data, we cluster stations that have similar pickup and drop-off profiles during weekdays and weekends. We provide the temporal profile of the center of each cluster, which can be used as a simple and practical approach for approximating the number of pickups and drop-offs at each station. We also define two indices, based on stations’ shortages and surpluses, that reflect the degree of balancing aid a station needs. These indices can help stakeholders improve the quality of the bike-share user experience in at least two ways: they can complement balancing optimization efforts, and they can identify stations that need expansion. We mine the District of Columbia’s regional bike-share data and discuss the findings from this data set. We examine the bike-share system during different quarters of the year and during both peak and non-peak hours. The findings show that on weekdays most pickups and drop-offs happen during the morning and evening peaks, whereas on weekends pickups and drop-offs are spread throughout the day. We also show that, throughout the day, more than 40% of the stations are relatively self-balanced. Not having to worry about these stations on ordinary days allows balancing efforts to focus on fewer stations and therefore potentially improves the efficiency of balancing optimization models.
Tasks
Published 2017-08-14
URL http://arxiv.org/abs/1708.04196v1
PDF http://arxiv.org/pdf/1708.04196v1.pdf
PWC https://paperswithcode.com/paper/understanding-and-visualizing-the-district-of
Repo
Framework
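
A hedged sketch of the clustering step: group stations by their normalized weekday pickup/drop-off profiles and read each cluster center as a typical temporal profile. The number of clusters, the feature layout, and the simple imbalance index are illustrative, not the paper's exact definitions.

```python
# Cluster stations by temporal profile and flag self-balanced ones.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# 100 stations x 48 features: 24 hourly pickup + 24 hourly drop-off counts
profiles = rng.poisson(lam=5.0, size=(100, 48)).astype(float)
profiles /= profiles.sum(axis=1, keepdims=True)      # normalize to shares

km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(profiles)
centers = km.cluster_centers_        # typical weekday profile per cluster

# a simple self-balance index: |pickups - dropoffs| relative to activity
pick, drop = profiles[:, :24].sum(1), profiles[:, 24:].sum(1)
imbalance = np.abs(pick - drop) / (pick + drop)
share_self_balanced = (imbalance < 0.1).mean()       # fraction of stations
```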

Unsupervised Triplet Hashing for Fast Image Retrieval

Title Unsupervised Triplet Hashing for Fast Image Retrieval
Authors Shanshan Huang, Yichao Xiong, Ya Zhang, Jia Wang
Abstract Hashing has played a pivotal role in large-scale image retrieval. With the development of Convolutional Neural Networks (CNNs), hashing learning has shown great promise. However, existing methods are mostly tuned for classification and are not optimized for retrieval tasks, especially instance-level retrieval. In this study, we propose a novel hashing method for large-scale image retrieval. Considering the difficulty of obtaining labeled datasets for large-scale image retrieval, we propose a novel CNN-based unsupervised hashing method, namely Unsupervised Triplet Hashing (UTH). The unsupervised hashing network is designed under the following three principles: 1) more discriminative representations for image retrieval; 2) minimum quantization loss between the original real-valued feature descriptors and the learned hash codes; 3) maximum information entropy for the learned hash codes. Extensive experiments on the CIFAR-10, MNIST and In-shop datasets show that UTH outperforms several state-of-the-art unsupervised hashing methods in terms of retrieval accuracy.
Tasks Image Retrieval, Quantization
Published 2017-02-28
URL http://arxiv.org/abs/1702.08798v1
PDF http://arxiv.org/pdf/1702.08798v1.pdf
PWC https://paperswithcode.com/paper/unsupervised-triplet-hashing-for-fast-image
Repo
Framework
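
The three design principles map naturally onto three loss terms. The sketch below is a guess at one plausible formulation, not the paper's exact objective: a triplet ranking loss for discriminativeness, a quantization loss toward the signs of the codes, and a bit-balance proxy for maximum entropy.

```python
# Hedged sketch of a UTH-style objective (weights are assumptions).
import torch
import torch.nn.functional as F

def uth_loss(h_anchor, h_pos, h_neg, margin=0.5, lq=0.1, le=0.1):
    """h_*: real-valued codes in (-1, 1), e.g. tanh outputs, shape (B, bits)."""
    # 1) discriminative codes: triplet ranking loss
    triplet = F.triplet_margin_loss(h_anchor, h_pos, h_neg, margin=margin)
    # 2) minimum quantization loss to the binary codes sign(h)
    quant = ((h_anchor - h_anchor.sign()) ** 2).mean()
    # 3) entropy proxy: push each bit's mean activation toward 0 so bits
    #    are balanced (roughly half +1, half -1) across the batch
    balance = (h_anchor.mean(dim=0) ** 2).mean()
    return triplet + lq * quant + le * balance

h = [torch.tanh(torch.randn(16, 64)) for _ in range(3)]
loss = uth_loss(*h)
```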

Deep Spatial Regression Model for Image Crowd Counting

Title Deep Spatial Regression Model for Image Crowd Counting
Authors Haiyan Yao, Kang Han, Wanggen Wan, Li Hou
Abstract Computer vision techniques have been used to produce accurate and generic crowd count estimators in recent years. Due to severe occlusions, appearance variations, perspective distortions and illumination conditions, crowd counting is a very challenging task. To this end, we propose a deep spatial regression model (DSRM) for counting the number of individuals present in a still image with arbitrary perspective and arbitrary resolution. Our proposed model is based on a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM). First, we feed the images into a pretrained CNN to extract a set of high-level features. Then the features in adjacent regions are used to regress the local counts with an LSTM structure that takes spatial information into consideration. The final global count is obtained by summing the local counts. We apply our framework to several challenging crowd counting datasets, and the experimental results show that our method outperforms state-of-the-art methods on crowd counting and density estimation in terms of reliability and effectiveness.
Tasks Crowd Counting, Density Estimation
Published 2017-10-26
URL http://arxiv.org/abs/1710.09757v1
PDF http://arxiv.org/pdf/1710.09757v1.pdf
PWC https://paperswithcode.com/paper/deep-spatial-regression-model-for-image-crowd
Repo
Framework
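
A rough sketch of the regression stage, with layer sizes assumed: CNN features of adjacent patches feed an LSTM that regresses per-patch counts, and the global count is their sum.

```python
# Sketch of LSTM-based local count regression (sizes are illustrative).
import torch
import torch.nn as nn

class LocalCountLSTM(nn.Module):
    def __init__(self, feat_dim=512, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, patch_feats):          # (B, num_patches, feat_dim)
        h, _ = self.lstm(patch_feats)        # spatial context across patches
        local = self.head(h).squeeze(-1)     # per-patch count estimates
        return local.sum(dim=1), local       # global count, local counts

model = LocalCountLSTM()
feats = torch.randn(2, 16, 512)              # e.g. a 4x4 grid of patch features
global_count, local_counts = model(feats)
```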

Image Fusion With Cosparse Analysis Operator

Title Image Fusion With Cosparse Analysis Operator
Authors Rui Gao, Sergiy A. Vorobyov, Hong Zhao
Abstract The paper addresses the image fusion problem, where multiple images captured with different focus distances are to be combined into a higher-quality all-in-focus image. Most current approaches to image fusion strongly rely on the unrealistic assumption of noise-free image acquisition and therefore exhibit limited robustness during fusion. In our approach, we formulate the multi-focus image fusion problem in terms of an analysis sparse model and simultaneously perform the restoration and fusion of multi-focus images. Based on this model, we propose an analysis operator learning method and define a novel fusion function to generate an all-in-focus image. Experimental evaluations confirm the effectiveness of the proposed fusion approach both visually and quantitatively, and show that our approach outperforms state-of-the-art fusion methods.
Tasks
Published 2017-04-18
URL http://arxiv.org/abs/1704.05240v1
PDF http://arxiv.org/pdf/1704.05240v1.pdf
PWC https://paperswithcode.com/paper/image-fusion-with-cosparse-analysis-operator
Repo
Framework
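
As a toy illustration of analysis-based fusion, the snippet below measures per-pixel focus with a fixed Laplacian operator and picks the sharper source at each location. The paper learns the analysis operator and fuses within a cosparse restoration model; the fixed operator here is only a stand-in.

```python
# Toy focus-driven fusion with a fixed analysis operator (a Laplacian).
import numpy as np
from scipy.ndimage import laplace, uniform_filter

def fuse(img_a, img_b, win=7):
    # activity = locally averaged magnitude of the analysis responses;
    # high activity is taken as a proxy for being in focus
    act_a = uniform_filter(np.abs(laplace(img_a)), size=win)
    act_b = uniform_filter(np.abs(laplace(img_b)), size=win)
    return np.where(act_a >= act_b, img_a, img_b)

a, b = np.random.rand(64, 64), np.random.rand(64, 64)
all_in_focus = fuse(a, b)
```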

Network-based coverage of mutational profiles reveals cancer genes

Title Network-based coverage of mutational profiles reveals cancer genes
Authors Borislav H. Hristov, Mona Singh
Abstract A central goal in cancer genomics is to identify the somatic alterations that underpin tumor initiation and progression. This task is challenging as the mutational profiles of cancer genomes exhibit vast heterogeneity, with many alterations observed within each individual, few shared somatically mutated genes across individuals, and important roles in cancer for both frequently and infrequently mutated genes. While commonly mutated cancer genes are readily identifiable, those that are rarely mutated across samples are difficult to distinguish from the large numbers of other infrequently mutated genes. Here, we introduce a method that considers per-individual mutational profiles within the context of protein-protein interaction networks in order to identify small connected subnetworks of genes that, while not individually frequently mutated, comprise pathways that are perturbed across (i.e., “cover”) a large fraction of the individuals. We devise a simple yet intuitive objective function that balances identifying a small subset of genes with covering a large fraction of individuals. We show how to solve this problem optimally using integer linear programming and also give a fast heuristic algorithm that works well in practice. We perform a large-scale evaluation of our resulting method, nCOP, on 6,038 TCGA tumor samples across 24 different cancer types. We demonstrate that our approach nCOP is more effective in identifying cancer genes than both methods that do not utilize any network information as well as state-of-the-art network-based methods that aggregate mutational information across individuals. Overall, our work demonstrates the power of combining per-individual mutational information with interaction networks in order to uncover genes functionally relevant in cancers, and in particular those genes that are less frequently mutated.
Tasks
Published 2017-04-26
URL http://arxiv.org/abs/1704.08544v1
PDF http://arxiv.org/pdf/1704.08544v1.pdf
PWC https://paperswithcode.com/paper/network-based-coverage-of-mutational-profiles
Repo
Framework
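
A sketch of the coverage trade-off with a greedy heuristic in the spirit of nCOP. The paper solves its objective optimally with integer linear programming and restricts genes to connected subnetworks of a protein-protein interaction network, which this toy version omits; the stopping rule and data are illustrative.

```python
# Greedy coverage heuristic: trade a small gene set against covering
# many individuals (connectivity constraint omitted for brevity).
def greedy_cover(mutations, min_gain=0.05, max_genes=10):
    """mutations: dict gene -> set of individuals mutated in that gene."""
    everyone = set().union(*mutations.values())
    chosen, covered = [], set()
    while len(chosen) < max_genes:
        gene, carriers = max(mutations.items(),
                             key=lambda kv: len(kv[1] - covered))
        gain = len(carriers - covered) / len(everyone)
        if gain <= min_gain:      # marginal coverage no longer worth the
            break                 # per-gene size penalty in the objective
        chosen.append(gene)
        covered |= carriers
    return chosen, len(covered) / len(everyone)

muts = {"TP53": {1, 2, 3, 4}, "KRAS": {3, 4, 5}, "RARE1": {6}}
genes, coverage = greedy_cover(muts)   # ['TP53', 'KRAS', 'RARE1'], 1.0
```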

Cohesion-based Online Actor-Critic Reinforcement Learning for mHealth Intervention

Title Cohesion-based Online Actor-Critic Reinforcement Learning for mHealth Intervention
Authors Feiyun Zhu, Peng Liao, Xinliang Zhu, Yaowen Yao, Junzhou Huang
Abstract In the wake of the vast population of smart-device users worldwide, mobile health (mHealth) technologies promise to have a positive and wide-ranging influence on people’s health by providing flexible, affordable, and portable health guidance. Current online decision-making methods for mHealth assume that users are completely heterogeneous: they share no information among users and learn a separate policy for each user. However, the data for each user is too limited to support such separate online learning, leading to unstable policies with high variance. Moreover, we observe that a user may be similar to some, but not all, other users, and that connected users tend to behave similarly. In this paper, we propose a network cohesion constrained (actor-critic) Reinforcement Learning (RL) method for mHealth. The goal is to explore how to share information among similar users to better convert the limited user data into sharper learned policies. To the best of our knowledge, this is the first online actor-critic RL method for mHealth and the first network cohesion constrained (actor-critic) RL method in any application. Network cohesion is important for deriving effective policies. We introduce a novel method to learn the network using the warm-start trajectory, which directly reflects the users’ properties. Optimizing our model is difficult and very different from general supervised learning because values are only observed indirectly. As a contribution, we propose two algorithms for the proposed online RL formulations. Beyond mHealth, the proposed methods can easily be applied or adapted to other health-related tasks. Extensive experimental results on the HeartSteps dataset demonstrate that, across a variety of parameter settings, the proposed methods obtain clear improvements over state-of-the-art methods.
Tasks Decision Making
Published 2017-03-25
URL http://arxiv.org/abs/1703.10039v2
PDF http://arxiv.org/pdf/1703.10039v2.pdf
PWC https://paperswithcode.com/paper/cohesion-based-online-actor-critic
Repo
Framework
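
One plausible reading of the network-cohesion constraint, offered as an assumption rather than the paper's exact formulation, is a graph Laplacian regularizer that pulls connected users' policy parameters toward each other:

```python
# Hypothetical cohesion penalty over per-user policy parameters.
import numpy as np

def cohesion_penalty(theta, adjacency, lam=0.1):
    """theta: (num_users, dim) per-user params; adjacency: (U, U) user graph."""
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    # tr(theta^T L theta) penalizes parameter differences between
    # connected users, so neighbors end up with similar policies
    return lam * np.trace(theta.T @ laplacian @ theta)

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
theta = np.random.default_rng(0).standard_normal((3, 4))
penalty = cohesion_penalty(theta, A)   # add to the actor-critic objective
```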

Temporal Action Localization by Structured Maximal Sums

Title Temporal Action Localization by Structured Maximal Sums
Authors Zehuan Yuan, Jonathan C. Stroud, Tong Lu, Jia Deng
Abstract We address the problem of temporal action localization in videos. We pose action localization as a structured prediction over arbitrary-length temporal windows, where each window is scored as the sum of frame-wise classification scores. Additionally, our model classifies the start, middle, and end of each action as separate components, allowing our system to explicitly model each action’s temporal evolution and take advantage of informative temporal dependencies present in this structure. In this framework, we localize actions by searching for the structured maximal sum, a problem for which we develop a novel, provably-efficient algorithmic solution. The frame-wise classification scores are computed using features from a deep Convolutional Neural Network (CNN), which are trained end-to-end to directly optimize for a novel structured objective. We evaluate our system on the THUMOS 14 action detection benchmark and achieve competitive performance.
Tasks Action Detection, Action Localization, Structured Prediction, Temporal Action Localization
Published 2017-04-15
URL http://arxiv.org/abs/1704.04671v1
PDF http://arxiv.org/pdf/1704.04671v1.pdf
PWC https://paperswithcode.com/paper/temporal-action-localization-by-structured
Repo
Framework
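
The structured maximal sum admits an efficient search. For a single action class, each window [i, j] scores start[i] + end[j] plus the sum of middle scores, and a Kadane-style pass finds the best window in O(n); the scores below are toy values.

```python
# O(n) search for the best-scoring temporal window with start/middle/end
# components (one action class; scores are toy frame-wise values).
def best_window(start, middle, end):
    f, i0 = start[0] + middle[0], 0        # best "open" segment ending at 0
    best = (f + end[0], 0, 0)              # (score, start frame, end frame)
    for j in range(1, len(middle)):
        if start[j] > f:                   # better to open a new segment at j
            f, i0 = start[j], j
        f += middle[j]
        best = max(best, (f + end[j], i0, j))
    return best

start = [0.2, 0.9, 0.1, 0.0]
mid   = [0.1, 0.5, 0.6, -0.2]
end   = [0.0, 0.1, 0.8, 0.3]
print(best_window(start, mid, end))        # (2.8, 1, 2): action spans 1..2
```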

Agent based simulation of the evolution of society as an alternate maximization problem

Title Agent based simulation of the evolution of society as an alternate maximization problem
Authors Amartya Sanyal, Sanjana Garg, Asim Unmesh
Abstract Understanding the evolution of human society, as a complex adaptive system, is a task that has been examined from various angles. In this paper, we tractably simulate an agent-based model with a sufficiently large population. To do this, we characterize an entity called a “society”, which helps us reduce the complexity of each step from $\mathcal{O}(n^2)$ to $\mathcal{O}(n)$. We propose a realistic setting in which we design a joint alternate-maximization algorithm to maximize a certain “fitness” function, which we believe simulates the way societies develop. Our key contributions include (i) proposing a novel protocol for simulating the evolution of a society with cheap, non-optimal joint alternate-maximization steps; (ii) providing a framework for carrying out experiments that adhere to this joint-optimization simulation framework; (iii) carrying out experiments showing that the approach makes sense empirically; and (iv) providing an alternate justification for the use of “society” in the simulations.
Tasks
Published 2017-07-05
URL http://arxiv.org/abs/1707.01546v1
PDF http://arxiv.org/pdf/1707.01546v1.pdf
PWC https://paperswithcode.com/paper/agent-based-simulation-of-the-evolution-of
Repo
Framework
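
A toy sketch of the alternate-maximization loop under an invented fitness (negative distance to a society summary): agents interact only with their society's aggregate rather than pairwise, which is what drops the per-step cost from $\mathcal{O}(n^2)$ to $\mathcal{O}(n)$. Under this toy fitness the alternation reduces to a k-means-like procedure.

```python
# Alternate maximization with societies as aggregates (toy fitness).
import numpy as np

rng = np.random.default_rng(0)
agents = rng.standard_normal((200, 2))      # agent traits
centers = rng.standard_normal((5, 2))       # society summaries

for _ in range(10):
    # step 1: each agent joins the society maximizing its fitness
    # (here, negative squared distance to the society center) -- O(n*k)
    assign = np.argmin(((agents[:, None] - centers) ** 2).sum(-1), axis=1)
    # step 2: each society re-maximizes the joint fitness of its members
    for s in range(len(centers)):
        if (assign == s).any():
            centers[s] = agents[assign == s].mean(axis=0)
```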

Multi-Level Recurrent Residual Networks for Action Recognition

Title Multi-Level Recurrent Residual Networks for Action Recognition
Authors Zhenxing Zheng, Gaoyun An, Qiuqi Ruan
Abstract Most existing Convolutional Neural Networks (CNNs) used for action recognition are either difficult to optimize or underuse crucial temporal information. Inspired by the fact that recurrent models consistently make breakthroughs in sequence-related tasks, we propose novel Multi-Level Recurrent Residual Networks (MRRN) incorporating three recognition streams. Each stream consists of a Residual Network (ResNet) and a recurrent model. The proposed model captures spatiotemporal information by employing both alternative ResNets to learn spatial representations from static frames and stacked Simple Recurrent Units (SRUs) to model temporal dynamics. Three streams that independently learn low-, mid-, and high-level representations are fused by computing a weighted average of their softmax scores to obtain complementary representations of the video. Unlike previous models that boost performance at the cost of time and space complexity, our model has lower complexity by employing shortcut connections and is trained end-to-end with greater efficiency. MRRN displays significant performance improvements compared to CNN-RNN framework baselines and obtains performance comparable to the state of the art, achieving 51.3% on the HMDB-51 dataset and 81.9% on the UCF-101 dataset without additional data.
Tasks Temporal Action Localization
Published 2017-11-22
URL http://arxiv.org/abs/1711.08238v6
PDF http://arxiv.org/pdf/1711.08238v6.pdf
PWC https://paperswithcode.com/paper/multi-level-recurrent-residual-networks-for
Repo
Framework
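
The fusion step is simple to state: each stream emits softmax class scores and the video-level prediction is their weighted average. The weights below are assumptions for illustration.

```python
# Weighted softmax-score fusion of the three streams (weights assumed).
import torch

def fuse_streams(scores_low, scores_mid, scores_high, w=(0.2, 0.3, 0.5)):
    """scores_*: (num_classes,) softmax outputs of the three streams."""
    fused = w[0] * scores_low + w[1] * scores_mid + w[2] * scores_high
    return fused.argmax().item(), fused

# e.g. UCF-101 has 101 action classes
probs = [torch.softmax(torch.randn(101), dim=0) for _ in range(3)]
label, fused = fuse_streams(*probs)
```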

Joint Semantic Synthesis and Morphological Analysis of the Derived Word

Title Joint Semantic Synthesis and Morphological Analysis of the Derived Word
Authors Ryan Cotterell, Hinrich Schütze
Abstract Much like sentences are composed of words, words themselves are composed of smaller units. For example, the English word questionably can be analyzed as question+able+ly. However, this structural decomposition of the word does not directly give us a semantic representation of the word’s meaning. Since morphology obeys the principle of compositionality, the semantics of the word can be systematically derived from the meaning of its parts. In this work, we propose a novel probabilistic model of word formation that captures both the analysis of a word w into its constituent segments and the synthesis of the meaning of w from the meanings of those segments. Our model jointly learns to segment words into morphemes and compose distributional semantic vectors of those morphemes. We experiment with the model on English CELEX data and German DerivBase (Zeller et al., 2013) data. We show that jointly modeling semantics increases both segmentation accuracy and morpheme F1 by between 3% and 5%. Additionally, we investigate different models of vector composition, showing that recurrent neural networks yield an improvement over simple additive models. Finally, we study the degree to which the representations correspond to a linguist’s notion of morphological productivity.
Tasks Morphological Analysis
Published 2017-01-04
URL http://arxiv.org/abs/1701.00946v3
PDF http://arxiv.org/pdf/1701.00946v3.pdf
PWC https://paperswithcode.com/paper/joint-semantic-synthesis-and-morphological
Repo
Framework
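
A toy contrast of the two composition models mentioned above: additive versus recurrent composition of morpheme vectors. The vectors and the GRU are untrained placeholders, not the paper's model.

```python
# Additive vs. recurrent composition of morpheme vectors (placeholders).
import torch
import torch.nn as nn

dim = 16
morphemes = {m: torch.randn(dim) for m in ["question", "able", "ly"]}
segmentation = ["question", "able", "ly"]     # analysis of "questionably"
seq = torch.stack([morphemes[m] for m in segmentation])

additive = seq.sum(dim=0)                     # simple additive composition

gru = nn.GRU(dim, dim, batch_first=True)      # recurrent composition
_, h = gru(seq.unsqueeze(0))                  # (1, 3, dim) -> final state
recurrent = h.squeeze()                       # derived word's vector
```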