Paper Group ANR 279
Robust Tracking Using Region Proposal Networks. Global Guarantees for Enforcing Deep Generative Priors by Empirical Risk. Matroids Hitting Sets and Unsupervised Dependency Grammar Induction. Video Highlight Prediction Using Audience Chat Reactions. Flow Navigation by Smart Microswimmers via Reinforcement Learning. Comparison of echo state network o …
Robust Tracking Using Region Proposal Networks
Title | Robust Tracking Using Region Proposal Networks |
Authors | Jimmy Ren, Zhiyang Yu, Jianbo Liu, Rui Zhang, Wenxiu Sun, Jiahao Pang, Xiaohao Chen, Qiong Yan |
Abstract | Recent advances in visual tracking showed that deep Convolutional Neural Networks (CNN) trained for image classification can be strong feature extractors for discriminative trackers. However, due to the drastic difference between image classification and tracking, extra treatments such as model ensemble and feature engineering must be carried out to bridge the two domains. Such procedures are either time-consuming or hard to generalize well across datasets. In this paper we discovered that the internal structure of the Region Proposal Network (RPN)'s top layer feature can be utilized for robust visual tracking. We showed that such a property has to be unleashed by a novel loss function which simultaneously considers classification accuracy and bounding box quality. Without ensemble and any extra treatment on feature maps, our proposed method achieved state-of-the-art results on several large-scale benchmarks including OTB50, OTB100 and VOT2016. We will make our code publicly available. |
Tasks | Feature Engineering, Image Classification, Visual Tracking |
Published | 2017-05-30 |
URL | http://arxiv.org/abs/1705.10447v1 |
http://arxiv.org/pdf/1705.10447v1.pdf | |
PWC | https://paperswithcode.com/paper/robust-tracking-using-region-proposal |
Repo | |
Framework | |
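The loss described in the abstract above jointly scores classification accuracy and bounding-box quality, but its exact form is not given here. The sketch below is a minimal PyTorch-style stand-in, assuming a cross-entropy term plus a smooth-L1 box term weighted by a hypothetical `lambda_box`; it is illustrative only, not the paper's loss.

```python
import torch
import torch.nn.functional as F

def joint_rpn_loss(cls_logits, cls_targets, box_preds, box_targets, lambda_box=1.0):
    """Illustrative joint objective: classification accuracy plus bounding-box quality.

    cls_logits:  (N, 2) foreground/background scores for N proposals
    cls_targets: (N,)   integer 0/1 labels
    box_preds:   (N, 4) predicted box offsets
    box_targets: (N, 4) regression targets (only used for positive proposals)
    """
    cls_loss = F.cross_entropy(cls_logits, cls_targets)
    pos = cls_targets == 1                       # box quality only counts for positives
    if pos.any():
        box_loss = F.smooth_l1_loss(box_preds[pos], box_targets[pos])
    else:
        box_loss = box_preds.sum() * 0.0         # keep the graph connected when no positives
    return cls_loss + lambda_box * box_loss
```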
Global Guarantees for Enforcing Deep Generative Priors by Empirical Risk
Title | Global Guarantees for Enforcing Deep Generative Priors by Empirical Risk |
Authors | Paul Hand, Vladislav Voroninski |
Abstract | We examine the theoretical properties of enforcing priors provided by generative deep neural networks via empirical risk minimization. In particular we consider two models, one in which the task is to invert a generative neural network given access to its last layer and another in which the task is to invert a generative neural network given only compressive linear observations of its last layer. We establish that in both cases, under suitable regimes of network layer sizes and a randomness assumption on the network weights, the non-convex objective function given by empirical risk minimization does not have any spurious stationary points. That is, we establish that with high probability, at any point away from small neighborhoods around two scalar multiples of the desired solution, there is a descent direction. Hence, there are no local minima, saddle points, or other stationary points outside these neighborhoods. These results constitute the first theoretical guarantees which establish the favorable global geometry of these non-convex optimization problems, and they bridge the gap between the empirical success of enforcing deep generative priors and a rigorous understanding of non-linear inverse problems. |
Tasks | |
Published | 2017-05-22 |
URL | http://arxiv.org/abs/1705.07576v3 |
http://arxiv.org/pdf/1705.07576v3.pdf | |
PWC | https://paperswithcode.com/paper/global-guarantees-for-enforcing-deep |
Repo | |
Framework | |
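The empirical risk studied in this paper is, in the compressive case, the squared error between linear observations of the generator output and the measurements. The sketch below sets up that objective for a toy random two-layer ReLU generator and minimizes it by finite-difference gradient descent; the generator, dimensions, and step size are all illustrative assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)
k, h, d, m = 10, 50, 100, 40                       # latent, hidden, output, measurement dims
W1 = rng.normal(size=(h, k)) / np.sqrt(k)          # random generator weights (expansive layers)
W2 = rng.normal(size=(d, h)) / np.sqrt(h)
A = rng.normal(size=(m, d)) / np.sqrt(m)           # compressive linear observation operator

def G(z):                                          # toy ReLU generator standing in for the deep prior
    return W2 @ np.maximum(W1 @ z, 0.0)

z_true = rng.normal(size=k)
y = A @ G(z_true)                                  # observed compressive measurements

def risk(z):                                       # empirical risk ||A G(z) - y||^2
    return float(np.sum((A @ G(z) - y) ** 2))

z, lr, eps = rng.normal(size=k), 1e-2, 1e-5
for _ in range(2000):                              # crude finite-difference gradient descent
    grad = np.array([(risk(z + eps * e) - risk(z - eps * e)) / (2 * eps) for e in np.eye(k)])
    z -= lr * grad
print("final empirical risk:", risk(z))
```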
Matroids Hitting Sets and Unsupervised Dependency Grammar Induction
Title | Matroids Hitting Sets and Unsupervised Dependency Grammar Induction |
Authors | Nicholas Harvey, Vahab Mirrokni, David Karger, Virginia Savova, Leonid Peshkin |
Abstract | This paper formulates a novel problem on graphs: find the minimal subset of edges in a fully connected graph, such that the resulting graph contains all spanning trees for a set of specified sub-graphs. This formulation is motivated by an unsupervised grammar induction problem from computational linguistics. We present a reduction to some known problems and algorithms from graph theory, provide computational complexity results, and describe an approximation algorithm. |
Tasks | Dependency Grammar Induction |
Published | 2017-05-24 |
URL | http://arxiv.org/abs/1705.08992v2 |
http://arxiv.org/pdf/1705.08992v2.pdf | |
PWC | https://paperswithcode.com/paper/matroids-hitting-sets-and-unsupervised |
Repo | |
Framework | |
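The approximation algorithm itself is not reproduced here, but the feasibility predicate the problem optimizes over is easy to state in code: an edge subset is feasible when it keeps every specified vertex set connected, i.e. it contains a spanning tree of each sub-graph. A small union-find sketch, with toy data of my own choosing:

```python
def connects(vertices, edges):
    """Check that `edges` restricted to `vertices` contains a spanning tree of them (union-find)."""
    parent = {v: v for v in vertices}
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    for u, w in edges:
        if u in parent and w in parent:
            parent[find(u)] = find(w)
    return len({find(v) for v in vertices}) == 1

def feasible(edge_subset, subgraphs):
    """An edge subset is feasible iff every specified vertex set stays connected by it."""
    return all(connects(vs, edge_subset) for vs in subgraphs)

# toy usage: vertex sets {1,2,3} and {3,4} over a complete graph on {1,...,4}
print(feasible({(1, 2), (2, 3), (3, 4)}, [{1, 2, 3}, {3, 4}]))  # True
print(feasible({(1, 2), (3, 4)}, [{1, 2, 3}, {3, 4}]))          # False
```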
Video Highlight Prediction Using Audience Chat Reactions
Title | Video Highlight Prediction Using Audience Chat Reactions |
Authors | Cheng-Yang Fu, Joon Lee, Mohit Bansal, Alexander C. Berg |
Abstract | Sports channel video portals offer an exciting domain for research on multimodal, multilingual analysis. We present methods addressing the problem of automatic video highlight prediction based on joint visual features and textual analysis of the real-world audience discourse with complex slang, in both English and traditional Chinese. We present a novel dataset based on League of Legends championships recorded from North American and Taiwanese Twitch.tv channels (will be released for further research), and demonstrate strong results on these using multimodal, character-level CNN-RNN model architectures. |
Tasks | League of Legends |
Published | 2017-07-26 |
URL | http://arxiv.org/abs/1707.08559v1 |
http://arxiv.org/pdf/1707.08559v1.pdf | |
PWC | https://paperswithcode.com/paper/video-highlight-prediction-using-audience |
Repo | |
Framework | |
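A character-level CNN-RNN text branch of the kind mentioned in the abstract can be sketched in a few lines; the layer sizes and the single-branch (chat-only) setup below are assumptions for illustration, and the visual branch of the multimodal model is omitted.

```python
import torch
import torch.nn as nn

class CharCNNRNN(nn.Module):
    """Minimal character-level CNN-RNN text branch (an illustrative sketch, not the paper's exact model)."""
    def __init__(self, vocab_size=128, emb=16, channels=64, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.conv = nn.Conv1d(emb, channels, kernel_size=5, padding=2)
        self.rnn = nn.GRU(channels, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)            # highlight / non-highlight score

    def forward(self, char_ids):                   # char_ids: (batch, seq_len) integer codes
        x = self.embed(char_ids).transpose(1, 2)   # (batch, emb, seq_len) for Conv1d
        x = torch.relu(self.conv(x)).transpose(1, 2)
        _, h = self.rnn(x)                         # h: (1, batch, hidden)
        return self.out(h[-1])                     # (batch, 1) logits

logits = CharCNNRNN()(torch.randint(0, 128, (4, 200)))  # 4 chat windows of 200 characters each
```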
Flow Navigation by Smart Microswimmers via Reinforcement Learning
Title | Flow Navigation by Smart Microswimmers via Reinforcement Learning |
Authors | Simona Colabrese, Kristian Gustavsson, Antonio Celani, Luca Biferale |
Abstract | Smart active particles can acquire some limited knowledge of the fluid environment from simple mechanical cues and exert a control on their preferred steering direction. Their goal is to learn the best way to navigate by exploiting the underlying flow whenever possible. As an example, we focus our attention on smart gravitactic swimmers. These are active particles whose task is to reach the highest altitude within some time horizon, given the constraints enforced by fluid mechanics. By means of numerical experiments, we show that swimmers indeed learn nearly optimal strategies just by experience. A reinforcement learning algorithm allows particles to learn effective strategies even in difficult situations when, in the absence of control, they would end up being trapped by flow structures. These strategies are highly nontrivial and cannot be easily guessed in advance. This Letter illustrates the potential of reinforcement learning algorithms to model adaptive behavior in complex flows and paves the way towards the engineering of smart microswimmers that solve difficult navigation problems. |
Tasks | |
Published | 2017-01-30 |
URL | http://arxiv.org/abs/1701.08848v3 |
http://arxiv.org/pdf/1701.08848v3.pdf | |
PWC | https://paperswithcode.com/paper/flow-navigation-by-smart-microswimmers-via |
Repo | |
Framework | |
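The reinforcement learning loop behind such a swimmer is, at its core, tabular Q-learning over discretized local flow cues and steering actions. A minimal sketch, with the fluid simulation replaced by a placeholder environment step and all sizes chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n_states, n_actions = 4, 4                  # e.g. discretized local vorticity cue x 4 steering directions
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.99, 0.1

def env_step(state, action):
    """Placeholder for the fluid simulation: returns the next local-cue state and the altitude gain."""
    next_state = rng.integers(n_states)
    reward = rng.normal()                   # stand-in for vertical displacement over one step
    return next_state, reward

state = rng.integers(n_states)
for _ in range(10000):
    action = rng.integers(n_actions) if rng.random() < eps else int(Q[state].argmax())
    next_state, reward = env_step(state, action)
    # one-step Q-learning update towards the reward plus the discounted best future value
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
    state = next_state
```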
Comparison of echo state network output layer classification methods on noisy data
Title | Comparison of echo state network output layer classification methods on noisy data |
Authors | Ashley Prater |
Abstract | Echo state networks are a recently developed type of recurrent neural network where the internal layer is fixed with random weights, and only the output layer is trained on specific data. Echo state networks are increasingly being used to process spatiotemporal data in real-world settings, including speech recognition, event detection, and robot control. A strength of echo state networks is the simple method used to train the output layer - typically a collection of linear readout weights found using a least squares approach. Although straightforward to train and having a low computational cost to use, this method may not yield acceptable accuracy performance on noisy data. This study compares the performance of three echo state network output layer methods to perform classification on noisy data: using trained linear weights, using sparse trained linear weights, and using trained low-rank approximations of reservoir states. The methods are investigated experimentally on both synthetic and natural datasets. The experiments suggest that using regularized least squares to train linear output weights is superior on data with low noise, but using the low-rank approximations may significantly improve accuracy on datasets contaminated with higher noise levels. |
Tasks | Speech Recognition |
Published | 2017-03-13 |
URL | http://arxiv.org/abs/1703.04496v1 |
http://arxiv.org/pdf/1703.04496v1.pdf | |
PWC | https://paperswithcode.com/paper/comparison-of-echo-state-network-output-layer |
Repo | |
Framework | |
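Two of the compared readouts are straightforward to write down: regularized least-squares output weights, and a readout trained on a low-rank approximation of the collected reservoir states. A minimal numpy sketch, with shapes and the regularization constant chosen for illustration:

```python
import numpy as np

def train_readout(states, targets, reg=1e-2):
    """Regularized least-squares (ridge) readout: W = Y S^T (S S^T + reg I)^{-1}.

    states:  (n_reservoir, T) collected reservoir states
    targets: (n_classes,  T) one-hot class targets
    """
    n = states.shape[0]
    return targets @ states.T @ np.linalg.inv(states @ states.T + reg * np.eye(n))

def low_rank_states(states, rank):
    """Rank-r approximation of the reservoir-state matrix via truncated SVD (the noisier-data option)."""
    U, s, Vt = np.linalg.svd(states, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

# usage: W_out = train_readout(S_train, Y_train); predictions = W_out @ S_test
```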
Estimation of the volume of the left ventricle from MRI images using deep neural networks
Title | Estimation of the volume of the left ventricle from MRI images using deep neural networks |
Authors | Fangzhou Liao, Xi Chen, Xiaolin Hu, Sen Song |
Abstract | Segmenting human left ventricle (LV) in magnetic resonance imaging (MRI) images and calculating its volume are important for diagnosing cardiac diseases. In 2016, Kaggle organized a competition to estimate the volume of LV from MRI images. The dataset consisted of a large number of cases, but only provided systole and diastole volumes as labels. We designed a system based on neural networks to solve this problem. It began with a detector combined with a neural network classifier for detecting regions of interest (ROIs) containing LV chambers. Then a deep neural network named hypercolumns fully convolutional network was used to segment LV in ROIs. The 2D segmentation results were integrated across different images to estimate the volume. With ground-truth volume labels, this model was trained end-to-end. To improve the result, an additional dataset with only segmentation label was used. The model was trained alternately on these two datasets with different types of teaching signals. We also proposed a variance estimation method for the final prediction. Our algorithm ranked the 4th on the test set in this competition. |
Tasks | |
Published | 2017-02-13 |
URL | http://arxiv.org/abs/1702.03833v1 |
http://arxiv.org/pdf/1702.03833v1.pdf | |
PWC | https://paperswithcode.com/paper/estimation-of-the-volume-of-the-left |
Repo | |
Framework | |
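The final step, integrating per-slice 2D segmentations into a volume in millilitres, is simple enough to sketch directly; the pixel spacing and slice thickness below are illustrative values, and the detection and segmentation networks are assumed to have already produced the binary masks.

```python
import numpy as np

def lv_volume_ml(masks, pixel_spacing_mm, slice_thickness_mm):
    """Integrate 2D LV segmentation masks across short-axis slices into a volume estimate.

    masks: (n_slices, H, W) binary segmentations for one cardiac phase
    """
    pixel_area_mm2 = pixel_spacing_mm[0] * pixel_spacing_mm[1]
    slice_areas_mm2 = masks.reshape(len(masks), -1).sum(axis=1) * pixel_area_mm2
    volume_mm3 = slice_areas_mm2.sum() * slice_thickness_mm   # stack of slabs
    return volume_mm3 / 1000.0                                # mm^3 -> mL

# e.g. lv_volume_ml(seg_stack, pixel_spacing_mm=(1.4, 1.4), slice_thickness_mm=8.0)
```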
Riemannian stochastic quasi-Newton algorithm with variance reduction and its convergence analysis
Title | Riemannian stochastic quasi-Newton algorithm with variance reduction and its convergence analysis |
Authors | Hiroyuki Kasai, Hiroyuki Sato, Bamdev Mishra |
Abstract | Stochastic variance reduction algorithms have recently become popular for minimizing the average of a large, but finite number of loss functions. The present paper proposes a Riemannian stochastic quasi-Newton algorithm with variance reduction (R-SQN-VR). The key challenges of averaging, adding, and subtracting multiple gradients are addressed with notions of retraction and vector transport. We present convergence analyses of R-SQN-VR on both non-convex and retraction-convex functions under retraction and vector transport operators. The proposed algorithm is evaluated on the Karcher mean computation on the symmetric positive-definite manifold and the low-rank matrix completion on the Grassmann manifold. In all cases, the proposed algorithm outperforms the state-of-the-art Riemannian batch and stochastic gradient algorithms. |
Tasks | Low-Rank Matrix Completion, Matrix Completion |
Published | 2017-03-15 |
URL | http://arxiv.org/abs/1703.04890v3 |
http://arxiv.org/pdf/1703.04890v3.pdf | |
PWC | https://paperswithcode.com/paper/riemannian-stochastic-quasi-newton-algorithm |
Repo | |
Framework | |
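Only the variance-reduction part of R-SQN-VR is sketched below, on the unit sphere as a stand-in manifold, using the normalization retraction and projection-based vector transport; the quasi-Newton direction is omitted, so this is an R-SVRG-style epoch rather than the paper's full algorithm.

```python
import numpy as np

def retract(x, v):
    """Sphere retraction: step along the tangent vector, then renormalize onto the manifold."""
    y = x + v
    return y / np.linalg.norm(y)

def project(x, g):
    """Project a Euclidean gradient onto the tangent space of the sphere at x."""
    return g - np.dot(g, x) * x

def transport(y, v):
    """Projection-based vector transport of v into the tangent space at y."""
    return v - np.dot(v, y) * y

def rsvrg_epoch(x, grads, lr=0.05):
    """One variance-reduced epoch: each stochastic gradient is corrected by the transported
    difference between the reference stochastic gradient and the full reference gradient.

    grads: list of callables, grads[i](x) = Euclidean gradient of loss term i at x.
    """
    x_ref = x.copy()
    full = project(x_ref, np.mean([g(x_ref) for g in grads], axis=0))
    for i in np.random.permutation(len(grads)):
        correction = transport(x, project(x_ref, grads[i](x_ref)) - full)
        vr_grad = project(x, grads[i](x)) - correction
        x = retract(x, -lr * vr_grad)
    return x
```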
Feed Forward and Backward Run in Deep Convolution Neural Network
Title | Feed Forward and Backward Run in Deep Convolution Neural Network |
Authors | Pushparaja Murugan |
Abstract | Convolutional Neural Networks (CNNs), known as ConvNets, are widely used in many visual imagery applications such as object classification and speech recognition. After the implementation and demonstration of a deep convolutional neural network for ImageNet classification in 2012 by Krizhevsky, the architecture of deep Convolutional Neural Networks has attracted many researchers. This has led to major developments in deep learning frameworks such as TensorFlow, Caffe, Keras, and Theano. Though the implementation of deep learning is quite possible by employing deep learning frameworks, the mathematical theory and concepts are harder to understand for new learners and practitioners. This article is intended to provide an overview of the ConvNet architecture and to explain the mathematical theory behind it, including the activation function, loss function, and feedforward and backward propagation. In this article, a greyscale image is taken as the input, ReLU and Sigmoid activation functions are used for developing the architecture, and a cross-entropy loss function is used for computing the difference between the predicted value and the actual value. The architecture is developed in such a way that it can contain one convolution layer, one pooling layer, and multiple dense layers. |
Tasks | Object Classification, Speech Recognition |
Published | 2017-11-09 |
URL | http://arxiv.org/abs/1711.03278v1 |
http://arxiv.org/pdf/1711.03278v1.pdf | |
PWC | https://paperswithcode.com/paper/feed-forward-and-backward-run-in-deep |
Repo | |
Framework | |
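The feedforward pass the article walks through (convolution, ReLU, max-pooling, a dense layer, a sigmoid output, and the cross-entropy loss) can be written out in plain numpy. A compact sketch, with single-channel input and a single kernel for readability; backpropagation is left out here.

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2D convolution (cross-correlation) of a greyscale image with one kernel."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(x, size=2):
    """Non-overlapping max pooling; trailing rows/columns that do not fit are dropped."""
    H, W = (x.shape[0] // size) * size, (x.shape[1] // size) * size
    return x[:H, :W].reshape(H // size, size, W // size, size).max(axis=(1, 3))

def forward(img, kernel, W_dense, b_dense):
    """conv -> ReLU -> max-pool -> dense -> sigmoid, as in the article's running example."""
    a = np.maximum(conv2d(img, kernel), 0.0)          # ReLU activation
    p = max_pool(a).ravel()
    z = W_dense @ p + b_dense
    return 1.0 / (1.0 + np.exp(-z))                   # sigmoid output

def cross_entropy(y_pred, y_true, eps=1e-9):
    """Binary cross-entropy between predicted probabilities and targets."""
    return -np.mean(y_true * np.log(y_pred + eps) + (1 - y_true) * np.log(1 - y_pred + eps))
```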
Leak Event Identification in Water Systems Using High Order CRF
Title | Leak Event Identification in Water Systems Using High Order CRF |
Authors | Qing Han, Wentao Zhu, Yang Shi |
Abstract | Today, detection of anomalous events in civil infrastructures (e.g. water pipe breaks and leaks) is time consuming and often takes hours or days. Pipe breakage, as one of the most frequent types of failure of water networks, often causes community disruptions ranging from temporary interruptions in services to extended loss of business and relocation of residents. In this project, we design and implement a two-phase approach for leak event identification, which leverages dynamic data from multiple information sources including IoT sensing data (pressure values and/or flow rates), geophysical data (water systems), and human inputs (tweets posted on Twitter). In the approach, a high order Conditional Random Field (CRF) is constructed that enforces predictions based on IoT observations to be consistent with human inputs, improving the performance of event identification. Considering the physical water network as a graph, a CRF model is built and learned by the Structured Support Vector Machine (SSVM) using node features such as water pressure and flow rate. After that, we build the high order CRF system by incorporating Twitter leak-detection information. An optimal inference algorithm is proposed for the adapted high order CRF model. Experimental results show the effectiveness of our system. |
Tasks | |
Published | 2017-03-12 |
URL | http://arxiv.org/abs/1703.04170v1 |
http://arxiv.org/pdf/1703.04170v1.pdf | |
PWC | https://paperswithcode.com/paper/leak-event-identification-in-water-systems |
Repo | |
Framework | |
Hard Mixtures of Experts for Large Scale Weakly Supervised Vision
Title | Hard Mixtures of Experts for Large Scale Weakly Supervised Vision |
Authors | Sam Gross, Marc’Aurelio Ranzato, Arthur Szlam |
Abstract | Training convolutional networks (CNNs) that fit on a single GPU with minibatch stochastic gradient descent has become effective in practice. However, there is still no effective method for training large CNNs that do not fit in the memory of a few GPU cards, or for parallelizing CNN training. In this work we show that a simple hard mixture of experts model can be efficiently trained to good effect on large scale hashtag (multilabel) prediction tasks. Mixture of experts models are not new (Jacobs et al. 1991, Collobert et al. 2003), but in the past, researchers have had to devise sophisticated methods to deal with data fragmentation. We show empirically that modern weakly supervised data sets are large enough to support naive partitioning schemes where each data point is assigned to a single expert. Because the experts are independent, training them in parallel is easy, and evaluation is cheap for the size of the model. Furthermore, we show that we can use a single decoding layer for all the experts, allowing a unified feature embedding space. We demonstrate that it is feasible (and in fact relatively painless) to train far larger models than could be practically trained with standard CNN architectures, and that the extra capacity can be well used on current datasets. |
Tasks | |
Published | 2017-04-20 |
URL | http://arxiv.org/abs/1704.06363v1 |
http://arxiv.org/pdf/1704.06363v1.pdf | |
PWC | https://paperswithcode.com/paper/hard-mixtures-of-experts-for-large-scale |
Repo | |
Framework | |
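The naive hard partitioning that the abstract argues is sufficient can be sketched in a few lines: cluster trunk features, send each example to exactly one expert, and train the experts independently. The k-means gater and the callback interfaces below are illustrative assumptions rather than the paper's exact recipe.

```python
import numpy as np
from sklearn.cluster import KMeans

def partition_and_train(features, labels, n_experts, make_expert, fit_expert):
    """Hard mixture of experts: assign each example to exactly one expert, train experts independently.

    features:    (N, D) embeddings from a shared trunk
    make_expert: () -> fresh expert model
    fit_expert:  (expert, X, y) -> trained expert
    """
    assign = KMeans(n_clusters=n_experts, n_init=10).fit_predict(features)   # hard gating
    experts = []
    for e in range(n_experts):
        idx = np.where(assign == e)[0]
        experts.append(fit_expert(make_expert(), features[idx], labels[idx]))  # embarrassingly parallel
    return experts, assign
```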
Toward Crowd-Sensitive Path Planning
Title | Toward Crowd-Sensitive Path Planning |
Authors | Anoop Aroor, Susan L. Epstein |
Abstract | If a robot can predict crowds in parts of its environment that are inaccessible to its sensors, then it can plan to avoid them. This paper proposes a fast, online algorithm that learns average crowd densities in different areas. It also describes how these densities can be incorporated into existing navigation architectures. In simulation across multiple challenging crowd scenarios, the robot reaches its target faster, travels less, and risks fewer collisions than if it were to plan with the traditional A* algorithm. |
Tasks | |
Published | 2017-10-16 |
URL | http://arxiv.org/abs/1710.05503v1 |
http://arxiv.org/pdf/1710.05503v1.pdf | |
PWC | https://paperswithcode.com/paper/toward-crowd-sensitive-path-planning |
Repo | |
Framework | |
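The core of the method is an online running average of crowd observations per region, which is then folded into the planner's edge costs. A minimal sketch over a grid map, with the cost weighting chosen for illustration:

```python
import numpy as np

class CrowdDensityMap:
    """Online running average of observed crowd counts per grid cell (illustrative sketch)."""
    def __init__(self, shape):
        self.mean = np.zeros(shape)
        self.count = np.zeros(shape, dtype=int)

    def update(self, cell, observed_people):
        """Incremental mean update for one sensed cell."""
        self.count[cell] += 1
        self.mean[cell] += (observed_people - self.mean[cell]) / self.count[cell]

    def cost(self, cell, base_cost=1.0, weight=2.0):
        """Edge cost handed to the planner: distance plus a crowd penalty."""
        return base_cost + weight * self.mean[cell]

# A* then expands neighbours using densities.cost(cell) instead of the plain unit step cost.
```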
Ranking Median Regression: Learning to Order through Local Consensus
Title | Ranking Median Regression: Learning to Order through Local Consensus |
Authors | Stephan Clémençon, Anna Korba, Eric Sibony |
Abstract | This article is devoted to the problem of predicting the value taken by a random permutation $\Sigma$, describing the preferences of an individual over a set of numbered items $\{1, \ldots, n\}$ say, based on the observation of an input/explanatory r.v. $X$ (e.g. characteristics of the individual), when error is measured by the Kendall $\tau$ distance. In the probabilistic formulation of the ‘Learning to Order’ problem we propose, which extends the framework for statistical Kemeny ranking aggregation developed in \citet{CKS17}, this boils down to recovering conditional Kemeny medians of $\Sigma$ given $X$ from i.i.d. training examples $(X_1, \Sigma_1), \ldots, (X_N, \Sigma_N)$. For this reason, this statistical learning problem is referred to as \textit{ranking median regression} here. Our contribution is twofold. We first propose a probabilistic theory of ranking median regression: the set of optimal elements is characterized, the performance of empirical risk minimizers is investigated in this context, and situations where fast learning rates can be achieved are also exhibited. Next we introduce the concept of local consensus/median, in order to derive efficient methods for ranking median regression. The major advantage of this local learning approach lies in its close connection with the widely studied Kemeny aggregation problem. From an algorithmic perspective, this permits building predictive rules for ranking median regression by implementing efficient techniques for (approximate) Kemeny median computations at a local level in a tractable manner. In particular, versions of $k$-nearest neighbor and tree-based methods, tailored to ranking median regression, are investigated. Accuracy of piecewise constant ranking median regression rules is studied under a specific smoothness assumption for $\Sigma$'s conditional distribution given $X$. |
Tasks | |
Published | 2017-10-31 |
URL | http://arxiv.org/abs/1711.00070v2 |
http://arxiv.org/pdf/1711.00070v2.pdf | |
PWC | https://paperswithcode.com/paper/ranking-median-regression-learning-to-order |
Repo | |
Framework | |
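Two ingredients of the abstract are easy to make concrete: the Kendall tau distance between rankings and the (here brute-force) Kemeny median used for local consensus. In the sketch below a ranking is a sequence with sigma[i] giving the rank of item i; the exhaustive median is only meant for toy-sized n.

```python
from itertools import combinations, permutations

def kendall_tau(sigma, pi):
    """Number of item pairs on which two rankings disagree (sigma[i] = rank of item i)."""
    items = range(len(sigma))
    return sum(1 for i, j in combinations(items, 2)
               if (sigma[i] - sigma[j]) * (pi[i] - pi[j]) < 0)

def kemeny_median(rankings):
    """Brute-force Kemeny median over all permutations (only sensible for very small n)."""
    n = len(rankings[0])
    candidates = list(permutations(range(n)))
    return min(candidates, key=lambda c: sum(kendall_tau(c, r) for r in rankings))

# local-consensus prediction: Kemeny median of the rankings observed at the k nearest neighbours of x
```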
Development of a N-type GM-PHD Filter for Multiple Target, Multiple Type Visual Tracking
Title | Development of a N-type GM-PHD Filter for Multiple Target, Multiple Type Visual Tracking |
Authors | Nathanael L. Baisa, Andrew Wallace |
Abstract | We propose a new framework that extends the standard Probability Hypothesis Density (PHD) filter for multiple targets having $N\geq2$ different types based on Random Finite Set theory, taking into account not only background clutter, but also confusions among detections of different target types, which are in general different in character from background clutter. Under Gaussianity and linearity assumptions, our framework extends the existing Gaussian mixture (GM) implementation of the standard PHD filter to create an N-type GM-PHD filter. The methodology is applied to real video sequences by integrating object detectors’ information into this filter for two scenarios. For both cases, Munkres’s variant of the Hungarian assignment algorithm is used to associate tracked target identities between frames. This approach is evaluated and compared to both raw detection and independent GM-PHD filters using the Optimal Sub-pattern Assignment metric and discrimination rate. This shows the improved performance of our strategy on real video sequences. |
Tasks | Visual Tracking |
Published | 2017-05-31 |
URL | http://arxiv.org/abs/1706.00672v5 |
http://arxiv.org/pdf/1706.00672v5.pdf | |
PWC | https://paperswithcode.com/paper/development-of-a-n-type-gm-phd-filter-for |
Repo | |
Framework | |
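The identity-association step mentioned in the abstract (Munkres's variant of the Hungarian algorithm) has a standard SciPy implementation. A small sketch using Euclidean distance between position estimates as the assignment cost, with the gating threshold as an illustrative assumption:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate_identities(prev_states, curr_states, gate=50.0):
    """Associate filter estimates across frames with the Hungarian algorithm on Euclidean distance.

    prev_states, curr_states: (M, 2) and (K, 2) target position estimates
    Returns matched (prev_idx, curr_idx) pairs whose distance passes the gate.
    """
    cost = np.linalg.norm(prev_states[:, None, :] - curr_states[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)        # minimum-cost one-to-one assignment
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] < gate]
```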
Image Restoration from Patch-based Compressed Sensing Measurement
Title | Image Restoration from Patch-based Compressed Sensing Measurement |
Authors | Guangtao Nie, Ying Fu, Yinqiang Zheng, Hua Huang |
Abstract | A series of methods have been proposed to reconstruct an image from compressively sensed random measurement, but most of them have high time complexity and are inappropriate for patch-based compressed sensing capture, because of their serious blocky artifacts in the restoration results. In this paper, we present a non-iterative image reconstruction method from patch-based compressively sensed random measurement. Our method features two cascaded networks based on residual convolution neural network to learn the end-to-end full image restoration, which is capable of reconstructing image patches and removing the blocky effect with low time cost. Experimental results on synthetic and real data show that our method outperforms state-of-the-art compressive sensing (CS) reconstruction methods with patch-based CS measurement. To demonstrate the effectiveness of our method in more general setting, we apply the de-block process in our method to JPEG compression artifacts removal and achieve outstanding performance as well. |
Tasks | Compressive Sensing, Image Reconstruction, Image Restoration |
Published | 2017-06-02 |
URL | http://arxiv.org/abs/1706.00597v1 |
http://arxiv.org/pdf/1706.00597v1.pdf | |
PWC | https://paperswithcode.com/paper/image-restoration-from-patch-based-compressed |
Repo | |
Framework | |
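Patch-based compressed sensing capture, the measurement model this paper reconstructs from, amounts to applying one random sensing matrix to every non-overlapping patch. A numpy sketch, with the patch size and sampling ratio as illustrative choices:

```python
import numpy as np

def patch_measurements(image, patch=32, ratio=0.1, seed=0):
    """Apply one shared random Gaussian sensing matrix to every non-overlapping patch.

    Returns the (n_patches, m) compressive measurements and the sensing matrix Phi.
    """
    rng = np.random.default_rng(seed)
    n = patch * patch
    m = int(ratio * n)
    Phi = rng.normal(size=(m, n)) / np.sqrt(m)
    H, W = (image.shape[0] // patch) * patch, (image.shape[1] // patch) * patch
    ys = []
    for i in range(0, H, patch):
        for j in range(0, W, patch):
            ys.append(Phi @ image[i:i + patch, j:j + patch].ravel())
    return np.stack(ys), Phi
```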