October 16, 2019


Paper Group ANR 1089


Learning Representative Temporal Features for Action Recognition


Title	Learning Representative Temporal Features for Action Recognition
Authors	Ali Javidani, Ahmad Mahmoudi-Aznaveh
Abstract	In this paper, a novel video classification methodology is presented that aims to recognize different categories of third-person videos efficiently. The idea is to keep track of motion in videos by following optical flow elements over time. To classify the resulting motion time series efficiently, the machine is left to learn temporal features along the time dimension. This is done by training a multi-channel one-dimensional Convolutional Neural Network (1D-CNN). Since CNNs represent the input data hierarchically, high-level features are obtained by further processing the features of lower-level layers. As a result, in the case of time series, long-term temporal features are extracted from short-term ones. Moreover, the advantage of the proposed method over most deep-learning-based approaches is that it only tries to learn representative temporal features along the time dimension. This significantly reduces the number of learnable parameters, which makes the method trainable even on smaller datasets. It is shown that the proposed method can reach state-of-the-art results on two public datasets, UCF11 and jHMDB, with the aid of a more efficient feature vector representation.
Tasks	Optical Flow Estimation, Temporal Action Localization, Time Series, Video Classification
Published	2018-02-19
URL	http://arxiv.org/abs/1802.06724v2
PDF	http://arxiv.org/pdf/1802.06724v2.pdf
PWC	https://paperswithcode.com/paper/learning-representative-temporal-features-for
Repo
Framework
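
The hierarchy described in the abstract above (short-term features from a first 1D convolution, longer-term features from a second one stacked on top) can be sketched in a few lines. This is only an illustration of a multi-channel 1D convolution, not the authors' network: the two flow channels, the kernel values, and the helper names conv1d_multichannel and relu are all hypothetical.

```python
def conv1d_multichannel(x, kernels, bias=0.0):
    """Valid 1D cross-correlation of a multi-channel series with one filter.

    x       : list of channels, each a list of T floats
    kernels : one kernel per channel, each a list of K floats
    returns : list of T - K + 1 floats (contributions summed over channels)
    """
    n_steps = len(x[0])
    k_len = len(kernels[0])
    out = []
    for t in range(n_steps - k_len + 1):
        s = bias
        for channel, kernel in zip(x, kernels):
            for k in range(k_len):
                s += channel[t + k] * kernel[k]
        out.append(s)
    return out


def relu(values):
    return [max(0.0, v) for v in values]


# Hypothetical two-channel motion series (e.g. horizontal and vertical flow).
flow = [[0.0, 1.0, 2.0, 1.0, 0.0, -1.0],
        [1.0, 1.0, 0.0, 0.0, 1.0, 1.0]]
kernels = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]  # illustrative weights only

# Layer 1 sees 3 time steps at once: short-term features.
short_term = relu(conv1d_multichannel(flow, kernels))
# Layer 2 sees 2 outputs of layer 1, i.e. 4 input steps: longer-term features.
long_term = relu(conv1d_multichannel([short_term], [[1.0, 1.0]]))
```

Because the filters slide only along the time axis, each layer needs just channels × kernel length weights per filter, which is consistent with the abstract's point about a small parameter count.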

Generative Adversarial User Model for Reinforcement Learning Based Recommendation System


Title	Generative Adversarial User Model for Reinforcement Learning Based Recommendation System
Authors	Xinshi Chen, Shuang Li, Hui Li, Shaohua Jiang, Yuan Qi, Le Song
Abstract	There is great interest, as well as many challenges, in applying reinforcement learning (RL) to recommendation systems. In this setting, an online user is the environment; neither the reward function nor the environment dynamics are clearly defined, making the application of RL challenging. In this paper, we propose a novel model-based reinforcement learning framework for recommendation systems, where we develop a generative adversarial network to imitate user behavior dynamics and learn the user's reward function. Using this user model as the simulation environment, we develop a novel Cascading DQN algorithm to obtain a combinatorial recommendation policy which can handle a large number of candidate items efficiently. In our experiments with real data, we show this generative adversarial user model can better explain user behavior than alternatives, and the RL policy based on this model can lead to a better long-term reward for the user and higher click rate for the system.
Tasks	Recommendation Systems
Published	2018-12-27
URL	https://arxiv.org/abs/1812.10613v3
PDF	https://arxiv.org/pdf/1812.10613v3.pdf
PWC	https://paperswithcode.com/paper/generative-adversarial-user-model-for
Repo
Framework

Large scale visual place recognition with sub-linear storage growth


Title	Large scale visual place recognition with sub-linear storage growth
Authors	Huu Le, Michael Milford
Abstract	Robotic and animal mapping systems share many of the same objectives and challenges, but differ in one key aspect: where much of the research in robotic mapping has focused on solving the data association problem, the grid cell neurons underlying maps in the mammalian brain appear to intentionally break data association by encoding many locations with a single grid cell neuron. One potential benefit of this intentional aliasing is sub-linear growth of both map storage and computational requirements with environment size, which we demonstrated in a previous proof-of-concept study that detected and encoded mutually complementary co-prime pattern frequencies in the visual map data. In this research, we solve several of the key theoretical and practical limitations of that prototype model and achieve significantly better sub-linear storage growth, a factor reduction in storage requirements per map location, scalability to large datasets on standard compute equipment and improved robustness to environments with visually challenging appearance change. These improvements are achieved through several innovations including a flexible user-driven choice mechanism for the periodic patterns underlying the new encoding method, a parallelized chunking technique that splits the map into sub-sections processed in parallel and a novel feature selection approach that selects only the image information most relevant to the encoded temporal patterns. We evaluate our techniques on two large benchmark datasets in comparison with the previous state-of-the-art system, and provide a detailed analysis of system performance with respect to parameters such as required precision performance and the number of cyclic patterns encoded.
Tasks	Chunking, Feature Selection, Visual Place Recognition
Published	2018-10-23
URL	http://arxiv.org/abs/1810.09660v1
PDF	http://arxiv.org/pdf/1810.09660v1.pdf
PWC	https://paperswithcode.com/paper/large-scale-visual-place-recognition-with-sub
Repo
Framework
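
The sub-linear storage idea behind co-prime cyclic patterns can be illustrated with plain modular arithmetic. The sketch below is not the authors' encoding of visual data; it only shows why a few short periodic patterns with pairwise co-prime lengths can distinguish many more locations than the number of pattern entries stored. The period values and the encode/decode helpers are hypothetical.

```python
from itertools import combinations
from math import gcd, prod


def encode(location, periods):
    """Keep only the phase of the location within each cyclic pattern."""
    return tuple(location % p for p in periods)


def decode(code, periods):
    """Recover the unique location whose phases match (brute force for clarity)."""
    for location in range(prod(periods)):
        if encode(location, periods) == code:
            return location
    raise ValueError("no location matches this code")


periods = (3, 5, 7)  # hypothetical pattern lengths, pairwise co-prime
assert all(gcd(a, b) == 1 for a, b in combinations(periods, 2))

capacity = prod(periods)  # distinct locations that can be told apart: 105
storage = sum(periods)    # pattern entries that must be stored: 15
recovered = decode(encode(52, periods), periods)
```

Each individual pattern aliases many locations (every third, fifth, or seventh place looks the same to it), yet the combination of phases is unique, so storage grows with the sum of the periods while coverage grows with their product.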

Clinical Parameters Prediction for Gait Disorder Recognition


Title	Clinical Parameters Prediction for Gait Disorder Recognition
Authors	Soheil Esmaeilzadeh, Ouassim Khebzegga, Mehrad Moradshahi
Abstract	Being able to predict clinical parameters in order to diagnose gait disorders in a patient is of great value in planning treatments. It is known that decision parameters such as cadence, step length, and walking speed are critical in the diagnosis of gait disorders in patients. This project aims to predict the decision parameters in two ways and then to advise on whether a patient needs treatment. In the first, we use clinically measured parameters such as ankle dorsiflexion, age, walking speed, step length, stride length, and weight over height squared (BMI) to predict the decision parameters. In the second, we use videos recorded from patients' walking tests in a clinic to extract the coordinates of the patient's joints over time and predict the decision parameters. Finally, given the decision parameters, we pre-classify the intensity of a patient's gait disorder and, as a result, decide whether the patient needs treatment.
Tasks
Published	2018-05-22
URL	http://arxiv.org/abs/1806.04627v1
PDF	http://arxiv.org/pdf/1806.04627v1.pdf
PWC	https://paperswithcode.com/paper/clinical-parameters-prediction-for-gait
Repo
Framework

A Deep Ranking Model for Spatio-Temporal Highlight Detection from a 360 Video


Title	A Deep Ranking Model for Spatio-Temporal Highlight Detection from a 360 Video
Authors	Youngjae Yu, Sangho Lee, Joonil Na, Jaeyun Kang, Gunhee Kim
Abstract	We address the problem of highlight detection from a 360 degree video by summarizing it both spatially and temporally. Given a long 360 degree video, we spatially select pleasant-looking normal field-of-view (NFOV) segments from the unlimited fields of view (FOV) of the 360 degree video, and temporally summarize it into a concise and informative highlight as a selected subset of subshots. We propose a novel deep ranking model named the Composition View Score (CVS) model, which produces a spherical score map of composition per video segment, and determines which view is suitable for a highlight via a sliding window kernel at inference. To evaluate the proposed framework, we perform experiments on the Pano2Vid benchmark dataset and our newly collected 360 degree video highlight dataset from YouTube and Vimeo. Through evaluation using both quantitative summarization metrics and user studies via Amazon Mechanical Turk, we demonstrate that our approach outperforms several state-of-the-art highlight detection methods. We also show that our model is 16 times faster at inference than AutoCam, which is one of the first summarization algorithms for 360 degree videos.
Tasks
Published	2018-01-31
URL	http://arxiv.org/abs/1801.10312v1
PDF	http://arxiv.org/pdf/1801.10312v1.pdf
PWC	https://paperswithcode.com/paper/a-deep-ranking-model-for-spatio-temporal
Repo
Framework

LMNet: Real-time Multiclass Object Detection on CPU using 3D LiDAR


Title	LMNet: Real-time Multiclass Object Detection on CPU using 3D LiDAR
Authors	Kazuki Minemura, Hengfui Liau, Abraham Monrroy, Shinpei Kato
Abstract	This paper describes an optimized single-stage deep convolutional neural network to detect objects in urban environments, using nothing more than point cloud data. This feature enables our method to work regardless of the time of day and the lighting conditions. The proposed network structure employs dilated convolutions to gradually increase the perceptive field as depth increases; this helps to reduce the computation time by about 30%. The network input consists of five perspective representations of the unorganized point cloud data. The network outputs an objectness map and the bounding box offset values for each point. Our experiments showed that using reflection, range, and the position on each of the three axes helped to improve the location and orientation of the output bounding box. We carried out quantitative evaluations with the help of the KITTI dataset evaluation server. It achieved the fastest processing speed among the contenders, making it suitable for real-time applications. We implemented and tested it on a real vehicle with a Velodyne HDL-64 mounted on top of it. We achieved execution times as fast as 50 FPS using desktop GPUs, and up to 10 FPS on a single Intel Core i5 CPU. The deployed implementation is open-sourced and can be found as a feature branch inside the autonomous driving framework Autoware. Code is available at: https://github.com/CPFL/Autoware/tree/feature/cnn_lidar_detection
Tasks	Autonomous Driving, Object Detection
Published	2018-05-13
URL	http://arxiv.org/abs/1805.04902v2
PDF	http://arxiv.org/pdf/1805.04902v2.pdf
PWC	https://paperswithcode.com/paper/lmnet-real-time-multiclass-object-detection
Repo
Framework
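
The effect of dilation on the perceptive (receptive) field mentioned in the abstract above can be checked with simple arithmetic. For stride-1 convolutions, each layer with kernel size k and dilation d widens the field by (k - 1) * d. The layer counts and dilation rates below are hypothetical and are not taken from LMNet; the sketch only shows the general relationship.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (along one axis) of stacked stride-1 convolutions."""
    field = 1
    for d in dilations:
        field += (kernel_size - 1) * d
    return field


# Three 3-wide layers, hypothetical dilation schedules.
plain = receptive_field(3, [1, 1, 1])    # 7
dilated = receptive_field(3, [1, 2, 4])  # 15
```

Both stacks use the same number of layers and weights, but the dilated one covers roughly twice the context, which is why dilation can stand in for extra layers or larger kernels.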

Between collective intelligence and semantic web : hypermediating sites. Contribution to technologies of intelligence


Title	Between collective intelligence and semantic web : hypermediating sites. Contribution to technologies of intelligence
Authors	Lise Verlaet, Sidonie Gallot
Abstract	In this paper we present a new form of access to knowledge through what we call “hypermediator websites”. These hypermediator sites are intermediate between information devices that just scan the book culture and a “real” hypertext writing format.
Tasks
Published	2018-01-08
URL	http://arxiv.org/abs/1801.03003v1
PDF	http://arxiv.org/pdf/1801.03003v1.pdf
PWC	https://paperswithcode.com/paper/between-collective-intelligence-and-semantic
Repo
Framework

Learning to Optimize under Non-Stationarity


Title	Learning to Optimize under Non-Stationarity
Authors	Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu
Abstract	We introduce algorithms that achieve state-of-the-art dynamic regret bounds for the non-stationary linear stochastic bandit setting. This setting captures natural applications such as dynamic pricing and ads allocation in a changing environment. We show how the difficulty posed by the non-stationarity can be overcome by a novel marriage between stochastic and adversarial bandit learning algorithms. Defining $d$, $B_T$, and $T$ as the problem dimension, the variation budget, and the total time horizon, respectively, our main contributions are the tuned Sliding Window UCB (SW-UCB) algorithm with optimal $\widetilde{O}(d^{2/3}(B_T+1)^{1/3}T^{2/3})$ dynamic regret, and the tuning-free bandit-over-bandit (BOB) framework built on top of the SW-UCB algorithm with best $\widetilde{O}(d^{2/3}(B_T+1)^{1/4}T^{3/4})$ dynamic regret.
Tasks
Published	2018-10-06
URL	http://arxiv.org/abs/1810.03024v5
PDF	http://arxiv.org/pdf/1810.03024v5.pdf
PWC	https://paperswithcode.com/paper/learning-to-optimize-under-non-stationarity
Repo
Framework
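
The sliding-window idea named in the abstract above can be sketched for the simpler K-armed case. Note the substitution: the paper treats linear bandits and tunes the window from $B_T$, whereas this sketch is a plain K-armed UCB that forgets observations older than a hand-picked window. The function names, the window size, and the toy environment are all assumptions made for illustration.

```python
import math
import random
from collections import deque


def sw_ucb(reward_fn, n_arms, horizon, window, seed=0):
    """K-armed UCB that only uses the last `window` observations."""
    rng = random.Random(seed)
    history = deque(maxlen=window)  # (arm, reward) pairs inside the window
    pulls = []
    for t in range(horizon):
        counts = [0] * n_arms
        sums = [0.0] * n_arms
        for arm, reward in history:
            counts[arm] += 1
            sums[arm] += reward
        untried = [a for a in range(n_arms) if counts[a] == 0]
        if untried:
            # An arm with no data in the window must be re-explored.
            choice = untried[0]
        else:
            n = len(history)
            choice = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(n) / counts[a]),
            )
        history.append((choice, reward_fn(t, choice, rng)))
        pulls.append(choice)
    return pulls


def switching_rewards(t, arm, rng):
    """Toy non-stationary environment: the best arm changes at t = 500."""
    best = 0 if t < 500 else 1
    return 1.0 if arm == best else 0.0


pulls = sw_ucb(switching_rewards, n_arms=2, horizon=1000, window=100)
```

Because stale rewards drop out of the window, the estimate for the formerly best arm decays after the change and the policy moves to the new best arm; a standard UCB with unbounded memory would take much longer to do so.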

Predicting clinical significance of BRCA1 and BRCA2 single nucleotide substitution variants with unknown clinical significance using probabilistic neural network and deep neural network-stacked autoencoder


Title	Predicting clinical significance of BRCA1 and BRCA2 single nucleotide substitution variants with unknown clinical significance using probabilistic neural network and deep neural network-stacked autoencoder
Authors	Ehsan Rahmatizad KhajePasha, Mahdi Bazarghan, Hamidreza Kheiri Manjili, Ramin Mohammadkhani, Ruhallah Amandi
Abstract	Non-synonymous single nucleotide polymorphisms (nsSNPs) are single nucleotide substitutions that occur in the coding region of a gene and lead to a change in the amino-acid sequence of the protein. Studies have shown that these variations may be associated with disease. Thus, investigating the effects of nsSNPs on protein function will give greater insight into how nsSNPs can lead to disease. Breast cancer is the most common cancer among women and causes the highest number of cancer deaths every year. The BRCA1 and BRCA2 tumor suppressor genes are two main candidates in which mutations can increase the risk of developing breast cancer. For prediction and detection of the cancer one can use experimental or computational methods, but the experimental method is very costly and time-consuming in comparison with the computational method. Computers and computational methods have been used for more than 30 years. Here we try to predict the clinical significance of BRCA1 and BRCA2 nsSNPs, including those of unknown clinical significance. Nearly 500 BRCA1 and BRCA2 nsSNPs with known clinical significance were retrieved from the NCBI database. Based on hydrophobicity or hydrophilicity and their role in the proteins' secondary structure, they are divided into 6 groups, each assigned a score. The data are prepared in a form acceptable to the automated prediction mechanisms, a Probabilistic Neural Network (PNN) and a Deep Neural Network-Stacked AutoEncoder (DNN). With jackknife cross-validation we show that the prediction accuracies achieved for BRCA1 and BRCA2 using the PNN are 87.97% and 82.17% respectively, while accuracies of 95.41% and 92.80% are achieved using the DNN. The total processing time required for training and testing the PNN is 0.9 seconds, while the DNN requires about 7 hours of training and can then predict instantly. Both methods show great improvement in accuracy and speed compared to previous attempts.
Tasks
Published	2018-05-06
URL	http://arxiv.org/abs/1805.02176v1
PDF	http://arxiv.org/pdf/1805.02176v1.pdf
PWC	https://paperswithcode.com/paper/predicting-clinical-significance-of-brca1-and
Repo
Framework

A Theoretical Investigation of Graph Degree as an Unsupervised Normality Measure


Title	A Theoretical Investigation of Graph Degree as an Unsupervised Normality Measure
Authors	Caglar Aytekin, Francesco Cricri, Lixin Fan, Emre Aksu
Abstract	For a graph representation of a dataset, a straightforward normality measure for a sample can be its graph degree. Considering a weighted graph, the degree of a sample is the sum of the corresponding row's values in a similarity matrix. The measure is intuitive given that abnormal samples are usually rare and dissimilar to the rest of the data. In order to gain an in-depth theoretical understanding, in this manuscript we investigate the graph degree from spectral graph clustering based and kernel-based points of view and draw connections to a recent kernel method for the two-sample problem. We show that our analyses guide us to choose fully-connected graphs whose edge weights are calculated via universal kernels. We show that a simple graph degree based unsupervised anomaly detection method with the above properties achieves higher accuracy than other unsupervised anomaly detection methods on average over 10 widely used datasets. We also provide an extensive analysis of the effect of the kernel parameter on the method's accuracy.
Tasks	Anomaly Detection, Graph Clustering, Spectral Graph Clustering, Unsupervised Anomaly Detection
Published	2018-01-24
URL	http://arxiv.org/abs/1801.07889v3
PDF	http://arxiv.org/pdf/1801.07889v3.pdf
PWC	https://paperswithcode.com/paper/a-theoretical-investigation-of-graph-degree
Repo
Framework
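
The degree-based normality measure described in the abstract above is easy to sketch: build a fully-connected similarity graph and sum each row. The example below assumes a Gaussian (RBF) kernel as the universal kernel and a hand-picked bandwidth; the data points, the sigma value, and the function names are hypothetical and not taken from the paper.

```python
import math


def gaussian_kernel(a, b, sigma):
    """RBF similarity between two points given as sequences of floats."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def degree_scores(points, sigma=1.0):
    """Degree of each sample: the row sum of the full similarity matrix."""
    return [
        sum(gaussian_kernel(p, q, sigma) for q in points)
        for p in points
    ]


# Four points near the origin and one far away (hypothetical data).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (0.2, 0.1), (5.0, 5.0)]
scores = degree_scores(points, sigma=1.0)
most_abnormal = min(range(len(points)), key=lambda i: scores[i])
```

A low degree means the sample is dissimilar to most of the data, so the point at (5.0, 5.0) receives the lowest score here. The choice of sigma matters: a very large bandwidth makes every sample look alike, which is the kind of sensitivity the abstract says the paper analyzes.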

Sequential Test for the Lowest Mean: From Thompson to Murphy Sampling


Title	Sequential Test for the Lowest Mean: From Thompson to Murphy Sampling
Authors	Emilie Kaufmann, Wouter Koolen, Aurelien Garivier
Abstract	Learning the minimum/maximum mean among a finite set of distributions is a fundamental sub-task in planning, game tree search and reinforcement learning. We formalize this learning task as the problem of sequentially testing how the minimum mean among a finite set of distributions compares to a given threshold. We develop refined non-asymptotic lower bounds, which show that optimality mandates very different sampling behavior for a low vs high true minimum. We show that Thompson Sampling and the intuitive Lower Confidence Bounds policy each nail only one of these cases. We develop a novel approach that we call Murphy Sampling. Even though it entertains exclusively low true minima, we prove that MS is optimal for both possibilities. We then design advanced self-normalized deviation inequalities, fueling more aggressive stopping rules. We complement our theoretical guarantees by experiments showing that MS works best in practice.
Tasks
Published	2018-06-04
URL	http://arxiv.org/abs/1806.00973v1
PDF	http://arxiv.org/pdf/1806.00973v1.pdf
PWC	https://paperswithcode.com/paper/sequential-test-for-the-lowest-mean-from
Repo
Framework

WiSeBE: Window-based Sentence Boundary Evaluation


Title	WiSeBE: Window-based Sentence Boundary Evaluation
Authors	Carlos-Emiliano González-Gallardo, Juan-Manuel Torres-Moreno
Abstract	Sentence Boundary Detection (SBD) has been a major research topic since Automatic Speech Recognition transcripts have been used for further Natural Language Processing tasks like Part of Speech Tagging, Question Answering or Automatic Summarization. But what about evaluation? Do standard evaluation metrics like precision, recall, F-score or classification error, and more importantly, the evaluation of an automatic system against a single reference, suffice to conclude how well an SBD system is performing given the final application of the transcript? In this paper we propose Window-based Sentence Boundary Evaluation (WiSeBE), a semi-supervised metric for evaluating Sentence Boundary Detection systems based on multi-reference (dis)agreement. We evaluate and compare the performance of different SBD systems over a set of YouTube transcripts using WiSeBE and standard metrics. This double evaluation gives an understanding of how WiSeBE is a more reliable metric for the SBD task.
Tasks	Boundary Detection, Part-Of-Speech Tagging, Question Answering, Speech Recognition
Published	2018-08-27
URL	http://arxiv.org/abs/1808.08850v1
PDF	http://arxiv.org/pdf/1808.08850v1.pdf
PWC	https://paperswithcode.com/paper/wisebe-window-based-sentence-boundary
Repo
Framework
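
For context, the standard single-reference metrics that the abstract above calls into question can be computed as follows. This is not WiSeBE itself, whose window-based, multi-reference definition is not given in the abstract. Boundaries are represented here as token indices after which a sentence ends; that representation and the example values are assumptions.

```python
def boundary_scores(reference, hypothesis):
    """Precision, recall and F1 of predicted boundaries against one reference."""
    reference, hypothesis = set(reference), set(hypothesis)
    true_positives = len(reference & hypothesis)
    precision = true_positives / len(hypothesis) if hypothesis else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical boundaries: one annotator versus one system output.
reference = [3, 7, 12]
hypothesis = [3, 8, 12, 15]
precision, recall, f1 = boundary_scores(reference, hypothesis)
```

The boundary predicted at index 8 counts as a plain error even though it is one token away from the reference boundary at 7, and a second annotator might well have placed it there. That sensitivity to a single reference is the concern the abstract raises.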

SiftingGAN: Generating and Sifting Labeled Samples to Improve the Remote Sensing Image Scene Classification Baseline in vitro


Title	SiftingGAN: Generating and Sifting Labeled Samples to Improve the Remote Sensing Image Scene Classification Baseline in vitro
Authors	Dongao Ma, Ping Tang, Lijun Zhao
Abstract	Lack of annotated samples greatly restrains the direct application of deep learning in remote sensing image scene classification. Although research has been done to tackle this issue through data augmentation with various image transformation operations, the resulting samples are still limited in quantity and diversity. Recently, the advent of unsupervised-learning-based generative adversarial networks (GANs) has brought a new way to generate augmented samples. However, such GAN-generated samples currently serve only to train the GAN model itself and to improve the performance of the discriminator within the GAN (in vivo). It remains seriously in doubt whether GAN-generated samples can improve the scene classification performance of other deep learning networks (in vitro) more than the widely used transformed samples. To answer this question, this paper proposes a SiftingGAN approach to generate more numerous, more diverse and more authentic labeled samples for data augmentation. SiftingGAN extends the traditional GAN framework with an Online-Output method for sample generation, a Generative-Model-Sifting method for model sifting, and a Labeled-Sample-Discriminating method for sample sifting. Experiments on the well-known AID dataset demonstrate that the proposed SiftingGAN method not only effectively improves on the scene classification baseline achieved without data augmentation, but also significantly outperforms the comparison methods based on traditional geometric/radiometric transformation operations.
Tasks	Data Augmentation, Scene Classification
Published	2018-09-13
URL	http://arxiv.org/abs/1809.04985v4
PDF	http://arxiv.org/pdf/1809.04985v4.pdf
PWC	https://paperswithcode.com/paper/siftinggan-generating-and-sifting-labeled
Repo
Framework

Regularization Matters: Generalization and Optimization of Neural Nets v.s. their Induced Kernel


Title	Regularization Matters: Generalization and Optimization of Neural Nets v.s. their Induced Kernel
Authors	Colin Wei, Jason D. Lee, Qiang Liu, Tengyu Ma
Abstract	Recent works have shown that on sufficiently over-parametrized neural nets, gradient descent with relatively large initialization optimizes a prediction function in the RKHS of the Neural Tangent Kernel (NTK). This analysis leads to global convergence results but does not work when there is a standard $\ell_2$ regularizer, which is useful to have in practice. We show that sample efficiency can indeed depend on the presence of the regularizer: we construct a simple distribution in $d$ dimensions which the optimal regularized neural net learns with $O(d)$ samples but the NTK requires $\Omega(d^2)$ samples to learn. To prove this, we establish two analysis tools: i) for multi-layer feedforward ReLU nets, we show that the global minimizer of a weakly-regularized cross-entropy loss is the max normalized margin solution among all neural nets, which generalizes well; ii) we develop a new technique for proving lower bounds for kernel methods, which relies on showing that the kernel cannot focus on informative features. Motivated by our generalization results, we study whether the regularized global optimum is attainable. We prove that for infinite-width two-layer nets, noisy gradient descent optimizes the regularized neural net loss to a global minimum in polynomial iterations.
Tasks
Published	2018-10-12
URL	https://arxiv.org/abs/1810.05369v3
PDF	https://arxiv.org/pdf/1810.05369v3.pdf
PWC	https://paperswithcode.com/paper/on-the-margin-theory-of-feedforward-neural
Repo
Framework

Learning to predict crisp boundaries


Title	Learning to predict crisp boundaries
Authors	Ruoxi Deng, Chunhua Shen, Shengjun Liu, Huibing Wang, Xinru Liu
Abstract	Recent methods for boundary or edge detection built on Deep Convolutional Neural Networks (CNNs) typically suffer from the issue of predicted edges being thick and need post-processing to obtain crisp boundaries. The highly imbalanced categories of boundary versus background in training data are one of the main reasons for this problem. In this work, the aim is to make CNNs produce sharp boundaries without post-processing. We introduce a novel loss for boundary detection, which is very effective for classifying imbalanced data and allows CNNs to produce crisp boundaries. Moreover, we propose an end-to-end network which adopts the bottom-up/top-down architecture to tackle the task. The proposed network effectively leverages hierarchical features and produces a pixel-accurate boundary mask, which is critical to reconstructing the edge map. Our experiments illustrate that directly making crisp predictions not only improves the visual results of CNNs, but also achieves better results than the state of the art on the BSDS500 dataset (ODS F-score of .815) and the NYU Depth dataset (ODS F-score of .762).
Tasks	Boundary Detection, Edge Detection
Published	2018-07-26
URL	http://arxiv.org/abs/1807.10097v1
PDF	http://arxiv.org/pdf/1807.10097v1.pdf
PWC	https://paperswithcode.com/paper/learning-to-predict-crisp-boundaries
Repo
Framework