January 28, 2020


Paper Group ANR 793

Detecting Unknown Behaviors by Pre-defined Behaviours: An Bayesian Non-parametric Approach. Multi-Scale Quasi-RNN for Next Item Recommendation. RECAL: Reuse of Established CNN classifer Apropos unsupervised Learning paradigm. Towards End-to-End Text Spotting in Natural Scenes. Understanding 3D CNN Behavior for Alzheimer’s Disease Diagnosis from Bra …

Detecting Unknown Behaviors by Pre-defined Behaviours: An Bayesian Non-parametric Approach


Title	Detecting Unknown Behaviors by Pre-defined Behaviours: An Bayesian Non-parametric Approach
Authors	Jin Watanabe, Takatomi Kubo, Fan Yang, Kazushi Ikeda
Abstract	An automatic mouse behavior recognition system can considerably reduce the workload of experimenters and facilitate the analysis process. Typically, supervised, unsupervised and semi-supervised approaches are applied to behavior recognition in a setting where all behaviors are predefined. In real situations, however, mice can show various types of behaviors, so many undefined behaviors occur besides the predefined behaviors that we want to analyze. Neither supervised approaches nor conventional semi-supervised approaches can identify these undefined behaviors. Though unsupervised approaches can detect these undefined behaviors, post-hoc labeling is needed. In this paper, we propose a semi-supervised infinite Gaussian mixture model (SsIGMM) that incorporates both labeled and unlabeled information in the learning process while considering undefined behaviors. It also models the distribution of the predefined and undefined behaviors as a mixture of Gaussians, which can be used for further analysis. In our experiments, we confirmed the superiority of SsIGMM for segmenting and labeling mouse-behavior videos.
Tasks
Published	2019-11-25
URL	https://arxiv.org/abs/1911.10806v2
PDF	https://arxiv.org/pdf/1911.10806v2.pdf
PWC	https://paperswithcode.com/paper/detecting-unknown-behaviors-by-pre-defined
Repo
Framework

Multi-Scale Quasi-RNN for Next Item Recommendation


Title	Multi-Scale Quasi-RNN for Next Item Recommendation
Authors	Chaoyue He, Yong Liu, Qingyu Guo, Chunyan Miao
Abstract	How to better utilize sequential information has been extensively studied in the setting of recommender systems. To this end, architectural inductive biases such as Markov-Chains, Recurrent models, Convolutional networks and many others have demonstrated reasonable success on this task. This paper proposes a new neural architecture, multi-scale Quasi-RNN for next item Recommendation (QR-Rec) task. Our model provides the best of both worlds by exploiting multi-scale convolutional features as the compositional gating functions of a recurrent cell. The model is implemented in a multi-scale fashion, i.e., convolutional filters of various widths are implemented to capture different union-level features of input sequences which influence the compositional encoder. The key idea aims to capture the recurrent relations between different kinds of local features, which has never been studied previously in the context of recommendation. Through extensive experiments, we demonstrate that our model achieves state-of-the-art performance on 15 well-established datasets, outperforming strong competitors such as FPMC, Fossil and Caser absolutely by 0.57%-7.16% and relatively by 1.44%-17.65% in terms of MAP, Recall@10 and NDCG@10.
Tasks	Recommendation Systems
Published	2019-02-26
URL	http://arxiv.org/abs/1902.09849v1
PDF	http://arxiv.org/pdf/1902.09849v1.pdf
PWC	https://paperswithcode.com/paper/multi-scale-quasi-rnn-for-next-item
Repo
Framework
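The gating idea in the abstract above can be illustrated with a small NumPy sketch. It is not the authors' QR-Rec implementation: the function names, the parameter layout, and the choice to sum the pre-activations of the different filter widths are assumptions made for illustration. Only the general pattern (convolutions of several widths producing the gates of a recurrent cell, followed by Quasi-RNN style f-pooling) comes from the abstract.

```python
import numpy as np

def causal_conv1d(x, w):
    """Causal 1-D convolution. x: (T, d_in), w: (k, d_in, d_out) -> (T, d_out)."""
    k = w.shape[0]
    # Left-pad with k-1 zero steps so that step t only sees steps <= t.
    padded = np.vstack([np.zeros((k - 1, x.shape[1])), x])
    return np.stack([
        np.einsum("kd,kdo->o", padded[t:t + k], w) for t in range(x.shape[0])
    ])

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def multi_scale_qrnn(x, params):
    """params: list of (w_z, w_f) pairs, one per filter width (hypothetical layout)."""
    # Assumption: pre-activations from all filter widths are summed before gating.
    z_pre = sum(causal_conv1d(x, w_z) for w_z, _ in params)
    f_pre = sum(causal_conv1d(x, w_f) for _, w_f in params)
    z, f = np.tanh(z_pre), sigmoid(f_pre)
    h = np.zeros(z.shape[1])
    states = []
    for t in range(x.shape[0]):
        # f-pooling: the forget gate blends the previous state with the candidate.
        h = f[t] * h + (1.0 - f[t]) * z[t]
        states.append(h)
    return np.stack(states)
```

The last row of the returned array could serve as a sequence embedding for scoring candidate next items; how QR-Rec actually does this is not stated in the abstract.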

RECAL: Reuse of Established CNN classifer Apropos unsupervised Learning paradigm


Title	RECAL: Reuse of Established CNN classifer Apropos unsupervised Learning paradigm
Authors	Jayasree Saha, Jayanta Mukhopadhyay
Abstract	Recently, clustering with deep network frameworks has attracted the attention of several researchers in the computer vision community. Deep frameworks have gained extensive attention due to their efficiency and scalability towards large-scale and high-dimensional data. In this paper, we transform a supervised CNN classifier architecture into an unsupervised clustering model, called RECAL, which jointly learns a discriminative embedding subspace and cluster labels. RECAL is made up of feature extraction layers, which are convolutional, followed by unsupervised classifier layers, which are fully connected. A multinomial logistic regression function (softmax) is stacked on top of the classifier layers. We train this network using the stochastic gradient descent (SGD) optimizer. However, the successful implementation of our model revolves around the design of the loss function. Our loss function uses the heuristic that true partitioning entails lower entropy given that the class distribution is not heavily skewed. This is a trade-off between the situations of “skewed distribution” and “low-entropy”. To handle this, we have proposed classification entropy and class entropy, which are the two components of our loss function. In this approach, the size of the mini-batch should be kept high. Experimental results indicate the consistent and competitive behavior of our model for clustering well-known digit, multi-viewed object and face datasets. Moreover, we use this model to generate unsupervised patch segmentation for multi-spectral LISS-IV images. We observe that it is able to distinguish built-up area, wet land, vegetation and waterbody from the underlying scene.
Tasks
Published	2019-06-15
URL	https://arxiv.org/abs/1906.06480v1
PDF	https://arxiv.org/pdf/1906.06480v1.pdf
PWC	https://paperswithcode.com/paper/recal-reuse-of-established-cnn-classifer
Repo
Framework
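The trade-off described in the RECAL abstract can be sketched as a two-term loss on softmax outputs. The abstract names the two components (classification entropy and class entropy) but not how they are combined, so the subtraction and the `weight` parameter below are assumptions, and `clustering_loss` is a hypothetical name.

```python
import numpy as np

def entropy(p, axis=-1, eps=1e-12):
    """Shannon entropy in nats along the given axis."""
    return -np.sum(p * np.log(p + eps), axis=axis)

def clustering_loss(probs, weight=1.0):
    """probs: (batch, clusters) softmax outputs for one mini-batch."""
    # Classification entropy: low when each sample is assigned confidently.
    classification_entropy = entropy(probs).mean()
    # Class entropy: high when the batch is spread evenly over the clusters.
    class_entropy = entropy(probs.mean(axis=0))
    # Assumed combination: reward confident assignments, penalize skewed usage.
    return classification_entropy - weight * class_entropy
```

Because the class entropy is estimated from the batch mean, a small batch gives a noisy estimate, which is consistent with the abstract's remark that the mini-batch size should be kept high.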

Towards End-to-End Text Spotting in Natural Scenes


Title	Towards End-to-End Text Spotting in Natural Scenes
Authors	Hui Li, Peng Wang, Chunhua Shen
Abstract	Text spotting in natural scene images is of great importance for many image understanding tasks. It includes two sub-tasks: text detection and recognition. In this work, we propose a unified network that simultaneously localizes and recognizes text with a single forward pass, avoiding intermediate processes such as image cropping and feature re-calculation, word separation, and character grouping. In contrast to existing approaches that consider text detection and recognition as two distinct tasks and tackle them one by one, the proposed framework settles these two tasks concurrently. The whole framework can be trained end-to-end and is able to handle text of arbitrary shapes. The convolutional features are calculated only once and shared by both detection and recognition modules. Through multi-task training, the learned features become more discriminative and improve the overall performance. By employing the 2D attention model in word recognition, the irregularity of text can be robustly addressed. It provides the spatial location for each character, which not only helps local feature extraction in word recognition, but also indicates an orientation angle to refine text localization. Our proposed method has achieved state-of-the-art performance on several standard text spotting benchmarks, including both regular and irregular ones.
Tasks	Image Cropping, Text Spotting
Published	2019-06-14
URL	https://arxiv.org/abs/1906.06013v3
PDF	https://arxiv.org/pdf/1906.06013v3.pdf
PWC	https://paperswithcode.com/paper/towards-end-to-end-text-spotting-in-natural
Repo
Framework

Understanding 3D CNN Behavior for Alzheimer’s Disease Diagnosis from Brain PET Scan


Title	Understanding 3D CNN Behavior for Alzheimer’s Disease Diagnosis from Brain PET Scan
Authors	Jyoti Islam, Yanqing Zhang
Abstract	In recent days, Convolutional Neural Networks (CNN) have demonstrated impressive performance in medical image analysis. However, there is a lack of clear understanding of why and how the Convolutional Neural Network performs so well for image analysis tasks. How a CNN analyzes an image and discriminates among samples of different classes is usually considered non-transparent. As a result, it becomes difficult to apply CNN based approaches in clinical procedures and automated disease diagnosis systems. In this paper, we consider this issue and work on visualizing and understanding the decision of Convolutional Neural Network for Alzheimer’s Disease (AD) Diagnosis. We develop a 3D deep convolutional neural network for AD diagnosis using brain PET scans and propose using five visualization techniques - Sensitivity Analysis (Backpropagation), Guided Backpropagation, Occlusion, Brain Area Occlusion, and Layer-wise Relevance Propagation (LRP) to understand the decision of the CNN by highlighting the relevant areas in the PET data.
Tasks
Published	2019-12-10
URL	https://arxiv.org/abs/1912.04563v2
PDF	https://arxiv.org/pdf/1912.04563v2.pdf
PWC	https://paperswithcode.com/paper/understanding-3d-cnn-behavior-for-alzheimers
Repo
Framework
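Of the five visualization techniques listed in the abstract above, occlusion is the simplest to sketch. The example below is a generic occlusion-sensitivity routine for a 3D volume, not the authors' code: `score_fn` stands in for any model that returns a scalar class score, and the cubic patch size and fill value are assumptions.

```python
import numpy as np

def occlusion_map(volume, score_fn, patch=4, fill=0.0):
    """Relevance of each cubic patch, measured as the score drop when it is hidden.

    volume: (D, H, W) array; score_fn: callable mapping a volume to a scalar score.
    """
    base = score_fn(volume)
    heat = np.zeros(volume.shape, dtype=float)
    depth, height, width = volume.shape
    for z in range(0, depth, patch):
        for y in range(0, height, patch):
            for x in range(0, width, patch):
                occluded = volume.copy()
                occluded[z:z + patch, y:y + patch, x:x + patch] = fill
                # A large drop means the model relied on this region.
                heat[z:z + patch, y:y + patch, x:x + patch] = base - score_fn(occluded)
    return heat
```

Brain Area Occlusion, as named in the abstract, would presumably replace the regular cubes with anatomical regions; that variant is not shown here.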

Towards Unifying Neural Architecture Space Exploration and Generalization


Title	Towards Unifying Neural Architecture Space Exploration and Generalization
Authors	Kartikeya Bhardwaj, Radu Marculescu
Abstract	In this paper, we address a fundamental research question of significant practical interest: Can certain theoretical characteristics of CNN architectures indicate a priori (i.e., without training) which models with highly different number of parameters and layers achieve a similar generalization performance? To answer this question, we model CNNs from a network science perspective and introduce a new, theoretically-grounded, architecture-level metric called NN-Mass. We also integrate, for the first time, the PAC-Bayes theory of generalization with small-world networks to discover new synergies among our proposed NN-Mass metric, architecture characteristics, and model generalization. With experiments on real datasets such as CIFAR-10/100, we provide extensive empirical evidence for our theoretical findings. Finally, we exploit these new insights for model compression and achieve up to 3x fewer parameters and FLOPS, while losing minimal accuracy (e.g., 96.82% vs. 97%) over large CNNs on the CIFAR-10 dataset.
Tasks	Model Compression
Published	2019-10-02
URL	https://arxiv.org/abs/1910.00780v1
PDF	https://arxiv.org/pdf/1910.00780v1.pdf
PWC	https://paperswithcode.com/paper/towards-unifying-neural-architecture-space
Repo
Framework

Induced Inflection-Set Keyword Search in Speech


Title	Induced Inflection-Set Keyword Search in Speech
Authors	Oliver Adams, Matthew Wiesner, Jan Trmal, Garrett Nicolai, David Yarowsky
Abstract	We investigate the problem of searching for a lexeme-set in speech by searching for its inflectional variants. Experimental results indicate how lexeme-set search performance changes with the number of hypothesized inflections, while ablation experiments highlight the relative importance of different components in the lexeme-set search pipeline. We provide a recipe and evaluation set for the community to use as an extrinsic measure of the performance of inflection generation approaches.
Tasks
Published	2019-10-27
URL	https://arxiv.org/abs/1910.12299v1
PDF	https://arxiv.org/pdf/1910.12299v1.pdf
PWC	https://paperswithcode.com/paper/induced-inflection-set-keyword-search-in
Repo
Framework

Multi-agent Interactive Prediction under Challenging Driving Scenarios


Title	Multi-agent Interactive Prediction under Challenging Driving Scenarios
Authors	Weihao Xuan, Ruijie Ren, Yeping Hu
Abstract	In order to drive safely on the road, an autonomous vehicle is expected to predict future outcomes of its surrounding environment and react properly. In fact, many researchers have focused on solving behavioral prediction problems for autonomous vehicles. However, very few of them consider multi-agent prediction under challenging driving scenarios such as urban environments. In this paper, we propose a prediction method that is able to handle various complicated driving scenarios where heterogeneous road entities, signal lights, and static map information are taken into account. Moreover, the proposed multi-agent interactive prediction (MAIP) system is capable of simultaneously predicting any number of road entities while considering their mutual interactions. A case study of a simulated challenging urban intersection scenario is provided to demonstrate the performance and capability of the proposed prediction system.
Tasks	Autonomous Vehicles
Published	2019-09-24
URL	https://arxiv.org/abs/1909.10737v2
PDF	https://arxiv.org/pdf/1909.10737v2.pdf
PWC	https://paperswithcode.com/paper/multi-agent-interactive-prediction-under
Repo
Framework

Controlling an Autonomous Vehicle with Deep Reinforcement Learning


Title	Controlling an Autonomous Vehicle with Deep Reinforcement Learning
Authors	Andreas Folkers, Matthias Rick, Christof Büskens
Abstract	We present a control approach for autonomous vehicles based on deep reinforcement learning. A neural network agent is trained to map its estimated state to acceleration and steering commands given the objective of reaching a specific target state while considering detected obstacles. Learning is performed using state-of-the-art proximal policy optimization in combination with a simulated environment. Training from scratch takes five to nine hours. The resulting agent is evaluated within simulation and subsequently applied to control a full-size research vehicle. For this, the autonomous exploration of a parking lot is considered, including turning maneuvers and obstacle avoidance. Altogether, this work is among the first examples to successfully apply deep reinforcement learning to a real vehicle.
Tasks	Autonomous Vehicles
Published	2019-09-24
URL	https://arxiv.org/abs/1909.12153v2
PDF	https://arxiv.org/pdf/1909.12153v2.pdf
PWC	https://paperswithcode.com/paper/controlling-an-autonomous-vehicle-with-deep
Repo
Framework
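The abstract above names proximal policy optimization as the learning algorithm. As a reminder of what that involves, here is a minimal sketch of the standard clipped surrogate objective of PPO. It is not taken from the paper: the function name and the default `clip_eps=0.2` are assumptions, and a full agent would also need a value loss, advantage estimation, and the simulated environment.

```python
import numpy as np

def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped surrogate objective of PPO (to be maximized), averaged over samples."""
    # Probability ratio between the updated policy and the data-collecting policy.
    ratio = np.exp(np.asarray(logp_new, dtype=float) - np.asarray(logp_old, dtype=float))
    adv = np.asarray(advantages, dtype=float)
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    # Taking the minimum removes the incentive to move the ratio outside the clip range.
    return float(np.mean(np.minimum(unclipped, clipped)))
```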

TruPercept: Trust Modelling for Autonomous Vehicle Cooperative Perception from Synthetic Data


Title	TruPercept: Trust Modelling for Autonomous Vehicle Cooperative Perception from Synthetic Data
Authors	Braden Hurl, Robin Cohen, Krzysztof Czarnecki, Steven Waslander
Abstract	Inter-vehicle communication for autonomous vehicles (AVs) stands to provide significant benefits in terms of perception robustness. We propose a novel approach for AVs to communicate perceptual observations, tempered by trust modelling of peers providing reports. Based on the accuracy of reported object detections as verified locally, communicated messages can be fused to augment perception performance beyond line of sight and at great distance from the ego vehicle. Also presented is a new synthetic dataset which can be used to test cooperative perception. The TruPercept dataset includes unreliable and malicious behaviour scenarios to experiment with some challenges cooperative perception introduces. The TruPercept runtime and evaluation framework allows modular component replacement to facilitate ablation studies as well as the creation of new trust scenarios we are able to show.
Tasks	Autonomous Vehicles
Published	2019-09-17
URL	https://arxiv.org/abs/1909.07867v1
PDF	https://arxiv.org/pdf/1909.07867v1.pdf
PWC	https://paperswithcode.com/paper/trupercept-trust-modelling-for-autonomous
Repo
Framework

Hepatocellular Carcinoma Intra-arterial Treatment Response Prediction for Improved Therapeutic Decision-Making


Title	Hepatocellular Carcinoma Intra-arterial Treatment Response Prediction for Improved Therapeutic Decision-Making
Authors	Junlin Yang, Nicha C. Dvornek, Fan Zhang, Julius Chapiro, MingDe Lin, Aaron Abajian, James S. Duncan
Abstract	This work proposes a pipeline to predict treatment response to intra-arterial therapy of patients with Hepatocellular Carcinoma (HCC) for improved therapeutic decision-making. Our graph neural network model seamlessly combines heterogeneous inputs of baseline MR scans, pre-treatment clinical information, and planned treatment characteristics and has been validated on patients with HCC treated by transarterial chemoembolization (TACE). It achieves an accuracy of $0.713 \pm 0.075$, an F1 of $0.702 \pm 0.082$ and an AUC of $0.710 \pm 0.108$. In addition, the pipeline incorporates uncertainty estimation to select hard cases, most of which align with the misclassified cases. The proposed pipeline arrives at more informed intra-arterial therapeutic decisions for patients with HCC via improving model accuracy and incorporating uncertainty estimation.
Tasks	Decision Making
Published	2019-12-01
URL	https://arxiv.org/abs/1912.00411v1
PDF	https://arxiv.org/pdf/1912.00411v1.pdf
PWC	https://paperswithcode.com/paper/hepatocellular-carcinoma-intra-arterial
Repo
Framework

Exploring Transfer Learning for Low Resource Emotional TTS


Title	Exploring Transfer Learning for Low Resource Emotional TTS
Authors	Noé Tits, Kevin El Haddad, Thierry Dutoit
Abstract	During the last few years, spoken language technologies have improved considerably thanks to Deep Learning. However, Deep Learning-based algorithms require amounts of data that are often difficult and costly to gather. In particular, modeling the variability in speech of different speakers, different styles or different emotions with little data remains challenging. In this paper, we investigate how to leverage fine-tuning on a pre-trained Deep Learning-based TTS model to synthesize speech with a small dataset of another speaker. Then we investigate the possibility of adapting this model for emotional TTS by fine-tuning the neutral TTS model with a small emotional dataset.
Tasks	Transfer Learning
Published	2019-01-14
URL	http://arxiv.org/abs/1901.04276v1
PDF	http://arxiv.org/pdf/1901.04276v1.pdf
PWC	https://paperswithcode.com/paper/exploring-transfer-learning-for-low-resource
Repo
Framework

A Novel Kalman Filter Based Shilling Attack Detection Algorithm


Title	A Novel Kalman Filter Based Shilling Attack Detection Algorithm
Authors	Xin Liu, Yingyuan Xiao, Xu Jiao, Wenguang Zheng, Zihao Ling
Abstract	Collaborative filtering has been widely used in recommendation systems to recommend items that users might like. However, collaborative filtering based recommendation systems are vulnerable to shilling attacks. Malicious users tend to increase or decrease the recommended frequency of target items by injecting fake profiles. In this paper, we propose a Kalman filter-based attack detection model, which statistically analyzes the difference between the actual rating and the predicted rating calculated by this model to find the potential abnormal time period. The Kalman filter filters out suspicious ratings based on the abnormal time period and identifies suspicious users based on the source of these ratings. The experimental results show that our method achieves much better detection performance for shilling attacks than traditional methods.
Tasks	Recommendation Systems
Published	2019-08-18
URL	https://arxiv.org/abs/1908.06968v1
PDF	https://arxiv.org/pdf/1908.06968v1.pdf
PWC	https://paperswithcode.com/paper/a-novel-kalman-filter-based-shilling-attack
Repo
Framework
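The detection idea in the abstract above (compare each actual rating with the rating a Kalman filter predicts, and treat large gaps as abnormal) can be sketched with a scalar filter. This is a generic innovation test, not the authors' model: the random-walk state model, the noise parameters `q` and `r`, the threshold, and the function name are all assumptions.

```python
import numpy as np

def flag_abnormal_periods(ratings, q=1e-3, r=0.25, threshold=3.0):
    """Flag time steps whose rating deviates strongly from the filter's prediction.

    ratings: per-period average ratings of one item, in time order.
    q, r: assumed process and measurement noise variances.
    """
    x, p = float(ratings[0]), 1.0        # state estimate and its variance
    flags = [False]
    for z in ratings[1:]:
        p_pred = p + q                   # predict step of a random-walk model
        innovation = z - x               # actual rating minus predicted rating
        s = p_pred + r                   # innovation variance
        is_abnormal = bool(abs(innovation) / np.sqrt(s) > threshold)
        flags.append(is_abnormal)
        if is_abnormal:
            p = p_pred                   # skip the update so the outlier is not absorbed
        else:
            k = p_pred / s               # Kalman gain
            x = x + k * innovation
            p = (1.0 - k) * p_pred
    return flags
```

Ratings that fall in flagged periods could then be traced back to the users who submitted them, which is the second step the abstract describes.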

Nearly Minimal Over-Parametrization of Shallow Neural Networks


Title	Nearly Minimal Over-Parametrization of Shallow Neural Networks
Authors	Armin Eftekhari, ChaeHwan Song, Volkan Cevher
Abstract	A recent line of work has shown that an overparametrized neural network can perfectly fit the training data, an otherwise often intractable nonconvex optimization problem. For (fully-connected) shallow networks, in the best case scenario, the existing theory requires quadratic over-parametrization as a function of the number of training samples. This paper establishes that linear overparametrization is sufficient to fit the training data, using a simple variant of the (stochastic) gradient descent. Crucially, unlike several related works, the training considered in this paper is not limited to the lazy regime in the sense cautioned against in [1, 2]. Beyond shallow networks, the framework developed in this work for over-parametrization is applicable to a variety of learning problems.
Tasks
Published	2019-10-09
URL	https://arxiv.org/abs/1910.03948v2
PDF	https://arxiv.org/pdf/1910.03948v2.pdf
PWC	https://paperswithcode.com/paper/nearly-minimal-over-parametrization-of
Repo
Framework

Improving Dense Crowd Counting Convolutional Neural Networks using Inverse k-Nearest Neighbor Maps and Multiscale Upsampling


Title	Improving Dense Crowd Counting Convolutional Neural Networks using Inverse k-Nearest Neighbor Maps and Multiscale Upsampling
Authors	Greg Olmschenk, Hao Tang, Zhigang Zhu
Abstract	Gatherings of thousands to millions of people frequently occur for an enormous variety of events, and automated counting of these high-density crowds is useful for safety, management, and measuring significance of an event. In this work, we show that the regularly accepted labeling scheme of crowd density maps for training deep neural networks is less effective than our alternative inverse k-nearest neighbor (i$k$NN) maps, even when used directly in existing state-of-the-art network structures. We also provide a new network architecture MUD-i$k$NN, which uses multi-scale upsampling via transposed convolutions to take full advantage of the provided i$k$NN labeling. This upsampling combined with the i$k$NN maps further improves crowd counting accuracy. Our new network architecture performs favorably in comparison with the state-of-the-art. However, our labeling and upsampling techniques are generally applicable to existing crowd counting architectures.
Tasks	Crowd Counting
Published	2019-01-31
URL	http://arxiv.org/abs/1902.05379v3
PDF	http://arxiv.org/pdf/1902.05379v3.pdf
PWC	https://paperswithcode.com/paper/improving-dense-crowd-counting-convolutional
Repo
Framework
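To make the labeling scheme in the abstract above concrete, the sketch below builds an inverse k-nearest neighbor map from head annotations. The abstract does not give the formula, so the form used here, 1 / (mean distance to the k nearest heads + 1), is an assumption, as are the function name and the (row, col) coordinate convention.

```python
import numpy as np

def inverse_knn_map(head_positions, shape, k=1):
    """Per-pixel target: 1 / (mean distance to the k nearest head annotations + 1).

    head_positions: iterable of (row, col) annotations; shape: (H, W) of the image.
    """
    heads = np.asarray(head_positions, dtype=float)               # (N, 2)
    rows, cols = np.indices(shape)
    pixels = np.stack([rows.ravel(), cols.ravel()], axis=1).astype(float)  # (P, 2)
    # Pairwise Euclidean distances between every pixel and every head.
    dists = np.linalg.norm(pixels[:, None, :] - heads[None, :, :], axis=2)  # (P, N)
    k = min(k, heads.shape[0])
    nearest = np.sort(dists, axis=1)[:, :k]
    # The +1 keeps the value finite (equal to 1) exactly at an annotation when k = 1.
    return (1.0 / (nearest.mean(axis=1) + 1.0)).reshape(shape)
```

Unlike a density map built from narrow Gaussians, such a target is non-zero everywhere and peaks at the annotations. The pairwise distance matrix is memory hungry for large images; a k-d tree would be the usual replacement.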