October 19, 2019

Paper Group ANR 363

Describing Semantic Representations of Brain Activity Evoked by Visual Stimuli. Self-Adversarially Learned Bayesian Sampling. A General Theory of Equivariant CNNs on Homogeneous Spaces. Identification of Cancer – Mesothelioma Disease Using Logistic Regression and Association Rule. Towards Imperceptible and Robust Adversarial Example Attacks agains …

Describing Semantic Representations of Brain Activity Evoked by Visual Stimuli


Title	Describing Semantic Representations of Brain Activity Evoked by Visual Stimuli
Authors	Eri Matsuo, Ichiro Kobayashi, Shinji Nishimoto, Satoshi Nishida, Hideki Asoh
Abstract	Quantitative modeling of human brain activity based on language representations has been actively studied in systems neuroscience. However, previous studies examined word-level representations, and little is known about whether structured sentences can be recovered from brain activity. This study attempts to generate natural language descriptions of semantic contents from human brain activity evoked by visual stimuli. To make effective use of the small amount of available brain activity data, our proposed method employs a pre-trained image-captioning network model using a deep learning framework. To apply brain activity to the image-captioning network, we train regression models that learn the relationship between brain activity and deep-layer image features. The results demonstrate that the proposed model can decode brain activity and generate descriptions using natural language sentences. We also conducted several experiments with data from different subsets of brain regions known to process visual stimuli. The results suggest that semantic information for sentence generation is widespread across the entire cortex.
Tasks	Image Captioning
Published	2018-01-19
URL	http://arxiv.org/abs/1802.02210v1
PDF	http://arxiv.org/pdf/1802.02210v1.pdf
PWC	https://paperswithcode.com/paper/describing-semantic-representations-of-brain
Repo
Framework

Self-Adversarially Learned Bayesian Sampling


Title	Self-Adversarially Learned Bayesian Sampling
Authors	Yang Zhao, Jianyi Zhang, Changyou Chen
Abstract	Scalable Bayesian sampling plays an important role in modern machine learning, especially in fast-developing unsupervised (deep) learning models. While tremendous progress has been achieved via scalable Bayesian sampling methods such as stochastic gradient MCMC (SG-MCMC) and Stein variational gradient descent (SVGD), the generated samples are typically highly correlated. Moreover, their sample-generation processes are often criticized as inefficient. In this paper, we propose a novel self-adversarial learning framework that automatically learns a conditional generator to mimic the behavior of a Markov kernel (transition kernel). High-quality samples can be efficiently generated by direct forward passes through a learned generator. Most importantly, the learning process adopts a self-learning paradigm, requiring no information on existing Markov kernels, e.g., knowledge of how to draw samples from them. Specifically, our framework learns to use current samples, either from the generator or pre-provided training data, to update the generator such that the generated samples progressively approach a target distribution; hence it is called self-learning. Experiments on both synthetic and real datasets verify the advantages of our framework, which outperforms related methods in terms of both sampling efficiency and sample quality.
Tasks
Published	2018-11-21
URL	http://arxiv.org/abs/1811.08929v1
PDF	http://arxiv.org/pdf/1811.08929v1.pdf
PWC	https://paperswithcode.com/paper/self-adversarially-learned-bayesian-sampling
Repo
Framework

A General Theory of Equivariant CNNs on Homogeneous Spaces


Title	A General Theory of Equivariant CNNs on Homogeneous Spaces
Authors	Taco Cohen, Mario Geiger, Maurice Weiler
Abstract	We present a general theory of Group equivariant Convolutional Neural Networks (G-CNNs) on homogeneous spaces such as Euclidean space and the sphere. Feature maps in these networks represent fields on a homogeneous base space, and layers are equivariant maps between spaces of fields. The theory enables a systematic classification of all existing G-CNNs in terms of their symmetry group, base space, and field type. We also consider a fundamental question: what is the most general kind of equivariant linear map between feature spaces (fields) of given types? Following Mackey, we show that such maps correspond one-to-one with convolutions using equivariant kernels, and characterize the space of such kernels.
Tasks
Published	2018-11-05
URL	https://arxiv.org/abs/1811.02017v2
PDF	https://arxiv.org/pdf/1811.02017v2.pdf
PWC	https://paperswithcode.com/paper/a-general-theory-of-equivariant-cnns-on
Repo
Framework

Identification of Cancer – Mesothelioma Disease Using Logistic Regression and Association Rule


Title	Identification of Cancer – Mesothelioma Disease Using Logistic Regression and Association Rule
Authors	Avishek Choudhury
Abstract	Malignant Pleural Mesothelioma (MPM) or malignant mesothelioma (MM) is an atypical, aggressive tumor that matures into cancer in the pleura, a stratum of tissue bordering the lungs. Diagnosis of MPM is difficult, and MPM accounts for about seventy-five percent of all mesothelioma cases diagnosed yearly in the United States of America. Because it is a fatal disease, early identification of MPM is crucial for patient survival. Our study implements logistic regression and develops association rules to identify early-stage symptoms of MM. We retrieved medical reports generated by Dicle University and implemented logistic regression to measure the model accuracy. We conducted (a) logistic correlation, (b) the Omnibus test and (c) the Hosmer and Lemeshow test for model evaluation. We also developed association rules by confidence, rule support, lift, condition support and deployability. Categorical logistic regression increases the training accuracy from 72.30% to 81.40%, with a testing accuracy of 63.46%. The study also shows the top 5 symptoms that most likely indicate the presence of MM. This study concludes that using predictive modeling can enhance primary presentation and diagnosis of MM.
Tasks
Published	2018-12-11
URL	https://arxiv.org/abs/1812.10384v2
PDF	https://arxiv.org/pdf/1812.10384v2.pdf
PWC	https://paperswithcode.com/paper/identification-of-cancer-mesothelioma-disease
Repo
Framework
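
The association-rule measures named in the abstract above can be illustrated with a minimal sketch. The records and the item names (`symptom_a`, `symptom_b`, `mm`) are hypothetical toy data, not taken from the Dicle University reports, and the definitions used are the standard textbook ones, which may differ in detail from the tool the author used. Deployability is left out because the abstract does not define it.

```python
def rule_metrics(records, antecedent, consequent):
    """Compute standard measures for the rule antecedent -> consequent.

    records: list of sets of item names (toy data, hypothetical).
    antecedent, consequent: sets of item names.
    """
    n = len(records)
    n_antecedent = sum(1 for r in records if antecedent <= r)
    n_consequent = sum(1 for r in records if consequent <= r)
    n_both = sum(1 for r in records if (antecedent | consequent) <= r)

    condition_support = n_antecedent / n      # share of records matching the antecedent
    rule_support = n_both / n                 # share matching antecedent and consequent
    confidence = n_both / n_antecedent        # P(consequent | antecedent)
    lift = confidence / (n_consequent / n)    # confidence relative to the base rate
    return {
        "condition_support": condition_support,
        "rule_support": rule_support,
        "confidence": confidence,
        "lift": lift,
    }


# Hypothetical patient records: each set lists observed items.
records = [
    {"symptom_a", "mm"},
    {"symptom_a", "mm"},
    {"symptom_a"},
    {"symptom_b", "mm"},
    {"symptom_b"},
]
metrics = rule_metrics(records, {"symptom_a"}, {"mm"})
print(metrics)
```

A lift above 1 means the consequent is more frequent among records matching the antecedent than in the data overall.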

Towards Imperceptible and Robust Adversarial Example Attacks against Neural Networks


Title	Towards Imperceptible and Robust Adversarial Example Attacks against Neural Networks
Authors	Bo Luo, Yannan Liu, Lingxiao Wei, Qiang Xu
Abstract	Machine learning systems based on deep neural networks, being able to produce state-of-the-art results on various perception tasks, have gained mainstream adoption in many applications. However, they have been shown to be vulnerable to adversarial example attacks, which generate malicious outputs by adding slight perturbations to the input. Previous adversarial example crafting methods, however, use simple metrics to evaluate the distances between the original examples and the adversarial ones, so the resulting perturbations can be easily detected by human eyes. In addition, these attacks are often not robust due to the inevitable noise and deviations in the physical world. In this work, we present a new adversarial example crafting method, which takes the human perceptual system into consideration and maximizes the noise tolerance of the crafted adversarial example. Experimental results demonstrate the efficacy of the proposed technique.
Tasks
Published	2018-01-15
URL	http://arxiv.org/abs/1801.04693v1
PDF	http://arxiv.org/pdf/1801.04693v1.pdf
PWC	https://paperswithcode.com/paper/towards-imperceptible-and-robust-adversarial
Repo
Framework

MONAS: Multi-Objective Neural Architecture Search using Reinforcement Learning


Title	MONAS: Multi-Objective Neural Architecture Search using Reinforcement Learning
Authors	Chi-Hung Hsu, Shu-Huan Chang, Jhao-Hong Liang, Hsin-Ping Chou, Chun-Hao Liu, Shih-Chieh Chang, Jia-Yu Pan, Yu-Ting Chen, Wei Wei, Da-Cheng Juan
Abstract	Recent studies on neural architecture search have shown that automatically designed neural networks perform as well as expert-crafted architectures. While most existing works aim at finding architectures that optimize the prediction accuracy, these architectures may have high complexity and are therefore not suitable for deployment in certain computing environments (e.g., with limited power budgets). We propose MONAS, a framework for Multi-Objective Neural Architectural Search that employs reward functions considering both prediction accuracy and other important objectives (e.g., power consumption) when searching for neural network architectures. Experimental results showed that, compared to the state of the art, models found by MONAS achieve comparable or better classification accuracy on computer vision applications, while satisfying additional objectives such as peak power.
Tasks	Neural Architecture Search
Published	2018-06-27
URL	http://arxiv.org/abs/1806.10332v2
PDF	http://arxiv.org/pdf/1806.10332v2.pdf
PWC	https://paperswithcode.com/paper/monas-multi-objective-neural-architecture
Repo
Framework
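
To make the idea of a reward that considers both accuracy and a second objective concrete, here is a minimal sketch. Both functions, their names, the weight `alpha`, and the assumption that power is normalized to [0, 1] are illustrative choices, not formulas confirmed by the abstract above.

```python
def mixed_reward(accuracy, normalized_power, alpha=0.5):
    """Hypothetical weighted reward: trade accuracy off against power.

    accuracy and normalized_power are assumed to lie in [0, 1];
    alpha in [0, 1] controls how much accuracy matters.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return alpha * accuracy - (1.0 - alpha) * normalized_power


def constrained_reward(accuracy, power, power_budget):
    """Hypothetical constrained reward: accuracy counts only within budget."""
    return accuracy if power <= power_budget else 0.0


print(mixed_reward(0.9, 0.4, alpha=0.5))
print(constrained_reward(0.9, power=12.0, power_budget=10.0))
```

A reinforcement-learning controller would receive such a scalar after training and measuring each candidate architecture. The weighted form expresses a soft trade-off, while the constrained form expresses a hard limit such as a peak-power cap.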

On the exact minimization of saturated loss functions for robust regression and subspace estimation


Title	On the exact minimization of saturated loss functions for robust regression and subspace estimation
Authors	Fabien Lauer
Abstract	This paper deals with robust regression and subspace estimation and more precisely with the problem of minimizing a saturated loss function. In particular, we focus on computational complexity issues and show that an exact algorithm with polynomial time-complexity with respect to the number of data can be devised for robust regression and subspace estimation. This result is obtained by adopting a classification point of view and relating the problems to the search for a linear model that can approximate the maximal number of points with a given error. Approximate variants of the algorithms based on random sampling are also discussed, and experiments show that they offer an accuracy gain over the traditional RANSAC for a similar algorithmic simplicity.
Tasks
Published	2018-06-15
URL	http://arxiv.org/abs/1806.05833v2
PDF	http://arxiv.org/pdf/1806.05833v2.pdf
PWC	https://paperswithcode.com/paper/on-the-exact-minimization-of-saturated-loss
Repo
Framework
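
The notion of a saturated loss and a random-sampling approximation can be sketched as follows. This is not the exact polynomial-time algorithm from the paper. It assumes one common form of saturated loss, the squared error truncated at `eps**2`, and uses a simple RANSAC-style search over lines through random pairs of points. All function names and the toy data are hypothetical.

```python
import random


def saturated_loss(residual, eps):
    """Squared error capped at eps**2, so large outliers stop mattering."""
    return min(residual * residual, eps * eps)


def total_loss(slope, intercept, xs, ys, eps):
    """Sum of saturated losses of the line y = slope * x + intercept."""
    return sum(saturated_loss(y - (slope * x + intercept), eps)
               for x, y in zip(xs, ys))


def fit_by_random_sampling(xs, ys, eps, n_trials=200, seed=0):
    """Try lines through random point pairs; keep the lowest saturated loss."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        i, j = rng.sample(range(len(xs)), 2)
        if xs[i] == xs[j]:
            continue  # vertical line, cannot be written as y = a * x + b
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        intercept = ys[i] - slope * xs[i]
        loss = total_loss(slope, intercept, xs, ys, eps)
        if best is None or loss < best[0]:
            best = (loss, slope, intercept)
    return best


# Toy data: points on y = 2x + 1 with three gross outliers.
xs = list(range(10))
ys = [2 * x + 1 for x in xs]
ys[4], ys[7], ys[9] = 30, -10, 0

loss, slope, intercept = fit_by_random_sampling(xs, ys, eps=0.5)
print(loss, slope, intercept)
```

Because each outlier contributes at most `eps**2`, the minimizer is the line that fits the largest group of points within the error bound, which matches the classification view described in the abstract.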

Cluster Naturalistic Driving Encounters Using Deep Unsupervised Learning


Title	Cluster Naturalistic Driving Encounters Using Deep Unsupervised Learning
Authors	Sisi Li, Wenshuo Wang, Zhaobin Mo, Ding Zhao
Abstract	Learning knowledge from driving encounters could help self-driving cars make appropriate decisions when driving in complex settings with nearby vehicles engaged. This paper develops an unsupervised classifier to group naturalistic driving encounters into distinguishable clusters by combining an auto-encoder with k-means clustering (AE-kMC). The effectiveness of AE-kMC was validated using the data of 10,000 naturalistic driving encounters, which were collected by the University of Michigan, Ann Arbor in the past five years. We compare our method with the original k-means clustering method, and experimental results demonstrate that AE-kMC outperforms it.
Tasks	Self-Driving Cars
Published	2018-02-28
URL	http://arxiv.org/abs/1802.10214v2
PDF	http://arxiv.org/pdf/1802.10214v2.pdf
PWC	https://paperswithcode.com/paper/cluster-naturalistic-driving-encounters-using
Repo
Framework
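
The clustering half of AE-kMC can be sketched with plain k-means (Lloyd's algorithm). The sketch assumes an auto-encoder has already mapped each encounter to a low-dimensional latent code; the encoder itself is not shown, and the 2-D codes, the initial centroids, and the function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)),
               key=lambda j: squared_distance(point, centroids[j]))


def kmeans(points, initial_centroids, n_iter=20):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(n_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Hypothetical latent codes, as if produced by an encoder.
codes = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2),
         (5.0, 5.0), (5.2, 5.0), (5.0, 5.2)]
centroids, labels = kmeans(codes, initial_centroids=[(0.0, 0.0), (5.0, 5.0)])
print(labels)
```

Running k-means on learned codes rather than raw trajectories is the design choice the abstract compares against plain k-means.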

Dynamics Estimation Using Recurrent Neural Network


Title	Dynamics Estimation Using Recurrent Neural Network
Authors	Astha Sharma
Abstract	There is plenty of ongoing research in the field of robotics. One of the most important tasks is the dynamic estimation of response during motion. One of the main applications of this research topic is the task of pouring, which is performed daily and is commonly used while cooking. We present an approach to estimate the response to a sequence of manipulation actions. We experiment with the pouring motion, where the response is the change in the amount of water in the pouring cup. The pouring motion is represented by the rotation angle, and the amount of water is represented by its weight. We use recurrent neural networks to build a model trained on sequences that represent 1307 trials of pouring. The model gives good results on unseen test data that does not differ much from the training data in terms of the dimensions of the cups used for pouring and receiving. The loss obtained with this test data is 4.5920. The model does not give good results in generalization experiments, where the test set contains cups whose dimensions are very different from those in the training data.
Tasks
Published	2018-09-17
URL	http://arxiv.org/abs/1809.06148v1
PDF	http://arxiv.org/pdf/1809.06148v1.pdf
PWC	https://paperswithcode.com/paper/dynamics-estimation-using-recurrent-neural
Repo
Framework

How many labeled license plates are needed?


Title	How many labeled license plates are needed?
Authors	Changhao Wu, Shugong Xu, Guocong Song, Shunqing Zhang
Abstract	Training a good deep learning model often requires a lot of annotated data. As a large amount of labeled data is typically difficult to collect and even more difficult to annotate, data augmentation and data generation are widely used in the process of training deep neural networks. However, there is no clear common understanding on how much labeled data is needed to get satisfactory performance. In this paper, we try to address such a question using vehicle license plate character recognition as an example application. We apply computer graphic scripts and Generative Adversarial Networks to generate and augment a large number of annotated, synthesized license plate images with realistic colors, fonts, and character composition from a small number of real, manually labeled license plate images. Generated and augmented data are mixed and used as training data for the license plate recognition network modified from DenseNet. The experimental results show that the model trained from the generated mixed training data has good generalization ability, and the proposed approach achieves a new state-of-the-art accuracy on Dataset-1 and AOLP, even with a very limited number of original real license plates. In addition, the accuracy improvement caused by data generation becomes more significant when the number of labeled images is reduced. Data augmentation also plays a more significant role when the number of labeled images is increased.
Tasks	Data Augmentation, License Plate Recognition
Published	2018-08-25
URL	http://arxiv.org/abs/1808.08410v1
PDF	http://arxiv.org/pdf/1808.08410v1.pdf
PWC	https://paperswithcode.com/paper/how-many-labeled-license-plates-are-needed
Repo
Framework

Twin Regularization for online speech recognition


Title	Twin Regularization for online speech recognition
Authors	Mirco Ravanelli, Dmitriy Serdyuk, Yoshua Bengio
Abstract	Online speech recognition is crucial for developing natural human-machine interfaces. This modality, however, is significantly more challenging than off-line ASR, since real-time/low-latency constraints inevitably hinder the use of future information, which is known to be very helpful for robust predictions. A popular solution to mitigate this issue consists of feeding neural acoustic models with context windows that gather some future frames. This introduces a latency which depends on the number of employed look-ahead features. This paper explores a different approach, based on estimating the future rather than waiting for it. Our technique encourages the hidden representations of a unidirectional recurrent network to embed some useful information about the future. Inspired by a recently proposed technique called Twin Networks, we add a regularization term that forces forward hidden states to be as close as possible to cotemporal backward ones, computed by a “twin” neural network running backwards in time. The experiments, conducted on a number of datasets, recurrent architectures, input features, and acoustic conditions, have shown the effectiveness of this approach. One important advantage is that our method does not introduce any additional computation at test time compared to standard unidirectional recurrent networks.
Tasks	Speech Recognition
Published	2018-04-15
URL	http://arxiv.org/abs/1804.05374v2
PDF	http://arxiv.org/pdf/1804.05374v2.pdf
PWC	https://paperswithcode.com/paper/twin-regularization-for-online-speech
Repo
Framework
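
The regularization term described in the abstract above can be sketched as a penalty on the distance between forward and backward hidden states at the same time step. The squared L2 distance averaged over time, the weight `reg_weight`, and the function names are assumptions made for illustration; the paper may use a different distance or apply a learned transformation first.

```python
def twin_penalty(forward_states, backward_states):
    """Mean squared L2 distance between cotemporal hidden states.

    Both arguments are lists of equal-length vectors, indexed by time step,
    with backward_states already aligned to forward time order.
    """
    if len(forward_states) != len(backward_states):
        raise ValueError("state sequences must have the same length")
    total = 0.0
    for h_fwd, h_bwd in zip(forward_states, backward_states):
        total += sum((f - b) ** 2 for f, b in zip(h_fwd, h_bwd))
    return total / len(forward_states)


def training_loss(task_loss, forward_states, backward_states, reg_weight=0.1):
    """Task loss plus the weighted twin penalty (hypothetical weighting)."""
    return task_loss + reg_weight * twin_penalty(forward_states, backward_states)


# Two time steps, hidden size 2 (toy values).
h_forward = [[1.0, 0.0], [0.0, 1.0]]
h_backward = [[1.0, 0.0], [0.0, 0.0]]
print(twin_penalty(h_forward, h_backward))
print(training_loss(2.0, h_forward, h_backward, reg_weight=0.1))
```

The backward network is only needed to compute this penalty during training, which is consistent with the abstract's claim of no added computation at test time.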

An Analysis of Categorical Distributional Reinforcement Learning


Title	An Analysis of Categorical Distributional Reinforcement Learning
Authors	Mark Rowland, Marc G. Bellemare, Will Dabney, Rémi Munos, Yee Whye Teh
Abstract	Distributional approaches to value-based reinforcement learning model the entire distribution of returns, rather than just their expected values, and have recently been shown to yield state-of-the-art empirical performance. This was demonstrated by the recently proposed C51 algorithm, based on categorical distributional reinforcement learning (CDRL) [Bellemare et al., 2017]. However, the theoretical properties of CDRL algorithms are not yet well understood. In this paper, we introduce a framework to analyse CDRL algorithms, establish the importance of the projected distributional Bellman operator in distributional RL, draw fundamental connections between CDRL and the Cramér distance, and give a proof of convergence for sample-based categorical distributional reinforcement learning algorithms.
Tasks	Distributional Reinforcement Learning
Published	2018-02-22
URL	http://arxiv.org/abs/1802.08163v1
PDF	http://arxiv.org/pdf/1802.08163v1.pdf
PWC	https://paperswithcode.com/paper/an-analysis-of-categorical-distributional
Repo
Framework
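
For readers unfamiliar with the Cramér distance mentioned above, here is a minimal sketch for two categorical distributions on a shared, sorted support. It uses the standard definition, the square root of the integral of the squared difference between the two CDFs; since both CDFs are piecewise constant between atoms, the integral reduces to a finite sum. The function name and the example values are hypothetical.

```python
import math
from itertools import accumulate


def cramer_distance(p, q, support):
    """Cramér distance between categorical distributions p and q.

    p, q: probability lists over the same atoms.
    support: strictly increasing atom locations (for example return values).
    """
    if not (len(p) == len(q) == len(support)):
        raise ValueError("p, q and support must have the same length")
    cdf_p = list(accumulate(p))
    cdf_q = list(accumulate(q))
    total = 0.0
    # Between consecutive atoms both CDFs are constant, so the integral
    # is the interval width times the squared CDF gap.
    for i in range(len(support) - 1):
        width = support[i + 1] - support[i]
        total += width * (cdf_p[i] - cdf_q[i]) ** 2
    return math.sqrt(total)


print(cramer_distance([1.0, 0.0], [0.0, 1.0], [0.0, 1.0]))
print(cramer_distance([0.5, 0.5], [0.0, 1.0], [0.0, 2.0]))
```

Unlike a per-atom comparison of probabilities, this distance depends on where the atoms sit, so moving mass to a nearby atom costs less than moving it far away.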

Learning to Refine Human Pose Estimation


Title	Learning to Refine Human Pose Estimation
Authors	Mihai Fieraru, Anna Khoreva, Leonid Pishchulin, Bernt Schiele
Abstract	Multi-person pose estimation in images and videos is an important yet challenging task with many applications. Despite the large improvements in human pose estimation enabled by the development of convolutional neural networks, there still exist a lot of difficult cases where even the state-of-the-art models fail to correctly localize all body joints. This motivates the need for an additional refinement step that addresses these challenging cases and can be easily applied on top of any existing method. In this work, we introduce a pose refinement network (PoseRefiner) which takes as input both the image and a given pose estimate and learns to directly predict a refined pose by jointly reasoning about the input-output space. In order for the network to learn to refine incorrect body joint predictions, we employ a novel data augmentation scheme for training, where we model “hard” human pose cases. We evaluate our approach on four popular large-scale pose estimation benchmarks such as MPII Single- and Multi-Person Pose Estimation, PoseTrack Pose Estimation, and PoseTrack Pose Tracking, and report systematic improvement over the state of the art.
Tasks	Data Augmentation, Multi-Person Pose Estimation, Pose Estimation, Pose Tracking
Published	2018-04-21
URL	http://arxiv.org/abs/1804.07909v1
PDF	http://arxiv.org/pdf/1804.07909v1.pdf
PWC	https://paperswithcode.com/paper/learning-to-refine-human-pose-estimation
Repo
Framework

A Robust Real-Time Automatic License Plate Recognition Based on the YOLO Detector


Title	A Robust Real-Time Automatic License Plate Recognition Based on the YOLO Detector
Authors	Rayson Laroca, Evair Severo, Luiz A. Zanlorensi, Luiz S. Oliveira, Gabriel Resende Gonçalves, William Robson Schwartz, David Menotti
Abstract	Automatic License Plate Recognition (ALPR) has been a frequent topic of research due to many practical applications. However, many of the current solutions are still not robust in real-world situations, commonly depending on many constraints. This paper presents a robust and efficient ALPR system based on the state-of-the-art YOLO object detector. The Convolutional Neural Networks (CNNs) are trained and fine-tuned for each ALPR stage so that they are robust under different conditions (e.g., variations in camera, lighting, and background). Specifically for character segmentation and recognition, we design a two-stage approach employing simple data augmentation tricks such as inverted License Plates (LPs) and flipped characters. The resulting ALPR approach achieved impressive results on two datasets. First, on the SSIG dataset, composed of 2,000 frames from 101 vehicle videos, our system achieved a recognition rate of 93.53% and 47 Frames Per Second (FPS), performing better than both Sighthound and OpenALPR commercial systems (89.80% and 93.03%, respectively) and considerably outperforming previous results (81.80%). Second, targeting a more realistic scenario, we introduce a larger public dataset, called the UFPR-ALPR dataset, designed for ALPR. This dataset contains 150 videos and 4,500 frames captured when both camera and vehicles are moving and also contains different types of vehicles (cars, motorcycles, buses and trucks). On our proposed dataset, the trial versions of commercial systems achieved recognition rates below 70%. On the other hand, our system performed better, with a recognition rate of 78.33% and 35 FPS.
Tasks	Data Augmentation, License Plate Recognition
Published	2018-02-26
URL	http://arxiv.org/abs/1802.09567v6
PDF	http://arxiv.org/pdf/1802.09567v6.pdf
PWC	https://paperswithcode.com/paper/a-robust-real-time-automatic-license-plate
Repo
Framework

BLP – Boundary Likelihood Pinpointing Networks for Accurate Temporal Action Localization


Title	BLP – Boundary Likelihood Pinpointing Networks for Accurate Temporal Action Localization
Authors	Weijie Kong, Nannan Li, Shan Liu, Thomas Li, Ge Li
Abstract	Despite tremendous progress achieved in temporal action detection, state-of-the-art methods still suffer from sharp performance deterioration when localizing the starting and ending temporal action boundaries. Although most methods apply a boundary regression paradigm to tackle this problem, we argue that direct regression lacks sufficiently detailed information to yield accurate temporal boundaries. In this paper, we propose a novel Boundary Likelihood Pinpointing (BLP) network to alleviate this deficiency of boundary regression and improve the localization accuracy. Given a loosely localized search interval that contains an action instance, BLP casts the problem of localizing temporal boundaries as that of assigning probabilities to each equally divided unit of this interval. These generated probabilities provide useful information regarding the boundary location of the action inside this search interval. Based on these probabilities, we introduce a boundary pinpointing paradigm to pinpoint the accurate boundaries under a simple probabilistic framework. Extensive experiments demonstrate that, compared with other C3D feature based detectors, BLP significantly improves the localization performance of recent state-of-the-art detectors and achieves competitive detection mAP on both the THUMOS'14 and ActivityNet datasets, particularly when the evaluation tIoU is high.
Tasks	Action Detection, Action Localization, Temporal Action Localization
Published	2018-11-06
URL	https://arxiv.org/abs/1811.02189v6
PDF	https://arxiv.org/pdf/1811.02189v6.pdf
PWC	https://paperswithcode.com/paper/blp-boundary-likelihood-pinpointing-networks
Repo
Framework