May 6, 2019

3055 words 15 mins read

Paper Group ANR 232

How priors of initial hyperparameters affect Gaussian process regression models. The Yahoo Query Treebank, V. 1.0. Robot Dream. Recurrent Fully Convolutional Neural Networks for Multi-slice MRI Cardiac Segmentation. Going Deeper for Multilingual Visual Sentiment Detection. Lipreading with Long Short-Term Memory. Restricted Strong Convexity Implies …

How priors of initial hyperparameters affect Gaussian process regression models

Title How priors of initial hyperparameters affect Gaussian process regression models
Authors Zexun Chen, Bo Wang
Abstract The hyperparameters in a Gaussian process regression (GPR) model with a specified kernel are often estimated from the data by maximizing the marginal likelihood. Because the marginal likelihood is non-convex in the hyperparameters, the optimization may not converge to the global maximum. A common way to tackle this issue is to use multiple starting points randomly selected from a specific prior distribution, so the choice of prior distribution may play a vital role in the predictability of this approach. However, little research in the literature has studied the impact of the prior distributions on hyperparameter estimation and on the performance of GPR. In this paper, we provide the first empirical study of this problem using simulated and real data experiments. We consider different types of priors for the initial values of the hyperparameters for several commonly used kernels and investigate the influence of the priors on the predictability of GPR models. The results reveal that, once a kernel is chosen, different priors for the initial hyperparameters have no significant impact on the performance of GPR prediction, even though the hyperparameter estimates are very different from the true values in some cases.
Tasks
Published 2016-05-25
URL http://arxiv.org/abs/1605.07906v2
PDF http://arxiv.org/pdf/1605.07906v2.pdf
PWC https://paperswithcode.com/paper/how-priors-of-initial-hyperparameters-affect
Repo
Framework
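The multi-start strategy the paper studies can be sketched in a few lines: draw initial hyperparameters from a candidate prior, evaluate the log marginal likelihood at each start, and keep the best. This is a minimal numpy illustration (RBF kernel, fixed signal variance and noise, no local optimizer), not the authors' experimental setup; the specific priors and their parameters are assumptions for demonstration.

```python
import numpy as np

def rbf_kernel(X, lengthscale, variance):
    # Squared-exponential kernel matrix for 1-D inputs.
    d2 = (X[:, None] - X[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def log_marginal_likelihood(X, y, lengthscale, variance, noise):
    # Standard GPR log marginal likelihood via a Cholesky factorization.
    K = rbf_kernel(X, lengthscale, variance) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.log(np.diag(L)).sum()
            - 0.5 * len(X) * np.log(2 * np.pi))

rng = np.random.default_rng(0)
X = np.linspace(0, 5, 40)
y = np.sin(X) + 0.1 * rng.standard_normal(40)

# Two candidate priors for the initial length-scale (values are illustrative):
priors = {
    "uniform":    lambda n: rng.uniform(0.1, 3.0, n),
    "log-normal": lambda n: rng.lognormal(0.0, 1.0, n),
}
for name, draw in priors.items():
    starts = draw(10)
    lls = [log_marginal_likelihood(X, y, ls, 1.0, 0.01) for ls in starts]
    best = starts[int(np.argmax(lls))]
    print(f"{name:10s} best initial length-scale: {best:.3f}")
```

In practice each start would seed a local optimizer rather than being scored directly; the sketch only shows how the prior shapes where the search begins.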

The Yahoo Query Treebank, V. 1.0

Title The Yahoo Query Treebank, V. 1.0
Authors Yuval Pinter, Roi Reichart, Idan Szpektor
Abstract A description and annotation guidelines for the Yahoo Webscope release of Query Treebank, Version 1.0, May 2016.
Tasks
Published 2016-05-10
URL http://arxiv.org/abs/1605.02945v2
PDF http://arxiv.org/pdf/1605.02945v2.pdf
PWC https://paperswithcode.com/paper/the-yahoo-query-treebank-v-10
Repo
Framework

Robot Dream

Title Robot Dream
Authors Alexander Tchitchigin, Max Talanov, Larisa Safina, Manuel Mazzara
Abstract In this position paper we present a novel approach to a neurobiologically plausible implementation of emotional reactions and behaviors for real-time autonomous robotic systems. The working metaphor we use is the “day” and “night” phases of mammalian life. During the “day” phase a robotic system stores the inbound information and is controlled by a lightweight rule-based system in real time. In contrast, during the “night” phase the stored information is transferred to the supercomputing system to update the realistic neural network: emotional and behavioral strategies.
Tasks
Published 2016-03-09
URL http://arxiv.org/abs/1603.03007v1
PDF http://arxiv.org/pdf/1603.03007v1.pdf
PWC https://paperswithcode.com/paper/robot-dream
Repo
Framework

Recurrent Fully Convolutional Neural Networks for Multi-slice MRI Cardiac Segmentation

Title Recurrent Fully Convolutional Neural Networks for Multi-slice MRI Cardiac Segmentation
Authors Rudra P K Poudel, Pablo Lamata, Giovanni Montana
Abstract In cardiac magnetic resonance imaging, fully-automatic segmentation of the heart enables precise structural and functional measurements to be taken, e.g., from short-axis MR images of the left ventricle. In this work we propose a recurrent fully-convolutional network (RFCN) that learns image representations from the full stack of 2D slices and has the ability to leverage inter-slice spatial dependencies through internal memory units. RFCN combines anatomical detection and segmentation into a single architecture that is trained end-to-end, thus significantly reducing computational time, simplifying the segmentation pipeline, and potentially enabling real-time applications. We report on an investigation of RFCN using two datasets, including the publicly available MICCAI 2009 Challenge dataset. Comparisons have been carried out between fully convolutional networks and deep restricted Boltzmann machines, including a recurrent version that leverages inter-slice spatial correlation. Our studies suggest that RFCN produces state-of-the-art results and can substantially improve the delineation of contours near the apex of the heart.
Tasks Cardiac Segmentation
Published 2016-08-13
URL http://arxiv.org/abs/1608.03974v1
PDF http://arxiv.org/pdf/1608.03974v1.pdf
PWC https://paperswithcode.com/paper/recurrent-fully-convolutional-neural-networks
Repo
Framework
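The key idea of propagating information across the slice axis can be illustrated with a toy numpy sketch: extract per-slice convolutional features, then carry a simple leaky memory from slice to slice before thresholding into a mask. This is a hypothetical stand-in for the paper's learned memory units, with placeholder weights; it is not the RFCN architecture itself.

```python
import numpy as np

def conv2d_same(img, kernel):
    # Naive 'same' 2-D convolution with zero padding; stands in for an FCN layer.
    kh, kw = kernel.shape
    pad = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(pad[i:i + kh, j:j + kw] * kernel)
    return out

def recurrent_fcn(stack, kernel, decay=0.5):
    # Process a stack of 2-D slices; a leaky memory carries information
    # from earlier slices into later ones (toy stand-in for internal
    # memory units; the mask rule is a crude illustrative segmenter).
    h = np.zeros_like(stack[0])
    masks = []
    for slc in stack:
        feat = np.maximum(conv2d_same(slc, kernel), 0.0)   # conv + ReLU
        h = decay * h + (1 - decay) * feat                 # inter-slice memory
        masks.append((h > h.mean()).astype(float))
    return np.stack(masks)

rng = np.random.default_rng(1)
stack = rng.random((4, 8, 8))       # 4 slices of an 8x8 toy "MRI" volume
kernel = np.ones((3, 3)) / 9.0      # smoothing kernel as placeholder weights
masks = recurrent_fcn(stack, kernel)
print(masks.shape)
```

In the actual model both the convolutional filters and the recurrence are trained end-to-end by backpropagation; the sketch only shows why a state carried across slices can exploit inter-slice correlation.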

Going Deeper for Multilingual Visual Sentiment Detection

Title Going Deeper for Multilingual Visual Sentiment Detection
Authors Brendan Jou, Shih-Fu Chang
Abstract This technical report details several improvements to the visual concept detector banks built on images from the Multilingual Visual Sentiment Ontology (MVSO). The detector banks are trained to detect a total of 9,918 sentiment-biased visual concepts from six major languages: English, Spanish, Italian, French, German and Chinese. In the original MVSO release, adjective-noun pair (ANP) detectors were trained for the six languages using an AlexNet-styled architecture by fine-tuning from DeepSentiBank. Here, through a more extensive set of experiments, parameter tuning, and training runs, we detail and release higher-accuracy models for detecting ANPs across six languages from the same image pool and setting as in the original release, using a more modern architecture, GoogLeNet, which provides comparable or better performance at a reduced network parameter cost. In addition, since the image pool in MVSO can be corrupted by user noise from social interactions, we partitioned out a sub-corpus of MVSO images based on tag-restricted queries for higher-fidelity labels. We show that, as a result of these higher-fidelity labels, higher-performing AlexNet-styled ANP detectors can be trained on the tag-restricted image subset compared to the models trained on the full corpus. We release all these newly trained models for public research use along with the list of tag-restricted images from the MVSO dataset.
Tasks
Published 2016-05-30
URL http://arxiv.org/abs/1605.09211v1
PDF http://arxiv.org/pdf/1605.09211v1.pdf
PWC https://paperswithcode.com/paper/going-deeper-for-multilingual-visual
Repo
Framework

Lipreading with Long Short-Term Memory

Title Lipreading with Long Short-Term Memory
Authors Michael Wand, Jan Koutník, Jürgen Schmidhuber
Abstract Lipreading, i.e. speech recognition from visual-only recordings of a speaker’s face, can be achieved with a processing pipeline based solely on neural networks, yielding significantly better accuracy than conventional methods. Feed-forward and recurrent neural network layers (namely Long Short-Term Memory; LSTM) are stacked to form a single structure which is trained by back-propagating error gradients through all the layers. The performance of such a stacked network was experimentally evaluated and compared to a standard Support Vector Machine classifier using conventional computer vision features (Eigenlips and Histograms of Oriented Gradients). The evaluation was performed on data from 19 speakers of the publicly available GRID corpus. With 51 different words to classify, we report a best word accuracy on held-out evaluation speakers of 79.6% using the end-to-end neural network-based solution (11.6% improvement over the best feature-based solution evaluated).
Tasks Lipreading, Speech Recognition
Published 2016-01-29
URL http://arxiv.org/abs/1601.08188v1
PDF http://arxiv.org/pdf/1601.08188v1.pdf
PWC https://paperswithcode.com/paper/lipreading-with-long-short-term-memory
Repo
Framework

Restricted Strong Convexity Implies Weak Submodularity

Title Restricted Strong Convexity Implies Weak Submodularity
Authors Ethan R. Elenberg, Rajiv Khanna, Alexandros G. Dimakis, Sahand Negahban
Abstract We connect high-dimensional subset selection and submodular maximization. Our results extend the work of Das and Kempe (2011) from the setting of linear regression to arbitrary objective functions. For greedy feature selection, this connection allows us to obtain strong multiplicative performance bounds on several methods without statistical modeling assumptions. We also derive recovery guarantees of this form under standard assumptions. Our work shows that greedy algorithms perform within a constant factor of the best possible subset-selection solution for a broad class of general objective functions. Our methods allow direct control over the number of obtained features, as opposed to regularization parameters that only implicitly control sparsity. Our proof technique uses the concept of weak submodularity initially defined by Das and Kempe. We draw a connection between convex analysis and submodular set function theory which may be of independent interest for other statistical learning applications that have combinatorial structure.
Tasks Feature Selection
Published 2016-12-02
URL http://arxiv.org/abs/1612.00804v2
PDF http://arxiv.org/pdf/1612.00804v2.pdf
PWC https://paperswithcode.com/paper/restricted-strong-convexity-implies-weak
Repo
Framework
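The greedy procedure analyzed here is easy to sketch for the linear-regression special case of Das and Kempe: at each step, add the feature with the largest marginal gain in goodness of fit. A minimal numpy version (R² as the set function; the data and planted features are made up for illustration):

```python
import numpy as np

def r_squared(X, y, S):
    # Goodness of fit of least squares restricted to the columns in S.
    if not S:
        return 0.0
    Xs = X[:, sorted(S)]
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    resid = y - Xs @ beta
    return 1.0 - resid @ resid / (y @ y)

def greedy_select(X, y, k):
    # Greedy forward selection: at each step add the feature with the
    # largest marginal gain in R^2 -- the set function whose weak
    # submodularity underlies the paper's multiplicative bounds.
    S = set()
    for _ in range(k):
        gains = {j: r_squared(X, y, S | {j})
                 for j in range(X.shape[1]) if j not in S}
        S.add(max(gains, key=gains.get))
    return sorted(S)

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 10))
y = 3 * X[:, 2] - 2 * X[:, 7] + 0.1 * rng.standard_normal(100)
print(greedy_select(X, y, 2))  # expected to recover columns 2 and 7
```

Note the direct control over sparsity: `k` is the exact number of selected features, whereas an l1 penalty would control it only implicitly through a regularization weight.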

Registering large volume serial-section electron microscopy image sets for neural circuit reconstruction using FFT signal whitening

Title Registering large volume serial-section electron microscopy image sets for neural circuit reconstruction using FFT signal whitening
Authors Arthur W. Wetzel, Jennifer Bakal, Markus Dittrich, David G. C. Hildebrand, Josh L. Morgan, Jeff W. Lichtman
Abstract The detailed reconstruction of neural anatomy for connectomics studies requires a combination of resolution and large three-dimensional data capture provided by serial section electron microscopy (ssEM). The convergence of high throughput ssEM imaging and improved tissue preparation methods now allows ssEM capture of complete specimen volumes up to cubic millimeter scale. The resulting multi-terabyte image sets span thousands of serial sections and must be precisely registered into coherent volumetric forms in which neural circuits can be traced and segmented. This paper introduces a Signal Whitening Fourier Transform Image Registration approach (SWiFT-IR) under development at the Pittsburgh Supercomputing Center and its use to align mouse and zebrafish brain datasets acquired using the wafer mapper ssEM imaging technology recently developed at Harvard University. Unlike other methods now used for ssEM registration, SWiFT-IR modifies its spatial frequency response during image matching to maximize a signal-to-noise measure used as its primary indicator of alignment quality. This alignment signal is more robust to rapid variations in biological content and unavoidable data distortions than either phase-only or standard Pearson correlation, thus allowing more precise alignment and statistical confidence. These improvements in turn enable an iterative registration procedure based on projections through multiple sections rather than more typical adjacent-pair matching methods. This projection approach, when coupled with known anatomical constraints and iteratively applied in a multi-resolution pyramid fashion, drives the alignment into a smooth form that properly represents complex and widely varying anatomical content such as the full cross-section zebrafish data.
Tasks Image Registration
Published 2016-12-14
URL http://arxiv.org/abs/1612.04787v1
PDF http://arxiv.org/pdf/1612.04787v1.pdf
PWC https://paperswithcode.com/paper/registering-large-volume-serial-section
Repo
Framework
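The core frequency-domain trick, matching images after flattening the spectrum so that phase rather than magnitude dominates, can be demonstrated with classic phase correlation. This numpy sketch is a simplified stand-in for SWiFT-IR's adaptive frequency-response shaping, not the method itself (SWiFT-IR tunes its whitening to maximize a signal-to-noise measure):

```python
import numpy as np

def whitened_correlation(a, b, eps=1e-6):
    # Cross-correlate two images in the Fourier domain after normalizing
    # away the cross-spectrum magnitude: only phase (i.e., alignment
    # information) survives, which sharpens the correlation peak.
    Fa, Fb = np.fft.fft2(a), np.fft.fft2(b)
    cross = Fa * np.conj(Fb)
    cross /= np.abs(cross) + eps        # whiten: keep phase, drop magnitude
    return np.real(np.fft.ifft2(cross))

rng = np.random.default_rng(3)
img = rng.random((32, 32))
shifted = np.roll(img, shift=(5, 9), axis=(0, 1))  # simulated section offset
corr = whitened_correlation(shifted, img)
dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
print(dy, dx)  # the peak location recovers the (5, 9) shift
```

Full whitening as above is the phase-only extreme; the paper's point is that an intermediate, data-adaptive amount of whitening is more robust to biological content variation and distortion than either phase-only or standard Pearson correlation.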

Not Afraid of the Dark: NIR-VIS Face Recognition via Cross-spectral Hallucination and Low-rank Embedding

Title Not Afraid of the Dark: NIR-VIS Face Recognition via Cross-spectral Hallucination and Low-rank Embedding
Authors Jose Lezama, Qiang Qiu, Guillermo Sapiro
Abstract Surveillance cameras today often capture NIR (near infrared) images in low-light environments. However, most face datasets accessible for training and verification are only collected in the VIS (visible light) spectrum. It remains a challenging problem to match NIR to VIS face images due to the different light spectrum. Recently, breakthroughs have been made for VIS face recognition by applying deep learning on a huge amount of labeled VIS face samples. The same deep learning approach cannot simply be applied to NIR face recognition for two main reasons: first, far fewer NIR face images are available for training compared to the VIS spectrum; second, face galleries to be matched are mostly available only in the VIS spectrum. In this paper, we propose an approach to extend the deep learning breakthrough for VIS face recognition to the NIR spectrum, without retraining the underlying deep models that see only VIS faces. Our approach consists of two core components, cross-spectral hallucination and low-rank embedding, to optimize respectively the input and output of a VIS deep model for cross-spectral face recognition. Cross-spectral hallucination produces VIS faces from NIR images through a deep learning approach. Low-rank embedding restores a low-rank structure for the deep features of faces across both the NIR and VIS spectra. We observe that it is often equally effective to apply hallucination to input NIR images or low-rank embedding to output deep features of a VIS deep model for cross-spectral recognition. When hallucination and low-rank embedding are deployed together, we observe significant further improvement; we obtain state-of-the-art accuracy on the CASIA NIR-VIS v2.0 benchmark, without any need to re-train the recognition system.
Tasks Face Recognition
Published 2016-11-21
URL http://arxiv.org/abs/1611.06638v1
PDF http://arxiv.org/pdf/1611.06638v1.pdf
PWC https://paperswithcode.com/paper/not-afraid-of-the-dark-nir-vis-face
Repo
Framework
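The intuition behind imposing low-rank structure on features can be shown generically with a truncated SVD: projecting a perturbed feature matrix onto its best rank-r approximation removes off-subspace perturbation. This is a generic illustration only; the paper's low-rank embedding is learned, not a plain SVD, and the dimensions and noise level below are made up.

```python
import numpy as np

def low_rank_project(F, rank):
    # Best rank-r approximation of a feature matrix via truncated SVD
    # (Eckart-Young); a generic way to impose low-rank structure.
    U, s, Vt = np.linalg.svd(F, full_matrices=False)
    s[rank:] = 0.0
    return (U * s) @ Vt

rng = np.random.default_rng(4)
base = rng.standard_normal((50, 5)) @ rng.standard_normal((5, 128))  # rank-5 structure
noisy = base + 0.05 * rng.standard_normal((50, 128))  # toy cross-spectral perturbation
cleaned = low_rank_project(noisy, rank=5)

err_before = np.linalg.norm(noisy - base)
err_after = np.linalg.norm(cleaned - base)
print(err_after < err_before)
```

The projection discards the components of the perturbation that fall outside the dominant subspace, which is why the cleaned features sit closer to the underlying structure.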

From Monocular SLAM to Autonomous Drone Exploration

Title From Monocular SLAM to Autonomous Drone Exploration
Authors Lukas von Stumberg, Vladyslav Usenko, Jakob Engel, Jörg Stückler, Daniel Cremers
Abstract Micro aerial vehicles (MAVs) are strongly limited in their payload and power capacity. In order to implement autonomous navigation, algorithms are therefore desirable that use sensory equipment that is as small, low-weight, and low-power consuming as possible. In this paper, we propose a method for autonomous MAV navigation and exploration using a low-cost consumer-grade quadrocopter equipped with a monocular camera. Our vision-based navigation system builds on LSD-SLAM which estimates the MAV trajectory and a semi-dense reconstruction of the environment in real-time. Since LSD-SLAM only determines depth at high gradient pixels, texture-less areas are not directly observed so that previous exploration methods that assume dense map information cannot directly be applied. We propose an obstacle mapping and exploration approach that takes the properties of our semi-dense monocular SLAM system into account. In experiments, we demonstrate our vision-based autonomous navigation and exploration system with a Parrot Bebop MAV.
Tasks Autonomous Navigation
Published 2016-09-26
URL http://arxiv.org/abs/1609.07835v3
PDF http://arxiv.org/pdf/1609.07835v3.pdf
PWC https://paperswithcode.com/paper/from-monocular-slam-to-autonomous-drone
Repo
Framework

On the convergence rate of the three operator splitting scheme

Title On the convergence rate of the three operator splitting scheme
Authors Fabian Pedregosa
Abstract The three operator splitting scheme was recently proposed by [Davis and Yin, 2015] as a method to optimize composite objective functions with one convex smooth term and two convex (possibly non-smooth) terms for which we have access to their proximity operator. In this short note we provide an alternative proof for the sublinear rate of convergence of this method.
Tasks
Published 2016-10-25
URL http://arxiv.org/abs/1610.07830v4
PDF http://arxiv.org/pdf/1610.07830v4.pdf
PWC https://paperswithcode.com/paper/on-the-convergence-rate-of-the-three-operator
Repo
Framework
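The scheme itself is short enough to state in code. The iteration below is the standard Davis-Yin three operator splitting (relaxation parameter fixed to 1); the test problem, a denoising objective with an l1 term and a nonnegativity constraint, is an illustrative choice, not one from the paper.

```python
import numpy as np

def davis_yin(grad_f, prox_g, prox_h, z0, step, iters=500):
    # Three operator splitting for min f(x) + g(x) + h(x):
    # f convex smooth (gradient available), g and h convex with
    # computable proximity operators.
    z = z0.copy()
    for _ in range(iters):
        xg = prox_g(z, step)
        xh = prox_h(2 * xg - z - step * grad_f(xg), step)
        z = z + xh - xg
    return prox_g(z, step)

# Illustrative problem: min 0.5||x - b||^2 + lam*||x||_1 + indicator(x >= 0)
b = np.array([1.5, -2.0, 0.3])
lam = 0.5
grad_f = lambda x: x - b
prox_g = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t * lam, 0)  # soft-threshold
prox_h = lambda v, t: np.maximum(v, 0)                                 # project onto x >= 0

x = davis_yin(grad_f, prox_g, prox_h, np.zeros(3), step=0.5)
print(np.round(x, 3))  # shrink each entry by lam, then clip negatives
```

For this separable problem the minimizer can be checked by hand (soft-threshold b, then project), which makes it a convenient sanity test for the iteration.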

Memory and Information Processing in Recurrent Neural Networks

Title Memory and Information Processing in Recurrent Neural Networks
Authors Alireza Goudarzi, Sarah Marzen, Peter Banda, Guy Feldman, Christof Teuscher, Darko Stefanovic
Abstract Recurrent neural networks (RNN) are simple dynamical systems whose computational power has been attributed to their short-term memory. Short-term memory of RNNs has previously been studied analytically only for orthogonal networks, only under the annealed approximation, and only with uncorrelated input. Here, for the first time, we present an exact solution to the memory capacity and the task-solving performance as a function of the structure of a given network instance, enabling direct determination of the function-structure relation in RNNs. We calculate the memory capacity for arbitrary networks with exponentially correlated input and further relate it to the performance of the system on signal processing tasks in a supervised learning setup. We compute the expected error and the worst-case error bound as a function of the spectra of the network and the correlation structure of its inputs and outputs. Our results give an explanation for learning and generalization of task solving using short-term memory, which is crucial for building alternative computer architectures using physical phenomena based on the short-term memory principle.
Tasks
Published 2016-04-23
URL http://arxiv.org/abs/1604.06929v1
PDF http://arxiv.org/pdf/1604.06929v1.pdf
PWC https://paperswithcode.com/paper/memory-and-information-processing-in-1
Repo
Framework
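The memory capacity being analyzed can be measured empirically: drive a random recurrent network with an i.i.d. input stream and train a linear readout to reconstruct the input at each delay; the squared correlation at lag k is the classic memory function m(k). This numpy sketch uses a generic tanh echo-state-style network with made-up sizes and spectral radius, not the paper's exact (analytically solved) setting.

```python
import numpy as np

rng = np.random.default_rng(5)
N, T = 50, 2000
W = rng.standard_normal((N, N))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # set spectral radius to 0.9
w_in = rng.standard_normal(N)

u = rng.standard_normal(T)            # i.i.d. input stream
X = np.zeros((T, N))
x = np.zeros(N)
for t in range(T):
    x = np.tanh(W @ x + w_in * u[t])  # recurrent state update
    X[t] = x

def memory_at_lag(X, u, k, washout=100):
    # m(k): squared correlation between the delayed input u[t-k] and its
    # best linear reconstruction from the current network state.
    target = u[washout - k: T - k]
    states = X[washout:T]
    w, *_ = np.linalg.lstsq(states, target, rcond=None)
    c = np.corrcoef(states @ w, target)[0, 1]
    return c ** 2

mc = [memory_at_lag(X, u, k) for k in range(1, 11)]
print(np.round(mc, 2))  # memory decays with increasing lag
```

Summing m(k) over all lags gives the network's total memory capacity, the quantity the paper solves exactly as a function of the network spectrum and input correlations.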

Deeply Exploit Depth Information for Object Detection

Title Deeply Exploit Depth Information for Object Detection
Authors Saihui Hou, Zilei Wang, Feng Wu
Abstract This paper addresses how to more effectively coordinate depth with RGB in order to boost the performance of RGB-D object detection. In particular, we investigate two primary ideas under the CNN model: property derivation and property fusion. Firstly, we propose that depth can be utilized not only as a type of extra information besides RGB but also to derive more visual properties for comprehensively describing the objects of interest. We therefore construct a two-stage learning framework consisting of property derivation and fusion. Here the properties can be derived either from the provided color/depth images or their pairs (e.g., the geometry contour adopted in this paper). Secondly, we explore how different properties should be fused during feature learning, which, under the CNN model, boils down to deciding at which layer the properties should be combined. The analysis shows that different semantic properties should be learned separately and combined before passing into the final classifier. Such a detection scheme is in fact in accordance with the mechanism of the primary visual cortex (V1) in the brain. We experimentally evaluate the proposed method on a challenging dataset and achieve state-of-the-art performance.
Tasks Object Detection
Published 2016-05-08
URL http://arxiv.org/abs/1605.02260v1
PDF http://arxiv.org/pdf/1605.02260v1.pdf
PWC https://paperswithcode.com/paper/deeply-exploit-depth-information-for-object
Repo
Framework

Emergence of linguistic laws in human voice

Title Emergence of linguistic laws in human voice
Authors Ivan Gonzalez Torre, Bartolo Luque, Lucas Lacasa, Jordi Luque, Antoni Hernandez-Fernandez
Abstract Linguistic laws constitute one of the quantitative cornerstones of modern cognitive science and have been routinely investigated in written corpora, or in the equivalent transcription of oral corpora. This means that inferences about statistical patterns of language in acoustics are biased by the arbitrary, language-dependent segmentation of the signal, which virtually precludes comparative studies between human voice and other animal communication systems. Here we bridge this gap by proposing a method that allows such patterns to be measured in acoustic signals of arbitrary origin, without needing access to the underlying language corpus. The method has been applied to six different human languages, successfully recovering some well-known laws of human communication at timescales even below the phoneme and finding yet another link between complexity and criticality in a biological system. These methods further pave the way for new comparative studies in animal communication or the analysis of signals of unknown code.
Tasks
Published 2016-10-09
URL http://arxiv.org/abs/1610.02736v1
PDF http://arxiv.org/pdf/1610.02736v1.pdf
PWC https://paperswithcode.com/paper/emergence-of-linguistic-laws-in-human-voice
Repo
Framework
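The transcription-free idea, segmenting the raw signal itself rather than words, can be sketched with a simple amplitude threshold: maximal runs of above-threshold samples become "voice events" whose duration statistics can then be examined for law-like regularities. This is a minimal illustration of the segmentation principle with synthetic data, not the paper's actual (energy-based, multi-threshold) procedure.

```python
import numpy as np

def segment_by_energy(signal, threshold):
    # Split a signal into maximal runs of samples whose absolute amplitude
    # exceeds a threshold; returns the duration of each run in samples.
    # No language-dependent transcription is involved.
    active = np.abs(signal) > threshold
    edges = np.diff(active.astype(int))
    starts = np.where(edges == 1)[0] + 1
    ends = np.where(edges == -1)[0] + 1
    if active[0]:
        starts = np.insert(starts, 0, 0)
    if active[-1]:
        ends = np.append(ends, len(signal))
    return ends - starts

rng = np.random.default_rng(6)
# Synthetic stand-in for an acoustic recording: sparse noise bursts.
signal = rng.standard_normal(10000) * (rng.random(10000) < 0.3)
durations = segment_by_energy(signal, threshold=0.5)
print(len(durations), durations.max())
```

Given the event durations (and, with a magnitude measure per event, their "sizes"), one can then test for Zipf-, Herdan- or Gutenberg-Richter-like distributions directly on the acoustics.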

Combining Generative and Discriminative Neural Networks for Sleep Stages Classification

Title Combining Generative and Discriminative Neural Networks for Sleep Stages Classification
Authors Endang Purnama Giri, Mohamad Ivan Fanany, Aniati Murni Arymurthy
Abstract Sleep stage patterns provide important clues for diagnosing the presence of sleep disorders. By analyzing sleep stage patterns and extracting their features from EEG, EOG, and EMG signals, we can classify sleep stages. This study presents a novel classification model for predicting sleep stages with high accuracy. The main idea is to combine the generative capability of a Deep Belief Network (DBN) with the discriminative ability and sequence pattern recognition capability of Long Short-Term Memory (LSTM). We use the DBN as an automatic generator of higher-level features; its input is 28 “handcrafted” features as used in previous sleep stage studies. We compared our method with other techniques that combine a DBN with a Hidden Markov Model (HMM). In this study, we exploit the sequence, or time series, characteristics of sleep datasets. To the best of our knowledge, most present sleep analysis from polysomnograms relies only on single-instance (non-sequence) labels for classification. We used two datasets: an open dataset that is treated as a benchmark, and our own sleep stages dataset (available for download) to further verify the results. Our experiments showed that the combination of DBN with LSTM gives a better overall accuracy of 98.75% (F-score = 0.9875) on the benchmark dataset and 98.94% (F-score = 0.9894) on the MKG dataset. This result is better than the state of the art of sleep stage classification, which was 91.31%.
Tasks EEG, Time Series
Published 2016-10-06
URL http://arxiv.org/abs/1610.01741v1
PDF http://arxiv.org/pdf/1610.01741v1.pdf
PWC https://paperswithcode.com/paper/combining-generative-and-discriminative
Repo
Framework