January 28, 2020


Paper Group ANR 922

Feature-based factorized Bilinear Similarity Model for Cold-Start Top-n Item Recommendation. Influence of segmentation on deep iris recognition performance. Inception Architecture and Residual Connections in Classification of Breast Cancer Histology Images. Syntax-aware Multilingual Semantic Role Labeling. Projection pursuit with applications to sc …

Feature-based factorized Bilinear Similarity Model for Cold-Start Top-n Item Recommendation


Title	Feature-based factorized Bilinear Similarity Model for Cold-Start Top-n Item Recommendation
Authors	Mohit Sharma, Jiayu Zhou, Junling Hu, George Karypis
Abstract	Recommending new items to existing users remains a challenging problem because users have no past preferences for these items. User-personalized non-collaborative methods based on item features can be used to address this item cold-start problem. These methods rely on similarities between the target item and the user’s previously preferred items. When computing similarities from item features, however, these methods consider the features independently and overlook the interactions among them. Modeling interactions among features can be helpful, as some features, when considered together, provide a stronger signal of an item’s relevance than when they are considered independently. To address this issue, we introduce the Feature-based factorized Bilinear Similarity Model (FBSM), which learns a factorized bilinear similarity model for top-n recommendation of new items, given information about the items users preferred in the past as well as the features of these items. We carry out extensive empirical evaluations on benchmark datasets and find that the proposed FBSM approach improves upon traditional non-collaborative methods in terms of recommendation performance. Moreover, the proposed approach also learns insightful interactions among item features from data, which leads to a deeper understanding of how these interactions contribute to personalized recommendation.
Tasks
Published	2019-04-22
URL	http://arxiv.org/abs/1904.11799v1
PDF	http://arxiv.org/pdf/1904.11799v1.pdf
PWC	https://paperswithcode.com/paper/190411799
Repo
Framework
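
The abstract above does not spell out the factorization, so the following is only a minimal sketch of one way a factorized bilinear similarity could be scored. It assumes sim(i, j) = f_i^T W f_j, where W has a learned diagonal `d` and off-diagonal entries given by inner products of low-rank feature factors `V`. The names `bilinear_similarity`, `score_new_item`, `d` and `V` are hypothetical, and learning the parameters is not shown.

```python
from typing import List, Sequence


def bilinear_similarity(f_i: Sequence[float], f_j: Sequence[float],
                        d: Sequence[float], V: Sequence[Sequence[float]]) -> float:
    """Return f_i^T W f_j for an assumed W with diagonal d and
    off-diagonal entries W[p][q] = <V[p], V[q]> (p != q)."""
    n = len(f_i)
    k = len(V[0])
    # Diagonal part: each feature compared with itself.
    diag = sum(d[p] * f_i[p] * f_j[p] for p in range(n))
    # Project both feature vectors into the k-dimensional factor space.
    a = [sum(f_i[p] * V[p][h] for p in range(n)) for h in range(k)]
    b = [sum(f_j[p] * V[p][h] for p in range(n)) for h in range(k)]
    cross = sum(x * y for x, y in zip(a, b))
    # Remove the p == q terms so only feature interactions remain.
    self_terms = sum(f_i[p] * f_j[p] * sum(v * v for v in V[p]) for p in range(n))
    return diag + cross - self_terms


def score_new_item(new_item: Sequence[float],
                   liked_items: List[Sequence[float]],
                   d: Sequence[float], V: Sequence[Sequence[float]]) -> float:
    """Score a cold-start item by summing its similarity to the items
    the user preferred in the past (an assumed aggregation rule)."""
    return sum(bilinear_similarity(new_item, f, d, V) for f in liked_items)
```

With all factors in `V` set to zero and `d` set to ones, the similarity reduces to a plain dot product of feature vectors, which corresponds to treating features independently.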

Influence of segmentation on deep iris recognition performance


Title	Influence of segmentation on deep iris recognition performance
Authors	Juš Lozej, Dejan Štepec, Vitomir Štruc, Peter Peer
Abstract	Despite the rise of deep learning in numerous areas of computer vision and image processing, iris recognition has not benefited considerably from these trends so far. Most of the existing research on deep iris recognition focuses on new models for generating discriminative and robust iris representations and relies on methodologies akin to traditional iris recognition pipelines. Hence, the proposed models do not approach iris recognition in an end-to-end manner, but rather use standard heuristic iris segmentation (and unwrapping) techniques to produce normalized inputs for the deep learning models. However, because deep learning is able to model very complex data distributions and nonlinear data changes, an obvious question arises: how important is the use of traditional segmentation methods in a deep learning setting? To answer this question, we present an empirical analysis of the impact of iris segmentation on the performance of deep learning models using a simple two-stage pipeline consisting of a segmentation step and a recognition step. We evaluate how the accuracy of segmentation influences recognition performance and also examine whether segmentation is needed at all. We use the CASIA Thousand and SBVPI datasets for the experiments and report several interesting findings.
Tasks	Iris Recognition, Iris Segmentation
Published	2019-01-29
URL	http://arxiv.org/abs/1901.10431v1
PDF	http://arxiv.org/pdf/1901.10431v1.pdf
PWC	https://paperswithcode.com/paper/influence-of-segmentation-on-deep-iris
Repo
Framework

Inception Architecture and Residual Connections in Classification of Breast Cancer Histology Images


Title	Inception Architecture and Residual Connections in Classification of Breast Cancer Histology Images
Authors	Mohammad Ibrahim Sarker, Hyongsuk Kim, Denis Tarasov, Dinar Akhmetzanov
Abstract	This paper presents the results of applying the Inception v4 deep convolutional neural network to the ICIAR-2018 Breast Cancer Classification Grand Challenge, part a. The Challenge task is to classify breast cancer biopsy results, presented in the form of hematoxylin and eosin stained images. Breast cancer classification is of primary interest to medical practitioners, and thus binary classification of breast cancer images has been investigated by many researchers, but multi-class categorization of breast histology images has been challenging due to the subtle differences among the categories. In this work, extensive data augmentation is conducted to reduce overfitting, and the effectiveness of a committee of several Inception v4 networks is studied. We report 89% accuracy on the 4-class classification task and 93.7% on the carcinoma/non-carcinoma two-class classification task using our test set of 80 images.
Tasks	Classification Of Breast Cancer Histology Images, Data Augmentation
Published	2019-12-10
URL	https://arxiv.org/abs/1912.04619v1
PDF	https://arxiv.org/pdf/1912.04619v1.pdf
PWC	https://paperswithcode.com/paper/inception-architecture-and-residual
Repo
Framework

Syntax-aware Multilingual Semantic Role Labeling


Title	Syntax-aware Multilingual Semantic Role Labeling
Authors	Shexia He, Zuchao Li, Hai Zhao
Abstract	Recently, semantic role labeling (SRL) has achieved a series of successes with ever higher performance, which can be mainly attributed to syntactic integration and enhanced word representation. However, most of these efforts focus on English, while SRL for languages other than English has received relatively little attention and remains underdeveloped. This paper therefore aims to fill the gap in multilingual SRL, with a special focus on the impact of syntax and contextualized word representation. Unlike existing work, we propose a novel method guided by syntactic rules to prune arguments, which enables us to integrate syntax into a multilingual SRL model simply and effectively. We present a unified SRL model designed for multiple languages together with the proposed uniform syntax enhancement. Our model achieves new state-of-the-art results on the CoNLL-2009 benchmarks of all seven languages. In addition, we discuss the role of syntax across different languages and verify the effectiveness of deep enhanced representation for multilingual SRL.
Tasks	Semantic Role Labeling
Published	2019-09-01
URL	https://arxiv.org/abs/1909.00310v3
PDF	https://arxiv.org/pdf/1909.00310v3.pdf
PWC	https://paperswithcode.com/paper/syntax-aware-multilingual-semantic-role
Repo
Framework

Projection pursuit with applications to scRNA sequencing data


Title	Projection pursuit with applications to scRNA sequencing data
Authors	Elvis Cui, Heather Zhou
Abstract	In this paper, we explore the limitations of PCA as a dimension reduction technique and study its extension, projection pursuit (PP), which is a broad class of linear dimension reduction methods. We first discuss the relevant concepts and theorems and then apply PCA and PP (with negative standardized Shannon’s entropy as the projection index) on single cell RNA sequencing data.
Tasks	Dimensionality Reduction
Published	2019-12-16
URL	https://arxiv.org/abs/1912.07602v1
PDF	https://arxiv.org/pdf/1912.07602v1.pdf
PWC	https://paperswithcode.com/paper/projection-pursuit-with-applications-to-scrna
Repo
Framework

Three-dimensional Radial Visualization of High-dimensional Continuous or Discrete Data


Title	Three-dimensional Radial Visualization of High-dimensional Continuous or Discrete Data
Authors	Fan Dai, Yifan Zhu, Ranjan Maitra
Abstract	This paper develops methodology for 3D radial visualization of high-dimensional datasets. Our display engine is called RadViz3D and extends the classic RadViz that visualizes multivariate data in the 2D plane by mapping every record to a point inside the unit circle. The classic RadViz display has equally-spaced anchor points on the unit circle, with each of them associated with an attribute or feature of the dataset. RadViz3D obtains equi-spaced anchor points exactly for the five Platonic solids and approximately for the other cases via a Fibonacci grid. We show that distributing anchor points at least approximately uniformly on the 3D unit sphere provides a better visualization than in 2D. We also propose a Max-Ratio Projection (MRP) method that utilizes the group information in high dimensions to provide distinctive lower-dimensional projections that are then displayed using RadViz3D. Our methodology is extended to datasets with discrete and mixed features, where a generalized distributional transform is used in conjunction with copula models before applying MRP and RadViz3D visualization.
Tasks
Published	2019-04-06
URL	http://arxiv.org/abs/1904.06366v1
PDF	http://arxiv.org/pdf/1904.06366v1.pdf
PWC	https://paperswithcode.com/paper/190406366
Repo
Framework
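
As a rough illustration of the mapping described in the abstract above, the sketch below places anchor points on the unit circle (classic RadViz) or approximately uniformly on the unit sphere using a golden-angle Fibonacci lattice, then maps a record to the weighted average of its anchors. It assumes attribute values are already scaled to be non-negative. The function names are hypothetical and the lattice construction is a common textbook variant, not necessarily the one used in the paper.

```python
import math
from typing import List, Sequence, Tuple


def circle_anchors(m: int) -> List[Tuple[float, float]]:
    """m equally spaced anchor points on the unit circle (classic RadViz)."""
    return [(math.cos(2 * math.pi * j / m), math.sin(2 * math.pi * j / m))
            for j in range(m)]


def fibonacci_sphere(m: int) -> List[Tuple[float, float, float]]:
    """m approximately uniform points on the unit sphere
    (golden-angle Fibonacci lattice, an assumed construction)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(m):
        z = 1.0 - (2.0 * k + 1.0) / m      # heights evenly spaced in (-1, 1)
        r = math.sqrt(1.0 - z * z)         # radius of the circle at height z
        theta = golden_angle * k
        points.append((r * math.cos(theta), r * math.sin(theta), z))
    return points


def radial_projection(record: Sequence[float],
                      anchors: Sequence[Sequence[float]]) -> Tuple[float, ...]:
    """Map a record to the average of its anchors, weighted by attribute values.
    Assumes non-negative values, one anchor per attribute."""
    total = sum(record)
    dim = len(anchors[0])
    if total == 0:
        return tuple(0.0 for _ in range(dim))  # all-zero record sits at the centre
    return tuple(sum(v * a[c] for v, a in zip(record, anchors)) / total
                 for c in range(dim))
```

Because the weights are non-negative and sum to one, every projected point is a convex combination of the anchors and therefore lies inside the unit circle or unit sphere.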

The Wasserstein-Fourier Distance for Stationary Time Series


Title	The Wasserstein-Fourier Distance for Stationary Time Series
Authors	Elsa Cazelles, Arnaud Robert, Felipe Tobar
Abstract	We introduce a novel framework for analysing stationary time series based on optimal transport distances and spectral embeddings. First, we represent time series by their power spectral density (PSD), which summarises the signal energy spread across the Fourier spectrum. Second, we endow the space of PSDs with the Wasserstein distance, capitalising on its unique ability to preserve the geometric information of a set of distributions. These two steps enable us to define the Wasserstein-Fourier (WF) distance, which allows us to compare stationary time series even when they differ in sampling rate, length, magnitude and phase. We analyse the features of WF by blending the properties of the Wasserstein distance and those of the Fourier transform. The proposed WF distance is then used in three sets of key time series applications considering real-world datasets: (i) interpolation of time series leading to data augmentation, (ii) dimensionality reduction via non-linear PCA, and (iii) parametric and non-parametric classification tasks. Our conceptual and experimental findings validate the general concept of using divergences of distributions, especially the Wasserstein distance, to analyse time series through comparing their spectral representations.
Tasks	Data Augmentation, Dimensionality Reduction, Time Series
Published	2019-12-11
URL	https://arxiv.org/abs/1912.05509v1
PDF	https://arxiv.org/pdf/1912.05509v1.pdf
PWC	https://paperswithcode.com/paper/the-wasserstein-fourier-distance-for
Repo
Framework
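
The two steps in the abstract above (represent each series by a normalised PSD, then compare PSDs with a Wasserstein distance) can be sketched as follows. This is a simplified variant under stated assumptions: it uses a raw periodogram as the PSD estimate, the 1-Wasserstein distance computed from cumulative sums rather than whichever order the paper uses, and two series of equal length and sampling rate so that both PSDs share one frequency grid. All function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def normalised_psd(signal: Sequence[float]) -> List[float]:
    """Periodogram over non-negative frequencies, scaled to sum to 1 so it
    can be treated as a probability distribution over frequency."""
    n = len(signal)
    mean = sum(signal) / n
    centred = [x - mean for x in signal]   # drop the DC offset
    power = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centred))
        power.append(abs(coeff) ** 2)
    total = sum(power)
    if total == 0.0:
        raise ValueError("constant signal has no spectral content")
    return [p / total for p in power]


def wasserstein1_on_grid(p: Sequence[float], q: Sequence[float],
                         spacing: float) -> float:
    """1-Wasserstein distance between two distributions on the same evenly
    spaced grid: the integral of the absolute difference of their CDFs."""
    if len(p) != len(q):
        raise ValueError("distributions must share the same grid")
    gap = 0.0
    distance = 0.0
    for a, b in zip(p, q):
        gap += a - b                 # running CDF difference
        distance += abs(gap) * spacing
    return distance


def wf_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Simplified WF-style distance, in cycles per sample (assumes len(x) == len(y))."""
    if len(x) != len(y):
        raise ValueError("this sketch assumes equal-length series")
    return wasserstein1_on_grid(normalised_psd(x), normalised_psd(y), 1.0 / len(x))
```

Normalising the PSD removes the dependence on magnitude, and discarding the Fourier phase makes the distance insensitive to time shifts. Handling different lengths or sampling rates would require mapping both PSDs onto a common frequency axis, which this sketch leaves out.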

On Symbiosis of Attribute Prediction and Semantic Segmentation


Title	On Symbiosis of Attribute Prediction and Semantic Segmentation
Authors	Mahdi M. Kalayeh, Mubarak Shah
Abstract	In this paper, we propose to employ semantic segmentation to improve person-related attribute prediction. The core idea lies in the fact that the probability of an attribute appearing in an image is far from uniform in the spatial domain. We build our attribute prediction model jointly with a deep semantic segmentation network. This harnesses the localization cues learned by the semantic segmentation to guide the attention of the attribute prediction to the regions where different attributes naturally show up. Therefore, in addition to prediction, we are able to localize the attributes despite merely having access to image-level labels (weak supervision) during training. We first propose semantic segmentation-based pooling and gating, respectively denoted as SSP and SSG. In the former, the estimated segmentation masks are used to pool the final activations of the attribute prediction network from multiple semantically homogeneous regions. In SSG, the same idea is applied to the intermediate layers of the network. SSP and SSG, while effective, impose heavy memory utilization since each channel of the activations is pooled/gated with all the semantic segmentation masks. To circumvent this, we propose Symbiotic Augmentation (SA), where we learn only one mask per activation channel. SA allows the model to either pick one, or combine (weighted superposition) multiple semantic maps, in order to generate the proper mask for each channel. SA simultaneously applies the same mechanism to the reverse problem by leveraging output logits of attribute prediction to guide the semantic segmentation task. We evaluate our proposed methods for facial attributes on the CelebA and LFWA datasets, while benchmarking on WIDER Attribute and Berkeley Attributes of People for whole-body attributes. Our proposed methods achieve superior results compared to previous works.
Tasks	Semantic Segmentation
Published	2019-11-23
URL	https://arxiv.org/abs/1911.11612v1
PDF	https://arxiv.org/pdf/1911.11612v1.pdf
PWC	https://paperswithcode.com/paper/on-symbiosis-of-attribute-prediction-and
Repo
Framework

Incremental extraction of a NoSQL database model using an MDA-based process


Title	Incremental extraction of a NoSQL database model using an MDA-based process
Authors	Amal Ait Brahim, Rabah Tighilt Ferhat, Gilles Zurfluh
Abstract	In recent years, the need to use NoSQL systems to store and exploit big data has been steadily increasing. Most of these systems are “schema-less”, meaning that no data model is defined when a database is created. This property brings undeniable flexibility by allowing the model to evolve while the database is in use. However, expressing queries requires precise knowledge of this model. In this paper, we propose an incremental process to extract the model while the document-oriented NoSQL database is in operation. To do this, we use the Model Driven Architecture (MDA), which provides a formal framework for automatic model transformation. From the insert, delete and update queries executed on the database, we propose formal transformation rules with QVT to generate the physical model of the NoSQL database. The extraction process was tested experimentally on a medical application.
Tasks
Published	2019-11-04
URL	https://arxiv.org/abs/1911.01270v1
PDF	https://arxiv.org/pdf/1911.01270v1.pdf
PWC	https://paperswithcode.com/paper/incremental-extraction-of-a-nosql-database
Repo
Framework

Traffic Light Recognition Using Deep Learning and Prior Maps for Autonomous Cars


Title	Traffic Light Recognition Using Deep Learning and Prior Maps for Autonomous Cars
Authors	Lucas C. Possatti, Rânik Guidolini, Vinicius B. Cardoso, Rodrigo F. Berriel, Thiago M. Paixão, Claudine Badue, Alberto F. De Souza, Thiago Oliveira-Santos
Abstract	Autonomous terrestrial vehicles must be capable of perceiving traffic lights and recognizing their current states to share the streets with human drivers. Most of the time, human drivers can easily identify the relevant traffic lights. To deal with this issue, a common solution for autonomous cars is to integrate recognition with prior maps. However, an additional solution is required for the detection and recognition of the traffic light. Deep learning techniques have shown great performance and generalization power, including in traffic-related problems. Motivated by the advances in deep learning, some recent works leveraged state-of-the-art deep detectors to locate (and further recognize) traffic lights from 2D camera images. However, none of them combines the power of deep learning-based detectors with prior maps to recognize the state of the relevant traffic lights. Based on that, this work proposes to integrate the power of deep learning-based detection with the prior maps used by our car platform IARA (acronym for Intelligent Autonomous Robotic Automobile) to recognize the relevant traffic lights of predefined routes. The process is divided into two phases: an offline phase for map construction and traffic light annotation, and an online phase for traffic light recognition and identification of the relevant ones. The proposed system was evaluated on five test cases (routes) in the city of Vitória, each case being composed of a video sequence and a prior map with the relevant traffic lights for the route. Results showed that the proposed technique is able to correctly identify the relevant traffic light along the trajectory.
Tasks
Published	2019-06-04
URL	https://arxiv.org/abs/1906.11886v1
PDF	https://arxiv.org/pdf/1906.11886v1.pdf
PWC	https://paperswithcode.com/paper/traffic-light-recognition-using-deep-learning
Repo
Framework

Multi-focus Image Fusion Based on Similarity Characteristics


Title	Multi-focus Image Fusion Based on Similarity Characteristics
Authors	Ya-Qiong Zhang, Xiao-Jun Wu, Hui Li
Abstract	A novel multi-focus image fusion algorithm is proposed that operates in the spatial domain and combines similarity characteristics with region segmentation. In this paper, a new similarity measure is developed based on the structural similarity (SSIM) index, which is more suitable for multi-focus image segmentation. Firstly, the SSNSIM map is calculated between two input images. Then we segment the SSNSIM map using the watershed method, and merge the small homogeneous regions with the fuzzy c-means (FCM) clustering algorithm. For three source images, a joint region segmentation method based on the segmentation of two images is used to obtain the final segmentation result. Finally, the corresponding segmented regions of the source images are fused according to their average gradient. The performance of the image fusion method is evaluated by several criteria, including spatial frequency, average gradient, entropy and edge retention. The evaluation results indicate that the proposed method is effective and yields good visual quality.
Tasks	Semantic Segmentation
Published	2019-12-17
URL	https://arxiv.org/abs/1912.07959v1
PDF	https://arxiv.org/pdf/1912.07959v1.pdf
PWC	https://paperswithcode.com/paper/multi-focus-image-fusion-based-on-similarity
Repo
Framework

C3DVQA: Full-Reference Video Quality Assessment with 3D Convolutional Neural Network


Title	C3DVQA: Full-Reference Video Quality Assessment with 3D Convolutional Neural Network
Authors	Munan Xu, Junming Chen, Haiqiang Wang, Shan Liu, Ge Li, Zhiqiang Bai
Abstract	Traditional video quality assessment (VQA) methods evaluate localized picture quality and predict the video score by temporally aggregating frame scores. However, video quality exhibits different characteristics from static image quality due to the existence of temporal masking effects. In this paper, we present a novel architecture, namely C3DVQA, that uses a Convolutional Neural Network with 3D kernels (C3D) for the full-reference VQA task. C3DVQA combines feature learning and score pooling into one spatiotemporal feature learning process. We use 2D convolutional layers to extract spatial features and 3D convolutional layers to learn spatiotemporal features. We empirically found that 3D convolutional layers are capable of capturing the temporal masking effects of videos. We evaluated the proposed method on the LIVE and CSIQ datasets. The experimental results demonstrate that the proposed method achieves state-of-the-art performance.
Tasks	Video Quality Assessment
Published	2019-10-30
URL	https://arxiv.org/abs/1910.13646v2
PDF	https://arxiv.org/pdf/1910.13646v2.pdf
PWC	https://paperswithcode.com/paper/c3dvqa-full-reference-video-quality
Repo
Framework

Multi-Item Mechanisms without Item-Independence: Learnability via Robustness


Title	Multi-Item Mechanisms without Item-Independence: Learnability via Robustness
Authors	Johannes Brustle, Yang Cai, Constantinos Daskalakis
Abstract	We study the sample complexity of learning revenue-optimal multi-item auctions. We obtain the first set of positive results that go beyond the standard but unrealistic setting of item-independence. In particular, we consider settings where bidders’ valuations are drawn from correlated distributions that can be captured by Markov Random Fields or Bayesian Networks, two of the most prominent graphical models. We establish parametrized sample complexity bounds for learning an up-to-$\varepsilon$ optimal mechanism in both models, which scale polynomially in the size of the model, i.e. the number of items and bidders, and only exponentially in the natural complexity measure of the model, namely either the largest in-degree (for Bayesian Networks) or the size of the largest hyper-edge (for Markov Random Fields). We obtain our learnability results through a novel and modular framework that involves first proving a robustness theorem. We show that, given only “approximate distributions” for bidder valuations, we can learn a mechanism whose revenue is nearly optimal simultaneously for all “true distributions” that are close to the ones we were given in Prokhorov distance. Thus, to learn a good mechanism, it suffices to learn approximate distributions. When item values are independent, learning in Prokhorov distance is immediate, hence our framework directly implies the main result of Gonczarowski and Weinberg. When item values are sampled from more general graphical models, we combine our robustness theorem with novel sample complexity results for learning Markov Random Fields or Bayesian Networks in Prokhorov distance, which may be of independent interest. Finally, in the single-item case, our robustness result can be strengthened to hold under an even weaker distribution distance, the Lévy distance.
Tasks
Published	2019-11-06
URL	https://arxiv.org/abs/1911.02146v1
PDF	https://arxiv.org/pdf/1911.02146v1.pdf
PWC	https://paperswithcode.com/paper/multi-item-mechanisms-without-item
Repo
Framework

Models of Visually Grounded Speech Signal Pay Attention To Nouns: a Bilingual Experiment on English and Japanese


Title	Models of Visually Grounded Speech Signal Pay Attention To Nouns: a Bilingual Experiment on English and Japanese
Authors	William N. Havard, Jean-Pierre Chevrot, Laurent Besacier
Abstract	We investigate the behaviour of attention in neural models of visually grounded speech trained on two languages: English and Japanese. Experimental results show that attention focuses on nouns and that this behaviour holds true for two typologically very different languages. We also draw parallels between artificial neural attention and human attention and show that neural attention focuses on word endings, as has been theorised for human attention. Finally, we investigate how two visually grounded monolingual models can be used to perform cross-lingual speech-to-speech retrieval. For both languages, the bilingual (speech-image) corpora, enriched with part-of-speech tags and forced alignments, are distributed to the community for reproducible research.
Tasks
Published	2019-02-08
URL	http://arxiv.org/abs/1902.03052v1
PDF	http://arxiv.org/pdf/1902.03052v1.pdf
PWC	https://paperswithcode.com/paper/models-of-visually-grounded-speech-signal-pay
Repo
Framework

The non-capacitor model of leaky integrate-and-fire $VO_2$ neuron with the thermal mechanism of the membrane potential


Title	The non-capacitor model of leaky integrate-and-fire $VO_2$ neuron with the thermal mechanism of the membrane potential
Authors	A. A. Velichko, M. A. Belyaev, D. V. Ryabokon, S. D. Khanin
Abstract	The study presents a numerical model of a leaky integrate-and-fire neuron based on a $VO_2$ switch. The analogue of the membrane potential in the model is the temperature of the switch channel, and the action potential from neighbouring neurons propagates along the substrate in the form of thermal pulses. We simulated the operation of three neurons and demonstrated that the total effect arises from the interference of thermal waves in the region of the neuron switching channel. The thermal mechanism of the threshold function operates due to the effect of electrical switching, and the magnitude (temperature) of the threshold can be varied by an external voltage. The neuron circuit does not contain a capacitor, making it possible to produce a network with a high density of components, and has the potential for 3D integration due to the thermal mechanism of neuron interaction.
Tasks
Published	2019-10-07
URL	https://arxiv.org/abs/1911.02547v1
PDF	https://arxiv.org/pdf/1911.02547v1.pdf
PWC	https://paperswithcode.com/paper/the-non-capacitor-model-of-leaky-integrate
Repo
Framework