January 29, 2020

3198 words 16 mins read

Paper Group ANR 587


End-to-end losses based on speaker basis vectors and all-speaker hard negative mining for speaker verification

Title End-to-end losses based on speaker basis vectors and all-speaker hard negative mining for speaker verification
Authors Hee-Soo Heo, Jee-weon Jung, IL-Ho Yang, Sung-Hyun Yoon, Hye-jin Shim, Ha-Jin Yu
Abstract In recent years, speaker verification has primarily been performed using deep neural networks that are trained to output embeddings from input features such as spectrograms or Mel-filterbank energies. Studies that design various loss functions, including metric learning, have been widely explored. In this study, we propose two end-to-end loss functions for speaker verification using the concept of speaker bases, which are trainable parameters. One loss function is designed to further increase the inter-speaker variation, and the other applies the same concept through hard negative mining. Each speaker basis is designed to represent the corresponding speaker in the process of training deep neural networks. In contrast to conventional loss functions, which can consider only the limited number of speakers included in a mini-batch, the proposed loss functions can consider all the speakers in the training set regardless of the mini-batch composition. In particular, the proposed loss functions enable hard negative mining and calculation of between-speaker variations with consideration of all speakers. Through experiments on the VoxCeleb1 and VoxCeleb2 datasets, we confirmed that the proposed loss functions could supplement conventional softmax and center loss functions.
Tasks Metric Learning, Speaker Verification
Published 2019-02-07
URL https://arxiv.org/abs/1902.02455v3
PDF https://arxiv.org/pdf/1902.02455v3.pdf
PWC https://paperswithcode.com/paper/end-to-end-losses-based-on-speaker-basis
Repo
Framework
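The core idea above, scoring an utterance embedding against a trainable basis for every speaker in the training set and then mining the hardest wrong speakers, can be sketched roughly as follows. The shapes, the cosine-similarity scoring, and the top-k margin term are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def speaker_basis_scores(embedding, bases):
    """Cosine similarity between one utterance embedding (D,)
    and all trainable speaker bases (N_speakers, D)."""
    e = embedding / np.linalg.norm(embedding)
    b = bases / np.linalg.norm(bases, axis=1, keepdims=True)
    return b @ e  # (N_speakers,)

def hard_negative_loss(embedding, bases, true_idx, top_k=2):
    """Penalize similarity to the hardest (most similar) wrong speakers,
    considering every speaker basis, not just those in the mini-batch."""
    scores = speaker_basis_scores(embedding, bases)
    neg = np.delete(scores, true_idx)
    hardest = np.sort(neg)[-top_k:]  # top-k most confusable speakers
    return float(np.mean(hardest) - scores[true_idx])

rng = np.random.default_rng(0)
bases = rng.normal(size=(100, 16))           # 100 speakers, 16-dim bases
emb = bases[7] + 0.01 * rng.normal(size=16)  # utterance near speaker 7
loss = hard_negative_loss(emb, bases, true_idx=7)
```

Because the bases live outside the mini-batch, this penalty sees all training speakers at every step, which is the property the abstract emphasizes.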

IntentGC: a Scalable Graph Convolution Framework Fusing Heterogeneous Information for Recommendation

Title IntentGC: a Scalable Graph Convolution Framework Fusing Heterogeneous Information for Recommendation
Authors Jun Zhao, Zhou Zhou, Ziyu Guan, Wei Zhao, Wei Ning, Guang Qiu, Xiaofei He
Abstract The remarkable progress of network embedding has led to state-of-the-art algorithms in recommendation. However, the sparsity of user-item interactions (i.e., explicit preferences) on websites remains a big challenge for predicting users’ behaviors. Although research efforts have been made in utilizing some auxiliary information (e.g., social relations between users) to solve the problem, the existing rich heterogeneous auxiliary relationships are still not fully exploited. Moreover, previous works relied on linearly combined regularizers and suffered from parameter tuning. In this work, we collect abundant relationships from common user behaviors and item information, and propose a novel framework named IntentGC to leverage both explicit preferences and heterogeneous relationships by graph convolutional networks. In addition to the capability of modeling heterogeneity, IntentGC can learn the importance of different relationships automatically by the neural model in a nonlinear sense. To apply IntentGC to web-scale applications, we design a faster graph convolutional model named IntentNet by avoiding unnecessary feature interactions. Empirical experiments on two large-scale real-world datasets and online A/B tests in Alibaba demonstrate the superiority of our method over state-of-the-art algorithms.
Tasks Network Embedding
Published 2019-07-24
URL https://arxiv.org/abs/1907.12377v1
PDF https://arxiv.org/pdf/1907.12377v1.pdf
PWC https://paperswithcode.com/paper/intentgc-a-scalable-graph-convolution
Repo
Framework
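IntentNet's speed-up comes from avoiding unnecessary feature interactions in the convolution step. Below is a loose, minimal sketch of that idea, under the assumption that self and neighborhood vectors are mixed with scalar (vector-wise) weights rather than a dense layer over their concatenation; the weight values are placeholders, not learned parameters from the paper:

```python
import numpy as np

def vector_wise_conv(self_vec, neighbor_vecs, w_self=0.6, w_neigh=0.4):
    """One hedged graph-convolution step: average the neighbor
    embeddings, then mix self and neighborhood vectors with scalar
    (vector-wise) weights instead of a dense layer over their
    concatenation, avoiding pairwise cross-feature interactions."""
    neigh = neighbor_vecs.mean(axis=0)
    return w_self * self_vec + w_neigh * neigh

out = vector_wise_conv(np.array([1.0, 1.0]),
                       np.array([[0.0, 2.0], [2.0, 0.0]]))
```

A dense layer over the concatenated pair would cost O(2d · d) multiplications per node; the vector-wise mix costs O(d), which is what makes the model web-scale.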

MSDC-Net: Multi-Scale Dense and Contextual Networks for Automated Disparity Map for Stereo Matching

Title MSDC-Net: Multi-Scale Dense and Contextual Networks for Automated Disparity Map for Stereo Matching
Authors Zhibo Rao, Mingyi He, Yuchao Dai, Zhidong Zhu, Bo Li, Renjie He
Abstract Disparity prediction from stereo images is essential to computer vision applications including autonomous driving, 3D model reconstruction, and object detection. To predict an accurate disparity map, we propose a novel deep learning architecture for detecting the disparity map from a rectified pair of stereo images, called MSDC-Net. Our MSDC-Net contains two modules: a multi-scale fusion 2D convolution module and a multi-scale residual 3D convolution module. The multi-scale fusion 2D convolution module exploits potential multi-scale features, extracting and fusing features at different scales via Dense-Net. The multi-scale residual 3D convolution module learns geometric context at different scales from the cost volume, which is aggregated by the multi-scale fusion 2D convolution module. Experimental results on the Scene Flow and KITTI datasets demonstrate that our MSDC-Net significantly outperforms other approaches in the non-occluded region.
Tasks Autonomous Driving, Object Detection, Stereo Matching, Stereo Matching Hand
Published 2019-04-25
URL http://arxiv.org/abs/1904.12658v2
PDF http://arxiv.org/pdf/1904.12658v2.pdf
PWC https://paperswithcode.com/paper/190412658
Repo
Framework
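The 3D module operates on a cost volume aggregated from the 2D features. As a hedged sketch, one standard way such a volume is built is by concatenating left features with right features shifted by each candidate disparity; the paper's exact construction may differ:

```python
import numpy as np

def build_cost_volume(left_feat, right_feat, max_disp):
    """Concatenate left features with right features shifted by each
    candidate disparity; returns shape (max_disp, 2C, H, W)."""
    C, H, W = left_feat.shape
    vol = np.zeros((max_disp, 2 * C, H, W))
    for d in range(max_disp):
        vol[d, :C] = left_feat
        if d == 0:
            vol[d, C:] = right_feat
        else:
            vol[d, C:, :, d:] = right_feat[:, :, :-d]  # shift right view by d
    return vol
```

The 3D convolutions then regularize this 4D tensor over the disparity dimension before the final disparity regression.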

SketchZooms: Deep multi-view descriptors for matching line drawings

Title SketchZooms: Deep multi-view descriptors for matching line drawings
Authors Pablo Navarro, José Ignacio Orlando, Claudio Delrieux, Emmanuel Iarussi
Abstract Finding point-wise correspondences between images is a long-standing problem in computer vision. Matching sketch images is particularly challenging due to the varying nature of human style, projection distortions and viewport changes. In this paper we present a feature descriptor targeting line drawings, learned from a 3D shape data set. Our descriptors are designed to locally match image pairs where the object of interest belongs to the same semantic category, yet still differs drastically in shape and projection angle. We build our descriptors by means of a Convolutional Neural Network (CNN) trained in a triplet fashion. The goal is to embed semantically similar anchor points close to one another, and to pull the embeddings of different points far apart. To learn the descriptor space, the network is fed with a succession of zoomed views from the input sketches. We have specifically crafted a data set of synthetic sketches using a non-photorealistic rendering algorithm over a large collection of part-based registered 3D models. Once trained, our network can generate descriptors for every pixel in an input image. Furthermore, our network is able to generalize well to unseen sketches hand-drawn by humans, outperforming state-of-the-art descriptors on the evaluated matching tasks. Our descriptors can be used to obtain sparse and dense correspondences between image pairs. We evaluate our method against a baseline of correspondences data collected from expert designers, in addition to comparisons with descriptors that have been proven effective in sketches. Finally, we demonstrate applications showing the usefulness of our multi-view descriptors.
Tasks
Published 2019-11-29
URL https://arxiv.org/abs/1912.05019v1
PDF https://arxiv.org/pdf/1912.05019v1.pdf
PWC https://paperswithcode.com/paper/sketchzooms-deep-multi-view-descriptors-for
Repo
Framework
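The triplet training described above can be illustrated with the standard triplet margin loss; the margin value here is an assumption, not the paper's setting:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Pull semantically matching descriptors together and push
    non-matching ones at least `margin` further apart (squared
    Euclidean distances, hinge at zero)."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)
```

When the negative is already further away than the positive by more than the margin, the loss is zero and that triplet stops contributing gradients.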

edBB: Biometrics and Behavior for Assessing Remote Education

Title edBB: Biometrics and Behavior for Assessing Remote Education
Authors Javier Hernandez-Ortega, Roberto Daza, Aythami Morales, Julian Fierrez, Javier Ortega-Garcia
Abstract We present a platform for student monitoring in remote education consisting of a collection of sensors and software that capture biometric and behavioral data. We define a collection of tasks to acquire behavioral data that can be useful for facing the existing challenges in automatic student monitoring during remote evaluation. Additionally, we release an initial database including data from 20 different users completing these tasks with a set of basic sensors: webcam, microphone, mouse, and keyboard; and also from more advanced sensors: NIR camera, smartwatch, additional RGB cameras, and an EEG band. Information from the computer (e.g. system logs, MAC, IP, or web browsing history) is also stored. During each acquisition session, each user completed three different types of tasks generating data of different nature: mouse and keystroke dynamics, face data, and audio data, among others. The tasks have been designed with two main goals in mind: i) analyse the capacity of such biometric and behavioral data for detecting anomalies during remote evaluation, and ii) study the capability of these data, i.e. EEG, ECG, or NIR video, for estimating other information about the users such as their attention level, the presence of stress, or their pulse rate.
Tasks EEG
Published 2019-12-10
URL https://arxiv.org/abs/1912.04786v1
PDF https://arxiv.org/pdf/1912.04786v1.pdf
PWC https://paperswithcode.com/paper/edbb-biometrics-and-behavior-for-assessing
Repo
Framework

Limit theorems for out-of-sample extensions of the adjacency and Laplacian spectral embeddings

Title Limit theorems for out-of-sample extensions of the adjacency and Laplacian spectral embeddings
Authors Keith Levin, Fred Roosta, Minh Tang, Michael W. Mahoney, Carey E. Priebe
Abstract Graph embeddings, a class of dimensionality reduction techniques designed for relational data, have proven useful in exploring and modeling network structure. Most dimensionality reduction methods allow out-of-sample extensions, by which an embedding can be applied to observations not present in the training set. Applied to graphs, the out-of-sample extension problem concerns how to compute the embedding of a vertex that is added to the graph after an embedding has already been computed. In this paper, we consider the out-of-sample extension problem for two graph embedding procedures: the adjacency spectral embedding and the Laplacian spectral embedding. In both cases, we prove that when the underlying graph is generated according to a latent space model called the random dot product graph, which includes the popular stochastic block model as a special case, an out-of-sample extension based on a least-squares objective obeys a central limit theorem about the true latent position of the out-of-sample vertex. In addition, we prove a concentration inequality for the out-of-sample extension of the adjacency spectral embedding based on a maximum-likelihood objective. Our results also yield a convenient framework in which to analyze trade-offs between estimation accuracy and computational expense, which we explore briefly.
Tasks Dimensionality Reduction, Graph Embedding
Published 2019-09-29
URL https://arxiv.org/abs/1910.00423v1
PDF https://arxiv.org/pdf/1910.00423v1.pdf
PWC https://paperswithcode.com/paper/limit-theorems-for-out-of-sample-extensions
Repo
Framework
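The least-squares out-of-sample extension studied here has a closed form: given the in-sample embedding X (n rows, d columns) and the adjacency vector a of the new vertex, the extension solves min_w ||a - Xw||^2. A minimal sketch:

```python
import numpy as np

def out_of_sample_extend(X, a):
    """Embed a new vertex with adjacency vector `a` (length n)
    against an existing spectral embedding X (n x d) by solving
    the least-squares problem min_w ||a - X w||^2."""
    w, *_ = np.linalg.lstsq(X, a, rcond=None)
    return w
```

This avoids recomputing the full eigendecomposition when a vertex arrives, which is the computational trade-off the abstract's limit theorems quantify.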

Fourier Spectrum Discrepancies in Deep Network Generated Images

Title Fourier Spectrum Discrepancies in Deep Network Generated Images
Authors Tarik Dzanic, Freddie Witherden
Abstract Advancements in deep generative models such as generative adversarial networks and variational autoencoders have resulted in the ability to generate realistic images that are visually indistinguishable from real images. In this paper, we present an analysis of the high-frequency Fourier modes of real and deep network generated images and the effects of resolution and image compression on these modes. Using this, we propose a detection method based on the frequency spectrum of the images which is able to achieve an accuracy of up to 99.2% in classifying real, Style-GAN generated, and VQ-VAE2 generated images on a dataset of 2000 images with less than 10% training data. Furthermore, we suggest a method for modifying the high-frequency attributes of deep network generated images to mimic real images.
Tasks Image Compression
Published 2019-11-15
URL https://arxiv.org/abs/1911.06465v1
PDF https://arxiv.org/pdf/1911.06465v1.pdf
PWC https://paperswithcode.com/paper/fourier-spectrum-discrepancies-in-deep
Repo
Framework
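A detector based on high-frequency Fourier modes can be sketched as follows; the radial cutoff is an illustrative choice, not the paper's tuned value:

```python
import numpy as np

def high_freq_fraction(img, radius_frac=0.75):
    """Fraction of spectral energy beyond `radius_frac` of the
    Nyquist radius; the paper reports that GAN/VAE outputs deviate
    from real images in these high-frequency modes."""
    F = np.fft.fftshift(np.fft.fft2(img))
    power = np.abs(F) ** 2
    h, w = img.shape
    yy, xx = np.mgrid[:h, :w]
    r = np.hypot(yy - h / 2, xx - w / 2)
    cutoff = radius_frac * min(h, w) / 2
    return power[r >= cutoff].sum() / power.sum()
```

A constant image puts all its energy at DC (fraction 0), while a pixel-level checkerboard puts it at the Nyquist frequency (fraction near 1); a threshold on this statistic is one simple classifier.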

Alice’s Adventures in the Markovian World

Title Alice’s Adventures in the Markovian World
Authors Zhanzhan Zhao, Haoran Sun
Abstract This paper proposes Alice, an algorithm that has no access to the physics law of the environment (which is in fact linear with stochastic noise) and learns to make decisions directly online, without a training phase or a stable policy as initial input. Rather than estimating the system parameters or the value functions online, the proposed algorithm generalizes one of the most fundamental online learning algorithms, Follow-the-Leader, to a linear Gauss-Markov process setting, with a regularization term similar to the momentum method in gradient descent and a feasible online constraint inspired by Lyapunov’s Second Theorem. The proposed algorithm can be viewed as a mirror optimization to model predictive control. Assuming only knowledge of the state-action alignment relationship, with the ability to observe every state exactly, a no-regret proof of the algorithm without state noise is given. The analysis of the general linear system with stochastic noise is presented with a sufficient condition for the no-regret proof. Simulations compare the performance of Alice with another recent work and verify its great flexibility.
Tasks
Published 2019-07-21
URL https://arxiv.org/abs/1907.08981v1
PDF https://arxiv.org/pdf/1907.08981v1.pdf
PWC https://paperswithcode.com/paper/alices-adventures-in-the-markovian-world
Repo
Framework

Grammatical Sequence Prediction for Real-Time Neural Semantic Parsing

Title Grammatical Sequence Prediction for Real-Time Neural Semantic Parsing
Authors Chunyang Xiao, Christoph Teichmann, Konstantine Arkoudas
Abstract While sequence-to-sequence (seq2seq) models achieve state-of-the-art performance in many natural language processing tasks, they can be too slow for real-time applications. One performance bottleneck is predicting the most likely next token over a large vocabulary; methods to circumvent this bottleneck are a current research topic. We focus specifically on using seq2seq models for semantic parsing, where we observe that grammars often exist which specify valid formal representations of utterance semantics. By developing a generic approach for restricting the predictions of a seq2seq model to grammatically permissible continuations, we arrive at a widely applicable technique for speeding up semantic parsing. The technique leads to a 74% speed-up on an in-house dataset with a large vocabulary, compared to the same neural model without grammatical restrictions.
Tasks Semantic Parsing
Published 2019-07-25
URL https://arxiv.org/abs/1907.11049v1
PDF https://arxiv.org/pdf/1907.11049v1.pdf
PWC https://paperswithcode.com/paper/grammatical-sequence-prediction-for-real-time
Repo
Framework
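Restricting predictions to grammatically permissible continuations also shrinks the expensive normalization over the full vocabulary to a sum over the small valid set. A minimal sketch (the valid-id set would come from the grammar, which is assumed given here):

```python
import math

def constrained_distribution(logits, valid_ids):
    """Softmax restricted to grammatically valid token ids:
    invalid tokens get probability 0, and the normalization runs
    over the small valid set instead of the whole vocabulary."""
    m = max(logits[i] for i in valid_ids)           # for numerical stability
    exps = {i: math.exp(logits[i] - m) for i in valid_ids}
    z = sum(exps.values())
    return {i: e / z for i, e in exps.items()}
```

Since the valid set is typically tiny compared with the vocabulary, both the argmax and the normalization become cheap, which is where the reported speed-up comes from.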

DefectNET: multi-class fault detection on highly-imbalanced datasets

Title DefectNET: multi-class fault detection on highly-imbalanced datasets
Authors N. Anantrasirichai, David Bull
Abstract As a data-driven method, the performance of deep convolutional neural networks (CNNs) relies heavily on training data. The prediction results of traditional networks are biased toward larger classes, which tend to be the background in the semantic segmentation task. This becomes a major problem for fault detection, where the targets appear very small in the images and vary in both type and size. In this paper we propose a new network architecture, DefectNet, that offers multi-class (including but not limited to) defect detection on highly-imbalanced datasets. DefectNet consists of two parallel paths, a fully convolutional network and a dilated convolutional network, to detect large and small objects respectively. We propose a hybrid loss maximising the usefulness of a dice loss and a cross-entropy loss, and we also employ the leaky rectified linear unit (ReLU) to deal with the rare occurrence of some targets in training batches. The prediction results show that our DefectNet outperforms state-of-the-art networks for detecting multi-class defects, with an average accuracy improvement of approximately 10% on a wind turbine dataset.
Tasks Fault Detection, Semantic Segmentation
Published 2019-04-01
URL http://arxiv.org/abs/1904.00863v2
PDF http://arxiv.org/pdf/1904.00863v2.pdf
PWC https://paperswithcode.com/paper/defectnet-multi-class-fault-detection-on
Repo
Framework
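A hybrid of dice and cross-entropy losses for a binary mask might look like the following sketch; the mixing weight `alpha` and the exact combination are assumptions, not the paper's formulation:

```python
import numpy as np

def hybrid_loss(pred, target, alpha=0.5, eps=1e-7):
    """Weighted sum of a soft dice loss (robust to class imbalance,
    since it normalizes by foreground size) and binary cross-entropy
    (smooth per-pixel gradients). `pred` holds probabilities in [0,1]."""
    pred = np.clip(pred, eps, 1 - eps)
    dice = 1 - (2 * (pred * target).sum() + eps) / (pred.sum() + target.sum() + eps)
    ce = -(target * np.log(pred) + (1 - target) * np.log(1 - pred)).mean()
    return alpha * dice + (1 - alpha) * ce
```

The dice term keeps tiny defect regions from being drowned out by the background pixels that dominate the cross-entropy average.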

Towards Human Centered AutoML

Title Towards Human Centered AutoML
Authors Florian Pfisterer, Janek Thomas, Bernd Bischl
Abstract Building models from data is an integral part of the majority of data science workflows. While data scientists are often forced to spend most of the time available for a given project on data cleaning and exploratory analysis, the time left to build actual models is often rather short due to project time constraints. AutoML systems are currently rising in popularity, as they can build powerful models without human oversight. In this position paper, we aim to discuss the impact of the rising popularity of such systems and what a user-centered interface for them could look like. More importantly, we also want to point out features that are currently missing in those systems and start to explore better usability of such systems from a data scientist’s perspective.
Tasks AutoML
Published 2019-11-06
URL https://arxiv.org/abs/1911.02391v1
PDF https://arxiv.org/pdf/1911.02391v1.pdf
PWC https://paperswithcode.com/paper/towards-human-centered-automl
Repo
Framework

R$^2$-CNN: Fast Tiny Object Detection in Large-Scale Remote Sensing Images

Title R$^2$-CNN: Fast Tiny Object Detection in Large-Scale Remote Sensing Images
Authors Jiangmiao Pang, Cong Li, Jianping Shi, Zhihai Xu, Huajun Feng
Abstract Recently, the convolutional neural network has brought impressive improvements for object detection. However, detecting tiny objects in large-scale remote sensing images remains challenging. First, the extremely large input size makes existing object detection solutions too slow for practical use. Second, the massive and complex backgrounds cause serious false alarms. Moreover, the ultra-tiny objects increase the difficulty of accurate detection. To tackle these problems, we propose a unified and self-reinforced network called remote sensing region-based convolutional neural network ($\mathcal{R}^2$-CNN), comprising a backbone Tiny-Net, an intermediate global attention block, and a final classifier and detector. Tiny-Net is a lightweight residual structure that enables fast and powerful feature extraction from inputs. The global attention block is built upon Tiny-Net to inhibit false positives. The classifier is then used to predict the existence of targets in each patch, and the detector follows to locate them accurately where present. The classifier and detector are mutually reinforced with end-to-end training, which further speeds up the process and avoids false alarms. The effectiveness of $\mathcal{R}^2$-CNN is validated on hundreds of GF-1 and GF-2 images that are 18 000 $\times$ 18 192 pixels at 2.0-m resolution and 27 620 $\times$ 29 200 pixels at 0.8-m resolution, respectively. Specifically, we can process a GF-1 image in 29.4 s on a Titan X with a single thread. To the best of our knowledge, no previous solution can detect tiny objects on such huge remote sensing images gracefully. We believe this is a significant step toward practical real-time remote sensing systems.
Tasks Object Detection
Published 2019-02-16
URL http://arxiv.org/abs/1902.06042v3
PDF http://arxiv.org/pdf/1902.06042v3.pdf
PWC https://paperswithcode.com/paper/mathcalr2-cnn-fast-tiny-object-detection-in
Repo
Framework
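The patch-wise classifier/detector cascade presupposes tiling the huge input into overlapping patches so the cheap classifier can reject empty ones before the detector runs. A sketch of that tiling step (patch size and overlap here are placeholders, not the paper's settings):

```python
def tile_image(height, width, patch, overlap):
    """Return top-left (y, x) coordinates of overlapping patches that
    cover an image; edge patches are clamped so they stay in bounds."""
    step = patch - overlap
    tiles = []
    for y in range(0, max(height - overlap, 1), step):
        for x in range(0, max(width - overlap, 1), step):
            tiles.append((min(y, height - patch), min(x, width - patch)))
    return sorted(set(tiles))
```

With the overlap at least the size of the largest target, no object is split across every patch that contains it, so the classifier gets at least one intact view.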

Organ-based Chronological Age Estimation based on 3D MRI Scans

Title Organ-based Chronological Age Estimation based on 3D MRI Scans
Authors Karim Armanious, Sherif Abdulatif, Anish Rao Bhaktharaguttu, Thomas Küstner, Tobias Hepp, Sergios Gatidis, Bin Yang
Abstract Individuals age differently depending on a multitude of factors such as lifestyle, medical history and genetics. Often, the global chronological age is not indicative of the true ageing process. An organ-based age estimation would yield a more accurate health state assessment. In this work, we propose a new deep learning architecture for organ-based age estimation based on magnetic resonance images (MRI). The proposed network is a 3D convolutional neural network (CNN) with increased depth and width made possible by the hybrid utilization of inception and fire modules. We apply the proposed framework to the tasks of brain and knee age estimation. Quantitative comparisons against concurrent MR-based regression networks and different 2D and 3D data feeding strategies illustrate the superior performance of the proposed work.
Tasks Age Estimation
Published 2019-10-14
URL https://arxiv.org/abs/1910.06271v2
PDF https://arxiv.org/pdf/1910.06271v2.pdf
PWC https://paperswithcode.com/paper/organ-based-age-estimation-based-on-3d-mri
Repo
Framework

Self-Paced Deep Regression Forests for Facial Age Estimation

Title Self-Paced Deep Regression Forests for Facial Age Estimation
Authors Shijie Ai, Yazhou Ren, Lili Pan
Abstract Facial age estimation is an important and challenging problem in computer vision. Existing approaches usually employ deep neural networks (DNNs) to fit the mapping from facial features to age, even though some samples are noisy and confusing. We argue that it is more desirable to distinguish noisy and confusing facial images from regular ones, and to alleviate the interference arising from them. To this end, we propose self-paced deep regression forests (SP-DRFs), a framework that trains DNNs gradually for age estimation. As the model is learned gradually, from simplicity to complexity, it tends to emphasize reliable samples more and to avoid bad local minima. Moreover, the proposed capped-likelihood function helps to exclude noisy samples in training, rendering our SP-DRFs significantly more robust. We demonstrate the efficacy of SP-DRFs on the Morph II and FG-NET datasets, where our model achieves state-of-the-art performance.
Tasks Age Estimation
Published 2019-10-08
URL https://arxiv.org/abs/1910.03244v3
PDF https://arxiv.org/pdf/1910.03244v3.pdf
PWC https://paperswithcode.com/paper/self-paced-deep-regression-forests-for-facial
Repo
Framework
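Self-paced learning with a capped likelihood amounts to excluding samples whose loss exceeds a threshold that grows over training, so the model sees easy samples first and noisy ones late or never. A schematic sketch (the threshold schedule is an assumed illustration, not the paper's):

```python
import numpy as np

def self_paced_schedule(losses, start=0.5, growth=2.0, rounds=3):
    """Each round, keep only the sample indices whose loss is at or
    below the current threshold, then raise the threshold; training
    thus proceeds from easy to hard while high-loss (likely noisy)
    samples stay excluded until late."""
    losses = np.asarray(losses)
    threshold, selected = start, []
    for _ in range(rounds):
        selected.append(np.flatnonzero(losses <= threshold))
        threshold *= growth
    return selected

sel = self_paced_schedule(np.array([0.1, 0.4, 0.9, 3.0]))
```

In the real method the per-sample losses are recomputed each round as the forest improves; here they are fixed only to keep the sketch short.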

TwistBytes – Hierarchical Classification at GermEval 2019: walking the fine line (of recall and precision)

Title TwistBytes – Hierarchical Classification at GermEval 2019: walking the fine line (of recall and precision)
Authors Fernando Benites
Abstract We present here our approach to the GermEval 2019 Task 1 - Shared Task on hierarchical classification of German blurbs. We achieved first place in the hierarchical subtask B and second place on the root node, flat classification subtask A. In subtask A, we applied a simple multi-feature TF-IDF extraction method using different n-gram ranges and stopword removal on each feature extraction module. The classifier on top was a standard linear SVM. For the hierarchical classification, we used a local approach, which was more lightweight but similar to the one used in subtask A. The key point of our approach was the application of a post-processing step to cope with the multi-label aspect of the task, increasing recall without letting it surpass precision.
Tasks Hierarchical Text Classification of Blurbs (GermEval 2019)
Published 2019-08-18
URL https://arxiv.org/abs/1908.06493v1
PDF https://arxiv.org/pdf/1908.06493v1.pdf
PWC https://paperswithcode.com/paper/twistbytes-hierarchical-classification-at
Repo
Framework
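The recall-raising post-processing for multi-label output can be sketched as a thresholding rule with a fallback so that every blurb receives at least one label; the threshold value and the fallback rule are assumptions for illustration, not the authors' exact procedure:

```python
def multilabel_postprocess(scores, base_thresh=0.5, min_labels=1):
    """Take all labels whose score clears the threshold; if none do,
    fall back to the top-scoring label(s) so no blurb is left
    unlabeled, trading a little precision for recall."""
    labels = [i for i, s in enumerate(scores) if s >= base_thresh]
    if len(labels) < min_labels:
        labels = sorted(range(len(scores)), key=lambda i: -scores[i])[:min_labels]
    return labels
```

Tuning `base_thresh` downward is the direct lever for walking the recall/precision line the title alludes to.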