July 27, 2019

Paper Group ANR 498

Local Shape Spectrum Analysis for 3D Facial Expression Recognition. Towards life cycle identification of malaria parasites using machine learning and Riemannian geometry. Dualing GANs. Stochastic Gradient Descent in Continuous Time: A Central Limit Theorem. Improving Network Robustness against Adversarial Attacks with Compact Convolution. Block mod …

Local Shape Spectrum Analysis for 3D Facial Expression Recognition


Title	Local Shape Spectrum Analysis for 3D Facial Expression Recognition
Authors	Dmytro Derkach, Federico M. Sukno
Abstract	We investigate the problem of facial expression recognition using 3D data. Building on one of the most successful frameworks for facial analysis using exclusively 3D geometry, we extend the analysis from a curve-based representation into a spectral representation, which allows a complete description of the underlying surface that can be further tuned to the desired level of detail. Spectral representations are based on the decomposition of the geometry into its spatial frequency components, much like a Fourier transform, which are related to intrinsic characteristics of the surface. In this work, we propose the use of Graph Laplacian Features (GLF), which result from the projection of local surface patches into a common basis obtained from the Graph Laplacian eigenspace. We test the proposed approach on the BU-3DFE database in terms of expression and Action Unit recognition. Our results confirm that the proposed GLF produce consistently higher recognition rates than the curve-based approach, thanks to a more complete description of the surface, while requiring a lower computational complexity. We also show that the GLF outperform the most popular alternative approach for spectral representation, Shape-DNA, which is based on the Laplace-Beltrami operator and cannot provide a stable basis that guarantees that the extracted signatures for the different patches are directly comparable.
Tasks	3D Facial Expression Recognition, Facial Expression Recognition
Published	2017-05-19
URL	http://arxiv.org/abs/1705.06900v1
PDF	http://arxiv.org/pdf/1705.06900v1.pdf
PWC	https://paperswithcode.com/paper/local-shape-spectrum-analysis-for-3d-facial
Repo
Framework
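
The projection step described in the abstract above can be illustrated with a minimal sketch. It assumes that every patch shares one fixed connectivity (given as an edge list), uses the unnormalized Laplacian L = D - A, and keeps the first k eigenvectors as the common basis. The function names and the toy ring-shaped patch are hypothetical and are not taken from the paper.

```python
import numpy as np

def laplacian_basis(n_vertices, edges, k):
    """Return the k lowest-frequency eigenpairs of the graph Laplacian L = D - A."""
    A = np.zeros((n_vertices, n_vertices))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    L = np.diag(A.sum(axis=1)) - A
    # eigh returns eigenvalues in ascending order, so low frequencies come first.
    eigvals, eigvecs = np.linalg.eigh(L)
    return eigvals[:k], eigvecs[:, :k]

def spectral_features(vertices, basis):
    """Project the x, y, z coordinates of a patch onto the shared basis."""
    return basis.T @ vertices  # shape (k, 3)

# Toy patch: 8 vertices on a ring (hypothetical stand-in for a facial surface patch).
n = 8
edges = [(i, (i + 1) % n) for i in range(n)]
angles = 2 * np.pi * np.arange(n) / n
patch = np.stack([np.cos(angles), np.sin(angles), np.zeros(n)], axis=1)

eigvals, basis = laplacian_basis(n, edges, k=4)
features = spectral_features(patch, basis)
```

Because the basis depends only on the connectivity, two patches with the same connectivity yield coefficient vectors that can be compared entry by entry, which is the comparability property the abstract emphasizes.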

Towards life cycle identification of malaria parasites using machine learning and Riemannian geometry


Title	Towards life cycle identification of malaria parasites using machine learning and Riemannian geometry
Authors	Arash Mehrjou
Abstract	Malaria is a serious infectious disease that is responsible for over half a million deaths yearly worldwide. The major cause of these mortalities is late or inaccurate diagnosis. Manual microscopy is currently considered the dominant diagnostic method for malaria. However, it is time-consuming and prone to human error. The aim of this paper is to automate the diagnosis process and minimize human intervention. We have developed the hardware and software for a cost-efficient malaria diagnostic system. This paper describes the manufactured hardware and also proposes novel software to handle parasite detection and life-stage identification. A motorized microscope is developed to take images from Giemsa-stained blood smears. A patch-based unsupervised statistical clustering algorithm is proposed which offers a novel method for classification of different regions within blood images. The proposed method provides better robustness against different imaging settings. The core of the proposed algorithm is a model called Mixture of Independent Component Analysis. A manifold-based optimization method is proposed that facilitates the application of the model to the high-dimensional data usually acquired in medical microscopy. The method was tested on 600 blood slides with various imaging conditions. The speed of the method is higher than that of current supervised systems, while its accuracy is comparable to or better than theirs.
Tasks
Published	2017-08-17
URL	http://arxiv.org/abs/1708.05200v1
PDF	http://arxiv.org/pdf/1708.05200v1.pdf
PWC	https://paperswithcode.com/paper/towards-life-cycle-identification-of-malaria
Repo
Framework

Dualing GANs


Title	Dualing GANs
Authors	Yujia Li, Alexander Schwing, Kuan-Chieh Wang, Richard Zemel
Abstract	Generative adversarial nets (GANs) are a promising technique for modeling a distribution from samples. It is however well known that GAN training suffers from instability due to the nature of its maximin formulation. In this paper, we explore ways to tackle the instability problem by dualizing the discriminator. We start from linear discriminators in which case conjugate duality provides a mechanism to reformulate the saddle point objective into a maximization problem, such that both the generator and the discriminator of this ‘dualing GAN’ act in concert. We then demonstrate how to extend this intuition to non-linear formulations. For GANs with linear discriminators our approach is able to remove the instability in training, while for GANs with nonlinear discriminators our approach provides an alternative to the commonly used GAN training algorithm.
Tasks
Published	2017-06-19
URL	http://arxiv.org/abs/1706.06216v1
PDF	http://arxiv.org/pdf/1706.06216v1.pdf
PWC	https://paperswithcode.com/paper/dualing-gans
Repo
Framework

Stochastic Gradient Descent in Continuous Time: A Central Limit Theorem


Title	Stochastic Gradient Descent in Continuous Time: A Central Limit Theorem
Authors	Justin Sirignano, Konstantinos Spiliopoulos
Abstract	Stochastic gradient descent in continuous time (SGDCT) provides a computationally efficient method for the statistical learning of continuous-time models, which are widely used in science, engineering, and finance. The SGDCT algorithm follows a (noisy) descent direction along a continuous stream of data. The parameter updates occur in continuous time and satisfy a stochastic differential equation. This paper analyzes the asymptotic convergence rate of the SGDCT algorithm by proving a central limit theorem (CLT) for strongly convex objective functions and, under slightly stronger conditions, for non-convex objective functions as well. An $L^{p}$ convergence rate is also proven for the algorithm in the strongly convex case. The mathematical analysis lies at the intersection of stochastic analysis and statistical learning.
Tasks
Published	2017-10-11
URL	https://arxiv.org/abs/1710.04273v4
PDF	https://arxiv.org/pdf/1710.04273v4.pdf
PWC	https://paperswithcode.com/paper/stochastic-gradient-descent-in-continuous-1
Repo
Framework
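
To make the idea of a parameter update driven by a continuous data stream concrete, here is a small simulation sketch. It assumes a one-dimensional Ornstein-Uhlenbeck model dX = -theta* X dt + dW with unit diffusion, the update d(theta) = alpha_t [grad f dX - grad f f dt] with f(x; theta) = -theta x, an Euler-Maruyama discretization, and a learning rate alpha_t = C / (1 + t). The model, the schedule, and all constants are illustrative assumptions rather than settings from the paper.

```python
import numpy as np

def sgdct_ou(theta_true=2.0, theta0=0.0, T=500.0, dt=0.01, C=8.0, seed=0):
    """Estimate the mean-reversion rate of dX = -theta* X dt + dW online."""
    rng = np.random.default_rng(seed)
    x, theta = 0.0, theta0
    for step in range(int(T / dt)):
        t = step * dt
        dW = np.sqrt(dt) * rng.standard_normal()
        dX = -theta_true * x * dt + dW   # observed increment of the data stream
        alpha = C / (1.0 + t)            # decaying learning rate (assumed schedule)
        f = -theta * x                   # model drift f(x; theta)
        grad_f = -x                      # derivative of f with respect to theta
        # Discretized update: d(theta) = alpha * (grad_f dX - grad_f f dt)
        theta += alpha * (grad_f * dX - grad_f * f * dt)
        x += dX
    return theta

theta_hat = sgdct_ou()
```

Substituting the true dynamics into the update gives a drift proportional to (theta* - theta) x^2, so the estimate is pulled toward theta* while the noise term shrinks with the learning rate.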

Improving Network Robustness against Adversarial Attacks with Compact Convolution


Title	Improving Network Robustness against Adversarial Attacks with Compact Convolution
Authors	Rajeev Ranjan, Swami Sankaranarayanan, Carlos D. Castillo, Rama Chellappa
Abstract	Though Convolutional Neural Networks (CNNs) have surpassed human-level performance on tasks such as object classification and face verification, they can easily be fooled by adversarial attacks. These attacks add a small perturbation to the input image that causes the network to misclassify the sample. In this paper, we focus on neutralizing adversarial attacks by compact feature learning. In particular, we show that learning features in a closed and bounded space improves the robustness of the network. We explore the effect of L2-Softmax Loss, which enforces compactness in the learned features and thus results in enhanced robustness to adversarial perturbations. Additionally, we propose compact convolution, a novel method of convolution that, when incorporated in conventional CNNs, improves their robustness. Compact convolution ensures feature compactness at every layer such that the features are bounded and close to each other. Extensive experiments show that Compact Convolutional Networks (CCNs) neutralize multiple types of attacks and perform better than existing methods in defending against adversarial attacks, without incurring any additional training overhead compared to CNNs.
Tasks	Face Verification, Object Classification
Published	2017-12-03
URL	http://arxiv.org/abs/1712.00699v2
PDF	http://arxiv.org/pdf/1712.00699v2.pdf
PWC	https://paperswithcode.com/paper/improving-network-robustness-against
Repo
Framework
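
The compactness constraint mentioned in the abstract can be sketched as follows, assuming the common formulation in which each feature vector is rescaled to a fixed L2 norm alpha before the softmax classifier. The value alpha = 16, the random features, and the function names are hypothetical, and the sketch does not cover the per-layer compact convolution itself.

```python
import numpy as np

def l2_constrain(features, alpha=16.0, eps=1e-12):
    """Rescale every feature vector so that its L2 norm equals alpha."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    return alpha * features / np.maximum(norms, eps)

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy computed with a numerically stable log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

rng = np.random.default_rng(0)
# Features with very different magnitudes, as an unconstrained network might emit.
features = rng.normal(size=(4, 8)) * rng.uniform(0.1, 50.0, size=(4, 1))
weights = rng.normal(size=(8, 3))
labels = np.array([0, 2, 1, 0])

compact = l2_constrain(features)
loss = softmax_cross_entropy(compact @ weights, labels)
```

After the rescaling, all features lie on a sphere of radius alpha, which is one way to realize the "closed and bounded space" the abstract refers to.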

Block modelling in dynamic networks with non-homogeneous Poisson processes and exact ICL


Title	Block modelling in dynamic networks with non-homogeneous Poisson processes and exact ICL
Authors	Marco Corneli, Pierre Latouche, Fabrice Rossi
Abstract	We develop a model in which interactions between nodes of a dynamic network are counted by non-homogeneous Poisson processes. In a block modelling perspective, nodes belong to hidden clusters (whose number is unknown) and the intensity functions of the counting processes only depend on the clusters of nodes. In order to make inference tractable, we move to discrete time by partitioning the entire time horizon in which interactions are observed into fixed-length time sub-intervals. First, we derive an exact integrated classification likelihood criterion and maximize it relying on a greedy search approach. This allows us to estimate the memberships to clusters and the number of clusters simultaneously. Then a maximum-likelihood estimator is developed to estimate the integrated intensities non-parametrically. We discuss the over-fitting problems of the model and propose a regularized version solving these issues. Experiments on real and simulated data are carried out in order to assess the proposed methodology.
Tasks
Published	2017-07-10
URL	http://arxiv.org/abs/1707.02780v1
PDF	http://arxiv.org/pdf/1707.02780v1.pdf
PWC	https://paperswithcode.com/paper/block-modelling-in-dynamic-networks-with-non
Repo
Framework

Deep Multi-Modal Classification of Intraductal Papillary Mucinous Neoplasms (IPMN) with Canonical Correlation Analysis


Title	Deep Multi-Modal Classification of Intraductal Papillary Mucinous Neoplasms (IPMN) with Canonical Correlation Analysis
Authors	Sarfaraz Hussein, Pujan Kandel, Juan E. Corral, Candice W. Bolan, Michael B. Wallace, Ulas Bagci
Abstract	Pancreatic cancer has the poorest prognosis among all cancer types. Intraductal Papillary Mucinous Neoplasms (IPMNs) are radiographically identifiable precursors to pancreatic cancer; hence, early detection and precise risk assessment of IPMN are vital. In this work, we propose a Convolutional Neural Network (CNN) based computer aided diagnosis (CAD) system to perform IPMN diagnosis and risk assessment by utilizing multi-modal MRI. In our proposed approach, we use minimum and maximum intensity projections to ease the annotation variations among different slices and types of MRI. Then, we present a CNN to obtain a deep feature representation corresponding to each MRI modality (T1-weighted and T2-weighted). At the final step, we employ canonical correlation analysis (CCA) to perform a fusion operation at the feature level, leading to discriminative canonical correlation features. The extracted features are used for classification. Our results indicate significant improvements over other potential approaches to solve this important problem. The proposed approach does not require explicit sample balancing in cases of imbalance between positive and negative examples. To the best of our knowledge, our study is the first to automatically diagnose IPMN using multi-modal MRI.
Tasks
Published	2017-10-26
URL	http://arxiv.org/abs/1710.09779v3
PDF	http://arxiv.org/pdf/1710.09779v3.pdf
PWC	https://paperswithcode.com/paper/deep-multi-modal-classification-of
Repo
Framework
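
The feature-level fusion step can be sketched with a plain NumPy implementation of CCA. The sketch assumes two feature matrices with one row per sample (standing in for the T1- and T2-weighted CNN features), a small ridge term for numerical stability, and fusion by concatenating the projected features. The concatenation rule, the synthetic data, and the function names are assumptions for illustration, not details confirmed by the abstract.

```python
import numpy as np

def inv_sqrt(C):
    """Inverse matrix square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(C)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

def cca_fuse(X, Y, k, reg=1e-6):
    """Project X and Y onto their top-k canonical directions and concatenate."""
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    A, B = Kx @ U[:, :k], Ky @ Vt.T[:, :k]
    fused = np.hstack([Xc @ A, Yc @ B])
    return fused, s[:k]

rng = np.random.default_rng(0)
t1_features = rng.normal(size=(200, 5))
# Second modality shares three latent directions with the first, plus small noise.
t2_features = t1_features[:, :3] @ rng.normal(size=(3, 4)) + 0.01 * rng.normal(size=(200, 4))

fused, correlations = cca_fuse(t1_features, t2_features, k=2)
```

The fused matrix would then be passed to a classifier; which classifier is used is not specified in the abstract.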

Diameter-Based Active Learning


Title	Diameter-Based Active Learning
Authors	Christopher Tosh, Sanjoy Dasgupta
Abstract	To date, the tightest upper and lower bounds for the active learning of general concept classes have been in terms of a parameter of the learning problem called the splitting index. We provide, for the first time, an efficient algorithm that is able to realize this upper bound, and we empirically demonstrate its good performance.
Tasks	Active Learning
Published	2017-02-27
URL	http://arxiv.org/abs/1702.08553v2
PDF	http://arxiv.org/pdf/1702.08553v2.pdf
PWC	https://paperswithcode.com/paper/diameter-based-active-learning
Repo
Framework

A Compromise Principle in Deep Monocular Depth Estimation


Title	A Compromise Principle in Deep Monocular Depth Estimation
Authors	Huan Fu, Mingming Gong, Chaohui Wang, Dacheng Tao
Abstract	Monocular depth estimation, which plays a key role in understanding 3D scene geometry, is fundamentally an ill-posed problem. Existing methods based on deep convolutional neural networks (DCNNs) have examined this problem by learning convolutional networks to estimate continuous depth maps from monocular images. However, we find that training a network to predict a high spatial resolution continuous depth map often suffers from poor local solutions. In this paper, we hypothesize that achieving a compromise between spatial and depth resolutions can improve network training. Based on this “compromise principle”, we propose a regression-classification cascaded network (RCCN), which consists of a regression branch predicting a low spatial resolution continuous depth map and a classification branch predicting a high spatial resolution discrete depth map. The two branches form a cascaded structure allowing the classification and regression branches to benefit from each other. By leveraging large-scale raw training datasets and some data augmentation strategies, our network achieves top or state-of-the-art results on the NYU Depth V2, KITTI, and Make3D benchmarks.
Tasks	Data Augmentation, Depth Estimation, Monocular Depth Estimation
Published	2017-08-28
URL	http://arxiv.org/abs/1708.08267v2
PDF	http://arxiv.org/pdf/1708.08267v2.pdf
PWC	https://paperswithcode.com/paper/a-compromise-principle-in-deep-monocular
Repo
Framework

A Labeling-Free Approach to Supervising Deep Neural Networks for Retinal Blood Vessel Segmentation


Title	A Labeling-Free Approach to Supervising Deep Neural Networks for Retinal Blood Vessel Segmentation
Authors	Yongliang Chen
Abstract	Segmenting blood vessels in fundus imaging plays an important role in medical diagnosis, and many algorithms have been proposed. Deep neural networks have attracted enormous attention from the computer vision community in recent years, and several novel works have applied them to retinal blood vessel segmentation, but most are based on supervised learning, which requires a large amount of labeled data that is both scarce and expensive to obtain. In this work, we leverage the power of Deep Convolutional Neural Networks (DCNN) in feature learning to remove the need for manual labels. The highly efficient feature learning of DCNN inspires our novel approach, which trains the networks with automatically generated samples to achieve desirable performance on real-world fundus images. For this, we design a set of rules abstracted from domain-specific prior knowledge to generate these samples. We argue that, with the high efficiency of DCNN in feature learning, one can achieve this goal by constructing the training dataset from prior knowledge, so that no manual labeling is needed. This approach allows us to take advantage of supervised learning without labeling. We also build a naive DCNN model to test it. The results on standard benchmarks of fundus imaging show that it is competitive with state-of-the-art methods, which suggests a potential way to leverage the power of DCNN in feature learning.
Tasks	Medical Diagnosis
Published	2017-04-25
URL	http://arxiv.org/abs/1704.07502v2
PDF	http://arxiv.org/pdf/1704.07502v2.pdf
PWC	https://paperswithcode.com/paper/a-labeling-free-approach-to-supervising-deep
Repo
Framework

Multichannel End-to-end Speech Recognition


Title	Multichannel End-to-end Speech Recognition
Authors	Tsubasa Ochiai, Shinji Watanabe, Takaaki Hori, John R. Hershey
Abstract	The field of speech recognition is in the midst of a paradigm shift: end-to-end neural networks are challenging the dominance of hidden Markov models as a core technology. Using an attention mechanism in a recurrent encoder-decoder architecture solves the dynamic time alignment problem, allowing joint end-to-end training of the acoustic and language modeling components. In this paper we extend the end-to-end framework to encompass microphone array signal processing for noise suppression and speech enhancement within the acoustic encoding network. This allows the beamforming components to be optimized jointly within the recognition architecture to improve the end-to-end speech recognition objective. Experiments on the noisy speech benchmarks (CHiME-4 and AMI) show that our multichannel end-to-end system outperformed the attention-based baseline with input from a conventional adaptive beamformer.
Tasks	End-To-End Speech Recognition, Language Modelling, Speech Enhancement, Speech Recognition
Published	2017-03-14
URL	http://arxiv.org/abs/1703.04783v1
PDF	http://arxiv.org/pdf/1703.04783v1.pdf
PWC	https://paperswithcode.com/paper/multichannel-end-to-end-speech-recognition
Repo
Framework

Optimal Experimental Design of Field Trials using Differential Evolution


Title	Optimal Experimental Design of Field Trials using Differential Evolution
Authors	Vitaliy Feoktistov, Stephane Pietravalle, Nicolas Heslot
Abstract	When setting up field experiments to test and compare a range of genotypes (e.g. maize hybrids), it is important to account for any possible field effect that may otherwise bias performance estimates of genotypes. To do so, we propose a model-based method aimed at optimizing the allocation of the tested genotypes and checks between fields and their placement within each field, according to their kinship. This task can be formulated as a combinatorial permutation-based problem. We use the Differential Evolution concept to solve this problem. We then present results of optimal strategies for between-field and within-field placements of genotypes and compare them to existing optimization strategies, both in terms of convergence time and result quality. The new algorithm gives promising results in terms of convergence and search space exploration.
Tasks
Published	2017-02-01
URL	https://arxiv.org/abs/1702.00815v2
PDF	https://arxiv.org/pdf/1702.00815v2.pdf
PWC	https://paperswithcode.com/paper/optimal-experimental-design-of-field-trials
Repo
Framework
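
One way to apply Differential Evolution to a permutation problem is sketched below. It is not the authors' algorithm: it swaps in a random-key encoding, where each candidate is a real vector and its argsort gives the plot order, combined with the classic DE/rand/1/bin scheme. The cost function, which penalizes related genotypes placed on neighboring plots, the random kinship matrix, and all parameter values are hypothetical.

```python
import numpy as np

def adjacency_cost(perm, kinship):
    """Sum of kinship between genotypes placed on neighboring plots."""
    return float(kinship[perm[:-1], perm[1:]].sum())

def de_permutation(kinship, pop_size=30, generations=100, F=0.6, CR=0.9, seed=0):
    """DE/rand/1/bin on random keys; argsort of a key vector gives a permutation."""
    rng = np.random.default_rng(seed)
    n = kinship.shape[0]
    pop = rng.random((pop_size, n))
    costs = np.array([adjacency_cost(np.argsort(ind), kinship) for ind in pop])
    history = [costs.min()]
    for _ in range(generations):
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            a, b, c = pop[rng.choice(others, size=3, replace=False)]
            mutant = a + F * (b - c)
            cross = rng.random(n) < CR
            cross[rng.integers(n)] = True  # keep at least one mutant gene
            trial = np.where(cross, mutant, pop[i])
            trial_cost = adjacency_cost(np.argsort(trial), kinship)
            if trial_cost <= costs[i]:     # greedy one-to-one replacement
                pop[i], costs[i] = trial, trial_cost
        history.append(costs.min())
    best = int(np.argmin(costs))
    return np.argsort(pop[best]), costs[best], history

rng = np.random.default_rng(1)
K = rng.random((12, 12))
kinship = (K + K.T) / 2
layout, best_cost, history = de_permutation(kinship)
```

The random-key trick keeps the standard continuous DE operators intact; a dedicated permutation-space variant may behave quite differently.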

How to Train Triplet Networks with 100K Identities?


Title	How to Train Triplet Networks with 100K Identities?
Authors	Chong Wang, Xue Zhang, Xipeng Lan
Abstract	Training triplet networks with large-scale data is challenging in face recognition. Because the number of possible triplets explodes with the number of samples, previous studies adopt online hard negative mining (OHNM) to handle it. However, as the number of identities becomes extremely large, training suffers from bad local minima because effective hard triplets are difficult to find. To solve the problem, in this paper, we propose training triplet networks with subspace learning, which splits the space of all identities into subspaces consisting of only similar identities. Combined with batch OHNM, hard triplets can be found much more easily. Experiments on the large-scale MS-Celeb-1M challenge with 100K identities demonstrate that the proposed method can largely improve the performance. In addition, to deal with heavy noise and large-scale retrieval, we also make some efforts on robust noise removal and efficient image retrieval, which are used jointly with the subspace learning to obtain state-of-the-art performance in the MS-Celeb-1M competition (without external data in Challenge1).
Tasks	Face Recognition, Image Retrieval
Published	2017-09-09
URL	http://arxiv.org/abs/1709.02940v1
PDF	http://arxiv.org/pdf/1709.02940v1.pdf
PWC	https://paperswithcode.com/paper/how-to-train-triplet-networks-with-100k
Repo
Framework
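
The batch-level hard mining mentioned in the abstract can be illustrated with a generic "batch-hard" triplet loss: for each anchor, take the farthest sample of the same identity and the closest sample of a different identity within the batch. This is a common formulation and only an assumption about what the authors use; the margin, the toy embeddings, and the function name are hypothetical, and the subspace-learning step is not shown.

```python
import numpy as np

def batch_hard_triplet_loss(embeddings, labels, margin=0.2):
    """Triplet loss using the hardest positive and negative per anchor in a batch."""
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    same = labels[:, None] == labels[None, :]
    pos_mask = same & ~np.eye(len(labels), dtype=bool)  # same identity, not self
    neg_mask = ~same
    hardest_pos = np.where(pos_mask, dist, -np.inf).max(axis=1)
    hardest_neg = np.where(neg_mask, dist, np.inf).min(axis=1)
    # Anchors without a positive or a negative in the batch are skipped.
    valid = pos_mask.any(axis=1) & neg_mask.any(axis=1)
    losses = np.maximum(hardest_pos - hardest_neg + margin, 0.0)
    return losses[valid].mean()

# Two identities interleaved on a line, so every anchor has a hard negative nearby.
embeddings = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 0.0], [3.0, 0.0]])
labels = np.array([0, 0, 1, 1])
loss = batch_hard_triplet_loss(embeddings, labels)
```

When a batch mixes very dissimilar identities, the hardest negative is usually still far away and the loss is zero; grouping similar identities into the same batch is what makes such mining informative.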

Deep Scene Text Detection with Connected Component Proposals


Title	Deep Scene Text Detection with Connected Component Proposals
Authors	Fan Jiang, Zhihui Hao, Xinran Liu
Abstract	A growing demand for natural-scene text detection has been witnessed by the computer vision community, since text information plays a significant role in scene understanding and image indexing. Deep neural networks are being used due to their strong capabilities in pixel-wise classification or word localization, as in common vision problems. In this paper, we present a novel two-task network that integrates bottom and top cues. The first task aims to predict a pixel-by-pixel labeling, from which word proposals are generated with a canonical connected component analysis. The second task aims to output a bundle of character candidates used later to verify the word proposals. The two sub-networks share base convolutional features and, moreover, we present a new loss to strengthen the interaction between them. We evaluate the proposed network on public benchmark datasets and show that it can detect arbitrary-orientation scene text with a finer output boundary. In the ICDAR 2013 text localization task, we achieve state-of-the-art performance with an F-score of 0.919 and a much better recall of 0.915.
Tasks	Scene Text Detection, Scene Understanding
Published	2017-08-17
URL	http://arxiv.org/abs/1708.05133v1
PDF	http://arxiv.org/pdf/1708.05133v1.pdf
PWC	https://paperswithcode.com/paper/deep-scene-text-detection-with-connected
Repo
Framework
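
The step that turns a pixel-wise labeling into word proposals can be sketched with a basic connected component analysis. The example assumes a binary text mask, 4-connectivity, and axis-aligned bounding boxes as proposals; the paper targets arbitrary orientations, so this is a simplification, and the function name and toy mask are hypothetical.

```python
from collections import deque

import numpy as np

def component_boxes(mask):
    """Bounding boxes (row_min, col_min, row_max, col_max) of 4-connected regions."""
    mask = np.asarray(mask, dtype=bool)
    seen = np.zeros_like(mask)
    rows, cols = mask.shape
    boxes = []
    for r0 in range(rows):
        for c0 in range(cols):
            if not mask[r0, c0] or seen[r0, c0]:
                continue
            # Breadth-first flood fill starting from an unvisited text pixel.
            queue = deque([(r0, c0)])
            seen[r0, c0] = True
            rmin, cmin, rmax, cmax = r0, c0, r0, c0
            while queue:
                r, c = queue.popleft()
                rmin, rmax = min(rmin, r), max(rmax, r)
                cmin, cmax = min(cmin, c), max(cmax, c)
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and mask[nr, nc] and not seen[nr, nc]):
                        seen[nr, nc] = True
                        queue.append((nr, nc))
            boxes.append((rmin, cmin, rmax, cmax))
    return boxes

text_mask = np.array([
    [1, 1, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 1, 1],
])
boxes = component_boxes(text_mask)
```

In the pipeline described above, each proposal would then be checked against the character candidates from the second task.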

DSOS and SDSOS Optimization: More Tractable Alternatives to Sum of Squares and Semidefinite Optimization


Title	DSOS and SDSOS Optimization: More Tractable Alternatives to Sum of Squares and Semidefinite Optimization
Authors	Amir Ali Ahmadi, Anirudha Majumdar
Abstract	In recent years, optimization theory has been greatly impacted by the advent of sum of squares (SOS) optimization. The reliance of this technique on large-scale semidefinite programs, however, has limited the scale of problems to which it can be applied. In this paper, we introduce DSOS and SDSOS optimization as linear programming and second-order cone programming-based alternatives to sum of squares optimization that allow one to trade off computation time with solution quality. These are optimization problems over certain subsets of sum of squares polynomials (or equivalently subsets of positive semidefinite matrices), which can be of interest in general applications of semidefinite programming where scalability is a limitation. We show that some basic theorems from SOS optimization which rely on results from real algebraic geometry are still valid for DSOS and SDSOS optimization. Furthermore, we show with numerical experiments from diverse application areas (polynomial optimization, statistics and machine learning, derivative pricing, and control theory) that with reasonable tradeoffs in accuracy, we can handle problems at scales that are currently significantly beyond the reach of traditional sum of squares approaches. Finally, we provide a review of recent techniques that bridge the gap between our DSOS/SDSOS approach and the SOS approach at the expense of additional running time. The Supplementary Material of the paper introduces an accompanying MATLAB package for DSOS and SDSOS optimization.
Tasks
Published	2017-06-08
URL	http://arxiv.org/abs/1706.02586v3
PDF	http://arxiv.org/pdf/1706.02586v3.pdf
PWC	https://paperswithcode.com/paper/dsos-and-sdsos-optimization-more-tractable
Repo
Framework
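
The "subsets of positive semidefinite matrices" mentioned in the abstract can be made concrete with a small check. The sketch assumes the standard characterization that a polynomial z(x)^T Q z(x) is dsos when its Gram matrix Q is diagonally dominant with a nonnegative diagonal; such matrices are positive semidefinite by Gershgorin's circle theorem, and the condition is linear in the entries of Q. The function name and example matrices are hypothetical, and the scaled (SDSOS) case is not shown.

```python
import numpy as np

def is_diagonally_dominant(Q, tol=1e-12):
    """True if every diagonal entry is at least the sum of absolute off-diagonals."""
    Q = np.asarray(Q, dtype=float)
    off_diag = np.abs(Q).sum(axis=1) - np.abs(np.diag(Q))
    return bool(np.all(np.diag(Q) >= off_diag - tol))

# Diagonally dominant, hence positive semidefinite.
Q_dd = np.array([[2.0, -1.0, 0.5],
                 [-1.0, 3.0, 1.0],
                 [0.5, 1.0, 2.0]])

# Gram matrix of (x + 2y)^2: positive semidefinite but not diagonally dominant.
Q_sos_only = np.array([[1.0, 2.0],
                       [2.0, 4.0]])

dd_flags = (is_diagonally_dominant(Q_dd), is_diagonally_dominant(Q_sos_only))
min_eigs = (np.linalg.eigvalsh(Q_dd).min(), np.linalg.eigvalsh(Q_sos_only).min())
```

The second matrix shows why the approach is an inner approximation: (x + 2y)^2 is a sum of squares, yet its Gram matrix fails the cheaper test, which is the accuracy-for-speed trade-off the abstract describes.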