Paper Group ANR 498
Local Shape Spectrum Analysis for 3D Facial Expression Recognition
Title | Local Shape Spectrum Analysis for 3D Facial Expression Recognition |
Authors | Dmytro Derkach, Federico M. Sukno |
Abstract | We investigate the problem of facial expression recognition using 3D data. Building from one of the most successful frameworks for facial analysis using exclusively 3D geometry, we extend the analysis from a curve-based representation into a spectral representation, which allows a complete description of the underlying surface that can be further tuned to the desired level of detail. Spectral representations are based on the decomposition of the geometry into its spatial frequency components, much like a Fourier transform, which are related to intrinsic characteristics of the surface. In this work, we propose the use of Graph Laplacian Features (GLF), which result from the projection of local surface patches into a common basis obtained from the Graph Laplacian eigenspace. We test the proposed approach on the BU-3DFE database in terms of expression and Action Unit recognition. Our results confirm that the proposed GLF produces consistently higher recognition rates than the curve-based approach, thanks to a more complete description of the surface, while requiring lower computational complexity. We also show that GLF outperforms the most popular alternative approach for spectral representation, Shape-DNA, which is based on the Laplace-Beltrami operator and cannot provide a stable basis that guarantees that the extracted signatures for the different patches are directly comparable. |
Tasks | 3D Facial Expression Recognition, Facial Expression Recognition |
Published | 2017-05-19 |
URL | http://arxiv.org/abs/1705.06900v1 |
http://arxiv.org/pdf/1705.06900v1.pdf | |
PWC | https://paperswithcode.com/paper/local-shape-spectrum-analysis-for-3d-facial |
Repo | |
Framework | |
Towards life cycle identification of malaria parasites using machine learning and Riemannian geometry
Title | Towards life cycle identification of malaria parasites using machine learning and Riemannian geometry |
Authors | Arash Mehrjou |
Abstract | Malaria is a serious infectious disease that is responsible for over half a million deaths yearly worldwide. The major cause of these mortalities is late or inaccurate diagnosis. Manual microscopy is currently considered the dominant diagnostic method for malaria. However, it is time-consuming and prone to human error. The aim of this paper is to automate the diagnosis process and minimize human intervention. We have developed the hardware and software for a cost-efficient malaria diagnostic system. This paper describes the manufactured hardware and also proposes novel software to handle parasite detection and life-stage identification. A motorized microscope is developed to take images from Giemsa-stained blood smears. A patch-based unsupervised statistical clustering algorithm is proposed, which offers a novel method for classification of different regions within blood images. The proposed method provides better robustness against different imaging settings. The core of the proposed algorithm is a model called Mixture of Independent Component Analysis. A manifold-based optimization method is proposed that facilitates the application of the model to the high-dimensional data usually acquired in medical microscopy. The method was tested on 600 blood slides with various imaging conditions. The speed of the method is higher than that of current supervised systems, while its accuracy is comparable to or better than theirs. |
Tasks | |
Published | 2017-08-17 |
URL | http://arxiv.org/abs/1708.05200v1 |
http://arxiv.org/pdf/1708.05200v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-life-cycle-identification-of-malaria |
Repo | |
Framework | |
Dualing GANs
Title | Dualing GANs |
Authors | Yujia Li, Alexander Schwing, Kuan-Chieh Wang, Richard Zemel |
Abstract | Generative adversarial nets (GANs) are a promising technique for modeling a distribution from samples. It is however well known that GAN training suffers from instability due to the nature of its maximin formulation. In this paper, we explore ways to tackle the instability problem by dualizing the discriminator. We start from linear discriminators in which case conjugate duality provides a mechanism to reformulate the saddle point objective into a maximization problem, such that both the generator and the discriminator of this ‘dualing GAN’ act in concert. We then demonstrate how to extend this intuition to non-linear formulations. For GANs with linear discriminators our approach is able to remove the instability in training, while for GANs with nonlinear discriminators our approach provides an alternative to the commonly used GAN training algorithm. |
Tasks | |
Published | 2017-06-19 |
URL | http://arxiv.org/abs/1706.06216v1 |
http://arxiv.org/pdf/1706.06216v1.pdf | |
PWC | https://paperswithcode.com/paper/dualing-gans |
Repo | |
Framework | |
Stochastic Gradient Descent in Continuous Time: A Central Limit Theorem
Title | Stochastic Gradient Descent in Continuous Time: A Central Limit Theorem |
Authors | Justin Sirignano, Konstantinos Spiliopoulos |
Abstract | Stochastic gradient descent in continuous time (SGDCT) provides a computationally efficient method for the statistical learning of continuous-time models, which are widely used in science, engineering, and finance. The SGDCT algorithm follows a (noisy) descent direction along a continuous stream of data. The parameter updates occur in continuous time and satisfy a stochastic differential equation. This paper analyzes the asymptotic convergence rate of the SGDCT algorithm by proving a central limit theorem (CLT) for strongly convex objective functions and, under slightly stronger conditions, for non-convex objective functions as well. An $L^{p}$ convergence rate is also proven for the algorithm in the strongly convex case. The mathematical analysis lies at the intersection of stochastic analysis and statistical learning. |
Tasks | |
Published | 2017-10-11 |
URL | https://arxiv.org/abs/1710.04273v4 |
https://arxiv.org/pdf/1710.04273v4.pdf | |
PWC | https://paperswithcode.com/paper/stochastic-gradient-descent-in-continuous-1 |
Repo | |
Framework | |
Improving Network Robustness against Adversarial Attacks with Compact Convolution
Title | Improving Network Robustness against Adversarial Attacks with Compact Convolution |
Authors | Rajeev Ranjan, Swami Sankaranarayanan, Carlos D. Castillo, Rama Chellappa |
Abstract | Though Convolutional Neural Networks (CNNs) have surpassed human-level performance on tasks such as object classification and face verification, they can easily be fooled by adversarial attacks. These attacks add a small perturbation to the input image that causes the network to misclassify the sample. In this paper, we focus on neutralizing adversarial attacks by compact feature learning. In particular, we show that learning features in a closed and bounded space improves the robustness of the network. We explore the effect of L2-Softmax Loss, which enforces compactness in the learned features, thus resulting in enhanced robustness to adversarial perturbations. Additionally, we propose compact convolution, a novel method of convolution that, when incorporated in conventional CNNs, improves their robustness. Compact convolution ensures feature compactness at every layer such that the features are bounded and close to each other. Extensive experiments show that Compact Convolutional Networks (CCNs) neutralize multiple types of attacks, and perform better than existing methods in defending against adversarial attacks, without incurring any additional training overhead compared to CNNs. |
Tasks | Face Verification, Object Classification |
Published | 2017-12-03 |
URL | http://arxiv.org/abs/1712.00699v2 |
http://arxiv.org/pdf/1712.00699v2.pdf | |
PWC | https://paperswithcode.com/paper/improving-network-robustness-against |
Repo | |
Framework | |
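The idea of keeping features in a closed and bounded space can be illustrated with a minimal sketch. It assumes that the L2-Softmax loss rescales each feature vector to a fixed L2 norm before the usual softmax cross-entropy. The abstract does not state this, so the function names, the `alpha` parameter, and the weight layout below are all illustrative assumptions rather than the authors' implementation.

```python
import math

def l2_constrain(features, alpha=1.0):
    """Rescale a feature vector so that its L2 norm equals alpha (assumed radius)."""
    norm = math.sqrt(sum(x * x for x in features))
    if norm == 0.0:
        return list(features)  # the zero vector has no direction; leave it unchanged
    return [alpha * x / norm for x in features]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def l2_softmax_loss(features, weights, label, alpha=1.0):
    """Cross-entropy computed on logits of the norm-constrained features.

    weights holds one weight vector per class (hypothetical layout, no bias).
    """
    f = l2_constrain(features, alpha)
    logits = [sum(w_i * f_i for w_i, f_i in zip(w, f)) for w in weights]
    return -math.log(softmax(logits)[label])
```

Because the features are rescaled first, multiplying an input feature vector by any positive constant leaves the loss unchanged. This is one way to read "compactness" here: every feature lies on a sphere of radius `alpha`.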
Block modelling in dynamic networks with non-homogeneous Poisson processes and exact ICL
Title | Block modelling in dynamic networks with non-homogeneous Poisson processes and exact ICL |
Authors | Marco Corneli, Pierre Latouche, Fabrice Rossi |
Abstract | We develop a model in which interactions between nodes of a dynamic network are counted by non-homogeneous Poisson processes. In a block modelling perspective, nodes belong to hidden clusters (whose number is unknown) and the intensity functions of the counting processes only depend on the clusters of nodes. In order to make inference tractable, we move to discrete time by partitioning the entire time horizon in which interactions are observed into fixed-length time sub-intervals. First, we derive an exact integrated classification likelihood criterion and maximize it relying on a greedy search approach. This allows us to estimate the cluster memberships and the number of clusters simultaneously. Then a maximum-likelihood estimator is developed to estimate the integrated intensities non-parametrically. We discuss the over-fitting problems of the model and propose a regularized version that solves these issues. Experiments on real and simulated data are carried out in order to assess the proposed methodology. |
Tasks | |
Published | 2017-07-10 |
URL | http://arxiv.org/abs/1707.02780v1 |
http://arxiv.org/pdf/1707.02780v1.pdf | |
PWC | https://paperswithcode.com/paper/block-modelling-in-dynamic-networks-with-non |
Repo | |
Framework | |
Deep Multi-Modal Classification of Intraductal Papillary Mucinous Neoplasms (IPMN) with Canonical Correlation Analysis
Title | Deep Multi-Modal Classification of Intraductal Papillary Mucinous Neoplasms (IPMN) with Canonical Correlation Analysis |
Authors | Sarfaraz Hussein, Pujan Kandel, Juan E. Corral, Candice W. Bolan, Michael B. Wallace, Ulas Bagci |
Abstract | Pancreatic cancer has the poorest prognosis among all cancer types. Intraductal Papillary Mucinous Neoplasms (IPMNs) are radiographically identifiable precursors to pancreatic cancer; hence, early detection and precise risk assessment of IPMN are vital. In this work, we propose a Convolutional Neural Network (CNN) based computer aided diagnosis (CAD) system to perform IPMN diagnosis and risk assessment by utilizing multi-modal MRI. In our proposed approach, we use minimum and maximum intensity projections to ease the annotation variations among different slices and types of MRI. Then, we present a CNN to obtain a deep feature representation corresponding to each MRI modality (T1-weighted and T2-weighted). At the final step, we employ canonical correlation analysis (CCA) to perform a fusion operation at the feature level, leading to discriminative canonical correlation features. Extracted features are used for classification. Our results indicate significant improvements over other potential approaches to solve this important problem. The proposed approach does not require explicit sample balancing in cases of imbalance between positive and negative examples. To the best of our knowledge, our study is the first to automatically diagnose IPMN using multi-modal MRI. |
Tasks | |
Published | 2017-10-26 |
URL | http://arxiv.org/abs/1710.09779v3 |
http://arxiv.org/pdf/1710.09779v3.pdf | |
PWC | https://paperswithcode.com/paper/deep-multi-modal-classification-of |
Repo | |
Framework | |
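The minimum and maximum intensity projections mentioned in this abstract can be sketched as follows. The sketch treats a scan as a plain list of 2D slices and collapses it pixel by pixel. The function name and the toy volume are illustrative assumptions, not the authors' preprocessing pipeline.

```python
def intensity_projection(volume, reduce_fn):
    """Collapse a volume (list of equally sized 2D slices) into one 2D image.

    reduce_fn is applied across slices at each pixel position,
    e.g. max for a maximum intensity projection, min for a minimum one.
    """
    rows, cols = len(volume[0]), len(volume[0][0])
    return [
        [reduce_fn(s[r][c] for s in volume) for c in range(cols)]
        for r in range(rows)
    ]

# Toy volume with three 2x2 slices (made-up intensities).
volume = [
    [[0, 5], [2, 1]],
    [[3, 4], [7, 0]],
    [[1, 9], [2, 6]],
]

max_ip = intensity_projection(volume, max)  # [[3, 9], [7, 6]]
min_ip = intensity_projection(volume, min)  # [[0, 4], [2, 0]]
```

Each projection yields a single image per scan regardless of how many slices the scan contains, which is presumably what lets one label stand in for per-slice annotations.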
Diameter-Based Active Learning
Title | Diameter-Based Active Learning |
Authors | Christopher Tosh, Sanjoy Dasgupta |
Abstract | To date, the tightest upper and lower bounds for the active learning of general concept classes have been in terms of a parameter of the learning problem called the splitting index. We provide, for the first time, an efficient algorithm that is able to realize this upper bound, and we empirically demonstrate its good performance. |
Tasks | Active Learning |
Published | 2017-02-27 |
URL | http://arxiv.org/abs/1702.08553v2 |
http://arxiv.org/pdf/1702.08553v2.pdf | |
PWC | https://paperswithcode.com/paper/diameter-based-active-learning |
Repo | |
Framework | |
A Compromise Principle in Deep Monocular Depth Estimation
Title | A Compromise Principle in Deep Monocular Depth Estimation |
Authors | Huan Fu, Mingming Gong, Chaohui Wang, Dacheng Tao |
Abstract | Monocular depth estimation, which plays a key role in understanding 3D scene geometry, is fundamentally an ill-posed problem. Existing methods based on deep convolutional neural networks (DCNNs) have examined this problem by learning convolutional networks to estimate continuous depth maps from monocular images. However, we find that training a network to predict a high spatial resolution continuous depth map often suffers from poor local solutions. In this paper, we hypothesize that achieving a compromise between spatial and depth resolutions can improve network training. Based on this “compromise principle”, we propose a regression-classification cascaded network (RCCN), which consists of a regression branch predicting a low spatial resolution continuous depth map and a classification branch predicting a high spatial resolution discrete depth map. The two branches form a cascaded structure allowing the classification and regression branches to benefit from each other. By leveraging large-scale raw training datasets and some data augmentation strategies, our network achieves top or state-of-the-art results on the NYU Depth V2, KITTI, and Make3D benchmarks. |
Tasks | Data Augmentation, Depth Estimation, Monocular Depth Estimation |
Published | 2017-08-28 |
URL | http://arxiv.org/abs/1708.08267v2 |
http://arxiv.org/pdf/1708.08267v2.pdf | |
PWC | https://paperswithcode.com/paper/a-compromise-principle-in-deep-monocular |
Repo | |
Framework | |
A Labeling-Free Approach to Supervising Deep Neural Networks for Retinal Blood Vessel Segmentation
Title | A Labeling-Free Approach to Supervising Deep Neural Networks for Retinal Blood Vessel Segmentation |
Authors | Yongliang Chen |
Abstract | Segmenting blood vessels in fundus imaging plays an important role in medical diagnosis, and many algorithms have been proposed. Deep neural networks have attracted enormous attention from the computer vision community in recent years, and several novel works have applied them to retinal blood vessel segmentation. However, most of these are based on supervised learning, which requires a large amount of labeled data that is both scarce and expensive to obtain. In this work, we leverage the power of Deep Convolutional Neural Networks (DCNN) in feature learning to remove the need for manual labels. The highly efficient feature learning of DCNN inspires our novel approach, which trains the networks with automatically generated samples to achieve desirable performance on real-world fundus images. For this, we design a set of rules abstracted from domain-specific prior knowledge to generate these samples. We argue that, with the high efficiency of DCNN in feature learning, one can achieve this goal by constructing the training dataset from prior knowledge, so that no manual labeling is needed. This approach allows us to take advantage of supervised learning without labeling. We also build a naive DCNN model to test it. The results on standard benchmarks of fundus imaging show that it is competitive with state-of-the-art methods, which suggests a potential way to leverage the power of DCNN in feature learning. |
Tasks | Medical Diagnosis |
Published | 2017-04-25 |
URL | http://arxiv.org/abs/1704.07502v2 |
http://arxiv.org/pdf/1704.07502v2.pdf | |
PWC | https://paperswithcode.com/paper/a-labeling-free-approach-to-supervising-deep |
Repo | |
Framework | |
Multichannel End-to-end Speech Recognition
Title | Multichannel End-to-end Speech Recognition |
Authors | Tsubasa Ochiai, Shinji Watanabe, Takaaki Hori, John R. Hershey |
Abstract | The field of speech recognition is in the midst of a paradigm shift: end-to-end neural networks are challenging the dominance of hidden Markov models as a core technology. Using an attention mechanism in a recurrent encoder-decoder architecture solves the dynamic time alignment problem, allowing joint end-to-end training of the acoustic and language modeling components. In this paper we extend the end-to-end framework to encompass microphone array signal processing for noise suppression and speech enhancement within the acoustic encoding network. This allows the beamforming components to be optimized jointly within the recognition architecture to improve the end-to-end speech recognition objective. Experiments on the noisy speech benchmarks (CHiME-4 and AMI) show that our multichannel end-to-end system outperformed the attention-based baseline with input from a conventional adaptive beamformer. |
Tasks | End-To-End Speech Recognition, Language Modelling, Speech Enhancement, Speech Recognition |
Published | 2017-03-14 |
URL | http://arxiv.org/abs/1703.04783v1 |
http://arxiv.org/pdf/1703.04783v1.pdf | |
PWC | https://paperswithcode.com/paper/multichannel-end-to-end-speech-recognition |
Repo | |
Framework | |
Optimal Experimental Design of Field Trials using Differential Evolution
Title | Optimal Experimental Design of Field Trials using Differential Evolution |
Authors | Vitaliy Feoktistov, Stephane Pietravalle, Nicolas Heslot |
Abstract | When setting up field experiments to test and compare a range of genotypes (e.g. maize hybrids), it is important to account for any possible field effect that may otherwise bias performance estimates of genotypes. To do so, we propose a model-based method aimed at optimizing the allocation of the tested genotypes and checks between fields and their placement within each field, according to their kinship. This task can be formulated as a combinatorial permutation-based problem. We use the Differential Evolution concept to solve this problem. We then present results of optimal strategies for between-field and within-field placements of genotypes and compare them to existing optimization strategies, both in terms of convergence time and result quality. The new algorithm gives promising results in terms of convergence and search space exploration. |
Tasks | |
Published | 2017-02-01 |
URL | https://arxiv.org/abs/1702.00815v2 |
https://arxiv.org/pdf/1702.00815v2.pdf | |
PWC | https://paperswithcode.com/paper/optimal-experimental-design-of-field-trials |
Repo | |
Framework | |
How to Train Triplet Networks with 100K Identities?
Title | How to Train Triplet Networks with 100K Identities? |
Authors | Chong Wang, Xue Zhang, Xipeng Lan |
Abstract | Training triplet networks with large-scale data is challenging in face recognition. Because the number of possible triplets explodes with the number of samples, previous studies adopt online hard negative mining (OHNM) to handle it. However, as the number of identities becomes extremely large, the training will suffer from bad local minima because effective hard triplets are difficult to find. To solve the problem, in this paper, we propose training triplet networks with subspace learning, which splits the space of all identities into subspaces consisting of only similar identities. Combined with the batch OHNM, hard triplets can be found much more easily. Experiments on the large-scale MS-Celeb-1M challenge with 100K identities demonstrate that the proposed method can largely improve the performance. In addition, to deal with heavy noise and large-scale retrieval, we also make some efforts on robust noise removal and efficient image retrieval, which are used jointly with the subspace learning to obtain the state-of-the-art performance on the MS-Celeb-1M competition (without external data in Challenge1). |
Tasks | Face Recognition, Image Retrieval |
Published | 2017-09-09 |
URL | http://arxiv.org/abs/1709.02940v1 |
http://arxiv.org/pdf/1709.02940v1.pdf | |
PWC | https://paperswithcode.com/paper/how-to-train-triplet-networks-with-100k |
Repo | |
Framework | |
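Hard-triplet mining within a batch can be illustrated with a short sketch. It uses the common "batch-hard" variant: for each anchor, take the farthest sample of the same identity and the closest sample of a different identity. This is an assumption about what batch OHNM looks like in general; the abstract does not give the authors' exact mining rule, and the function names and the `margin` value are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def batch_hard_triplet_loss(embeddings, labels, margin=0.2):
    """Mean hinge loss over anchors, using the hardest positive and negative in the batch."""
    losses = []
    for i, (anchor, label) in enumerate(zip(embeddings, labels)):
        pos = [euclidean(anchor, e) for j, e in enumerate(embeddings)
               if j != i and labels[j] == label]
        neg = [euclidean(anchor, e) for j, e in enumerate(embeddings)
               if labels[j] != label]
        if not pos or not neg:
            continue  # this anchor cannot form a triplet inside the batch
        # Hardest positive is the farthest one; hardest negative is the closest one.
        losses.append(max(0.0, max(pos) - min(neg) + margin))
    return sum(losses) / len(losses) if losses else 0.0
```

If a batch only contains well-separated identities, every hinge term is zero and the batch contributes no gradient. That is consistent with the abstract's motivation for building batches from subspaces of similar identities.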
Deep Scene Text Detection with Connected Component Proposals
Title | Deep Scene Text Detection with Connected Component Proposals |
Authors | Fan Jiang, Zhihui Hao, Xinran Liu |
Abstract | A growing demand for natural-scene text detection has been witnessed by the computer vision community, since text information plays a significant role in scene understanding and image indexing. Deep neural networks are being used due to their strong capabilities of pixel-wise classification or word localization, as in other common vision problems. In this paper, we present a novel two-task network that integrates bottom and top cues. The first task aims to predict a pixel-by-pixel labeling, from which word proposals are generated with a canonical connected component analysis. The second task aims to output a bundle of character candidates used later to verify the word proposals. The two sub-networks share base convolutional features and, moreover, we present a new loss to strengthen the interaction between them. We evaluate the proposed network on public benchmark datasets and show that it can detect arbitrary-orientation scene text with a finer output boundary. In the ICDAR 2013 text localization task, we achieve state-of-the-art performance with an F-score of 0.919 and a much better recall of 0.915. |
Tasks | Scene Text Detection, Scene Understanding |
Published | 2017-08-17 |
URL | http://arxiv.org/abs/1708.05133v1 |
http://arxiv.org/pdf/1708.05133v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-scene-text-detection-with-connected |
Repo | |
Framework | |
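The step from a pixel-wise labeling to word proposals can be sketched with a plain connected component analysis. The sketch assumes a binary text/non-text mask and 4-connectivity, and it returns one bounding box per component as a stand-in for a word proposal. The function name and the toy mask are illustrative, and the authors' actual proposal generation may differ.

```python
from collections import deque

def connected_component_boxes(mask):
    """Return bounding boxes (top, left, bottom, right) of 4-connected regions of 1s."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited text pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes

# Toy mask with two separate text regions.
mask = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1],
    [0, 0, 0, 1, 0],
]
boxes = connected_component_boxes(mask)  # [(0, 0, 1, 1), (1, 3, 2, 4)]
```

Axis-aligned boxes are used here only for brevity. Keeping the component's pixel set instead of its box would preserve the finer, arbitrarily oriented boundary that the abstract emphasizes.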
DSOS and SDSOS Optimization: More Tractable Alternatives to Sum of Squares and Semidefinite Optimization
Title | DSOS and SDSOS Optimization: More Tractable Alternatives to Sum of Squares and Semidefinite Optimization |
Authors | Amir Ali Ahmadi, Anirudha Majumdar |
Abstract | In recent years, optimization theory has been greatly impacted by the advent of sum of squares (SOS) optimization. The reliance of this technique on large-scale semidefinite programs, however, has limited the scale of problems to which it can be applied. In this paper, we introduce DSOS and SDSOS optimization as linear programming and second-order cone programming-based alternatives to sum of squares optimization that allow one to trade off computation time with solution quality. These are optimization problems over certain subsets of sum of squares polynomials (or equivalently subsets of positive semidefinite matrices), which can be of interest in general applications of semidefinite programming where scalability is a limitation. We show that some basic theorems from SOS optimization which rely on results from real algebraic geometry are still valid for DSOS and SDSOS optimization. Furthermore, we show with numerical experiments from diverse application areas (polynomial optimization, statistics and machine learning, derivative pricing, and control theory) that with reasonable tradeoffs in accuracy, we can handle problems at scales that are currently significantly beyond the reach of traditional sum of squares approaches. Finally, we provide a review of recent techniques that bridge the gap between our DSOS/SDSOS approach and the SOS approach at the expense of additional running time. The Supplementary Material of the paper introduces an accompanying MATLAB package for DSOS and SDSOS optimization. |
Tasks | |
Published | 2017-06-08 |
URL | http://arxiv.org/abs/1706.02586v3 |
http://arxiv.org/pdf/1706.02586v3.pdf | |
PWC | https://paperswithcode.com/paper/dsos-and-sdsos-optimization-more-tractable |
Repo | |
Framework | |
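The "subsets of positive semidefinite matrices" in this abstract can be made concrete with a small sketch. It assumes the DSOS case corresponds to diagonally dominant matrices, a condition given by linear inequalities on the entries. The abstract does not spell this out, so treat the correspondence and the helper name below as assumptions. What is certain is that a symmetric matrix whose diagonal entries are at least the sum of the absolute off-diagonal entries in each row is positive semidefinite, by Gershgorin's circle theorem.

```python
def is_diagonally_dominant(Q):
    """Check Q[i][i] >= sum of |Q[i][j]| over j != i, for every row i."""
    n = len(Q)
    for i in range(n):
        off_diag = sum(abs(Q[i][j]) for j in range(n) if j != i)
        if Q[i][i] < off_diag:
            return False
    return True

# Diagonally dominant and symmetric, hence positive semidefinite.
dd = [[2, -1, 0],
      [-1, 2, -1],
      [0, -1, 2]]

# Positive semidefinite (trace 6, determinant 1) but not diagonally dominant.
psd_not_dd = [[1, 2],
              [2, 5]]

# Indefinite (eigenvalues 3 and -1), so the check must reject it.
indefinite = [[1, 2],
              [2, 1]]

results = [is_diagonally_dominant(m) for m in (dd, psd_not_dd, indefinite)]
# results is [True, False, False]
```

The second matrix shows the trade-off the abstract describes: the cheaper test never accepts a matrix that is not positive semidefinite, but it also rejects some matrices that are.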