Paper Group ANR 580
Can Synthetic Faces Undo the Damage of Dataset Bias to Face Recognition and Facial Landmark Detection?
Title | Can Synthetic Faces Undo the Damage of Dataset Bias to Face Recognition and Facial Landmark Detection? |
Authors | Adam Kortylewski, Bernhard Egger, Andreas Morel-Forster, Andreas Schneider, Thomas Gerig, Clemens Blumer, Corius Reyneke, Thomas Vetter |
Abstract | It is well known that deep learning approaches to face recognition and facial landmark detection suffer from biases in modern training datasets. In this work, we propose to use synthetic face images to reduce the negative effects of dataset biases on these tasks. Using a 3D morphable face model, we generate large numbers of synthetic face images with full control over facial shape and color, pose, illumination, and background. With a series of experiments, we extensively test the effects of priming deep nets by pre-training them with synthetic faces. We observe the following positive effects for face recognition and facial landmark detection tasks: 1) Priming with synthetic face images improves the performance consistently across all benchmarks because it reduces the negative effects of biases in the training data. 2) Traditional approaches for reducing the damage of dataset bias, such as data augmentation and transfer learning, are less effective than training with synthetic faces. 3) Using synthetic data, we can reduce the size of real-world datasets by 75% for face recognition and by 50% for facial landmark detection while maintaining performance. Thus, synthetic data offers a means to focus the data collection process on fewer but higher-quality samples. |
Tasks | Data Augmentation, Face Recognition, Facial Landmark Detection, Transfer Learning |
Published | 2018-11-19 |
URL | https://arxiv.org/abs/1811.08565v2 |
PDF | https://arxiv.org/pdf/1811.08565v2.pdf |
PWC | https://paperswithcode.com/paper/priming-deep-neural-networks-with-synthetic |
Repo | |
Framework | |
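A two-stage sketch of the priming recipe described above: pre-train on synthetic renders, then fine-tune on real data at a lower learning rate. `SyntheticFaces`/`RealFaces` are hypothetical dataset stand-ins, and the paper's 3D morphable model rendering pipeline is not reproduced here.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def run_epoch(model, loader, optimizer, loss_fn):
    # One pass over a loader; identical for both training stages.
    for images, labels in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()

def prime_then_finetune(model, synthetic_ds, real_ds, epochs=(10, 5)):
    loss_fn = nn.CrossEntropyLoss()
    # Stage 1: "prime" the network on synthetic faces only.
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(epochs[0]):
        run_epoch(model, DataLoader(synthetic_ds, batch_size=64, shuffle=True), opt, loss_fn)
    # Stage 2: fine-tune on the (smaller) real dataset at a lower learning rate.
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    for _ in range(epochs[1]):
        run_epoch(model, DataLoader(real_ds, batch_size=64, shuffle=True), opt, loss_fn)
    return model
```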
Adversarially Robust Training through Structured Gradient Regularization
Title | Adversarially Robust Training through Structured Gradient Regularization |
Authors | Kevin Roth, Aurelien Lucchi, Sebastian Nowozin, Thomas Hofmann |
Abstract | We propose a novel data-dependent structured gradient regularizer to increase the robustness of neural networks vis-a-vis adversarial perturbations. Our regularizer can be derived as a controlled approximation from first principles, leveraging the fundamental link between training with noise and regularization. It adds very little computational overhead during learning and is simple to implement generically in standard deep learning frameworks. Our experiments provide strong evidence that structured gradient regularization can act as an effective first line of defense against attacks based on low-level signal corruption. |
Tasks | |
Published | 2018-05-22 |
URL | http://arxiv.org/abs/1805.08736v1 |
PDF | http://arxiv.org/pdf/1805.08736v1.pdf |
PWC | https://paperswithcode.com/paper/adversarially-robust-training-through |
Repo | |
Framework | |
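The paper's regularizer is structured and data-dependent; as an illustrative baseline only, here is a minimal PyTorch sketch of the simplest member of this family, an isotropic input-gradient penalty added to the cross-entropy loss.

```python
import torch
import torch.nn.functional as F

def gradient_regularized_loss(model, x, y, lam=0.1):
    # Cross-entropy plus a penalty on the gradient of the loss w.r.t. the input.
    x = x.clone().requires_grad_(True)
    ce = F.cross_entropy(model(x), y)
    (grad,) = torch.autograd.grad(ce, x, create_graph=True)
    # Isotropic special case: penalize the squared L2 norm of the input gradient.
    # The paper's structured regularizer replaces this with a data-dependent
    # quadratic form; that structure is not reproduced here.
    return ce + lam * grad.pow(2).sum() / x.shape[0]
```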
Simplicity Creates Inequity: Implications for Fairness, Stereotypes, and Interpretability
Title | Simplicity Creates Inequity: Implications for Fairness, Stereotypes, and Interpretability |
Authors | Jon Kleinberg, Sendhil Mullainathan |
Abstract | Algorithms are increasingly used to aid, or in some cases supplant, human decision-making, particularly for decisions that hinge on predictions. As a result, two features in addition to prediction quality have generated interest: (i) to facilitate human interaction with and understanding of these algorithms, we desire prediction functions that are in some fashion simple or interpretable; and (ii) because they influence consequential decisions, we also want them to produce equitable allocations. We develop a formal model to explore the relationship between the demands of simplicity and equity. Although the two concepts appear to be motivated by qualitatively distinct goals, we show a fundamental inconsistency between them. Specifically, we formalize a general framework for producing simple prediction functions, and in this framework we establish two basic results. First, every simple prediction function is strictly improvable: there exists a more complex prediction function that is both strictly more efficient and also strictly more equitable. Put another way, using a simple prediction function both reduces utility for disadvantaged groups and reduces overall welfare relative to other options. Second, we show that simple prediction functions necessarily create incentives to use information about individuals' membership in a disadvantaged group, incentives that were not present before simplification and that work against these individuals. Thus, simplicity transforms disadvantage into bias against the disadvantaged group. Our results are not only about algorithms but about any process that produces simple models, and as such they connect to the psychology of stereotypes and to an earlier economics literature on statistical discrimination. |
Tasks | Decision Making |
Published | 2018-09-12 |
URL | https://arxiv.org/abs/1809.04578v2 |
PDF | https://arxiv.org/pdf/1809.04578v2.pdf |
PWC | https://paperswithcode.com/paper/simplicity-creates-inequity-implications-for |
Repo | |
Framework | |
Approximate Leave-One-Out for Fast Parameter Tuning in High Dimensions
Title | Approximate Leave-One-Out for Fast Parameter Tuning in High Dimensions |
Authors | Shuaiwen Wang, Wenda Zhou, Haihao Lu, Arian Maleki, Vahab Mirrokni |
Abstract | Consider the following class of learning schemes: $$\hat{\boldsymbol{\beta}} := \arg\min_{\boldsymbol{\beta}}\;\sum_{j=1}^n \ell(\boldsymbol{x}_j^\top\boldsymbol{\beta}; y_j) + \lambda R(\boldsymbol{\beta}),\qquad\qquad (1)$$ where $\boldsymbol{x}_j \in \mathbb{R}^p$ and $y_j \in \mathbb{R}$ denote the $j^{\text{th}}$ feature vector and response variable, respectively. Let $\ell$ and $R$ be the loss function and regularizer, $\boldsymbol{\beta}$ denote the unknown weights, and $\lambda$ be a regularization parameter. Finding the optimal choice of $\lambda$ is a challenging problem in high-dimensional regimes where both $n$ and $p$ are large. We propose two frameworks to obtain a computationally efficient approximation, ALO, of the leave-one-out cross-validation (LOOCV) risk for nonsmooth losses and regularizers. Our two frameworks are based on the primal and dual formulations of (1). We prove the equivalence of the two approaches under smoothness conditions. This equivalence enables us to justify the accuracy of both methods under such conditions. We use our approaches to obtain a risk estimate for several standard problems, including generalized LASSO, nuclear norm regularization, and support vector machines. We empirically demonstrate the effectiveness of our results for non-differentiable cases. |
Tasks | |
Published | 2018-07-07 |
URL | http://arxiv.org/abs/1807.02694v1 |
PDF | http://arxiv.org/pdf/1807.02694v1.pdf |
PWC | https://paperswithcode.com/paper/approximate-leave-one-out-for-fast-parameter |
Repo | |
Framework | |
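For the smooth special case of ridge regression, the leave-one-out risk has a classical closed form via leverage scores, which is the intuition the paper generalizes to nonsmooth losses and regularizers. A minimal NumPy sketch follows; it is exact for ridge and is not the paper's nonsmooth ALO.

```python
import numpy as np

def alo_ridge_risk(X, y, lam):
    """Leave-one-out risk for ridge regression without n refits.

    For l2-regularized least squares, the LOO residual is the in-sample
    residual divided by (1 - H_ii), where H is the hat matrix. The paper's
    contribution is extending this kind of shortcut to nonsmooth problems.
    """
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)  # hat matrix
    resid = y - H @ y                                        # in-sample residuals
    loo_resid = resid / (1.0 - np.diag(H))                   # leave-one-out residuals
    return np.mean(loo_resid ** 2)

# Tune lambda by scanning the estimated risk instead of refitting n times:
# lams = np.logspace(-3, 3, 30)
# best = min(lams, key=lambda l: alo_ridge_risk(X, y, l))
```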
A multiple criteria methodology for prioritizing and selecting portfolios of urban projects
Title | A multiple criteria methodology for prioritizing and selecting portfolios of urban projects |
Authors | Maria Barbati, José Rui Figueira, Salvatore Greco, Alessio Ishizaka, Simona Panaro |
Abstract | This paper presents an integrated methodology supporting decisions in urban planning. In particular, it deals with the prioritization and selection of a portfolio of projects related to buildings of value for the cultural heritage of cities. The methodology has been validated on the historical center of Naples, Italy. Each project is assessed on the basis of a set of both quantitative and qualitative criteria in order to determine its level of priority for further selection. This step was performed through the application of the Electre Tri-nC method, a multiple criteria outranking-based method for ordinal classification (or sorting) problems, which assigns a priority level to each project as an analytical “recommendation” tool. To identify the efficient portfolios and to support the selection of the most adequate set of projects to activate, a set of resource constraints (namely budgetary constraints) as well as some logical constraints related to urban policy requirements have to be taken into consideration, together with the priority of projects, in a portfolio analysis model. The process was conducted through interaction between analysts, municipality representatives, and experts. The proposed methodology is generic enough to be applied to other territorial or urban planning problems. Given the increasing interest of historical cities in restoring their cultural heritage, we believe the integrated multiple criteria decision aiding tool proposed in this paper has significant potential for future use. |
Tasks | |
Published | 2018-12-13 |
URL | http://arxiv.org/abs/1812.10410v2 |
PDF | http://arxiv.org/pdf/1812.10410v2.pdf |
PWC | https://paperswithcode.com/paper/a-multiple-criteria-methodology-for |
Repo | |
Framework | |
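Once the sorting step has assigned priority levels, portfolio selection reduces to maximizing total priority under budgetary and logical constraints. A toy sketch with hypothetical costs, priorities, and one illustrative logical constraint; small instances can simply be enumerated, while larger ones call for an integer programming solver.

```python
from itertools import combinations

# Hypothetical data: per-project (cost, priority level), where the priority
# would come from the Electre Tri-nC step (not reproduced here).
projects = {"P1": (30, 3), "P2": (50, 2), "P3": (20, 3), "P4": (40, 1)}
budget = 90

def feasible(sel):
    # Budget constraint plus an example logical constraint of the kind the
    # abstract mentions: P1 and P3 are mutually exclusive alternatives.
    cost = sum(projects[p][0] for p in sel)
    return cost <= budget and not {"P1", "P3"} <= set(sel)

best = max(
    (sel for r in range(len(projects) + 1)
         for sel in combinations(projects, r) if feasible(sel)),
    key=lambda sel: sum(projects[p][1] for p in sel),
)
print(best)  # highest total priority among feasible portfolios
```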
Taming the Cross Entropy Loss
Title | Taming the Cross Entropy Loss |
Authors | Manuel Martinez, Rainer Stiefelhagen |
Abstract | We present the Tamed Cross Entropy (TCE) loss function, a robust derivative of the standard Cross Entropy (CE) loss used in deep learning for classification tasks. Unlike other robust losses, the TCE loss is designed to exhibit the same training properties as the CE loss in noiseless scenarios. The TCE loss therefore requires no modification to the training regime compared to the CE loss and, as a consequence, can be applied in all applications where the CE loss is currently used. We evaluate the TCE loss using the ResNet architecture on four image datasets that we artificially contaminated with various levels of label noise. The TCE loss outperforms the CE loss in every tested scenario. |
Tasks | |
Published | 2018-10-11 |
URL | http://arxiv.org/abs/1810.05075v1 |
PDF | http://arxiv.org/pdf/1810.05075v1.pdf |
PWC | https://paperswithcode.com/paper/taming-the-cross-entropy-loss |
Repo | |
Framework | |
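As a hedged illustration of how a CE derivative can stay bounded under label noise, the sketch below truncates the series expansion $-\log p = \sum_{k\ge 1}(1-p)^k/k$. This form is an assumption made for illustration; consult the paper for the exact TCE definition.

```python
import torch
import torch.nn.functional as F

def bounded_ce(logits, target, n_terms=4):
    # Truncating the Taylor series of -log(p) after n_terms bounds the
    # penalty assigned to (possibly mislabeled) low-probability targets,
    # while matching CE closely when p is near 1.
    # NOTE: illustrative only; not guaranteed to be the paper's TCE.
    p = F.softmax(logits, dim=1).gather(1, target.unsqueeze(1)).squeeze(1)
    loss = sum((1 - p) ** k / k for k in range(1, n_terms + 1))
    return loss.mean()
```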
Data-Driven Analysis of Pareto Set Topology
Title | Data-Driven Analysis of Pareto Set Topology |
Authors | Naoki Hamada, Keisuke Goto |
Abstract | When and why can evolutionary multi-objective optimization (EMO) algorithms cover the entire Pareto set? That is a major concern for EMO researchers and practitioners. A recent theoretical study revealed that (roughly speaking) if the Pareto set forms a topological simplex (a curved line, a curved triangle, a curved tetrahedron, etc.), then decomposition-based EMO algorithms can cover the entire Pareto set. Usually, we cannot know the true Pareto set and have to estimate its topology from the population of an EMO algorithm during or after a run. This paper presents a data-driven approach to analyzing the topology of the Pareto set. We give a theory of how to recognize the topology of the Pareto set from data and implement an algorithm to judge whether the true Pareto set may form a topological simplex or not. Numerical experiments show that the proposed method correctly recognizes the topology of high-dimensional Pareto sets with a reasonable population size. |
Tasks | |
Published | 2018-04-19 |
URL | http://arxiv.org/abs/1804.07179v1 |
PDF | http://arxiv.org/pdf/1804.07179v1.pdf |
PWC | https://paperswithcode.com/paper/data-driven-analysis-of-pareto-set-topology |
Repo | |
Framework | |
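The paper's recognition algorithm is not reproduced here. As a naive stand-in only, one can at least sanity-check the intrinsic dimension of the sampled Pareto set with PCA: an M-objective problem whose Pareto set is a topological (M-1)-simplex should concentrate variance in roughly M-1 directions.

```python
import numpy as np

def estimated_intrinsic_dim(pareto_points, var_threshold=0.99):
    # Naive proxy (not the paper's method): count how many principal
    # directions are needed to explain var_threshold of the variance in
    # the sampled Pareto set (rows = points, columns = objectives).
    X = pareto_points - pareto_points.mean(axis=0)
    s = np.linalg.svd(X, compute_uv=False) ** 2   # variances per direction
    ratios = np.cumsum(s) / s.sum()
    return int(np.searchsorted(ratios, var_threshold) + 1)
```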
Detection of distal radius fractures trained by a small set of X-ray images and Faster R-CNN
Title | Detection of distal radius fractures trained by a small set of X-ray images and Faster R-CNN |
Authors | Erez Yahalomi, Michael Chernofsky, Michael Werman |
Abstract | Distal radius fractures are the most common fractures of the upper extremity in humans. As such, they account for a significant portion of the injuries that present to emergency rooms and clinics throughout the world. We trained a Faster R-CNN, a machine vision neural network for object detection, to identify and locate distal radius fractures in anteroposterior X-ray images. We achieved an accuracy of 96% in identifying fractures and a mean Average Precision (mAP) of 0.866. This is significantly more accurate than the detection achieved by physicians and radiologists. These results were obtained by training the deep learning network with only 38 original anteroposterior hand X-ray images with fractures. This opens the possibility of using this type of neural network to detect rare diseases, or rare presentations of common diseases, for which only a small set of diagnosed X-ray images can be collected per disease. |
Tasks | Object Detection |
Published | 2018-12-21 |
URL | http://arxiv.org/abs/1812.09025v1 |
PDF | http://arxiv.org/pdf/1812.09025v1.pdf |
PWC | https://paperswithcode.com/paper/detection-of-distal-radius-fractures-trained |
Repo | |
Framework | |
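The paper trains a Faster R-CNN; one common way to set up a comparable model is to start from a COCO-pretrained torchvision detector and swap in a two-class box head. The paper does not specify torchvision, so treat this as an assumption about tooling, not the authors' exact setup.

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Start from a COCO-pretrained detector; with only ~38 training images,
# transfer learning is doing most of the work.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)

# Swap the box head for our 2-class problem: background + fracture.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=2)

# Training then follows the standard torchvision detection loop:
# model(images, targets) returns a dict of losses in train mode, where each
# target holds "boxes" as (x1, y1, x2, y2) tensors and integer "labels".
```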
Exact and Consistent Interpretation for Piecewise Linear Neural Networks: A Closed Form Solution
Title | Exact and Consistent Interpretation for Piecewise Linear Neural Networks: A Closed Form Solution |
Authors | Lingyang Chu, Xia Hu, Juhua Hu, Lanjun Wang, Jian Pei |
Abstract | Strong intelligent machines powered by deep neural networks are increasingly deployed as black boxes to make decisions in risk-sensitive domains, such as finance and medicine. To reduce potential risk and build trust with users, it is critical to interpret how such machines make their decisions. Existing works interpret a pre-trained neural network by analyzing hidden neurons, mimicking pre-trained models, or approximating local predictions. However, these methods do not provide a guarantee on the exactness and consistency of their interpretations. In this paper, we propose an elegant closed form solution named $OpenBox$ to compute exact and consistent interpretations for the family of Piecewise Linear Neural Networks (PLNN). The major idea is to first transform a PLNN into a mathematically equivalent set of linear classifiers, then interpret each linear classifier by the features that dominate its prediction. We further apply $OpenBox$ to demonstrate the effectiveness of non-negative and sparse constraints on improving the interpretability of PLNNs. Extensive experiments on both synthetic and real-world data sets clearly demonstrate the exactness and consistency of our interpretation. |
Tasks | |
Published | 2018-02-17 |
URL | https://arxiv.org/abs/1802.06259v2 |
PDF | https://arxiv.org/pdf/1802.06259v2.pdf |
PWC | https://paperswithcode.com/paper/exact-and-consistent-interpretation-for |
Repo | |
Framework | |
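The core transformation is implementable in a few lines for a plain ReLU MLP: the forward pass on an input fixes every ReLU's on/off state, and composing the resulting masked affine layers yields the exact local linear model. A minimal NumPy sketch; the feature-dominance analysis built on top of it is not shown.

```python
import numpy as np

def local_linear_model(weights, biases, x):
    """Exact affine map a ReLU network computes on x's activation region.

    Fixing the on/off state of every ReLU (determined by the forward pass
    on x) turns the whole network into a single affine function A @ x + c,
    which is the closed-form, exact local interpretation.
    """
    A, c = np.eye(x.size), np.zeros(x.size)
    for W, b in zip(weights[:-1], biases[:-1]):
        A, c = W @ A, W @ c + b          # affine map up to this pre-activation
        mask = (A @ x + c) > 0           # this input's ReLU activation pattern
        A, c = mask[:, None] * A, mask * c
    W, b = weights[-1], biases[-1]       # final (linear) layer
    return W @ A, W @ c + b              # A_final @ x + c_final == network(x)
```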
Exploring Hyper-Parameter Optimization for Neural Machine Translation on GPU Architectures
Title | Exploring Hyper-Parameter Optimization for Neural Machine Translation on GPU Architectures |
Authors | Robert Lim, Kenneth Heafield, Hieu Hoang, Mark Briers, Allen Malony |
Abstract | Neural machine translation (NMT) with deep neural networks has overtaken statistical approaches, driven by the plethora and programmability of commodity heterogeneous computing architectures such as FPGAs and GPUs, and by the massive training corpora generated from news outlets, government agencies, and social media. Training such networks entails tuning hyper-parameters to yield the best performance. Unfortunately, the hyper-parameters for machine translation include discrete categories as well as continuous options, which makes the search combinatorially explosive. This research explores optimizing hyper-parameters when training deep learning neural networks for machine translation. Specifically, our work investigates training a language model with Marian NMT. Results compare NMT under various hyper-parameter settings across a variety of modern GPU architecture generations in single-node and multi-node settings, revealing which hyper-parameters matter most for performance in terms of words processed per second, convergence rate, and translation accuracy, and providing insights on how best to achieve high-performing NMT systems. |
Tasks | Language Modelling, Machine Translation |
Published | 2018-05-05 |
URL | http://arxiv.org/abs/1805.02094v1 |
PDF | http://arxiv.org/pdf/1805.02094v1.pdf |
PWC | https://paperswithcode.com/paper/exploring-hyper-parameter-optimization-for |
Repo | |
Framework | |
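A generic random-search harness over a mixed discrete/continuous space is one simple way to run this kind of exploration. In this sketch, `train_and_eval` is a hypothetical stand-in for launching a training run (e.g., with Marian) and returning a validation score; the search space values are illustrative, not the paper's.

```python
import random

# Hypothetical search space mixing the discrete and continuous options the
# abstract mentions: tuples denote continuous ranges, lists are categories.
space = {
    "optimizer": ["adam", "sgd"],
    "mini_batch": [32, 64, 128],
    "learning_rate": (1e-4, 1e-2),
}

def sample(space):
    cfg = {}
    for name, choices in space.items():
        if isinstance(choices, tuple):        # continuous range: log-uniform
            lo, hi = choices
            cfg[name] = lo * (hi / lo) ** random.random()
        else:                                 # discrete category
            cfg[name] = random.choice(choices)
    return cfg

# best = max((sample(space) for _ in range(50)), key=train_and_eval)
```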
Metropolis-Hastings view on variational inference and adversarial training
Title | Metropolis-Hastings view on variational inference and adversarial training |
Authors | Kirill Neklyudov, Evgenii Egorov, Pavel Shvechikov, Dmitry Vetrov |
Abstract | A significant part of MCMC methods can be considered as the Metropolis-Hastings (MH) algorithm with different proposal distributions. From this point of view, the problem of constructing a sampler reduces to the question: how should one choose a proposal for the MH algorithm? To address this question, we propose to learn an independent sampler that maximizes the acceptance rate of the MH algorithm, which, as we demonstrate, is closely related to conventional variational inference. For Bayesian inference, the proposed method compares favorably against alternatives for sampling from the posterior distribution. Under the same approach, we step beyond the scope of classical MCMC methods and deduce the Generative Adversarial Networks (GANs) framework from scratch, treating the generator as the proposal and the discriminator as the acceptance test. On real-world datasets, we improve the Fréchet Inception Distance and Inception Score, using different GANs as the proposal distribution for the MH algorithm. In particular, we demonstrate improvements to the recently proposed BigGAN model on ImageNet. |
Tasks | Bayesian Inference |
Published | 2018-10-16 |
URL | https://arxiv.org/abs/1810.07151v2 |
PDF | https://arxiv.org/pdf/1810.07151v2.pdf |
PWC | https://paperswithcode.com/paper/metropolis-hastings-view-on-variational |
Repo | |
Framework | |
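Treating the generator as an independence proposal and the discriminator as the acceptance test leads to a short sampler. A sketch assuming the discriminator ends in a sigmoid and outputs a probability in (0, 1), so that $D/(1-D)$ estimates the density ratio the MH test needs; the paper's training of the proposal is not shown.

```python
import torch

def mh_sample(generator, discriminator, z_dim, steps=100):
    # Independence Metropolis-Hastings with the generator as the proposal.
    # If D approximates p_data / (p_data + p_g), then D/(1-D) estimates
    # p_data/p_g, which is exactly the ratio the acceptance test needs.
    x = generator(torch.randn(1, z_dim))
    for _ in range(steps):
        x_new = generator(torch.randn(1, z_dim))
        d, d_new = discriminator(x).item(), discriminator(x_new).item()
        ratio = (d_new / (1 - d_new)) / (d / (1 - d))
        if torch.rand(1).item() < min(1.0, ratio):
            x = x_new                     # accept the proposed sample
    return x
```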
A Microprocessor implemented in 65nm CMOS with Configurable and Bit-scalable Accelerator for Programmable In-memory Computing
Title | A Microprocessor implemented in 65nm CMOS with Configurable and Bit-scalable Accelerator for Programmable In-memory Computing |
Authors | Hongyang Jia, Yinqi Tang, Hossein Valavi, Jintao Zhang, Naveen Verma |
Abstract | This paper presents a programmable in-memory-computing processor, demonstrated in a 65nm CMOS technology. For data-centric workloads, such as deep neural networks, data movement often dominates when implemented with today’s computing architectures. This has motivated spatial architectures, where the arrangement of data-storage and compute hardware is distributed and explicitly aligned to the computation dataflow, most notably for matrix-vector multiplication. In-memory computing is a spatial architecture where processing elements correspond to dense bit cells, providing local storage and compute, typically employing analog operation. Though this raises the potential for high energy efficiency and throughput, analog operation has significantly limited robustness, scale, and programmability. This paper describes a 590kb in-memory-computing accelerator integrated in a programmable processor architecture, by exploiting recent approaches to charge-domain in-memory computing. The architecture takes the approach of tight coupling with an embedded CPU, through accelerator interfaces enabling integration in the standard processor memory space. Additionally, a near-memory-computing datapath both enables diverse computations locally, to address operations required across applications, and enables bit-precision scalability for matrix/input-vector elements, through a bit-parallel/bit-serial (BP/BS) scheme. Chip measurements show an energy efficiency of 152/297 1b-TOPS/W and throughput of 4.7/1.9 1b-TOPS (scaling linearly with the matrix/input-vector element precisions) at VDD of 1.2/0.85V. Neural network demonstrations with 1-b/4-b weights and activations for CIFAR-10 classification consume 5.3/105.2 $\mu$J/image at 176/23 fps, with accuracy at the level of digital/software implementation (89.3/92.4 $%$ accuracy). |
Tasks | |
Published | 2018-11-09 |
URL | http://arxiv.org/abs/1811.04047v1 |
PDF | http://arxiv.org/pdf/1811.04047v1.pdf |
PWC | https://paperswithcode.com/paper/a-microprocessor-implemented-in-65nm-cmos |
Repo | |
Framework | |
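The bit-parallel/bit-serial (BP/BS) scheme is easy to verify numerically: a matrix-vector product with B-bit unsigned inputs decomposes into B one-bit products recombined with shifts, so the analog array only ever handles binary operands while full precision is recovered digitally. A NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.integers(0, 2, size=(4, 8))            # 1-b weights (bit-parallel)
x = rng.integers(0, 16, size=8)                # 4-b inputs  (bit-serial)

acc = np.zeros(4, dtype=int)
for b in range(4):                             # one pass per input bit plane
    bit_plane = (x >> b) & 1                   # binary slice of the input
    acc += (W @ bit_plane) << b                # shift-and-add recombination

assert np.array_equal(acc, W @ x)              # matches full-precision result
```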
Disentangling the independently controllable factors of variation by interacting with the world
Title | Disentangling the independently controllable factors of variation by interacting with the world |
Authors | Valentin Thomas, Emmanuel Bengio, William Fedus, Jules Pondard, Philippe Beaudoin, Hugo Larochelle, Joelle Pineau, Doina Precup, Yoshua Bengio |
Abstract | It has been postulated that a good representation is one that disentangles the underlying explanatory factors of variation. However, it remains an open question what kind of training framework could potentially achieve that. Whereas most previous work focuses on the static setting (e.g., with images), we postulate that some of the causal factors could be discovered if the learner is allowed to interact with its environment. The agent can experiment with different actions and observe their effects. More specifically, we hypothesize that some of these factors correspond to aspects of the environment which are independently controllable, i.e., that there exists a policy and a learnable feature for each such aspect of the environment, such that this policy can yield changes in that feature with minimal changes to other features that explain the statistical variations in the observed data. We propose a specific objective function to find such factors, and verify experimentally that it can indeed disentangle independently controllable aspects of the environment without any extrinsic reward signal. |
Tasks | |
Published | 2018-02-26 |
URL | http://arxiv.org/abs/1802.09484v1 |
PDF | http://arxiv.org/pdf/1802.09484v1.pdf |
PWC | https://paperswithcode.com/paper/disentangling-the-independently-controllable |
Repo | |
Framework | |
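One objective consistent with this description rewards policy $k$ for moving "its" feature while leaving the others fixed. The paper's exact objective may differ, so treat this as an assumed selectivity-style sketch.

```python
import torch

def selectivity(features_before, features_after, k, eps=1e-8):
    # Fraction of total feature change attributable to factor k after
    # acting under policy k; maximizing it encourages each policy to
    # control one factor with minimal change to the others.
    # Illustrative sketch only; not guaranteed to match the paper.
    delta = (features_after - features_before).abs()   # (batch, n_factors)
    return (delta[:, k] / (delta.sum(dim=1) + eps)).mean()

# Policy k is trained to maximize selectivity(phi(s), phi(s_next), k),
# where s_next is the state reached after acting under policy k.
```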
Machine Learning for Wireless Connectivity and Security of Cellular-Connected UAVs
Title | Machine Learning for Wireless Connectivity and Security of Cellular-Connected UAVs |
Authors | Ursula Challita, Aidin Ferdowsi, Mingzhe Chen, Walid Saad |
Abstract | Cellular-connected unmanned aerial vehicles (UAVs) will inevitably be integrated into future cellular networks as new aerial mobile users. Providing cellular connectivity to UAVs will enable a myriad of applications ranging from online video streaming to medical delivery. However, to enable reliable wireless connectivity for the UAVs as well as a secure operation, various challenges need to be addressed, such as interference management, mobility management and handover, cyber-physical attacks, and authentication. In this paper, the goal is to expose the wireless and security challenges that arise in the context of UAV-based delivery systems, UAV-based real-time multimedia streaming, and UAV-enabled intelligent transportation systems. To address such challenges, artificial neural network (ANN) based solution schemes are introduced. The introduced approaches enable the UAVs to adaptively exploit the wireless system resources while guaranteeing a secure operation, in real-time. Preliminary simulation results show the benefits of the introduced solutions for each of the aforementioned cellular-connected UAV application use cases. |
Tasks | |
Published | 2018-04-15 |
URL | http://arxiv.org/abs/1804.05348v3 |
PDF | http://arxiv.org/pdf/1804.05348v3.pdf |
PWC | https://paperswithcode.com/paper/machine-learning-for-wireless-connectivity |
Repo | |
Framework | |
Training Deep Neural Networks with Different Datasets In-the-wild: The Emotion Recognition Paradigm
Title | Training Deep Neural Networks with Different Datasets In-the-wild: The Emotion Recognition Paradigm |
Authors | Dimitrios Kollias, Stefanos Zafeiriou |
Abstract | A novel procedure is presented in this paper for training a deep convolutional and recurrent neural network, taking into account both the available training data set and information extracted from similar networks trained on other relevant data sets. This information is included in an extended loss function used for network training, so that the network performs better when applied to the other data sets, without forgetting the knowledge learned from the original data set. Facial expression and emotion recognition in-the-wild is the test-bed application used to demonstrate the improved performance achieved with the proposed approach. In this framework, we provide an experimental study on categorical emotion recognition using datasets from a very recent emotion recognition challenge. |
Tasks | Emotion Recognition |
Published | 2018-09-12 |
URL | http://arxiv.org/abs/1809.04359v1 |
PDF | http://arxiv.org/pdf/1809.04359v1.pdf |
PWC | https://paperswithcode.com/paper/training-deep-neural-networks-with-different |
Repo | |
Framework | |
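The abstract does not spell out the extended loss. A distillation-style term that keeps the new network close to networks trained on related data sets is one plausible instantiation, labeled here as an assumption rather than the paper's definition.

```python
import torch
import torch.nn.functional as F

def extended_loss(student_logits, labels, teacher_logits, lam=0.5, T=2.0):
    # Supervised loss on the current data set, plus a term pulling the new
    # network toward a network trained on a related data set, so previously
    # learned behavior is not forgotten.
    # NOTE: distillation-style KL is an assumed stand-in for the paper's term.
    ce = F.cross_entropy(student_logits, labels)
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return ce + lam * kl
```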