Paper Group ANR 470
Convergence of End-to-End Training in Deep Unsupervised Contrastive Learning. End-to-End Multi-speaker Speech Recognition with Transformer. Damage-sensitive and domain-invariant feature extraction for vehicle-vibration-based bridge health monitoring. Real-Time Optimal Guidance and Control for Interplanetary Transfers Using Deep Networks. Pre-processing Image using Brightening, CLAHE and RETINEX …
Convergence of End-to-End Training in Deep Unsupervised Contrastive Learning
Title | Convergence of End-to-End Training in Deep Unsupervised Contrastive Learning |
Authors | Zixin Wen |
Abstract | Unsupervised contrastive learning has gained increasing attention in the latest research and has proven to be a powerful method for learning representations from unlabeled data. However, little theoretical analysis is known for this framework. In this paper, we study the optimization of deep unsupervised contrastive learning. We prove that, by applying end-to-end training that simultaneously updates two deep over-parameterized neural networks, one can find an approximate stationary solution for the non-convex contrastive loss. This result is inherently different from the existing over-parameterized analysis in the supervised setting because, in contrast to learning a specific target function, unsupervised contrastive learning tries to encode the unlabeled data distribution into the neural networks, which generally has no optimal solution. Our analysis provides theoretical insights into the practical success of these unsupervised pretraining methods. |
Tasks | |
Published | 2020-02-17 |
URL | https://arxiv.org/abs/2002.06979v2 |
https://arxiv.org/pdf/2002.06979v2.pdf | |
PWC | https://paperswithcode.com/paper/convergence-of-end-to-end-training-in-deep |
Repo | |
Framework | |
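A minimal sketch of the end-to-end training setup the abstract describes: two over-parameterized encoders updated simultaneously on a non-convex contrastive loss. The InfoNCE-style loss, architectures, and dimensions below are illustrative assumptions, not the paper's exact construction.

```python
# Minimal sketch: two encoders updated jointly on an InfoNCE-style contrastive loss.
# Architectures, dimensions, and the specific loss are illustrative assumptions,
# not the exact setup analyzed in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    def __init__(self, dim_in=128, dim_hidden=2048, dim_out=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_in, dim_hidden), nn.ReLU(),
            nn.Linear(dim_hidden, dim_out),
        )
    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

f, g = Encoder(), Encoder()
opt = torch.optim.SGD(list(f.parameters()) + list(g.parameters()), lr=0.1)

def info_nce(z1, z2, tau=0.5):
    # Positive pairs sit on the diagonal of the similarity matrix.
    logits = z1 @ z2.t() / tau
    labels = torch.arange(z1.size(0))
    return F.cross_entropy(logits, labels)

x = torch.randn(256, 128)                                              # unlabeled batch
x1, x2 = x + 0.1 * torch.randn_like(x), x + 0.1 * torch.randn_like(x)  # two "views"
loss = info_nce(f(x1), g(x2))
opt.zero_grad(); loss.backward(); opt.step()   # simultaneous update of both networks
```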
End-to-End Multi-speaker Speech Recognition with Transformer
Title | End-to-End Multi-speaker Speech Recognition with Transformer |
Authors | Xuankai Chang, Wangyou Zhang, Yanmin Qian, Jonathan Le Roux, Shinji Watanabe |
Abstract | Recently, fully recurrent neural network (RNN) based end-to-end models have been proven to be effective for multi-speaker speech recognition in both the single-channel and multi-channel scenarios. In this work, we explore the use of Transformer models for these tasks by focusing on two aspects. First, we replace the RNN-based encoder-decoder in the speech recognition model with a Transformer architecture. Second, in order to use the Transformer in the masking network of the neural beamformer in the multi-channel case, we modify the self-attention component to be restricted to a segment rather than the whole sequence in order to reduce computation. Besides the model architecture improvements, we also incorporate an external dereverberation preprocessing, the weighted prediction error (WPE), enabling our model to handle reverberated signals. Experiments on the spatialized wsj1-2mix corpus show that the Transformer-based models achieve 40.9% and 25.6% relative WER reduction, down to 12.1% and 6.4% WER, under the anechoic condition in single-channel and multi-channel tasks, respectively, while in the reverberant case, our methods achieve 41.5% and 13.8% relative WER reduction, down to 16.5% and 15.2% WER. |
Tasks | Speech Recognition |
Published | 2020-02-10 |
URL | https://arxiv.org/abs/2002.03921v2 |
https://arxiv.org/pdf/2002.03921v2.pdf | |
PWC | https://paperswithcode.com/paper/end-to-end-multi-speaker-speech-recognition-1 |
Repo | |
Framework | |
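The segment-restricted self-attention mentioned in the abstract can be sketched as a banded attention mask over the time axis; the segment width and single attention layer below are assumptions for illustration, not the paper's masking-network configuration.

```python
# Illustrative sketch of restricting self-attention to a local segment around each
# frame, reducing the effective context on long sequences.
import torch
import torch.nn as nn

def segment_mask(seq_len, width):
    # True = position may NOT be attended to (PyTorch boolean attn_mask convention).
    idx = torch.arange(seq_len)
    dist = (idx[:, None] - idx[None, :]).abs()
    return dist > width

T, D = 500, 256
x = torch.randn(1, T, D)                         # (batch, time, feature)
attn = nn.MultiheadAttention(embed_dim=D, num_heads=4, batch_first=True)
mask = segment_mask(T, width=16)                 # each frame sees +/-16 frames
out, _ = attn(x, x, x, attn_mask=mask)
print(out.shape)                                 # torch.Size([1, 500, 256])
```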
Damage-sensitive and domain-invariant feature extraction for vehicle-vibration-based bridge health monitoring
Title | Damage-sensitive and domain-invariant feature extraction for vehicle-vibration-based bridge health monitoring |
Authors | Jingxiao Liu, Bingqing Chen, Siheng Chen, Mario Berges, Jacobo Bielak, HaeYoung Noh |
Abstract | We introduce a physics-guided signal processing approach to extract a damage-sensitive and domain-invariant (DS & DI) feature from acceleration response data of a vehicle traveling over a bridge to assess bridge health. Motivated by indirect sensing methods’ benefits, such as low-cost and low-maintenance, vehicle-vibration-based bridge health monitoring has been studied to efficiently monitor bridges in real-time. Yet applying this approach is challenging because 1) physics-based features extracted manually are generally not damage-sensitive, and 2) features from machine learning techniques are often not applicable to different bridges. Thus, we formulate a vehicle bridge interaction system model and find a physics-guided DS & DI feature, which can be extracted using the synchrosqueezed wavelet transform representing non-stationary signals as intrinsic-mode-type components. We validate the effectiveness of the proposed feature with simulated experiments. Compared to conventional time- and frequency-domain features, our feature provides the best damage quantification and localization results across different bridges in five of six experiments. |
Tasks | |
Published | 2020-02-06 |
URL | https://arxiv.org/abs/2002.02105v1 |
https://arxiv.org/pdf/2002.02105v1.pdf | |
PWC | https://paperswithcode.com/paper/damage-sensitive-and-domain-invariant-feature |
Repo | |
Framework | |
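The DS & DI feature relies on the synchrosqueezed wavelet transform to represent non-stationary signals as intrinsic-mode-type components. The sketch below substitutes a plain continuous wavelet transform (PyWavelets) as a stand-in to show the general time-frequency feature-extraction flow on a simulated acceleration signal; it is not the authors' actual feature.

```python
# Hedged stand-in: the paper uses the synchrosqueezed wavelet transform; here a plain
# CWT illustrates the general time-frequency feature extraction flow on a toy signal.
import numpy as np
import pywt

fs = 200.0                                        # sampling rate (Hz), assumed
t = np.arange(0, 10, 1 / fs)
accel = (np.sin(2 * np.pi * 3 * t) + 0.3 * np.sin(2 * np.pi * 12 * t)
         + 0.05 * np.random.randn(t.size))        # simulated vehicle acceleration

scales = np.arange(1, 128)
coeffs, freqs = pywt.cwt(accel, scales, "morl", sampling_period=1 / fs)

# A crude damage-indicator candidate: energy distribution across frequency bands.
band_energy = (np.abs(coeffs) ** 2).mean(axis=1)
feature = band_energy / band_energy.sum()         # normalized spectral energy profile
print(feature.shape)                              # (127,)
```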
Real-Time Optimal Guidance and Control for Interplanetary Transfers Using Deep Networks
Title | Real-Time Optimal Guidance and Control for Interplanetary Transfers Using Deep Networks |
Authors | Dario Izzo, Ekin Öztürk |
Abstract | We consider the Earth-Venus mass-optimal interplanetary transfer of a low-thrust spacecraft and show how the optimal guidance can be represented by deep networks in a large portion of the state space and to a high degree of accuracy. Imitation (supervised) learning of optimal examples is used as a network training paradigm. The resulting models are suitable for an on-board, real-time implementation of the optimal guidance and control system of the spacecraft and are called G&CNETs. A new general methodology called Backward Generation of Optimal Examples is introduced and shown to be able to efficiently create all the optimal state-action pairs necessary to train G&CNETs without solving optimal control problems. With respect to previous works, we are able to produce datasets containing a few orders of magnitude more optimal trajectories and obtain network performances compatible with real mission requirements. Several schemes able to train representations of either the optimal policy (thrust profile) or the value function (optimal mass) are proposed and tested. We find that both policy learning and value function learning successfully and accurately learn the optimal thrust, and that a spacecraft employing the learned thrust is able to reach the target orbital conditions spending only 2 permil more propellant than in the corresponding mathematically optimal transfer. Moreover, the optimal propellant mass can be predicted (in the case of value function learning) with an error well within 1%. All G&CNETs produced are tested during simulations of interplanetary transfers with respect to their ability to reach the target conditions optimally, starting from nominal and off-nominal conditions. |
Tasks | |
Published | 2020-02-20 |
URL | https://arxiv.org/abs/2002.09063v1 |
https://arxiv.org/pdf/2002.09063v1.pdf | |
PWC | https://paperswithcode.com/paper/real-time-optimal-guidance-and-control-for |
Repo | |
Framework | |
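A minimal imitation-learning sketch in the spirit of G&CNETs: a feedforward network regresses the optimal thrust from the spacecraft state. The state/action dimensions and random placeholder data are assumptions; real training pairs would come from the Backward Generation of Optimal Examples.

```python
# Minimal sketch of policy learning by imitation: regress optimal thrust from state.
# Dimensions and data are placeholders, not the paper's setup.
import torch
import torch.nn as nn

policy = nn.Sequential(                      # state -> thrust vector
    nn.Linear(7, 128), nn.Tanh(),
    nn.Linear(128, 128), nn.Tanh(),
    nn.Linear(128, 3),
)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Placeholder dataset of (state, optimal thrust) pairs.
states = torch.randn(4096, 7)                # e.g. position, velocity, mass
thrusts = torch.randn(4096, 3)

for _ in range(10):                          # a few illustrative epochs
    pred = policy(states)
    loss = loss_fn(pred, thrusts)
    opt.zero_grad(); loss.backward(); opt.step()
```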
Pre-processing Image using Brightening, CLAHE and RETINEX
Title | Pre-processing Image using Brightening, CLAHE and RETINEX |
Authors | Thi Phuoc Hanh Nguyen, Zinan Cai, Khanh Nguyen, Sokuntheariddh Keth, Ningyuan Shen, Mira Park |
Abstract | This paper focuses on finding the optimal pre-processing method among three common image-enhancement algorithms: Brightening, CLAHE and Retinex. For the purpose of image training in general, these methods are combined to find the most effective pipeline for image enhancement. We carried out experiments on the different permutations of the three methods: Brightening, CLAHE and Retinex. The evaluation is based on Canny edge detection applied to all processed images; the sharpness of objects is then assessed by comparing the number of true-positive edge pixels between images. After applying the different combinations of pre-processing functions to the images, CLAHE proves to be the most effective for edge improvement, Brightening shows little effect on edge enhancement, and Retinex even reduces the sharpness of images and contributes little to image enhancement. |
Tasks | Edge Detection, Image Enhancement |
Published | 2020-03-22 |
URL | https://arxiv.org/abs/2003.10822v1 |
https://arxiv.org/pdf/2003.10822v1.pdf | |
PWC | https://paperswithcode.com/paper/pre-processing-image-using-brightening-clahe |
Repo | |
Framework | |
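One way to sketch the kind of pipeline compared in the paper is to chain the enhancement steps and count edge pixels after Canny. Parameter values and the single-scale Retinex variant below are assumptions, not the paper's exact settings.

```python
# Sketch of a pre-processing pipeline: brightening, CLAHE, a simple single-scale
# Retinex, then Canny edge detection; one permutation of the methods is shown.
import cv2
import numpy as np

img = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)   # path is a placeholder

def brighten(gray, alpha=1.2, beta=30):
    return cv2.convertScaleAbs(gray, alpha=alpha, beta=beta)

def clahe(gray):
    return cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8)).apply(gray)

def single_scale_retinex(gray, sigma=30):
    blurred = cv2.GaussianBlur(gray.astype(np.float32) + 1.0, (0, 0), sigma)
    r = np.log(gray.astype(np.float32) + 1.0) - np.log(blurred)
    return cv2.normalize(r, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

processed = clahe(brighten(img))                       # one permutation of the methods
edges = cv2.Canny(processed, 100, 200)
print(int((edges > 0).sum()), "edge pixels")
```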
No Surprises: Training Robust Lung Nodule Detection for Low-Dose CT Scans by Augmenting with Adversarial Attacks
Title | No Surprises: Training Robust Lung Nodule Detection for Low-Dose CT Scans by Augmenting with Adversarial Attacks |
Authors | Siqi Liu, Arnaud Arindra Adiyoso Setio, Florin C. Ghesu, Eli Gibson, Sasa Grbic, Bogdan Georgescu, Dorin Comaniciu |
Abstract | Detecting malignant pulmonary nodules at an early stage can allow medical interventions which increase the survival rate of lung cancer patients. Using computer vision techniques to detect nodules can improve the sensitivity and the speed of interpreting chest CT for lung cancer screening. Many studies have used CNNs to detect nodule candidates. Though such approaches have been shown to outperform conventional image-processing-based methods in detection accuracy, CNNs are also known to generalize poorly on under-represented samples in the training set and to be prone to imperceptible noise perturbations. Such limitations cannot be easily addressed by scaling up the dataset or the models. In this work, we propose to add adversarial synthetic nodules and adversarial attack samples to the training data to improve the generalization and the robustness of lung nodule detection systems. In order to generate hard examples of nodules from a differentiable nodule synthesizer, we use projected gradient descent (PGD) to search the latent code within a bounded neighbourhood that would generate nodules to decrease the detector response. To make the network more robust to unanticipated noise perturbations, we use PGD to search for noise patterns that can trigger the network to give over-confident mistakes. By evaluating on two different benchmark datasets containing consensus annotations from three radiologists, we show that the proposed techniques can improve the detection performance on real CT data. To understand the limitations of both the conventional networks and the proposed augmented networks, we also perform stress-tests on the false positive reduction networks by feeding different types of artificially produced patches. We show that the augmented networks are more robust to under-represented nodules as well as resistant to noise perturbations. |
Tasks | Adversarial Attack, Lung Nodule Detection |
Published | 2020-03-08 |
URL | https://arxiv.org/abs/2003.03824v1 |
https://arxiv.org/pdf/2003.03824v1.pdf | |
PWC | https://paperswithcode.com/paper/no-surprises-training-robust-lung-nodule |
Repo | |
Framework | |
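A generic PGD sketch of the kind used to craft the adversarial samples described above; the detector is a toy placeholder, and the step size, epsilon, and iteration count are illustrative.

```python
# Generic projected gradient descent (PGD) on an input perturbation: ascend the loss
# for the "nodule" label to suppress the detector response, then project onto the
# epsilon-ball. The detector is a stand-in network, not the paper's model.
import torch
import torch.nn as nn

detector = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 2))   # stand-in nodule detector
patch = torch.rand(1, 1, 32, 32)                                 # candidate CT patch
target = torch.tensor([1])                                        # "nodule" label

eps, alpha, steps = 0.03, 0.01, 10
delta = torch.zeros_like(patch, requires_grad=True)
for _ in range(steps):
    loss = nn.functional.cross_entropy(detector(patch + delta), target)
    loss.backward()
    with torch.no_grad():
        delta += alpha * delta.grad.sign()   # gradient ascent on the loss
        delta.clamp_(-eps, eps)              # projection onto the eps-ball
    delta.grad.zero_()
adversarial_patch = (patch + delta).clamp(0, 1).detach()
```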
Double Backpropagation for Training Autoencoders against Adversarial Attack
Title | Double Backpropagation for Training Autoencoders against Adversarial Attack |
Authors | Chengjin Sun, Sizhe Chen, Xiaolin Huang |
Abstract | Deep learning, as widely known, is vulnerable to adversarial samples. This paper focuses on adversarial attacks on autoencoders. The safety of autoencoders (AEs) is important because they are widely used as a compression scheme for data storage and transmission; however, current autoencoders are easily attacked, i.e., one can slightly modify an input yet obtain totally different codes. The vulnerability is rooted in the sensitivity of the autoencoders, and to enhance the robustness we propose to adopt double backpropagation (DBP) to secure autoencoders such as VAE and DRAW. We restrict the gradient from the reconstructed image to the original one so that the autoencoder is not sensitive to the trivial perturbations produced by an adversarial attack. After smoothing the gradient by DBP, we further smooth the label by a Gaussian Mixture Model (GMM), aiming for accurate and robust classification. We demonstrate on MNIST, CelebA and SVHN that our method leads to a robust autoencoder resistant to attack and, when combined with GMM, a robust classifier capable of image transition and immune to adversarial attack. |
Tasks | Adversarial Attack |
Published | 2020-03-04 |
URL | https://arxiv.org/abs/2003.01895v1 |
https://arxiv.org/pdf/2003.01895v1.pdf | |
PWC | https://paperswithcode.com/paper/double-backpropagation-for-training |
Repo | |
Framework | |
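A minimal sketch of the double backpropagation idea, assuming a toy autoencoder and an arbitrary penalty weight: the gradient of the reconstruction loss with respect to the input is itself penalized, which requires a second backward pass through that gradient.

```python
# Double backpropagation (DBP) sketch: penalize the input-gradient of the reconstruction
# loss so small input perturbations cannot change the reconstruction much.
# The autoencoder and the penalty weight are toy stand-ins.
import torch
import torch.nn as nn

ae = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 784))
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)

x = torch.rand(32, 784, requires_grad=True)
recon = ae(x)
rec_loss = ((recon - x) ** 2).mean()

# Second backprop: gradient of the reconstruction loss w.r.t. the input.
grad_x, = torch.autograd.grad(rec_loss, x, create_graph=True)
dbp_penalty = (grad_x ** 2).sum(dim=1).mean()

loss = rec_loss + 0.1 * dbp_penalty            # penalty weight is an assumption
opt.zero_grad(); loss.backward(); opt.step()
```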
Real-time, Universal, and Robust Adversarial Attacks Against Speaker Recognition Systems
Title | Real-time, Universal, and Robust Adversarial Attacks Against Speaker Recognition Systems |
Authors | Yi Xie, Cong Shi, Zhuohang Li, Jian Liu, Yingying Chen, Bo Yuan |
Abstract | As the popularity of voice user interfaces (VUIs) has exploded in recent years, speaker recognition systems have emerged as an important means of identifying a speaker in many security-critical applications and services. In this paper, we propose the first real-time, universal, and robust adversarial attack against state-of-the-art deep neural network (DNN) based speaker recognition systems. By adding an audio-agnostic universal perturbation to an arbitrary enrolled speaker’s voice input, the DNN-based speaker recognition system will identify the speaker as any target (i.e., adversary-desired) speaker label. In addition, we improve the robustness of our attack by modeling the sound distortions caused by physical over-the-air propagation through estimating the room impulse response (RIR). Experiments using a public dataset of 109 English speakers demonstrate the effectiveness and robustness of our proposed attack with a high attack success rate of over 90%. The attack launching time also achieves a 100X speedup over contemporary non-universal attacks. |
Tasks | Adversarial Attack, Speaker Recognition |
Published | 2020-03-04 |
URL | https://arxiv.org/abs/2003.02301v1 |
https://arxiv.org/pdf/2003.02301v1.pdf | |
PWC | https://paperswithcode.com/paper/real-time-universal-and-robust-adversarial |
Repo | |
Framework | |
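A heavily hedged sketch of the universal targeted perturbation idea: a single waveform-level perturbation is optimized over many utterances so that a stand-in speaker classifier outputs a chosen target label, with a toy room-impulse-response convolution standing in for over-the-air propagation. Neither the model, the data, nor the RIR here reflect the paper's actual pipeline.

```python
# Universal targeted perturbation sketch with a toy RIR convolution.
import torch
import torch.nn as nn
import torch.nn.functional as F

speaker_model = nn.Sequential(nn.Flatten(), nn.Linear(16000, 109))  # placeholder classifier
utterances = torch.randn(64, 1, 16000)                              # 1-second clips, assumed
target = torch.full((64,), 7, dtype=torch.long)                     # adversary-chosen speaker

rir = torch.rand(1, 1, 256) * torch.exp(-torch.arange(256.0) / 50)  # toy room impulse response
delta = torch.zeros(1, 1, 16000, requires_grad=True)                # one universal perturbation
opt = torch.optim.Adam([delta], lr=1e-3)

for _ in range(20):
    perturbed = utterances + delta
    aired = F.conv1d(perturbed, rir, padding=128)[..., :16000]      # simulate propagation
    loss = F.cross_entropy(speaker_model(aired), target)
    opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():
        delta.clamp_(-0.01, 0.01)                                    # keep the perturbation small
```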
Temporal Sparse Adversarial Attack on Gait Recognition
Title | Temporal Sparse Adversarial Attack on Gait Recognition |
Authors | Ziwen He, Wei Wang, Jing Dong, Tieniu Tan |
Abstract | Gait recognition has broad applications in social security due to its advantages in long-distance human identification. Despite the high accuracy of gait recognition systems, their adversarial robustness has not been explored. In this paper, we demonstrate that state-of-the-art gait recognition models are vulnerable to adversarial attacks. A novel temporal sparse adversarial attack under a newly defined distortion measurement is proposed. A GAN-based architecture is employed to semantically generate high-quality adversarial gait silhouettes. By sparsely substituting or inserting a few adversarial gait silhouettes, our proposed method can achieve a high attack success rate. The imperceptibility and the attack success rate of the adversarial examples are well balanced. Experimental results show that even when only one-fortieth of the frames are attacked, the attack success rate still reaches 76.8%. |
Tasks | Adversarial Attack, Gait Recognition |
Published | 2020-02-22 |
URL | https://arxiv.org/abs/2002.09674v1 |
https://arxiv.org/pdf/2002.09674v1.pdf | |
PWC | https://paperswithcode.com/paper/temporal-sparse-adversarial-attack-on-gait |
Repo | |
Framework | |
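A simplified sketch of the temporal-sparse idea, assuming a placeholder gait recognizer: only a small subset of frames is altered, via a masked additive perturbation that stands in for the paper's GAN-generated silhouette substitution.

```python
# Temporal-sparse perturbation sketch: only the chosen frames of the silhouette
# sequence are modified; the recognizer and frame indices are toy stand-ins.
import torch
import torch.nn as nn
import torch.nn.functional as F

gait_model = nn.Sequential(nn.Flatten(), nn.Linear(40 * 64 * 64, 50))  # placeholder recognizer
sequence = torch.rand(1, 40, 64, 64)                                   # 40 silhouette frames
label = torch.tensor([3])

attacked_frames = [5, 20, 35]                                          # sparse frame subset
mask = torch.zeros(1, 40, 1, 1)
mask[:, attacked_frames] = 1.0

delta = torch.zeros_like(sequence, requires_grad=True)
opt = torch.optim.Adam([delta], lr=0.05)

for _ in range(30):
    perturbed = (sequence + mask * delta).clamp(0, 1)                  # only chosen frames change
    loss = -F.cross_entropy(gait_model(perturbed), label)              # push away from true label
    opt.zero_grad(); loss.backward(); opt.step()
```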
Designing Fair AI for Managing Employees in Organizations: A Review, Critique, and Design Agenda
Title | Designing Fair AI for Managing Employees in Organizations: A Review, Critique, and Design Agenda |
Authors | Lionel P. Robert, Casey Pierce, Liz Morris, Sangmi Kim, Rasha Alahmad |
Abstract | Organizations are rapidly deploying artificial intelligence (AI) systems to manage their workers. However, AI has been found at times to be unfair to workers. Unfairness toward workers has been associated with decreased worker effort and increased worker turnover. To avoid such problems, AI systems must be designed to support fairness and redress instances of unfairness. Despite the attention related to AI unfairness, there has not been a theoretical and systematic approach to developing a design agenda. This paper addresses the issue in three ways. First, we introduce organizational justice theory, three different fairness types (distributive, procedural, interactional), and frameworks for redressing instances of unfairness (retributive justice, restorative justice). Second, we review the design literature that specifically focuses on issues of AI fairness in organizations. Third, we propose a design agenda for AI fairness in organizations that applies each of the fairness types to organizational scenarios. The paper concludes with implications for future research. |
Tasks | |
Published | 2020-02-20 |
URL | https://arxiv.org/abs/2002.09054v1 |
https://arxiv.org/pdf/2002.09054v1.pdf | |
PWC | https://paperswithcode.com/paper/designing-fair-ai-for-managing-employees-in |
Repo | |
Framework | |
An Elementary Approach to Convergence Guarantees of Optimization Algorithms for Deep Networks
Title | An Elementary Approach to Convergence Guarantees of Optimization Algorithms for Deep Networks |
Authors | Vincent Roulet, Zaid Harchaoui |
Abstract | We present an approach to obtain convergence guarantees of optimization algorithms for deep networks based on elementary arguments and computations. The convergence analysis revolves around the analytical and computational structures of optimization oracles central to the implementation of deep networks in machine learning software. We provide a systematic way to compute estimates of the smoothness constants that govern the convergence behavior of first-order optimization algorithms used to train deep networks. A diverse set of example components and architectures arising in modern deep networks intersperse the exposition to illustrate the approach. |
Tasks | |
Published | 2020-02-20 |
URL | https://arxiv.org/abs/2002.09051v1 |
https://arxiv.org/pdf/2002.09051v1.pdf | |
PWC | https://paperswithcode.com/paper/an-elementary-approach-to-convergence |
Repo | |
Framework | |
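A standard composition rule of the kind such smoothness estimates build on (stated here as a generic fact, not as the paper's result): if $g$ is $L_g$-Lipschitz with an $\ell_g$-Lipschitz Jacobian and $f$ is $L_f$-Lipschitz with an $\ell_f$-Lipschitz gradient, then the composition $h = f \circ g$ satisfies

```latex
% Generic Lipschitz/smoothness composition bound for h = f \circ g.
\[
  \|h(x) - h(y)\| \le L_f L_g \,\|x - y\|,
  \qquad
  \|\nabla h(x) - \nabla h(y)\| \le \bigl(\ell_f L_g^2 + L_f \ell_g\bigr)\,\|x - y\|.
\]
```

Applying such a rule layer by layer yields estimates of the overall smoothness constant of a deep network from per-layer Lipschitz and smoothness constants.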
Entity Profiling in Knowledge Graphs
Title | Entity Profiling in Knowledge Graphs |
Authors | Xiang Zhang, Qingqing Yang, Jinru Ding, Ziyue Wang |
Abstract | Knowledge Graphs (KGs) are graph-structured knowledge bases storing factual information about real-world entities. Understanding the uniqueness of each entity is crucial to the analysis, sharing, and reuse of KGs. Traditional profiling technologies encompass a vast array of methods to find distinctive features in various applications, which can help to differentiate entities in the process of human understanding of KGs. In this work, we present a novel profiling approach to identify distinctive entity features. The distinctiveness of features is carefully measured by a HAS model, a scalable representation learning model that produces a multi-pattern entity embedding. We fully evaluate the quality of entity profiles generated from real KGs. The results show that our approach facilitates human understanding of entities in KGs. |
Tasks | Knowledge Graphs, Representation Learning |
Published | 2020-02-29 |
URL | https://arxiv.org/abs/2003.00172v1 |
https://arxiv.org/pdf/2003.00172v1.pdf | |
PWC | https://paperswithcode.com/paper/entity-profiling-in-knowledge-graphs |
Repo | |
Framework | |
Learning Representations by Predicting Bags of Visual Words
Title | Learning Representations by Predicting Bags of Visual Words |
Authors | Spyros Gidaris, Andrei Bursuc, Nikos Komodakis, Patrick Pérez, Matthieu Cord |
Abstract | Self-supervised representation learning aims to learn convnet-based image representations from unlabeled data. Inspired by the success of NLP methods in this area, in this work we propose a self-supervised approach based on spatially dense image descriptions that encode discrete visual concepts, here called visual words. To build such discrete representations, we quantize the feature maps of a first pre-trained self-supervised convnet over a k-means based vocabulary. Then, as a self-supervised task, we train another convnet to predict the histogram of visual words of an image (i.e., its Bag-of-Words representation) given as input a perturbed version of that image. The proposed task forces the convnet to learn perturbation-invariant and context-aware image features, useful for downstream image understanding tasks. We extensively evaluate our method and demonstrate very strong empirical results, e.g., our pre-trained self-supervised representations transfer better on the detection task and similarly on classification over classes “unseen” during pre-training, compared to the supervised case. This also shows that the process of image discretization into visual words can provide the basis for very powerful self-supervised approaches in the image domain, thus allowing further connections to be made to related methods from the NLP domain that have been extremely successful so far. |
Tasks | Representation Learning |
Published | 2020-02-27 |
URL | https://arxiv.org/abs/2002.12247v1 |
https://arxiv.org/pdf/2002.12247v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-representations-by-predicting-bags |
Repo | |
Framework | |
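A minimal sketch of the Bag-of-Words prediction task: dense descriptors from a (stand-in) pre-trained convnet are quantized with k-means into visual words, each image's normalized word histogram becomes the target, and a second (stand-in) network predicts that histogram from a perturbed view. Models, vocabulary size, and the perturbation are toy assumptions.

```python
# Bag-of-visual-words self-supervision sketch: quantize feature maps with k-means,
# build per-image word histograms, and train a predictor on a perturbed view.
import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.cluster import KMeans

K = 16                                                 # vocabulary size (assumed)
feat_net = nn.Conv2d(3, 32, 3, padding=1)              # stand-in for the frozen pre-trained convnet
pred_net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, K))  # stand-in BoW predictor

images = torch.rand(8, 3, 32, 32)
with torch.no_grad():
    fmap = feat_net(images)                            # (B, C, H, W) dense descriptors
    descriptors = fmap.permute(0, 2, 3, 1).reshape(-1, 32).numpy()
vocab = KMeans(n_clusters=K, n_init=4).fit(descriptors)

words = torch.from_numpy(vocab.labels_).long().view(8, -1)   # visual word per spatial location
bow = torch.stack([torch.bincount(w, minlength=K).float() for w in words])
bow = bow / bow.sum(dim=1, keepdim=True)               # normalized Bag-of-Words target

perturbed = images + 0.05 * torch.randn_like(images)   # crude stand-in for augmentation
log_probs = F.log_softmax(pred_net(perturbed), dim=1)
loss = -(bow * log_probs).sum(dim=1).mean()            # cross-entropy against the soft histogram
loss.backward()
```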
A Road Map to Strong Intelligence
Title | A Road Map to Strong Intelligence |
Authors | Philip Paquette |
Abstract | I wrote this paper because technology can really improve people’s lives. With it, we can live longer in a healthy body, save time through increased efficiency and automation, and make better decisions. To get to the next level, we need to start looking at intelligence from a much broader perspective, and promote international interdisciplinary collaborations. Section 1 of this paper delves into sociology and social psychology to explain that the mechanisms underlying intelligence are inherently social. Section 2 proposes a method to classify intelligence, and describes the differences between weak and strong intelligence. Section 3 examines the Chinese Room argument from a different perspective. It demonstrates that a Turing-complete machine cannot have strong intelligence, and considers the modifications necessary for a computer to be intelligent and have understanding. Section 4 argues that the existential risk caused by the technological explosion of a single agent should not be of serious concern. Section 5 looks at the AI control problem and argues that it is impossible to build a super-intelligent machine that will do what its creators want. By using insights from biology, it also proposes a solution to the control problem. Section 6 discusses some of the implications of strong intelligence. Section 7 lists the main challenges with deep learning, and asserts that radical changes will be required to reach strong intelligence. Section 8 examines a neuroscience framework that could help explain how a cortical column works. Section 9 lays out the broad strokes of a road map towards strong intelligence. Finally, section 10 analyzes the impacts and the challenges of greater intelligence. |
Tasks | |
Published | 2020-02-20 |
URL | https://arxiv.org/abs/2002.09044v1 |
https://arxiv.org/pdf/2002.09044v1.pdf | |
PWC | https://paperswithcode.com/paper/a-road-map-to-strong-intelligence |
Repo | |
Framework | |
Questioning the AI: Informing Design Practices for Explainable AI User Experiences
Title | Questioning the AI: Informing Design Practices for Explainable AI User Experiences |
Authors | Q. Vera Liao, Daniel Gruen, Sarah Miller |
Abstract | A surge of interest in explainable AI (XAI) has led to a vast collection of algorithmic work on the topic. While many recognize the necessity to incorporate explainability features in AI systems, how to address real-world user needs for understanding AI remains an open question. By interviewing 20 UX and design practitioners working on various AI products, we seek to identify gaps between the current XAI algorithmic work and practices to create explainable AI products. To do so, we develop an algorithm-informed XAI question bank in which user needs for explainability are represented as prototypical questions users might ask about the AI, and use it as a study probe. Our work contributes insights into the design space of XAI, informs efforts to support design practices in this space, and identifies opportunities for future XAI work. We also provide an extended XAI question bank and discuss how it can be used for creating user-centered XAI. |
Tasks | |
Published | 2020-01-08 |
URL | https://arxiv.org/abs/2001.02478v2 |
https://arxiv.org/pdf/2001.02478v2.pdf | |
PWC | https://paperswithcode.com/paper/questioning-the-ai-informing-design-practices |
Repo | |
Framework | |