February 1, 2020

Paper Group AWR 316

Unsupervised Speech Domain Adaptation Based on Disentangled Representation Learning for Robust Speech Recognition

Title Unsupervised Speech Domain Adaptation Based on Disentangled Representation Learning for Robust Speech Recognition
Authors Jong-Hyeon Park, Myungwoo Oh, Hyung-Min Park
Abstract In general, the performance of automatic speech recognition (ASR) systems degrades significantly when training and test environments are mismatched. Recently, a deep-learning-based image-to-image translation technique for translating an image from a source domain to a desired domain was presented, and the cycle-consistent adversarial network (CycleGAN) was applied to learn a mapping for speech-to-speech conversion from a source speaker to a target speaker. However, this method may not be adequate for removing corrupting noise components for robust ASR because it was designed to convert speech itself. In this paper, we propose a domain adaptation method based on generative adversarial nets (GANs) with disentangled representation learning to achieve robustness in ASR systems. In particular, two separate encoders, a context encoder and a domain encoder, are introduced to learn distinct latent variables. The latent variables allow us to convert the domain of speech according to its context and domain representation. We improved word accuracies by 6.55–15.70% on the CHiME4 challenge corpus by applying a noisy-to-clean environment adaptation for robust ASR. In addition, like the CycleGAN-based method, this method can be used for gender adaptation in gender-mismatched recognition.
Tasks Domain Adaptation, Image-to-Image Translation, Representation Learning, Robust Speech Recognition, Speech Recognition
Published 2019-04-12
URL http://arxiv.org/abs/1904.06086v1
PDF http://arxiv.org/pdf/1904.06086v1.pdf
PWC https://paperswithcode.com/paper/unsupervised-speech-domain-adaptation-based
Repo https://github.com/vivivic/speech-domain-adaptation-DRL
Framework pytorch
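
A minimal PyTorch sketch of the two-encoder idea described in the abstract: a context encoder keeps linguistic content while a domain encoder captures the acoustic environment, so a noisy utterance can be re-decoded with a clean domain code. The module names, feature dimension, and network sizes are illustrative assumptions, not the authors' implementation, and the adversarial and reconstruction losses that keep the two codes disentangled are omitted.

```python
import torch
import torch.nn as nn

class TwoEncoderDA(nn.Module):
    def __init__(self, feat_dim=80, ctx_dim=64, dom_dim=16):
        super().__init__()
        # Context encoder: linguistic content of the utterance.
        self.enc_ctx = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(),
                                     nn.Linear(256, ctx_dim))
        # Domain encoder: environment (noise / channel) information.
        self.enc_dom = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(),
                                     nn.Linear(256, dom_dim))
        # Decoder reconstructs features from the two codes.
        self.dec = nn.Sequential(nn.Linear(ctx_dim + dom_dim, 256), nn.ReLU(),
                                 nn.Linear(256, feat_dim))

    def convert(self, x_noisy, x_clean_ref):
        # Keep the noisy utterance's content, swap in a clean domain code.
        c = self.enc_ctx(x_noisy)
        d = self.enc_dom(x_clean_ref)
        return self.dec(torch.cat([c, d], dim=-1))
```

At adaptation time, `convert(x_noisy, x_clean_ref)` would produce clean-domain features to feed the ASR system.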

Learning Generalizable Representations via Diverse Supervision

Title Learning Generalizable Representations via Diverse Supervision
Authors Ziqi Pang, Zhiyuan Hu, Pavel Tokmakov, Yu-Xiong Wang, Martial Hebert
Abstract The problem of rare category recognition has received a lot of attention recently, with state-of-the-art methods achieving significant improvements. However, we identify two major limitations in the existing literature. First, the benchmarks are constructed by randomly splitting the categories of artificially balanced datasets into frequent (head) and rare (tail) subsets, which results in unrealistic category distributions in both. Second, the idea of using external sources of supervision to learn generalizable representations is largely overlooked. In this work, we attempt to address both of these shortcomings by introducing the ADE-FewShot benchmark. It builds on the ADE dataset for scene parsing, which features a realistic, long-tail distribution of categories as well as a diverse set of annotations. We turn it into a realistic few-shot classification benchmark by splitting the object categories into head and tail based on their distribution in the world. We then analyze the effect of applying various supervision sources on representation learning for rare category recognition, and observe significant improvements.
Tasks Representation Learning, Scene Parsing
Published 2019-11-29
URL https://arxiv.org/abs/1911.12911v1
PDF https://arxiv.org/pdf/1911.12911v1.pdf
PWC https://paperswithcode.com/paper/learning-generalizable-representations-via
Repo https://github.com/BinahHu/ADE-FewShot
Framework pytorch

Exploring large scale public medical image datasets

Title Exploring large scale public medical image datasets
Authors Luke Oakden-Rayner
Abstract Rationale and Objectives: Medical artificial intelligence systems are dependent on well-characterised, large-scale datasets. Recently released public datasets have been of great interest to the field, but pose specific challenges due to the disconnect they cause between data generation and data usage, potentially limiting the utility of these datasets. Materials and Methods: We visually explore two large public datasets to determine how accurate the provided labels are and whether other subtle problems exist. The ChestXray14 dataset contains 112,120 frontal chest films, and the MURA dataset contains 40,561 upper limb radiographs. A subset of around 700 images from both datasets was reviewed by a board-certified radiologist, and the quality of the original labels was determined. Results: The ChestXray14 labels did not accurately reflect the visual content of the images, with positive predictive values mostly between 10% and 30% lower than the values presented in the original documentation. There were other significant problems, with examples of hidden stratification and label disambiguation failure. The MURA labels were more accurate, but the original normal/abnormal labels were inaccurate for the subset of cases with degenerative joint disease, with a sensitivity of 60% and a specificity of 82%. Conclusion: Visual inspection of images is a necessary component of understanding large image datasets. We recommend that teams producing public datasets perform this important quality control procedure and include a thorough description of their findings, along with an explanation of the data generating procedures and labelling rules, in the documentation for their datasets.
Tasks
Published 2019-07-30
URL https://arxiv.org/abs/1907.12720v1
PDF https://arxiv.org/pdf/1907.12720v1.pdf
PWC https://paperswithcode.com/paper/exploring-large-scale-public-medical-image
Repo https://github.com/pg2455/AudioAge
Framework none

Graph Construction from Data using Non Negative Kernel regression (NNK Graphs)

Title Graph Construction from Data using Non Negative Kernel regression (NNK Graphs)
Authors Sarath Shekkizhar, Antonio Ortega
Abstract Data-driven graph constructions are often used in various applications, including several machine learning tasks, where the goal is to make predictions and discover patterns. However, learning an optimal graph from data is still a challenging task. Weighted $K$-nearest neighbor and $\epsilon$-neighborhood methods are among the most common graph construction methods due to their computational simplicity, but the choice of parameters such as $K$ and $\epsilon$ associated with these methods is often ad hoc and lacks a clear interpretation. We formulate graph construction as the problem of finding a sparse signal approximation in kernel space, identifying key similarities between methods in signal approximation and existing graph learning methods. We propose non-negative kernel regression (NNK), an improved approach for graph construction with interesting geometric and theoretical properties. We show experimentally the efficiency of NNK graphs, their robustness to the choice of sparsity $K$, and their better performance over state-of-the-art graph methods in semi-supervised learning tasks on real-world data.
Tasks Graph Construction
Published 2019-10-21
URL https://arxiv.org/abs/1910.09383v1
PDF https://arxiv.org/pdf/1910.09383v1.pdf
PWC https://paperswithcode.com/paper/graph-construction-from-data-using-non
Repo https://github.com/STAC-USC/NNK_graph_construction
Framework none
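
A hedged NumPy/SciPy sketch of the NNK construction: each node's edge weights come from a non-negative least-squares fit over its K nearest neighbors in kernel space, and neighbors zeroed out by the non-negativity constraint are effectively pruned. The Gaussian kernel and the values of K and sigma are illustrative choices, not the paper's settings.

```python
import numpy as np
from scipy.optimize import nnls

def nnk_graph(X, K=10, sigma=1.0):
    n = X.shape[0]
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # squared distances
    Kmat = np.exp(-D2 / (2 * sigma ** 2))                # Gaussian kernel
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(D2[i])[1:K + 1]      # K nearest, excluding i itself
        Kss = Kmat[np.ix_(nbrs, nbrs)]
        ksi = Kmat[nbrs, i]
        L = np.linalg.cholesky(Kss + 1e-8 * np.eye(K))
        # Solve min_{theta >= 0} theta^T Kss theta - 2 ksi^T theta
        theta, _ = nnls(L.T, np.linalg.solve(L, ksi))
        W[i, nbrs] = theta                     # sparse, non-negative row
    return np.maximum(W, W.T)                  # symmetrize
```

The non-negativity constraint is what drives geometrically redundant neighbors to exactly zero weight, which is the source of the robustness to the initial choice of K claimed in the abstract.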

Adversarial Learning of Disentangled and Generalizable Representations for Visual Attributes

Title Adversarial Learning of Disentangled and Generalizable Representations for Visual Attributes
Authors James Oldfield, Yannis Panagakis, Mihalis A. Nicolaou
Abstract Recently, a multitude of methods for image-to-image translation have demonstrated impressive results on problems such as multi-domain or multi-attribute transfer. The vast majority of such works leverage the strengths of adversarial learning in tandem with deep convolutional autoencoders to achieve realistic results by capturing the target data distribution well. Nevertheless, the most prominent representatives of this class of methods do not facilitate semantic structure in the latent space, and usually rely on domain labels for test-time transfer. This leads to rigid models that are unable to capture the variance of each domain label. In this light, we propose a novel adversarial learning method that (i) facilitates latent structure by disentangling sources of variation based on a novel cost function and (ii) encourages learning generalizable, continuous and transferable latent codes that can be utilized for tasks such as unpaired multi-domain image transfer and synthesis, without requiring labelled test data. The resulting representations can be combined in arbitrary ways to generate novel hybrid imagery, for example mixtures of identities. We demonstrate the merits of the proposed method through a set of qualitative and quantitative experiments on popular databases, where our method clearly outperforms other state-of-the-art methods. Code for reproducing our results can be found at: https://github.com/james-oldfield/adv-attribute-disentanglement
Tasks Image-to-Image Translation
Published 2019-04-09
URL http://arxiv.org/abs/1904.04772v2
PDF http://arxiv.org/pdf/1904.04772v2.pdf
PWC https://paperswithcode.com/paper/adversarial-learning-of-disentangled-and
Repo https://github.com/james-oldfield/adv-attribute-disentanglement
Framework tf

NTU RGB+D 120: A Large-Scale Benchmark for 3D Human Activity Understanding

Title NTU RGB+D 120: A Large-Scale Benchmark for 3D Human Activity Understanding
Authors Jun Liu, Amir Shahroudy, Mauricio Perez, Gang Wang, Ling-Yu Duan, Alex C. Kot
Abstract Research on depth-based human activity analysis has achieved outstanding performance and demonstrated the effectiveness of 3D representation for action recognition. The existing depth-based and RGB+D-based action recognition benchmarks have a number of limitations: they lack large-scale training samples, a realistic number of distinct class categories, diversity in camera views, varied environmental conditions, and variety in human subjects. In this work, we introduce a large-scale dataset for RGB+D human action recognition, which is collected from 106 distinct subjects and contains more than 114 thousand video samples and 8 million frames. This dataset contains 120 different action classes including daily, mutual, and health-related activities. We evaluate the performance of a series of existing 3D activity analysis methods on this dataset, and show the advantage of applying deep learning methods for 3D-based human action recognition. Furthermore, we investigate a novel one-shot 3D activity recognition problem on our dataset, and a simple yet effective Action-Part Semantic Relevance-aware (APSR) framework is proposed for this task, which yields promising results for recognition of the novel action classes. We believe the introduction of this large-scale dataset will enable the community to apply, adapt, and develop various data-hungry learning techniques for depth-based and RGB+D-based human activity understanding. [The dataset is available at: http://rose1.ntu.edu.sg/Datasets/actionRecognition.asp]
Tasks Activity Recognition, One-Shot 3D Action Recognition, Temporal Action Localization
Published 2019-05-12
URL https://arxiv.org/abs/1905.04757v2
PDF https://arxiv.org/pdf/1905.04757v2.pdf
PWC https://paperswithcode.com/paper/ntu-rgbd-120-a-large-scale-benchmark-for-3d
Repo https://github.com/shahroudy/NTURGB-D
Framework none

Computing Linear Restrictions of Neural Networks

Title Computing Linear Restrictions of Neural Networks
Authors Matthew Sotoudeh, Aditya V. Thakur
Abstract A linear restriction of a function is the same function with its domain restricted to points on a given line. This paper addresses the problem of computing a succinct representation for a linear restriction of a piecewise-linear neural network. This primitive, which we call ExactLine, allows us to exactly characterize the result of applying the network to all of the infinitely many points on a line. In particular, ExactLine computes a partitioning of the given input line segment such that the network is affine on each partition. We present an efficient algorithm for computing ExactLine for networks that use ReLU, MaxPool, batch normalization, fully-connected, convolutional, and other layers, along with several applications. First, we show how to exactly determine decision boundaries of an ACAS Xu neural network, providing significantly improved confidence in the results compared to prior work that sampled finitely many points in the input space. Next, we demonstrate how to exactly compute integrated gradients, which are commonly used for neural network attributions, allowing us to show that the prior heuristic-based methods had relative errors of 25-45% and show that a better sampling method can achieve higher accuracy with less computation. Finally, we use ExactLine to empirically falsify the core assumption behind a well-known hypothesis about adversarial examples, and in the process identify interesting properties of adversarially-trained networks.
Tasks
Published 2019-08-17
URL https://arxiv.org/abs/1908.06214v2
PDF https://arxiv.org/pdf/1908.06214v2.pdf
PWC https://paperswithcode.com/paper/computing-linear-restrictions-of-neural
Repo https://github.com/95616ARG/SyReNN
Framework pytorch
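
A simplified sketch of the ExactLine primitive, restricted to fully-connected ReLU layers (the paper also handles MaxPool, batch normalization, convolutional, and other layers): it returns breakpoints t in [0, 1] such that the network is affine on x(t) = (1-t)a + tb between consecutive breakpoints. This illustrates the idea, not the authors' SyReNN implementation.

```python
import numpy as np

def preact(weights, biases, layer, x):
    # Pre-activation of `layer` (0-indexed) at input x for a ReLU MLP.
    h = x
    for l, (W, b) in enumerate(zip(weights, biases)):
        z = W @ h + b
        if l == layer:
            return z
        h = np.maximum(z, 0.0)

def exactline(weights, biases, a, b):
    # Partition x(t) = (1 - t) * a + t * b so the network is affine
    # between consecutive breakpoints.
    ts = [0.0, 1.0]
    for layer in range(len(weights) - 1):  # a ReLU follows each hidden layer
        # Within each current piece, this layer's pre-activation is
        # linear in t, so we only need its values at the piece endpoints.
        pts = {t: preact(weights, biases, layer, (1 - t) * a + t * b)
               for t in ts}
        new = set(ts)
        for lo, hi in zip(ts[:-1], ts[1:]):
            zlo, zhi = pts[lo], pts[hi]
            for j in range(len(zlo)):
                if zlo[j] * zhi[j] < 0:    # unit j flips inside the piece
                    new.add(lo + (hi - lo) * (-zlo[j]) / (zhi[j] - zlo[j]))
        ts = sorted(new)
    return ts
```

Between any two returned breakpoints, evaluating the network at the two endpoints fully determines it on the whole sub-segment, which is what enables exact decision boundaries and exact integrated gradients.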

Mass Estimation from Images using Deep Neural Network and Sparse Ground Truth

Title Mass Estimation from Images using Deep Neural Network and Sparse Ground Truth
Authors Muhammad K A Hamdan, Daine T. Rover, Matthew J. Darr, John Just
Abstract Supervised learning is the workhorse for regression and classification tasks, but the standard approach presumes ground truth for every measurement. In real-world applications, limitations due to expense, or outright infeasibility in the specific application, are common. In agriculture, yield monitoring is one such example: simple physics-based measurements such as volume or force-impact have been used to quantify mass flow, and these incur error due to sensor calibration. By utilizing semi-supervised deep learning with gradient aggregation and a sequence of images, in this work we accurately estimate a physical quantity (mass) from complex data structures and sparse ground truth. Using a vision system capturing images of a sugarcane elevator, with bamboo run under controlled testing as a surrogate for harvested sugarcane, mass is accurately predicted from images by training a DNN using only final load weights. The DNN succeeds in internally capturing the complex density physics of randomly stacked slender rods as part of the mass prediction model, and surpasses older volumetric-based methods for mass prediction. Furthermore, by incorporating knowledge about the system physics through the DNN architecture and penalty terms, improvements in prediction accuracy and stability, as well as faster learning, are obtained. It is shown that the classic nonlinear regression optimization can be reformulated with an aggregation term, under some independence assumptions, to achieve this feat. Since the number of images in any given run is too large to fit in typical GPU vRAM, an implementation is shown that compensates for the limited memory while still achieving fast training times. The approach presented herein could be applied to other applications such as yield monitoring on grain combines or other harvesters using vision or other instrumentation.
Tasks Calibration
Published 2019-08-05
URL https://arxiv.org/abs/1908.04387v3
PDF https://arxiv.org/pdf/1908.04387v3.pdf
PWC https://paperswithcode.com/paper/mass-estimation-from-images-using-deep-neural
Repo https://github.com/mhamdan91/Mass_Flow_Estimation
Framework tf
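
A minimal sketch of the aggregation idea from the abstract: per-frame mass predictions are summed over a run and supervised only by the run's final load weight. The tiny CNN and its shapes are illustrative assumptions; the physics-informed penalty terms and the memory-saving gradient aggregation are omitted.

```python
import torch
import torch.nn as nn

class FrameMassNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Sequential(nn.Linear(16, 1), nn.Softplus())  # mass >= 0

    def forward(self, frames):               # frames: (T, 3, H, W)
        return self.head(self.features(frames)).squeeze(-1)  # (T,) masses

def run_loss(model, frames, final_weight):
    # Only the aggregate over the whole run is labelled: the sum of
    # per-frame predictions must match the final load weight.
    return (model(frames).sum() - final_weight) ** 2
```

In practice a run has far too many frames for one forward pass, which is why the paper accumulates gradients over chunks rather than materializing the whole sequence at once.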

Bias Disparity in Collaborative Recommendation: Algorithmic Evaluation and Comparison

Title Bias Disparity in Collaborative Recommendation: Algorithmic Evaluation and Comparison
Authors Masoud Mansoury, Bamshad Mobasher, Robin Burke, Mykola Pechenizkiy
Abstract Research on fairness in machine learning has been recently extended to recommender systems. One of the factors that may impact fairness is bias disparity, the degree to which a group’s preferences on various item categories fail to be reflected in the recommendations they receive. In some cases biases in the original data may be amplified or reversed by the underlying recommendation algorithm. In this paper, we explore how different recommendation algorithms reflect the tradeoff between ranking quality and bias disparity. Our experiments include neighborhood-based, model-based, and trust-aware recommendation algorithms.
Tasks Recommendation Systems
Published 2019-08-02
URL https://arxiv.org/abs/1908.00831v1
PDF https://arxiv.org/pdf/1908.00831v1.pdf
PWC https://paperswithcode.com/paper/bias-disparity-in-collaborative
Repo https://github.com/masoudmansoury/yelp_core40
Framework none
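
A hedged sketch of the bias disparity measure the paper builds on: how much a group's preference share for an item category shifts between the input data and the recommendations the group receives. The function signatures are illustrative.

```python
def preference_ratio(interactions, group, categories, cat):
    # Fraction of the group's interactions that fall in category `cat`.
    # `interactions`: list of (user, item) pairs; `categories`: item -> category.
    g = [(u, i) for u, i in interactions if u in group]
    return sum(categories[i] == cat for _, i in g) / max(len(g), 1)

def bias_disparity(train, recs, group, categories, cat):
    ps = preference_ratio(train, group, categories, cat)  # input bias
    pr = preference_ratio(recs, group, categories, cat)   # recommendation bias
    return (pr - ps) / ps if ps > 0 else float("nan")
```

A positive value means the recommender amplifies the group's input bias toward the category; a negative value means it suppresses or reverses it.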

The Unfairness of Popularity Bias in Recommendation

Title The Unfairness of Popularity Bias in Recommendation
Authors Himan Abdollahpouri, Masoud Mansoury, Robin Burke, Bamshad Mobasher
Abstract Recommender systems are known to suffer from the popularity bias problem: popular (i.e. frequently rated) items get a lot of exposure while less popular ones are under-represented in the recommendations. Research in this area has mainly focused on finding ways to tackle this issue by increasing the number of recommended long-tail items or otherwise the overall catalog coverage. In this paper, however, we look at this problem from the users’ perspective: we want to see how popularity bias causes the recommendations to deviate from what the user expects to get from the recommender system. We define three different groups of users according to their interest in popular items (Niche, Diverse and Blockbuster-focused) and show the impact of popularity bias on the users in each group. Our experimental results on a movie dataset show that in many recommendation algorithms the recommendations the users get are extremely concentrated on popular items even if a user is interested in long-tail and non-popular items, revealing an extreme bias disparity.
Tasks Recommendation Systems
Published 2019-07-31
URL https://arxiv.org/abs/1907.13286v3
PDF https://arxiv.org/pdf/1907.13286v3.pdf
PWC https://paperswithcode.com/paper/the-unfairness-of-popularity-bias-in
Repo https://github.com/domkowald/LFM1b-analyses
Framework none
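
A sketch of the user grouping described in the abstract: users are split by the share of popular items in their profiles. The top-20% popularity cutoff and the group thresholds are illustrative assumptions, not necessarily the paper's values.

```python
import numpy as np

def group_users(profiles, item_counts, top_frac=0.2):
    # profiles: user -> list of rated items; item_counts: item -> #ratings.
    items = sorted(item_counts, key=item_counts.get, reverse=True)
    popular = set(items[:int(len(items) * top_frac)])
    groups = {"niche": [], "diverse": [], "blockbuster": []}
    for user, items_u in profiles.items():
        share = np.mean([i in popular for i in items_u])
        if share < 0.2:
            groups["niche"].append(user)       # mostly long-tail tastes
        elif share > 0.8:
            groups["blockbuster"].append(user)  # mostly popular items
        else:
            groups["diverse"].append(user)
    return groups
```

Comparing the popularity profile of each group's recommendations against its input profile then exposes the concentration effect the abstract reports.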

Dynamic-Weighted Simplex Strategy for Learning Enabled Cyber Physical Systems

Title Dynamic-Weighted Simplex Strategy for Learning Enabled Cyber Physical Systems
Authors Shreyas Ramakrishna, Charles Hartsell, Matthew P Burruss, Gabor Karsai, Abhishek Dubey
Abstract Cyber Physical Systems (CPS) have increasingly started using Learning Enabled Components (LECs) for performing perception-based control tasks. The simple design approach, and their capability to continuously learn, has led to their widespread use in different autonomous applications. Despite their simplicity and impressive capabilities, these models are difficult to assure, which makes their use challenging. The problem of assuring CPS with untrusted controllers has been addressed using the Simplex Architecture. This architecture integrates the system to be assured with a safe controller and provides a decision logic to switch between the decisions of these controllers. However, the key challenges in using the Simplex Architecture are: (1) designing an effective decision logic, and (2) sudden transitions between controller decisions lead to inconsistent system performance. To address these research challenges, we make three key contributions: (1) dynamic-weighted simplex strategy – we introduce the “weighted simplex strategy” as the weighted ensemble extension of the classical Simplex Architecture, and provide a reinforcement-learning-based mechanism to find dynamic ensemble weights; (2) middleware framework – we design a framework that allows the use of the dynamic-weighted simplex strategy and provides a resource manager to monitor the computational resources; and (3) hardware testbed – we design a remote-controlled car testbed called DeepNNCar to test and demonstrate the aforementioned key concepts. Using the hardware, we show that the dynamic-weighted simplex strategy has 60% fewer out-of-track occurrences (soft constraint violations), while demonstrating a higher optimized speed (performance) of 0.4 m/s during indoor driving than the original LEC-driven system.
Tasks Autonomous Driving, Q-Learning
Published 2019-02-06
URL https://arxiv.org/abs/1902.02432v3
PDF https://arxiv.org/pdf/1902.02432v3.pdf
PWC https://paperswithcode.com/paper/augmenting-learning-components-for-safety-in
Repo https://github.com/vu-resilient-distributed-systems/lectures-fall-2019
Framework none
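
A minimal sketch of the dynamic-weighted simplex idea: instead of a hard switch, the learned (LEC) and safe controllers are blended with a weight chosen at run time, here by a tabular Q-learning stub over discretized states. All names and hyperparameters are illustrative, not the DeepNNCar implementation.

```python
import numpy as np

WEIGHTS = np.linspace(0.0, 1.0, 5)   # discrete arbitration weights

def blended_action(w, a_lec, a_safe):
    # Weighted ensemble instead of the classical hard simplex switch.
    return w * a_lec + (1.0 - w) * a_safe

def q_update(Q, state, widx, reward, next_state, alpha=0.1, gamma=0.9):
    # Q-table over (discretized state, weight index). The reward would
    # trade off speed (performance) against out-of-track events (safety).
    target = reward + gamma * Q[next_state].max()
    Q[state, widx] += alpha * (target - Q[state, widx])
    return Q
```

The smooth blend avoids the abrupt controller hand-offs that the abstract identifies as a source of inconsistent system performance.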

Traffic Sign Detection under Challenging Conditions: A Deeper Look Into Performance Variations and Spectral Characteristics

Title Traffic Sign Detection under Challenging Conditions: A Deeper Look Into Performance Variations and Spectral Characteristics
Authors Dogancan Temel, Min-Hung Chen, Ghassan AlRegib
Abstract Traffic signs are critical for maintaining the safety and efficiency of our roads. Therefore, we need to carefully assess the capabilities and limitations of automated traffic sign detection systems. Existing traffic sign datasets are limited in terms of type and severity of challenging conditions. Metadata corresponding to these conditions are unavailable and it is not possible to investigate the effect of a single factor because of simultaneous changes in numerous conditions. To overcome the shortcomings in existing datasets, we introduced the CURE-TSD-Real dataset, which is based on simulated challenging conditions that correspond to adversaries that can occur in real-world environments and systems. We test the performance of two benchmark algorithms and show that severe conditions can result in an average performance degradation of 29% in precision and 68% in recall. We investigate the effect of challenging conditions through spectral analysis and show that challenging conditions can lead to distinct magnitude spectrum characteristics. Moreover, we show that mean magnitude spectrum of changes in video sequences under challenging conditions can be an indicator of detection performance. CURE-TSD-Real dataset is available online at https://github.com/olivesgatech/CURE-TSD.
Tasks Traffic Sign Recognition
Published 2019-08-29
URL https://arxiv.org/abs/1908.11262v1
PDF https://arxiv.org/pdf/1908.11262v1.pdf
PWC https://paperswithcode.com/paper/traffic-sign-detection-under-challenging
Repo https://github.com/olivesgatech/CURE-TSR
Framework pytorch
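
A hedged NumPy sketch of the spectral indicator described in the abstract: the mean 2-D magnitude spectrum of the per-frame change that a challenging condition introduces relative to the clean sequence. Frame format and preprocessing are illustrative assumptions.

```python
import numpy as np

def mean_change_spectrum(clean_frames, challenge_frames):
    # Both inputs: arrays of shape (T, H, W), grayscale frames from the
    # clean sequence and its challenge-degraded counterpart.
    spectra = []
    for c, ch in zip(clean_frames, challenge_frames):
        diff = ch.astype(np.float64) - c.astype(np.float64)
        spectra.append(np.abs(np.fft.fftshift(np.fft.fft2(diff))))
    return np.mean(spectra, axis=0)
```

Comparing these mean spectra across challenge types and severity levels is what lets the magnitude spectrum of the change serve as an indicator of detection performance.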

Lightweight Monocular Depth Estimation Model by Joint End-to-End Filter pruning

Title Lightweight Monocular Depth Estimation Model by Joint End-to-End Filter pruning
Authors Sara Elkerdawy, Hong Zhang, Nilanjan Ray
Abstract Convolutional neural networks (CNNs) have emerged as the state of the art in multiple vision tasks including depth estimation. However, memory and computing power requirements remain challenges to be tackled in these models. Monocular depth estimation has significant uses in robotics and virtual reality that require deployment on low-end devices. Training a small model from scratch results in a significant drop in accuracy, and it does not benefit from pre-trained large models. Motivated by the literature on model pruning, we propose a lightweight monocular depth model obtained from a large trained model. This is achieved by removing the least important features with a novel joint end-to-end filter pruning. We propose to learn a binary mask for each filter to decide whether to drop the filter or not. These masks are trained jointly to exploit relations between filters at different layers as well as redundancy within the same layer. We show that we can achieve a compression rate of around 5x with a small drop in accuracy on the KITTI driving dataset. We also show that masking can improve accuracy over the baseline with fewer parameters, even without enforcing compression loss.
Tasks Depth Estimation, Monocular Depth Estimation
Published 2019-05-13
URL https://arxiv.org/abs/1905.05212v1
PDF https://arxiv.org/pdf/1905.05212v1.pdf
PWC https://paperswithcode.com/paper/lightweight-monocular-depth-estimation-model
Repo https://github.com/selkerdawy/joint-pruning-monodepth
Framework tf
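
A minimal PyTorch sketch of the joint mask learning described in the abstract: each filter gets a trainable score that is binarized in the forward pass with a straight-through gradient, and a compression penalty pushes scores toward zero. Layer sizes and the penalty weight are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MaskedConv(nn.Module):
    def __init__(self, cin, cout, k=3):
        super().__init__()
        self.conv = nn.Conv2d(cin, cout, k, padding=k // 2)
        self.score = nn.Parameter(torch.zeros(cout))  # one score per filter

    def forward(self, x):
        soft = torch.sigmoid(self.score)
        hard = (soft > 0.5).float()
        mask = hard + soft - soft.detach()   # straight-through estimator
        return self.conv(x) * mask.view(1, -1, 1, 1)

def compression_loss(model, lam=1e-3):
    # Encourages dropping filters; trained jointly with the depth loss so
    # masks across layers can exploit inter-layer redundancy.
    return lam * sum(torch.sigmoid(m.score).sum()
                     for m in model.modules() if isinstance(m, MaskedConv))
```

After training, filters whose mask is zero can be physically removed, yielding the smaller deployed model.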

Typed Graph Networks

Title Typed Graph Networks
Authors Marcelo O. R. Prates, Pedro H. C. Avelar, Henrique Lemos, Marco Gori, Luis Lamb
Abstract Recently, the deep learning community has given growing attention to neural architectures engineered to learn problems in relational domains. Convolutional Neural Networks employ parameter sharing over the image domain, tying the weights of neural connections on a grid topology and thus enforcing the learning of a number of convolutional kernels. By instantiating trainable neural modules and assembling them in varied configurations (apart from grids), one can enforce parameter sharing over graphs, yielding models which can effectively be fed with relational data. In this context, vertices in a graph can be projected into a hyperdimensional real space and iteratively refined over many message-passing iterations in an end-to-end differentiable architecture. Architectures of this family have been referred to under several names in the literature, such as Graph Neural Networks, Message-passing Neural Networks, Relational Networks and Graph Networks. In this paper, we revisit the original Graph Neural Network model and show that it generalises many of the recent models, which in turn benefit from the insight of thinking about vertex types. To illustrate the generality of the original model, we present a Graph Neural Network formalisation, which partitions the vertices of a graph into a number of types. Each type represents an entity in the ontology of the problem one wants to learn. This allows one - for instance - to assign embeddings to edges, hyperedges, and any number of global attributes of the graph. As a companion to this paper we provide a Python/Tensorflow library to facilitate the development of such architectures, with which we instantiate the formalisation to reproduce a number of models proposed in the current literature.
Tasks
Published 2019-01-23
URL http://arxiv.org/abs/1901.07984v3
PDF http://arxiv.org/pdf/1901.07984v3.pdf
PWC https://paperswithcode.com/paper/typed-graph-networks
Repo https://github.com/machine-reasoning-ufrgs/graph-neural-networks
Framework tf
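
A hedged PyTorch sketch of typed message passing, instantiated for a bipartite graph with two vertex types (for example variables and clauses, as in SAT-style problems): each (source type, target type) pair gets its own message function and each type its own update function. Dimensions and type names are illustrative; the paper's companion library is in TensorFlow.

```python
import torch
import torch.nn as nn

class TypedGNN(nn.Module):
    def __init__(self, d_var=64, d_cls=64, iters=8):
        super().__init__()
        self.iters = iters
        # One message function per (source type, target type) pair.
        self.msg = nn.ModuleDict({
            "var->clause": nn.Linear(d_var, d_cls),
            "clause->var": nn.Linear(d_cls, d_var)})
        # One recurrent update function per vertex type.
        self.upd = nn.ModuleDict({
            "var": nn.GRUCell(d_var, d_var),
            "clause": nn.GRUCell(d_cls, d_cls)})

    def forward(self, h_var, h_cls, adj):  # adj: (n_cls, n_var) 0/1 matrix
        for _ in range(self.iters):
            m_cls = adj @ self.msg["var->clause"](h_var)
            m_var = adj.t() @ self.msg["clause->var"](h_cls)
            h_cls = self.upd["clause"](m_cls, h_cls)
            h_var = self.upd["var"](m_var, h_var)
        return h_var, h_cls
```

Edges, hyperedges, or global attributes would simply become additional vertex types with their own message and update modules, which is the generality the formalisation argues for.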

Manifold Denoising by Nonlinear Robust Principal Component Analysis

Title Manifold Denoising by Nonlinear Robust Principal Component Analysis
Authors He Lyu, Ningyu Sha, Shuyang Qin, Ming Yan, Yuying Xie, Rongrong Wang
Abstract This paper extends robust principal component analysis (RPCA) to nonlinear manifolds. Suppose that the observed data matrix is the sum of a sparse component and a component drawn from some low dimensional manifold. Is it possible to separate them by using similar ideas as RPCA? Is there any benefit in treating the manifold as a whole as opposed to treating each local region independently? We answer these two questions affirmatively by proposing and analyzing an optimization framework that separates the sparse component from the manifold under noisy data. Theoretical error bounds are provided when the tangent spaces of the manifold satisfy certain incoherence conditions. We also provide a near optimal choice of the tuning parameters for the proposed optimization formulation with the help of a new curvature estimation method. The efficacy of our method is demonstrated on both synthetic and real datasets.
Tasks Denoising
Published 2019-11-10
URL https://arxiv.org/abs/1911.03831v1
PDF https://arxiv.org/pdf/1911.03831v1.pdf
PWC https://paperswithcode.com/paper/manifold-denoising-by-nonlinear-robust
Repo https://github.com/rrwng/NRPCA
Framework none
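
A simplified, hedged sketch of the separation idea: alternate between (i) fitting each point with a low-dimensional (tangent-space) projection of its neighborhood and (ii) soft-thresholding the residual into the sparse component. This is a local baseline in the spirit of the problem setup, not the authors' whole-manifold formulation with curvature-based parameter selection.

```python
import numpy as np

def soft(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def manifold_rpca_sketch(X, k=10, d=2, tau=0.5, iters=20):
    # X: (n, D) observations = manifold component + sparse component.
    n = X.shape[0]
    S = np.zeros_like(X)
    for _ in range(iters):
        Y = X - S                          # current manifold estimate
        M = np.empty_like(X)
        for i in range(n):
            nbrs = np.argsort(((Y - Y[i]) ** 2).sum(1))[:k]
            mu = Y[nbrs].mean(0)
            Z = Y[nbrs] - mu
            B = np.linalg.svd(Z, full_matrices=False)[2][:d]  # tangent basis
            M[i] = mu + (Y[i] - mu) @ B.T @ B   # project onto tangent plane
        S = soft(X - M, tau)               # sparse component
    return M, S
```

Treating the manifold as a whole, as the paper proposes, couples these local fits through shared tangent-space constraints instead of solving each neighborhood independently.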