January 25, 2020

Paper Group ANR 1732

Empirical Study of Easy and Hard Examples in CNN Training. Improving benchmarks for autonomous vehicles testing using synthetically generated images. Skin Lesion Classification Using Deep Neural Network. Two-Stream Video Classification with Cross-Modality Attention. Leveraging directed causal discovery to detect latent common causes. Joint Represen …

Empirical Study of Easy and Hard Examples in CNN Training

Title Empirical Study of Easy and Hard Examples in CNN Training
Authors Ikki Kishida, Hideki Nakayama
Abstract Deep Neural Networks (DNNs) generalize well despite their massive size and capability of memorizing all examples. There is a hypothesis that DNNs start learning from simple patterns, and the hypothesis is based on the existence of examples that are consistently well-classified at the early training stage (i.e., easy examples) and examples that are misclassified (i.e., hard examples). Easy examples are the evidence that DNNs start learning from specific patterns and that there is a consistent learning process. It is important to know how DNNs learn patterns and obtain generalization ability; however, the properties of easy and hard examples have not been thoroughly investigated (e.g., contributions to generalization and visual appearances). In this work, we study the similarities of easy and hard examples respectively for different Convolutional Neural Network (CNN) architectures, assessing how those examples contribute to generalization. Our results show that easy examples are visually similar to each other and hard examples are visually diverse, and both kinds of examples are largely shared across different CNN architectures. Moreover, while hard examples tend to contribute more to generalization than easy examples, removing a large number of easy examples leads to poor generalization. By analyzing those results, we hypothesize that biases in a dataset and Stochastic Gradient Descent (SGD) are the reasons why CNNs have consistent easy and hard examples. Furthermore, we show that large scale classification datasets can be efficiently compressed by using the easiness proposed in this work.
Tasks
Published 2019-11-25
URL https://arxiv.org/abs/1911.10739v1
PDF https://arxiv.org/pdf/1911.10739v1.pdf
PWC https://paperswithcode.com/paper/empirical-study-of-easy-and-hard-examples-in-1
Repo
Framework
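
The abstract above does not state how easiness is computed, so the following is only a rough sketch under an assumed definition: the fraction of runs (for example, early-training checkpoints or different architectures) that classify an example correctly. The names `easiness_scores` and `split_easy_hard` and both thresholds are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence


def easiness_scores(correct_by_run: Sequence[Sequence[bool]]) -> List[float]:
    """Fraction of runs in which each example was classified correctly.

    correct_by_run[r][i] is True if run r classified example i correctly.
    This definition is an assumption made for illustration.
    """
    num_runs = len(correct_by_run)
    num_examples = len(correct_by_run[0])
    return [
        sum(run[i] for run in correct_by_run) / num_runs
        for i in range(num_examples)
    ]


def split_easy_hard(
    scores: Sequence[float],
    easy_threshold: float = 1.0,
    hard_threshold: float = 0.0,
) -> Dict[str, List[int]]:
    """Indices of examples at or above / at or below the given thresholds."""
    easy = [i for i, s in enumerate(scores) if s >= easy_threshold]
    hard = [i for i, s in enumerate(scores) if s <= hard_threshold]
    return {"easy": easy, "hard": hard}
```

With scores like these, compressing a dataset would amount to dropping some of the highest-scoring examples; how many can be dropped safely is exactly the question the paper studies.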

Improving benchmarks for autonomous vehicles testing using synthetically generated images

Title Improving benchmarks for autonomous vehicles testing using synthetically generated images
Authors Aleksander Lukashou
Abstract Autonomous technologies are now a heavily explored area, particularly computer vision as the main component of vehicle perception. The quality of a vision system based on neural networks depends on the dataset it was trained on. It is extremely difficult to find traffic sign datasets for most countries of the world, meaning that an autonomous vehicle from the USA will not be able to drive through Lithuania and recognize all road signs on the way. In this paper, we propose a way to update a model using a small dataset from the country the vehicle will be used in. It is important to mention that this is not a panacea, but rather a small upgrade that can boost autonomous car development in countries with limited data access. We achieved a quality improvement of about 10 percent and expect even better results in future experiments.
Tasks Autonomous Vehicles
Published 2019-04-23
URL http://arxiv.org/abs/1904.10261v1
PDF http://arxiv.org/pdf/1904.10261v1.pdf
PWC https://paperswithcode.com/paper/improving-benchmarks-for-autonomous-vehicles
Repo
Framework

Skin Lesion Classification Using Deep Neural Network

Title Skin Lesion Classification Using Deep Neural Network
Authors Alla Eddine Guissous
Abstract This paper reports the methods and techniques we have developed to classify dermoscopic images (task 1) of the ISIC 2019 challenge dataset for skin lesion classification. Our approach uses an ensemble of deep neural networks together with techniques for handling unbalanced datasets, which are the main problem in this challenge, in order to increase the performance of CNN models.
Tasks Skin Lesion Classification
Published 2019-11-18
URL https://arxiv.org/abs/1911.07817v1
PDF https://arxiv.org/pdf/1911.07817v1.pdf
PWC https://paperswithcode.com/paper/skin-lesion-classification-using-deep-neural
Repo
Framework

Two-Stream Video Classification with Cross-Modality Attention

Title Two-Stream Video Classification with Cross-Modality Attention
Authors Lu Chi, Guiyu Tian, Yadong Mu, Qi Tian
Abstract Fusing multi-modality information is known to bring significant improvement in video classification. However, the most popular method up to now is still simply fusing each stream’s prediction scores at the last stage. A valid question is whether there exists a more effective method to fuse information across modalities. With the development of the attention mechanism in natural language processing, many successful applications of attention have emerged in the field of computer vision. In this paper, we propose a cross-modality attention operation, which can obtain information from the other modality in a more effective way than two-stream. Correspondingly, we implement a compatible block named the CMA block, which is a wrapper of our proposed attention operation. CMA can be plugged into many existing architectures. In the experiments, we comprehensively compare our method with two-stream and non-local models widely used in video classification. All experiments clearly demonstrate the superior performance of our proposed method. We also analyze the advantages of the CMA block by visualizing the attention map, which intuitively shows how the block helps the final prediction.
Tasks Action Classification, Action Recognition In Videos, Video Classification
Published 2019-08-01
URL https://arxiv.org/abs/1908.00497v1
PDF https://arxiv.org/pdf/1908.00497v1.pdf
PWC https://paperswithcode.com/paper/two-stream-video-classification-with-cross
Repo
Framework
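
To make the idea of attending across modalities concrete, here is a minimal sketch in which queries come from one stream (say, RGB features) and keys and values come from the other (say, optical-flow features). It uses generic scaled dot-product attention as a stand-in; the learned projections, residual connection, and exact form of the CMA block are not given in the abstract and are omitted, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(
    queries: List[Vector], keys: List[Vector], values: List[Vector]
) -> Tuple[List[Vector], List[Vector]]:
    """Attend from one modality (queries) to another (keys and values).

    Returns one output vector and one row of attention weights per query.
    """
    dim = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every position of the other modality.
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys
        ]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the other modality's value vectors.
        outputs.append(
            [sum(wi * v[j] for wi, v in zip(w, values))
             for j in range(len(values[0]))]
        )
    return outputs, weights
```

Each row of the returned weights is the kind of attention map the abstract describes visualizing: it shows which positions of the other stream a given position draws on.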

Leveraging directed causal discovery to detect latent common causes

Title Leveraging directed causal discovery to detect latent common causes
Authors Ciarán M. Lee, Christopher Hart, Jonathan G. Richens, Saurabh Johri
Abstract The discovery of causal relationships is a fundamental problem in science and medicine. In recent years, many elegant approaches to discovering causal relationships between two variables from uncontrolled data have been proposed. However, most of these deal only with purely directed causal relationships and cannot detect latent common causes. Here, we devise a general method which takes a purely directed causal discovery algorithm and modifies it so that it can also detect latent common causes. The identifiability of the modified algorithm depends on the identifiability of the original, as well as an assumption that the strength of noise be relatively small. We apply our method to two directed causal discovery algorithms, the Information Geometric Causal Inference of (Daniusis et al., 2010) and the Kernel Conditional Deviance for Causal Inference of (Mitrovic, Sejdinovic, and Teh, 2018), and extensively test on synthetic data—detecting latent common causes in additive, multiplicative and complex noise regimes—and on real data, where we are able to detect known common causes. In addition to detecting latent common causes, our experiments demonstrate that both modified algorithms preserve the performance of the original directed algorithm in distinguishing directed causal relations.
Tasks Causal Discovery, Causal Inference
Published 2019-10-22
URL https://arxiv.org/abs/1910.10174v2
PDF https://arxiv.org/pdf/1910.10174v2.pdf
PWC https://paperswithcode.com/paper/leveraging-directed-causal-discovery-to
Repo
Framework

Joint Representation of Multiple Geometric Priors via a Shape Decomposition Model for Single Monocular 3D Pose Estimation

Title Joint Representation of Multiple Geometric Priors via a Shape Decomposition Model for Single Monocular 3D Pose Estimation
Authors Mengxi Jiang, Zhuliang Yu, Cuihua Li, Yunqi Lei
Abstract In this paper, we aim to recover the 3D human pose from 2D body joints of a single image. The major challenge in this task is the depth ambiguity since different 3D poses may produce similar 2D poses. Although many recent advances in this problem are found in both unsupervised and supervised learning approaches, the performances of most of these approaches are greatly affected by insufficient diversities and richness of training data. To alleviate this issue, we propose an unsupervised learning approach, which is capable of estimating various complex poses well under limited available training data. Specifically, we propose a Shape Decomposition Model (SDM) in which a 3D pose is considered as the superposition of two parts which are global structure together with some deformations. Based on SDM, we estimate these two parts explicitly by solving two sets of different distributed combination coefficients of geometric priors. In addition, to obtain geometric priors, a joint dictionary learning algorithm is proposed to extract both coarse and fine pose clues simultaneously from limited training data. Quantitative evaluations on several widely used datasets demonstrate that our approach yields better performances over other competitive approaches. Especially, on some categories with more complex deformations, significant improvements are achieved by our approach. Furthermore, qualitative experiments conducted on in-the-wild images also show the effectiveness of the proposed approach.
Tasks 3D Pose Estimation, Dictionary Learning, Pose Estimation
Published 2019-05-31
URL https://arxiv.org/abs/1905.13466v1
PDF https://arxiv.org/pdf/1905.13466v1.pdf
PWC https://paperswithcode.com/paper/joint-representation-of-multiple-geometric
Repo
Framework

Gradient Descent Finds Global Minima for Generalizable Deep Neural Networks of Practical Sizes

Title Gradient Descent Finds Global Minima for Generalizable Deep Neural Networks of Practical Sizes
Authors Kenji Kawaguchi, Jiaoyang Huang
Abstract In this paper, we theoretically prove that gradient descent can find a global minimum for nonlinear deep neural networks of sizes commonly encountered in practice. The theory developed in this paper only requires the practical degrees of over-parameterization unlike previous theories. Our theory only requires the number of trainable parameters to increase linearly as the number of training samples increases. This allows the size of the deep neural networks to be consistent with practice and to be several orders of magnitude smaller than that required by the previous theories. Moreover, we prove that the linear increase of the size of the network is the optimal rate and that it cannot be improved, except by a logarithmic factor. Furthermore, deep neural networks with the trainability guarantee are shown to generalize well to unseen test samples with a natural dataset but not a random dataset.
Tasks
Published 2019-08-05
URL https://arxiv.org/abs/1908.02419v2
PDF https://arxiv.org/pdf/1908.02419v2.pdf
PWC https://paperswithcode.com/paper/gradient-descent-finds-global-minima-for
Repo
Framework

PyramNet: Point Cloud Pyramid Attention Network and Graph Embedding Module for Classification and Segmentation

Title PyramNet: Point Cloud Pyramid Attention Network and Graph Embedding Module for Classification and Segmentation
Authors Kang Zhiheng, Li Ning
Abstract With the tide of artificial intelligence, we try to apply deep learning to understand 3D data. The point cloud is an important 3D data structure, which can accurately and directly reflect the real world. In this paper, we propose a simple and effective network named PyramNet, which is suited to point cloud object classification and semantic segmentation in 3D scenes. We design two new operators: the Graph Embedding Module (GEM) and the Pyramid Attention Network (PAN). Specifically, GEM projects the point cloud onto a graph and uses the covariance matrix to explore the relationship between points, so as to improve the local feature expression ability of the model. PAN assigns some strong semantic features to each point to retain fine geometric features as much as possible. Furthermore, we provide extensive evaluation and analysis of the effectiveness of PyramNet. Empirically, we evaluate our model on ModelNet40, ShapeNet and S3DIS.
Tasks Graph Embedding, Object Classification, Semantic Segmentation
Published 2019-06-07
URL https://arxiv.org/abs/1906.03299v2
PDF https://arxiv.org/pdf/1906.03299v2.pdf
PWC https://paperswithcode.com/paper/pyramnet-point-cloud-pyramid-attention
Repo
Framework

Robust Data Detection for MIMO Systems with One-Bit ADCs: A Reinforcement Learning Approach

Title Robust Data Detection for MIMO Systems with One-Bit ADCs: A Reinforcement Learning Approach
Authors Yo-Seb Jeon, Namyoon Lee, H. Vincent Poor
Abstract The use of one-bit analog-to-digital converters (ADCs) at a receiver is a power-efficient solution for future wireless systems operating with a large signal bandwidth and/or a massive number of receive radio frequency chains. This solution, however, induces a high channel estimation error and therefore makes it difficult to perform the optimal data detection that requires perfect knowledge of likelihood functions at the receiver. In this paper, we propose a likelihood function learning method for multiple-input multiple-output (MIMO) systems with one-bit ADCs using a reinforcement learning approach. The key idea is to exploit input-output samples obtained from data detection, to compensate the mismatch in the likelihood function. The underlying difficulty of this idea is a label uncertainty in the samples caused by a data detection error. To resolve this problem, we define a Markov decision process (MDP) to maximize the accuracy of the likelihood function learned from the samples. We then develop a reinforcement learning algorithm that efficiently finds the optimal policy by approximating the transition function and the optimal state of the MDP. Simulation results demonstrate that the proposed method provides significant performance gains for the optimal data detection methods that suffer from the mismatch in the likelihood function.
Tasks
Published 2019-03-29
URL http://arxiv.org/abs/1903.12546v1
PDF http://arxiv.org/pdf/1903.12546v1.pdf
PWC https://paperswithcode.com/paper/robust-data-detection-for-mimo-systems-with
Repo
Framework

Diversifying Topic-Coherent Response Generation for Natural Multi-turn Conversations

Title Diversifying Topic-Coherent Response Generation for Natural Multi-turn Conversations
Authors Fei Hu, Wei Liu, Ajmal Saeed Mian, Li Li
Abstract Although response generation (RG) diversification for single-turn dialogs has been well developed, it is less investigated for natural multi-turn conversations. Besides, past work focused on diversifying responses without considering topic coherence to the context, producing uninformative replies. In this paper, we propose the Topic-coherent Hierarchical Recurrent Encoder-Decoder model (THRED) to diversify the generated responses without deviating from the contextual topics for multi-turn conversations. Overall, we build a sequence-to-sequence net (Seq2Seq) to model multi-turn conversations. We then resort to the latent Variable Hierarchical Recurrent Encoder-Decoder model (VHRED) to learn the global contextual distribution of dialogs. Besides, we construct a dense topic matrix which implies word-level correlations of the conversation corpora. The topic matrix is used to learn the local topic distribution of the contextual utterances. By incorporating both the global contextual distribution and the local topic distribution, THRED produces both diversified and topic-coherent replies. In addition, we propose an explicit metric (TopicDiv) to measure the topic divergence between the post and the generated response, and we also propose an overall metric combining the diversification metric (Distinct) and TopicDiv. We evaluate our model against three baselines (Seq2Seq, HRED and VHRED) on two real-world corpora and demonstrate its outstanding performance in both diversification and topic coherence.
Tasks
Published 2019-10-24
URL https://arxiv.org/abs/1910.11161v1
PDF https://arxiv.org/pdf/1910.11161v1.pdf
PWC https://paperswithcode.com/paper/diversifying-topic-coherent-response
Repo
Framework
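
The Distinct metric mentioned in the abstract is commonly computed as the number of unique n-grams divided by the total number of n-grams across the generated responses. The sketch below follows that common definition; whether the paper uses exactly this variant is an assumption, and TopicDiv is not sketched because the abstract does not define it. The function name `distinct_n` is illustrative.

```python
from typing import Iterable, List


def distinct_n(responses: Iterable[List[str]], n: int) -> float:
    """Ratio of unique n-grams to total n-grams over tokenized responses."""
    total = 0
    unique = set()
    for tokens in responses:
        for i in range(len(tokens) - n + 1):
            unique.add(tuple(tokens[i:i + n]))
            total += 1
    # An empty corpus has no n-grams; report 0.0 rather than dividing by zero.
    return len(unique) / total if total else 0.0
```

A model that keeps producing near-identical replies scores low because most of its n-grams repeat, while varied replies push the ratio towards 1.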

Privacy-preserving Federated Brain Tumour Segmentation

Title Privacy-preserving Federated Brain Tumour Segmentation
Authors Wenqi Li, Fausto Milletarì, Daguang Xu, Nicola Rieke, Jonny Hancox, Wentao Zhu, Maximilian Baust, Yan Cheng, Sébastien Ourselin, M. Jorge Cardoso, Andrew Feng
Abstract Due to medical data privacy regulations, it is often infeasible to collect and share patient data in a centralised data lake. This poses challenges for training machine learning algorithms, such as deep convolutional networks, which often require large numbers of diverse training examples. Federated learning sidesteps this difficulty by bringing code to the patient data owners and only sharing intermediate model training updates among them. Although a high-accuracy model could be achieved by appropriately aggregating these model updates, the model shared could indirectly leak the local training examples. In this paper, we investigate the feasibility of applying differential-privacy techniques to protect the patient data in a federated learning setup. We implement and evaluate practical federated learning systems for brain tumour segmentation on the BraTS dataset. The experimental results show that there is a trade-off between model performance and privacy protection costs.
Tasks
Published 2019-10-02
URL https://arxiv.org/abs/1910.00962v1
PDF https://arxiv.org/pdf/1910.00962v1.pdf
PWC https://paperswithcode.com/paper/privacy-preserving-federated-brain-tumour
Repo
Framework
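
As a generic illustration of the trade-off the abstract describes, the sketch below averages client updates on a server after clipping each update's norm and adding Gaussian noise. This is a textbook-style pattern, not the mechanism evaluated in the paper; the names `clip_update` and `aggregate` are hypothetical, and calibrating `noise_std` to a formal privacy budget is deliberately left out.

```python
import math
import random
from typing import List


def clip_update(update: List[float], max_norm: float) -> List[float]:
    """Scale an update down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in update]


def aggregate(
    global_weights: List[float],
    client_updates: List[List[float]],
    max_norm: float,
    noise_std: float,
    rng: random.Random,
) -> List[float]:
    """Apply the mean of clipped client updates plus Gaussian noise.

    Only model updates reach the server; raw patient data stays with
    each client.
    """
    clipped = [clip_update(u, max_norm) for u in client_updates]
    num_clients = len(clipped)
    new_weights = []
    for j, w in enumerate(global_weights):
        mean_delta = sum(u[j] for u in clipped) / num_clients
        noise = rng.gauss(0.0, noise_std) if noise_std > 0 else 0.0
        new_weights.append(w + mean_delta + noise)
    return new_weights
```

Tighter clipping and larger noise hide more about any single client's data but also distort the averaged update, which is one way the performance versus privacy trade-off shows up.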

Multilabel Automated Recognition of Emotions Induced Through Music

Title Multilabel Automated Recognition of Emotions Induced Through Music
Authors Fabio Paolizzo, Natalia Pichierri, Daniele Casali, Daniele Giardino, Marco Matta, Giovanni Costantini
Abstract Achieving advancements in the automatic recognition of emotions that music can induce requires considering the multiplicity and simultaneity of emotions. Comparison of different machine learning algorithms performing multilabel and multiclass classification is the core of our work. The study analyzes the implementation of the Geneva Emotional Music Scale 9 in the Emotify music dataset and the data distribution. The research goal is the identification of the best methods towards the definition of the audio component of a new multimodal dataset for music emotion recognition.
Tasks Emotion Recognition, Music Emotion Recognition
Published 2019-05-29
URL https://arxiv.org/abs/1905.12629v1
PDF https://arxiv.org/pdf/1905.12629v1.pdf
PWC https://paperswithcode.com/paper/multilabel-automated-recognition-of-emotions
Repo
Framework

Oculum afficit: Ocular Affect Recognition

Title Oculum afficit: Ocular Affect Recognition
Authors Elmar Langholz
Abstract Recognizing human affect and emotions is a problem that has a wide range of applications within both academia and industry. Affect and emotion recognition within computer vision primarily relies on images of faces. With the prevalence of portable devices (e.g. smart phones and/or smart glasses), acquiring user facial images requires focus, time, and precision. While existing systems work well for full frontal faces, they tend to work less well with partial faces, such as those of the operator of the device while it is in use. For this reason, we propose a methodology in which we can accurately infer the overall affect of a person by looking at the ocular region of an individual.
Tasks Emotion Recognition
Published 2019-05-22
URL https://arxiv.org/abs/1905.09240v1
PDF https://arxiv.org/pdf/1905.09240v1.pdf
PWC https://paperswithcode.com/paper/oculum-afficit-ocular-affect-recognition
Repo
Framework

Quality-aware Unpaired Image-to-Image Translation

Title Quality-aware Unpaired Image-to-Image Translation
Authors Lei Chen, Le Wu, Zhenzhen Hu, Meng Wang
Abstract Generative Adversarial Networks (GANs) have been widely used for the image-to-image translation task. While these models rely heavily on the labeled image pairs, recently some GAN variants have been proposed to tackle the unpaired image translation task. These models exploited supervision at the domain level with a reconstruction process for unpaired image translation. On the other hand, parallel works have shown that leveraging perceptual loss functions based on high level deep features could enhance the generated image quality. Nevertheless, as these GAN-based models either depended on the pretrained deep network structure or relied on the labeled image pairs, they could not be directly applied to the unpaired image translation task. Moreover, despite the improvement of the introduced perceptual losses from deep neural networks, few researchers have explored the possibility of improving the generated image quality from classical image quality measures. To tackle the above two challenges, in this paper, we propose a unified quality-aware GAN-based framework for unpaired image-to-image translation, where a quality-aware loss is explicitly incorporated by comparing each source image and the reconstructed image at the domain level. Specifically, we design two detailed implementations of the quality loss. The first method is based on a classical image quality assessment measure by defining a classical quality-aware loss. The second method proposes an adaptive deep network based loss. Finally, extensive experimental results on many real-world datasets clearly show the quality improvement of our proposed framework, and the superiority of leveraging classical image quality measures for unpaired image translation compared to the deep network based model.
Tasks Image Quality Assessment, Image-to-Image Translation
Published 2019-03-15
URL http://arxiv.org/abs/1903.06399v1
PDF http://arxiv.org/pdf/1903.06399v1.pdf
PWC https://paperswithcode.com/paper/quality-aware-unpaired-image-to-image
Repo
Framework

Nonlinear Approximation and (Deep) ReLU Networks

Title Nonlinear Approximation and (Deep) ReLU Networks
Authors I. Daubechies, R. DeVore, S. Foucart, B. Hanin, G. Petrova
Abstract This article is concerned with the approximation and expressive powers of deep neural networks. This is an active research area currently producing many interesting papers. The results most commonly found in the literature prove that neural networks approximate functions with classical smoothness to the same accuracy as classical linear methods of approximation, e.g. approximation by polynomials or by piecewise polynomials on prescribed partitions. However, approximation by neural networks depending on n parameters is a form of nonlinear approximation and as such should be compared with other nonlinear methods such as variable knot splines or n-term approximation from dictionaries. The performance of neural networks in targeted applications such as machine learning indicates that they actually possess even greater approximation power than these traditional methods of nonlinear approximation. The main results of this article prove that this is indeed the case. This is done by exhibiting large classes of functions which can be efficiently captured by neural networks where classical nonlinear methods fall short of the task. The present article purposefully limits itself to studying the approximation of univariate functions by ReLU networks. Many generalizations to functions of several variables and other activation functions can be envisioned. However, even in this simplest of settings considered here, a theory that completely quantifies the approximation power of neural networks is still lacking.
Tasks
Published 2019-05-05
URL https://arxiv.org/abs/1905.02199v1
PDF https://arxiv.org/pdf/1905.02199v1.pdf
PWC https://paperswithcode.com/paper/nonlinear-approximation-and-deep-relu
Repo
Framework
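
A standard toy example of why depth helps for univariate ReLU networks is the hat function, which three ReLU units represent exactly on [0, 1]. Composing it with itself `depth` times gives a sawtooth with 2^depth linear pieces from only 3 * depth units, whereas a single hidden layer would need a number of units that grows with the number of pieces. The sketch below illustrates that well-known construction; it is not a reproduction of the paper's results, and the function names are illustrative.

```python
def relu(x: float) -> float:
    """Rectified linear unit."""
    return max(0.0, x)


def hat(x: float) -> float:
    """Hat function on [0, 1] built from three ReLU units.

    Rises from 0 at x = 0 to 1 at x = 0.5, then falls back to 0 at x = 1.
    """
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5) + 2.0 * relu(x - 1.0)


def sawtooth(x: float, depth: int) -> float:
    """Compose the hat function with itself `depth` times."""
    for _ in range(depth):
        x = hat(x)
    return x
```

Evaluating `sawtooth` on a fine grid for increasing `depth` shows the number of teeth doubling with each added layer.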