Paper Group ANR 1732
Empirical Study of Easy and Hard Examples in CNN Training
Title | Empirical Study of Easy and Hard Examples in CNN Training |
Authors | Ikki Kishida, Hideki Nakayama |
Abstract | Deep Neural Networks (DNNs) generalize well despite their massive size and capability of memorizing all examples. There is a hypothesis that DNNs start learning from simple patterns, based on the existence of examples that are consistently well-classified at the early training stage (i.e., easy examples) and examples that are consistently misclassified (i.e., hard examples). Easy examples are evidence that DNNs start learning from specific patterns and that there is a consistent learning process. It is important to know how DNNs learn patterns and obtain generalization ability; however, the properties of easy and hard examples have not been thoroughly investigated (e.g., their contributions to generalization and their visual appearance). In this work, we study the similarities of easy and hard examples respectively for different Convolutional Neural Network (CNN) architectures, assessing how those examples contribute to generalization. Our results show that easy examples are visually similar to each other and hard examples are visually diverse, and that both are largely shared across different CNN architectures. Moreover, while hard examples tend to contribute more to generalization than easy examples, removing a large number of easy examples leads to poor generalization. Analyzing these results, we hypothesize that biases in a dataset and Stochastic Gradient Descent (SGD) are the reasons why CNNs have consistent easy and hard examples. Furthermore, we show that large-scale classification datasets can be efficiently compressed by using the easiness measure proposed in this work. |
Tasks | |
Published | 2019-11-25 |
URL | https://arxiv.org/abs/1911.10739v1 |
https://arxiv.org/pdf/1911.10739v1.pdf | |
PWC | https://paperswithcode.com/paper/empirical-study-of-easy-and-hard-examples-in-1 |
Repo | |
Framework | |
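The paper's notion of "easiness" — how consistently an example is classified correctly during early training — lends itself to a simple sketch. The definitions below (mean early-epoch accuracy per example, and a compression rule that keeps the hardest examples) are illustrative assumptions, not the authors' exact implementation:

```python
import numpy as np

def easiness_scores(correct_history):
    """Fraction of early-training epochs in which each example was
    classified correctly. correct_history: (epochs, n_examples) 0/1 array."""
    correct_history = np.asarray(correct_history, dtype=float)
    return correct_history.mean(axis=0)

def compress_dataset(easiness, keep_fraction):
    """Hypothetical compression rule: keep the hardest examples
    (lowest easiness) up to the requested fraction of the dataset."""
    n_keep = int(round(keep_fraction * easiness.size))
    order = np.argsort(easiness)  # hardest (lowest easiness) first
    return np.sort(order[:n_keep])
```

Under this sketch, an example correct in every early epoch has easiness 1.0 and is the first candidate for removal when compressing.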
Improving benchmarks for autonomous vehicles testing using synthetically generated images
Title | Improving benchmarks for autonomous vehicles testing using synthetically generated images |
Authors | Aleksander Lukashou |
Abstract | Autonomous technologies are a heavily explored area nowadays, with computer vision in particular serving as the main component of vehicle perception. The quality of a vision system based on neural networks relies on the dataset it was trained on. It is extremely difficult to find traffic sign datasets for most of the countries of the world, meaning an autonomous vehicle from the USA would not be able to drive through Lithuania recognizing all road signs on the way. In this paper, we propose a solution for updating a model using a small dataset from the country the vehicle will be used in. It is important to mention that this is not a panacea, but rather a small upgrade which can boost autonomous car development in countries with limited data access. We achieved a quality improvement of about 10 percent and expect even better results in future experiments. |
Tasks | Autonomous Vehicles |
Published | 2019-04-23 |
URL | http://arxiv.org/abs/1904.10261v1 |
http://arxiv.org/pdf/1904.10261v1.pdf | |
PWC | https://paperswithcode.com/paper/improving-benchmarks-for-autonomous-vehicles |
Repo | |
Framework | |
Skin Lesion Classification Using Deep Neural Network
Title | Skin Lesion Classification Using Deep Neural Network |
Authors | Alla Eddine Guissous |
Abstract | This paper reports the methods and techniques we developed for classifying dermoscopic images (task 1) of the ISIC 2019 challenge dataset for skin lesion classification. Our approach uses an ensemble of deep neural networks together with several techniques for dealing with unbalanced datasets, the main difficulty of this challenge, in order to increase the performance of the CNN models. |
Tasks | Skin Lesion Classification |
Published | 2019-11-18 |
URL | https://arxiv.org/abs/1911.07817v1 |
https://arxiv.org/pdf/1911.07817v1.pdf | |
PWC | https://paperswithcode.com/paper/skin-lesion-classification-using-deep-neural |
Repo | |
Framework | |
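A standard remedy for the class imbalance this abstract highlights is to weight the loss inversely to class frequency. The paper does not specify its technique, so the following is a generic, hypothetical sketch of inverse-frequency class weights:

```python
import numpy as np

def inverse_frequency_weights(labels, n_classes):
    """Per-class weights inversely proportional to class frequency,
    a common remedy for imbalanced lesion classes (illustrative only;
    not necessarily the method used in the paper)."""
    counts = np.bincount(labels, minlength=n_classes).astype(float)
    # Guard against empty classes, then normalize so the mean weight is ~1.
    weights = counts.sum() / (n_classes * np.maximum(counts, 1.0))
    return weights
```

These weights would typically be passed to a weighted cross-entropy loss so rare lesion classes contribute more per sample.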
Two-Stream Video Classification with Cross-Modality Attention
Title | Two-Stream Video Classification with Cross-Modality Attention |
Authors | Lu Chi, Guiyu Tian, Yadong Mu, Qi Tian |
Abstract | Fusing multi-modality information is known to bring significant improvement in video classification. However, the most popular method up to now is still simply fusing each stream's prediction scores at the last stage. A valid question is whether there exists a more effective method to fuse information across modalities. With the development of attention mechanisms in natural language processing, many successful applications of attention have emerged in the field of computer vision. In this paper, we propose a cross-modality attention operation, which can obtain information from the other modality in a more effective way than the two-stream approach. Correspondingly, we implement a compatible block named the CMA block, which is a wrapper around our proposed attention operation. CMA can be plugged into many existing architectures. In the experiments, we comprehensively compare our method with the two-stream and non-local models widely used in video classification. All experiments clearly demonstrate the strong performance superiority of our proposed method. We also analyze the advantages of the CMA block by visualizing the attention map, which intuitively shows how the block helps the final prediction. |
Tasks | Action Classification, Action Recognition In Videos, Video Classification |
Published | 2019-08-01 |
URL | https://arxiv.org/abs/1908.00497v1 |
https://arxiv.org/pdf/1908.00497v1.pdf | |
PWC | https://paperswithcode.com/paper/two-stream-video-classification-with-cross |
Repo | |
Framework | |
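The core idea — queries from one stream attending over keys and values from the other stream — can be sketched with plain dot-product attention. This is a minimal illustration of cross-modality attention with a residual wrapper, assuming learned projections `w_q`, `w_k`, `w_v`; it is not the paper's exact CMA block:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_modality_attention(x_a, x_b, w_q, w_k, w_v):
    """Queries from modality A attend over keys/values from modality B,
    so stream A pulls in information from stream B."""
    q = x_a @ w_q                                  # (T_a, d)
    k = x_b @ w_k                                  # (T_b, d)
    v = x_b @ w_v                                  # (T_b, d)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (T_a, T_b)
    return x_a + attn @ v                          # residual, plug-in style
```

The residual connection is what makes such a block pluggable into existing architectures: with near-zero projections it reduces to the identity.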
Leveraging directed causal discovery to detect latent common causes
Title | Leveraging directed causal discovery to detect latent common causes |
Authors | Ciarán M. Lee, Christopher Hart, Jonathan G. Richens, Saurabh Johri |
Abstract | The discovery of causal relationships is a fundamental problem in science and medicine. In recent years, many elegant approaches to discovering causal relationships between two variables from uncontrolled data have been proposed. However, most of these deal only with purely directed causal relationships and cannot detect latent common causes. Here, we devise a general method which takes a purely directed causal discovery algorithm and modifies it so that it can also detect latent common causes. The identifiability of the modified algorithm depends on the identifiability of the original, as well as an assumption that the strength of noise be relatively small. We apply our method to two directed causal discovery algorithms, the Information Geometric Causal Inference of (Daniusis et al., 2010) and the Kernel Conditional Deviance for Causal Inference of (Mitrovic, Sejdinovic, and Teh, 2018), and extensively test on synthetic data—detecting latent common causes in additive, multiplicative and complex noise regimes—and on real data, where we are able to detect known common causes. In addition to detecting latent common causes, our experiments demonstrate that both modified algorithms preserve the performance of the original directed algorithm in distinguishing directed causal relations. |
Tasks | Causal Discovery, Causal Inference |
Published | 2019-10-22 |
URL | https://arxiv.org/abs/1910.10174v2 |
https://arxiv.org/pdf/1910.10174v2.pdf | |
PWC | https://paperswithcode.com/paper/leveraging-directed-causal-discovery-to |
Repo | |
Framework | |
Joint Representation of Multiple Geometric Priors via a Shape Decomposition Model for Single Monocular 3D Pose Estimation
Title | Joint Representation of Multiple Geometric Priors via a Shape Decomposition Model for Single Monocular 3D Pose Estimation |
Authors | Mengxi Jiang, Zhuliang Yu, Cuihua Li, Yunqi Lei |
Abstract | In this paper, we aim to recover the 3D human pose from the 2D body joints of a single image. The major challenge in this task is depth ambiguity, since different 3D poses may produce similar 2D poses. Although many recent advances in this problem come from both unsupervised and supervised learning approaches, the performance of most of these approaches is greatly affected by the insufficient diversity and richness of training data. To alleviate this issue, we propose an unsupervised learning approach that is capable of estimating various complex poses well under limited available training data. Specifically, we propose a Shape Decomposition Model (SDM) in which a 3D pose is considered as the superposition of two parts: a global structure together with some deformations. Based on the SDM, we estimate these two parts explicitly by solving for two sets of differently distributed combination coefficients of geometric priors. In addition, to obtain the geometric priors, a joint dictionary learning algorithm is proposed to extract both coarse and fine pose clues simultaneously from limited training data. Quantitative evaluations on several widely used datasets demonstrate that our approach yields better performance than other competitive approaches. In particular, on some categories with more complex deformations, significant improvements are achieved by our approach. Furthermore, qualitative experiments conducted on in-the-wild images also show the effectiveness of the proposed approach. |
Tasks | 3D Pose Estimation, Dictionary Learning, Pose Estimation |
Published | 2019-05-31 |
URL | https://arxiv.org/abs/1905.13466v1 |
https://arxiv.org/pdf/1905.13466v1.pdf | |
PWC | https://paperswithcode.com/paper/joint-representation-of-multiple-geometric |
Repo | |
Framework | |
Gradient Descent Finds Global Minima for Generalizable Deep Neural Networks of Practical Sizes
Title | Gradient Descent Finds Global Minima for Generalizable Deep Neural Networks of Practical Sizes |
Authors | Kenji Kawaguchi, Jiaoyang Huang |
Abstract | In this paper, we theoretically prove that gradient descent can find a global minimum for nonlinear deep neural networks of sizes commonly encountered in practice. Unlike previous theories, the theory developed in this paper only requires practical degrees of over-parameterization: the number of trainable parameters need only increase linearly with the number of training samples. This allows the size of the deep neural networks to be consistent with practice and several orders of magnitude smaller than that required by previous theories. Moreover, we prove that this linear rate of increase in network size is optimal and cannot be improved, except by a logarithmic factor. Furthermore, deep neural networks with the trainability guarantee are shown to generalize well to unseen test samples on a natural dataset but not on a random dataset. |
Tasks | |
Published | 2019-08-05 |
URL | https://arxiv.org/abs/1908.02419v2 |
https://arxiv.org/pdf/1908.02419v2.pdf | |
PWC | https://paperswithcode.com/paper/gradient-descent-finds-global-minima-for |
Repo | |
Framework | |
PyramNet: Point Cloud Pyramid Attention Network and Graph Embedding Module for Classification and Segmentation
Title | PyramNet: Point Cloud Pyramid Attention Network and Graph Embedding Module for Classification and Segmentation |
Authors | Kang Zhiheng, Li Ning |
Abstract | With the tide of artificial intelligence, we apply deep learning to the understanding of 3D data. The point cloud is an important 3D data structure, which can accurately and directly reflect the real world. In this paper, we propose a simple and effective network named PyramNet, suited for point cloud object classification and semantic segmentation in 3D scenes. We design two new operators: a Graph Embedding Module (GEM) and a Pyramid Attention Network (PAN). Specifically, GEM projects the point cloud onto a graph and uses the covariance matrix to explore the relationships between points, so as to improve the local feature expression ability of the model. PAN assigns strong semantic features to each point to retain fine geometric features as much as possible. Furthermore, we provide extensive evaluation and analysis of the effectiveness of PyramNet. Empirically, we evaluate our model on ModelNet40, ShapeNet and S3DIS. |
Tasks | Graph Embedding, Object Classification, Semantic Segmentation |
Published | 2019-06-07 |
URL | https://arxiv.org/abs/1906.03299v2 |
https://arxiv.org/pdf/1906.03299v2.pdf | |
PWC | https://paperswithcode.com/paper/pyramnet-point-cloud-pyramid-attention |
Repo | |
Framework | |
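The covariance-based local geometry idea behind GEM can be illustrated with a brute-force k-nearest-neighbour covariance descriptor. This is only a sketch in the spirit of the abstract's description, with the neighbourhood size `k` as an assumed hyperparameter, not the module as published:

```python
import numpy as np

def knn_covariances(points, k):
    """For each point, the covariance matrix of its k nearest neighbours
    (including itself): a simple local-geometry descriptor of the kind
    a covariance-based graph module might exploit."""
    n, d = points.shape
    # Pairwise squared distances (brute force; fine for small clouds).
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    covs = np.empty((n, d, d))
    for i in range(n):
        nbrs = points[np.argsort(d2[i])[:k]]
        centred = nbrs - nbrs.mean(axis=0)
        covs[i] = centred.T @ centred / k
    return covs
```

The eigenstructure of each 3x3 covariance encodes whether a point's neighbourhood is locally planar, linear, or volumetric.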
Robust Data Detection for MIMO Systems with One-Bit ADCs: A Reinforcement Learning Approach
Title | Robust Data Detection for MIMO Systems with One-Bit ADCs: A Reinforcement Learning Approach |
Authors | Yo-Seb Jeon, Namyoon Lee, H. Vincent Poor |
Abstract | The use of one-bit analog-to-digital converters (ADCs) at a receiver is a power-efficient solution for future wireless systems operating with a large signal bandwidth and/or a massive number of receive radio frequency chains. This solution, however, induces a high channel estimation error and therefore makes it difficult to perform the optimal data detection that requires perfect knowledge of the likelihood functions at the receiver. In this paper, we propose a likelihood function learning method for multiple-input multiple-output (MIMO) systems with one-bit ADCs using a reinforcement learning approach. The key idea is to exploit input-output samples obtained from data detection to compensate for the mismatch in the likelihood function. The underlying difficulty of this idea is the label uncertainty in the samples caused by data detection errors. To resolve this problem, we define a Markov decision process (MDP) to maximize the accuracy of the likelihood function learned from the samples. We then develop a reinforcement learning algorithm that efficiently finds the optimal policy by approximating the transition function and the optimal state of the MDP. Simulation results demonstrate that the proposed method provides significant performance gains for optimal data detection methods that suffer from a mismatch in the likelihood function. |
Tasks | |
Published | 2019-03-29 |
URL | http://arxiv.org/abs/1903.12546v1 |
http://arxiv.org/pdf/1903.12546v1.pdf | |
PWC | https://paperswithcode.com/paper/robust-data-detection-for-mimo-systems-with |
Repo | |
Framework | |
Diversifying Topic-Coherent Response Generation for Natural Multi-turn Conversations
Title | Diversifying Topic-Coherent Response Generation for Natural Multi-turn Conversations |
Authors | Fei Hu, Wei Liu, Ajmal Saeed Mian, Li Li |
Abstract | Although response generation (RG) diversification for single-turn dialogs has been well developed, it is less investigated for natural multi-turn conversations. Moreover, past work focused on diversifying responses without considering topic coherence with the context, producing uninformative replies. In this paper, we propose the Topic-coherent Hierarchical Recurrent Encoder-Decoder model (THRED) to diversify the generated responses without deviating from the contextual topics in multi-turn conversations. Overall, we build a sequence-to-sequence network (Seq2Seq) to model multi-turn conversations. We then resort to the Latent Variable Hierarchical Recurrent Encoder-Decoder model (VHRED) to learn the global contextual distribution of dialogs. In addition, we construct a dense topic matrix which captures word-level correlations of the conversation corpora. The topic matrix is used to learn the local topic distribution of the contextual utterances. By incorporating both the global contextual distribution and the local topic distribution, THRED produces replies that are both diversified and topic-coherent. We also propose an explicit metric (\emph{TopicDiv}) to measure the topic divergence between the post and the generated response, as well as an overall metric combining the diversification metric (\emph{Distinct}) and \emph{TopicDiv}. We evaluate our model against three baselines (Seq2Seq, HRED and VHRED) on two real-world corpora and demonstrate its outstanding performance in both diversification and topic coherence. |
Tasks | |
Published | 2019-10-24 |
URL | https://arxiv.org/abs/1910.11161v1 |
https://arxiv.org/pdf/1910.11161v1.pdf | |
PWC | https://paperswithcode.com/paper/diversifying-topic-coherent-response |
Repo | |
Framework | |
Privacy-preserving Federated Brain Tumour Segmentation
Title | Privacy-preserving Federated Brain Tumour Segmentation |
Authors | Wenqi Li, Fausto Milletarì, Daguang Xu, Nicola Rieke, Jonny Hancox, Wentao Zhu, Maximilian Baust, Yan Cheng, Sébastien Ourselin, M. Jorge Cardoso, Andrew Feng |
Abstract | Due to medical data privacy regulations, it is often infeasible to collect and share patient data in a centralised data lake. This poses challenges for training machine learning algorithms, such as deep convolutional networks, which often require large numbers of diverse training examples. Federated learning sidesteps this difficulty by bringing code to the patient data owners and only sharing intermediate model training updates among them. Although a high-accuracy model could be achieved by appropriately aggregating these model updates, the model shared could indirectly leak the local training examples. In this paper, we investigate the feasibility of applying differential-privacy techniques to protect the patient data in a federated learning setup. We implement and evaluate practical federated learning systems for brain tumour segmentation on the BraTS dataset. The experimental results show that there is a trade-off between model performance and privacy protection costs. |
Tasks | |
Published | 2019-10-02 |
URL | https://arxiv.org/abs/1910.00962v1 |
https://arxiv.org/pdf/1910.00962v1.pdf | |
PWC | https://paperswithcode.com/paper/privacy-preserving-federated-brain-tumour |
Repo | |
Framework | |
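The mechanism behind differentially private federated learning — clip each client's model update, average, and add calibrated Gaussian noise — can be sketched in a few lines. The clipping norm and noise scale below are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

def clip_update(update, clip_norm):
    """Scale an update so its L2 norm is at most clip_norm."""
    norm = np.linalg.norm(update)
    return update * min(1.0, clip_norm / (norm + 1e-12))

def private_fed_avg(updates, clip_norm, noise_std, rng):
    """Clip each client's update, average them, and add Gaussian noise,
    in the spirit of DP federated averaging."""
    clipped = [clip_update(np.asarray(u, dtype=float), clip_norm)
               for u in updates]
    avg = np.mean(clipped, axis=0)
    return avg + rng.normal(0.0, noise_std, size=avg.shape)
```

Larger `noise_std` (stronger privacy) degrades the aggregated update, which is exactly the performance/privacy trade-off the abstract reports.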
Multilabel Automated Recognition of Emotions Induced Through Music
Title | Multilabel Automated Recognition of Emotions Induced Through Music |
Authors | Fabio Paolizzo, Natalia Pichierri, Daniele Casali, Daniele Giardino, Marco Matta, Giovanni Costantini |
Abstract | Achieving advancements in the automatic recognition of emotions that music can induce requires considering the multiplicity and simultaneity of emotions. The comparison of different machine learning algorithms performing multilabel and multiclass classification is the core of our work. The study analyzes the implementation of the Geneva Emotional Music Scale 9 in the Emotify music dataset and the data distribution. The research goal is the identification of the best methods towards the definition of the audio component of a new multimodal dataset for music emotion recognition. |
Tasks | Emotion Recognition, Music Emotion Recognition |
Published | 2019-05-29 |
URL | https://arxiv.org/abs/1905.12629v1 |
https://arxiv.org/pdf/1905.12629v1.pdf | |
PWC | https://paperswithcode.com/paper/multilabel-automated-recognition-of-emotions |
Repo | |
Framework | |
Oculum afficit: Ocular Affect Recognition
Title | Oculum afficit: Ocular Affect Recognition |
Authors | Elmar Langholz |
Abstract | Recognizing human affect and emotions is a problem with a wide range of applications in both academia and industry. Affect and emotion recognition within computer vision primarily relies on images of faces. With the prevalence of portable devices (e.g. smart phones and/or smart glasses), acquiring user facial images requires focus, time, and precision. While existing systems work well for full frontal faces, they tend not to work so well with partial faces, such as that of the operator of the device while it is in use. For this reason, we propose a methodology with which we can accurately infer the overall affect of a person by looking at the ocular region of an individual. |
Tasks | Emotion Recognition |
Published | 2019-05-22 |
URL | https://arxiv.org/abs/1905.09240v1 |
https://arxiv.org/pdf/1905.09240v1.pdf | |
PWC | https://paperswithcode.com/paper/oculum-afficit-ocular-affect-recognition |
Repo | |
Framework | |
Quality-aware Unpaired Image-to-Image Translation
Title | Quality-aware Unpaired Image-to-Image Translation |
Authors | Lei Chen, Le Wu, Zhenzhen Hu, Meng Wang |
Abstract | Generative Adversarial Networks (GANs) have been widely used for the image-to-image translation task. While these models rely heavily on the labeled image pairs, recently some GAN variants have been proposed to tackle the unpaired image translation task. These models exploited supervision at the domain level with a reconstruction process for unpaired image translation. On the other hand, parallel works have shown that leveraging perceptual loss functions based on high level deep features could enhance the generated image quality. Nevertheless, as these GAN-based models either depended on the pretrained deep network structure or relied on the labeled image pairs, they could not be directly applied to the unpaired image translation task. Moreover, despite the improvement of the introduced perceptual losses from deep neural networks, few researchers have explored the possibility of improving the generated image quality from classical image quality measures. To tackle the above two challenges, in this paper, we propose a unified quality-aware GAN-based framework for unpaired image-to-image translation, where a quality-aware loss is explicitly incorporated by comparing each source image and the reconstructed image at the domain level. Specifically, we design two detailed implementations of the quality loss. The first method is based on a classical image quality assessment measure by defining a classical quality-aware loss. The second method proposes an adaptive deep network based loss. Finally, extensive experimental results on many real-world datasets clearly show the quality improvement of our proposed framework, and the superiority of leveraging classical image quality measures for unpaired image translation compared to the deep network based model. |
Tasks | Image Quality Assessment, Image-to-Image Translation |
Published | 2019-03-15 |
URL | http://arxiv.org/abs/1903.06399v1 |
http://arxiv.org/pdf/1903.06399v1.pdf | |
PWC | https://paperswithcode.com/paper/quality-aware-unpaired-image-to-image |
Repo | |
Framework | |
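A "classical quality-aware loss" of the kind this abstract describes can be sketched by penalizing the reconstruction with a structural-similarity-style measure instead of a pixel-wise norm. The simplified single-window SSIM below is an illustrative stand-in, not the paper's exact loss:

```python
import numpy as np

def global_ssim(x, y, c1=1e-4, c2=9e-4):
    """Simplified single-window SSIM between two images in [0, 1]
    (the full SSIM measure averages this over local windows)."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def quality_aware_loss(source, reconstructed):
    """Hypothetical classical quality-aware reconstruction loss: 1 - SSIM,
    compared at the domain level between source and reconstruction."""
    return 1.0 - global_ssim(source, reconstructed)
```

A perfect reconstruction gives a loss of 0; structural distortions that a plain L1/L2 loss might under-penalize raise it.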
Nonlinear Approximation and (Deep) ReLU Networks
Title | Nonlinear Approximation and (Deep) ReLU Networks |
Authors | I. Daubechies, R. DeVore, S. Foucart, B. Hanin, G. Petrova |
Abstract | This article is concerned with the approximation and expressive powers of deep neural networks. This is an active research area currently producing many interesting papers. The results most commonly found in the literature prove that neural networks approximate functions with classical smoothness to the same accuracy as classical linear methods of approximation, e.g. approximation by polynomials or by piecewise polynomials on prescribed partitions. However, approximation by neural networks depending on n parameters is a form of nonlinear approximation and as such should be compared with other nonlinear methods such as variable knot splines or n-term approximation from dictionaries. The performance of neural networks in targeted applications such as machine learning indicates that they actually possess even greater approximation power than these traditional methods of nonlinear approximation. The main results of this article prove that this is indeed the case. This is done by exhibiting large classes of functions which can be efficiently captured by neural networks where classical nonlinear methods fall short of the task. The present article purposefully limits itself to studying the approximation of univariate functions by ReLU networks. Many generalizations to functions of several variables and other activation functions can be envisioned. However, even in the simplest of settings considered here, a theory that completely quantifies the approximation power of neural networks is still lacking. |
Tasks | |
Published | 2019-05-05 |
URL | https://arxiv.org/abs/1905.02199v1 |
https://arxiv.org/pdf/1905.02199v1.pdf | |
PWC | https://paperswithcode.com/paper/nonlinear-approximation-and-deep-relu |
Repo | |
Framework | |
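The connection between univariate ReLU networks and variable-knot splines rests on a classical fact: a one-hidden-layer ReLU network realizes exactly the continuous piecewise linear functions, with each hidden unit contributing a slope change at its breakpoint. A minimal sketch:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def relu_net(x, knots, slopes, intercept, base_slope):
    """One-hidden-layer ReLU network realizing a continuous piecewise
    linear function: each unit relu(x - t) adds a slope change at knot t."""
    out = intercept + base_slope * x
    for t, ds in zip(knots, slopes):
        out = out + ds * relu(x - t)
    return out
```

For instance, the hat function on [0, 1] peaking at 0.5 (slope 2, then slope -2) needs a single hidden unit with a slope change of -4 at the knot 0.5.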