October 17, 2019

Paper Group ANR 889

Generative Creativity: Adversarial Learning for Bionic Design. Artistic Object Recognition by Unsupervised Style Adaptation. Mono-Camera 3D Multi-Object Tracking Using Deep Learning Detections and PMBM Filtering. On Tighter Generalization Bound for Deep Neural Networks: CNNs, ResNets, and Beyond. Simplifying Probabilistic Expressions in Causal Infe …

Generative Creativity: Adversarial Learning for Bionic Design


Title	Generative Creativity: Adversarial Learning for Bionic Design
Authors	Simiao Yu, Hao Dong, Pan Wang, Chao Wu, Yike Guo
Abstract	Bionic design refers to an approach of generative creativity in which a target object (e.g. a floor lamp) is designed to contain features of biological source objects (e.g. flowers), resulting in creative biologically-inspired design. In this work, we attempt to model the process of shape-oriented bionic design as follows: given an input image of a design target object, the model generates images that 1) maintain shape features of the input design target image, 2) contain shape features of images from the specified biological source domain, 3) are plausible and diverse. We propose DesignGAN, a novel unsupervised deep generative approach to realising bionic design. Specifically, we employ a conditional Generative Adversarial Network architecture with several designated losses (an adversarial loss, a regression loss, a cycle loss and a latent loss) that respectively constrain our model to meet the corresponding aforementioned requirements of bionic design modelling. We perform qualitative and quantitative experiments to evaluate our method, and demonstrate that our proposed approach successfully generates creative images of bionic design.
Tasks
Published	2018-05-19
URL	http://arxiv.org/abs/1805.07615v1
PDF	http://arxiv.org/pdf/1805.07615v1.pdf
PWC	https://paperswithcode.com/paper/generative-creativity-adversarial-learning
Repo
Framework

Artistic Object Recognition by Unsupervised Style Adaptation


Title	Artistic Object Recognition by Unsupervised Style Adaptation
Authors	Christopher Thomas, Adriana Kovashka
Abstract	Computer vision systems currently lack the ability to reliably recognize artistically rendered objects, especially when such data is limited. In this paper, we propose a method for recognizing objects in artistic modalities (such as paintings, cartoons, or sketches), without requiring any labeled data from those modalities. Our method explicitly accounts for stylistic domain shifts between and within domains. To do so, we introduce a complementary training modality constructed to be similar in artistic style to the target domain, and enforce that the network learns features that are invariant between the two training modalities. We show how such artificial labeled source domains can be generated automatically through the use of style transfer techniques, using diverse target images to represent the style in the target domain. Unlike existing methods which require a large amount of unlabeled target data, our method can work with as few as ten unlabeled images. We evaluate it on a number of cross-domain object and scene classification tasks and on a new dataset we release. Our experiments show that our approach, though conceptually simple, significantly improves the accuracy that existing domain adaptation techniques obtain for artistic object recognition.
Tasks	Domain Adaptation, Object Recognition, Scene Classification, Style Transfer
Published	2018-12-28
URL	http://arxiv.org/abs/1812.11139v1
PDF	http://arxiv.org/pdf/1812.11139v1.pdf
PWC	https://paperswithcode.com/paper/artistic-object-recognition-by-unsupervised
Repo
Framework

Mono-Camera 3D Multi-Object Tracking Using Deep Learning Detections and PMBM Filtering


Title	Mono-Camera 3D Multi-Object Tracking Using Deep Learning Detections and PMBM Filtering
Authors	Samuel Scheidegger, Joachim Benjaminsson, Emil Rosenberg, Amrit Krishnan, Karl Granstrom
Abstract	Monocular cameras are one of the most commonly used sensors in the automotive industry for autonomous vehicles. One major drawback of using a monocular camera is that it only makes observations in the two-dimensional image plane and cannot directly measure the distance to objects. In this paper, we aim at filling this gap by developing a multi-object tracking algorithm that takes an image as input and produces trajectories of detected objects in a world coordinate system. We solve this by using a deep neural network trained to detect and estimate the distance to objects from a single input image. The detections from a sequence of images are fed into a state-of-the-art Poisson multi-Bernoulli mixture (PMBM) tracking filter. The combination of the learned detector and the PMBM filter results in an algorithm that achieves 3D tracking using only mono-camera images as input. The performance of the algorithm is evaluated both in 3D world coordinates and in 2D image coordinates, using the publicly available KITTI object tracking dataset. The algorithm shows the ability to accurately track objects and correctly handle data associations, even when there is a big overlap of the objects in the image, and is one of the top performing algorithms on the KITTI object tracking benchmark. Furthermore, the algorithm is efficient, running on average close to 20 frames per second.
Tasks	3D Multi-Object Tracking, Autonomous Vehicles, Multi-Object Tracking, Object Tracking
Published	2018-02-27
URL	http://arxiv.org/abs/1802.09975v1
PDF	http://arxiv.org/pdf/1802.09975v1.pdf
PWC	https://paperswithcode.com/paper/mono-camera-3d-multi-object-tracking-using
Repo
Framework
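
The step from an image detection plus an estimated distance to a 3D position can be illustrated with a pinhole back-projection. This is a minimal sketch, not the authors' pipeline: the function name `backproject`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the reading of the estimated distance as Euclidean range from the camera centre are all assumptions made for illustration. The result is in the camera frame; a further camera-to-world transform (not shown) would be needed for world coordinates.

```python
import math

def backproject(u, v, distance, fx, fy, cx, cy):
    """Map pixel (u, v) plus an estimated range to a 3D point in the camera frame.

    Assumes an ideal pinhole camera (no lens distortion) and that `distance`
    is the Euclidean range from the camera centre to the object.
    """
    # Ray direction through the pixel in normalised camera coordinates.
    ray = ((u - cx) / fx, (v - cy) / fy, 1.0)
    norm = math.sqrt(sum(c * c for c in ray))
    # Scale the unit ray so that its length equals the estimated range.
    return tuple(distance * c / norm for c in ray)

# A detection at the principal point lies straight ahead on the optical axis.
point = backproject(u=640.0, v=360.0, distance=12.0,
                    fx=720.0, fy=720.0, cx=640.0, cy=360.0)
```

Points produced this way could serve as the measurements handed to a tracking filter, which is where the data association described in the abstract would take place.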

On Tighter Generalization Bound for Deep Neural Networks: CNNs, ResNets, and Beyond


Title	On Tighter Generalization Bound for Deep Neural Networks: CNNs, ResNets, and Beyond
Authors	Xingguo Li, Junwei Lu, Zhaoran Wang, Jarvis Haupt, Tuo Zhao
Abstract	We establish a margin-based, data-dependent generalization error bound for a general family of deep neural networks in terms of the depth and width, as well as the Jacobian of the networks. By introducing a new characterization of the Lipschitz properties of the neural network family, we achieve significantly tighter generalization bounds than existing results. Moreover, we show that the generalization bound can be further improved for bounded losses. Aside from general feedforward deep neural networks, our results can be applied to derive new bounds for popular architectures, including convolutional neural networks (CNNs) and residual networks (ResNets). When achieving the same generalization errors as prior work, our bounds allow for the choice of larger parameter spaces of weight matrices, inducing potentially stronger expressive ability for neural networks. Numerical evaluation is also provided to support our theory.
Tasks
Published	2018-06-13
URL	https://arxiv.org/abs/1806.05159v4
PDF	https://arxiv.org/pdf/1806.05159v4.pdf
PWC	https://paperswithcode.com/paper/on-tighter-generalization-bound-for-deep
Repo
Framework

Simplifying Probabilistic Expressions in Causal Inference


Title	Simplifying Probabilistic Expressions in Causal Inference
Authors	Santtu Tikka, Juha Karvanen
Abstract	Obtaining a non-parametric expression for an interventional distribution is one of the most fundamental tasks in causal inference. Such an expression can be obtained for an identifiable causal effect by an algorithm or by manual application of do-calculus. Often we are left with a complicated expression which can lead to biased or inefficient estimates when missing data or measurement errors are involved. We present an automatic simplification algorithm that seeks to eliminate symbolically unnecessary variables from these expressions by taking advantage of the structure of the underlying graphical model. Our method is applicable to all causal effect formulas and is readily available in the R package causaleffect.
Tasks	Causal Inference
Published	2018-06-19
URL	http://arxiv.org/abs/1806.07082v1
PDF	http://arxiv.org/pdf/1806.07082v1.pdf
PWC	https://paperswithcode.com/paper/simplifying-probabilistic-expressions-in
Repo
Framework
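
What it means for a variable to be "unnecessary" in such an expression can be shown on a toy case. The sketch below is not the paper's algorithm and does not use the causaleffect package; the distributions and the names `unsimplified` and `simplified` are hypothetical, chosen only to show that a factor such as P(w | z) sums to one over w and can therefore be dropped along with the summation variable.

```python
from itertools import product

# Hypothetical binary distributions, chosen only for illustration.
p_z = {0: 0.3, 1: 0.7}                                  # P(z)
p_w_given_z = {(0, 0): 0.6, (1, 0): 0.4,                # P(w | z), keyed (w, z)
               (0, 1): 0.1, (1, 1): 0.9}
p_y1_given_xz = {(0, 0): 0.2, (0, 1): 0.5,              # P(y=1 | x, z), keyed (x, z)
                 (1, 0): 0.4, (1, 1): 0.8}

def unsimplified(x):
    # sum_{z,w} P(y=1 | x, z) P(z) P(w | z): w appears but contributes nothing.
    return sum(p_y1_given_xz[(x, z)] * p_z[z] * p_w_given_z[(w, z)]
               for z, w in product((0, 1), repeat=2))

def simplified(x):
    # sum_z P(y=1 | x, z) P(z): w has been summed out, since sum_w P(w | z) = 1.
    return sum(p_y1_given_xz[(x, z)] * p_z[z] for z in (0, 1))
```

Both functions return the same value for every x, but the simplified form no longer requires w to be measured, which matters when w is subject to missing data or measurement error.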

Learnable Image Encryption


Title	Learnable Image Encryption
Authors	Masayuki Tanaka
Abstract	Network-based machine learning algorithms are very powerful tools. However, they require huge training datasets. Researchers often encounter privacy issues when they collect image datasets, especially for surveillance applications. A learnable image encryption scheme is introduced. The key idea of this scheme is to encrypt images so that humans cannot understand them but the network can still be trained with the encrypted images. This scheme allows us to train the network without the privacy issues. In this paper, a simple learnable image encryption algorithm is proposed. Then, the proposed algorithm is validated with the CIFAR dataset.
Tasks
Published	2018-03-19
URL	http://arxiv.org/abs/1804.00490v1
PDF	http://arxiv.org/pdf/1804.00490v1.pdf
PWC	https://paperswithcode.com/paper/learnable-image-encryption
Repo
Framework
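
The abstract does not spell out the encryption algorithm, so the following is only one simple scheme in the same spirit: a keyed permutation of pixel positions applied identically inside every tile. The function names, the tile size, and the use of Python's `random.Random` as the key-driven shuffler are all assumptions for illustration, and this toy offers no real cryptographic security.

```python
import random

def _block_permutation(block, key):
    # One fixed permutation of the block*block positions, derived from the key.
    perm = list(range(block * block))
    random.Random(key).shuffle(perm)
    return perm

def shuffle_blocks(image, block, key, inverse=False):
    """Permute pixels inside every block x block tile of a 2D list `image`."""
    height, width = len(image), len(image[0])
    assert height % block == 0 and width % block == 0
    perm = _block_permutation(block, key)
    out = [row[:] for row in image]
    for top in range(0, height, block):
        for left in range(0, width, block):
            for dst, src in enumerate(perm):
                if inverse:
                    # Undo the forward mapping by swapping the roles.
                    dst, src = src, dst
                out[top + dst // block][left + dst % block] = \
                    image[top + src // block][left + src % block]
    return out
```

Because every tile is scrambled in the same way, the spatial statistics a network relies on are transformed consistently across the image, while the picture is no longer easy for a person to read. Calling the function again with `inverse=True` and the same key restores the original.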

Histogram Transform-based Speaker Identification


Title	Histogram Transform-based Speaker Identification
Authors	Zhanyu Ma, Hong Yu
Abstract	A novel text-independent speaker identification (SI) method is proposed. This method uses the Mel-frequency cepstral coefficients (MFCCs) and the dynamic information among adjacent frames as feature sets to capture a speaker’s characteristics. In order to utilize dynamic information, we design super-MFCC features by cascading three neighboring MFCC frames together. The probability density function (PDF) of these super-MFCC features is estimated by the recently proposed histogram transform (HT) method, which generates more training data by random transforms to realize the histogram PDF estimation and mitigates the commonly occurring discontinuity problem in multivariate histogram computation. Compared to conventional PDF estimation methods, such as Gaussian mixture models, the HT model shows promising improvement in SI performance.
Tasks	Speaker Identification
Published	2018-08-02
URL	https://arxiv.org/abs/1808.00959v2
PDF	https://arxiv.org/pdf/1808.00959v2.pdf
PWC	https://paperswithcode.com/paper/histogram-transform-based-speaker
Repo
Framework
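
The construction of super-MFCC features by cascading three neighboring frames can be sketched as follows. The function name `super_mfcc`, the stride of one frame, and the choice to drop the first and last frame (which lack a neighbour) are assumptions; the abstract does not state how edges are handled.

```python
def super_mfcc(frames):
    """Concatenate each frame with its left and right neighbours.

    `frames` is a list of equal-length MFCC vectors (lists of floats).
    Returns len(frames) - 2 vectors, each three times as long.
    """
    return [frames[t - 1] + frames[t] + frames[t + 1]
            for t in range(1, len(frames) - 1)]
```

For example, four 2-dimensional frames yield two 6-dimensional super-frames, each carrying the short-term dynamics around its centre frame.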

A Survey of Inverse Reinforcement Learning: Challenges, Methods and Progress


Title	A Survey of Inverse Reinforcement Learning: Challenges, Methods and Progress
Authors	Saurabh Arora, Prashant Doshi
Abstract	Inverse reinforcement learning (IRL) is the problem of inferring the reward function of an observed agent, given its policy or behavior. Researchers perceive IRL both as a problem and as a class of methods. By categorically surveying the current literature in IRL, this article serves as a reference for researchers and practitioners in machine learning to understand the challenges of IRL and select the approaches best suited for the problem at hand. The survey formally introduces the IRL problem along with its central challenges, which include accurate inference, generalizability, correctness of prior knowledge, and growth in solution complexity with problem size. The article elaborates on how the current methods mitigate these challenges. We further discuss the extensions of traditional IRL methods: (i) inaccurate and incomplete perception, (ii) incomplete model, (iii) multiple rewards, and (iv) non-linear reward functions. This discussion concludes with some broad advances in the research area and currently open research questions.
Tasks
Published	2018-06-18
URL	https://arxiv.org/abs/1806.06877v2
PDF	https://arxiv.org/pdf/1806.06877v2.pdf
PWC	https://paperswithcode.com/paper/a-survey-of-inverse-reinforcement-learning
Repo
Framework

Weakly Supervised Training of Speaker Identification Models


Title	Weakly Supervised Training of Speaker Identification Models
Authors	Martin Karu, Tanel Alumäe
Abstract	We propose an approach for training speaker identification models in a weakly supervised manner. We concentrate on the setting where the training data consists of a set of audio recordings and the speaker annotation is provided only at the recording level. The method uses speaker diarization to find unique speakers in each recording, and i-vectors to project the speech of each speaker to a fixed-dimensional vector. A neural network is then trained to map i-vectors to speakers, using a special objective function that allows the model to be optimized using recording-level speaker labels. We report experiments on two different real-world datasets. On the VoxCeleb dataset, the method provides 94.6% accuracy on a closed-set speaker identification task, surpassing the baseline performance by a large margin. On an Estonian broadcast news dataset, the method provides 66% time-weighted speaker identification recall at 93% precision.
Tasks	Speaker Diarization, Speaker Identification
Published	2018-06-22
URL	http://arxiv.org/abs/1806.08621v1
PDF	http://arxiv.org/pdf/1806.08621v1.pdf
PWC	https://paperswithcode.com/paper/weakly-supervised-training-of-speaker
Repo
Framework

Comparison of RNN Encoder-Decoder Models for Anomaly Detection


Title	Comparison of RNN Encoder-Decoder Models for Anomaly Detection
Authors	YeongHyeon Park, Il Dong Yun
Abstract	In this paper, we compare different types of Recurrent Neural Network (RNN) Encoder-Decoders from an anomaly detection viewpoint. We focused on finding the model that can learn the same data more effectively. We compared multiple models under the same conditions, such as the number of parameters, optimizer, and learning rate. The only difference is whether the model predicts the future sequence or restores the current sequence. We constructed the dataset with simple vectors and used them for the experiment. Finally, we experimentally confirmed that the model performs better when it restores the current sequence rather than predicting the future sequence.
Tasks	Anomaly Detection
Published	2018-07-17
URL	http://arxiv.org/abs/1807.06576v2
PDF	http://arxiv.org/pdf/1807.06576v2.pdf
PWC	https://paperswithcode.com/paper/comparison-of-rnn-encoder-decoder-models-for
Repo
Framework
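
The two training setups compared above differ only in the decoder target. A minimal sketch of how the (input, target) windows could be built for each setup is given below; the helper name `make_pairs`, the sliding stride of one step, and the equal input and target lengths are assumptions, not details taken from the paper.

```python
def make_pairs(series, window, mode):
    """Slice `series` into (input, target) windows for an encoder-decoder.

    mode="restore": the target is the input window itself.
    mode="predict": the target is the window that immediately follows.
    """
    if mode not in ("restore", "predict"):
        raise ValueError(mode)
    pairs = []
    # Prediction needs room for a second window after the input window.
    last_start = len(series) - window * (2 if mode == "predict" else 1)
    for start in range(last_start + 1):
        source = series[start:start + window]
        if mode == "restore":
            target = source
        else:
            target = series[start + window:start + 2 * window]
        pairs.append((source, target))
    return pairs
```

In either mode, an anomaly score could then be taken as the error between the decoder output and the target, with large errors flagging unusual sequences.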

Random Occlusion-recovery for Person Re-identification


Title	Random Occlusion-recovery for Person Re-identification
Authors	Di Wu, Kun Zhang, Fei Cheng, Yang Zhao, Qi Liu, Chang-An Yuan, De-Shuang Huang
Abstract	As a basic task of a multi-camera surveillance system, person re-identification aims to re-identify a query pedestrian observed from multiple non-overlapping cameras or at different times with a single camera. Recently, deep learning-based person re-identification models have achieved great success in many benchmarks. However, these supervised models require a large amount of labeled image data, and the process of manual labeling costs much manpower and time. In this study, we introduce a method to automatically synthesize labeled person images and adopt them to increase the number of samples per identity in person re-identification datasets. To be specific, we use block rectangles to randomly occlude pedestrian images. Then, a generative adversarial network (GAN) model is proposed that uses paired occluded and original images to synthesize de-occluded images that are similar but not identical to the original image. Afterwards, we annotate the de-occluded images with the same labels as their corresponding raw images and use them to augment the number of samples per identity. Finally, we use the augmented datasets to train a baseline model. The experimental results on the CUHK03, Market-1501 and DukeMTMC-reID datasets show the effectiveness of the proposed method.
Tasks	Person Re-Identification
Published	2018-09-26
URL	http://arxiv.org/abs/1809.09970v3
PDF	http://arxiv.org/pdf/1809.09970v3.pdf
PWC	https://paperswithcode.com/paper/random-occlusion-recovery-for-person-re
Repo
Framework
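
The random occlusion step can be sketched as follows. This is an illustration only: the function name `random_occlude`, the single rectangle per image, the `max_fraction` size limit, and the fill value are assumptions, since the abstract does not give the rectangle sizes or fill used by the authors.

```python
import random

def random_occlude(image, max_fraction=0.5, fill=0, rng=None):
    """Return a copy of a 2D list `image` with one random rectangle set to `fill`."""
    rng = rng or random.Random()
    height, width = len(image), len(image[0])
    # Rectangle size: at least 1 pixel, at most max_fraction of each side.
    box_h = rng.randint(1, max(1, int(height * max_fraction)))
    box_w = rng.randint(1, max(1, int(width * max_fraction)))
    top = rng.randint(0, height - box_h)
    left = rng.randint(0, width - box_w)
    occluded = [row[:] for row in image]
    for r in range(top, top + box_h):
        for c in range(left, left + box_w):
            occluded[r][c] = fill
    return occluded, (top, left, box_h, box_w)
```

Each (occluded, original) pair would then form one training example for the recovery GAN, and the recovered image inherits the identity label of the original.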

A Factorial Mixture Prior for Compositional Deep Generative Models


Title	A Factorial Mixture Prior for Compositional Deep Generative Models
Authors	Ulrich Paquet, Sumedh K. Ghaisas, Olivier Tieleman
Abstract	We assume that a high-dimensional datum, like an image, is a compositional expression of a set of properties, with a complicated non-linear relationship between the datum and its properties. This paper proposes a factorial mixture prior for capturing latent properties, thereby adding structured compositionality to deep generative models. The prior treats a latent vector as belonging to a Cartesian product of subspaces, each of which is quantized separately with a Gaussian mixture model. Some mixture components can be set to represent properties as observed random variables whenever labeled properties are present. Through a combination of stochastic variational inference and gradient descent, a method for learning how to infer discrete properties in an unsupervised or semi-supervised way is outlined and empirically evaluated.
Tasks
Published	2018-12-18
URL	http://arxiv.org/abs/1812.07480v1
PDF	http://arxiv.org/pdf/1812.07480v1.pdf
PWC	https://paperswithcode.com/paper/a-factorial-mixture-prior-for-compositional
Repo
Framework
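
The structure of such a prior, a latent vector split into subspaces with an independent Gaussian mixture on each, can be sketched as a log-density computation. The function name, the isotropic unit-variance components, and the uniform mixing weights are simplifying assumptions made here; the paper's actual parameterisation may differ.

```python
import math

def log_factorial_mixture_prior(z, means_per_subspace):
    """Log density of z under a product of per-subspace Gaussian mixtures.

    `means_per_subspace[s]` lists the component means of subspace s; all
    subspaces together must cover z exactly. Components are isotropic with
    unit variance and uniform mixing weights (simplifying assumptions).
    """
    total, offset = 0.0, 0
    for means in means_per_subspace:
        dim = len(means[0])
        chunk = z[offset:offset + dim]
        offset += dim
        # Log density of the chunk under each unit-variance component.
        logs = [-0.5 * sum((a - m) ** 2 for a, m in zip(chunk, mean))
                - 0.5 * dim * math.log(2 * math.pi)
                for mean in means]
        # Log-sum-exp over components, with uniform weights 1/K.
        peak = max(logs)
        total += (peak + math.log(sum(math.exp(v - peak) for v in logs))
                  - math.log(len(means)))
    assert offset == len(z), "subspace dimensions must sum to len(z)"
    return total
```

Because the log density is a sum over subspaces, each subspace can select its mixture component independently, which is what lets K components per subspace describe K to the power S joint combinations for S subspaces.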

MLE-induced Likelihood for Markov Random Fields


Title	MLE-induced Likelihood for Markov Random Fields
Authors	Jie Liu, Hao Zheng
Abstract	Due to the intractable partition function, the exact likelihood function for a Markov random field (MRF), in many situations, can only be approximated. Major approximation approaches include pseudolikelihood and Laplace approximation. In this paper, we propose a novel way of approximating the likelihood function through first approximating the marginal likelihood functions of individual parameters and then reconstructing the joint likelihood function from these marginal likelihood functions. For approximating the marginal likelihood functions, we derive a particular likelihood function from a modified scenario of coin tossing which is useful for capturing how one parameter interacts with the remaining parameters in the likelihood function. For reconstructing the joint likelihood function, we use an appropriate copula to link up these marginal likelihood functions. Numerical investigation suggests the superior performance of our approach. Especially as the size of the MRF increases, both the numerical performance and the computational cost of our approach remain consistently satisfactory, whereas Laplace approximation deteriorates and pseudolikelihood becomes computationally unbearable.
Tasks
Published	2018-03-27
URL	http://arxiv.org/abs/1803.09887v1
PDF	http://arxiv.org/pdf/1803.09887v1.pdf
PWC	https://paperswithcode.com/paper/mle-induced-likelihood-for-markov-random
Repo
Framework
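
As background for the pseudolikelihood baseline named in the abstract (not the proposed MLE-induced likelihood), the sketch below evaluates the log pseudolikelihood of an Ising-type MRF. The model form, the function name, and the plus/minus one spin encoding are assumptions chosen for illustration; the point is that each conditional is available in closed form, so the intractable partition function never appears.

```python
import math

def ising_log_pseudolikelihood(spins, couplings, fields):
    """Sum over sites of log P(x_i | all other spins) for an Ising model.

    spins: list of +1/-1 values; couplings: symmetric matrix J (list of lists)
    with zero diagonal; fields: list of per-site biases h.
    Model: P(x) proportional to exp(sum_i h_i x_i + sum_{i<j} J_ij x_i x_j).
    """
    total = 0.0
    for i, x_i in enumerate(spins):
        local = fields[i] + sum(couplings[i][j] * x_j
                                for j, x_j in enumerate(spins) if j != i)
        # P(x_i | rest) = 1 / (1 + exp(-2 * x_i * local)); no partition function needed.
        total -= math.log1p(math.exp(-2.0 * x_i * local))
    return total
```

With all couplings and fields at zero, every conditional equals one half, so the result is the number of sites times log(1/2).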

Deep Learning Models Delineates Multiple Nuclear Phenotypes in H&E Stained Histology Sections


Title	Deep Learning Models Delineates Multiple Nuclear Phenotypes in H&E Stained Histology Sections
Authors	Mina Khoshdeli, Bahram Parvin
Abstract	Nuclear segmentation is an important step for profiling aberrant regions of histology sections. However, segmentation is a complex problem as a result of variations in nuclear geometry (e.g., size, shape), nuclear type (e.g., epithelial, fibroblast), and nuclear phenotypes (e.g., vesicular, aneuploidy). The problem is further complicated by variations in sample preparation. It is shown and validated that fusion of very deep convolutional networks overcomes (i) complexities associated with multiple nuclear phenotypes, and (ii) separation of overlapping nuclei. The fusion relies on integrating networks that learn region- and boundary-based representations. The system has been validated on a diverse set of nuclear phenotypes that correspond to breast and brain histology sections.
Tasks	Nuclear Segmentation
Published	2018-02-13
URL	http://arxiv.org/abs/1802.04427v2
PDF	http://arxiv.org/pdf/1802.04427v2.pdf
PWC	https://paperswithcode.com/paper/deep-learning-models-delineates-multiple
Repo
Framework

Multi-view Sentence Representation Learning


Title	Multi-view Sentence Representation Learning
Authors	Shuai Tang, Virginia R. de Sa
Abstract	Multi-view learning can provide self-supervision when different views are available of the same data. The distributional hypothesis provides another form of useful self-supervision from adjacent sentences, which are plentiful in large unlabelled corpora. Motivated by the asymmetry in the two hemispheres of the human brain as well as the observation that different learning architectures tend to emphasise different aspects of sentence meaning, we create a unified multi-view sentence representation learning framework in which one view encodes the input sentence with a Recurrent Neural Network (RNN), the other view encodes it with a simple linear model, and the training objective is to maximise the agreement specified by the adjacent context information between the two views. We show that, after training, the vectors produced from our multi-view training provide improved representations over the single-view training, and the combination of different views gives further representational improvement and demonstrates solid transferability on standard downstream tasks.
Tasks	Multi-View Learning, Representation Learning
Published	2018-05-18
URL	http://arxiv.org/abs/1805.07443v1
PDF	http://arxiv.org/pdf/1805.07443v1.pdf
PWC	https://paperswithcode.com/paper/multi-view-sentence-representation-learning
Repo
Framework