Paper Group ANR 889
Generative Creativity: Adversarial Learning for Bionic Design. Artistic Object Recognition by Unsupervised Style Adaptation. Mono-Camera 3D Multi-Object Tracking Using Deep Learning Detections and PMBM Filtering. On Tighter Generalization Bound for Deep Neural Networks: CNNs, ResNets, and Beyond. Simplifying Probabilistic Expressions in Causal Infe …
Generative Creativity: Adversarial Learning for Bionic Design
Title | Generative Creativity: Adversarial Learning for Bionic Design |
Authors | Simiao Yu, Hao Dong, Pan Wang, Chao Wu, Yike Guo |
Abstract | Bionic design refers to an approach of generative creativity in which a target object (e.g. a floor lamp) is designed to contain features of biological source objects (e.g. flowers), resulting in creative biologically-inspired design. In this work, we attempt to model the process of shape-oriented bionic design as follows: given an input image of a design target object, the model generates images that 1) maintain shape features of the input design target image, 2) contain shape features of images from the specified biological source domain, 3) are plausible and diverse. We propose DesignGAN, a novel unsupervised deep generative approach to realising bionic design. Specifically, we employ a conditional Generative Adversarial Network (GAN) architecture with several designated losses (an adversarial loss, a regression loss, a cycle loss and a latent loss) that respectively constrain our model to meet the corresponding aforementioned requirements of bionic design modelling. We perform qualitative and quantitative experiments to evaluate our method, and demonstrate that our proposed approach successfully generates creative images of bionic design. |
Tasks | |
Published | 2018-05-19 |
URL | http://arxiv.org/abs/1805.07615v1 |
http://arxiv.org/pdf/1805.07615v1.pdf | |
PWC | https://paperswithcode.com/paper/generative-creativity-adversarial-learning |
Repo | |
Framework | |
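The four DesignGAN losses above can be pictured as a weighted sum over terms that each enforce one requirement. Below is a minimal, hedged sketch of that combination; the helper functions, the L1 distances, and the loss weights are illustrative assumptions, not the paper's actual formulation or hyperparameters.

```python
import math

def adversarial_loss(d_fake):
    # Generator side of a non-saturating GAN loss: push D(G(x)) toward 1
    # so outputs look plausible.
    return -math.log(max(d_fake, 1e-12))

def regression_loss(shape_target, shape_generated):
    # L1 distance between shape features: keep the target's shape.
    return sum(abs(a - b) for a, b in zip(shape_target, shape_generated))

def cycle_loss(x, x_reconstructed):
    # L1 reconstruction after mapping to the source domain and back:
    # inject source-domain shape features without losing the input.
    return sum(abs(a - b) for a, b in zip(x, x_reconstructed))

def latent_loss(z, z_recovered):
    # Encourage diverse outputs by making the latent code recoverable.
    return sum(abs(a - b) for a, b in zip(z, z_recovered))

def total_loss(d_fake, shape_t, shape_g, x, x_rec, z, z_rec,
               w_adv=1.0, w_reg=10.0, w_cyc=10.0, w_lat=1.0):
    # Illustrative weights only; the paper's hyperparameters are not given here.
    return (w_adv * adversarial_loss(d_fake)
            + w_reg * regression_loss(shape_t, shape_g)
            + w_cyc * cycle_loss(x, x_rec)
            + w_lat * latent_loss(z, z_rec))
```

With a perfectly fooled discriminator and zero reconstruction errors, the total is zero; any deviation in any term raises it.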
Artistic Object Recognition by Unsupervised Style Adaptation
Title | Artistic Object Recognition by Unsupervised Style Adaptation |
Authors | Christopher Thomas, Adriana Kovashka |
Abstract | Computer vision systems currently lack the ability to reliably recognize artistically rendered objects, especially when such data is limited. In this paper, we propose a method for recognizing objects in artistic modalities (such as paintings, cartoons, or sketches), without requiring any labeled data from those modalities. Our method explicitly accounts for stylistic domain shifts between and within domains. To do so, we introduce a complementary training modality constructed to be similar in artistic style to the target domain, and enforce that the network learns features that are invariant between the two training modalities. We show how such artificial labeled source domains can be generated automatically through the use of style transfer techniques, using diverse target images to represent the style in the target domain. Unlike existing methods which require a large amount of unlabeled target data, our method can work with as few as ten unlabeled images. We evaluate it on a number of cross-domain object and scene classification tasks and on a new dataset we release. Our experiments show that our approach, though conceptually simple, significantly improves the accuracy that existing domain adaptation techniques obtain for artistic object recognition. |
Tasks | Domain Adaptation, Object Recognition, Scene Classification, Style Transfer |
Published | 2018-12-28 |
URL | http://arxiv.org/abs/1812.11139v1 |
http://arxiv.org/pdf/1812.11139v1.pdf | |
PWC | https://paperswithcode.com/paper/artistic-object-recognition-by-unsupervised |
Repo | |
Framework | |
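The invariance constraint described above — features from photos and from their style-transferred counterparts should be indistinguishable — can be caricatured with a simple first-moment penalty. This is a hedged stand-in for the paper's actual mechanism, with toy two-dimensional features:

```python
# Sketch only: penalize the squared distance between mean feature
# vectors of the photo modality and the stylized training modality.
# The paper's invariance objective differs; this illustrates the idea.

def mean_feature(batch):
    dim = len(batch[0])
    return [sum(f[i] for f in batch) / len(batch) for i in range(dim)]

def invariance_penalty(photo_feats, stylized_feats):
    mp = mean_feature(photo_feats)
    ms = mean_feature(stylized_feats)
    return sum((a - b) ** 2 for a, b in zip(mp, ms))

photos = [[1.0, 0.0], [0.8, 0.2]]
stylized_close = [[0.9, 0.1], [0.9, 0.1]]   # features already aligned
stylized_far = [[5.0, 5.0], [4.0, 6.0]]     # features drifted apart
assert invariance_penalty(photos, stylized_close) < invariance_penalty(photos, stylized_far)
```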
Mono-Camera 3D Multi-Object Tracking Using Deep Learning Detections and PMBM Filtering
Title | Mono-Camera 3D Multi-Object Tracking Using Deep Learning Detections and PMBM Filtering |
Authors | Samuel Scheidegger, Joachim Benjaminsson, Emil Rosenberg, Amrit Krishnan, Karl Granstrom |
Abstract | Monocular cameras are one of the most commonly used sensors in the automotive industry for autonomous vehicles. One major drawback of using a monocular camera is that it only makes observations in the two-dimensional image plane and cannot directly measure the distance to objects. In this paper, we aim at filling this gap by developing a multi-object tracking algorithm that takes an image as input and produces trajectories of detected objects in a world coordinate system. We solve this by using a deep neural network trained to detect and estimate the distance to objects from a single input image. The detections from a sequence of images are fed into a state-of-the-art Poisson multi-Bernoulli mixture (PMBM) tracking filter. The combination of the learned detector and the PMBM filter results in an algorithm that achieves 3D tracking using only mono-camera images as input. The performance of the algorithm is evaluated both in 3D world coordinates and 2D image coordinates, using the publicly available KITTI object tracking dataset. The algorithm accurately tracks objects and correctly handles data associations, even when there is a large overlap of the objects in the image, and is one of the top performing algorithms on the KITTI object tracking benchmark. Furthermore, the algorithm is efficient, running on average close to 20 frames per second. |
Tasks | 3D Multi-Object Tracking, Autonomous Vehicles, Multi-Object Tracking, Object Tracking |
Published | 2018-02-27 |
URL | http://arxiv.org/abs/1802.09975v1 |
http://arxiv.org/pdf/1802.09975v1.pdf | |
PWC | https://paperswithcode.com/paper/mono-camera-3d-multi-object-tracking-using |
Repo | |
Framework | |
On Tighter Generalization Bound for Deep Neural Networks: CNNs, ResNets, and Beyond
Title | On Tighter Generalization Bound for Deep Neural Networks: CNNs, ResNets, and Beyond |
Authors | Xingguo Li, Junwei Lu, Zhaoran Wang, Jarvis Haupt, Tuo Zhao |
Abstract | We establish a margin-based, data-dependent generalization error bound for a general family of deep neural networks in terms of the depth and width, as well as the Jacobian of the networks. By introducing a new characterization of the Lipschitz properties of the neural network family, we achieve significantly tighter generalization bounds than existing results. Moreover, we show that the generalization bound can be further improved for bounded losses. Aside from general feedforward deep neural networks, our results can be applied to derive new bounds for popular architectures, including convolutional neural networks (CNNs) and residual networks (ResNets). When achieving the same generalization errors as previous results, our bounds allow for the choice of larger parameter spaces of weight matrices, inducing potentially stronger expressive ability for neural networks. Numerical evaluation is also provided to support our theory. |
Tasks | |
Published | 2018-06-13 |
URL | https://arxiv.org/abs/1806.05159v4 |
https://arxiv.org/pdf/1806.05159v4.pdf | |
PWC | https://paperswithcode.com/paper/on-tighter-generalization-bound-for-deep |
Repo | |
Framework | |
Simplifying Probabilistic Expressions in Causal Inference
Title | Simplifying Probabilistic Expressions in Causal Inference |
Authors | Santtu Tikka, Juha Karvanen |
Abstract | Obtaining a non-parametric expression for an interventional distribution is one of the most fundamental tasks in causal inference. Such an expression can be obtained for an identifiable causal effect by an algorithm or by manual application of do-calculus. Often we are left with a complicated expression which can lead to biased or inefficient estimates when missing data or measurement errors are involved. We present an automatic simplification algorithm that seeks to eliminate symbolically unnecessary variables from these expressions by taking advantage of the structure of the underlying graphical model. Our method is applicable to all causal effect formulas and is readily available in the R package causaleffect. |
Tasks | Causal Inference |
Published | 2018-06-19 |
URL | http://arxiv.org/abs/1806.07082v1 |
http://arxiv.org/pdf/1806.07082v1.pdf | |
PWC | https://paperswithcode.com/paper/simplifying-probabilistic-expressions-in |
Repo | |
Framework | |
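The kind of simplification the abstract describes can be checked numerically on a toy distribution: in an expression such as Σ_z P(y|x,z)·P(z|x), the variable z is symbolically unnecessary and the sum collapses to P(y|x). The sketch below (plain Python, not the R package causaleffect) verifies this on a random joint table over binary variables:

```python
import itertools
import random

random.seed(0)
# Toy joint distribution P(x, y, z) over binary variables, used only to
# check the simplification numerically (all values are random, then normalized).
joint = {xyz: random.random() for xyz in itertools.product([0, 1], repeat=3)}
total = sum(joint.values())
joint = {k: v / total for k, v in joint.items()}

def marg(x=None, y=None, z=None):
    # Marginal probability of the given fixed values.
    return sum(p for (xi, yi, zi), p in joint.items()
               if (x is None or xi == x)
               and (y is None or yi == y)
               and (z is None or zi == z))

# "Complicated" expression: sum_z P(y=1 | x=1, z) * P(z | x=1)
lhs = sum(marg(x=1, y=1, z=zz) / marg(x=1, z=zz) * marg(x=1, z=zz) / marg(x=1)
          for zz in (0, 1))
# Simplified expression after eliminating z: P(y=1 | x=1)
rhs = marg(x=1, y=1) / marg(x=1)
assert abs(lhs - rhs) < 1e-9  # the sum over z collapses
```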
Learnable Image Encryption
Title | Learnable Image Encryption |
Authors | Masayuki Tanaka |
Abstract | Network-based machine learning algorithms are very powerful tools. However, they require huge training datasets. Researchers often face privacy issues when they collect image datasets, especially for surveillance applications. A learnable image encryption scheme is introduced. The key idea of this scheme is to encrypt images so that humans cannot understand them but the network can be trained with the encrypted images. This scheme allows us to train the network without privacy issues. In this paper, a simple learnable image encryption algorithm is proposed. Then, the proposed algorithm is validated on the CIFAR dataset. |
Tasks | |
Published | 2018-03-19 |
URL | http://arxiv.org/abs/1804.00490v1 |
http://arxiv.org/pdf/1804.00490v1.pdf | |
PWC | https://paperswithcode.com/paper/learnable-image-encryption |
Repo | |
Framework | |
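A hedged sketch of the core idea: a fixed, key-derived, invertible pixel scrambling hides content from humans while leaving a consistent structure that a network can be trained on, because every image is transformed the same way. The paper's actual block-wise transform differs in detail; everything below is illustrative.

```python
import random

def make_key(num_pixels, seed=42):
    # Derive a fixed pixel permutation from a secret seed (the "key").
    perm = list(range(num_pixels))
    random.Random(seed).shuffle(perm)
    return perm

def encrypt(image, key):
    # image: flat list of pixel values; apply the fixed permutation.
    return [image[i] for i in key]

def decrypt(cipher, key):
    # Invert the permutation to recover the original image.
    out = [0] * len(key)
    for dst, src in enumerate(key):
        out[src] = cipher[dst]
    return out

img = [10, 20, 30, 40, 50, 60]
key = make_key(len(img))
enc = encrypt(img, key)
assert decrypt(enc, key) == img  # the transform is invertible
```

Because the same permutation is applied to every training image, the network can learn stable features from encrypted inputs even though the pixels are spatially scrambled.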
Histogram Transform-based Speaker Identification
Title | Histogram Transform-based Speaker Identification |
Authors | Zhanyu Ma, Hong Yu |
Abstract | A novel text-independent speaker identification (SI) method is proposed. This method uses the Mel-frequency cepstral coefficients (MFCCs) and the dynamic information among adjacent frames as feature sets to capture a speaker's characteristics. In order to utilize dynamic information, we design super-MFCC features by cascading three neighboring MFCC frames together. The probability density function (PDF) of these super-MFCC features is estimated by the recently proposed histogram transform (HT) method, which generates more training data by random transforms to realize the histogram PDF estimation and mitigates the discontinuity problem that commonly occurs when computing multivariate histograms. Compared to conventional PDF estimation methods, such as Gaussian mixture models, the HT model shows promising improvement in SI performance. |
Tasks | Speaker Identification |
Published | 2018-08-02 |
URL | https://arxiv.org/abs/1808.00959v2 |
https://arxiv.org/pdf/1808.00959v2.pdf | |
PWC | https://paperswithcode.com/paper/histogram-transform-based-speaker |
Repo | |
Framework | |
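The super-MFCC construction, cascading each frame with its two neighbours so that dynamic information across adjacent frames enters the feature vector, is straightforward. A minimal sketch with toy two-dimensional frames (not real MFCC values):

```python
def super_mfcc(frames):
    # frames: list of per-frame MFCC vectors (lists of floats).
    # Each super-frame concatenates the previous, current, and next frame,
    # so a D-dimensional frame becomes a 3*D-dimensional super-frame.
    return [frames[i - 1] + frames[i] + frames[i + 1]
            for i in range(1, len(frames) - 1)]

frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
print(super_mfcc(frames))
# 4 frames of dimension 2 yield 2 super-frames of dimension 6
```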
A Survey of Inverse Reinforcement Learning: Challenges, Methods and Progress
Title | A Survey of Inverse Reinforcement Learning: Challenges, Methods and Progress |
Authors | Saurabh Arora, Prashant Doshi |
Abstract | Inverse reinforcement learning (IRL) is the problem of inferring the reward function of an observed agent, given its policy or behavior. Researchers perceive IRL both as a problem and as a class of methods. By categorically surveying the current literature in IRL, this article serves as a reference for researchers and practitioners in machine learning to understand the challenges of IRL and select the approaches best suited for the problem at hand. The survey formally introduces the IRL problem along with its central challenges, which include accurate inference, generalizability, correctness of prior knowledge, and growth in solution complexity with problem size. The article elaborates how the current methods mitigate these challenges. We further discuss extensions of traditional IRL methods for handling: (i) inaccurate and incomplete perception, (ii) an incomplete model, (iii) multiple rewards, and (iv) non-linear reward functions. This discussion concludes with some broad advances in the research area and currently open research questions. |
Tasks | |
Published | 2018-06-18 |
URL | https://arxiv.org/abs/1806.06877v2 |
https://arxiv.org/pdf/1806.06877v2.pdf | |
PWC | https://paperswithcode.com/paper/a-survey-of-inverse-reinforcement-learning |
Repo | |
Framework | |
Weakly Supervised Training of Speaker Identification Models
Title | Weakly Supervised Training of Speaker Identification Models |
Authors | Martin Karu, Tanel Alumäe |
Abstract | We propose an approach for training speaker identification models in a weakly supervised manner. We concentrate on the setting where the training data consists of a set of audio recordings and the speaker annotation is provided only at the recording level. The method uses speaker diarization to find unique speakers in each recording, and i-vectors to project the speech of each speaker to a fixed-dimensional vector. A neural network is then trained to map i-vectors to speakers, using a special objective function that allows the model to be optimized using recording-level speaker labels. We report experiments on two different real-world datasets. On the VoxCeleb dataset, the method provides 94.6% accuracy on a closed-set speaker identification task, surpassing the baseline performance by a large margin. On an Estonian broadcast news dataset, the method provides 66% time-weighted speaker identification recall at 93% precision. |
Tasks | Speaker Diarization, Speaker Identification |
Published | 2018-06-22 |
URL | http://arxiv.org/abs/1806.08621v1 |
http://arxiv.org/pdf/1806.08621v1.pdf | |
PWC | https://paperswithcode.com/paper/weakly-supervised-training-of-speaker |
Repo | |
Framework | |
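A hedged sketch of what a recording-level objective can look like: per-segment speaker posteriors are max-pooled over the recording, and the loss rewards high pooled probability exactly for the speakers annotated at the recording level. The pooling choice and the cross-entropy form are assumptions for illustration, not the paper's exact objective.

```python
import math

def recording_loss(segment_posteriors, recording_labels, num_speakers):
    # segment_posteriors: list of per-segment probability vectors.
    # Max-pool each speaker's probability over all segments.
    pooled = [max(seg[s] for seg in segment_posteriors)
              for s in range(num_speakers)]
    loss = 0.0
    for s in range(num_speakers):
        target = 1.0 if s in recording_labels else 0.0
        p = min(max(pooled[s], 1e-9), 1 - 1e-9)  # clamp for numeric safety
        loss -= target * math.log(p) + (1 - target) * math.log(1 - p)
    return loss

# Two segments, three candidate speakers; the recording is annotated
# with speakers {0, 2}.
posts = [[0.9, 0.05, 0.05], [0.1, 0.1, 0.8]]
good = recording_loss(posts, {0, 2}, 3)
bad = recording_loss(posts, {1}, 3)
assert good < bad  # matching recording-level labels yield a lower loss
```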
Comparison of RNN Encoder-Decoder Models for Anomaly Detection
Title | Comparison of RNN Encoder-Decoder Models for Anomaly Detection |
Authors | YeongHyeon Park, Il Dong Yun |
Abstract | In this paper, we compare different types of Recurrent Neural Network (RNN) encoder-decoders from an anomaly detection viewpoint. We focus on finding the model that can learn the same data most effectively. We compared multiple models under the same conditions, such as the number of parameters, optimizer, and learning rate; the models differ only in whether they predict the future sequence or restore the current sequence. We constructed a dataset of simple vectors and used it for the experiments. Finally, we experimentally confirmed that the model performs better when it restores the current sequence rather than predicting the future sequence. |
Tasks | Anomaly Detection |
Published | 2018-07-17 |
URL | http://arxiv.org/abs/1807.06576v2 |
http://arxiv.org/pdf/1807.06576v2.pdf | |
PWC | https://paperswithcode.com/paper/comparison-of-rnn-encoder-decoder-models-for |
Repo | |
Framework | |
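Both model variants share the same scoring step: whether the encoder-decoder restores the current sequence or predicts the next one, the anomaly score is the error between its output and the reference sequence. A minimal sketch; the mean-squared-error score and the threshold are illustrative assumptions.

```python
def anomaly_score(reference, output):
    # Mean squared error between the reference sequence and the model output.
    return sum((r - o) ** 2 for r, o in zip(reference, output)) / len(reference)

def is_anomalous(reference, output, threshold=0.5):
    # A sequence the model fails to reconstruct well is flagged as anomalous.
    return anomaly_score(reference, output) > threshold

normal = [0.0, 0.1, 0.0, 0.1]
restored_well = [0.05, 0.1, 0.02, 0.1]   # small reconstruction error
restored_badly = [0.9, 1.1, 0.8, 1.0]    # large reconstruction error
assert not is_anomalous(normal, restored_well)
assert is_anomalous(normal, restored_badly)
```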
Random Occlusion-recovery for Person Re-identification
Title | Random Occlusion-recovery for Person Re-identification |
Authors | Di Wu, Kun Zhang, Fei Cheng, Yang Zhao, Qi Liu, Chang-An Yuan, De-Shuang Huang |
Abstract | As a basic task of multi-camera surveillance systems, person re-identification aims to re-identify a query pedestrian observed from multiple non-overlapping cameras or across different times with a single camera. Recently, deep learning-based person re-identification models have achieved great success on many benchmarks. However, these supervised models require a large amount of labeled image data, and the process of manual labeling consumes considerable manpower and time. In this study, we introduce a method to automatically synthesize labeled person images and adopt them to increase the number of samples per identity for person re-identification datasets. To be specific, we use rectangular blocks to randomly occlude pedestrian images. Then, a generative adversarial network (GAN) model is proposed that uses paired occluded and original images to synthesize de-occluded images that are similar but not identical to the original images. Afterwards, we annotate the de-occluded images with the same labels as their corresponding raw images and use them to augment the number of samples per identity. Finally, we use the augmented datasets to train the baseline model. The experimental results on the CUHK03, Market-1501 and DukeMTMC-reID datasets show the effectiveness of the proposed method. |
Tasks | Person Re-Identification |
Published | 2018-09-26 |
URL | http://arxiv.org/abs/1809.09970v3 |
http://arxiv.org/pdf/1809.09970v3.pdf | |
PWC | https://paperswithcode.com/paper/random-occlusion-recovery-for-person-re |
Repo | |
Framework | |
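The occlusion step is easy to sketch: blank a random rectangle of the image to form the (occluded, original) training pairs for the recovery GAN. The rectangle size, fill value, and list-of-rows image format below are illustrative assumptions.

```python
import random

def random_occlude(image, occ_h, occ_w, fill=0, seed=0):
    # image: list of rows (lists of pixel values).
    # Blank an occ_h x occ_w rectangle at a random position.
    rng = random.Random(seed)
    h, w = len(image), len(image[0])
    top = rng.randrange(h - occ_h + 1)
    left = rng.randrange(w - occ_w + 1)
    out = [row[:] for row in image]  # copy; leave the original intact
    for r in range(top, top + occ_h):
        for c in range(left, left + occ_w):
            out[r][c] = fill
    return out

img = [[1] * 6 for _ in range(6)]
occ = random_occlude(img, 2, 3)
# Exactly 2 * 3 pixels are blanked; the original image is untouched.
assert sum(row.count(0) for row in occ) == 6
assert all(all(v == 1 for v in row) for row in img)
```

Each (occ, img) pair then serves as GAN input and target, and the de-occluded output inherits img's identity label.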
A Factorial Mixture Prior for Compositional Deep Generative Models
Title | A Factorial Mixture Prior for Compositional Deep Generative Models |
Authors | Ulrich Paquet, Sumedh K. Ghaisas, Olivier Tieleman |
Abstract | We assume that a high-dimensional datum, like an image, is a compositional expression of a set of properties, with a complicated non-linear relationship between the datum and its properties. This paper proposes a factorial mixture prior for capturing latent properties, thereby adding structured compositionality to deep generative models. The prior treats a latent vector as belonging to a Cartesian product of subspaces, each of which is quantized separately with a Gaussian mixture model. Some mixture components can be set to represent properties as observed random variables whenever labeled properties are present. Through a combination of stochastic variational inference and gradient descent, a method for learning how to infer discrete properties in an unsupervised or semi-supervised way is outlined and empirically evaluated. |
Tasks | |
Published | 2018-12-18 |
URL | http://arxiv.org/abs/1812.07480v1 |
http://arxiv.org/pdf/1812.07480v1.pdf | |
PWC | https://paperswithcode.com/paper/a-factorial-mixture-prior-for-compositional |
Repo | |
Framework | |
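Sampling from a factorial mixture prior can be sketched directly from the abstract's description: the latent vector lives in a Cartesian product of subspaces, and each subspace is quantized by its own Gaussian mixture, so each property contributes one discrete component choice plus a continuous coordinate. Here each subspace is a 1-D, two-component mixture with toy means and weights; all values are illustrative.

```python
import random

def sample_subspace(means, weights, sigma, rng):
    # Pick a mixture component for this property, then sample around its mean.
    k = rng.choices(range(len(means)), weights=weights)[0]
    return k, rng.gauss(means[k], sigma)

def sample_latent(subspaces, rng=None):
    # Concatenate one sample per property subspace into the latent vector.
    rng = rng or random.Random(0)
    labels, values = [], []
    for means, weights in subspaces:
        k, v = sample_subspace(means, weights, 0.1, rng)
        labels.append(k)
        values.append(v)
    return labels, values

# Two subspaces, e.g. one per latent property (assumed structure).
subspaces = [([-1.0, 1.0], [0.5, 0.5]), ([0.0, 5.0], [0.9, 0.1])]
labels, z = sample_latent(subspaces)
assert len(z) == len(subspaces)  # one coordinate per property subspace
```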
MLE-induced Likelihood for Markov Random Fields
Title | MLE-induced Likelihood for Markov Random Fields |
Authors | Jie Liu, Hao Zheng |
Abstract | Due to the intractable partition function, the exact likelihood function for a Markov random field (MRF), in many situations, can only be approximated. Major approximation approaches include pseudolikelihood and Laplace approximation. In this paper, we propose a novel way of approximating the likelihood function through first approximating the marginal likelihood functions of individual parameters and then reconstructing the joint likelihood function from these marginal likelihood functions. For approximating the marginal likelihood functions, we derive a particular likelihood function from a modified scenario of coin tossing which is useful for capturing how one parameter interacts with the remaining parameters in the likelihood function. For reconstructing the joint likelihood function, we use an appropriate copula to link up these marginal likelihood functions. Numerical investigation suggests the superior performance of our approach. Especially as the size of the MRF increases, both the numerical performance and the computational cost of our approach remain consistently satisfactory, whereas Laplace approximation deteriorates and pseudolikelihood becomes computationally unbearable. |
Tasks | |
Published | 2018-03-27 |
URL | http://arxiv.org/abs/1803.09887v1 |
http://arxiv.org/pdf/1803.09887v1.pdf | |
PWC | https://paperswithcode.com/paper/mle-induced-likelihood-for-markov-random |
Repo | |
Framework | |
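The reconstruction step can be caricatured with the simplest possible copula, the independence copula, under which the joint log-likelihood surface is just the sum of the marginal log-likelihood curves. The paper links marginals with an appropriate dependence-capturing copula instead; the quadratic toy marginals below are purely illustrative.

```python
def joint_loglik(marginal_logliks, theta):
    # Independence-copula reconstruction: sum per-parameter marginal
    # log-likelihoods evaluated at the corresponding coordinates of theta.
    return sum(ll(t) for ll, t in zip(marginal_logliks, theta))

# Toy quadratic marginal log-likelihood curves peaking at 1.0 and -2.0.
m1 = lambda t: -(t - 1.0) ** 2
m2 = lambda t: -(t + 2.0) ** 2

# The reconstructed surface peaks where every marginal peaks.
assert joint_loglik([m1, m2], [1.0, -2.0]) == 0.0
assert joint_loglik([m1, m2], [0.0, 0.0]) < 0.0
```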
Deep Learning Models Delineates Multiple Nuclear Phenotypes in H&E Stained Histology Sections
Title | Deep Learning Models Delineates Multiple Nuclear Phenotypes in H&E Stained Histology Sections |
Authors | Mina Khoshdeli, Bahram Parvin |
Abstract | Nuclear segmentation is an important step for profiling aberrant regions of histology sections. However, segmentation is a complex problem as a result of variations in nuclear geometry (e.g., size, shape), nuclear type (e.g., epithelial, fibroblast), and nuclear phenotypes (e.g., vesicular, aneuploid). The problem is further complicated by variations in sample preparation. It is shown and validated that fusion of very deep convolutional networks (i) overcomes complexities associated with multiple nuclear phenotypes, and (ii) enables separation of overlapping nuclei. The fusion relies on integrating networks that learn region- and boundary-based representations. The system has been validated on a diverse set of nuclear phenotypes from breast and brain histology sections. |
Tasks | Nuclear Segmentation |
Published | 2018-02-13 |
URL | http://arxiv.org/abs/1802.04427v2 |
http://arxiv.org/pdf/1802.04427v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-models-delineates-multiple |
Repo | |
Framework | |
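One way to picture the region/boundary fusion: suppress the region network's nuclear probability map wherever the boundary network is confident, which helps separate touching nuclei. The multiplicative combination rule below is an illustrative assumption, not the paper's actual fusion network.

```python
def fuse(region_map, boundary_map):
    # Keep region probability where no boundary is detected; damp it
    # where the boundary network fires.
    return [[r * (1.0 - b) for r, b in zip(rrow, brow)]
            for rrow, brow in zip(region_map, boundary_map)]

# Two touching nuclei: the region map is confident everywhere, while the
# boundary map fires at the junction between them.
region = [[0.9, 0.9, 0.9, 0.9]]
boundary = [[0.0, 0.8, 0.8, 0.0]]
fused = fuse(region, boundary)
assert fused[0][0] > 0.5 > fused[0][1]  # interiors kept, junction suppressed
```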
Multi-view Sentence Representation Learning
Title | Multi-view Sentence Representation Learning |
Authors | Shuai Tang, Virginia R. de Sa |
Abstract | Multi-view learning can provide self-supervision when different views are available of the same data. The distributional hypothesis provides another form of useful self-supervision from adjacent sentences, which are plentiful in large unlabelled corpora. Motivated by the asymmetry in the two hemispheres of the human brain, as well as the observation that different learning architectures tend to emphasise different aspects of sentence meaning, we create a unified multi-view sentence representation learning framework in which one view encodes the input sentence with a Recurrent Neural Network (RNN), the other view encodes it with a simple linear model, and the training objective is to maximise the agreement, specified by the adjacent context information, between the two views. We show that, after training, the vectors produced by our multi-view training provide improved representations over single-view training, and the combination of different views gives further representational improvement and demonstrates solid transferability on standard downstream tasks. |
Tasks | Multi-View Learning, Representation Learning |
Published | 2018-05-18 |
URL | http://arxiv.org/abs/1805.07443v1 |
http://arxiv.org/pdf/1805.07443v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-view-sentence-representation-learning |
Repo | |
Framework | |
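The training signal can be sketched with a margin-style agreement loss: embeddings of the same sentence from the two views should agree (here, by cosine similarity) more than embeddings of mismatched sentences. The toy vectors and the hinge/margin form stand in for the paper's RNN and linear encoders and its actual objective.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def agreement_loss(view_a, view_b, negative_b, margin=0.1):
    # Encourage the matched pair (view_a, view_b) to agree more than the
    # mismatched pair (view_a, negative_b), by at least the margin.
    return max(0.0, margin + cosine(view_a, negative_b) - cosine(view_a, view_b))

anchor = [1.0, 0.0, 1.0]          # one view of a sentence
matched = [0.9, 0.1, 1.1]         # the other view of the same sentence
mismatched = [-1.0, 1.0, 0.0]     # a view of a different sentence
assert agreement_loss(anchor, matched, mismatched) == 0.0
```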