October 21, 2019

Paper Group AWR 72

Deep Clustering for Unsupervised Learning of Visual Features

Title Deep Clustering for Unsupervised Learning of Visual Features
Authors Mathilde Caron, Piotr Bojanowski, Armand Joulin, Matthijs Douze
Abstract Clustering is a class of unsupervised learning methods that has been extensively applied and studied in computer vision. Little work has been done to adapt it to the end-to-end training of visual features on large scale datasets. In this work, we present DeepCluster, a clustering method that jointly learns the parameters of a neural network and the cluster assignments of the resulting features. DeepCluster iteratively groups the features with a standard clustering algorithm, k-means, and uses the subsequent assignments as supervision to update the weights of the network. We apply DeepCluster to the unsupervised training of convolutional neural networks on large datasets like ImageNet and YFCC100M. The resulting model outperforms the current state of the art by a significant margin on all the standard benchmarks.
Tasks
Published 2018-07-15
URL http://arxiv.org/abs/1807.05520v2
PDF http://arxiv.org/pdf/1807.05520v2.pdf
PWC https://paperswithcode.com/paper/deep-clustering-for-unsupervised-learning-of
Repo https://github.com/Confusezius/selfsupervised_learning
Framework pytorch
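The alternation described in the abstract (cluster the current features with k-means, then train on the resulting assignments) can be sketched as follows. This is a minimal illustration, not the authors' implementation: `extract_features` and `train_on_labels` are hypothetical callbacks standing in for the convolutional network's forward pass and its supervised update, and the k-means routine is a plain Lloyd's algorithm written for readability.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on lists of floats; returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        assign = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: mean of the members; an empty cluster keeps its old centroid.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign

def deepcluster_epoch(images, extract_features, train_on_labels, k):
    """One round: embed, cluster, then use cluster ids as pseudo-labels."""
    features = [extract_features(x) for x in images]   # hypothetical forward pass
    _, pseudo_labels = kmeans(features, k)
    train_on_labels(images, pseudo_labels)              # hypothetical weight update
    return pseudo_labels
```

Repeating `deepcluster_epoch` re-clusters the features after each update, so the pseudo-labels change as the representation changes.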

The Emotional Voices Database: Towards Controlling the Emotion Dimension in Voice Generation Systems

Title The Emotional Voices Database: Towards Controlling the Emotion Dimension in Voice Generation Systems
Authors Adaeze Adigwe, Noé Tits, Kevin El Haddad, Sarah Ostadabbas, Thierry Dutoit
Abstract In this paper, we present a database of emotional speech intended to be open-sourced and used for synthesis and generation purposes. It contains data for male and female actors in English and a male actor in French. The database covers 5 emotion classes, so it could be suitable for building synthesis and voice transformation systems with the potential to control the emotional dimension in a continuous way. We show the data’s efficiency by building a simple MLP system converting neutral to angry speech style and evaluate it via a CMOS perception test. Even though the system is a very simple one, the test shows the efficiency of the data, which is promising for future work.
Tasks Speech Emotion Recognition, Speech Synthesis, Text-To-Speech Synthesis
Published 2018-06-25
URL http://arxiv.org/abs/1806.09514v1
PDF http://arxiv.org/pdf/1806.09514v1.pdf
PWC https://paperswithcode.com/paper/the-emotional-voices-database-towards
Repo https://github.com/numediart/EmoV-DB
Framework none

Learning in Variational Autoencoders with Kullback-Leibler and Renyi Integral Bounds

Title Learning in Variational Autoencoders with Kullback-Leibler and Renyi Integral Bounds
Authors Septimia Sârbu, Riccardo Volpi, Alexandra Peşte, Luigi Malagò
Abstract In this paper we propose two novel bounds for the log-likelihood based on Kullback-Leibler and the Rényi divergences, which can be used for variational inference and in particular for the training of Variational AutoEncoders. Our proposal is motivated by the difficulties encountered in training VAEs on continuous datasets with high contrast images, such as those with handwritten digits and characters, where numerical issues often appear unless noise is added, either to the dataset during training or to the generative model given by the decoder. The new bounds we propose, which are obtained from the maximization of the likelihood of an interval for the observations, allow numerically stable training procedures without the necessity of adding any extra source of noise to the data.
Tasks
Published 2018-07-05
URL http://arxiv.org/abs/1807.01889v1
PDF http://arxiv.org/pdf/1807.01889v1.pdf
PWC https://paperswithcode.com/paper/learning-in-variational-autoencoders-with
Repo https://github.com/SeptimiaSarbu/Integral-Renyi-ELBO
Framework tf
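To illustrate why the likelihood of an interval can be more stable than a point density, the sketch below compares the two for a Gaussian. It is not the paper's bound: the Gaussian decoder, the symmetric interval, and the `half_width` parameter are assumptions chosen for illustration. The point density grows without limit as the standard deviation shrinks, while the interval probability is a true probability, so its logarithm never exceeds zero.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def log_density(x, mu, sigma):
    """Log of the Gaussian density at a single point x."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2

def log_interval_prob(x, mu, sigma, half_width):
    """Log-probability that a Gaussian sample falls in [x - half_width, x + half_width]."""
    upper = normal_cdf((x + half_width - mu) / sigma)
    lower = normal_cdf((x - half_width - mu) / sigma)
    # Clamp to a tiny positive value so the logarithm is always defined.
    return math.log(max(upper - lower, 1e-300))
```

With a very small `sigma`, `log_density` at the mean becomes a large positive number, whereas `log_interval_prob` stays at or below zero.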

Improving Entity Linking by Modeling Latent Relations between Mentions

Title Improving Entity Linking by Modeling Latent Relations between Mentions
Authors Phong Le, Ivan Titov
Abstract Entity linking involves aligning textual mentions of named entities to their corresponding entries in a knowledge base. Entity linking systems often exploit relations between textual mentions in a document (e.g., coreference) to decide if the linking decisions are compatible. Unlike previous approaches, which relied on supervised systems or heuristics to predict these relations, we treat relations as latent variables in our neural entity-linking model. We induce the relations without any supervision while optimizing the entity-linking system in an end-to-end fashion. Our multi-relational model achieves the best reported scores on the standard benchmark (AIDA-CoNLL) and substantially outperforms its relation-agnostic version. Its training also converges much faster, suggesting that the injected structural bias helps to explain regularities in the training data.
Tasks Entity Linking
Published 2018-04-27
URL http://arxiv.org/abs/1804.10637v1
PDF http://arxiv.org/pdf/1804.10637v1.pdf
PWC https://paperswithcode.com/paper/improving-entity-linking-by-modeling-latent
Repo https://github.com/lephong/mulrel-nel
Framework pytorch

Neural Network Quine

Title Neural Network Quine
Authors Oscar Chang, Hod Lipson
Abstract Self-replication is a key aspect of biological life that has been largely overlooked in Artificial Intelligence systems. Here we describe how to build and train self-replicating neural networks. The network replicates itself by learning to output its own weights. The network is designed using a loss function that can be optimized with either gradient-based or non-gradient-based methods. We also describe a method we call regeneration to train the network without explicit optimization, by injecting the network with predictions of its own parameters. The best solution for a self-replicating network was found by alternating between regeneration and optimization steps. Finally, we describe a design for a self-replicating neural network that can solve an auxiliary task such as MNIST image classification. We observe that there is a trade-off between the network’s ability to classify images and its ability to replicate, but training is biased towards increasing its specialization at image classification at the expense of replication. This is analogous to the trade-off between reproduction and other tasks observed in nature. We suggest that a self-replication mechanism for artificial intelligence is useful because it introduces the possibility of continual improvement through natural selection.
Tasks Image Classification
Published 2018-03-15
URL http://arxiv.org/abs/1803.05859v4
PDF http://arxiv.org/pdf/1803.05859v4.pdf
PWC https://paperswithcode.com/paper/neural-network-quine
Repo https://github.com/AustinT/nn-quine
Framework pytorch
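A toy version of the self-replication objective can make the idea concrete. The sketch assumes a squared-error loss between a model's weights and its own predictions of those weights, and uses a hypothetical two-parameter linear "network" indexed by fixed coordinates; neither the model nor the coordinates come from the paper.

```python
def predict(weights, coords):
    """Toy 'network': predicts weight i as weights[0] * coords[i] + weights[1]."""
    return [weights[0] * c + weights[1] for c in coords]

def self_replication_loss(weights, coords):
    """Squared distance between the actual weights and the predicted weights."""
    return sum((w - p) ** 2 for w, p in zip(weights, predict(weights, coords)))

def regenerate(weights, coords):
    """Regeneration step: replace the weights with the model's own predictions."""
    return predict(weights, coords)

def gradient_step(weights, coords, lr=0.1, eps=1e-6):
    """One gradient-descent step using central finite differences."""
    grads = []
    for i in range(len(weights)):
        up = list(weights)
        up[i] += eps
        down = list(weights)
        down[i] -= eps
        diff = self_replication_loss(up, coords) - self_replication_loss(down, coords)
        grads.append(diff / (2 * eps))
    return [w - lr * g for w, g in zip(weights, grads)]
```

With `coords = [1.0, 0.0]`, any weight vector of the form `[a, 0.0]` has zero loss and is left unchanged by `regenerate`, so it is a quine in this toy setting.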

MIZAN: A Large Persian-English Parallel Corpus

Title MIZAN: A Large Persian-English Parallel Corpus
Authors Omid Kashefi
Abstract Machine translation, one of the most essential tasks in natural language processing, is now highly dependent upon multilingual parallel corpora. Through this paper, we introduce the biggest Persian-English parallel corpus, with more than one million sentence pairs collected from masterpieces of literature. We also present the acquisition process and statistics of the corpus, and experiment with a baseline statistical machine translation system using the corpus.
Tasks Machine Translation
Published 2018-01-07
URL https://arxiv.org/abs/1801.02107v3
PDF https://arxiv.org/pdf/1801.02107v3.pdf
PWC https://paperswithcode.com/paper/mizan-a-large-persian-english-parallel-corpus
Repo https://github.com/omidkashefi/Mizan
Framework none

Neural Relational Inference for Interacting Systems

Title Neural Relational Inference for Interacting Systems
Authors Thomas Kipf, Ethan Fetaya, Kuan-Chieh Wang, Max Welling, Richard Zemel
Abstract Interacting systems are prevalent in nature, from dynamical systems in physics to complex societal dynamics. The interplay of components can give rise to complex behavior, which can often be explained using a simple model of the system’s constituent parts. In this work, we introduce the neural relational inference (NRI) model: an unsupervised model that learns to infer interactions while simultaneously learning the dynamics purely from observational data. Our model takes the form of a variational auto-encoder, in which the latent code represents the underlying interaction graph and the reconstruction is based on graph neural networks. In experiments on simulated physical systems, we show that our NRI model can accurately recover ground-truth interactions in an unsupervised manner. We further demonstrate that we can find an interpretable structure and predict complex dynamics in real motion capture and sports tracking data.
Tasks Motion Capture
Published 2018-02-13
URL http://arxiv.org/abs/1802.04687v2
PDF http://arxiv.org/pdf/1802.04687v2.pdf
PWC https://paperswithcode.com/paper/neural-relational-inference-for-interacting
Repo https://github.com/ELEMKEP/bsc_lcs
Framework pytorch

WordNet-feelings: A linguistic categorisation of human feelings

Title WordNet-feelings: A linguistic categorisation of human feelings
Authors Advaith Siddharthan, Nicolas Cherbuin, Paul J. Eslinger, Kasia Kozlowska, Nora A. Murphy, Leroy Lowe
Abstract In this article, we present the first in-depth linguistic study of human feelings. While there has been substantial research on incorporating some affective categories into linguistic analysis (e.g. sentiment, and to a lesser extent, emotion), the more diverse category of human feelings has thus far not been investigated. We surveyed the extensive interdisciplinary literature around feelings to construct a working definition of what constitutes a feeling and propose 9 broad categories of feeling. We identified potential feeling words based on their pointwise mutual information with morphological variants of the word ‘feel’ in the Google n-gram corpus, and present a manual annotation exercise where 317 WordNet senses of one hundred of these words were categorised as ‘not a feeling’ or as one of the 9 proposed categories of feeling. We then proceeded to annotate 11386 WordNet senses of all these words to create WordNet-feelings, a new affective dataset that identifies 3664 word senses as feelings, and associates each of these with one of the 9 categories of feeling. WordNet-feelings can be used in conjunction with other datasets such as SentiWordNet that annotate word senses with complementary affective properties such as valence and intensity.
Tasks
Published 2018-11-06
URL http://arxiv.org/abs/1811.02435v1
PDF http://arxiv.org/pdf/1811.02435v1.pdf
PWC https://paperswithcode.com/paper/wordnet-feelings-a-linguistic-categorisation
Repo https://github.com/as36438/WordNet-feelings
Framework none
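Pointwise mutual information, which the abstract uses to find candidate feeling words, compares how often two items co-occur with how often they would co-occur if independent. A minimal sketch from raw counts follows; the function and argument names are illustrative, and the counts in the usage note are made up rather than taken from the Google n-gram corpus.

```python
import math

def pmi(count_xy, count_x, count_y, total):
    """PMI in bits: log2( p(x, y) / (p(x) * p(y)) ), computed from raw counts."""
    if count_xy == 0:
        return float("-inf")  # the pair never co-occurs
    # p(x, y) / (p(x) p(y)) simplifies to count_xy * total / (count_x * count_y).
    return math.log2(count_xy * total / (count_x * count_y))
```

For example, `pmi(10, 100, 50, 1000)` returns `1.0`: the pair co-occurs twice as often as independence would predict.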

Real-Time Rotation-Invariant Face Detection with Progressive Calibration Networks

Title Real-Time Rotation-Invariant Face Detection with Progressive Calibration Networks
Authors Xuepeng Shi, Shiguang Shan, Meina Kan, Shuzhe Wu, Xilin Chen
Abstract Rotation-invariant face detection, i.e. detecting faces with arbitrary rotation-in-plane (RIP) angles, is widely required in unconstrained applications but remains a challenging task, due to the large variations of face appearances. Most existing methods compromise on speed or accuracy to handle the large RIP variations. To address this problem more efficiently, we propose Progressive Calibration Networks (PCN) to perform rotation-invariant face detection in a coarse-to-fine manner. PCN consists of three stages, each of which not only distinguishes the faces from non-faces, but also calibrates the RIP orientation of each face candidate to upright progressively. By dividing the calibration process into several progressive steps and only predicting coarse orientations in early stages, PCN can achieve precise and fast calibration. By performing binary classification of face vs. non-face with gradually decreasing RIP ranges, PCN can accurately detect faces with full $360^{\circ}$ RIP angles. Such designs lead to a real-time rotation-invariant face detector. The experiments on multi-oriented FDDB and a challenging subset of WIDER FACE containing rotated faces in the wild show that our PCN achieves quite promising performance.
Tasks Calibration, Face Detection
Published 2018-04-17
URL http://arxiv.org/abs/1804.06039v1
PDF http://arxiv.org/pdf/1804.06039v1.pdf
PWC https://paperswithcode.com/paper/real-time-rotation-invariant-face-detection
Repo https://github.com/daixiangzi/Caffe-PCN
Framework caffe2
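The coarse-to-fine calibration can be illustrated with a small simulation in which oracle decisions stand in for the three networks. The specific per-stage choices below (a 180 degree flip, then a quarter turn, then a fine residual) are assumptions made for illustration; the abstract only states that early stages predict coarse orientations and that the RIP range shrinks stage by stage.

```python
def wrap(angle):
    """Wrap an angle in degrees into [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def progressive_calibration(angle):
    """Return (per-stage corrections, final angle) for a face rotated by `angle`."""
    corrections = []
    a = wrap(angle)

    # Stage 1 (coarse): flip upside-down faces, leaving an angle in [-90, 90].
    step = 180.0 if abs(a) > 90.0 else 0.0
    corrections.append(step)
    a = wrap(a + step)

    # Stage 2 (medium): quarter turn if needed, leaving an angle in [-45, 45].
    step = 90.0 if a < -45.0 else (-90.0 if a > 45.0 else 0.0)
    corrections.append(step)
    a = wrap(a + step)

    # Stage 3 (fine): remove the remaining small rotation.
    step = -a
    corrections.append(step)
    a = wrap(a + step)
    return corrections, a
```

Each stage only has to resolve a narrower range than the one before it, which is the property the abstract credits for fast and precise calibration.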

ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness

Title ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness
Authors Robert Geirhos, Patricia Rubisch, Claudio Michaelis, Matthias Bethge, Felix A. Wichmann, Wieland Brendel
Abstract Convolutional Neural Networks (CNNs) are commonly thought to recognise objects by learning increasingly complex representations of object shapes. Some recent studies suggest a more important role of image textures. We here put these conflicting hypotheses to a quantitative test by evaluating CNNs and human observers on images with a texture-shape cue conflict. We show that ImageNet-trained CNNs are strongly biased towards recognising textures rather than shapes, which is in stark contrast to human behavioural evidence and reveals fundamentally different classification strategies. We then demonstrate that the same standard architecture (ResNet-50) that learns a texture-based representation on ImageNet is able to learn a shape-based representation instead when trained on “Stylized-ImageNet”, a stylized version of ImageNet. This provides a much better fit for human behavioural performance in our well-controlled psychophysical lab setting (nine experiments totalling 48,560 psychophysical trials across 97 observers) and comes with a number of unexpected emergent benefits such as improved object detection performance and previously unseen robustness towards a wide range of image distortions, highlighting advantages of a shape-based representation.
Tasks Image Classification, Object Detection
Published 2018-11-29
URL http://arxiv.org/abs/1811.12231v2
PDF http://arxiv.org/pdf/1811.12231v2.pdf
PWC https://paperswithcode.com/paper/imagenet-trained-cnns-are-biased-towards
Repo https://github.com/rgeirhos/Stylized-ImageNet
Framework pytorch

UNet++: A Nested U-Net Architecture for Medical Image Segmentation

Title UNet++: A Nested U-Net Architecture for Medical Image Segmentation
Authors Zongwei Zhou, Md Mahfuzur Rahman Siddiquee, Nima Tajbakhsh, Jianming Liang
Abstract Implementation of different kinds of Unet Models for Image Segmentation - Unet , RCNN-Unet, Attention Unet, RCNN-Attention Unet, Nested Unet
Tasks Medical Image Segmentation, Semantic Segmentation
Published 2018-07-18
URL http://arxiv.org/abs/1807.10165v1
PDF http://arxiv.org/pdf/1807.10165v1.pdf
PWC https://paperswithcode.com/paper/unet-a-nested-u-net-architecture-for-medical
Repo https://github.com/1044197988/TF.Keras-Commonly-used-models
Framework tf

Lesion Focused Super-Resolution

Title Lesion Focused Super-Resolution
Authors Jin Zhu, Guang Yang, Pietro Lio
Abstract Super-resolution (SR) for image enhancement has great importance in medical image applications. Broadly speaking, there are two types of SR: one requires multiple low-resolution (LR) images from different views of the same object to be reconstructed into the high-resolution (HR) output, and the other relies on learning from a large amount of training data, i.e., LR-HR pairs. In a real clinical environment, acquiring images from multiple views is expensive and sometimes infeasible. In this paper, we present a novel Generative Adversarial Network (GAN) based learning framework to recover an HR image from its LR version. By performing simulation-based studies on the Multimodal Brain Tumor Segmentation Challenge (BraTS) datasets, we demonstrate the efficacy of our method in brain tumor MRI enhancement. Compared to bilinear interpolation and other state-of-the-art SR methods, our model is lesion focused, which not only results in better perceptual image quality without blurring but is also more efficient and directly benefits subsequent clinical tasks, e.g., lesion detection and abnormality enhancement. Therefore, we can envisage the application of our SR method to boost image spatial resolution while maintaining crucial diagnostic information for further clinical tasks.
Tasks Brain Tumor Segmentation, Image Enhancement, Super-Resolution
Published 2018-10-15
URL http://arxiv.org/abs/1810.06693v1
PDF http://arxiv.org/pdf/1810.06693v1.pdf
PWC https://paperswithcode.com/paper/lesion-focused-super-resolution
Repo https://github.com/GinZhu/MSGAN
Framework none

Improved training of neural trans-dimensional random field language models with dynamic noise-contrastive estimation

Title Improved training of neural trans-dimensional random field language models with dynamic noise-contrastive estimation
Authors Bin Wang, Zhijian Ou
Abstract A new whole-sentence language model, the neural trans-dimensional random field language model (neural TRF LM), in which sentences are modeled as a collection of random fields and the potential function is defined by a neural network, has been introduced and successfully trained by noise-contrastive estimation (NCE). In this paper, we extend NCE and propose dynamic noise-contrastive estimation (DNCE) to solve the two problems observed in NCE training. First, a dynamic noise distribution is introduced and trained simultaneously to converge to the data distribution. This helps to significantly cut down the number of noise samples used in NCE and reduce the training cost. Second, DNCE discriminates between sentences generated from the noise distribution and sentences generated from the interpolation of the data distribution and the noise distribution. This alleviates the overfitting problem caused by the sparseness of the training set. With DNCE, we can successfully and efficiently train neural TRF LMs on a large corpus (about 0.8 billion words) with a large vocabulary (about 568 K words). Neural TRF LMs perform as well as LSTM LMs with fewer parameters while being 5x to 114x faster in rescoring sentences. Interpolating neural TRF LMs with LSTM LMs and n-gram LMs can further reduce the error rates.
Tasks Language Modelling
Published 2018-07-03
URL http://arxiv.org/abs/1807.00993v1
PDF http://arxiv.org/pdf/1807.00993v1.pdf
PWC https://paperswithcode.com/paper/improved-training-of-neural-trans-dimensional
Repo https://github.com/wbengine/TRF-NN-Tensorflow
Framework tf
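For reference, the standard NCE objective that DNCE extends can be sketched as a binary classification loss between data samples and noise samples. This is plain NCE only; the dynamic noise distribution and the interpolated data distribution described in the abstract are not modelled here. `model_logp` and `noise_logp` are hypothetical callables returning log-probabilities.

```python
import math

def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    return -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))

def nce_loss(model_logp, noise_logp, data, noise):
    """Negative NCE objective summed over data samples and noise samples."""
    nu = len(noise) / len(data)  # noise-to-data sample ratio

    def logit(x):
        # Log-odds that x came from the data rather than from the noise distribution.
        return model_logp(x) - math.log(nu) - noise_logp(x)

    loss = -sum(log_sigmoid(logit(x)) for x in data)     # data should be classified as data
    loss -= sum(log_sigmoid(-logit(x)) for x in noise)   # noise should be classified as noise
    return loss
```

When the model cannot tell data from noise, every log-odds value is zero and each sample contributes log 2 to the loss; a model that scores data higher than noise lowers it.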

Deep clustering: On the link between discriminative models and K-means

Title Deep clustering: On the link between discriminative models and K-means
Authors Mohammed Jabi, Marco Pedersoli, Amar Mitiche, Ismail Ben Ayed
Abstract In the context of recent deep clustering studies, discriminative models dominate the literature and report the most competitive performances. These models learn a deep discriminative neural network classifier in which the labels are latent. Typically, they use multinomial logistic regression posteriors and parameter regularization, as is very common in supervised learning. It is generally acknowledged that discriminative objective functions (e.g., those based on the mutual information or the KL divergence) are more flexible than generative approaches (e.g., K-means) in the sense that they make fewer assumptions about the data distributions and, typically, yield much better unsupervised deep learning results. On the surface, several recent discriminative models may seem unrelated to K-means. This study shows that these models are, in fact, equivalent to K-means under mild conditions and common posterior models and parameter regularization. We prove that, for the commonly used logistic regression posteriors, maximizing the $L_2$ regularized mutual information via an approximate alternating direction method (ADM) is equivalent to a soft and regularized K-means loss. Our theoretical analysis not only connects directly several recent state-of-the-art discriminative models to K-means, but also leads to a new soft and regularized deep K-means algorithm, which yields competitive performance on several image clustering benchmarks.
Tasks Image Clustering
Published 2018-10-09
URL https://arxiv.org/abs/1810.04246v2
PDF https://arxiv.org/pdf/1810.04246v2.pdf
PWC https://paperswithcode.com/paper/deep-clustering-on-the-link-between
Repo https://github.com/MOhammedJAbi/SoftKMeans
Framework pytorch
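A generic soft K-means step gives a feel for the kind of objective the abstract refers to: hard assignments are replaced by a softmax over negative squared distances. This is a textbook sketch, not the regularized algorithm derived in the paper, and the `temperature` parameter is an assumption introduced here.

```python
import math

def soft_assign(point, centroids, temperature=1.0):
    """Softmax over negative squared distances: one probability per centroid."""
    logits = [-sum((a - b) ** 2 for a, b in zip(point, c)) / temperature for c in centroids]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_kmeans_step(points, centroids, temperature=1.0):
    """One update: each centroid becomes the responsibility-weighted mean of all points."""
    resp = [soft_assign(p, centroids, temperature) for p in points]
    new_centroids = []
    for k, old in enumerate(centroids):
        weight = sum(r[k] for r in resp)
        if weight == 0.0:
            new_centroids.append(list(old))  # no mass assigned: keep the old centroid
            continue
        new_centroids.append(
            [sum(r[k] * p[d] for r, p in zip(resp, points)) / weight for d in range(len(old))]
        )
    return new_centroids, resp
```

As `temperature` approaches zero the responsibilities become one-hot and the step reduces to ordinary K-means.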

Data-Efficient Design Exploration through Surrogate-Assisted Illumination

Title Data-Efficient Design Exploration through Surrogate-Assisted Illumination
Authors Adam Gaier, Alexander Asteroth, Jean-Baptiste Mouret
Abstract Design optimization techniques are often used at the beginning of the design process to explore the space of possible designs. In these domains illumination algorithms, such as MAP-Elites, are promising alternatives to classic optimization algorithms because they produce diverse, high-quality solutions in a single run, instead of only a single near-optimal solution. Unfortunately, these algorithms currently require a large number of function evaluations, limiting their applicability. In this article we introduce a new illumination algorithm, Surrogate-Assisted Illumination (SAIL), that leverages surrogate modeling techniques to create a map of the design space according to user-defined features while minimizing the number of fitness evaluations. On a 2-dimensional airfoil optimization problem SAIL produces hundreds of diverse but high-performing designs with several orders of magnitude fewer evaluations than MAP-Elites or CMA-ES. We demonstrate that SAIL is also capable of producing maps of high-performing designs in realistic 3-dimensional aerodynamic tasks with an accurate flow simulation. Data-efficient design exploration with SAIL can help designers understand what is possible, beyond what is optimal, by considering more than pure objective-based optimization.
Tasks
Published 2018-06-15
URL http://arxiv.org/abs/1806.05865v1
PDF http://arxiv.org/pdf/1806.05865v1.pdf
PWC https://paperswithcode.com/paper/data-efficient-design-exploration-through
Repo https://github.com/DanieleGravina/divergence-and-quality-diversity
Framework none
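The MAP-Elites baseline named in the abstract keeps the best solution found for each cell of a feature grid. The sketch below is a one-dimensional toy version under stated assumptions: solutions and features are floats in [0, 1], mutation is Gaussian, and every evaluation calls the true fitness function. The surrogate model that SAIL adds to reduce the number of evaluations is not included.

```python
import random

def map_elites(fitness, feature, bins, iterations, seed=0):
    """Return an archive mapping feature-bin index -> (fitness, solution)."""
    rng = random.Random(seed)
    archive = {}
    for _ in range(iterations):
        if archive and rng.random() < 0.9:
            # Mutate a randomly chosen elite, clamped to the valid range [0, 1].
            _, parent = rng.choice(list(archive.values()))
            child = min(1.0, max(0.0, parent + rng.gauss(0.0, 0.1)))
        else:
            # Occasionally (and always at the start) sample a fresh random solution.
            child = rng.random()
        cell = min(bins - 1, int(feature(child) * bins))
        score = fitness(child)
        # Keep the child only if its cell is empty or it beats the current elite.
        if cell not in archive or score > archive[cell][0]:
            archive[cell] = (score, child)
    return archive
```

The result is a map of diverse solutions, one per feature bin, rather than a single optimum; each loop iteration costs one fitness evaluation, which is the expense a surrogate-assisted method aims to cut.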