Paper Group AWR 72
Deep Clustering for Unsupervised Learning of Visual Features
Title | Deep Clustering for Unsupervised Learning of Visual Features |
Authors | Mathilde Caron, Piotr Bojanowski, Armand Joulin, Matthijs Douze |
Abstract | Clustering is a class of unsupervised learning methods that has been extensively applied and studied in computer vision. Little work has been done to adapt it to the end-to-end training of visual features on large scale datasets. In this work, we present DeepCluster, a clustering method that jointly learns the parameters of a neural network and the cluster assignments of the resulting features. DeepCluster iteratively groups the features with a standard clustering algorithm, k-means, and uses the subsequent assignments as supervision to update the weights of the network. We apply DeepCluster to the unsupervised training of convolutional neural networks on large datasets like ImageNet and YFCC100M. The resulting model outperforms the current state of the art by a significant margin on all the standard benchmarks. |
Tasks | |
Published | 2018-07-15 |
URL | http://arxiv.org/abs/1807.05520v2 |
http://arxiv.org/pdf/1807.05520v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-clustering-for-unsupervised-learning-of |
Repo | https://github.com/Confusezius/selfsupervised_learning |
Framework | pytorch |
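The alternation described in the abstract above (cluster the current features with k-means, then use the cluster assignments as pseudo-labels to update the network) can be sketched roughly as follows. This is a toy NumPy illustration under stated assumptions, not the authors' implementation: the "network" is a single linear map `W`, the classifier head `V` is re-initialised each epoch, and the names `kmeans` and `deepcluster_sketch` and all hyperparameters are hypothetical.

```python
import numpy as np

def kmeans(features, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns (centroids, assignments)."""
    rng = np.random.default_rng(seed)
    # Fancy indexing copies, so updating centroids does not alter features.
    centroids = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = features[assign == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroids, dists.argmin(axis=1)

def deepcluster_sketch(X, k=3, feat_dim=4, epochs=3, steps=10, lr=0.1, seed=0):
    """Alternate k-means pseudo-labelling with cross-entropy updates (toy)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(X.shape[1], feat_dim))  # toy feature extractor
    for _ in range(epochs):
        # Step 1: cluster the current features to obtain pseudo-labels.
        _, pseudo = kmeans(X @ W, k, seed=seed)
        onehot = np.eye(k)[pseudo]
        # Assumption: a fresh classifier head per epoch, since cluster ids are arbitrary.
        V = rng.normal(scale=0.1, size=(feat_dim, k))
        # Step 2: a few gradient steps on softmax cross-entropy against pseudo-labels.
        for _ in range(steps):
            feats = X @ W
            logits = feats @ V
            logits = logits - logits.max(axis=1, keepdims=True)
            probs = np.exp(logits)
            probs /= probs.sum(axis=1, keepdims=True)
            dlogits = (probs - onehot) / len(X)
            dV = feats.T @ dlogits
            dW = X.T @ (dlogits @ V.T)
            V -= lr * dV
            W -= lr * dW
    return W, pseudo

X = np.random.default_rng(1).normal(size=(60, 5))
W, labels = deepcluster_sketch(X)
```

In the paper the feature extractor is a convolutional network trained on images; the linear map here only keeps the example short and self-contained.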
The Emotional Voices Database: Towards Controlling the Emotion Dimension in Voice Generation Systems
Title | The Emotional Voices Database: Towards Controlling the Emotion Dimension in Voice Generation Systems |
Authors | Adaeze Adigwe, Noé Tits, Kevin El Haddad, Sarah Ostadabbas, Thierry Dutoit |
Abstract | In this paper, we present a database of emotional speech intended to be open-sourced and used for synthesis and generation purposes. It contains data for male and female actors in English and a male actor in French. The database covers 5 emotion classes, so it could be suitable for building synthesis and voice transformation systems with the potential to control the emotional dimension in a continuous way. We show the data’s efficiency by building a simple MLP system converting neutral to angry speech style and evaluate it via a CMOS perception test. Even though the system is a very simple one, the test shows the efficiency of the data, which is promising for future work. |
Tasks | Speech Emotion Recognition, Speech Synthesis, Text-To-Speech Synthesis |
Published | 2018-06-25 |
URL | http://arxiv.org/abs/1806.09514v1 |
http://arxiv.org/pdf/1806.09514v1.pdf | |
PWC | https://paperswithcode.com/paper/the-emotional-voices-database-towards |
Repo | https://github.com/numediart/EmoV-DB |
Framework | none |
Learning in Variational Autoencoders with Kullback-Leibler and Renyi Integral Bounds
Title | Learning in Variational Autoencoders with Kullback-Leibler and Renyi Integral Bounds |
Authors | Septimia Sârbu, Riccardo Volpi, Alexandra Peşte, Luigi Malagò |
Abstract | In this paper we propose two novel bounds for the log-likelihood based on Kullback-Leibler and the Rényi divergences, which can be used for variational inference and in particular for the training of Variational AutoEncoders. Our proposal is motivated by the difficulties encountered in training VAEs on continuous datasets with high contrast images, such as those with handwritten digits and characters, where numerical issues often appear unless noise is added, either to the dataset during training or to the generative model given by the decoder. The new bounds we propose, which are obtained from the maximization of the likelihood of an interval for the observations, allow numerically stable training procedures without the necessity of adding any extra source of noise to the data. |
Tasks | |
Published | 2018-07-05 |
URL | http://arxiv.org/abs/1807.01889v1 |
http://arxiv.org/pdf/1807.01889v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-in-variational-autoencoders-with |
Repo | https://github.com/SeptimiaSarbu/Integral-Renyi-ELBO |
Framework | tf |
Improving Entity Linking by Modeling Latent Relations between Mentions
Title | Improving Entity Linking by Modeling Latent Relations between Mentions |
Authors | Phong Le, Ivan Titov |
Abstract | Entity linking involves aligning textual mentions of named entities to their corresponding entries in a knowledge base. Entity linking systems often exploit relations between textual mentions in a document (e.g., coreference) to decide if the linking decisions are compatible. Unlike previous approaches, which relied on supervised systems or heuristics to predict these relations, we treat relations as latent variables in our neural entity-linking model. We induce the relations without any supervision while optimizing the entity-linking system in an end-to-end fashion. Our multi-relational model achieves the best reported scores on the standard benchmark (AIDA-CoNLL) and substantially outperforms its relation-agnostic version. Its training also converges much faster, suggesting that the injected structural bias helps to explain regularities in the training data. |
Tasks | Entity Linking |
Published | 2018-04-27 |
URL | http://arxiv.org/abs/1804.10637v1 |
http://arxiv.org/pdf/1804.10637v1.pdf | |
PWC | https://paperswithcode.com/paper/improving-entity-linking-by-modeling-latent |
Repo | https://github.com/lephong/mulrel-nel |
Framework | pytorch |
Neural Network Quine
Title | Neural Network Quine |
Authors | Oscar Chang, Hod Lipson |
Abstract | Self-replication is a key aspect of biological life that has been largely overlooked in Artificial Intelligence systems. Here we describe how to build and train self-replicating neural networks. The network replicates itself by learning to output its own weights. The network is designed using a loss function that can be optimized with either gradient-based or non-gradient-based methods. We also describe a method we call regeneration to train the network without explicit optimization, by injecting the network with predictions of its own parameters. The best solution for a self-replicating network was found by alternating between regeneration and optimization steps. Finally, we describe a design for a self-replicating neural network that can solve an auxiliary task such as MNIST image classification. We observe that there is a trade-off between the network’s ability to classify images and its ability to replicate, but training is biased towards increasing its specialization at image classification at the expense of replication. This is analogous to the trade-off between reproduction and other tasks observed in nature. We suggest that a self-replication mechanism for artificial intelligence is useful because it introduces the possibility of continual improvement through natural selection. |
Tasks | Image Classification |
Published | 2018-03-15 |
URL | http://arxiv.org/abs/1803.05859v4 |
http://arxiv.org/pdf/1803.05859v4.pdf | |
PWC | https://paperswithcode.com/paper/neural-network-quine |
Repo | https://github.com/AustinT/nn-quine |
Framework | pytorch |
MIZAN: A Large Persian-English Parallel Corpus
Title | MIZAN: A Large Persian-English Parallel Corpus |
Authors | Omid Kashefi |
Abstract | One of the most important and essential tasks in natural language processing is machine translation, which is now highly dependent upon multilingual parallel corpora. In this paper, we introduce the biggest Persian-English parallel corpus, with more than one million sentence pairs collected from masterpieces of literature. We also present the acquisition process and statistics of the corpus, and experiment with a baseline statistical machine translation system using the corpus. |
Tasks | Machine Translation |
Published | 2018-01-07 |
URL | https://arxiv.org/abs/1801.02107v3 |
https://arxiv.org/pdf/1801.02107v3.pdf | |
PWC | https://paperswithcode.com/paper/mizan-a-large-persian-english-parallel-corpus |
Repo | https://github.com/omidkashefi/Mizan |
Framework | none |
Neural Relational Inference for Interacting Systems
Title | Neural Relational Inference for Interacting Systems |
Authors | Thomas Kipf, Ethan Fetaya, Kuan-Chieh Wang, Max Welling, Richard Zemel |
Abstract | Interacting systems are prevalent in nature, from dynamical systems in physics to complex societal dynamics. The interplay of components can give rise to complex behavior, which can often be explained using a simple model of the system’s constituent parts. In this work, we introduce the neural relational inference (NRI) model: an unsupervised model that learns to infer interactions while simultaneously learning the dynamics purely from observational data. Our model takes the form of a variational auto-encoder, in which the latent code represents the underlying interaction graph and the reconstruction is based on graph neural networks. In experiments on simulated physical systems, we show that our NRI model can accurately recover ground-truth interactions in an unsupervised manner. We further demonstrate that we can find an interpretable structure and predict complex dynamics in real motion capture and sports tracking data. |
Tasks | Motion Capture |
Published | 2018-02-13 |
URL | http://arxiv.org/abs/1802.04687v2 |
http://arxiv.org/pdf/1802.04687v2.pdf | |
PWC | https://paperswithcode.com/paper/neural-relational-inference-for-interacting |
Repo | https://github.com/ELEMKEP/bsc_lcs |
Framework | pytorch |
WordNet-feelings: A linguistic categorisation of human feelings
Title | WordNet-feelings: A linguistic categorisation of human feelings |
Authors | Advaith Siddharthan, Nicolas Cherbuin, Paul J. Eslinger, Kasia Kozlowska, Nora A. Murphy, Leroy Lowe |
Abstract | In this article, we present the first in-depth linguistic study of human feelings. While there has been substantial research on incorporating some affective categories into linguistic analysis (e.g. sentiment, and to a lesser extent, emotion), the more diverse category of human feelings has thus far not been investigated. We surveyed the extensive interdisciplinary literature around feelings to construct a working definition of what constitutes a feeling and propose 9 broad categories of feeling. We identified potential feeling words based on their pointwise mutual information with morphological variants of the word ‘feel’ in the Google n-gram corpus, and present a manual annotation exercise where 317 WordNet senses of one hundred of these words were categorised as ‘not a feeling’ or as one of the 9 proposed categories of feeling. We then proceeded to annotate 11386 WordNet senses of all these words to create WordNet-feelings, a new affective dataset that identifies 3664 word senses as feelings, and associates each of these with one of the 9 categories of feeling. WordNet-feelings can be used in conjunction with other datasets such as SentiWordNet that annotate word senses with complementary affective properties such as valence and intensity. |
Tasks | |
Published | 2018-11-06 |
URL | http://arxiv.org/abs/1811.02435v1 |
http://arxiv.org/pdf/1811.02435v1.pdf | |
PWC | https://paperswithcode.com/paper/wordnet-feelings-a-linguistic-categorisation |
Repo | https://github.com/as36438/WordNet-feelings |
Framework | none |
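The abstract above selects candidate feeling words by their pointwise mutual information (PMI) with variants of "feel". As a reminder of what that score measures, here is a minimal sketch of PMI computed from co-occurrence counts. The function name `pmi` and the counts in the usage lines are hypothetical, and the paper's exact counting scheme over the Google n-gram corpus is not reproduced here.

```python
import math

def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information in bits: log2(p(x, y) / (p(x) * p(y))).

    count_xy: number of contexts containing both x and y
    count_x, count_y: number of contexts containing x, respectively y
    total: total number of contexts
    """
    if min(count_xy, count_x, count_y, total) <= 0:
        raise ValueError("all counts must be positive")
    # Algebraically equal to log2((count_xy / total) / ((count_x / total) * (count_y / total))).
    return math.log2(count_xy * total / (count_x * count_y))

# Hypothetical counts: the pair co-occurs twice as often as chance predicts.
print(pmi(count_xy=10, count_x=100, count_y=50, total=1000))  # 1.0
# Co-occurrence exactly at chance level gives a PMI of zero.
print(pmi(count_xy=5, count_x=100, count_y=50, total=1000))   # 0.0
```

A positive score means the word appears with "feel" more often than independence would predict, which is the property the candidate selection relies on.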
Real-Time Rotation-Invariant Face Detection with Progressive Calibration Networks
Title | Real-Time Rotation-Invariant Face Detection with Progressive Calibration Networks |
Authors | Xuepeng Shi, Shiguang Shan, Meina Kan, Shuzhe Wu, Xilin Chen |
Abstract | Rotation-invariant face detection, i.e. detecting faces with arbitrary rotation-in-plane (RIP) angles, is widely required in unconstrained applications but still remains as a challenging task, due to the large variations of face appearances. Most existing methods compromise with speed or accuracy to handle the large RIP variations. To address this problem more efficiently, we propose Progressive Calibration Networks (PCN) to perform rotation-invariant face detection in a coarse-to-fine manner. PCN consists of three stages, each of which not only distinguishes the faces from non-faces, but also calibrates the RIP orientation of each face candidate to upright progressively. By dividing the calibration process into several progressive steps and only predicting coarse orientations in early stages, PCN can achieve precise and fast calibration. By performing binary classification of face vs. non-face with gradually decreasing RIP ranges, PCN can accurately detect faces with full $360^{\circ}$ RIP angles. Such designs lead to a real-time rotation-invariant face detector. The experiments on multi-oriented FDDB and a challenging subset of WIDER FACE containing rotated faces in the wild show that our PCN achieves quite promising performance. |
Tasks | Calibration, Face Detection |
Published | 2018-04-17 |
URL | http://arxiv.org/abs/1804.06039v1 |
http://arxiv.org/pdf/1804.06039v1.pdf | |
PWC | https://paperswithcode.com/paper/real-time-rotation-invariant-face-detection |
Repo | https://github.com/daixiangzi/Caffe-PCN |
Framework | caffe2 |
ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness
Title | ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness |
Authors | Robert Geirhos, Patricia Rubisch, Claudio Michaelis, Matthias Bethge, Felix A. Wichmann, Wieland Brendel |
Abstract | Convolutional Neural Networks (CNNs) are commonly thought to recognise objects by learning increasingly complex representations of object shapes. Some recent studies suggest a more important role of image textures. We here put these conflicting hypotheses to a quantitative test by evaluating CNNs and human observers on images with a texture-shape cue conflict. We show that ImageNet-trained CNNs are strongly biased towards recognising textures rather than shapes, which is in stark contrast to human behavioural evidence and reveals fundamentally different classification strategies. We then demonstrate that the same standard architecture (ResNet-50) that learns a texture-based representation on ImageNet is able to learn a shape-based representation instead when trained on “Stylized-ImageNet”, a stylized version of ImageNet. This provides a much better fit for human behavioural performance in our well-controlled psychophysical lab setting (nine experiments totalling 48,560 psychophysical trials across 97 observers) and comes with a number of unexpected emergent benefits such as improved object detection performance and previously unseen robustness towards a wide range of image distortions, highlighting advantages of a shape-based representation. |
Tasks | Image Classification, Object Detection |
Published | 2018-11-29 |
URL | http://arxiv.org/abs/1811.12231v2 |
http://arxiv.org/pdf/1811.12231v2.pdf | |
PWC | https://paperswithcode.com/paper/imagenet-trained-cnns-are-biased-towards |
Repo | https://github.com/rgeirhos/Stylized-ImageNet |
Framework | pytorch |
UNet++: A Nested U-Net Architecture for Medical Image Segmentation
Title | UNet++: A Nested U-Net Architecture for Medical Image Segmentation |
Authors | Zongwei Zhou, Md Mahfuzur Rahman Siddiquee, Nima Tajbakhsh, Jianming Liang |
Abstract | Implementation of different kinds of Unet Models for Image Segmentation - Unet, RCNN-Unet, Attention Unet, RCNN-Attention Unet, Nested Unet |
Tasks | Medical Image Segmentation, Semantic Segmentation |
Published | 2018-07-18 |
URL | http://arxiv.org/abs/1807.10165v1 |
http://arxiv.org/pdf/1807.10165v1.pdf | |
PWC | https://paperswithcode.com/paper/unet-a-nested-u-net-architecture-for-medical |
Repo | https://github.com/1044197988/TF.Keras-Commonly-used-models |
Framework | tf |
Lesion Focused Super-Resolution
Title | Lesion Focused Super-Resolution |
Authors | Jin Zhu, Guang Yang, Pietro Lio |
Abstract | Super-resolution (SR) for image enhancement has great importance in medical image applications. Broadly speaking, there are two types of SR: one requires multiple low resolution (LR) images from different views of the same object to be reconstructed into the high resolution (HR) output, and the other relies on learning from a large amount of training data, i.e., LR-HR pairs. In a real clinical environment, acquiring images from multiple views is expensive and sometimes infeasible. In this paper, we present a novel Generative Adversarial Networks (GAN) based learning framework to reconstruct an HR image from its LR version. By performing simulation based studies on the Multimodal Brain Tumor Segmentation Challenge (BraTS) datasets, we demonstrate the efficacy of our method in application to brain tumor MRI enhancement. Compared to bilinear interpolation and other state-of-the-art SR methods, our model is lesion focused, which not only results in better perceptual image quality without blurring, but is also more efficient and directly beneficial for the following clinical tasks, e.g., lesion detection and abnormality enhancement. Therefore, we can envisage the application of our SR method to boost image spatial resolution while maintaining crucial diagnostic information for further clinical tasks. |
Tasks | Brain Tumor Segmentation, Image Enhancement, Super-Resolution |
Published | 2018-10-15 |
URL | http://arxiv.org/abs/1810.06693v1 |
http://arxiv.org/pdf/1810.06693v1.pdf | |
PWC | https://paperswithcode.com/paper/lesion-focused-super-resolution |
Repo | https://github.com/GinZhu/MSGAN |
Framework | none |
Improved training of neural trans-dimensional random field language models with dynamic noise-contrastive estimation
Title | Improved training of neural trans-dimensional random field language models with dynamic noise-contrastive estimation |
Authors | Bin Wang, Zhijian Ou |
Abstract | A new whole-sentence language model, the neural trans-dimensional random field language model (neural TRF LM), in which sentences are modeled as a collection of random fields and the potential function is defined by a neural network, has been introduced and successfully trained by noise-contrastive estimation (NCE). In this paper, we extend NCE and propose dynamic noise-contrastive estimation (DNCE) to solve the two problems observed in NCE training. First, a dynamic noise distribution is introduced and trained simultaneously to converge to the data distribution. This helps to significantly cut down the noise sample number used in NCE and reduce the training cost. Second, DNCE discriminates between sentences generated from the noise distribution and sentences generated from the interpolation of the data distribution and the noise distribution. This alleviates the overfitting problem caused by the sparseness of the training set. With DNCE, we can successfully and efficiently train neural TRF LMs on a large corpus (about 0.8 billion words) with a large vocabulary (about 568 K words). Neural TRF LMs perform as well as LSTM LMs with fewer parameters and are 5x~114x faster in rescoring sentences. Interpolating neural TRF LMs with LSTM LMs and n-gram LMs can further reduce the error rates. |
Tasks | Language Modelling |
Published | 2018-07-03 |
URL | http://arxiv.org/abs/1807.00993v1 |
http://arxiv.org/pdf/1807.00993v1.pdf | |
PWC | https://paperswithcode.com/paper/improved-training-of-neural-trans-dimensional |
Repo | https://github.com/wbengine/TRF-NN-Tensorflow |
Framework | tf |
Deep clustering: On the link between discriminative models and K-means
Title | Deep clustering: On the link between discriminative models and K-means |
Authors | Mohammed Jabi, Marco Pedersoli, Amar Mitiche, Ismail Ben Ayed |
Abstract | In the context of recent deep clustering studies, discriminative models dominate the literature and report the most competitive performances. These models learn a deep discriminative neural network classifier in which the labels are latent. Typically, they use multinomial logistic regression posteriors and parameter regularization, as is very common in supervised learning. It is generally acknowledged that discriminative objective functions (e.g., those based on the mutual information or the KL divergence) are more flexible than generative approaches (e.g., K-means) in the sense that they make fewer assumptions about the data distributions and, typically, yield much better unsupervised deep learning results. On the surface, several recent discriminative models may seem unrelated to K-means. This study shows that these models are, in fact, equivalent to K-means under mild conditions and common posterior models and parameter regularization. We prove that, for the commonly used logistic regression posteriors, maximizing the $L_2$ regularized mutual information via an approximate alternating direction method (ADM) is equivalent to a soft and regularized K-means loss. Our theoretical analysis not only connects directly several recent state-of-the-art discriminative models to K-means, but also leads to a new soft and regularized deep K-means algorithm, which yields competitive performance on several image clustering benchmarks. |
Tasks | Image Clustering |
Published | 2018-10-09 |
URL | https://arxiv.org/abs/1810.04246v2 |
https://arxiv.org/pdf/1810.04246v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-clustering-on-the-link-between |
Repo | https://github.com/MOhammedJAbi/SoftKMeans |
Framework | pytorch |
Data-Efficient Design Exploration through Surrogate-Assisted Illumination
Title | Data-Efficient Design Exploration through Surrogate-Assisted Illumination |
Authors | Adam Gaier, Alexander Asteroth, Jean-Baptiste Mouret |
Abstract | Design optimization techniques are often used at the beginning of the design process to explore the space of possible designs. In these domains illumination algorithms, such as MAP-Elites, are promising alternatives to classic optimization algorithms because they produce diverse, high-quality solutions in a single run, instead of only a single near-optimal solution. Unfortunately, these algorithms currently require a large number of function evaluations, limiting their applicability. In this article we introduce a new illumination algorithm, Surrogate-Assisted Illumination (SAIL), that leverages surrogate modeling techniques to create a map of the design space according to user-defined features while minimizing the number of fitness evaluations. On a 2-dimensional airfoil optimization problem SAIL produces hundreds of diverse but high-performing designs with several orders of magnitude fewer evaluations than MAP-Elites or CMA-ES. We demonstrate that SAIL is also capable of producing maps of high-performing designs in realistic 3-dimensional aerodynamic tasks with an accurate flow simulation. Data-efficient design exploration with SAIL can help designers understand what is possible, beyond what is optimal, by considering more than pure objective-based optimization. |
Tasks | |
Published | 2018-06-15 |
URL | http://arxiv.org/abs/1806.05865v1 |
http://arxiv.org/pdf/1806.05865v1.pdf | |
PWC | https://paperswithcode.com/paper/data-efficient-design-exploration-through |
Repo | https://github.com/DanieleGravina/divergence-and-quality-diversity |
Framework | none |
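The abstract above builds on illumination algorithms such as MAP-Elites, which keep the best solution found in each cell of a user-defined feature space rather than a single optimum. A minimal sketch of plain MAP-Elites with a one-dimensional feature is given below. It is not SAIL: the surrogate model that reduces the number of true fitness evaluations is omitted, and the function name `map_elites`, the toy objective and all parameter values are hypothetical.

```python
import random

def map_elites(fitness, descriptor, dim, bins=10, n_init=50, n_iter=500,
               sigma=0.1, seed=0):
    """Return an archive mapping feature cell -> (fitness, solution).

    Solutions are lists of floats in [0, 1]; descriptor(x) must lie in [0, 1].
    """
    rng = random.Random(seed)
    archive = {}

    def try_insert(x):
        # Discretise the feature value into one of `bins` cells.
        cell = min(int(descriptor(x) * bins), bins - 1)
        f = fitness(x)
        # Keep the candidate only if the cell is empty or it beats the elite.
        if cell not in archive or f > archive[cell][0]:
            archive[cell] = (f, x)

    # Seed the archive with random solutions.
    for _ in range(n_init):
        try_insert([rng.random() for _ in range(dim)])
    # Repeatedly mutate a randomly chosen elite and try to place the child.
    for _ in range(n_iter):
        _, parent = rng.choice(list(archive.values()))
        child = [min(1.0, max(0.0, v + rng.gauss(0, sigma))) for v in parent]
        try_insert(child)
    return archive

# Toy problem: maximise closeness to the centre, with the first coordinate as feature.
archive = map_elites(
    fitness=lambda x: -sum((v - 0.5) ** 2 for v in x),
    descriptor=lambda x: x[0],
    dim=2,
)
for cell in sorted(archive):
    print(cell, round(archive[cell][0], 4))
```

Every call to `fitness` here is a true evaluation, which is the cost the abstract says SAIL reduces by consulting a surrogate model instead.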