October 21, 2019

Paper Group AWR 72

Deep Clustering for Unsupervised Learning of Visual Features


Title	Deep Clustering for Unsupervised Learning of Visual Features
Authors	Mathilde Caron, Piotr Bojanowski, Armand Joulin, Matthijs Douze
Abstract	Clustering is a class of unsupervised learning methods that has been extensively applied and studied in computer vision. Little work has been done to adapt it to the end-to-end training of visual features on large scale datasets. In this work, we present DeepCluster, a clustering method that jointly learns the parameters of a neural network and the cluster assignments of the resulting features. DeepCluster iteratively groups the features with a standard clustering algorithm, k-means, and uses the subsequent assignments as supervision to update the weights of the network. We apply DeepCluster to the unsupervised training of convolutional neural networks on large datasets like ImageNet and YFCC100M. The resulting model outperforms the current state of the art by a significant margin on all the standard benchmarks.
Tasks
Published	2018-07-15
URL	http://arxiv.org/abs/1807.05520v2
PDF	http://arxiv.org/pdf/1807.05520v2.pdf
PWC	https://paperswithcode.com/paper/deep-clustering-for-unsupervised-learning-of
Repo	https://github.com/Confusezius/selfsupervised_learning
Framework	pytorch
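
The abstract above describes alternating between k-means clustering of features and using the cluster assignments as supervision. A minimal sketch of the pseudo-labelling half only, in plain Python: the function name, the toy 2-D "features", and the evenly spaced initialisation are assumptions made for illustration, and the network update that would consume the labels is omitted. This is not the authors' implementation.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pseudo_labels(features, k, iterations=10):
    """Run plain k-means and return one cluster id per feature vector.

    In a DeepCluster-style loop these ids would serve as classification
    targets for the next round of network training (not shown here).
    """
    n = len(features)
    # Assumption: deterministic, evenly spaced initial centroids keep the
    # sketch reproducible; real code would use random or k-means++ seeding.
    centroids = [list(features[i * n // k]) for i in range(k)]
    labels = [0] * n
    for _ in range(iterations):
        # Assignment step: nearest centroid for every feature vector.
        labels = [
            min(range(k), key=lambda c: squared_distance(f, centroids[c]))
            for f in features
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [f for f, label in zip(features, labels) if label == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy stand-in for network features: two well-separated blobs.
toy_features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
                (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
pseudo_labels = kmeans_pseudo_labels(toy_features, k=2)
```

In a full pipeline the features would be recomputed by the network after each training round and the clustering repeated, so the labels and the representation change together.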

The Emotional Voices Database: Towards Controlling the Emotion Dimension in Voice Generation Systems


Title	The Emotional Voices Database: Towards Controlling the Emotion Dimension in Voice Generation Systems
Authors	Adaeze Adigwe, Noé Tits, Kevin El Haddad, Sarah Ostadabbas, Thierry Dutoit
Abstract	In this paper, we present a database of emotional speech intended to be open-sourced and used for synthesis and generation purposes. It contains data for male and female actors in English and a male actor in French. The database covers 5 emotion classes, so it could be suitable for building synthesis and voice transformation systems with the potential to control the emotional dimension in a continuous way. We show the data’s efficiency by building a simple MLP system that converts neutral to angry speech style and evaluate it via a CMOS perception test. Even though the system is a very simple one, the test shows the efficiency of the data, which is promising for future work.
Tasks	Speech Emotion Recognition, Speech Synthesis, Text-To-Speech Synthesis
Published	2018-06-25
URL	http://arxiv.org/abs/1806.09514v1
PDF	http://arxiv.org/pdf/1806.09514v1.pdf
PWC	https://paperswithcode.com/paper/the-emotional-voices-database-towards
Repo	https://github.com/numediart/EmoV-DB
Framework	none

Learning in Variational Autoencoders with Kullback-Leibler and Renyi Integral Bounds


Title	Learning in Variational Autoencoders with Kullback-Leibler and Renyi Integral Bounds
Authors	Septimia Sârbu, Riccardo Volpi, Alexandra Peşte, Luigi Malagò
Abstract	In this paper we propose two novel bounds for the log-likelihood based on Kullback-Leibler and the Rényi divergences, which can be used for variational inference and in particular for the training of Variational AutoEncoders. Our proposal is motivated by the difficulties encountered in training VAEs on continuous datasets with high contrast images, such as those with handwritten digits and characters, where numerical issues often appear unless noise is added, either to the dataset during training or to the generative model given by the decoder. The new bounds we propose, which are obtained from the maximization of the likelihood of an interval for the observations, allow numerically stable training procedures without the necessity of adding any extra source of noise to the data.
Tasks
Published	2018-07-05
URL	http://arxiv.org/abs/1807.01889v1
PDF	http://arxiv.org/pdf/1807.01889v1.pdf
PWC	https://paperswithcode.com/paper/learning-in-variational-autoencoders-with
Repo	https://github.com/SeptimiaSarbu/Integral-Renyi-ELBO
Framework	tf

Improving Entity Linking by Modeling Latent Relations between Mentions


Title	Improving Entity Linking by Modeling Latent Relations between Mentions
Authors	Phong Le, Ivan Titov
Abstract	Entity linking involves aligning textual mentions of named entities to their corresponding entries in a knowledge base. Entity linking systems often exploit relations between textual mentions in a document (e.g., coreference) to decide if the linking decisions are compatible. Unlike previous approaches, which relied on supervised systems or heuristics to predict these relations, we treat relations as latent variables in our neural entity-linking model. We induce the relations without any supervision while optimizing the entity-linking system in an end-to-end fashion. Our multi-relational model achieves the best reported scores on the standard benchmark (AIDA-CoNLL) and substantially outperforms its relation-agnostic version. Its training also converges much faster, suggesting that the injected structural bias helps to explain regularities in the training data.
Tasks	Entity Linking
Published	2018-04-27
URL	http://arxiv.org/abs/1804.10637v1
PDF	http://arxiv.org/pdf/1804.10637v1.pdf
PWC	https://paperswithcode.com/paper/improving-entity-linking-by-modeling-latent
Repo	https://github.com/lephong/mulrel-nel
Framework	pytorch

Neural Network Quine


Title	Neural Network Quine
Authors	Oscar Chang, Hod Lipson
Abstract	Self-replication is a key aspect of biological life that has been largely overlooked in Artificial Intelligence systems. Here we describe how to build and train self-replicating neural networks. The network replicates itself by learning to output its own weights. The network is designed using a loss function that can be optimized with either gradient-based or non-gradient-based methods. We also describe a method we call regeneration to train the network without explicit optimization, by injecting the network with predictions of its own parameters. The best solution for a self-replicating network was found by alternating between regeneration and optimization steps. Finally, we describe a design for a self-replicating neural network that can solve an auxiliary task such as MNIST image classification. We observe that there is a trade-off between the network’s ability to classify images and its ability to replicate, but training is biased towards increasing its specialization at image classification at the expense of replication. This is analogous to the trade-off between reproduction and other tasks observed in nature. We suggest that a self-replication mechanism for artificial intelligence is useful because it introduces the possibility of continual improvement through natural selection.
Tasks	Image Classification
Published	2018-03-15
URL	http://arxiv.org/abs/1803.05859v4
PDF	http://arxiv.org/pdf/1803.05859v4.pdf
PWC	https://paperswithcode.com/paper/neural-network-quine
Repo	https://github.com/AustinT/nn-quine
Framework	pytorch

MIZAN: A Large Persian-English Parallel Corpus


Title	MIZAN: A Large Persian-English Parallel Corpus
Authors	Omid Kashefi
Abstract	One of the most important tasks in natural language processing is machine translation, which is now highly dependent on multilingual parallel corpora. In this paper, we introduce the biggest Persian-English parallel corpus, with more than one million sentence pairs collected from masterpieces of literature. We also present the acquisition process and statistics of the corpus, and experiment with a baseline statistical machine translation system using the corpus.
Tasks	Machine Translation
Published	2018-01-07
URL	https://arxiv.org/abs/1801.02107v3
PDF	https://arxiv.org/pdf/1801.02107v3.pdf
PWC	https://paperswithcode.com/paper/mizan-a-large-persian-english-parallel-corpus
Repo	https://github.com/omidkashefi/Mizan
Framework	none

Neural Relational Inference for Interacting Systems


Title	Neural Relational Inference for Interacting Systems
Authors	Thomas Kipf, Ethan Fetaya, Kuan-Chieh Wang, Max Welling, Richard Zemel
Abstract	Interacting systems are prevalent in nature, from dynamical systems in physics to complex societal dynamics. The interplay of components can give rise to complex behavior, which can often be explained using a simple model of the system’s constituent parts. In this work, we introduce the neural relational inference (NRI) model: an unsupervised model that learns to infer interactions while simultaneously learning the dynamics purely from observational data. Our model takes the form of a variational auto-encoder, in which the latent code represents the underlying interaction graph and the reconstruction is based on graph neural networks. In experiments on simulated physical systems, we show that our NRI model can accurately recover ground-truth interactions in an unsupervised manner. We further demonstrate that we can find an interpretable structure and predict complex dynamics in real motion capture and sports tracking data.
Tasks	Motion Capture
Published	2018-02-13
URL	http://arxiv.org/abs/1802.04687v2
PDF	http://arxiv.org/pdf/1802.04687v2.pdf
PWC	https://paperswithcode.com/paper/neural-relational-inference-for-interacting
Repo	https://github.com/ELEMKEP/bsc_lcs
Framework	pytorch

WordNet-feelings: A linguistic categorisation of human feelings


Title	WordNet-feelings: A linguistic categorisation of human feelings
Authors	Advaith Siddharthan, Nicolas Cherbuin, Paul J. Eslinger, Kasia Kozlowska, Nora A. Murphy, Leroy Lowe
Abstract	In this article, we present the first in depth linguistic study of human feelings. While there has been substantial research on incorporating some affective categories into linguistic analysis (e.g. sentiment, and to a lesser extent, emotion), the more diverse category of human feelings has thus far not been investigated. We surveyed the extensive interdisciplinary literature around feelings to construct a working definition of what constitutes a feeling and propose 9 broad categories of feeling. We identified potential feeling words based on their pointwise mutual information with morphological variants of the word ‘feel’ in the Google n-gram corpus, and present a manual annotation exercise where 317 WordNet senses of one hundred of these words were categorised as ‘not a feeling’ or as one of the 9 proposed categories of feeling. We then proceeded to annotate 11386 WordNet senses of all these words to create WordNet-feelings, a new affective dataset that identifies 3664 word senses as feelings, and associates each of these with one of the 9 categories of feeling. WordNet-feelings can be used in conjunction with other datasets such as SentiWordNet that annotate word senses with complementary affective properties such as valence and intensity.
Tasks
Published	2018-11-06
URL	http://arxiv.org/abs/1811.02435v1
PDF	http://arxiv.org/pdf/1811.02435v1.pdf
PWC	https://paperswithcode.com/paper/wordnet-feelings-a-linguistic-categorisation
Repo	https://github.com/as36438/WordNet-feelings
Framework	none
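
The abstract above selects candidate feeling words by their pointwise mutual information (PMI) with variants of "feel". As a reminder of what that score measures, here is a small sketch of the standard PMI formula computed from co-occurrence counts. The function name and all counts are hypothetical; the paper's actual counting scheme over the Google n-gram corpus is not specified here.

```python
import math


def pmi(joint_count, count_x, count_y, total):
    """Pointwise mutual information in bits: log2(p(x, y) / (p(x) * p(y))).

    With probabilities estimated as count / total, the ratio simplifies to
    joint_count * total / (count_x * count_y).
    """
    return math.log2(joint_count * total / (count_x * count_y))


# Hypothetical counts: a word seen 100 times, a 'feel' variant seen 100 times,
# 1000 observation windows in total.
independent = pmi(joint_count=10, count_x=100, count_y=100, total=1000)
associated = pmi(joint_count=50, count_x=100, count_y=100, total=1000)
```

A score of zero means the two words co-occur exactly as often as independence would predict; positive scores indicate association, which is why a threshold on PMI can serve as a candidate filter.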

Real-Time Rotation-Invariant Face Detection with Progressive Calibration Networks


Title	Real-Time Rotation-Invariant Face Detection with Progressive Calibration Networks
Authors	Xuepeng Shi, Shiguang Shan, Meina Kan, Shuzhe Wu, Xilin Chen
Abstract	Rotation-invariant face detection, i.e. detecting faces with arbitrary rotation-in-plane (RIP) angles, is widely required in unconstrained applications but still remains as a challenging task, due to the large variations of face appearances. Most existing methods compromise with speed or accuracy to handle the large RIP variations. To address this problem more efficiently, we propose Progressive Calibration Networks (PCN) to perform rotation-invariant face detection in a coarse-to-fine manner. PCN consists of three stages, each of which not only distinguishes the faces from non-faces, but also calibrates the RIP orientation of each face candidate to upright progressively. By dividing the calibration process into several progressive steps and only predicting coarse orientations in early stages, PCN can achieve precise and fast calibration. By performing binary classification of face vs. non-face with gradually decreasing RIP ranges, PCN can accurately detect faces with full $360^{\circ}$ RIP angles. Such designs lead to a real-time rotation-invariant face detector. The experiments on multi-oriented FDDB and a challenging subset of WIDER FACE containing rotated faces in the wild show that our PCN achieves quite promising performance.
Tasks	Calibration, Face Detection
Published	2018-04-17
URL	http://arxiv.org/abs/1804.06039v1
PDF	http://arxiv.org/pdf/1804.06039v1.pdf
PWC	https://paperswithcode.com/paper/real-time-rotation-invariant-face-detection
Repo	https://github.com/daixiangzi/Caffe-PCN
Framework	caffe2

ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness


Title	ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness
Authors	Robert Geirhos, Patricia Rubisch, Claudio Michaelis, Matthias Bethge, Felix A. Wichmann, Wieland Brendel
Abstract	Convolutional Neural Networks (CNNs) are commonly thought to recognise objects by learning increasingly complex representations of object shapes. Some recent studies suggest a more important role of image textures. We here put these conflicting hypotheses to a quantitative test by evaluating CNNs and human observers on images with a texture-shape cue conflict. We show that ImageNet-trained CNNs are strongly biased towards recognising textures rather than shapes, which is in stark contrast to human behavioural evidence and reveals fundamentally different classification strategies. We then demonstrate that the same standard architecture (ResNet-50) that learns a texture-based representation on ImageNet is able to learn a shape-based representation instead when trained on “Stylized-ImageNet”, a stylized version of ImageNet. This provides a much better fit for human behavioural performance in our well-controlled psychophysical lab setting (nine experiments totalling 48,560 psychophysical trials across 97 observers) and comes with a number of unexpected emergent benefits such as improved object detection performance and previously unseen robustness towards a wide range of image distortions, highlighting advantages of a shape-based representation.
Tasks	Image Classification, Object Detection
Published	2018-11-29
URL	http://arxiv.org/abs/1811.12231v2
PDF	http://arxiv.org/pdf/1811.12231v2.pdf
PWC	https://paperswithcode.com/paper/imagenet-trained-cnns-are-biased-towards
Repo	https://github.com/rgeirhos/Stylized-ImageNet
Framework	pytorch

UNet++: A Nested U-Net Architecture for Medical Image Segmentation


Title	UNet++: A Nested U-Net Architecture for Medical Image Segmentation
Authors	Zongwei Zhou, Md Mahfuzur Rahman Siddiquee, Nima Tajbakhsh, Jianming Liang
Abstract	Implementation of different kinds of Unet Models for Image Segmentation - Unet , RCNN-Unet, Attention Unet, RCNN-Attention Unet, Nested Unet
Tasks	Medical Image Segmentation, Semantic Segmentation
Published	2018-07-18
URL	http://arxiv.org/abs/1807.10165v1
PDF	http://arxiv.org/pdf/1807.10165v1.pdf
PWC	https://paperswithcode.com/paper/unet-a-nested-u-net-architecture-for-medical
Repo	https://github.com/1044197988/TF.Keras-Commonly-used-models
Framework	tf

Lesion Focused Super-Resolution


Title	Lesion Focused Super-Resolution
Authors	Jin Zhu, Guang Yang, Pietro Lio
Abstract	Super-resolution (SR) for image enhancement has great importance in medical image applications. Broadly speaking, there are two types of SR: one requires multiple low resolution (LR) images from different views of the same object to be reconstructed to the high resolution (HR) output, and the other relies on learning from a large amount of training data, i.e., LR-HR pairs. In a real clinical environment, acquiring images from multiple views is expensive and sometimes infeasible. In this paper, we present a novel Generative Adversarial Networks (GAN) based learning framework to achieve SR from its LR version. By performing simulation based studies on the Multimodal Brain Tumor Segmentation Challenge (BraTS) datasets, we demonstrate the efficacy of our method in application of brain tumor MRI enhancement. Compared to bilinear interpolation and other state-of-the-art SR methods, our model is lesion focused, which not only results in better perceptual image quality without blurring, but is also more efficient and directly beneficial for subsequent clinical tasks, e.g., lesion detection and abnormality enhancement. Therefore, we can envisage the application of our SR method to boost image spatial resolution while maintaining crucial diagnostic information for further clinical tasks.
Tasks	Brain Tumor Segmentation, Image Enhancement, Super-Resolution
Published	2018-10-15
URL	http://arxiv.org/abs/1810.06693v1
PDF	http://arxiv.org/pdf/1810.06693v1.pdf
PWC	https://paperswithcode.com/paper/lesion-focused-super-resolution
Repo	https://github.com/GinZhu/MSGAN
Framework	none

Improved training of neural trans-dimensional random field language models with dynamic noise-contrastive estimation


Title	Improved training of neural trans-dimensional random field language models with dynamic noise-contrastive estimation
Authors	Bin Wang, Zhijian Ou
Abstract	A new whole-sentence language model - neural trans-dimensional random field language model (neural TRF LM), where sentences are modeled as a collection of random fields, and the potential function is defined by a neural network, has been introduced and successfully trained by noise-contrastive estimation (NCE). In this paper, we extend NCE and propose dynamic noise-contrastive estimation (DNCE) to solve the two problems observed in NCE training. First, a dynamic noise distribution is introduced and trained simultaneously to converge to the data distribution. This helps to significantly cut down the noise sample number used in NCE and reduce the training cost. Second, DNCE discriminates between sentences generated from the noise distribution and sentences generated from the interpolation of the data distribution and the noise distribution. This alleviates the overfitting problem caused by the sparseness of the training set. With DNCE, we can successfully and efficiently train neural TRF LMs on a large corpus (about 0.8 billion words) with a large vocabulary (about 568 K words). Neural TRF LMs perform as well as LSTM LMs with fewer parameters while being 5x to 114x faster in rescoring sentences. Interpolating neural TRF LMs with LSTM LMs and n-gram LMs can further reduce the error rates.
Tasks	Language Modelling
Published	2018-07-03
URL	http://arxiv.org/abs/1807.00993v1
PDF	http://arxiv.org/pdf/1807.00993v1.pdf
PWC	https://paperswithcode.com/paper/improved-training-of-neural-trans-dimensional
Repo	https://github.com/wbengine/TRF-NN-Tensorflow
Framework	tf

Deep clustering: On the link between discriminative models and K-means


Title	Deep clustering: On the link between discriminative models and K-means
Authors	Mohammed Jabi, Marco Pedersoli, Amar Mitiche, Ismail Ben Ayed
Abstract	In the context of recent deep clustering studies, discriminative models dominate the literature and report the most competitive performances. These models learn a deep discriminative neural network classifier in which the labels are latent. Typically, they use multinomial logistic regression posteriors and parameter regularization, as is very common in supervised learning. It is generally acknowledged that discriminative objective functions (e.g., those based on the mutual information or the KL divergence) are more flexible than generative approaches (e.g., K-means) in the sense that they make fewer assumptions about the data distributions and, typically, yield much better unsupervised deep learning results. On the surface, several recent discriminative models may seem unrelated to K-means. This study shows that these models are, in fact, equivalent to K-means under mild conditions and common posterior models and parameter regularization. We prove that, for the commonly used logistic regression posteriors, maximizing the $L_2$ regularized mutual information via an approximate alternating direction method (ADM) is equivalent to a soft and regularized K-means loss. Our theoretical analysis not only connects directly several recent state-of-the-art discriminative models to K-means, but also leads to a new soft and regularized deep K-means algorithm, which yields competitive performance on several image clustering benchmarks.
Tasks	Image Clustering
Published	2018-10-09
URL	https://arxiv.org/abs/1810.04246v2
PDF	https://arxiv.org/pdf/1810.04246v2.pdf
PWC	https://paperswithcode.com/paper/deep-clustering-on-the-link-between
Repo	https://github.com/MOhammedJAbi/SoftKMeans
Framework	pytorch
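
The abstract above refers to a "soft" K-means loss. For orientation, the sketch below shows one iteration of the generic textbook soft k-means: each point gets softmax responsibilities over negative squared distances, and centroids become responsibility-weighted means. The function names and the `stiffness` parameter are assumptions for illustration; this is not the regularized variant derived in the paper.

```python
import math


def soft_assignments(point, centroids, stiffness=1.0):
    """Softmax over negative squared distances: one weight per centroid."""
    scores = [
        -stiffness * sum((x - m) ** 2 for x, m in zip(point, centroid))
        for centroid in centroids
    ]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def soft_kmeans_step(points, centroids, stiffness=1.0):
    """One iteration: compute responsibilities, then weighted-mean centroids."""
    resp = [soft_assignments(p, centroids, stiffness) for p in points]
    dims = len(points[0])
    new_centroids = []
    for k, old in enumerate(centroids):
        mass = sum(r[k] for r in resp)
        if mass == 0.0:  # every weight underflowed: keep the old centroid
            new_centroids.append(list(old))
            continue
        new_centroids.append([
            sum(r[k] * p[d] for r, p in zip(resp, points)) / mass
            for d in range(dims)
        ])
    return new_centroids, resp


# Toy 1-D data with two groups; the initial centroids are hypothetical.
toy_points = [(0.0,), (0.2,), (4.0,), (4.2,)]
updated, responsibilities = soft_kmeans_step(toy_points, [(0.0,), (4.0,)])
```

As `stiffness` grows, the responsibilities approach one-hot vectors and the step approaches ordinary (hard) k-means.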

Data-Efficient Design Exploration through Surrogate-Assisted Illumination


Title	Data-Efficient Design Exploration through Surrogate-Assisted Illumination
Authors	Adam Gaier, Alexander Asteroth, Jean-Baptiste Mouret
Abstract	Design optimization techniques are often used at the beginning of the design process to explore the space of possible designs. In these domains illumination algorithms, such as MAP-Elites, are promising alternatives to classic optimization algorithms because they produce diverse, high-quality solutions in a single run, instead of only a single near-optimal solution. Unfortunately, these algorithms currently require a large number of function evaluations, limiting their applicability. In this article we introduce a new illumination algorithm, Surrogate-Assisted Illumination (SAIL), that leverages surrogate modeling techniques to create a map of the design space according to user-defined features while minimizing the number of fitness evaluations. On a 2-dimensional airfoil optimization problem SAIL produces hundreds of diverse but high-performing designs with several orders of magnitude fewer evaluations than MAP-Elites or CMA-ES. We demonstrate that SAIL is also capable of producing maps of high-performing designs in realistic 3-dimensional aerodynamic tasks with an accurate flow simulation. Data-efficient design exploration with SAIL can help designers understand what is possible, beyond what is optimal, by considering more than pure objective-based optimization.
Tasks
Published	2018-06-15
URL	http://arxiv.org/abs/1806.05865v1
PDF	http://arxiv.org/pdf/1806.05865v1.pdf
PWC	https://paperswithcode.com/paper/data-efficient-design-exploration-through
Repo	https://github.com/DanieleGravina/divergence-and-quality-diversity
Framework	none
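
The abstract above builds on MAP-Elites, an illumination algorithm that keeps the best solution found in each cell of a user-defined feature space. A minimal sketch of plain MAP-Elites on a 2-parameter toy problem follows; the function signature, the toy fitness and descriptor, and all hyperparameters are assumptions for illustration. The surrogate model that distinguishes SAIL is not included.

```python
import random


def map_elites(fitness, descriptor, bins=10, evaluations=2000,
               initial=100, sigma=0.1, seed=0):
    """Return an archive mapping feature-bin index -> (fitness, solution).

    Solutions are 2-element lists in [0, 1]; descriptor(solution) must
    return a value in [0, 1] that decides which bin the solution falls in.
    """
    rng = random.Random(seed)
    archive = {}
    for i in range(evaluations):
        if i < initial or not archive:
            # Bootstrap the archive with uniformly random solutions.
            candidate = [rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0)]
        else:
            # Mutate a randomly chosen elite with clipped Gaussian noise.
            _, parent = archive[rng.choice(list(archive))]
            candidate = [min(1.0, max(0.0, x + rng.gauss(0.0, sigma)))
                         for x in parent]
        cell = min(bins - 1, int(descriptor(candidate) * bins))
        score = fitness(candidate)
        # Keep the candidate only if its cell is empty or it beats the elite.
        if cell not in archive or score > archive[cell][0]:
            archive[cell] = (score, candidate)
    return archive


def toy_fitness(solution):
    """Hypothetical objective: higher is better, peak at (0.5, 0.5)."""
    return -((solution[0] - 0.5) ** 2 + (solution[1] - 0.5) ** 2)


def toy_descriptor(solution):
    """Hypothetical feature: the first parameter itself."""
    return solution[0]


elite_map = map_elites(toy_fitness, toy_descriptor, bins=10, evaluations=500)
```

The result is a map of diverse elites rather than a single optimum. Every candidate here costs one true fitness evaluation, which is the expense a surrogate-assisted variant aims to reduce.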