February 1, 2020


Paper Group AWR 205

Tangent Images for Mitigating Spherical Distortion. A Simple Pooling-Based Design for Real-Time Salient Object Detection. Functorial Question Answering. Measuring Arithmetic Extrapolation Performance. Orometric Methods in Bounded Metric Data. Structured Graph Learning Via Laplacian Spectral Constraints. CamemBERT: a Tasty French Language Model. Lea …

Tangent Images for Mitigating Spherical Distortion


Title	Tangent Images for Mitigating Spherical Distortion
Authors	Marc Eder, Mykhailo Shvets, John Lim, Jan-Michael Frahm
Abstract	In this work, we propose “tangent images,” a spherical image representation that facilitates transferable and scalable $360^\circ$ computer vision. Inspired by techniques in cartography and computer graphics, we render a spherical image to a set of distortion-mitigated, locally-planar image grids tangent to a subdivided icosahedron. By varying the resolution of these grids independently of the subdivision level, we can effectively represent high-resolution spherical images while still benefiting from the low-distortion icosahedral spherical approximation. We show that training standard convolutional neural networks on tangent images compares favorably to the many specialized spherical convolutional kernels that have been developed, while also scaling efficiently to handle significantly higher spherical resolutions. Furthermore, because our approach does not require specialized kernels, we show that we can transfer networks trained on perspective images to spherical data without fine-tuning and with limited performance drop-off. Finally, we demonstrate that tangent images can be used to improve the quality of sparse feature detection on spherical images, illustrating their usefulness for traditional computer vision tasks like structure-from-motion and SLAM.
Tasks
Published	2019-12-19
URL	https://arxiv.org/abs/1912.09390v2
PDF	https://arxiv.org/pdf/1912.09390v2.pdf
PWC	https://paperswithcode.com/paper/tangent-images-for-mitigating-spherical
Repo	https://github.com/meder411/Tangent-Images
Framework	pytorch
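
A rough sketch of the rendering step described above, assuming the gnomonic (tangent-plane) projection, an equirectangular input image and nearest-neighbour sampling. The function names are hypothetical and do not come from the Tangent-Images repository, and the sketch leaves out computing tangent points from the subdivided icosahedron: it renders one tangent grid around a given centre (lat0, lon0), in radians.

```python
import numpy as np

def gnomonic_forward(lat, lon, lat0, lon0):
    """Project sphere points (radians) onto the plane tangent at (lat0, lon0)."""
    cos_c = (np.sin(lat0) * np.sin(lat)
             + np.cos(lat0) * np.cos(lat) * np.cos(lon - lon0))
    if np.any(cos_c <= 0):
        raise ValueError("point lies on the far hemisphere of this tangent plane")
    x = np.cos(lat) * np.sin(lon - lon0) / cos_c
    y = (np.cos(lat0) * np.sin(lat)
         - np.sin(lat0) * np.cos(lat) * np.cos(lon - lon0)) / cos_c
    return x, y

def gnomonic_inverse(x, y, lat0, lon0):
    """Map tangent-plane coordinates back to (lat, lon) on the unit sphere."""
    rho = np.hypot(x, y)
    c = np.arctan(rho)
    safe_rho = np.where(rho == 0, 1.0, rho)  # avoid 0/0 at the tangent point
    lat = np.arcsin(np.cos(c) * np.sin(lat0) + y * np.sin(c) * np.cos(lat0) / safe_rho)
    lon = lon0 + np.arctan2(x * np.sin(c),
                            rho * np.cos(lat0) * np.cos(c) - y * np.sin(lat0) * np.sin(c))
    return lat, lon

def tangent_image(equirect, lat0, lon0, size=32, half_width=0.5):
    """Render a size x size tangent grid from an equirectangular image (nearest neighbour)."""
    h, w = equirect.shape[:2]
    u = np.linspace(-half_width, half_width, size)
    x, y = np.meshgrid(u, -u)  # row 0 is the top of the tangent image
    lat, lon = gnomonic_inverse(x, y, lat0, lon0)
    lon = (lon + np.pi) % (2 * np.pi) - np.pi  # wrap longitude to [-pi, pi)
    rows = np.clip(((np.pi / 2 - lat) / np.pi * h).astype(int), 0, h - 1)
    cols = np.clip(((lon + np.pi) / (2 * np.pi) * w).astype(int), 0, w - 1)
    return equirect[rows, cols]
```

Because `size` is a free parameter, the sketch also mirrors the abstract's point that grid resolution can be chosen independently of the icosahedron's subdivision level.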

A Simple Pooling-Based Design for Real-Time Salient Object Detection


Title	A Simple Pooling-Based Design for Real-Time Salient Object Detection
Authors	Jiang-Jiang Liu, Qibin Hou, Ming-Ming Cheng, Jiashi Feng, Jianmin Jiang
Abstract	We solve the problem of salient object detection by investigating how to expand the role of pooling in convolutional neural networks. Based on the U-shape architecture, we first build a global guidance module (GGM) upon the bottom-up pathway, aiming to provide layers at different feature levels with the location information of potential salient objects. We further design a feature aggregation module (FAM) to fuse the coarse-level semantic information with the fine-level features from the top-down pathway. By adding FAMs after the fusion operations in the top-down pathway, coarse-level features from the GGM can be seamlessly merged with features at various scales. These two pooling-based modules allow the high-level semantic features to be progressively refined, yielding detail-enriched saliency maps. Experimental results show that our proposed approach locates salient objects more accurately and with sharper details, and hence substantially improves performance compared to previous state-of-the-art methods. Our approach is also fast, running at more than 30 FPS when processing a $300 \times 400$ image. Code can be found at http://mmcheng.net/poolnet/.
Tasks	Object Detection, Salient Object Detection
Published	2019-04-21
URL	http://arxiv.org/abs/1904.09569v1
PDF	http://arxiv.org/pdf/1904.09569v1.pdf
PWC	https://paperswithcode.com/paper/a-simple-pooling-based-design-for-real-time
Repo	https://github.com/hualuluu/--every-day-paper--
Framework	none
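
The feature aggregation module can be pictured roughly as follows. This sketch assumes the FAM pools the input map at several scales (2, 4 and 8 here), upsamples each pooled map back to full resolution and sums the results with an identity branch. The learned convolutions are omitted, nearest-neighbour upsampling stands in for whatever interpolation the official code uses, and all names are hypothetical. It works on a single-channel map whose sides are divisible by every scale.

```python
import numpy as np

def avg_pool(x, k):
    """Non-overlapping k x k average pooling on an (H, W) map; H and W must be divisible by k."""
    h, w = x.shape
    return x.reshape(h // k, k, w // k, k).mean(axis=(1, 3))

def upsample_nearest(x, k):
    """Repeat every value k times along both axes."""
    return np.repeat(np.repeat(x, k, axis=0), k, axis=1)

def feature_aggregation(x, scales=(2, 4, 8)):
    """Identity branch plus pooled-then-upsampled branches, summed (convolutions omitted)."""
    x = np.asarray(x, dtype=float)
    out = x.copy()
    for k in scales:
        out += upsample_nearest(avg_pool(x, k), k)
    return out
```

Pooling at coarse scales spreads context over large regions, which is the intuition behind using pooling to reduce aliasing when coarse and fine features are merged.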

Functorial Question Answering


Title	Functorial Question Answering
Authors	Giovanni de Felice, Konstantinos Meichanetzidis, Alexis Toumi
Abstract	We study the relational variant of the categorical compositional distributional (DisCoCat) models of Coecke et al., where we replace vector spaces and linear maps by sets and relations. We show that RelCoCat models factorise through Cartesian bicategories; as a corollary, we get logspace reductions from semantics and entailment to evaluation and containment of conjunctive queries, respectively. Finally, we define question answering as an NP-complete problem.
Tasks	Question Answering
Published	2019-05-17
URL	https://arxiv.org/abs/1905.07408v2
PDF	https://arxiv.org/pdf/1905.07408v2.pdf
PWC	https://paperswithcode.com/paper/montague-semantics-for-lambek-pregroups
Repo	https://github.com/toumix/discopy
Framework	none
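
In the relational setting described above, word meanings are relations over a set of entities, and sentence meaning and questions reduce to conjunctive queries. The brute-force evaluator below is written only for illustration: the query encoding (terms starting with `?` are variables) and the function name are hypothetical, and this is not DisCoPy's API. Its exhaustive search over variable assignments is exponential in the number of variables, which is consistent with the hardness result the abstract mentions.

```python
from itertools import product

def evaluate_cq(head, atoms, relations, domain):
    """Return every tuple of head-variable values that satisfies all atoms.

    atoms: list of (relation_name, (term, ...)); terms starting with '?' are
    variables, any other term is a constant entity.
    """
    variables = sorted({t for _, args in atoms for t in args if t.startswith("?")})
    answers = set()
    # Try every assignment of entities to variables (exponential search).
    for values in product(sorted(domain), repeat=len(variables)):
        env = dict(zip(variables, values))

        def ground(term):
            return env[term] if term.startswith("?") else term

        if all(tuple(ground(t) for t in args) in relations[name] for name, args in atoms):
            answers.add(tuple(env[v] for v in head))
    return answers
```

For example, "Who loves Bob and is happy?" becomes `evaluate_cq(["?x"], [("loves", ("?x", "bob")), ("happy", ("?x",))], relations, domain)`; a query with an empty head is a yes/no sentence, true exactly when the result is `{()}`.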

Measuring Arithmetic Extrapolation Performance


Title	Measuring Arithmetic Extrapolation Performance
Authors	Andreas Madsen, Alexander Rosenberg Johansen
Abstract	The Neural Arithmetic Logic Unit (NALU) is a neural network layer that can learn exact arithmetic operations between the elements of a hidden state. The goal of NALU is to learn perfect extrapolation, which requires learning the exact underlying logic of an unknown arithmetic problem. Evaluating the performance of the NALU is non-trivial, as one arithmetic problem might have many solutions. As a consequence, single-instance MSE has been used to evaluate and compare performance between models. However, it can be hard to interpret what magnitude of MSE represents a correct solution, and a single instance does not reveal a model's sensitivity to initialization. We propose using a success criterion to measure if and when a model converges. Using a success criterion, we can summarize the success rate over many initialization seeds and calculate confidence intervals. We contribute a generalized version of the previous arithmetic benchmark to measure models' sensitivity under different conditions. This is, to our knowledge, the first extensive evaluation with respect to convergence of the NALU and its sub-units. Using a success criterion to summarize 4800 experiments, we find that consistently learning arithmetic extrapolation is challenging, in particular for multiplication.
Tasks
Published	2019-10-04
URL	https://arxiv.org/abs/1910.01888v2
PDF	https://arxiv.org/pdf/1910.01888v2.pdf
PWC	https://paperswithcode.com/paper/measuring-arithmetic-extrapolation
Repo	https://github.com/AndreasMadsen/stable-nalu
Framework	pytorch
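
A minimal sketch of summarizing runs with a success criterion, assuming a run counts as successful once its extrapolation MSE drops below a fixed threshold, and using a Wilson score interval for the success rate. The threshold, the choice of interval and the function names are assumptions for illustration; the paper's exact criterion and interval may differ.

```python
import math

def first_success_step(mse_curve, threshold):
    """Index of the first evaluation whose extrapolation MSE is below threshold, or None."""
    for step, mse in enumerate(mse_curve):
        if mse < threshold:
            return step
    return None

def wilson_interval(successes, trials, z=1.96):
    """Approximate 95% Wilson score interval for a binomial success rate (assumed choice)."""
    p = successes / trials
    denom = 1 + z ** 2 / trials
    centre = (p + z ** 2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z ** 2 / (4 * trials ** 2)) / denom
    return centre - half, centre + half

def success_summary(curves, threshold):
    """Success rate, its interval, and the convergence step of each seed's MSE curve."""
    steps = [first_success_step(c, threshold) for c in curves]
    successes = sum(s is not None for s in steps)
    rate = successes / len(curves)
    return rate, wilson_interval(successes, len(curves)), steps
```

The per-seed `steps` list answers "when" a model converged, while the rate and its interval answer "if", which is the distinction the abstract draws against single-instance MSE.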

Orometric Methods in Bounded Metric Data


Title	Orometric Methods in Bounded Metric Data
Authors	Maximilian Stubbemann, Tom Hanika, Gerd Stumme
Abstract	A large amount of the data stored in knowledge graphs (KGs) is metric. For example, the Wikidata KG contains a wealth of metric facts about geographic entities like cities, chemical compounds or celestial objects. In this paper, we propose a novel approach that transfers orometric (topographic) measures to bounded metric spaces. While these methods were originally designed to identify relevant mountain peaks on the surface of the earth, we show how they can be applied to metric data sets in general, and in particular to metric sets of items enclosed in knowledge graphs. Based on this, we present a method for identifying outstanding items using the transferred valuation functions ‘isolation’ and ‘prominence’. Building on this, we envision an item recommendation process. To demonstrate the relevance of the novel valuations for such processes, we use item sets from the Wikidata knowledge graph. We then evaluate the usefulness of ‘isolation’ and ‘prominence’ empirically in a supervised machine learning setting. In particular, we find structurally relevant items in the geographic population distributions of Germany and France.
Tasks	Knowledge Graphs
Published	2019-07-22
URL	https://arxiv.org/abs/1907.09239v1
PDF	https://arxiv.org/pdf/1907.09239v1.pdf
PWC	https://paperswithcode.com/paper/orometric-methods-in-bounded-metric-data
Repo	https://github.com/mstubbemann/Orometric-Methods-in-Bounded-Metric-Data
Framework	none
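
Isolation, as transferred from topography, can be sketched as the distance from an item to the nearest item with a strictly higher value (for example, a larger population). The function below is a hypothetical illustration rather than the authors' code. It assigns infinity to the global maximum, which is an assumption (a bounded-space variant might use the diameter instead), and it omits prominence, whose path-based definition needs more machinery.

```python
import math

def isolation(points, values, dist=math.dist):
    """For each item, the distance to the nearest item with a strictly higher value.

    The highest item has no higher neighbour; here it gets math.inf (assumption).
    `dist` can be swapped for e.g. a great-circle distance on geographic data.
    """
    result = []
    for p, v in zip(points, values):
        higher = [dist(p, q) for q, w in zip(points, values) if w > v]
        result.append(min(higher) if higher else math.inf)
    return result
```

A small town next to a big city gets a low isolation score, while a large city far from any bigger one scores high, which is the sense in which isolation singles out "outstanding" items.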

Structured Graph Learning Via Laplacian Spectral Constraints


Title	Structured Graph Learning Via Laplacian Spectral Constraints
Authors	Sandeep Kumar, Jiaxi Ying, José Vinícius de M. Cardoso, Daniel P. Palomar
Abstract	Learning a graph with a specific structure is essential for interpretability and identification of the relationships among data. It is well known that structured graph learning from observed samples is an NP-hard combinatorial problem. In this paper, we first show that for a set of important graph families it is possible to convert structural constraints into eigenvalue constraints on the graph Laplacian matrix. Then we introduce a unified graph learning framework that integrates the spectral properties of the Laplacian matrix with Gaussian graphical modeling and is capable of learning the structures of a large class of graph families. The proposed algorithms are provably convergent and practically amenable to large-scale semi-supervised and unsupervised graph-based learning tasks. Extensive numerical experiments with both synthetic and real data sets demonstrate the effectiveness of the proposed methods. An R package containing code for all the experimental results is available at https://cran.r-project.org/package=spectralGraphTopology.
Tasks
Published	2019-09-24
URL	https://arxiv.org/abs/1909.11594v1
PDF	https://arxiv.org/pdf/1909.11594v1.pdf
PWC	https://paperswithcode.com/paper/structured-graph-learning-via-laplacian
Repo	https://github.com/dppalomar/spectralGraphTopology
Framework	none
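
A standard example of turning structure into a spectral property: a graph with nonnegative edge weights has exactly k connected components if and only if its Laplacian $L = D - W$ has exactly k zero eigenvalues. The check below only verifies that property for a given weight matrix; it is not the paper's learning algorithm, which, per the abstract, imposes such constraints while fitting a Gaussian graphical model. The authors' package is in R; Python and the function names here are illustrative choices.

```python
import numpy as np

def laplacian(weights):
    """Combinatorial graph Laplacian L = D - W for a symmetric weight matrix W."""
    w = np.asarray(weights, dtype=float)
    return np.diag(w.sum(axis=1)) - w

def num_components_spectral(weights, tol=1e-9):
    """Count connected components as the number of (numerically) zero Laplacian eigenvalues."""
    eigvals = np.linalg.eigvalsh(laplacian(weights))  # real, ascending, all >= 0 up to rounding
    return int(np.sum(eigvals < tol))
```

Requiring the k smallest eigenvalues to be zero (and the rest positive) is therefore a continuous stand-in for the combinatorial constraint "the graph has k components".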

CamemBERT: a Tasty French Language Model


Title	CamemBERT: a Tasty French Language Model
Authors	Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suárez, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah, Benoît Sagot
Abstract	Pretrained language models are now ubiquitous in Natural Language Processing. Despite their success, most available models have been trained either on English data or on the concatenation of data in multiple languages. This makes practical use of such models very limited in all languages except English. Aiming to address this issue for French, we release CamemBERT, a French version of the Bidirectional Encoder Representations from Transformers (BERT). We measure the performance of CamemBERT compared to multilingual models in multiple downstream tasks, namely part-of-speech tagging, dependency parsing, named-entity recognition, and natural language inference. CamemBERT improves the state of the art for most of the tasks considered. We release the pretrained model for CamemBERT hoping to foster research and downstream applications for French NLP.
Tasks	Dependency Parsing, Language Modelling, Named Entity Recognition, Natural Language Inference, Part-Of-Speech Tagging
Published	2019-11-10
URL	https://arxiv.org/abs/1911.03894v1
PDF	https://arxiv.org/pdf/1911.03894v1.pdf
PWC	https://paperswithcode.com/paper/camembert-a-tasty-french-language-model
Repo	https://github.com/huggingface/transformers
Framework	pytorch

Learning relevant features for statistical inference


Title	Learning relevant features for statistical inference
Authors	Cédric Bény
Abstract	Given two views of data, we consider the problem of finding the features of one view which can be most faithfully inferred from the other. We find that these are also the most correlated variables in the sense of deep canonical correlation analysis (DCCA). Moreover, we show that these variables can be used to construct a non-parametric representation of the implied joint probability distribution, which can be thought of as a classical version of the Schmidt decomposition of quantum states. This representation can be used to compute the expectations of functions over one view of data conditioned on the other, such as Bayesian estimators and their standard deviations. We test the approach using inference on occluded MNIST images, and show that our representation contains multiple modes. Surprisingly, when applied to supervised learning (where one view consists of labels), this approach automatically provides regularization and faster convergence compared to the cross-entropy objective. We also explore using this approach to discover salient independent variables of a single dataset.
Tasks
Published	2019-04-23
URL	https://arxiv.org/abs/1904.10387v4
PDF	https://arxiv.org/pdf/1904.10387v4.pdf
PWC	https://paperswithcode.com/paper/relevant-feature-extraction-for-statistical
Repo	https://github.com/cbeny/RFA
Framework	tf
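
For intuition about "most correlated variables", here is linear canonical correlation analysis, the non-deep special case of DCCA: the canonical correlations are the singular values of $C_{xx}^{-1/2} C_{xy} C_{yy}^{-1/2}$. This is a swapped-in textbook technique, not the author's RFA method; the small ridge term `reg` and the function names are assumptions.

```python
import numpy as np

def inv_sqrt(c):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(c)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def linear_cca(x, y, reg=1e-6):
    """Canonical correlations and projection directions for views x (n, p) and y (n, q)."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]
    cxx = x.T @ x / (n - 1) + reg * np.eye(x.shape[1])  # ridge keeps cxx invertible
    cyy = y.T @ y / (n - 1) + reg * np.eye(y.shape[1])
    cxy = x.T @ y / (n - 1)
    wx, wy = inv_sqrt(cxx), inv_sqrt(cyy)
    u, s, vt = np.linalg.svd(wx @ cxy @ wy)  # whitened cross-covariance
    return s, wx @ u, wy @ vt.T  # correlations, x-directions, y-directions
```

The first columns of the returned direction matrices give the pair of linear features that are most predictable from each other; DCCA replaces these linear maps with neural networks.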

Do Sentence Interactions Matter? Leveraging Sentence Level Representations for Fake News Classification


Title	Do Sentence Interactions Matter? Leveraging Sentence Level Representations for Fake News Classification
Authors	Vaibhav Vaibhav, Raghuram Mandyam Annasamy, Eduard Hovy
Abstract	The growth of fake news and misleading information through online media outlets demands an automatic method for detecting such news articles. Of the few works that differentiate trusted news from other types of news articles (satire, propaganda, hoax), none models sentence interactions within a document. We observe an interesting pattern in the way sentences interact with each other across different kinds of news articles. To capture this kind of information for long news articles, we propose a graph neural network-based model which does away with the need for feature engineering in fine-grained fake news classification. Through experiments, we show that our proposed method beats strong neural baselines and achieves state-of-the-art accuracy on existing datasets. Moreover, we establish the generalizability of our model by evaluating its performance in out-of-domain scenarios. Code is available at https://github.com/MysteryVaibhav/fake_news_semantics
Tasks	Feature Engineering
Published	2019-10-27
URL	https://arxiv.org/abs/1910.12203v1
PDF	https://arxiv.org/pdf/1910.12203v1.pdf
PWC	https://paperswithcode.com/paper/do-sentence-interactions-matter-leveraging
Repo	https://github.com/MysteryVaibhav/fake_news_semantics
Framework	pytorch
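
To make "modelling sentence interactions" concrete, here is a toy forward pass over a fully connected sentence graph: edge weights come from softmax-normalized dot products between sentence embeddings, one message-passing step is followed by a ReLU, and sentences are mean-pooled into a document vector scored against four classes (trusted, satire, propaganda, hoax). The architecture, weight shapes and function names are hypothetical stand-ins, not the model from the paper or its repository, and the weights are untrained.

```python
import numpy as np

def sentence_graph(embeddings):
    """Row-stochastic adjacency over sentences from softmax-normalized dot products."""
    scores = embeddings @ embeddings.T
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability before exp
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)

def document_logits(embeddings, w_hidden, w_out):
    """One message-passing step, ReLU, mean pooling, then a linear class scorer."""
    adj = sentence_graph(embeddings)
    hidden = np.maximum(adj @ embeddings @ w_hidden, 0.0)  # each sentence aggregates the others
    doc = hidden.mean(axis=0)  # document vector
    return doc @ w_out  # one logit per class
```

The point of the graph step is that each sentence representation is updated from how it relates to every other sentence, rather than being classified in isolation.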

Wasserstein Fair Classification


Title	Wasserstein Fair Classification
Authors	Ray Jiang, Aldo Pacchiano, Tom Stepleton, Heinrich Jiang, Silvia Chiappa
Abstract	We propose an approach to fair classification that enforces independence between the classifier outputs and sensitive information by minimizing Wasserstein-1 distances. The approach has desirable theoretical properties and is robust to specific choices of the threshold used to obtain class predictions from model outputs. We introduce different methods that enable hiding sensitive information at test time or have a simple and fast implementation. We show empirical performance against different fairness baselines on several benchmark fairness datasets.
Tasks
Published	2019-07-28
URL	https://arxiv.org/abs/1907.12059v1
PDF	https://arxiv.org/pdf/1907.12059v1.pdf
PWC	https://paperswithcode.com/paper/wasserstein-fair-classification
Repo	https://github.com/deepmind/wasserstein_fairness
Framework	none
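
For one-dimensional classifier scores, the Wasserstein-1 distance is the area between the two cumulative distribution functions, $W_1 = \int |F_a(t) - F_b(t)|\,dt$. Since $1 - F(t)$ is the positive rate at threshold $t$, this equals the gap in positive rates integrated over all thresholds, which gives some intuition for the threshold robustness claimed above. The penalty function below, which averages each group's distance to the pooled score distribution, is an assumed form for illustration and not necessarily the objective in the deepmind/wasserstein_fairness code.

```python
import numpy as np

def wasserstein1(a, b):
    """W1 between two 1-D empirical distributions: integral of |CDF_a - CDF_b|."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.sort(np.concatenate([a, b]))
    # Both empirical CDFs are constant between consecutive grid points.
    cdf_a = np.searchsorted(a, grid[:-1], side="right") / a.size
    cdf_b = np.searchsorted(b, grid[:-1], side="right") / b.size
    return float(np.sum(np.abs(cdf_a - cdf_b) * np.diff(grid)))

def fairness_penalty(scores, groups):
    """Mean W1 between each group's scores and the pooled scores (assumed form)."""
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    return float(np.mean([wasserstein1(scores[groups == g], scores)
                          for g in np.unique(groups)]))
```

A penalty of zero means every group has the same score distribution, so any threshold yields the same positive rate in every group.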

Jejueo Datasets for Machine Translation and Speech Synthesis


Title	Jejueo Datasets for Machine Translation and Speech Synthesis
Authors	Kyubyong Park, Yo Joong Choe, Jiyeon Ham
Abstract	Jejueo was classified as critically endangered by UNESCO in 2010. Although diverse efforts to revitalize it have been made, there have been few computational approaches. Motivated by this, we construct two new Jejueo datasets: Jejueo Interview Transcripts (JIT) and Jejueo Single Speaker Speech (JSS). The JIT dataset is a parallel corpus containing 170k+ Jejueo-Korean sentences, and the JSS dataset consists of 10k high-quality audio files recorded by a native Jejueo speaker, together with a transcript file. Subsequently, we build neural systems for machine translation and speech synthesis using them. All resources are publicly available via our GitHub repository. We hope that these datasets will attract the interest of both the language and machine learning communities.
Tasks	Machine Translation, Speech Synthesis
Published	2019-11-27
URL	https://arxiv.org/abs/1911.12071v1
PDF	https://arxiv.org/pdf/1911.12071v1.pdf
PWC	https://paperswithcode.com/paper/jejueo-datasets-for-machine-translation-and
Repo	https://github.com/kakaobrain/jejueo
Framework	tf

MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis


Title	MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis
Authors	Kundan Kumar, Rithesh Kumar, Thibault de Boissiere, Lucas Gestin, Wei Zhen Teoh, Jose Sotelo, Alexandre de Brebisson, Yoshua Bengio, Aaron Courville
Abstract	Previous works (Donahue et al., 2018a; Engel et al., 2019a) have found that generating coherent raw audio waveforms with GANs is challenging. In this paper, we show that it is possible to train GANs reliably to generate high-quality coherent waveforms by introducing a set of architectural changes and simple training techniques. A subjective evaluation metric (Mean Opinion Score, or MOS) shows the effectiveness of the proposed approach for high-quality mel-spectrogram inversion. To establish the generality of the proposed techniques, we show qualitative results of our model in speech synthesis, music domain translation and unconditional music synthesis. We evaluate the various components of the model through ablation studies and suggest a set of guidelines to design general-purpose discriminators and generators for conditional sequence synthesis tasks. Our model is non-autoregressive and fully convolutional, has significantly fewer parameters than competing models, and generalizes to unseen speakers for mel-spectrogram inversion. Our PyTorch implementation runs more than 100x faster than real-time on a GTX 1080Ti GPU and more than 2x faster than real-time on CPU, without any hardware-specific optimization tricks.
Tasks	Speech Synthesis
Published	2019-10-08
URL	https://arxiv.org/abs/1910.06711v3
PDF	https://arxiv.org/pdf/1910.06711v3.pdf
PWC	https://paperswithcode.com/paper/melgan-generative-adversarial-networks-for
Repo	https://github.com/yanggeng1995/GAN-TTS
Framework	pytorch
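
The arithmetic below shows what mel-spectrogram inversion and the speed claim mean in terms of sample counts. It assumes a MelGAN-style configuration that the abstract does not state: a hop length of 256 audio samples per mel frame, generator upsampling strides of 8, 8, 2 and 2 (whose product must equal the hop length), and 22,050 Hz audio. The real-time factor is audio duration divided by wall-clock synthesis time, so a value above 100 corresponds to "more than 100x faster than real-time".

```python
import math

# Assumed configuration (not given in the abstract).
HOP_LENGTH = 256                  # audio samples per mel-spectrogram frame
UPSAMPLE_FACTORS = (8, 8, 2, 2)   # strides of the generator's upsampling layers
SAMPLE_RATE = 22050               # Hz

def output_samples(num_frames, factors=UPSAMPLE_FACTORS):
    """Waveform length produced from num_frames mel frames by a fully convolutional generator."""
    return num_frames * math.prod(factors)

def real_time_factor(num_samples, sample_rate, synthesis_seconds):
    """Seconds of audio produced per second of compute (> 1 means faster than real time)."""
    return (num_samples / sample_rate) / synthesis_seconds
```

Because the generator is non-autoregressive, all output samples are computed in parallel from the frames, which is what makes such real-time factors attainable.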

Texel-Att: Representing and Classifying Element-based Textures by Attributes


Title	Texel-Att: Representing and Classifying Element-based Textures by Attributes
Authors	Marco Godi, Christian Joppi, Andrea Giachetti, Fabio Pellacini, Marco Cristani
Abstract	Element-based textures are a kind of texture formed by nameable elements, the texels [1], distributed according to specific statistical distributions; they are of primary importance in many sectors, such as the textile, fashion and interior design industries. State-of-the-art texture descriptors fail to properly characterize element-based textures, so we present Texel-Att to fill this gap. Texel-Att is the first fine-grained, attribute-based representation and classification framework for element-based textures. It first detects texels and characterizes them with individual attributes; texels are then grouped and characterized through layout attributes, which give the Texel-Att representation. Texels are detected by a Mask-RCNN trained on a brand-new element-based texture dataset, ElBa, containing 30K texture images with 3M fully-annotated texels. Examples of individual and layout attributes are shown to give a glimpse of the achievable level of granularity. In the experiments, we present detection results showing that texels can be precisely detected, even on textures “in the wild”; to this end, we identify the element-based classes of the Describable Texture Dataset (DTD), where almost 900K texels have been manually annotated, leading to the Element-based DTD (E-DTD). Subsequently, classification and ranking results demonstrate the expressivity of Texel-Att on ElBa and E-DTD, outperforming alternative features and relative attributes and doubling the best performance in some cases; finally, we report interactive search results on ElBa and E-DTD: with Texel-Att on the E-DTD dataset, we are able to find the desired texture within 10 iterations in 90% of cases, against the 71% obtained with a combination of the finest existing attributes so far. Dataset and code are available at https://github.com/godimarcovr/Texel-Att
Tasks
Published	2019-08-29
URL	https://arxiv.org/abs/1908.11127v2
PDF	https://arxiv.org/pdf/1908.11127v2.pdf
PWC	https://paperswithcode.com/paper/texel-att-representing-and-classifying
Repo	https://github.com/godimarcovr/Texel-Att
Framework	none
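
As a hypothetical illustration of layout attributes, the sketch below summarizes detected texel centroids with three simple statistics: density, mean nearest-neighbour distance, and the coefficient of variation of those distances (0 for a perfectly regular lattice). These particular attributes and names are assumptions, not the Texel-Att attribute set; in practice the centroids would come from the Mask-RCNN detections.

```python
import numpy as np

def layout_attributes(centroids, image_area):
    """Simple layout statistics from texel centroids (hypothetical attribute set)."""
    pts = np.asarray(centroids, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))  # pairwise Euclidean distances
    np.fill_diagonal(d, np.inf)  # ignore each texel's distance to itself
    nn = d.min(axis=1)  # nearest-neighbour distance per texel
    return {
        "density": len(pts) / image_area,
        "mean_nn_distance": float(nn.mean()),
        "nn_regularity": float(nn.std() / nn.mean()),  # 0 means perfectly regular spacing
    }
```

Attributes of this kind describe how texels are arranged rather than what each texel looks like, which is the split between layout and individual attributes that the abstract describes.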

High Fidelity Speech Synthesis with Adversarial Networks


Title	High Fidelity Speech Synthesis with Adversarial Networks
Authors	Mikołaj Bińkowski, Jeff Donahue, Sander Dieleman, Aidan Clark, Erich Elsen, Norman Casagrande, Luis C. Cobo, Karen Simonyan
Abstract	Generative adversarial networks have seen rapid development in recent years and have led to remarkable improvements in generative modelling of images. However, their application in the audio domain has received limited attention, and autoregressive models, such as WaveNet, remain the state of the art in generative modelling of audio signals such as human speech. To address this paucity, we introduce GAN-TTS, a Generative Adversarial Network for Text-to-Speech. Our architecture is composed of a conditional feed-forward generator producing raw speech audio, and an ensemble of discriminators which operate on random windows of different sizes. The discriminators analyse the audio both in terms of general realism and in terms of how well the audio corresponds to the utterance that should be pronounced. To measure the performance of GAN-TTS, we employ both subjective human evaluation (Mean Opinion Score, MOS) and novel quantitative metrics (Fréchet DeepSpeech Distance and Kernel DeepSpeech Distance), which we find to be well correlated with MOS. We show that GAN-TTS is capable of generating high-fidelity speech with naturalness comparable to the state-of-the-art models, and unlike autoregressive models, it is highly parallelisable thanks to an efficient feed-forward generator. Listen to GAN-TTS reading this abstract at https://storage.googleapis.com/deepmind-media/research/abstract.wav.
Tasks	Speech Synthesis
Published	2019-09-25
URL	https://arxiv.org/abs/1909.11646v2
PDF	https://arxiv.org/pdf/1909.11646v2.pdf
PWC	https://paperswithcode.com/paper/high-fidelity-speech-synthesis-with-1
Repo	https://github.com/yanggeng1995/GAN-TTS
Framework	pytorch
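
The random-window idea can be sketched as cropping aligned (audio, conditioning) pairs of several sizes, one per discriminator in the ensemble. The window sizes, the 120-sample hop between conditioning frames and the function name are assumptions made for illustration; the sketch only prepares discriminator inputs and contains no discriminators.

```python
import numpy as np

def random_windows(audio, features, hop, window_sizes, rng):
    """Crop one aligned (audio, features) window per size; sizes must be multiples of hop."""
    n_frames = len(audio) // hop
    pairs = []
    for size in window_sizes:
        if size % hop:
            raise ValueError(f"window size {size} is not a multiple of hop {hop}")
        frames = size // hop
        # Start on a frame boundary so audio and conditioning stay aligned.
        start = int(rng.integers(0, n_frames - frames + 1))
        a = audio[start * hop:(start + frames) * hop]
        f = features[start:start + frames]
        pairs.append((a, f))
    return pairs
```

Conditional discriminators would score each pair (checking that the audio matches the utterance), while unconditional ones would look only at the audio crop (checking general realism).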

EnAET: Self-Trained Ensemble AutoEncoding Transformations for Semi-Supervised Learning


Title	EnAET: Self-Trained Ensemble AutoEncoding Transformations for Semi-Supervised Learning
Authors	Xiao Wang, Daisuke Kihara, Jiebo Luo, Guo-Jun Qi
Abstract	Deep neural networks have been successfully applied to many real-world applications. However, these successes rely heavily on large amounts of labeled data, which is expensive to obtain. Recently, Auto-Encoding Transformation (AET) and MixMatch have been proposed and have achieved state-of-the-art results for unsupervised and semi-supervised learning, respectively. In this study, we train an Ensemble of Auto-Encoding Transformations (EnAET) to learn from both labeled and unlabeled data based on the embedded representations by decoding both spatial and non-spatial transformations. This distinguishes EnAET from conventional semi-supervised methods that focus on improving prediction consistency and confidence by different models on both unlabeled and labeled examples. In contrast, we propose to explore the role of self-supervised representations in semi-supervised learning under a rich family of transformations. Experimental results on CIFAR-10, CIFAR-100, SVHN and STL10 demonstrate that the proposed EnAET outperforms the state-of-the-art semi-supervised methods by significant margins. In particular, we apply the proposed method to extremely challenging scenarios with only 10 images per class, and show that EnAET can achieve an error rate of 9.35% on CIFAR-10 and 16.92% on SVHN. In addition, EnAET achieves the best result when compared with fully supervised learning using all labeled data with the same network architecture. The performance on CIFAR-10, CIFAR-100 and SVHN with a smaller network is even more competitive than the state-of-the-art of supervised learning methods based on a larger network. We also set a new performance record with an error rate of 1.99% on CIFAR-10 and 4.52% on STL10. The code and experiment records are released at https://github.com/maple-research-lab/EnAET.
Tasks	Image Classification, Semi-Supervised Image Classification
Published	2019-11-21
URL	https://arxiv.org/abs/1911.09265v1
PDF	https://arxiv.org/pdf/1911.09265v1.pdf
PWC	https://paperswithcode.com/paper/enaet-self-trained-ensemble-autoencoding
Repo	https://github.com/maple-research-lab/EnAET
Framework	pytorch
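
A rough sketch of an AET-style objective: sample the parameters of a spatial transformation (here a 2x3 affine matrix) and penalize the squared error between the parameters a decoder predicts and the true ones; an ensemble sums such losses over transformation types. The encoder and decoder are omitted, and the parameter ranges, the extra non-spatial "color" parameter vector and the per-type weights are hypothetical.

```python
import numpy as np

def affine_matrix(angle, scale, shear, tx, ty):
    """2x3 affine matrix: scaled rotation times a shear, plus a translation column."""
    c, s = np.cos(angle), np.sin(angle)
    rot_scale = scale * np.array([[c, -s], [s, c]])
    shear_m = np.array([[1.0, shear], [0.0, 1.0]])
    return np.hstack([rot_scale @ shear_m, [[tx], [ty]]])

def sample_affine(rng):
    """Draw a random spatial transformation (parameter ranges are assumptions)."""
    return affine_matrix(angle=rng.uniform(-np.pi, np.pi), scale=rng.uniform(0.7, 1.3),
                         shear=rng.uniform(-0.3, 0.3),
                         tx=rng.uniform(-0.2, 0.2), ty=rng.uniform(-0.2, 0.2))

def aet_loss(predicted, true):
    """Mean squared error between predicted and true transformation parameters."""
    return float(np.mean((np.asarray(predicted) - np.asarray(true)) ** 2))

def ensemble_aet_loss(predictions, targets, weights):
    """Weighted sum of AET losses over transformation types (hypothetical weighting)."""
    return sum(weights[k] * aet_loss(predictions[k], targets[k]) for k in targets)
```

Training the encoder so that a decoder can recover these parameters from the original and transformed images is what pushes the representation to encode the transformations, without needing any labels.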