Paper Group AWR 205
Tangent Images for Mitigating Spherical Distortion
Title | Tangent Images for Mitigating Spherical Distortion |
Authors | Marc Eder, Mykhailo Shvets, John Lim, Jan-Michael Frahm |
Abstract | In this work, we propose “tangent images,” a spherical image representation that facilitates transferable and scalable $360^\circ$ computer vision. Inspired by techniques in cartography and computer graphics, we render a spherical image to a set of distortion-mitigated, locally-planar image grids tangent to a subdivided icosahedron. By varying the resolution of these grids independently of the subdivision level, we can effectively represent high resolution spherical images while still benefiting from the low-distortion icosahedral spherical approximation. We show that training standard convolutional neural networks on tangent images compares favorably to the many specialized spherical convolutional kernels that have been developed, while also scaling efficiently to handle significantly higher spherical resolutions. Furthermore, because our approach does not require specialized kernels, we show that we can transfer networks trained on perspective images to spherical data without fine-tuning and with limited performance drop-off. Finally, we demonstrate that tangent images can be used to improve the quality of sparse feature detection on spherical images, illustrating its usefulness for traditional computer vision tasks like structure-from-motion and SLAM. |
Tasks | |
Published | 2019-12-19 |
URL | https://arxiv.org/abs/1912.09390v2 |
https://arxiv.org/pdf/1912.09390v2.pdf | |
PWC | https://paperswithcode.com/paper/tangent-images-for-mitigating-spherical |
Repo | https://github.com/meder411/Tangent-Images |
Framework | pytorch |
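The core operation behind tangent images is a gnomonic-style projection onto planes tangent to a subdivided icosahedron. Below is a minimal pure-Python sketch of the standard gnomonic projection, not the authors' rendering pipeline; the function and argument names are illustrative:

```python
import math

def gnomonic(lat, lon, lat0, lon0):
    """Project a spherical point onto the plane tangent at (lat0, lon0).

    Great circles through the tangent point map to straight lines on the
    plane, so local image grids stay low-distortion near the tangent point.
    Angles are in radians; returns planar (x, y) coordinates.
    """
    dlon = lon - lon0
    cos_c = (math.sin(lat0) * math.sin(lat)
             + math.cos(lat0) * math.cos(lat) * math.cos(dlon))
    if cos_c <= 0:
        raise ValueError("point lies on the far hemisphere")
    x = math.cos(lat) * math.sin(dlon) / cos_c
    y = (math.cos(lat0) * math.sin(lat)
         - math.sin(lat0) * math.cos(lat) * math.cos(dlon)) / cos_c
    return x, y

# The tangent point itself lands at the plane's origin.
print(gnomonic(0.0, 0.0, 0.0, 0.0))  # (0.0, 0.0)
```

In the paper's setting, one such projection is rendered per face of the subdivided icosahedron, with the grid resolution chosen independently of the subdivision level.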
A Simple Pooling-Based Design for Real-Time Salient Object Detection
Title | A Simple Pooling-Based Design for Real-Time Salient Object Detection |
Authors | Jiang-Jiang Liu, Qibin Hou, Ming-Ming Cheng, Jiashi Feng, Jianmin Jiang |
Abstract | We solve the problem of salient object detection by investigating how to expand the role of pooling in convolutional neural networks. Based on the U-shape architecture, we first build a global guidance module (GGM) upon the bottom-up pathway, aiming at providing layers at different feature levels the location information of potential salient objects. We further design a feature aggregation module (FAM) to make the coarse-level semantic information well fused with the fine-level features from the top-down pathway. By adding FAMs after the fusion operations in the top-down pathway, coarse-level features from the GGM can be seamlessly merged with features at various scales. These two pooling-based modules allow the high-level semantic features to be progressively refined, yielding detail-enriched saliency maps. Experimental results show that our proposed approach can more accurately locate the salient objects with sharpened details and hence substantially improve the performance compared to the previous state of the art. Our approach is fast as well and can run at a speed of more than 30 FPS when processing a $300 \times 400$ image. Code can be found at http://mmcheng.net/poolnet/. |
Tasks | Object Detection, Salient Object Detection |
Published | 2019-04-21 |
URL | http://arxiv.org/abs/1904.09569v1 |
http://arxiv.org/pdf/1904.09569v1.pdf | |
PWC | https://paperswithcode.com/paper/a-simple-pooling-based-design-for-real-time |
Repo | https://github.com/hualuluu/--every-day-paper-- |
Framework | none |
Functorial Question Answering
Title | Functorial Question Answering |
Authors | Giovanni de Felice, Konstantinos Meichanetzidis, Alexis Toumi |
Abstract | We study the relational variant of the categorical compositional distributional (DisCoCat) models of Coecke et al., where we replace vector spaces and linear maps by sets and relations. We show that RelCoCat models factorise through Cartesian bicategories; as a corollary, we obtain logspace reductions from semantics and entailment to evaluation and containment of conjunctive queries, respectively. Finally, we define question answering as an NP-complete problem. |
Tasks | Question Answering |
Published | 2019-05-17 |
URL | https://arxiv.org/abs/1905.07408v2 |
https://arxiv.org/pdf/1905.07408v2.pdf | |
PWC | https://paperswithcode.com/paper/montague-semantics-for-lambek-pregroups |
Repo | https://github.com/toumix/discopy |
Framework | none |
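The relational move in RelCoCat can be made concrete: words are interpreted as relations (sets of pairs), sentence meaning reduces to relational composition, and question answering to evaluating a conjunctive query. A toy sketch, with an invented knowledge base purely for illustration:

```python
def compose(R, S):
    """Relational composition: (a, c) is in R;S iff some b links them."""
    return {(a, c) for (a, b1) in R for (b2, c) in S if b1 == b2}

# Toy knowledge base: "directed" and "stars" as relations (sets of pairs).
directed = {("nolan", "inception"), ("nolan", "tenet")}
stars = {("inception", "dicaprio"), ("tenet", "washington")}

# Answering "which actors star in a film directed by nolan?" amounts to
# evaluating the conjunctive query directed ; stars.
answers = compose(directed, stars)
print(answers)
```

Containment of one conjunctive query in another (the corollary's target problem) would then amount to checking a subset relation between such answer sets over all databases.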
Measuring Arithmetic Extrapolation Performance
Title | Measuring Arithmetic Extrapolation Performance |
Authors | Andreas Madsen, Alexander Rosenberg Johansen |
Abstract | The Neural Arithmetic Logic Unit (NALU) is a neural network layer that can learn exact arithmetic operations between the elements of a hidden state. The goal of NALU is to learn perfect extrapolation, which requires learning the exact underlying logic of an unknown arithmetic problem. Evaluating the performance of the NALU is non-trivial as one arithmetic problem might have many solutions. As a consequence, single-instance MSE has been used to evaluate and compare performance between models. However, it can be hard to interpret what magnitude of MSE represents a correct solution, and how sensitive models are to initialization. We propose using a success-criterion to measure if and when a model converges. Using a success-criterion we can summarize the success-rate over many initialization seeds and calculate confidence intervals. We contribute a generalized version of the previous arithmetic benchmark to measure models' sensitivity under different conditions. This is, to our knowledge, the first extensive evaluation with respect to convergence of the NALU and its sub-units. Using a success-criterion to summarize 4800 experiments we find that consistently learning arithmetic extrapolation is challenging, in particular for multiplication. |
Tasks | |
Published | 2019-10-04 |
URL | https://arxiv.org/abs/1910.01888v2 |
https://arxiv.org/pdf/1910.01888v2.pdf | |
PWC | https://paperswithcode.com/paper/measuring-arithmetic-extrapolation |
Repo | https://github.com/AndreasMadsen/stable-nalu |
Framework | pytorch |
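The proposed evaluation can be sketched as: fix a success threshold, count converged seeds, and report a binomial confidence interval over the success-rate. The MSE values below are fabricated, and the Wilson score interval is an assumption on my part; the paper derives its threshold from a near-perfect solution and may construct its intervals differently:

```python
import math

def success_rate_ci(successes, trials, z=1.96):
    """Wilson score interval for a binomial success-rate (95% by default)."""
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / trials
                                   + z * z / (4 * trials * trials))
    return centre - half, centre + half

def converged(extrapolation_mse, threshold):
    """Success-criterion: a run counts as solved only if its extrapolation
    error falls below the threshold, instead of comparing raw MSEs."""
    return extrapolation_mse < threshold

# Hypothetical extrapolation MSEs over 20 seeds (made up for illustration).
mses = [1e-8] * 13 + [0.5] * 7
wins = sum(converged(m, threshold=1e-5) for m in mses)
lo, hi = success_rate_ci(wins, len(mses))
print(f"success-rate {wins}/{len(mses)}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

Summarizing runs this way makes "13 of 20 seeds converged" directly comparable across models, which a raw MSE average does not.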
Orometric Methods in Bounded Metric Data
Title | Orometric Methods in Bounded Metric Data |
Authors | Maximilian Stubbemann, Tom Hanika, Gerd Stumme |
Abstract | A large amount of data accommodated in knowledge graphs (KG) is actually metric. For example, the Wikidata KG contains a plenitude of metric facts about geographic entities like cities, chemical compounds or celestial objects. In this paper, we propose a novel approach that transfers orometric (topographic) measures to bounded metric spaces. While these methods were originally designed to identify relevant mountain peaks on the surface of the earth, we demonstrate how to use them for metric data sets in general, notably for metric sets of items contained in knowledge graphs. Based on this, we present a method for identifying outstanding items using the transferred valuation functions ‘isolation’ and ‘prominence’. Building on this, we envision an item recommendation process. To demonstrate the relevance of the novel valuations for such processes we use item sets from the Wikidata knowledge graph. We then evaluate the usefulness of ‘isolation’ and ‘prominence’ empirically in a supervised machine learning setting. In particular, we find structurally relevant items in the geographic population distributions of Germany and France. |
Tasks | Knowledge Graphs |
Published | 2019-07-22 |
URL | https://arxiv.org/abs/1907.09239v1 |
https://arxiv.org/pdf/1907.09239v1.pdf | |
PWC | https://paperswithcode.com/paper/orometric-methods-in-bounded-metric-data |
Repo | https://github.com/mstubbemann/Orometric-Methods-in-Bounded-Metric-Data |
Framework | none |
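The transferred valuation ‘isolation’ admits a compact sketch: in a bounded metric space, an item's isolation is its distance to the nearest item with a higher value (‘prominence’, which tracks descent along paths, is omitted here). A pure-Python illustration with invented data; how the paper handles the global maximum may differ from the fallback used below:

```python
def isolation(items, value, dist):
    """Orometric 'isolation' transferred to a bounded metric space:
    the distance from an item to the nearest item with a higher value.
    The item with no higher neighbour (the global 'peak') gets the
    maximal observed distance, standing in for the bound of the space.
    """
    max_d = max(dist(a, b) for a in items for b in items)
    iso = {}
    for a in items:
        higher = [dist(a, b) for b in items if value[b] > value[a]]
        iso[a] = min(higher) if higher else max_d
    return iso

# Toy example: cities on a line with populations; the metric is |x - y|.
pos = {"a": 0.0, "b": 1.0, "c": 5.0}
pop = {"a": 100, "b": 900, "c": 500}
d = lambda x, y: abs(pos[x] - pos[y])
print(isolation(list(pos), pop, d))  # {'a': 1.0, 'b': 5.0, 'c': 4.0}
```

High isolation flags items like "c": not the largest overall, but far from anything larger, which is exactly the structural outstandingness the valuations are meant to capture.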
Structured Graph Learning Via Laplacian Spectral Constraints
Title | Structured Graph Learning Via Laplacian Spectral Constraints |
Authors | Sandeep Kumar, Jiaxi Ying, José Vinícius de M. Cardoso, Daniel P. Palomar |
Abstract | Learning a graph with a specific structure is essential for interpretability and identification of the relationships among data. It is well known that structured graph learning from observed samples is an NP-hard combinatorial problem. In this paper, we first show that for a set of important graph families it is possible to convert structural constraints into eigenvalue constraints of the graph Laplacian matrix. Then we introduce a unified graph learning framework that integrates the spectral properties of the Laplacian matrix with Gaussian graphical modeling and is capable of learning structures of a large class of graph families. The proposed algorithms are provably convergent and practically amenable for large-scale semi-supervised and unsupervised graph-based learning tasks. Extensive numerical experiments with both synthetic and real data sets demonstrate the effectiveness of the proposed methods. An R package containing code for all the experimental results is available at https://cran.r-project.org/package=spectralGraphTopology. |
Tasks | |
Published | 2019-09-24 |
URL | https://arxiv.org/abs/1909.11594v1 |
https://arxiv.org/pdf/1909.11594v1.pdf | |
PWC | https://paperswithcode.com/paper/structured-graph-learning-via-laplacian |
Repo | https://github.com/dppalomar/spectralGraphTopology |
Framework | none |
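The conversion the paper relies on is spectral: a graph has k connected components exactly when its Laplacian's zero eigenvalue has multiplicity k. A pure-Python check of that fact via the Laplacian's nullity; this illustrates the constraint itself, not the authors' learning algorithm:

```python
def laplacian(n, edges):
    """Combinatorial Laplacian L = D - A of an undirected graph."""
    L = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        L[i][i] += 1; L[j][j] += 1
        L[i][j] -= 1; L[j][i] -= 1
    return L

def nullity(M, eps=1e-9):
    """Dimension of the null space, via Gaussian elimination.

    For a graph Laplacian this equals the multiplicity of the zero
    eigenvalue, i.e. the number of connected components.
    """
    M = [row[:] for row in M]
    n = len(M)
    rank = 0
    for col in range(n):
        pivot = next((r for r in range(rank, n) if abs(M[r][col]) > eps), None)
        if pivot is None:
            continue
        M[rank], M[pivot] = M[pivot], M[rank]
        for r in range(n):
            if r != rank and abs(M[r][col]) > eps:
                f = M[r][col] / M[rank][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[rank])]
        rank += 1
    return n - rank

# Two components ({0,1,2} and {3,4}) -> nullity 2: the structural
# constraint "k components" becomes the spectral constraint "k zeros".
L = laplacian(5, [(0, 1), (1, 2), (3, 4)])
print(nullity(L))  # 2
```

Enforcing "exactly k zero eigenvalues" during estimation is what lets the framework learn, say, k-component graphs without searching the combinatorial space directly.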
CamemBERT: a Tasty French Language Model
Title | CamemBERT: a Tasty French Language Model |
Authors | Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suárez, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah, Benoît Sagot |
Abstract | Pretrained language models are now ubiquitous in Natural Language Processing. Despite their success, most available models have either been trained on English data or on the concatenation of data in multiple languages. This makes practical use of such models (in all languages except English) very limited. Aiming to address this issue for French, we release CamemBERT, a French version of Bidirectional Encoder Representations from Transformers (BERT). We measure the performance of CamemBERT compared to multilingual models in multiple downstream tasks, namely part-of-speech tagging, dependency parsing, named-entity recognition, and natural language inference. CamemBERT improves the state of the art for most of the tasks considered. We release the pretrained model for CamemBERT hoping to foster research and downstream applications for French NLP. |
Tasks | Dependency Parsing, Language Modelling, Named Entity Recognition, Natural Language Inference, Part-Of-Speech Tagging |
Published | 2019-11-10 |
URL | https://arxiv.org/abs/1911.03894v1 |
https://arxiv.org/pdf/1911.03894v1.pdf | |
PWC | https://paperswithcode.com/paper/camembert-a-tasty-french-language-model |
Repo | https://github.com/huggingface/transformers |
Framework | pytorch |
Learning relevant features for statistical inference
Title | Learning relevant features for statistical inference |
Authors | Cédric Bény |
Abstract | Given two views of data, we consider the problem of finding the features of one view which can be most faithfully inferred from the other. We find that these are also the most correlated variables in the sense of deep canonical correlation analysis (DCCA). Moreover, we show that these variables can be used to construct a non-parametric representation of the implied joint probability distribution, which can be thought of as a classical version of the Schmidt decomposition of quantum states. This representation can be used to compute the expectations of functions over one view of data conditioned on the other, such as Bayesian estimators and their standard deviations. We test the approach using inference on occluded MNIST images, and show that our representation contains multiple modes. Surprisingly, when applied to supervised learning (one dataset consists of labels), this approach automatically provides regularization and faster convergence compared to the cross-entropy objective. We also explore using this approach to discover salient independent variables of a single dataset. |
Tasks | |
Published | 2019-04-23 |
URL | https://arxiv.org/abs/1904.10387v4 |
https://arxiv.org/pdf/1904.10387v4.pdf | |
PWC | https://paperswithcode.com/paper/relevant-feature-extraction-for-statistical |
Repo | https://github.com/cbeny/RFA |
Framework | tf |
Do Sentence Interactions Matter? Leveraging Sentence Level Representations for Fake News Classification
Title | Do Sentence Interactions Matter? Leveraging Sentence Level Representations for Fake News Classification |
Authors | Vaibhav Vaibhav, Raghuram Mandyam Annasamy, Eduard Hovy |
Abstract | The rising growth of fake news and misleading information through online media outlets demands an automatic method for detecting such news articles. Of the few limited works which differentiate between trusted vs. other types of news articles (satire, propaganda, hoax), none of them model sentence interactions within a document. We observe an interesting pattern in the way sentences interact with each other across different kinds of news articles. To capture this kind of information for long news articles, we propose a graph neural network-based model which does away with the need for feature engineering for fine-grained fake news classification. Through experiments, we show that our proposed method beats strong neural baselines and achieves state-of-the-art accuracy on existing datasets. Moreover, we establish the generalizability of our model by evaluating its performance in out-of-domain scenarios. Code is available at https://github.com/MysteryVaibhav/fake_news_semantics |
Tasks | Feature Engineering |
Published | 2019-10-27 |
URL | https://arxiv.org/abs/1910.12203v1 |
https://arxiv.org/pdf/1910.12203v1.pdf | |
PWC | https://paperswithcode.com/paper/do-sentence-interactions-matter-leveraging |
Repo | https://github.com/MysteryVaibhav/fake_news_semantics |
Framework | pytorch |
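Modelling sentence interactions with a graph neural network boils down to message passing over a sentence graph. A minimal, framework-free sketch of one mean-aggregation round; the embeddings and adjacency are invented, and the paper's architecture is considerably richer than this:

```python
def mean_aggregate(features, neighbours):
    """One round of graph message passing: each sentence's vector is
    replaced by the mean of its own vector and its neighbours' vectors,
    so sentence-interaction structure flows into the representation.
    """
    out = {}
    for node, vec in features.items():
        group = [vec] + [features[n] for n in neighbours.get(node, [])]
        out[node] = [sum(col) / len(group) for col in zip(*group)]
    return out

# Toy document: three sentence embeddings, and which sentences interact.
feats = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
adj = {0: [1], 1: [0, 2], 2: [1]}
print(mean_aggregate(feats, adj))  # each node moves toward its neighbours
```

Stacking several such rounds, then pooling the node vectors into a document representation for a classifier, is the general shape of graph-based document models like this one.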
Wasserstein Fair Classification
Title | Wasserstein Fair Classification |
Authors | Ray Jiang, Aldo Pacchiano, Tom Stepleton, Heinrich Jiang, Silvia Chiappa |
Abstract | We propose an approach to fair classification that enforces independence between the classifier outputs and sensitive information by minimizing Wasserstein-1 distances. The approach has desirable theoretical properties and is robust to specific choices of the threshold used to obtain class predictions from model outputs. We introduce different methods that enable hiding sensitive information at test time or have a simple and fast implementation. We show empirical performance against different fairness baselines on several benchmark fairness datasets. |
Tasks | |
Published | 2019-07-28 |
URL | https://arxiv.org/abs/1907.12059v1 |
https://arxiv.org/pdf/1907.12059v1.pdf | |
PWC | https://paperswithcode.com/paper/wasserstein-fair-classification |
Repo | https://github.com/deepmind/wasserstein_fairness |
Framework | none |
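For one-dimensional outputs, the Wasserstein-1 distance being minimized has a closed form: sort both samples and average the absolute gaps. A sketch on made-up classifier outputs; equal group sizes are assumed for simplicity, which the paper's estimators do not require:

```python
def wasserstein1_1d(xs, ys):
    """Wasserstein-1 distance between two equally sized 1-D empirical
    distributions: in one dimension the optimal transport plan simply
    matches sorted samples, so the distance is the mean absolute gap.
    """
    assert len(xs) == len(ys), "equal sizes assumed in this sketch"
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

# Classifier outputs split by a binary sensitive attribute: driving this
# distance to zero pushes the two output distributions to coincide,
# enforcing independence between outputs and the sensitive attribute.
group_a = [0.1, 0.4, 0.9]
group_b = [0.2, 0.5, 0.8]
print(wasserstein1_1d(group_a, group_b))  # about 0.1
```

Because the whole output distributions are matched, the resulting classifier is robust to whichever threshold is later used to binarize its outputs, which is the theoretical property highlighted above.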
Jejueo Datasets for Machine Translation and Speech Synthesis
Title | Jejueo Datasets for Machine Translation and Speech Synthesis |
Authors | Kyubyong Park, Yo Joong Choe, Jiyeon Ham |
Abstract | Jejueo was classified as critically endangered by UNESCO in 2010. Although diverse efforts to revitalize it have been made, there have been few computational approaches. Motivated by this, we construct two new Jejueo datasets: Jejueo Interview Transcripts (JIT) and Jejueo Single Speaker Speech (JSS). The JIT dataset is a parallel corpus containing 170k+ Jejueo-Korean sentences, and the JSS dataset consists of 10k high-quality audio files recorded by a native Jejueo speaker and a transcript file. Subsequently, we build neural systems of machine translation and speech synthesis using them. All resources are publicly available via our GitHub repository. We hope that these datasets will attract the interest of both the language and machine learning communities. |
Tasks | Machine Translation, Speech Synthesis |
Published | 2019-11-27 |
URL | https://arxiv.org/abs/1911.12071v1 |
https://arxiv.org/pdf/1911.12071v1.pdf | |
PWC | https://paperswithcode.com/paper/jejueo-datasets-for-machine-translation-and |
Repo | https://github.com/kakaobrain/jejueo |
Framework | tf |
MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis
Title | MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis |
Authors | Kundan Kumar, Rithesh Kumar, Thibault de Boissiere, Lucas Gestin, Wei Zhen Teoh, Jose Sotelo, Alexandre de Brebisson, Yoshua Bengio, Aaron Courville |
Abstract | Previous works (Donahue et al., 2018a; Engel et al., 2019a) have found that generating coherent raw audio waveforms with GANs is challenging. In this paper, we show that it is possible to train GANs reliably to generate high quality coherent waveforms by introducing a set of architectural changes and simple training techniques. Subjective evaluation metric (Mean Opinion Score, or MOS) shows the effectiveness of the proposed approach for high quality mel-spectrogram inversion. To establish the generality of the proposed techniques, we show qualitative results of our model in speech synthesis, music domain translation and unconditional music synthesis. We evaluate the various components of the model through ablation studies and suggest a set of guidelines to design general purpose discriminators and generators for conditional sequence synthesis tasks. Our model is non-autoregressive, fully convolutional, with significantly fewer parameters than competing models and generalizes to unseen speakers for mel-spectrogram inversion. Our pytorch implementation runs more than 100x faster than real-time on a GTX 1080Ti GPU and more than 2x faster than real-time on CPU, without any hardware specific optimization tricks. |
Tasks | Speech Synthesis |
Published | 2019-10-08 |
URL | https://arxiv.org/abs/1910.06711v3 |
https://arxiv.org/pdf/1910.06711v3.pdf | |
PWC | https://paperswithcode.com/paper/melgan-generative-adversarial-networks-for |
Repo | https://github.com/yanggeng1995/GAN-TTS |
Framework | pytorch |
Texel-Att: Representing and Classifying Element-based Textures by Attributes
Title | Texel-Att: Representing and Classifying Element-based Textures by Attributes |
Authors | Marco Godi, Christian Joppi, Andrea Giachetti, Fabio Pellacini, Marco Cristani |
Abstract | Element-based textures are a kind of texture formed by nameable elements, the texels [1], distributed according to specific statistical distributions; it is of primary importance in many sectors, namely textile, fashion and interior design industry. State-of-the-art texture descriptors fail to properly characterize element-based textures, so we present Texel-Att to fill this gap. Texel-Att is the first fine-grained, attribute-based representation and classification framework for element-based textures. It first individuates texels, characterizing them with individual attributes; subsequently, texels are grouped and characterized through layout attributes, which give the Texel-Att representation. Texels are detected by a Mask-RCNN, trained on a brand-new element-based texture dataset, ElBa, containing 30K texture images with 3M fully-annotated texels. Examples of individual and layout attributes are exhibited to give a glimpse on the level of achievable graininess. In the experiments, we present detection results to show that texels can be precisely individuated, even on textures “in the wild”; to this end, we individuate the element-based classes of the Describable Texture Dataset (DTD), where almost 900K texels have been manually annotated, leading to the Element-based DTD (E-DTD). Subsequently, classification and ranking results demonstrate the expressivity of Texel-Att on ElBa and E-DTD, overcoming the alternative features and relative attributes, doubling the best performance in some cases; finally, we report interactive search results on ElBa and E-DTD: with Texel-Att on the E-DTD dataset we are able to individuate within 10 iterations the desired texture in 90% of cases, against 71% obtained with a combination of the finest existing attributes so far. Dataset and code are available at https://github.com/godimarcovr/Texel-Att |
Tasks | |
Published | 2019-08-29 |
URL | https://arxiv.org/abs/1908.11127v2 |
https://arxiv.org/pdf/1908.11127v2.pdf | |
PWC | https://paperswithcode.com/paper/texel-att-representing-and-classifying |
Repo | https://github.com/godimarcovr/Texel-Att |
Framework | none |
High Fidelity Speech Synthesis with Adversarial Networks
Title | High Fidelity Speech Synthesis with Adversarial Networks |
Authors | Mikołaj Bińkowski, Jeff Donahue, Sander Dieleman, Aidan Clark, Erich Elsen, Norman Casagrande, Luis C. Cobo, Karen Simonyan |
Abstract | Generative adversarial networks have seen rapid development in recent years and have led to remarkable improvements in generative modelling of images. However, their application in the audio domain has received limited attention, and autoregressive models, such as WaveNet, remain the state of the art in generative modelling of audio signals such as human speech. To address this paucity, we introduce GAN-TTS, a Generative Adversarial Network for Text-to-Speech. Our architecture is composed of a conditional feed-forward generator producing raw speech audio, and an ensemble of discriminators which operate on random windows of different sizes. The discriminators analyse the audio both in terms of general realism, as well as how well the audio corresponds to the utterance that should be pronounced. To measure the performance of GAN-TTS, we employ both subjective human evaluation (MOS - Mean Opinion Score), as well as novel quantitative metrics (Fréchet DeepSpeech Distance and Kernel DeepSpeech Distance), which we find to be well correlated with MOS. We show that GAN-TTS is capable of generating high-fidelity speech with naturalness comparable to the state-of-the-art models, and unlike autoregressive models, it is highly parallelisable thanks to an efficient feed-forward generator. Listen to GAN-TTS reading this abstract at https://storage.googleapis.com/deepmind-media/research/abstract.wav. |
Tasks | Speech Synthesis |
Published | 2019-09-25 |
URL | https://arxiv.org/abs/1909.11646v2 |
https://arxiv.org/pdf/1909.11646v2.pdf | |
PWC | https://paperswithcode.com/paper/high-fidelity-speech-synthesis-with-1 |
Repo | https://github.com/yanggeng1995/GAN-TTS |
Framework | pytorch |
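The quantitative metrics above are Fréchet-style distances between feature distributions of real and generated audio. In the scalar Gaussian case the Fréchet distance has a simple closed form, sketched below; the actual Fréchet DeepSpeech Distance is the multivariate version computed on DeepSpeech features, so this reduction is only illustrative:

```python
def frechet_1d(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between two 1-D Gaussians: a squared mean gap
    plus a squared standard-deviation gap. The multivariate formula
    replaces the second term with Tr(S1 + S2 - 2*(S1 S2)^(1/2))."""
    return (mu1 - mu2) ** 2 + (sigma1 - sigma2) ** 2

print(frechet_1d(0.0, 1.0, 0.0, 1.0))  # 0.0 for identical Gaussians
print(frechet_1d(0.0, 1.0, 3.0, 1.0))  # 9.0
```

Fitting Gaussians to deep-feature statistics of real and generated samples and comparing them this way is the same recipe as the Fréchet Inception Distance for images, transplanted to speech features.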
EnAET: Self-Trained Ensemble AutoEncoding Transformations for Semi-Supervised Learning
Title | EnAET: Self-Trained Ensemble AutoEncoding Transformations for Semi-Supervised Learning |
Authors | Xiao Wang, Daisuke Kihara, Jiebo Luo, Guo-Jun Qi |
Abstract | Deep neural networks have been successfully applied to many real-world applications. However, these successes rely heavily on large amounts of labeled data, which is expensive to obtain. Recently, Auto-Encoding Transformation (AET) and MixMatch have been proposed and achieved state-of-the-art results for unsupervised and semi-supervised learning, respectively. In this study, we train an Ensemble of Auto-Encoding Transformations (EnAET) to learn from both labeled and unlabeled data based on the embedded representations by decoding both spatial and non-spatial transformations. This distinguishes EnAET from conventional semi-supervised methods that focus on improving prediction consistency and confidence by different models on both unlabeled and labeled examples. In contrast, we propose to explore the role of self-supervised representations in semi-supervised learning under a rich family of transformations. Experimental results on CIFAR-10, CIFAR-100, SVHN and STL10 demonstrate that the proposed EnAET outperforms the state-of-the-art semi-supervised methods by significant margins. In particular, we apply the proposed method to extremely challenging scenarios with only 10 images per class, and show that EnAET can achieve an error rate of 9.35% on CIFAR-10 and 16.92% on SVHN. In addition, EnAET achieves the best result when compared with fully supervised learning using all labeled data with the same network architecture. The performance on CIFAR-10, CIFAR-100 and SVHN with a smaller network is even more competitive than state-of-the-art supervised learning methods based on a larger network. We also set a new performance record with an error rate of 1.99% on CIFAR-10 and 4.52% on STL10. The code and experiment records are released at https://github.com/maple-research-lab/EnAET. |
Tasks | Image Classification, Semi-Supervised Image Classification |
Published | 2019-11-21 |
URL | https://arxiv.org/abs/1911.09265v1 |
https://arxiv.org/pdf/1911.09265v1.pdf | |
PWC | https://paperswithcode.com/paper/enaet-self-trained-ensemble-autoencoding |
Repo | https://github.com/maple-research-lab/EnAET |
Framework | pytorch |