July 30, 2019

2869 words 14 mins read

Paper Group AWR 46

Surface Networks

Title Surface Networks
Authors Ilya Kostrikov, Zhongshi Jiang, Daniele Panozzo, Denis Zorin, Joan Bruna
Abstract We study data-driven representations for three-dimensional triangle meshes, which are one of the prevalent objects used to represent 3D geometry. Recent works have developed models that exploit the intrinsic geometry of manifolds and graphs, namely Graph Neural Networks (GNNs) and their spectral variants, which learn from the local metric tensor via the Laplacian operator. Despite offering excellent sample complexity and built-in invariances, intrinsic geometry alone is invariant to isometric deformations, making it unsuitable for many applications. To overcome this limitation, we propose several upgrades to GNNs to leverage extrinsic differential geometry properties of three-dimensional surfaces, increasing their modeling power. In particular, we propose to exploit the Dirac operator, whose spectrum detects principal curvature directions; this is in stark contrast with the classical Laplace operator, which directly measures mean curvature. We coin the resulting models Surface Networks (SN). We prove that these models define shape representations that are stable to deformation and to discretization, and we demonstrate the efficiency and versatility of SNs on two challenging tasks: temporal prediction of mesh deformations under non-linear dynamics and generative models using a variational autoencoder framework with encoders/decoders given by SNs.
Tasks
Published 2017-05-30
URL http://arxiv.org/abs/1705.10819v2
PDF http://arxiv.org/pdf/1705.10819v2.pdf
PWC https://paperswithcode.com/paper/surface-networks
Repo https://github.com/jiangzhongshi/SurfaceNetworks
Framework pytorch
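
For context, here is a minimal NumPy sketch of the Laplacian-based GNN layer the paper builds on. All sizes and weights are made up, and the paper's actual contribution, replacing the Laplacian with the Dirac operator, is not reproduced here.

```python
import numpy as np

def mesh_laplacian(num_vertices, faces):
    # Combinatorial graph Laplacian L = D - A assembled from triangle faces.
    A = np.zeros((num_vertices, num_vertices))
    for i, j, k in faces:
        for u, v in ((i, j), (j, k), (k, i)):
            A[u, v] = A[v, u] = 1.0
    return np.diag(A.sum(axis=1)) - A

def laplacian_gnn_layer(x, L, w_self, w_lap):
    # One propagation step: mix each vertex's own features with its
    # Laplacian-filtered neighbourhood, then apply a ReLU.
    return np.maximum(0.0, x @ w_self + (L @ x) @ w_lap)

# Toy mesh: a tetrahedron (4 vertices, 4 triangular faces).
rng = np.random.default_rng(0)
L = mesh_laplacian(4, [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)])
x = rng.normal(size=(4, 6))                           # 6 features per vertex
h = laplacian_gnn_layer(x, L, rng.normal(size=(6, 8)), rng.normal(size=(6, 8)))
print(h.shape)                                        # (4, 8)
```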

LanideNN: Multilingual Language Identification on Character Window

Title LanideNN: Multilingual Language Identification on Character Window
Authors Tom Kocmi, Ondřej Bojar
Abstract In language identification, a common first step in natural language processing, we want to automatically determine the language of some input text. Monolingual language identification assumes that the given document is written in one language. In multilingual language identification, the document is usually in two or three languages and we just want their names. We aim to go one step further and propose a method for textual language identification where languages can change arbitrarily and the goal is to identify the spans of each of the languages. Our method is based on Bidirectional Recurrent Neural Networks and it performs well in monolingual and multilingual language identification tasks on six datasets covering 131 languages. The method maintains its accuracy on short documents and across domains, making it ideal for off-the-shelf use without preparing training data.
Tasks Language Identification
Published 2017-01-12
URL http://arxiv.org/abs/1701.03338v2
PDF http://arxiv.org/pdf/1701.03338v2.pdf
PWC https://paperswithcode.com/paper/lanidenn-multilingual-language-identification
Repo https://github.com/tomkocmi/LanideNN
Framework tf
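
A hedged sketch of the idea in PyTorch (the released code uses TensorFlow): a bidirectional RNN tags every character in a window with a language label, so code switches show up as label changes. All sizes are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class CharLangTagger(nn.Module):
    # A minimal bidirectional-RNN tagger in the spirit of LanideNN: read a
    # window of characters, predict a language label for every position.
    def __init__(self, n_chars=256, n_langs=131, emb=64, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(n_chars, emb)
        self.rnn = nn.LSTM(emb, hidden, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, n_langs)   # one label per character

    def forward(self, char_ids):                    # (batch, window_len)
        h, _ = self.rnn(self.embed(char_ids))       # (batch, window_len, 2*hidden)
        return self.out(h)                          # per-character logits

model = CharLangTagger()
window = torch.randint(0, 256, (2, 200))            # two 200-character windows
print(model(window).shape)                          # torch.Size([2, 200, 131])
```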

Coupled Ensembles of Neural Networks

Title Coupled Ensembles of Neural Networks
Authors Anuvabh Dutt, Denis Pellerin, Georges Quénot
Abstract We investigate in this paper the architecture of deep convolutional networks. Building on existing state-of-the-art models, we propose a reconfiguration of the model parameters into several parallel branches at the global network level, with each branch being a standalone CNN. We show that this arrangement is an efficient way to significantly reduce the number of parameters without losing performance, or to significantly improve performance for the same number of parameters. The use of branches brings an additional form of regularization. In addition to the split into parallel branches, we propose a tighter coupling of these branches by placing the “fuse (averaging) layer” before the Log-Likelihood and SoftMax layers during training. This gives another significant performance improvement, the tighter coupling favouring the learning of better representations, even at the level of the individual branches. We refer to this branched architecture as “coupled ensembles”. The approach is very generic and can be applied with almost any DCNN architecture. With coupled ensembles of DenseNet-BC and a parameter budget of 25M, we obtain error rates of 2.92%, 15.68% and 1.50% respectively on the CIFAR-10, CIFAR-100 and SVHN tasks. For the same budget, DenseNet-BC has error rates of 3.46%, 17.18%, and 1.8% respectively. With ensembles of coupled ensembles of DenseNet-BC networks, with 50M total parameters, we obtain error rates of 2.72%, 15.13% and 1.42% respectively on these tasks.
Tasks
Published 2017-09-18
URL http://arxiv.org/abs/1709.06053v1
PDF http://arxiv.org/pdf/1709.06053v1.pdf
PWC https://paperswithcode.com/paper/coupled-ensembles-of-neural-networks
Repo https://github.com/grey-area/modular-loss-experiments
Framework pytorch
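
The “fuse (averaging) layer” fits in a few lines of PyTorch. The sketch below averages branch scores before the LogSoftmax/NLL stage, as the abstract describes; the tiny branch CNN is a stand-in, not DenseNet-BC.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoupledEnsemble(nn.Module):
    def __init__(self, n_branches=4, n_classes=10):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(16, n_classes),
            )
            for _ in range(n_branches)
        ])

    def forward(self, x):
        scores = torch.stack([b(x) for b in self.branches])  # (B, batch, C)
        fused = scores.mean(dim=0)          # fuse layer: average branch scores
        return F.log_softmax(fused, dim=-1) # then LogSoftmax, trained with NLL

model = CoupledEnsemble()
x = torch.randn(8, 3, 32, 32)
loss = F.nll_loss(model(x), torch.randint(0, 10, (8,)))
loss.backward()                             # one coupled loss trains all branches
```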

Learning by Association - A versatile semi-supervised training method for neural networks

Title Learning by Association - A versatile semi-supervised training method for neural networks
Authors Philip Häusser, Alexander Mordvintsev, Daniel Cremers
Abstract In many real-world scenarios, labeled data for a specific machine learning task is costly to obtain. Semi-supervised training methods make use of abundantly available unlabeled data and a smaller number of labeled examples. We propose a new framework for semi-supervised training of deep neural networks inspired by learning in humans. “Associations” are made from embeddings of labeled samples to those of unlabeled ones and back. The optimization schedule encourages correct association cycles that end up at the same class from which the association was started and penalizes wrong associations ending at a different class. The implementation is easy to use and can be added to any existing end-to-end training setup. We demonstrate the capabilities of learning by association on several data sets and show that it can improve performance on classification tasks tremendously by making use of additionally available unlabeled data. In particular, in cases with little labeled data, our training scheme outperforms the current state of the art on SVHN.
Tasks
Published 2017-06-03
URL http://arxiv.org/abs/1706.00909v1
PDF http://arxiv.org/pdf/1706.00909v1.pdf
PWC https://paperswithcode.com/paper/learning-by-association-a-versatile-semi
Repo https://github.com/haeusser/learning_by_association
Framework tf
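
A schematic PyTorch reading of the association losses (not the reference TensorFlow code): the walker loss asks the labeled-to-unlabeled-to-labeled round trip to land on the starting class, and a visit term spreads probability mass over the unlabeled batch.

```python
import torch
import torch.nn.functional as F

def association_losses(emb_labeled, emb_unlabeled, labels):
    # Similarities drive a two-step random walk:
    # labeled -> unlabeled (p_ab) and back (p_ba).
    sim = emb_labeled @ emb_unlabeled.t()            # (L, U)
    p_ab = F.softmax(sim, dim=1)
    p_ba = F.softmax(sim.t(), dim=1)
    p_aba = p_ab @ p_ba                              # round-trip probs (L, L)

    # Walker loss: correct cycles end at any labeled sample of the same class.
    same = (labels[:, None] == labels[None, :]).float()
    target = same / same.sum(dim=1, keepdim=True)
    walker = -(target * torch.log(p_aba + 1e-8)).sum(dim=1).mean()

    # Visit loss: encourage the walk to visit all unlabeled samples.
    p_visit = p_ab.mean(dim=0)
    visit = -torch.log(p_visit + 1e-8).mean()
    return walker, visit

emb_l = F.normalize(torch.randn(16, 64), dim=1)      # labeled embeddings
emb_u = F.normalize(torch.randn(100, 64), dim=1)     # unlabeled embeddings
walker, visit = association_losses(emb_l, emb_u, torch.randint(0, 10, (16,)))
print(walker.item(), visit.item())
```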

RoomNet: End-to-End Room Layout Estimation

Title RoomNet: End-to-End Room Layout Estimation
Authors Chen-Yu Lee, Vijay Badrinarayanan, Tomasz Malisiewicz, Andrew Rabinovich
Abstract This paper focuses on the task of room layout estimation from a monocular RGB image. Prior works break the problem into two sub-tasks: semantic segmentation of the floor, walls, and ceiling to produce layout hypotheses, followed by an iterative optimization step to rank these hypotheses. In contrast, we adopt a more direct formulation of this problem as one of estimating an ordered set of room layout keypoints. The room layout and the corresponding segmentation are completely specified given the locations of these ordered keypoints. We predict the locations of the room layout keypoints using RoomNet, an end-to-end trainable encoder-decoder network. On the challenging benchmark datasets Hedau and LSUN, we achieve state-of-the-art performance along with 200x to 600x speedup compared to the most recent work. Additionally, we present optional extensions to the RoomNet architecture such as including recurrent computations and memory units to refine the keypoint locations under the same parametric capacity.
Tasks Room Layout Estimation, Semantic Segmentation
Published 2017-03-18
URL http://arxiv.org/abs/1703.06241v2
PDF http://arxiv.org/pdf/1703.06241v2.pdf
PWC https://paperswithcode.com/paper/roomnet-end-to-end-room-layout-estimation
Repo https://github.com/FengyangZhang/caffe_roomnet
Framework none
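
A toy encoder-decoder illustrating the keypoint formulation: one heatmap per ordered keypoint, with the argmax of each channel giving a location. This is far smaller than the actual RoomNet and purely illustrative.

```python
import torch
import torch.nn as nn

class TinyRoomNet(nn.Module):
    def __init__(self, n_keypoints=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, n_keypoints, 4, stride=2, padding=1),
        )

    def forward(self, img):
        heatmaps = self.decoder(self.encoder(img))      # (B, K, H, W)
        b, k, h, w = heatmaps.shape
        peaks = heatmaps.view(b, k, -1).argmax(dim=-1)  # peak per channel
        ys = torch.div(peaks, w, rounding_mode='floor')
        xs = peaks % w
        return heatmaps, torch.stack((ys, xs), dim=-1)  # ordered (y, x) points

heatmaps, keypoints = TinyRoomNet()(torch.randn(1, 3, 128, 128))
print(heatmaps.shape, keypoints.shape)  # (1, 8, 128, 128) and (1, 8, 2)
```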

A non-projective greedy dependency parser with bidirectional LSTMs

Title A non-projective greedy dependency parser with bidirectional LSTMs
Authors David Vilares, Carlos Gómez-Rodríguez
Abstract The LyS-FASTPARSE team presents BIST-COVINGTON, a neural implementation of the Covington (2001) algorithm for non-projective dependency parsing. The bidirectional LSTM approach by Kiperwasser and Goldberg (2016) is used to train a greedy parser with a dynamic oracle to mitigate error propagation. The model participated in the CoNLL 2017 UD Shared Task. In spite of not using any ensemble methods and using the baseline segmentation and PoS tagging, the parser obtained good results on both macro-average LAS and UAS in the big treebanks category (55 languages), ranking 7th out of 33 teams. In the all treebanks category (LAS and UAS) we ranked 16th and 12th. The gap between the all and big categories is mainly due to the poor performance on four parallel PUD treebanks, suggesting that some ‘suffixed’ treebanks (e.g. Spanish-AnCora) perform poorly in cross-treebank settings, which does not occur with the corresponding ‘unsuffixed’ treebank (e.g. Spanish). By changing that, we obtain the 11th best LAS among all runs (official and unofficial). The code is made available at https://github.com/CoNLL-UD-2017/LyS-FASTPARSE
Tasks Dependency Parsing
Published 2017-07-11
URL http://arxiv.org/abs/1707.03228v1
PDF http://arxiv.org/pdf/1707.03228v1.pdf
PWC https://paperswithcode.com/paper/a-non-projective-greedy-dependency-parser
Repo https://github.com/CoNLL-UD-2017/LyS-FASTPARSE
Framework none
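
A sketch of the greedy Covington-style pass. In BIST-COVINGTON the attachment decisions come from a scorer over BiLSTM word states and training uses a dynamic oracle; here `score` is any callable, and acyclicity checks are omitted for brevity.

```python
def covington_parse(words, score):
    # Greedy, non-projective Covington-style pass: for each new word j,
    # scan leftwards over earlier words i and ask the scorer whether to
    # add arc i -> j ('right'), arc j -> i ('left'), or none. Single-head
    # constraints are enforced via the `heads` dict.
    heads = {}                                   # dependent index -> head
    for j in range(1, len(words)):
        for i in range(j - 1, -1, -1):
            action = score(i, j, heads)
            if action == 'right' and j not in heads:
                heads[j] = i
            elif action == 'left' and i not in heads:
                heads[i] = j                     # may create crossing arcs
    return heads

# Toy scorer: attach every word to its left neighbour, for demonstration.
print(covington_parse(['ROOT', 'a', 'b', 'c'],
                      lambda i, j, heads: 'right' if i == j - 1 else 'none'))
# {1: 0, 2: 1, 3: 2}
```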

Deep Speaker: an End-to-End Neural Speaker Embedding System

Title Deep Speaker: an End-to-End Neural Speaker Embedding System
Authors Chao Li, Xiaokong Ma, Bing Jiang, Xiangang Li, Xuewei Zhang, Xiao Liu, Ying Cao, Ajay Kannan, Zhenyao Zhu
Abstract We present Deep Speaker, a neural speaker embedding system that maps utterances to a hypersphere where speaker similarity is measured by cosine similarity. The embeddings generated by Deep Speaker can be used for many tasks, including speaker identification, verification, and clustering. We experiment with ResCNN and GRU architectures to extract the acoustic features, then mean pool to produce utterance-level speaker embeddings, and train using triplet loss based on cosine similarity. Experiments on three distinct datasets suggest that Deep Speaker outperforms a DNN-based i-vector baseline. For example, Deep Speaker reduces the verification equal error rate by 50% (relatively) and improves the identification accuracy by 60% (relatively) on a text-independent dataset. We also present results that suggest adapting from a model trained with Mandarin can improve accuracy for English speaker recognition.
Tasks Speaker Identification, Speaker Recognition
Published 2017-05-05
URL http://arxiv.org/abs/1705.02304v1
PDF http://arxiv.org/pdf/1705.02304v1.pdf
PWC https://paperswithcode.com/paper/deep-speaker-an-end-to-end-neural-speaker
Repo https://github.com/prajual/Deep_Speaker
Framework none
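
The triplet loss on cosine similarity is the core training signal and is compact in PyTorch. The margin below is illustrative, and the random embeddings stand in for the ResCNN/GRU encoder's mean-pooled, L2-normalized utterance vectors.

```python
import torch
import torch.nn.functional as F

def cosine_triplet_loss(anchor, positive, negative, margin=0.1):
    # An utterance should be more cosine-similar to another utterance of
    # the same speaker (positive) than to a different speaker (negative).
    sim_ap = F.cosine_similarity(anchor, positive)
    sim_an = F.cosine_similarity(anchor, negative)
    return F.relu(sim_an - sim_ap + margin).mean()

def embed(n, dim=512):
    # Stand-in for the encoder: L2-normalized embeddings on the hypersphere.
    return F.normalize(torch.randn(n, dim), dim=1)

loss = cosine_triplet_loss(embed(4), embed(4), embed(4))
print(loss.item())
```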

Input-to-Output Gate to Improve RNN Language Models

Title Input-to-Output Gate to Improve RNN Language Models
Authors Sho Takase, Jun Suzuki, Masaaki Nagata
Abstract This paper proposes a reinforcing method that refines the output layers of existing Recurrent Neural Network (RNN) language models. We refer to our proposed method as Input-to-Output Gate (IOG). IOG has an extremely simple structure, and thus, can be easily combined with any RNN language models. Our experiments on the Penn Treebank and WikiText-2 datasets demonstrate that IOG consistently boosts the performance of several different types of current topline RNN language models.
Tasks
Published 2017-09-26
URL http://arxiv.org/abs/1709.08907v2
PDF http://arxiv.org/pdf/1709.08907v2.pdf
PWC https://paperswithcode.com/paper/input-to-output-gate-to-improve-rnn-language
Repo https://github.com/nttcslab-nlp/iog
Framework none
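
IOG's structure really is a few lines: a gate computed from the current input word's embedding rescales the base LM's output representation element-wise, just before the softmax. A hedged PyTorch sketch with illustrative dimensions:

```python
import torch
import torch.nn as nn

class InputToOutputGate(nn.Module):
    def __init__(self, vocab_size, emb_dim, hidden_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.gate = nn.Linear(emb_dim, hidden_dim)

    def forward(self, word_ids, rnn_output):
        # g_t = sigmoid(W e(x_t) + b); refined output = g_t * h_t
        g = torch.sigmoid(self.gate(self.embed(word_ids)))
        return g * rnn_output

iog = InputToOutputGate(vocab_size=10000, emb_dim=300, hidden_dim=650)
h = torch.randn(8, 650)                   # hidden states of any base RNN LM
refined = iog(torch.randint(0, 10000, (8,)), h)
print(refined.shape)                      # torch.Size([8, 650])
```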

AI Programmer: Autonomously Creating Software Programs Using Genetic Algorithms

Title AI Programmer: Autonomously Creating Software Programs Using Genetic Algorithms
Authors Kory Becker, Justin Gottschlich
Abstract In this paper, we present the first-of-its-kind machine learning (ML) system, called AI Programmer, that can automatically generate full software programs requiring only minimal human guidance. At its core, AI Programmer uses genetic algorithms (GA) coupled with a tightly constrained programming language that minimizes the overhead of its ML search space. Part of AI Programmer’s novelty stems from (i) its unique system design, including an embedded, hand-crafted interpreter for efficiency and security and (ii) its augmentation of GAs to include instruction-gene randomization bindings and programming language-specific genome construction and elimination techniques. We provide a detailed examination of AI Programmer’s system design, several examples detailing how the system works, and experimental data demonstrating its software generation capabilities and performance using only mainstream CPUs.
Tasks
Published 2017-09-17
URL http://arxiv.org/abs/1709.05703v1
PDF http://arxiv.org/pdf/1709.05703v1.pdf
PWC https://paperswithcode.com/paper/ai-programmer-autonomously-creating-software
Repo https://github.com/primaryobjects/AI-Programmer
Framework none
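
A toy GA in the same spirit: a tightly constrained, loop-free instruction set keeps the interpreter safe and every program terminating, and a byte-distance fitness drives selection, crossover, and mutation. This is a didactic sketch, not the AI Programmer system.

```python
import random

OPS = "+-><."                                    # tiny constrained instruction set

def run(program, tape_len=16, max_out=8):
    # Bounded interpreter with no loop instruction, so every program halts
    # (the real system likewise relies on an embedded, execution-limited
    # interpreter for safety and efficiency).
    tape, ptr, out = [0] * tape_len, 0, []
    for op in program:
        if op == '+':   tape[ptr] = (tape[ptr] + 1) % 256
        elif op == '-': tape[ptr] = (tape[ptr] - 1) % 256
        elif op == '>': ptr = (ptr + 1) % tape_len
        elif op == '<': ptr = (ptr - 1) % tape_len
        elif op == '.': out.append(tape[ptr])
    return out[:max_out]

def fitness(program, target=(8, 5)):
    # Byte-wise closeness of the program's output to the target sequence.
    out = run(program)
    length_penalty = 256 * abs(len(out) - len(target))
    return -sum(abs(a - b) for a, b in zip(out, target)) - length_penalty

def evolve(pop_size=100, length=64, generations=200):
    pop = [''.join(random.choices(OPS, k=length)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 4]           # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = random.sample(parents, 2)
            cut = random.randrange(length)
            child = list(a[:cut] + b[cut:])      # one-point crossover
            child[random.randrange(length)] = random.choice(OPS)  # mutation
            children.append(''.join(child))
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
print(run(best), fitness(best))                  # e.g. [8, 5] with fitness 0
```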

SPEECH-COCO: 600k Visually Grounded Spoken Captions Aligned to MSCOCO Data Set

Title SPEECH-COCO: 600k Visually Grounded Spoken Captions Aligned to MSCOCO Data Set
Authors William Havard, Laurent Besacier, Olivier Rosec
Abstract This paper presents an augmentation of the MSCOCO dataset where speech is added to image and text. Speech captions are generated using text-to-speech (TTS) synthesis, resulting in 616,767 spoken captions (more than 600h) paired with images. Disfluencies and speed perturbation are added to the signal so that it sounds more natural. Each speech signal (WAV) is paired with a JSON file containing exact timecodes for each word/syllable/phoneme in the spoken caption. Such a corpus could be used for Language and Vision (LaVi) tasks including speech input or output instead of text. Investigating multimodal learning schemes for unsupervised speech pattern discovery is also possible with this corpus, as demonstrated by a preliminary study conducted on a subset of the corpus (10h, 10k spoken captions).
Tasks
Published 2017-07-26
URL http://arxiv.org/abs/1707.08435v4
PDF http://arxiv.org/pdf/1707.08435v4.pdf
PWC https://paperswithcode.com/paper/speech-coco-600k-visually-grounded-spoken
Repo https://github.com/William-N-Havard/SpeechCoco
Framework none
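
A sketch of how a consumer might pair a WAV with its timecode JSON. The field names used here are hypothetical; consult the SpeechCoco repository for the real schema and its API.

```python
import json
import wave

def load_caption(wav_path, json_path):
    # Duration straight from the WAV header.
    with wave.open(wav_path) as w:
        duration = w.getnframes() / w.getframerate()
    # Word-level timecodes from the paired JSON. NOTE: the field names
    # 'timecode', 'word', 'begin', 'end' are hypothetical placeholders.
    with open(json_path) as f:
        meta = json.load(f)
    words = [(t["word"], t["begin"], t["end"]) for t in meta["timecode"]]
    return duration, words
```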

Deep Laplacian Pyramid Networks for Fast and Accurate Super-Resolution

Title Deep Laplacian Pyramid Networks for Fast and Accurate Super-Resolution
Authors Wei-Sheng Lai, Jia-Bin Huang, Narendra Ahuja, Ming-Hsuan Yang
Abstract Convolutional neural networks have recently demonstrated high-quality reconstruction for single-image super-resolution. In this paper, we propose the Laplacian Pyramid Super-Resolution Network (LapSRN) to progressively reconstruct the sub-band residuals of high-resolution images. At each pyramid level, our model takes coarse-resolution feature maps as input, predicts the high-frequency residuals, and uses transposed convolutions for upsampling to the finer level. Our method does not require bicubic interpolation as a pre-processing step and thus dramatically reduces the computational complexity. We train the proposed LapSRN with deep supervision using a robust Charbonnier loss function and achieve high-quality reconstruction. Furthermore, our network generates multi-scale predictions in one feed-forward pass through the progressive reconstruction, thereby facilitating resource-aware applications. Extensive quantitative and qualitative evaluations on benchmark datasets show that the proposed algorithm performs favorably against the state-of-the-art methods in terms of speed and accuracy.
Tasks Image Super-Resolution, Super-Resolution
Published 2017-04-12
URL http://arxiv.org/abs/1704.03915v2
PDF http://arxiv.org/pdf/1704.03915v2.pdf
PWC https://paperswithcode.com/paper/deep-laplacian-pyramid-networks-for-fast-and
Repo https://github.com/nhatsmrt/superres
Framework pytorch
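
Two pieces transfer directly to code: the Charbonnier loss used for deep supervision, and the per-level pattern of transposed-convolution upsampling plus residual prediction. A small illustrative sketch (not the released model):

```python
import torch
import torch.nn as nn

def charbonnier(pred, target, eps=1e-3):
    # Robust Charbonnier loss, a differentiable variant of L1:
    # sqrt((x - y)^2 + eps^2), applied at every pyramid level.
    return torch.sqrt((pred - target) ** 2 + eps ** 2).mean()

class TinyLapSRN(nn.Module):
    # Two pyramid levels => 4x upscaling; each level predicts a
    # high-frequency residual and upsamples with a transposed convolution
    # (no bicubic pre-interpolation). Sizes are illustrative.
    def __init__(self, channels=32):
        super().__init__()
        self.head = nn.Conv2d(3, channels, 3, padding=1)
        self.up_feat = nn.ModuleList()
        self.residual = nn.ModuleList()
        self.up_image = nn.ModuleList()
        for _ in range(2):
            self.up_feat.append(nn.ConvTranspose2d(channels, channels, 4, 2, 1))
            self.residual.append(nn.Conv2d(channels, 3, 3, padding=1))
            self.up_image.append(nn.ConvTranspose2d(3, 3, 4, 2, 1))

    def forward(self, lr):
        feat, img, outputs = self.head(lr), lr, []
        for up_f, res, up_i in zip(self.up_feat, self.residual, self.up_image):
            feat = torch.relu(up_f(feat))        # upsample features 2x
            img = up_i(img) + res(feat)          # coarse image + residual
            outputs.append(img)                  # multi-scale predictions
        return outputs                           # [2x, 4x] in one pass

outs = TinyLapSRN()(torch.randn(1, 3, 32, 32))
print([o.shape for o in outs])  # 64x64 and 128x128 outputs
```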

Tensorizing Generative Adversarial Nets

Title Tensorizing Generative Adversarial Nets
Authors Xingwei Cao, Xuyang Zhao, Qibin Zhao
Abstract Generative Adversarial Network (GAN) and its variants exhibit state-of-the-art performance in the class of generative models. To capture higher-dimensional distributions, the common learning procedure requires high computational complexity and a large number of parameters. The problem with employing such a massive framework arises when deploying it on a platform with limited computational power such as mobile phones. In this paper, we present a new generative adversarial framework by representing each layer as a tensor structure connected by multilinear operations, aiming to reduce the number of model parameters by a large factor while preserving the generative performance and sample quality. To learn the model, we employ an efficient algorithm which alternately optimizes both discriminator and generator. Experimental outcomes demonstrate that our model can achieve a compression rate for model parameters of up to 35 times compared to the original GAN on the MNIST dataset.
Tasks
Published 2017-10-30
URL http://arxiv.org/abs/1710.10772v2
PDF http://arxiv.org/pdf/1710.10772v2.pdf
PWC https://paperswithcode.com/paper/tensorizing-generative-adversarial-nets
Repo https://github.com/xwcao/TGAN
Framework tf
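
The compression argument can be illustrated with the simplest possible factorized layer. Note the paper's actual construction uses richer tensor structures and multilinear operations; this low-rank stand-in only conveys where the parameter savings come from.

```python
import torch
import torch.nn as nn

class FactorizedLinear(nn.Module):
    # Replace a dense (in_dim x out_dim) weight with two low-rank factors;
    # parameter count drops from in*out to roughly rank*(in + out).
    def __init__(self, in_dim, out_dim, rank):
        super().__init__()
        self.a = nn.Linear(in_dim, rank, bias=False)
        self.b = nn.Linear(rank, out_dim)

    def forward(self, x):
        return self.b(self.a(x))

dense_params = 1024 * 1024
fact = FactorizedLinear(1024, 1024, rank=16)
fact_params = sum(p.numel() for p in fact.parameters())
print(dense_params / fact_params)   # roughly 31x fewer parameters
```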

Explanation in Artificial Intelligence: Insights from the Social Sciences

Title Explanation in Artificial Intelligence: Insights from the Social Sciences
Authors Tim Miller
Abstract There has been a recent resurgence in the area of explainable artificial intelligence as researchers and practitioners seek to make their algorithms more understandable. Much of this research is focused on explicitly explaining decisions or actions to a human observer, and it should not be controversial to say that looking at how humans explain to each other can serve as a useful starting point for explanation in artificial intelligence. However, it is fair to say that most work in explainable artificial intelligence uses only the researchers’ intuition of what constitutes a ‘good’ explanation. There exist vast and valuable bodies of research in philosophy, psychology, and cognitive science on how people define, generate, select, evaluate, and present explanations, which argue that people employ certain cognitive biases and social expectations in the explanation process. This paper argues that the field of explainable artificial intelligence should build on this existing research, and reviews relevant papers from philosophy, cognitive psychology/science, and social psychology, which study these topics. It draws out some important findings, and discusses ways that these can be infused with work on explainable artificial intelligence.
Tasks
Published 2017-06-22
URL http://arxiv.org/abs/1706.07269v3
PDF http://arxiv.org/pdf/1706.07269v3.pdf
PWC https://paperswithcode.com/paper/explanation-in-artificial-intelligence
Repo https://github.com/tobiasgerstenberg/causal_cognition
Framework none

Mitigating Adversarial Effects Through Randomization

Title Mitigating Adversarial Effects Through Randomization
Authors Cihang Xie, Jianyu Wang, Zhishuai Zhang, Zhou Ren, Alan Yuille
Abstract Convolutional neural networks have demonstrated high accuracy on various tasks in recent years. However, they are extremely vulnerable to adversarial examples. For example, imperceptible perturbations added to clean images can cause convolutional neural networks to fail. In this paper, we propose to utilize randomization at inference time to mitigate adversarial effects. Specifically, we use two randomization operations: random resizing, which resizes the input images to a random size, and random padding, which pads zeros around the input images in a random manner. Extensive experiments demonstrate that the proposed randomization method is very effective at defending against both single-step and iterative attacks. Our method provides the following advantages: 1) no additional training or fine-tuning, 2) very few additional computations, 3) compatible with other adversarial defense methods. Combining the proposed randomization method with an adversarially trained model achieves a normalized score of 0.924 (ranked No.2 among 107 defense teams) in the NIPS 2017 adversarial examples defense challenge, which is far better than using adversarial training alone with a normalized score of 0.773 (ranked No.56). The code is publicly available at https://github.com/cihangxie/NIPS2017_adv_challenge_defense.
Tasks Adversarial Defense, Image Classification
Published 2017-11-06
URL http://arxiv.org/abs/1711.01991v3
PDF http://arxiv.org/pdf/1711.01991v3.pdf
PWC https://paperswithcode.com/paper/mitigating-adversarial-effects-through
Repo https://github.com/cihangxie/DI-2-FGSM
Framework tf
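
The defense itself is two tensor operations. A sketch in PyTorch, where the sizes (299x299 inputs, 331x331 final) are illustrative Inception-style dimensions:

```python
import torch
import torch.nn.functional as F

def randomize(x, final_size=331):
    # 1) Resize the input to a random size between its own and final_size.
    b, c, h, w = x.shape
    new = int(torch.randint(h, final_size, (1,)))
    x = F.interpolate(x, size=(new, new), mode='nearest')
    # 2) Zero-pad at a random offset up to the fixed final size.
    pad = final_size - new
    left = int(torch.randint(0, pad + 1, (1,)))
    top = int(torch.randint(0, pad + 1, (1,)))
    return F.pad(x, (left, pad - left, top, pad - top))

out = randomize(torch.randn(1, 3, 299, 299))
print(out.shape)                                # torch.Size([1, 3, 331, 331])
```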

What is the Essence of a Claim? Cross-Domain Claim Identification

Title What is the Essence of a Claim? Cross-Domain Claim Identification
Authors Johannes Daxenberger, Steffen Eger, Ivan Habernal, Christian Stab, Iryna Gurevych
Abstract Argument mining has become a popular research area in NLP. It typically includes the identification of argumentative components, e.g. claims, as the central component of an argument. We perform a qualitative analysis across six different datasets and show that these appear to conceptualize claims quite differently. To learn about the consequences of such different conceptualizations of claims for practical applications, we carried out extensive experiments using state-of-the-art feature-rich and deep learning systems to identify claims in a cross-domain fashion. While the divergent perception of claims in different datasets is indeed harmful to cross-domain classification, we show that there are shared properties on the lexical level as well as system configurations that can help to overcome these gaps.
Tasks Argument Mining
Published 2017-04-24
URL http://arxiv.org/abs/1704.07203v3
PDF http://arxiv.org/pdf/1704.07203v3.pdf
PWC https://paperswithcode.com/paper/what-is-the-essence-of-a-claim-cross-domain
Repo https://github.com/UKPLab/emnlp2017-claim-identification
Framework none