July 30, 2019

2869 words 14 mins read

Paper Group AWR 46

Surface Networks

Title Surface Networks
Authors Ilya Kostrikov, Zhongshi Jiang, Daniele Panozzo, Denis Zorin, Joan Bruna
Abstract We study data-driven representations for three-dimensional triangle meshes, which are one of the prevalent objects used to represent 3D geometry. Recent works have developed models that exploit the intrinsic geometry of manifolds and graphs, namely Graph Neural Networks (GNNs) and their spectral variants, which learn from the local metric tensor via the Laplacian operator. Despite offering excellent sample complexity and built-in invariances, intrinsic geometry alone is invariant to isometric deformations, making it unsuitable for many applications. To overcome this limitation, we propose several upgrades to GNNs to leverage extrinsic differential geometry properties of three-dimensional surfaces, increasing their modeling power. In particular, we propose to exploit the Dirac operator, whose spectrum detects principal curvature directions; this is in stark contrast with the classical Laplace operator, which directly measures mean curvature. We coin the resulting models Surface Networks (SN). We prove that these models define shape representations that are stable to deformation and to discretization, and we demonstrate the efficiency and versatility of SNs on two challenging tasks: temporal prediction of mesh deformations under non-linear dynamics and generative models using a variational autoencoder framework with encoders/decoders given by SNs.
Tasks
Published 2017-05-30
URL http://arxiv.org/abs/1705.10819v2
PDF http://arxiv.org/pdf/1705.10819v2.pdf
PWC https://paperswithcode.com/paper/surface-networks
Repo https://github.com/jiangzhongshi/SurfaceNetworks
Framework pytorch
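
For context, here is a minimal NumPy sketch of the Laplacian-based GNN layer the paper builds on. All sizes and weights are made up, and the paper's actual contribution, replacing the Laplacian with the Dirac operator, is not reproduced here.

```python
import numpy as np

def mesh_laplacian(num_vertices, faces):
    # Combinatorial graph Laplacian L = D - A assembled from triangle faces.
    A = np.zeros((num_vertices, num_vertices))
    for i, j, k in faces:
        for u, v in ((i, j), (j, k), (k, i)):
            A[u, v] = A[v, u] = 1.0
    return np.diag(A.sum(axis=1)) - A

def laplacian_gnn_layer(x, L, w_self, w_lap):
    # One propagation step: mix each vertex's own features with its
    # Laplacian-filtered neighbourhood, then apply a ReLU.
    return np.maximum(0.0, x @ w_self + (L @ x) @ w_lap)

# Toy mesh: a tetrahedron (4 vertices, 4 triangular faces).
rng = np.random.default_rng(0)
L = mesh_laplacian(4, [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)])
x = rng.normal(size=(4, 6))                           # 6 features per vertex
h = laplacian_gnn_layer(x, L, rng.normal(size=(6, 8)), rng.normal(size=(6, 8)))
print(h.shape)                                        # (4, 8)
```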

LanideNN: Multilingual Language Identification on Character Window

Title LanideNN: Multilingual Language Identification on Character Window
Authors Tom Kocmi, Ondřej Bojar
Abstract In language identification, a common first step in natural language processing, we want to automatically determine the language of some input text. Monolingual language identification assumes that the given document is written in one language. In multilingual language identification, the document is usually in two or three languages and we just want their names. We aim to go one step further and propose a method for textual language identification where languages can change arbitrarily and the goal is to identify the spans of each of the languages. Our method is based on Bidirectional Recurrent Neural Networks and it performs well in monolingual and multilingual language identification tasks on six datasets covering 131 languages. The method maintains its accuracy on short documents and across domains, making it ideal for off-the-shelf use without preparing training data.
Tasks Language Identification
Published 2017-01-12
URL http://arxiv.org/abs/1701.03338v2
PDF http://arxiv.org/pdf/1701.03338v2.pdf
PWC https://paperswithcode.com/paper/lanidenn-multilingual-language-identification
Repo https://github.com/tomkocmi/LanideNN
Framework tf
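
A hedged sketch of the idea in PyTorch (the released code uses TensorFlow): a bidirectional RNN tags every character in a window with a language label, so code switches show up as label changes. All sizes are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class CharLangTagger(nn.Module):
    # A minimal bidirectional-RNN tagger in the spirit of LanideNN: read a
    # window of characters, predict a language label for every position.
    def __init__(self, n_chars=256, n_langs=131, emb=64, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(n_chars, emb)
        self.rnn = nn.LSTM(emb, hidden, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, n_langs)   # one label per character

    def forward(self, char_ids):                    # (batch, window_len)
        h, _ = self.rnn(self.embed(char_ids))       # (batch, window_len, 2*hidden)
        return self.out(h)                          # per-character logits

model = CharLangTagger()
window = torch.randint(0, 256, (2, 200))            # two 200-character windows
print(model(window).shape)                          # torch.Size([2, 200, 131])
```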

Coupled Ensembles of Neural Networks

Title Coupled Ensembles of Neural Networks
Authors Anuvabh Dutt, Denis Pellerin, Georges Quénot
Abstract We investigate in this paper the architecture of deep convolutional networks. Building on existing state-of-the-art models, we propose a reconfiguration of the model parameters into several parallel branches at the global network level, with each branch being a standalone CNN. We show that this arrangement is an efficient way to significantly reduce the number of parameters without losing performance, or to significantly improve performance for the same number of parameters. The use of branches brings an additional form of regularization. In addition to the split into parallel branches, we propose a tighter coupling of these branches by placing the “fuse (averaging) layer” before the Log-Likelihood and SoftMax layers during training. This gives another significant performance improvement, the tighter coupling favouring the learning of better representations, even at the level of the individual branches. We refer to this branched architecture as “coupled ensembles”. The approach is very generic and can be applied with almost any DCNN architecture. With coupled ensembles of DenseNet-BC and a parameter budget of 25M, we obtain error rates of 2.92%, 15.68% and 1.50% respectively on the CIFAR-10, CIFAR-100 and SVHN tasks. For the same budget, DenseNet-BC has error rates of 3.46%, 17.18%, and 1.8% respectively. With ensembles of coupled ensembles of DenseNet-BC networks, with 50M total parameters, we obtain error rates of 2.72%, 15.13% and 1.42% respectively on these tasks.
Tasks
Published 2017-09-18
URL http://arxiv.org/abs/1709.06053v1
PDF http://arxiv.org/pdf/1709.06053v1.pdf
PWC https://paperswithcode.com/paper/coupled-ensembles-of-neural-networks
Repo https://github.com/grey-area/modular-loss-experiments
Framework pytorch
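
The “fuse (averaging) layer” fits in a few lines of PyTorch. The sketch below averages branch scores before the LogSoftmax/NLL stage, as the abstract describes; the tiny branch CNN is a stand-in, not DenseNet-BC.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoupledEnsemble(nn.Module):
    def __init__(self, n_branches=4, n_classes=10):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(16, n_classes),
            )
            for _ in range(n_branches)
        ])

    def forward(self, x):
        scores = torch.stack([b(x) for b in self.branches])  # (B, batch, C)
        fused = scores.mean(dim=0)          # fuse layer: average branch scores
        return F.log_softmax(fused, dim=-1) # then LogSoftmax, trained with NLL

model = CoupledEnsemble()
x = torch.randn(8, 3, 32, 32)
loss = F.nll_loss(model(x), torch.randint(0, 10, (8,)))
loss.backward()                             # one coupled loss trains all branches
```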

Learning by Association - A versatile semi-supervised training method for neural networks

Title Learning by Association - A versatile semi-supervised training method for neural networks
Authors Philip Häusser, Alexander Mordvintsev, Daniel Cremers
Abstract In many real-world scenarios, labeled data for a specific machine learning task is costly to obtain. Semi-supervised training methods make use of abundantly available unlabeled data and a smaller number of labeled examples. We propose a new framework for semi-supervised training of deep neural networks inspired by learning in humans. “Associations” are made from embeddings of labeled samples to those of unlabeled ones and back. The optimization schedule encourages correct association cycles that end up at the same class from which the association was started and penalizes wrong associations ending at a different class. The implementation is easy to use and can be added to any existing end-to-end training setup. We demonstrate the capabilities of learning by association on several data sets and show that it can improve performance on classification tasks tremendously by making use of additionally available unlabeled data. In particular, in cases with little labeled data, our training scheme outperforms the current state of the art on SVHN.
Tasks
Published 2017-06-03
URL http://arxiv.org/abs/1706.00909v1
PDF http://arxiv.org/pdf/1706.00909v1.pdf
PWC https://paperswithcode.com/paper/learning-by-association-a-versatile-semi
Repo https://github.com/haeusser/learning_by_association
Framework tf
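
A schematic PyTorch reading of the association losses (not the reference TensorFlow code): the walker loss asks the labeled-to-unlabeled-to-labeled round trip to land on the starting class, and a visit term spreads probability mass over the unlabeled batch.

```python
import torch
import torch.nn.functional as F

def association_losses(emb_labeled, emb_unlabeled, labels):
    # Similarities drive a two-step random walk:
    # labeled -> unlabeled (p_ab) and back (p_ba).
    sim = emb_labeled @ emb_unlabeled.t()            # (L, U)
    p_ab = F.softmax(sim, dim=1)
    p_ba = F.softmax(sim.t(), dim=1)
    p_aba = p_ab @ p_ba                              # round-trip probs (L, L)

    # Walker loss: correct cycles end at any labeled sample of the same class.
    same = (labels[:, None] == labels[None, :]).float()
    target = same / same.sum(dim=1, keepdim=True)
    walker = -(target * torch.log(p_aba + 1e-8)).sum(dim=1).mean()

    # Visit loss: encourage the walk to visit all unlabeled samples.
    p_visit = p_ab.mean(dim=0)
    visit = -torch.log(p_visit + 1e-8).mean()
    return walker, visit

emb_l = F.normalize(torch.randn(16, 64), dim=1)      # labeled embeddings
emb_u = F.normalize(torch.randn(100, 64), dim=1)     # unlabeled embeddings
walker, visit = association_losses(emb_l, emb_u, torch.randint(0, 10, (16,)))
print(walker.item(), visit.item())
```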

RoomNet: End-to-End Room Layout Estimation

Title RoomNet: End-to-End Room Layout Estimation
Authors Chen-Yu Lee, Vijay Badrinarayanan, Tomasz Malisiewicz, Andrew Rabinovich
Abstract This paper focuses on the task of room layout estimation from a monocular RGB image. Prior works break the problem into two sub-tasks: semantic segmentation of the floor, walls, and ceiling to produce layout hypotheses, followed by an iterative optimization step to rank these hypotheses. In contrast, we adopt a more direct formulation of this problem as one of estimating an ordered set of room layout keypoints. The room layout and the corresponding segmentation are completely specified given the locations of these ordered keypoints. We predict the locations of the room layout keypoints using RoomNet, an end-to-end trainable encoder-decoder network. On the challenging benchmark datasets Hedau and LSUN, we achieve state-of-the-art performance along with 200x to 600x speedup compared to the most recent work. Additionally, we present optional extensions to the RoomNet architecture such as including recurrent computations and memory units to refine the keypoint locations under the same parametric capacity.
Tasks Room Layout Estimation, Semantic Segmentation
Published 2017-03-18
URL http://arxiv.org/abs/1703.06241v2
PDF http://arxiv.org/pdf/1703.06241v2.pdf
PWC https://paperswithcode.com/paper/roomnet-end-to-end-room-layout-estimation
Repo https://github.com/FengyangZhang/caffe_roomnet
Framework none
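
A toy encoder-decoder illustrating the keypoint formulation: one heatmap per ordered keypoint, with the argmax of each channel giving a location. This is far smaller than the actual RoomNet and purely illustrative.

```python
import torch
import torch.nn as nn

class TinyRoomNet(nn.Module):
    def __init__(self, n_keypoints=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, n_keypoints, 4, stride=2, padding=1),
        )

    def forward(self, img):
        heatmaps = self.decoder(self.encoder(img))      # (B, K, H, W)
        b, k, h, w = heatmaps.shape
        peaks = heatmaps.view(b, k, -1).argmax(dim=-1)  # peak per channel
        ys = torch.div(peaks, w, rounding_mode='floor')
        xs = peaks % w
        return heatmaps, torch.stack((ys, xs), dim=-1)  # ordered (y, x) points

heatmaps, keypoints = TinyRoomNet()(torch.randn(1, 3, 128, 128))
print(heatmaps.shape, keypoints.shape)  # (1, 8, 128, 128) and (1, 8, 2)
```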

A non-projective greedy dependency parser with bidirectional LSTMs

Title A non-projective greedy dependency parser with bidirectional LSTMs
Authors David Vilares, Carlos Gómez-Rodríguez
Abstract The LyS-FASTPARSE team presents BIST-COVINGTON, a neural implementation of the Covington (2001) algorithm for non-projective dependency parsing. The bidirectional LSTM approach by Kiperwasser and Goldberg (2016) is used to train a greedy parser with a dynamic oracle to mitigate error propagation. The model participated in the CoNLL 2017 UD Shared Task. In spite of not using any ensemble methods and using the baseline segmentation and PoS tagging, the parser obtained good results on both macro-average LAS and UAS in the big treebanks category (55 languages), ranking 7th out of 33 teams. In the all treebanks category (LAS and UAS) we ranked 16th and 12th. The gap between the all and big categories is mainly due to the poor performance on four parallel PUD treebanks, suggesting that some ‘suffixed’ treebanks (e.g. Spanish-AnCora) perform poorly in cross-treebank settings, which does not occur with the corresponding ‘unsuffixed’ treebank (e.g. Spanish). By changing that, we obtain the 11th best LAS among all runs (official and unofficial). The code is made available at https://github.com/CoNLL-UD-2017/LyS-FASTPARSE
Tasks Dependency Parsing
Published 2017-07-11
URL http://arxiv.org/abs/1707.03228v1
PDF http://arxiv.org/pdf/1707.03228v1.pdf
PWC https://paperswithcode.com/paper/a-non-projective-greedy-dependency-parser
Repo https://github.com/CoNLL-UD-2017/LyS-FASTPARSE
Framework none
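
A sketch of the greedy Covington-style pass. In BIST-COVINGTON the attachment decisions come from a scorer over BiLSTM word states and training uses a dynamic oracle; here `score` is any callable, and acyclicity checks are omitted for brevity.

```python
def covington_parse(words, score):
    # Greedy, non-projective Covington-style pass: for each new word j,
    # scan leftwards over earlier words i and ask the scorer whether to
    # add arc i -> j ('right'), arc j -> i ('left'), or none. Single-head
    # constraints are enforced via the `heads` dict.
    heads = {}                                   # dependent index -> head
    for j in range(1, len(words)):
        for i in range(j - 1, -1, -1):
            action = score(i, j, heads)
            if action == 'right' and j not in heads:
                heads[j] = i
            elif action == 'left' and i not in heads:
                heads[i] = j                     # may create crossing arcs
    return heads

# Toy scorer: attach every word to its left neighbour, for demonstration.
print(covington_parse(['ROOT', 'a', 'b', 'c'],
                      lambda i, j, heads: 'right' if i == j - 1 else 'none'))
# {1: 0, 2: 1, 3: 2}
```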

Deep Speaker: an End-to-End Neural Speaker Embedding System

Title Deep Speaker: an End-to-End Neural Speaker Embedding System
Authors Chao Li, Xiaokong Ma, Bing Jiang, Xiangang Li, Xuewei Zhang, Xiao Liu, Ying Cao, Ajay Kannan, Zhenyao Zhu
Abstract We present Deep Speaker, a neural speaker embedding system that maps utterances to a hypersphere where speaker similarity is measured by cosine similarity. The embeddings generated by Deep Speaker can be used for many tasks, including speaker identification, verification, and clustering. We experiment with ResCNN and GRU architectures to extract the acoustic features, then mean pool to produce utterance-level speaker embeddings, and train using triplet loss based on cosine similarity. Experiments on three distinct datasets suggest that Deep Speaker outperforms a DNN-based i-vector baseline. For example, Deep Speaker reduces the verification equal error rate by 50% (relatively) and improves the identification accuracy by 60% (relatively) on a text-independent dataset. We also present results that suggest adapting from a model trained with Mandarin can improve accuracy for English speaker recognition.
Tasks Speaker Identification, Speaker Recognition
Published 2017-05-05
URL http://arxiv.org/abs/1705.02304v1
PDF http://arxiv.org/pdf/1705.02304v1.pdf
PWC https://paperswithcode.com/paper/deep-speaker-an-end-to-end-neural-speaker
Repo https://github.com/prajual/Deep_Speaker
Framework none
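
The triplet loss on cosine similarity is the core training signal and is compact in PyTorch. The margin below is illustrative, and the random embeddings stand in for the ResCNN/GRU encoder's mean-pooled, L2-normalized utterance vectors.

```python
import torch
import torch.nn.functional as F

def cosine_triplet_loss(anchor, positive, negative, margin=0.1):
    # An utterance should be more cosine-similar to another utterance of
    # the same speaker (positive) than to a different speaker (negative).
    sim_ap = F.cosine_similarity(anchor, positive)
    sim_an = F.cosine_similarity(anchor, negative)
    return F.relu(sim_an - sim_ap + margin).mean()

def embed(n, dim=512):
    # Stand-in for the encoder: L2-normalized embeddings on the hypersphere.
    return F.normalize(torch.randn(n, dim), dim=1)

loss = cosine_triplet_loss(embed(4), embed(4), embed(4))
print(loss.item())
```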

Input-to-Output Gate to Improve RNN Language Models

Title Input-to-Output Gate to Improve RNN Language Models
Authors Sho Takase, Jun Suzuki, Masaaki Nagata
Abstract This paper proposes a reinforcing method that refines the output layers of existing Recurrent Neural Network (RNN) language models. We refer to our proposed method as Input-to-Output Gate (IOG). IOG has an extremely simple structure, and thus, can be easily combined with any RNN language models. Our experiments on the Penn Treebank and WikiText-2 datasets demonstrate that IOG consistently boosts the performance of several different types of current topline RNN language models.
Tasks
Published 2017-09-26
URL http://arxiv.org/abs/1709.08907v2
PDF http://arxiv.org/pdf/1709.08907v2.pdf
PWC https://paperswithcode.com/paper/input-to-output-gate-to-improve-rnn-language
Repo https://github.com/nttcslab-nlp/iog
Framework none
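
IOG's structure really is a few lines: a gate computed from the current input word's embedding rescales the base LM's output representation element-wise, just before the softmax. A hedged PyTorch sketch with illustrative dimensions:

```python
import torch
import torch.nn as nn

class InputToOutputGate(nn.Module):
    def __init__(self, vocab_size, emb_dim, hidden_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.gate = nn.Linear(emb_dim, hidden_dim)

    def forward(self, word_ids, rnn_output):
        # g_t = sigmoid(W e(x_t) + b); refined output = g_t * h_t
        g = torch.sigmoid(self.gate(self.embed(word_ids)))
        return g * rnn_output

iog = InputToOutputGate(vocab_size=10000, emb_dim=300, hidden_dim=650)
h = torch.randn(8, 650)                   # hidden states of any base RNN LM
refined = iog(torch.randint(0, 10000, (8,)), h)
print(refined.shape)                      # torch.Size([8, 650])
```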

AI Programmer: Autonomously Creating Software Programs Using Genetic Algorithms

Title AI Programmer: Autonomously Creating Software Programs Using Genetic Algorithms
Authors Kory Becker, Justin Gottschlich
Abstract In this paper, we present the first-of-its-kind machine learning (ML) system, called AI Programmer, that can automatically generate full software programs requiring only minimal human guidance. At its core, AI Programmer uses genetic algorithms (GA) coupled with a tightly constrained programming language that minimizes the overhead of its ML search space. Part of AI Programmer’s novelty stems from (i) its unique system design, including an embedded, hand-crafted interpreter for efficiency and security and (ii) its augmentation of GAs to include instruction-gene randomization bindings and programming language-specific genome construction and elimination techniques. We provide a detailed examination of AI Programmer’s system design, several examples detailing how the system works, and experimental data demonstrating its software generation capabilities and performance using only mainstream CPUs.
Tasks
Published 2017-09-17
URL http://arxiv.org/abs/1709.05703v1
PDF http://arxiv.org/pdf/1709.05703v1.pdf
PWC https://paperswithcode.com/paper/ai-programmer-autonomously-creating-software
Repo https://github.com/primaryobjects/AI-Programmer
Framework none
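
A toy GA in the same spirit: a tightly constrained, loop-free instruction set keeps the interpreter safe and every program terminating, and a byte-distance fitness drives selection, crossover, and mutation. This is a didactic sketch, not the AI Programmer system.

```python
import random

OPS = "+-><."                                    # tiny constrained instruction set

def run(program, tape_len=16, max_out=8):
    # Bounded interpreter with no loop instruction, so every program halts
    # (the real system likewise relies on an embedded, execution-limited
    # interpreter for safety and efficiency).
    tape, ptr, out = [0] * tape_len, 0, []
    for op in program:
        if op == '+':   tape[ptr] = (tape[ptr] + 1) % 256
        elif op == '-': tape[ptr] = (tape[ptr] - 1) % 256
        elif op == '>': ptr = (ptr + 1) % tape_len
        elif op == '<': ptr = (ptr - 1) % tape_len
        elif op == '.': out.append(tape[ptr])
    return out[:max_out]

def fitness(program, target=(8, 5)):
    # Byte-wise closeness of the program's output to the target sequence.
    out = run(program)
    length_penalty = 256 * abs(len(out) - len(target))
    return -sum(abs(a - b) for a, b in zip(out, target)) - length_penalty

def evolve(pop_size=100, length=64, generations=200):
    pop = [''.join(random.choices(OPS, k=length)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 4]           # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = random.sample(parents, 2)
            cut = random.randrange(length)
            child = list(a[:cut] + b[cut:])      # one-point crossover
            child[random.randrange(length)] = random.choice(OPS)  # mutation
            children.append(''.join(child))
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
print(run(best), fitness(best))                  # e.g. [8, 5] with fitness 0
```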

SPEECH-COCO: 600k Visually Grounded Spoken Captions Aligned to MSCOCO Data Set

Title SPEECH-COCO: 600k Visually Grounded Spoken Captions Aligned to MSCOCO Data Set
Authors William Havard, Laurent Besacier, Olivier Rosec
Abstract This paper presents an augmentation of the MSCOCO dataset where speech is added to image and text. Speech captions are generated using text-to-speech (TTS) synthesis, resulting in 616,767 spoken captions (more than 600h) paired with images. Disfluencies and speed perturbation are added to the signal so that it sounds more natural. Each speech signal (WAV) is paired with a JSON file containing exact timecodes for each word/syllable/phoneme in the spoken caption. Such a corpus could be used for Language and Vision (LaVi) tasks including speech input or output instead of text. Investigating multimodal learning schemes for unsupervised speech pattern discovery is also possible with this corpus, as demonstrated by a preliminary study conducted on a subset of the corpus (10h, 10k spoken captions).
Tasks
Published 2017-07-26
URL http://arxiv.org/abs/1707.08435v4
PDF http://arxiv.org/pdf/1707.08435v4.pdf
PWC https://paperswithcode.com/paper/speech-coco-600k-visually-grounded-spoken
Repo https://github.com/William-N-Havard/SpeechCoco
Framework none
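
A sketch of how a consumer might pair a WAV with its timecode JSON. The field names used here are hypothetical; consult the SpeechCoco repository for the real schema and its API.

```python
import json
import wave

def load_caption(wav_path, json_path):
    # Duration straight from the WAV header.
    with wave.open(wav_path) as w:
        duration = w.getnframes() / w.getframerate()
    # Word-level timecodes from the paired JSON. NOTE: the field names
    # 'timecode', 'word', 'begin', 'end' are hypothetical placeholders.
    with open(json_path) as f:
        meta = json.load(f)
    words = [(t["word"], t["begin"], t["end"]) for t in meta["timecode"]]
    return duration, words
```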

Deep Laplacian Pyramid Networks for Fast and Accurate Super-Resolution

Title Deep Laplacian Pyramid Networks for Fast and Accurate Super-Resolution
Authors Wei-Sheng Lai, Jia-Bin Huang, Narendra Ahuja, Ming-Hsuan Yang
Abstract Convolutional neural networks have recently demonstrated high-quality reconstruction for single-image super-resolution. In this paper, we propose the Laplacian Pyramid Super-Resolution Network (LapSRN) to progressively reconstruct the sub-band residuals of high-resolution images. At each pyramid level, our model takes coarse-resolution feature maps as input, predicts the high-frequency residuals, and uses transposed convolutions for upsampling to the finer level. Our method does not require bicubic interpolation as a pre-processing step and thus dramatically reduces the computational complexity. We train the proposed LapSRN with deep supervision using a robust Charbonnier loss function and achieve high-quality reconstruction. Furthermore, our network generates multi-scale predictions in one feed-forward pass through the progressive reconstruction, thereby facilitating resource-aware applications. Extensive quantitative and qualitative evaluations on benchmark datasets show that the proposed algorithm performs favorably against the state-of-the-art methods in terms of speed and accuracy.
Tasks Image Super-Resolution, Super-Resolution
Published 2017-04-12
URL http://arxiv.org/abs/1704.03915v2
PDF http://arxiv.org/pdf/1704.03915v2.pdf
PWC https://paperswithcode.com/paper/deep-laplacian-pyramid-networks-for-fast-and
Repo https://github.com/nhatsmrt/superres
Framework pytorch
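
Two pieces transfer directly to code: the Charbonnier loss used for deep supervision, and the per-level pattern of transposed-convolution upsampling plus residual prediction. A small illustrative sketch (not the released model):

```python
import torch
import torch.nn as nn

def charbonnier(pred, target, eps=1e-3):
    # Robust Charbonnier loss, a differentiable variant of L1:
    # sqrt((x - y)^2 + eps^2), applied at every pyramid level.
    return torch.sqrt((pred - target) ** 2 + eps ** 2).mean()

class TinyLapSRN(nn.Module):
    # Two pyramid levels => 4x upscaling; each level predicts a
    # high-frequency residual and upsamples with a transposed convolution
    # (no bicubic pre-interpolation). Sizes are illustrative.
    def __init__(self, channels=32):
        super().__init__()
        self.head = nn.Conv2d(3, channels, 3, padding=1)
        self.up_feat = nn.ModuleList()
        self.residual = nn.ModuleList()
        self.up_image = nn.ModuleList()
        for _ in range(2):
            self.up_feat.append(nn.ConvTranspose2d(channels, channels, 4, 2, 1))
            self.residual.append(nn.Conv2d(channels, 3, 3, padding=1))
            self.up_image.append(nn.ConvTranspose2d(3, 3, 4, 2, 1))

    def forward(self, lr):
        feat, img, outputs = self.head(lr), lr, []
        for up_f, res, up_i in zip(self.up_feat, self.residual, self.up_image):
            feat = torch.relu(up_f(feat))        # upsample features 2x
            img = up_i(img) + res(feat)          # coarse image + residual
            outputs.append(img)                  # multi-scale predictions
        return outputs                           # [2x, 4x] in one pass

outs = TinyLapSRN()(torch.randn(1, 3, 32, 32))
print([o.shape for o in outs])  # 64x64 and 128x128 outputs
```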

Tensorizing Generative Adversarial Nets

Title Tensorizing Generative Adversarial Nets
Authors Xingwei Cao, Xuyang Zhao, Qibin Zhao
Abstract Generative Adversarial Network (GAN) and its variants exhibit state-of-the-art performance in the class of generative models. To capture higher-dimensional distributions, the common learning procedure requires high computational complexity and a large number of parameters. The problem with employing such a massive framework arises when deploying it on a platform with limited computational power such as mobile phones. In this paper, we present a new generative adversarial framework by representing each layer as a tensor structure connected by multilinear operations, aiming to reduce the number of model parameters by a large factor while preserving the generative performance and sample quality. To learn the model, we employ an efficient algorithm which alternately optimizes both discriminator and generator. Experimental outcomes demonstrate that our model can achieve a compression rate for model parameters of up to 35 times compared to the original GAN on the MNIST dataset.
Tasks
Published 2017-10-30
URL http://arxiv.org/abs/1710.10772v2
PDF http://arxiv.org/pdf/1710.10772v2.pdf
PWC https://paperswithcode.com/paper/tensorizing-generative-adversarial-nets
Repo https://github.com/xwcao/TGAN
Framework tf
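
The compression argument can be illustrated with the simplest possible factorized layer. Note the paper's actual construction uses richer tensor structures and multilinear operations; this low-rank stand-in only conveys where the parameter savings come from.

```python
import torch
import torch.nn as nn

class FactorizedLinear(nn.Module):
    # Replace a dense (in_dim x out_dim) weight with two low-rank factors;
    # parameter count drops from in*out to roughly rank*(in + out).
    def __init__(self, in_dim, out_dim, rank):
        super().__init__()
        self.a = nn.Linear(in_dim, rank, bias=False)
        self.b = nn.Linear(rank, out_dim)

    def forward(self, x):
        return self.b(self.a(x))

dense_params = 1024 * 1024
fact = FactorizedLinear(1024, 1024, rank=16)
fact_params = sum(p.numel() for p in fact.parameters())
print(dense_params / fact_params)   # roughly 31x fewer parameters
```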

Explanation in Artificial Intelligence: Insights from the Social Sciences

Title Explanation in Artificial Intelligence: Insights from the Social Sciences
Authors Tim Miller
Abstract There has been a recent resurgence in the area of explainable artificial intelligence as researchers and practitioners seek to make their algorithms more understandable. Much of this research is focused on explicitly explaining decisions or actions to a human observer, and it should not be controversial to say that looking at how humans explain to each other can serve as a useful starting point for explanation in artificial intelligence. However, it is fair to say that most work in explainable artificial intelligence uses only the researchers’ intuition of what constitutes a ‘good’ explanation. There exist vast and valuable bodies of research in philosophy, psychology, and cognitive science on how people define, generate, select, evaluate, and present explanations, which argue that people employ certain cognitive biases and social expectations in the explanation process. This paper argues that the field of explainable artificial intelligence should build on this existing research, and reviews relevant papers from philosophy, cognitive psychology/science, and social psychology, which study these topics. It draws out some important findings, and discusses ways that these can be infused with work on explainable artificial intelligence.
Tasks
Published 2017-06-22
URL http://arxiv.org/abs/1706.07269v3
PDF http://arxiv.org/pdf/1706.07269v3.pdf
PWC https://paperswithcode.com/paper/explanation-in-artificial-intelligence
Repo https://github.com/tobiasgerstenberg/causal_cognition
Framework none

Mitigating Adversarial Effects Through Randomization

Title Mitigating Adversarial Effects Through Randomization
Authors Cihang Xie, Jianyu Wang, Zhishuai Zhang, Zhou Ren, Alan Yuille
Abstract Convolutional neural networks have demonstrated high accuracy on various tasks in recent years. However, they are extremely vulnerable to adversarial examples. For example, imperceptible perturbations added to clean images can cause convolutional neural networks to fail. In this paper, we propose to utilize randomization at inference time to mitigate adversarial effects. Specifically, we use two randomization operations: random resizing, which resizes the input images to a random size, and random padding, which pads zeros around the input images in a random manner. Extensive experiments demonstrate that the proposed randomization method is very effective at defending against both single-step and iterative attacks. Our method provides the following advantages: 1) no additional training or fine-tuning, 2) very few additional computations, 3) compatible with other adversarial defense methods. Combining the proposed randomization method with an adversarially trained model achieves a normalized score of 0.924 (ranked No.2 among 107 defense teams) in the NIPS 2017 adversarial examples defense challenge, which is far better than using adversarial training alone with a normalized score of 0.773 (ranked No.56). The code is publicly available at https://github.com/cihangxie/NIPS2017_adv_challenge_defense.
Tasks Adversarial Defense, Image Classification
Published 2017-11-06
URL http://arxiv.org/abs/1711.01991v3
PDF http://arxiv.org/pdf/1711.01991v3.pdf
PWC https://paperswithcode.com/paper/mitigating-adversarial-effects-through
Repo https://github.com/cihangxie/DI-2-FGSM
Framework tf
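
The defense itself is two tensor operations. A sketch in PyTorch, where the sizes (299x299 inputs, 331x331 final) are illustrative Inception-style dimensions:

```python
import torch
import torch.nn.functional as F

def randomize(x, final_size=331):
    # 1) Resize the input to a random size between its own and final_size.
    b, c, h, w = x.shape
    new = int(torch.randint(h, final_size, (1,)))
    x = F.interpolate(x, size=(new, new), mode='nearest')
    # 2) Zero-pad at a random offset up to the fixed final size.
    pad = final_size - new
    left = int(torch.randint(0, pad + 1, (1,)))
    top = int(torch.randint(0, pad + 1, (1,)))
    return F.pad(x, (left, pad - left, top, pad - top))

out = randomize(torch.randn(1, 3, 299, 299))
print(out.shape)                                # torch.Size([1, 3, 331, 331])
```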

What is the Essence of a Claim? Cross-Domain Claim Identification

Title What is the Essence of a Claim? Cross-Domain Claim Identification
Authors Johannes Daxenberger, Steffen Eger, Ivan Habernal, Christian Stab, Iryna Gurevych
Abstract Argument mining has become a popular research area in NLP. It typically includes the identification of argumentative components, e.g. claims, as the central component of an argument. We perform a qualitative analysis across six different datasets and show that these appear to conceptualize claims quite differently. To learn about the consequences of such different conceptualizations of claims for practical applications, we carried out extensive experiments using state-of-the-art feature-rich and deep learning systems to identify claims in a cross-domain fashion. While the divergent perception of claims in different datasets is indeed harmful to cross-domain classification, we show that there are shared properties on the lexical level as well as system configurations that can help to overcome these gaps.
Tasks Argument Mining
Published 2017-04-24
URL http://arxiv.org/abs/1704.07203v3
PDF http://arxiv.org/pdf/1704.07203v3.pdf
PWC https://paperswithcode.com/paper/what-is-the-essence-of-a-claim-cross-domain
Repo https://github.com/UKPLab/emnlp2017-claim-identification
Framework none