October 19, 2019

Paper Group ANR 229

Input Combination Strategies for Multi-Source Transformer Decoder


Title	Input Combination Strategies for Multi-Source Transformer Decoder
Authors	Jindřich Libovický, Jindřich Helcl, David Mareček
Abstract	In multi-source sequence-to-sequence tasks, the attention mechanism can be modeled in several ways. This topic has been thoroughly studied on recurrent architectures. In this paper, we extend the previous work to the encoder-decoder attention in the Transformer architecture. We propose four different input combination strategies for the encoder-decoder attention: serial, parallel, flat, and hierarchical. We evaluate our methods on tasks of multimodal translation and translation with multiple source languages. The experiments show that the models are able to use multiple sources and improve over single source baselines.
Tasks
Published	2018-11-12
URL	http://arxiv.org/abs/1811.04716v1
PDF	http://arxiv.org/pdf/1811.04716v1.pdf
PWC	https://paperswithcode.com/paper/input-combination-strategies-for-multi-source-1
Repo
Framework
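
The abstract names four combination strategies without defining them. The sketch below is one plausible reading of those names, using single-head scaled dot-product attention with no learned projections. The function names, the summation in the parallel variant, and the residual update in the serial variant are assumptions made for illustration and should be checked against the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, states):
    """Scaled dot-product attention of one query over a list of state vectors."""
    scale = math.sqrt(len(query))
    scores = [sum(q * s for q, s in zip(query, st)) / scale for st in states]
    weights = softmax(scores)
    dim = len(states[0])
    return [sum(w * st[i] for w, st in zip(weights, states)) for i in range(dim)]


def serial_combination(query, encoders):
    """Attend to the encoders one after another, adding each context residually."""
    hidden = list(query)
    for states in encoders:
        context = attend(hidden, states)
        hidden = [h + c for h, c in zip(hidden, context)]
    return hidden


def parallel_combination(query, encoders):
    """Attend to every encoder with the same query and sum the contexts."""
    contexts = [attend(query, states) for states in encoders]
    return [sum(vals) for vals in zip(*contexts)]


def flat_combination(query, encoders):
    """Concatenate all encoder states into one sequence and attend once."""
    all_states = [st for states in encoders for st in states]
    return attend(query, all_states)


def hierarchical_combination(query, encoders):
    """Compute one context per encoder, then attend over those contexts."""
    contexts = [attend(query, states) for states in encoders]
    return attend(query, contexts)


# Toy inputs: two "text" states and one "image" state, all of dimension 2.
text_states = [[1.0, 0.0], [0.0, 1.0]]
image_states = [[0.5, 0.5]]
decoder_query = [1.0, 0.0]
flat_context = flat_combination(decoder_query, [text_states, image_states])
```

The flat variant lets the sources compete inside one softmax, whereas the parallel and hierarchical variants first summarize each source on its own.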

SUSiNet: See, Understand and Summarize it


Title	SUSiNet: See, Understand and Summarize it
Authors	Petros Koutras, Petros Maragos
Abstract	In this work we propose a multi-task spatio-temporal network, called SUSiNet, that can jointly tackle the spatio-temporal problems of saliency estimation, action recognition and video summarization. Our approach employs a single network that is jointly end-to-end trained for all tasks with multiple and diverse datasets related to the exploring tasks. The proposed network uses a unified architecture that includes global and task specific layer and produces multiple output types, i.e., saliency maps or classification labels, by employing the same video input. Moreover, one additional contribution is that the proposed network can be deeply supervised through an attention module that is related to human attention as it is expressed by eye-tracking data. From the extensive evaluation, on seven different datasets, we have observed that the multi-task network performs as well as the state-of-the-art single-task methods (or in some cases better), while it requires less computational budget than having one independent network per each task.
Tasks	Action Recognition In Videos, Eye Tracking, Saliency Prediction, Temporal Action Localization, Video Summarization
Published	2018-12-03
URL	http://arxiv.org/abs/1812.00722v2
PDF	http://arxiv.org/pdf/1812.00722v2.pdf
PWC	https://paperswithcode.com/paper/susinet-see-understand-and-summarize-it
Repo
Framework

Improved English to Russian Translation by Neural Suffix Prediction


Title	Improved English to Russian Translation by Neural Suffix Prediction
Authors	Kai Song, Yue Zhang, Min Zhang, Weihua Luo
Abstract	Neural machine translation (NMT) suffers a performance deficiency when a limited vocabulary fails to cover the source or target side adequately, which happens frequently when dealing with morphologically rich languages. To address this problem, previous work focused on adjusting translation granularity or expanding the vocabulary size. However, morphological information is relatively under-considered in NMT architectures, which may further improve translation quality. We propose a novel method, which can not only reduce data sparsity but also model morphology through a simple but effective mechanism. By predicting the stem and suffix separately during decoding, our system achieves an improvement of up to 1.98 BLEU compared with previous work on English to Russian translation. Our method is orthogonal to different NMT architectures and stably gains improvements on various domains.
Tasks	Machine Translation
Published	2018-01-11
URL	http://arxiv.org/abs/1801.03615v1
PDF	http://arxiv.org/pdf/1801.03615v1.pdf
PWC	https://paperswithcode.com/paper/improved-english-to-russian-translation-by
Repo
Framework

Hierarchical Multitask Learning for CTC-based Speech Recognition


Title	Hierarchical Multitask Learning for CTC-based Speech Recognition
Authors	Kalpesh Krishna, Shubham Toshniwal, Karen Livescu
Abstract	Previous work has shown that neural encoder-decoder speech recognition can be improved with hierarchical multitask learning, where auxiliary tasks are added at intermediate layers of a deep encoder. We explore the effect of hierarchical multitask learning in the context of connectionist temporal classification (CTC)-based speech recognition, and investigate several aspects of this approach. Consistent with previous work, we observe performance improvements on telephone conversational speech recognition (specifically the Eval2000 test sets) when training a subword-level CTC model with an auxiliary phone loss at an intermediate layer. We analyze the effects of a number of experimental variables (like interpolation constant and position of the auxiliary loss function), performance in lower-resource settings, and the relationship between pretraining and multitask learning. We observe that the hierarchical multitask approach improves over standard multitask training in our higher-data experiments, while in the low-resource settings standard multitask training works well. The best results are obtained by combining hierarchical multitask learning and pretraining, which improves word error rates by 3.4% absolute on the Eval2000 test sets.
Tasks	Speech Recognition
Published	2018-07-17
URL	http://arxiv.org/abs/1807.06234v2
PDF	http://arxiv.org/pdf/1807.06234v2.pdf
PWC	https://paperswithcode.com/paper/hierarchical-multitask-learning-for-ctc-based
Repo
Framework
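
The abstract mentions an interpolation constant that balances the subword-level CTC loss against the auxiliary phone loss. A minimal sketch of that idea follows, assuming a convex combination of the two losses. The exact form used in the paper is not given in the abstract, and the function name is hypothetical.

```python
def interpolated_loss(main_loss, aux_loss, interpolation):
    """Blend a main loss with an auxiliary loss.

    interpolation = 0.0 ignores the auxiliary task; 1.0 trains on it alone.
    The convex-combination form is an assumption, not taken from the paper.
    """
    if not 0.0 <= interpolation <= 1.0:
        raise ValueError("interpolation must lie in [0, 1]")
    return (1.0 - interpolation) * main_loss + interpolation * aux_loss


# Example with made-up loss values for one training batch.
total = interpolated_loss(main_loss=2.0, aux_loss=4.0, interpolation=0.25)
```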

Phocas: dimensional Byzantine-resilient stochastic gradient descent


Title	Phocas: dimensional Byzantine-resilient stochastic gradient descent
Authors	Cong Xie, Oluwasanmi Koyejo, Indranil Gupta
Abstract	We propose a novel robust aggregation rule for distributed synchronous Stochastic Gradient Descent (SGD) under a general Byzantine failure model. The attackers can arbitrarily manipulate the data transferred between the servers and the workers in the parameter server (PS) architecture. We prove the Byzantine resilience of the proposed aggregation rules. Empirical analysis shows that the proposed techniques outperform current approaches for realistic use cases and Byzantine attack scenarios.
Tasks
Published	2018-05-23
URL	http://arxiv.org/abs/1805.09682v1
PDF	http://arxiv.org/pdf/1805.09682v1.pdf
PWC	https://paperswithcode.com/paper/phocas-dimensional-byzantine-resilient
Repo
Framework
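
The abstract does not spell out the aggregation rule. As an illustration of dimension-wise robust aggregation in general, the sketch below applies a trimmed mean to each coordinate and then averages the values closest to it. Treating this as the shape of the proposed rule is an assumption that should be checked against the paper; the function names and the parameter `b` (the number of suspected Byzantine values per coordinate) are hypothetical.

```python
def trimmed_mean(values, b):
    """Drop the b smallest and b largest values, then average the rest."""
    if 2 * b >= len(values):
        raise ValueError("b is too large for the number of workers")
    kept = sorted(values)[b:len(values) - b]
    return sum(kept) / len(kept)


def nearest_to_trimmed_mean(values, b):
    """Average the len(values) - b values closest to the trimmed mean."""
    center = trimmed_mean(values, b)
    nearest = sorted(values, key=lambda v: abs(v - center))[:len(values) - b]
    return sum(nearest) / len(nearest)


def aggregate(gradients, b, rule=nearest_to_trimmed_mean):
    """Apply a scalar rule independently to every coordinate of the gradients."""
    return [rule(list(coordinate), b) for coordinate in zip(*gradients)]


# Three honest workers agree; the fourth sends arbitrary values.
worker_gradients = [
    [1.0, 2.0],
    [1.0, 2.0],
    [1.0, 2.0],
    [1000.0, -1000.0],
]
robust_gradient = aggregate(worker_gradients, b=1)
```

A plain average of the same inputs would be dominated by the fourth worker, which is the failure mode a robust rule is meant to avoid.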

Beyond One-hot Encoding: lower dimensional target embedding


Title	Beyond One-hot Encoding: lower dimensional target embedding
Authors	Pau Rodríguez, Miguel A. Bautista, Jordi Gonzàlez, Sergio Escalera
Abstract	Target encoding plays a central role when learning Convolutional Neural Networks. In this realm, One-hot encoding is the most prevalent strategy due to its simplicity. However, this widespread encoding scheme assumes a flat label space, thus ignoring rich relationships existing among labels that can be exploited during training. In large-scale datasets, data does not span the full label space, but instead lies in a low-dimensional output manifold. Following this observation, we embed the targets into a low-dimensional space, drastically improving convergence speed while preserving accuracy. Our contribution is twofold: (i) We show that random projections of the label space are a valid tool to find such lower dimensional embeddings, dramatically boosting convergence rates at zero computational cost; and (ii) we propose a normalized eigenrepresentation of the class manifold that encodes the targets with minimal information loss, improving the accuracy of random projections encoding while enjoying the same convergence rates. Experiments on CIFAR-100, CUB200-2011, Imagenet, and MIT Places demonstrate that the proposed approach drastically improves convergence speed while reaching very competitive accuracy rates.
Tasks
Published	2018-06-28
URL	http://arxiv.org/abs/1806.10805v1
PDF	http://arxiv.org/pdf/1806.10805v1.pdf
PWC	https://paperswithcode.com/paper/beyond-one-hot-encoding-lower-dimensional
Repo
Framework
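
To make the random-projection idea concrete, here is a minimal sketch that replaces one-hot targets with low-dimensional random vectors. Multiplying a one-hot vector by a random matrix simply selects one row, so each class receives its own Gaussian vector. The nearest-neighbour decoding step, the function names, and the chosen sizes are assumptions for illustration, not details from the abstract.

```python
import random


def random_label_embeddings(num_classes, dim, seed=0):
    """Return one random Gaussian target vector of length dim per class."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_classes)]


def decode(prediction, embeddings):
    """Map a predicted vector back to the class with the closest embedding."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(embeddings)),
               key=lambda cls: squared_distance(prediction, embeddings[cls]))


# 100 classes encoded with 8 numbers each instead of 100.
targets = random_label_embeddings(num_classes=100, dim=8)
recovered = decode(targets[42], targets)
```

A network trained this way would regress onto the 8-dimensional target of the true class rather than predict a 100-way one-hot vector.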

Noiseprint: a CNN-based camera model fingerprint


Title	Noiseprint: a CNN-based camera model fingerprint
Authors	Davide Cozzolino, Luisa Verdoliva
Abstract	Forensic analyses of digital images rely heavily on the traces of in-camera and out-camera processes left on the acquired images. Such traces represent a sort of camera fingerprint. If one is able to recover them, by suppressing the high-level scene content and other disturbances, a number of forensic tasks can be easily accomplished. A notable example is the PRNU pattern, which can be regarded as a device fingerprint, and has received great attention in multimedia forensics. In this paper we propose a method to extract a camera model fingerprint, called noiseprint, where the scene content is largely suppressed and model-related artifacts are enhanced. This is obtained by means of a Siamese network, which is trained with pairs of image patches coming from the same (label +1) or different (label -1) cameras. Although noiseprints can be used for a large variety of forensic tasks, here we focus on image forgery localization. Experiments on several datasets widespread in the forensic community show noiseprint-based methods to provide state-of-the-art performance.
Tasks
Published	2018-08-25
URL	http://arxiv.org/abs/1808.08396v1
PDF	http://arxiv.org/pdf/1808.08396v1.pdf
PWC	https://paperswithcode.com/paper/noiseprint-a-cnn-based-camera-model
Repo
Framework
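
The pair labelling described in the abstract (+1 for patches from the same camera, -1 otherwise) can be sketched as follows. The data layout and the function name are hypothetical; the abstract does not say how pairs are sampled, so this sketch simply enumerates all of them.

```python
from itertools import combinations


def make_pairs(patches):
    """Build Siamese training pairs from (patch_id, camera_model) tuples.

    Returns (patch_id_a, patch_id_b, label) with label +1 when both patches
    come from the same camera model and -1 when they do not.
    """
    pairs = []
    for (id_a, cam_a), (id_b, cam_b) in combinations(patches, 2):
        label = 1 if cam_a == cam_b else -1
        pairs.append((id_a, id_b, label))
    return pairs


# Made-up patch identifiers and camera names.
example_patches = [("p0", "camA"), ("p1", "camA"), ("p2", "camB")]
example_pairs = make_pairs(example_patches)
```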

Transferring Physical Motion Between Domains for Neural Inertial Tracking


Title	Transferring Physical Motion Between Domains for Neural Inertial Tracking
Authors	Changhao Chen, Yishu Miao, Chris Xiaoxuan Lu, Phil Blunsom, Andrew Markham, Niki Trigoni
Abstract	Inertial information processing plays a pivotal role in ego-motion awareness for mobile agents, as inertial measurements are entirely egocentric and not environment dependent. However, they are affected greatly by changes in sensor placement/orientation or motion dynamics, and it is infeasible to collect labelled data from every domain. To overcome the challenges of domain adaptation on long sensory sequences, we propose a novel framework that extracts domain-invariant features of raw sequences from arbitrary domains, and transforms to new domains without any paired data. Through the experiments, we demonstrate that it is able to efficiently and effectively convert the raw sequence from a new unlabelled target domain into an accurate inertial trajectory, benefiting from the physical motion knowledge transferred from the labelled source domain. We also conduct real-world experiments to show our framework can reconstruct physically meaningful trajectories from raw IMU measurements obtained with a standard mobile phone in various attachments.
Tasks	Domain Adaptation
Published	2018-10-04
URL	http://arxiv.org/abs/1810.02076v1
PDF	http://arxiv.org/pdf/1810.02076v1.pdf
PWC	https://paperswithcode.com/paper/transferring-physical-motion-between-domains
Repo
Framework

Adversarial classification: An adversarial risk analysis approach


Title	Adversarial classification: An adversarial risk analysis approach
Authors	Roi Naveiro, Alberto Redondo, David Ríos Insua, Fabrizio Ruggeri
Abstract	Classification problems in security settings are usually contemplated as confrontations in which one or more adversaries try to fool a classifier to obtain a benefit. Most approaches to such adversarial classification problems have focused on game theoretical ideas with strong underlying common knowledge assumptions, which are actually not realistic in security domains. We provide an alternative framework to such problem based on adversarial risk analysis, which we illustrate with several examples. Computational and implementation issues are discussed.
Tasks
Published	2018-02-21
URL	https://arxiv.org/abs/1802.07513v3
PDF	https://arxiv.org/pdf/1802.07513v3.pdf
PWC	https://paperswithcode.com/paper/adversarial-classification-an-adversarial
Repo
Framework

Identification of Invariant Sensorimotor Structures as a Prerequisite for the Discovery of Objects


Title	Identification of Invariant Sensorimotor Structures as a Prerequisite for the Discovery of Objects
Authors	Nicolas Le Hir, Olivier Sigaud, Alban Laflaquière
Abstract	Perceiving the surrounding environment in terms of objects is useful for any general purpose intelligent agent. In this paper, we investigate a fundamental mechanism making object perception possible, namely the identification of spatio-temporally invariant structures in the sensorimotor experience of an agent. We take inspiration from the Sensorimotor Contingencies Theory to define a computational model of this mechanism through a sensorimotor, unsupervised and predictive approach. Our model is based on processing the unsupervised interaction of an artificial agent with its environment. We show how spatio-temporally invariant structures in the environment induce regularities in the sensorimotor experience of an agent, and how this agent, while building a predictive model of its sensorimotor experience, can capture them as densely connected subgraphs in a graph of sensory states connected by motor commands. Our approach is focused on elementary mechanisms, and is illustrated with a set of simple experiments in which an agent interacts with an environment. We show how the agent can build an internal model of moving but spatio-temporally invariant structures by performing a Spectral Clustering of the graph modeling its overall sensorimotor experiences. We systematically examine properties of the model, shedding light more globally on the specificities of the paradigm with respect to methods based on the supervised processing of collections of static images.
Tasks
Published	2018-10-11
URL	http://arxiv.org/abs/1810.05057v1
PDF	http://arxiv.org/pdf/1810.05057v1.pdf
PWC	https://paperswithcode.com/paper/identification-of-invariant-sensorimotor
Repo
Framework

Probability Calibration Trees


Title	Probability Calibration Trees
Authors	Tim Leathart, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer
Abstract	Obtaining accurate and well calibrated probability estimates from classifiers is useful in many applications, for example, when minimising the expected cost of classifications. Existing methods of calibrating probability estimates are applied globally, ignoring the potential for improvements by applying a more fine-grained model. We propose probability calibration trees, a modification of logistic model trees that identifies regions of the input space in which different probability calibration models are learned to improve performance. We compare probability calibration trees to two widely used calibration methods—isotonic regression and Platt scaling—and show that our method results in lower root mean squared error on average than both methods, for estimates produced by a variety of base learners.
Tasks	Calibration
Published	2018-07-31
URL	http://arxiv.org/abs/1808.00111v2
PDF	http://arxiv.org/pdf/1808.00111v2.pdf
PWC	https://paperswithcode.com/paper/probability-calibration-trees
Repo
Framework
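
For readers unfamiliar with the quantities named in the abstract, the sketch below shows the sigmoid used by Platt scaling and one common way to compute the root mean squared error of probability estimates against one-hot targets. The averaging convention and the function names are assumptions; the paper may define the metric slightly differently, and the fitting of the parameters `a` and `b` is left out.

```python
import math


def platt_scale(score, a, b):
    """Platt scaling: map a raw classifier score to a probability.

    a and b are normally fitted by maximum likelihood on held-out data.
    With a negative a, larger scores give larger probabilities.
    """
    return 1.0 / (1.0 + math.exp(a * score + b))


def probability_rmse(probabilities, labels):
    """RMSE between predicted class distributions and one-hot targets."""
    total = 0.0
    count = 0
    for probs, label in zip(probabilities, labels):
        for cls, p in enumerate(probs):
            target = 1.0 if cls == label else 0.0
            total += (p - target) ** 2
            count += 1
    return math.sqrt(total / count)


# A maximally uncertain two-class prediction for an instance of class 0.
uncertain_rmse = probability_rmse([[0.5, 0.5]], [0])
```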

Neural Character-based Composition Models for Abuse Detection


Title	Neural Character-based Composition Models for Abuse Detection
Authors	Pushkar Mishra, Helen Yannakoudakis, Ekaterina Shutova
Abstract	The advent of social media in recent years has fed into some highly undesirable phenomena such as proliferation of offensive language, hate speech, sexist remarks, etc. on the Internet. In light of this, there have been several efforts to automate the detection and moderation of such abusive content. However, deliberate obfuscation of words by users to evade detection poses a serious challenge to the effectiveness of these efforts. The current state of the art approaches to abusive language detection, based on recurrent neural networks, do not explicitly address this problem and resort to a generic OOV (out of vocabulary) embedding for unseen words. However, in using a single embedding for all unseen words we lose the ability to distinguish between obfuscated and non-obfuscated or rare words. In this paper, we address this problem by designing a model that can compose embeddings for unseen words. We experimentally demonstrate that our approach significantly advances the current state of the art in abuse detection on datasets from two different domains, namely Twitter and Wikipedia talk page.
Tasks	Abuse Detection
Published	2018-09-02
URL	http://arxiv.org/abs/1809.00378v1
PDF	http://arxiv.org/pdf/1809.00378v1.pdf
PWC	https://paperswithcode.com/paper/neural-character-based-composition-models-for
Repo
Framework

Where is this? Video geolocation based on neural network features


Title	Where is this? Video geolocation based on neural network features
Authors	Salvador Medina, Zhuyun Dai, Yingkai Gao
Abstract	In this work we propose a method that geolocates videos within a delimited widespread area based solely on the frames' visual content. Our proposed method tackles video-geolocation through traditional image retrieval techniques considering Google Street View as the reference point. To achieve this goal we use the deep learning features obtained from NetVLAD to represent images, since for these feature vectors similarity is given by the L2 norm. In this paper, we propose a family of voting-based methods to aggregate frame-wise geolocation results which boost the video geolocation result. The best aggregation found through our experiments considers both NetVLAD and SIFT similarity, as well as the geolocation density of the most similar results. To test our proposed method, we gathered a new video dataset from the Pittsburgh Downtown area to benefit and stimulate more work in this area. Our system achieved a precision of 90% while geolocating videos within a range of 150 meters or two blocks away from the original position.
Tasks	Image Retrieval
Published	2018-10-22
URL	http://arxiv.org/abs/1810.09068v2
PDF	http://arxiv.org/pdf/1810.09068v2.pdf
PWC	https://paperswithcode.com/paper/where-is-this-video-geolocation-based-on
Repo
Framework
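
One simple member of a voting-based family could look like the sketch below: every frame contributes the coordinates of its best retrieval match, matches are binned into grid cells, and the most popular cell decides the video location. This is an illustration only. The grid binning, the cell size, and the function name are assumptions, and the paper's best method additionally uses similarity scores and result density.

```python
from collections import defaultdict


def vote_location(frame_matches, cell_size=0.01):
    """Aggregate per-frame (lat, lon) matches by majority vote over grid cells.

    Returns the mean position of the matches in the most voted cell.
    cell_size is in degrees and is a hypothetical tuning parameter.
    """
    cells = defaultdict(list)
    for lat, lon in frame_matches:
        key = (round(lat / cell_size), round(lon / cell_size))
        cells[key].append((lat, lon))
    winner = max(cells.values(), key=len)
    count = len(winner)
    return (sum(p[0] for p in winner) / count,
            sum(p[1] for p in winner) / count)


# Three frames agree on one spot; one frame retrieved a far-away match.
matches = [(40.441, -79.995), (40.441, -79.995), (40.441, -79.995), (40.5, -80.1)]
estimated_lat, estimated_lon = vote_location(matches)
```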

SimpleQuestions Nearly Solved: A New Upperbound and Baseline Approach


Title	SimpleQuestions Nearly Solved: A New Upperbound and Baseline Approach
Authors	Michael Petrochuk, Luke Zettlemoyer
Abstract	The SimpleQuestions dataset is one of the most commonly used benchmarks for studying single-relation factoid questions. In this paper, we present new evidence that this benchmark can be nearly solved by standard methods. First we show that ambiguity in the data bounds performance on this benchmark at 83.4%; there are often multiple answers that cannot be disambiguated from the linguistic signal alone. Second we introduce a baseline that sets a new state-of-the-art performance level at 78.1% accuracy, despite using standard methods. Finally, we report an empirical analysis showing that the upperbound is loose; roughly a third of the remaining errors are also not resolvable from the linguistic signal. Together, these results suggest that the SimpleQuestions dataset is nearly solved.
Tasks
Published	2018-04-24
URL	http://arxiv.org/abs/1804.08798v1
PDF	http://arxiv.org/pdf/1804.08798v1.pdf
PWC	https://paperswithcode.com/paper/simplequestions-nearly-solved-a-new
Repo
Framework
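
The two figures quoted in the abstract can be put side by side with a short calculation. The helper name is hypothetical, and the numbers are the ones stated above (an 83.4% bound and a 78.1% baseline).

```python
def headroom(accuracy, upper_bound):
    """Return the absolute gap to the bound and the fraction of it achieved."""
    gap = upper_bound - accuracy
    fraction = accuracy / upper_bound
    return gap, fraction


# About 5.3 accuracy points remain, and the baseline reaches roughly 93.6% of the bound.
gap_points, fraction_of_bound = headroom(accuracy=78.1, upper_bound=83.4)
```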

Ensembles of Nested Dichotomies with Multiple Subset Evaluation


Title	Ensembles of Nested Dichotomies with Multiple Subset Evaluation
Authors	Tim Leathart, Eibe Frank, Bernhard Pfahringer, Geoffrey Holmes
Abstract	A system of nested dichotomies is a method of decomposing a multi-class problem into a collection of binary problems. Such a system recursively applies binary splits to divide the set of classes into two subsets, and trains a binary classifier for each split. Many methods have been proposed to perform this split, each with various advantages and disadvantages. In this paper, we present a simple, general method for improving the predictive performance of nested dichotomies produced by any subset selection techniques that employ randomness to construct the subsets. We provide a theoretical expectation for performance improvements, as well as empirical results showing that our method improves the root mean squared error of nested dichotomies, regardless of whether they are employed as an individual model or in an ensemble setting.
Tasks
Published	2018-09-08
URL	http://arxiv.org/abs/1809.02740v2
PDF	http://arxiv.org/pdf/1809.02740v2.pdf
PWC	https://paperswithcode.com/paper/ensembles-of-nested-dichotomies-with-multiple
Repo
Framework
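
The recursive splitting described in the abstract can be sketched as a small tree builder. The version below draws every split uniformly at random, which is only the baseline structure; the paper's contribution of evaluating multiple candidate subsets per split is not implemented here, and all names are hypothetical.

```python
import random


def build_nested_dichotomy(classes, rng):
    """Recursively split a list of class labels into two non-empty subsets.

    Leaves are single class labels; internal nodes are (left, right) tuples,
    each of which would get its own binary classifier. Class labels must not
    themselves be tuples.
    """
    if len(classes) == 1:
        return classes[0]
    shuffled = classes[:]
    rng.shuffle(shuffled)
    cut = rng.randint(1, len(shuffled) - 1)  # both sides stay non-empty
    return (build_nested_dichotomy(shuffled[:cut], rng),
            build_nested_dichotomy(shuffled[cut:], rng))


def leaves(tree):
    """List the class labels at the leaves of a dichotomy tree."""
    if not isinstance(tree, tuple):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


def count_splits(tree):
    """Count internal nodes, i.e. the binary problems to be trained."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + count_splits(tree[0]) + count_splits(tree[1])


tree = build_nested_dichotomy(["a", "b", "c", "d", "e"], random.Random(0))
```

Whatever the random draws are, a problem with five classes always yields four binary splits, so an ensemble varies the tree shape rather than the number of classifiers per tree.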