Paper Group ANR 229
Input Combination Strategies for Multi-Source Transformer Decoder
Title | Input Combination Strategies for Multi-Source Transformer Decoder |
Authors | Jindřich Libovický, Jindřich Helcl, David Mareček |
Abstract | In multi-source sequence-to-sequence tasks, the attention mechanism can be modeled in several ways. This topic has been thoroughly studied for recurrent architectures. In this paper, we extend the previous work to the encoder-decoder attention in the Transformer architecture. We propose four different input combination strategies for the encoder-decoder attention: serial, parallel, flat, and hierarchical. We evaluate our methods on multimodal translation and on translation with multiple source languages. The experiments show that the models are able to use multiple sources and improve over single-source baselines. |
Tasks | |
Published | 2018-11-12 |
URL | http://arxiv.org/abs/1811.04716v1 |
PDF | http://arxiv.org/pdf/1811.04716v1.pdf |
PWC | https://paperswithcode.com/paper/input-combination-strategies-for-multi-source-1 |
Repo | |
Framework | |
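Two of the four strategies can be illustrated with a single query vector. The sketch below is our own minimal reading of the "parallel" (attend to each encoder separately, then sum the context vectors) and "flat" (concatenate all encoder states and attend jointly) combinations. It omits learned projections, multiple heads, and the serial and hierarchical variants, so it is not the authors' implementation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Single-query scaled dot-product attention (no learned projections)."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

def parallel_combination(query, sources):
    """Attend to each encoder separately, then sum the context vectors."""
    contexts = [attend(query, keys, values) for keys, values in sources]
    return [sum(ctx[i] for ctx in contexts) for i in range(len(contexts[0]))]

def flat_combination(query, sources):
    """Concatenate the states of all encoders and attend over them jointly."""
    keys = [k for source_keys, _ in sources for k in source_keys]
    values = [v for _, source_values in sources for v in source_values]
    return attend(query, keys, values)

# Two toy "encoders", each with a single state: (keys, values) per source.
sources = [([[0.0, 0.0]], [[1.0, 0.0]]), ([[0.0, 0.0]], [[0.0, 1.0]])]
query = [1.0, 0.0]
print(parallel_combination(query, sources))  # [1.0, 1.0]
print(flat_combination(query, sources))      # [0.5, 0.5]
```

The toy output shows the key difference: in the parallel combination each source always contributes a full context vector, whereas in the flat combination the sources compete for a single attention distribution.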
SUSiNet: See, Understand and Summarize it
Title | SUSiNet: See, Understand and Summarize it |
Authors | Petros Koutras, Petros Maragos |
Abstract | In this work, we propose a multi-task spatio-temporal network, called SUSiNet, that jointly tackles the spatio-temporal problems of saliency estimation, action recognition, and video summarization. Our approach employs a single network that is trained jointly and end-to-end for all tasks, using multiple diverse datasets related to the explored tasks. The proposed network uses a unified architecture that includes global and task-specific layers and produces multiple output types, i.e., saliency maps or classification labels, from the same video input. An additional contribution is that the proposed network can be deeply supervised through an attention module that is related to human attention as expressed by eye-tracking data. In an extensive evaluation on seven different datasets, we observed that the multi-task network performs as well as the state-of-the-art single-task methods (or in some cases better), while requiring a smaller computational budget than one independent network per task. |
Tasks | Action Recognition In Videos, Eye Tracking, Saliency Prediction, Temporal Action Localization, Video Summarization |
Published | 2018-12-03 |
URL | http://arxiv.org/abs/1812.00722v2 |
PDF | http://arxiv.org/pdf/1812.00722v2.pdf |
PWC | https://paperswithcode.com/paper/susinet-see-understand-and-summarize-it |
Repo | |
Framework | |
Improved English to Russian Translation by Neural Suffix Prediction
Title | Improved English to Russian Translation by Neural Suffix Prediction |
Authors | Kai Song, Yue Zhang, Min Zhang, Weihua Luo |
Abstract | Neural machine translation (NMT) suffers from a performance deficiency when a limited vocabulary fails to cover the source or target side adequately, which happens frequently when dealing with morphologically rich languages. To address this problem, previous work focused on adjusting translation granularity or expanding the vocabulary size. However, morphological information, which may further improve translation quality, is relatively under-considered in NMT architectures. We propose a novel method that not only reduces data sparsity but also models morphology through a simple but effective mechanism. By predicting the stem and suffix separately during decoding, our system achieves an improvement of up to 1.98 BLEU compared with previous work on English to Russian translation. Our method is orthogonal to different NMT architectures and consistently yields improvements across various domains. |
Tasks | Machine Translation |
Published | 2018-01-11 |
URL | http://arxiv.org/abs/1801.03615v1 |
PDF | http://arxiv.org/pdf/1801.03615v1.pdf |
PWC | https://paperswithcode.com/paper/improved-english-to-russian-translation-by |
Repo | |
Framework | |
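Predicting the stem and suffix separately requires the target side to be split into two token streams. The toy splitter below illustrates this preprocessing step. It uses a tiny, hypothetical suffix list (`HYPOTHETICAL_SUFFIXES`) with longest-match stripping and a hypothetical `<none>` placeholder token. A real system would rely on a proper stemmer, and the paper's exact segmentation procedure may differ.

```python
# Hypothetical, tiny inventory of Russian inflectional suffixes (illustration only).
HYPOTHETICAL_SUFFIXES = ["ами", "ого", "ому", "ой", "ах", "ам", "ов", "а", "ы", "у", "е", "и"]

def split_stem_suffix(word, suffixes=HYPOTHETICAL_SUFFIXES, min_stem=2):
    """Split a word into (stem, suffix) using the longest matching suffix.

    Returns (word, "") when no suffix matches or when the remaining stem
    would be shorter than min_stem characters.
    """
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: len(word) - len(suffix)], suffix
    return word, ""

def to_target_streams(sentence):
    """Turn a target sentence into parallel stem and suffix token sequences."""
    pairs = [split_stem_suffix(w) for w in sentence.split()]
    stems = [stem for stem, _ in pairs]
    # "<none>" is a hypothetical placeholder for words without a suffix.
    suffixes = [suffix if suffix else "<none>" for _, suffix in pairs]
    return stems, suffixes

stems, suffixes = to_target_streams("книгами стола дом")
print(stems)     # ['книг', 'стол', 'дом']
print(suffixes)  # ['ами', 'а', '<none>']
```

At decoding time, a word is recovered by concatenating the predicted stem and suffix. This is why the suffix vocabulary can stay small while the stems absorb most of the lexical variety.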
Hierarchical Multitask Learning for CTC-based Speech Recognition
Title | Hierarchical Multitask Learning for CTC-based Speech Recognition |
Authors | Kalpesh Krishna, Shubham Toshniwal, Karen Livescu |
Abstract | Previous work has shown that neural encoder-decoder speech recognition can be improved with hierarchical multitask learning, where auxiliary tasks are added at intermediate layers of a deep encoder. We explore the effect of hierarchical multitask learning in the context of connectionist temporal classification (CTC)-based speech recognition, and investigate several aspects of this approach. Consistent with previous work, we observe performance improvements on telephone conversational speech recognition (specifically the Eval2000 test sets) when training a subword-level CTC model with an auxiliary phone loss at an intermediate layer. We analyze the effects of a number of experimental variables (like interpolation constant and position of the auxiliary loss function), performance in lower-resource settings, and the relationship between pretraining and multitask learning. We observe that the hierarchical multitask approach improves over standard multitask training in our higher-data experiments, while in the low-resource settings standard multitask training works well. The best results are obtained by combining hierarchical multitask learning and pretraining, which improves word error rates by 3.4% absolute on the Eval2000 test sets. |
Tasks | Speech Recognition |
Published | 2018-07-17 |
URL | http://arxiv.org/abs/1807.06234v2 |
PDF | http://arxiv.org/pdf/1807.06234v2.pdf |
PWC | https://paperswithcode.com/paper/hierarchical-multitask-learning-for-ctc-based |
Repo | |
Framework | |
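The abstract refers to an interpolation constant between the main subword-level CTC loss and the auxiliary phone loss attached at an intermediate layer. A minimal sketch, assuming a convex combination `(1 - lam) * main + lam * aux`. The paper's exact weighting may differ, and the loss values below are placeholders rather than real CTC computations.

```python
def hierarchical_multitask_loss(main_ctc_loss, aux_phone_ctc_loss, lam):
    """Convex combination of the final-layer (subword) CTC loss and an
    auxiliary (phone) CTC loss computed from an intermediate encoder layer.

    lam is the interpolation constant; lam = 0 disables the auxiliary task.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("interpolation constant must lie in [0, 1]")
    return (1.0 - lam) * main_ctc_loss + lam * aux_phone_ctc_loss

print(hierarchical_multitask_loss(2.0, 4.0, 0.25))  # 2.5
```

The "hierarchical" aspect lies in *where* the auxiliary loss is computed (an intermediate layer rather than the top layer), not in the formula itself. The same combination applies to standard multitask training.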
Phocas: dimensional Byzantine-resilient stochastic gradient descent
Title | Phocas: dimensional Byzantine-resilient stochastic gradient descent |
Authors | Cong Xie, Oluwasanmi Koyejo, Indranil Gupta |
Abstract | We propose a novel robust aggregation rule for distributed synchronous Stochastic Gradient Descent (SGD) under a general Byzantine failure model. The attackers can arbitrarily manipulate the data transferred between the servers and the workers in the parameter server (PS) architecture. We prove the Byzantine resilience of the proposed aggregation rules. Empirical analysis shows that the proposed techniques outperform current approaches for realistic use cases and Byzantine attack scenarios. |
Tasks | |
Published | 2018-05-23 |
URL | http://arxiv.org/abs/1805.09682v1 |
PDF | http://arxiv.org/pdf/1805.09682v1.pdf |
PWC | https://paperswithcode.com/paper/phocas-dimensional-byzantine-resilient |
Repo | |
Framework | |
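"Dimensional" robust aggregation operates coordinate by coordinate over the workers' gradients. The sketch below shows a coordinate-wise trimmed mean, a well-known Byzantine-robust rule. It then adds a Phocas-style step which, as we read the paper, averages the `n - b` values per coordinate that are closest to the trimmed mean. Treat `phocas_like` as our interpretation and check it against the paper's definition before relying on it.

```python
def trimmed_mean(gradients, b):
    """Coordinate-wise trimmed mean: per coordinate, drop the b largest and
    b smallest values and average the rest."""
    n = len(gradients)
    if not 0 <= 2 * b < n:
        raise ValueError("need 0 <= 2*b < number of gradients")
    result = []
    for j in range(len(gradients[0])):
        column = sorted(g[j] for g in gradients)
        kept = column[b:n - b]
        result.append(sum(kept) / len(kept))
    return result

def phocas_like(gradients, b):
    """Per coordinate, average the n - b values closest to the trimmed mean."""
    n = len(gradients)
    center = trimmed_mean(gradients, b)
    result = []
    for j, c in enumerate(center):
        column = sorted((g[j] for g in gradients), key=lambda x: abs(x - c))
        nearest = column[:n - b]
        result.append(sum(nearest) / len(nearest))
    return result

# Three honest workers and one Byzantine worker sending an extreme gradient.
grads = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [100.0, -1000.0]]
print(trimmed_mean(grads, b=1))  # [2.5, 15.0]
print(phocas_like(grads, b=1))   # [2.0, 20.0]
```

A plain mean here would be `[26.5, -235.0]`. Both robust rules stay within the range of the honest workers' values.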
Beyond One-hot Encoding: lower dimensional target embedding
Title | Beyond One-hot Encoding: lower dimensional target embedding |
Authors | Pau Rodríguez, Miguel A. Bautista, Jordi Gonzàlez, Sergio Escalera |
Abstract | Target encoding plays a central role when learning Convolutional Neural Networks. In this realm, one-hot encoding is the most prevalent strategy due to its simplicity. However, this widespread encoding scheme assumes a flat label space, thus ignoring rich relationships among labels that can be exploited during training. In large-scale datasets, data does not span the full label space, but instead lies in a low-dimensional output manifold. Following this observation, we embed the targets into a low-dimensional space, drastically improving convergence speed while preserving accuracy. Our contribution is twofold: (i) we show that random projections of the label space are a valid tool to find such lower-dimensional embeddings, dramatically boosting convergence rates at zero computational cost; and (ii) we propose a normalized eigenrepresentation of the class manifold that encodes the targets with minimal information loss, improving the accuracy of random-projection encoding while enjoying the same convergence rates. Experiments on CIFAR-100, CUB200-2011, ImageNet, and MIT Places demonstrate that the proposed approach drastically improves convergence speed while reaching very competitive accuracy rates. |
Tasks | |
Published | 2018-06-28 |
URL | http://arxiv.org/abs/1806.10805v1 |
PDF | http://arxiv.org/pdf/1806.10805v1.pdf |
PWC | https://paperswithcode.com/paper/beyond-one-hot-encoding-lower-dimensional |
Repo | |
Framework | |
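The random-projection variant (contribution (i)) can be sketched in a few lines. Projecting a one-hot vector e_c with a `dim x num_classes` Gaussian matrix R simply selects column c of R, so each class gets a dense low-dimensional target, and predictions are decoded by nearest neighbour. The `1/sqrt(dim)` scaling and L2 decoding are our assumptions. The normalized eigenrepresentation (contribution (ii)) is not shown.

```python
import math
import random

def random_label_embedding(num_classes, dim, seed=0):
    """Row c is the image of one-hot class c under a Gaussian random projection.

    Projecting e_c with a dim x num_classes matrix R selects column c of R,
    so we store those columns directly.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(dim)
    return [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(num_classes)]

def decode(prediction, embedding):
    """Map a network output back to a class: the nearest class embedding (L2)."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(embedding)), key=lambda c: squared_distance(prediction, embedding[c]))

emb = random_label_embedding(num_classes=100, dim=16)
print(len(emb), len(emb[0]))  # 100 16
print(decode(emb[42], emb))   # 42
```

During training, the network would regress these 16-dimensional targets (for example, with a squared-error loss) instead of producing a 100-way softmax.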
Noiseprint: a CNN-based camera model fingerprint
Title | Noiseprint: a CNN-based camera model fingerprint |
Authors | Davide Cozzolino, Luisa Verdoliva |
Abstract | Forensic analyses of digital images rely heavily on the traces of in-camera and out-camera processes left on the acquired images. Such traces represent a sort of camera fingerprint. If one is able to recover them by suppressing the high-level scene content and other disturbances, a number of forensic tasks can be easily accomplished. A notable example is the PRNU pattern, which can be regarded as a device fingerprint and has received great attention in multimedia forensics. In this paper, we propose a method to extract a camera model fingerprint, called noiseprint, in which the scene content is largely suppressed and model-related artifacts are enhanced. This is obtained by means of a Siamese network, which is trained with pairs of image patches coming from the same (label +1) or different (label -1) cameras. Although noiseprints can be used for a large variety of forensic tasks, here we focus on image forgery localization. Experiments on several datasets widely used in the forensic community show that noiseprint-based methods provide state-of-the-art performance. |
Tasks | |
Published | 2018-08-25 |
URL | http://arxiv.org/abs/1808.08396v1 |
PDF | http://arxiv.org/pdf/1808.08396v1.pdf |
PWC | https://paperswithcode.com/paper/noiseprint-a-cnn-based-camera-model |
Repo | |
Framework | |
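The Siamese training signal described above is a labeled pair of patches. The sketch below only enumerates patch pairs with the +1/-1 labels. The input list of camera-model ids per patch is a hypothetical stand-in for real patch metadata, and the network itself is not shown. A practical pipeline would also sample and balance positive and negative pairs rather than enumerate all of them.

```python
from itertools import combinations

def make_siamese_pairs(patch_camera_models):
    """Enumerate patch pairs with Siamese labels: +1 if both patches come from
    the same camera model, -1 otherwise.

    patch_camera_models[i] is the (hypothetical) camera-model id of patch i.
    """
    pairs = []
    for i, j in combinations(range(len(patch_camera_models)), 2):
        label = 1 if patch_camera_models[i] == patch_camera_models[j] else -1
        pairs.append((i, j, label))
    return pairs

print(make_siamese_pairs(["modelA", "modelA", "modelB"]))
# [(0, 1, 1), (0, 2, -1), (1, 2, -1)]
```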
Transferring Physical Motion Between Domains for Neural Inertial Tracking
Title | Transferring Physical Motion Between Domains for Neural Inertial Tracking |
Authors | Changhao Chen, Yishu Miao, Chris Xiaoxuan Lu, Phil Blunsom, Andrew Markham, Niki Trigoni |
Abstract | Inertial information processing plays a pivotal role in ego-motion awareness for mobile agents, as inertial measurements are entirely egocentric and not environment dependent. However, they are greatly affected by changes in sensor placement/orientation or motion dynamics, and it is infeasible to collect labelled data from every domain. To overcome the challenges of domain adaptation on long sensory sequences, we propose a novel framework that extracts domain-invariant features of raw sequences from arbitrary domains and transforms them to new domains without any paired data. Through experiments, we demonstrate that it can efficiently and effectively convert the raw sequence from a new unlabelled target domain into an accurate inertial trajectory, benefiting from the physical motion knowledge transferred from the labelled source domain. We also conduct real-world experiments showing that our framework can reconstruct physically meaningful trajectories from raw IMU measurements obtained with a standard mobile phone in various attachments. |
Tasks | Domain Adaptation |
Published | 2018-10-04 |
URL | http://arxiv.org/abs/1810.02076v1 |
PDF | http://arxiv.org/pdf/1810.02076v1.pdf |
PWC | https://paperswithcode.com/paper/transferring-physical-motion-between-domains |
Repo | |
Framework | |
Adversarial classification: An adversarial risk analysis approach
Title | Adversarial classification: An adversarial risk analysis approach |
Authors | Roi Naveiro, Alberto Redondo, David Ríos Insua, Fabrizio Ruggeri |
Abstract | Classification problems in security settings are usually contemplated as confrontations in which one or more adversaries try to fool a classifier to obtain a benefit. Most approaches to such adversarial classification problems have focused on game-theoretical ideas with strong underlying common knowledge assumptions, which are not realistic in security domains. We provide an alternative framework for such problems based on adversarial risk analysis, which we illustrate with several examples. Computational and implementation issues are discussed. |
Tasks | |
Published | 2018-02-21 |
URL | https://arxiv.org/abs/1802.07513v3 |
PDF | https://arxiv.org/pdf/1802.07513v3.pdf |
PWC | https://paperswithcode.com/paper/adversarial-classification-an-adversarial |
Repo | |
Framework | |
Identification of Invariant Sensorimotor Structures as a Prerequisite for the Discovery of Objects
Title | Identification of Invariant Sensorimotor Structures as a Prerequisite for the Discovery of Objects |
Authors | Nicolas Le Hir, Olivier Sigaud, Alban Laflaquière |
Abstract | Perceiving the surrounding environment in terms of objects is useful for any general-purpose intelligent agent. In this paper, we investigate a fundamental mechanism making object perception possible, namely the identification of spatio-temporally invariant structures in the sensorimotor experience of an agent. We take inspiration from the Sensorimotor Contingencies Theory to define a computational model of this mechanism through a sensorimotor, unsupervised, and predictive approach. Our model is based on processing the unsupervised interaction of an artificial agent with its environment. We show how spatio-temporally invariant structures in the environment induce regularities in the sensorimotor experience of an agent, and how this agent, while building a predictive model of its sensorimotor experience, can capture them as densely connected subgraphs in a graph of sensory states connected by motor commands. Our approach focuses on elementary mechanisms and is illustrated with a set of simple experiments in which an agent interacts with an environment. We show how the agent can build an internal model of moving but spatio-temporally invariant structures by performing spectral clustering of the graph modeling its overall sensorimotor experience. We systematically examine properties of the model, shedding light more broadly on the specificities of this paradigm compared with methods based on the supervised processing of collections of static images. |
Tasks | |
Published | 2018-10-11 |
URL | http://arxiv.org/abs/1810.05057v1 |
PDF | http://arxiv.org/pdf/1810.05057v1.pdf |
PWC | https://paperswithcode.com/paper/identification-of-invariant-sensorimotor |
Repo | |
Framework | |
Probability Calibration Trees
Title | Probability Calibration Trees |
Authors | Tim Leathart, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer |
Abstract | Obtaining accurate and well-calibrated probability estimates from classifiers is useful in many applications, for example, when minimising the expected cost of classifications. Existing methods of calibrating probability estimates are applied globally, ignoring the potential for improvements by applying a more fine-grained model. We propose probability calibration trees, a modification of logistic model trees that identifies regions of the input space in which different probability calibration models are learned to improve performance. We compare probability calibration trees to two widely used calibration methods, isotonic regression and Platt scaling, and show that our method results in lower root mean squared error on average than both methods, for estimates produced by a variety of base learners. |
Tasks | Calibration |
Published | 2018-07-31 |
URL | http://arxiv.org/abs/1808.00111v2 |
PDF | http://arxiv.org/pdf/1808.00111v2.pdf |
PWC | https://paperswithcode.com/paper/probability-calibration-trees |
Repo | |
Framework | |
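As a point of reference, the sketch below shows the Platt scaling baseline mentioned in the abstract, together with the root mean squared error used for comparison. It is not the calibration-tree method. It fits `sigmoid(a * score + b)` by plain gradient descent on log loss, using toy scores and labels. Platt's original formulation also smooths the 0/1 targets, which is omitted here. A calibration tree would, roughly speaking, fit separate calibrators like this one in different regions of the input space.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_platt(scores, labels, lr=0.1, steps=5000):
    """Fit p = sigmoid(a * score + b) by gradient descent on mean log loss."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y  # derivative of log loss w.r.t. the logit
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

def rmse(probs, labels):
    """Root mean squared error between predicted probabilities and 0/1 labels."""
    return math.sqrt(sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels))

# Toy classifier scores and true labels (not from the paper).
scores = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
labels = [0, 0, 1, 0, 1, 1]
a, b = fit_platt(scores, labels)
calibrated = [sigmoid(a * s + b) for s in scores]
print(round(rmse(calibrated, labels), 3))
```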
Neural Character-based Composition Models for Abuse Detection
Title | Neural Character-based Composition Models for Abuse Detection |
Authors | Pushkar Mishra, Helen Yannakoudakis, Ekaterina Shutova |
Abstract | The advent of social media in recent years has fed into some highly undesirable phenomena such as the proliferation of offensive language, hate speech, sexist remarks, etc. on the Internet. In light of this, there have been several efforts to automate the detection and moderation of such abusive content. However, deliberate obfuscation of words by users to evade detection poses a serious challenge to the effectiveness of these efforts. The current state-of-the-art approaches to abusive language detection, based on recurrent neural networks, do not explicitly address this problem and resort to a generic OOV (out-of-vocabulary) embedding for unseen words. However, by using a single embedding for all unseen words, we lose the ability to distinguish between obfuscated and non-obfuscated or rare words. In this paper, we address this problem by designing a model that can compose embeddings for unseen words. We experimentally demonstrate that our approach significantly advances the current state of the art in abuse detection on datasets from two different domains, namely Twitter and Wikipedia talk pages. |
Tasks | Abuse Detection |
Published | 2018-09-02 |
URL | http://arxiv.org/abs/1809.00378v1 |
PDF | http://arxiv.org/pdf/1809.00378v1.pdf |
PWC | https://paperswithcode.com/paper/neural-character-based-composition-models-for |
Repo | |
Framework | |
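The core idea, giving every unseen string its own vector instead of one shared OOV embedding, can be illustrated with a simpler, swapped-in technique. The sketch averages character n-gram vectors in the style of fastText rather than using the paper's neural character-based model. The n-gram vectors are deterministic pseudo-random stand-ins for learned weights, seeded with `zlib.crc32` because Python's built-in `hash` is salted per process.

```python
import random
import zlib

def char_ngrams(word, n=3):
    """Character n-grams of the word padded with boundary markers < and >."""
    padded = f"<{word}>"
    grams = [padded[i:i + n] for i in range(len(padded) - n + 1)]
    return grams or [padded]  # very short strings fall back to the padded string

def ngram_vector(ngram, dim):
    """Deterministic pseudo-random vector for an n-gram (stand-in for learned weights)."""
    rng = random.Random(zlib.crc32(ngram.encode("utf-8")))
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]

def compose_embedding(word, dim=32, n=3):
    """Average the n-gram vectors so that every string, seen or unseen, gets its own vector."""
    vectors = [ngram_vector(g, dim) for g in char_ngrams(word, n)]
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

print(char_ngrams("id1ot"))  # ['<id', 'id1', 'd1o', '1ot', 'ot>']
```

Because the obfuscated form `id1ot` shares the n-grams `<id` and `ot>` with `idiot`, the two composed vectors overlap partially instead of collapsing into one generic OOV vector.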
Where is this? Video geolocation based on neural network features
Title | Where is this? Video geolocation based on neural network features |
Authors | Salvador Medina, Zhuyun Dai, Yingkai Gao |
Abstract | In this work, we propose a method that geolocates videos within a delimited, widespread area based solely on the visual content of their frames. Our proposed method tackles video geolocation through traditional image retrieval techniques, using Google Street View as the reference. To achieve this goal, we use deep learning features obtained from NetVLAD to represent images, since the similarity between these feature vectors can be measured by the L2 norm of their difference. We propose a family of voting-based methods that aggregate frame-wise geolocation results and improve the video geolocation result. The best aggregation found in our experiments considers both NetVLAD and SIFT similarity, as well as the geolocation density of the most similar results. To test our proposed method, we gathered a new video dataset from the downtown Pittsburgh area to benefit and stimulate more work in this area. Our system achieved a precision of 90% when geolocating videos within a range of 150 meters, or two blocks, of the original position. |
Tasks | Image Retrieval |
Published | 2018-10-22 |
URL | http://arxiv.org/abs/1810.09068v2 |
PDF | http://arxiv.org/pdf/1810.09068v2.pdf |
PWC | https://paperswithcode.com/paper/where-is-this-video-geolocation-based-on |
Repo | |
Framework | |
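Frame-wise retrieval followed by voting can be sketched as follows. Each frame descriptor retrieves its k nearest reference descriptors by L2 distance, and the video location is the one with the most rank-weighted votes. The `1/(rank + 1)` weighting, the toy 2-d descriptors, and the location ids are hypothetical. The paper's best variant additionally uses SIFT similarity and geolocation density, which are not shown.

```python
import math
from collections import Counter

def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_k_locations(frame_descriptor, reference, k):
    """reference is a list of (descriptor, location_id) pairs, e.g. one per
    geotagged street-level image; return the k nearest location ids by L2."""
    ranked = sorted(reference, key=lambda item: l2_distance(frame_descriptor, item[0]))
    return [location for _, location in ranked[:k]]

def geolocate_video(frame_descriptors, reference, k=3):
    """Rank-weighted voting over the per-frame retrieval results."""
    votes = Counter()
    for descriptor in frame_descriptors:
        for rank, location in enumerate(top_k_locations(descriptor, reference, k)):
            votes[location] += 1.0 / (rank + 1)  # better-ranked matches vote more
    return votes.most_common(1)[0][0]

reference = [([0.0, 0.0], "A"), ([0.0, 1.0], "A"), ([10.0, 10.0], "B"), ([10.0, 11.0], "B")]
frames = [[0.0, 0.4], [0.1, 0.2], [10.0, 10.5]]
print(geolocate_video(frames, reference, k=2))  # A
```

Voting lets the two consistent frames outvote the single frame that matched location B, which is the intuition behind aggregating frame-wise results.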
SimpleQuestions Nearly Solved: A New Upperbound and Baseline Approach
Title | SimpleQuestions Nearly Solved: A New Upperbound and Baseline Approach |
Authors | Michael Petrochuk, Luke Zettlemoyer |
Abstract | The SimpleQuestions dataset is one of the most commonly used benchmarks for studying single-relation factoid questions. In this paper, we present new evidence that this benchmark can be nearly solved by standard methods. First, we show that ambiguity in the data bounds performance on this benchmark at 83.4%; there are often multiple answers that cannot be disambiguated from the linguistic signal alone. Second, we introduce a baseline that sets a new state-of-the-art performance level at 78.1% accuracy, despite using standard methods. Finally, we report an empirical analysis showing that the upper bound is loose; roughly a third of the remaining errors are also not resolvable from the linguistic signal. Together, these results suggest that the SimpleQuestions dataset is nearly solved. |
Tasks | |
Published | 2018-04-24 |
URL | http://arxiv.org/abs/1804.08798v1 |
PDF | http://arxiv.org/pdf/1804.08798v1.pdf |
PWC | https://paperswithcode.com/paper/simplequestions-nearly-solved-a-new |
Repo | |
Framework | |
Ensembles of Nested Dichotomies with Multiple Subset Evaluation
Title | Ensembles of Nested Dichotomies with Multiple Subset Evaluation |
Authors | Tim Leathart, Eibe Frank, Bernhard Pfahringer, Geoffrey Holmes |
Abstract | A system of nested dichotomies is a method of decomposing a multi-class problem into a collection of binary problems. Such a system recursively applies binary splits to divide the set of classes into two subsets, and trains a binary classifier for each split. Many methods have been proposed to perform this split, each with various advantages and disadvantages. In this paper, we present a simple, general method for improving the predictive performance of nested dichotomies produced by any subset selection technique that employs randomness to construct the subsets. We provide a theoretical expectation for performance improvements, as well as empirical results showing that our method improves the root mean squared error of nested dichotomies, regardless of whether they are employed as an individual model or in an ensemble setting. |
Tasks | |
Published | 2018-09-08 |
URL | http://arxiv.org/abs/1809.02740v2 |
PDF | http://arxiv.org/pdf/1809.02740v2.pdf |
PWC | https://paperswithcode.com/paper/ensembles-of-nested-dichotomies-with-multiple |
Repo | |
Framework | |
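The recursive splitting, combined with evaluating several random candidate splits per node, can be sketched as follows. At each node, `num_candidates` random splits are sampled and the best-scoring one is kept. In the paper, candidates are judged by the binary classifiers trained on them. Here, `balance_score` is a hypothetical stand-in score that simply prefers balanced splits, so no classifiers are trained.

```python
import random

def random_split(classes, rng):
    """Randomly split a set of >= 2 classes into two non-empty subsets."""
    shuffled = sorted(classes)
    rng.shuffle(shuffled)
    cut = rng.randint(1, len(shuffled) - 1)
    return frozenset(shuffled[:cut]), frozenset(shuffled[cut:])

def build_nested_dichotomy(classes, split_score, num_candidates, rng):
    """Recursively build a binary tree of class subsets.

    At each node, sample num_candidates random splits and keep the one with
    the highest split_score. Leaves are single class labels; internal nodes
    are (left, right) tuples.
    """
    classes = frozenset(classes)
    if len(classes) == 1:
        return next(iter(classes))
    candidates = [random_split(classes, rng) for _ in range(num_candidates)]
    left, right = max(candidates, key=split_score)
    return (build_nested_dichotomy(left, split_score, num_candidates, rng),
            build_nested_dichotomy(right, split_score, num_candidates, rng))

def leaves(tree):
    """Collect the class labels at the leaves, left to right."""
    if isinstance(tree, tuple):
        return leaves(tree[0]) + leaves(tree[1])
    return [tree]

def balance_score(split):
    """Hypothetical stand-in score: prefer balanced splits."""
    left, right = split
    return -abs(len(left) - len(right))

tree = build_nested_dichotomy(range(6), balance_score, num_candidates=3, rng=random.Random(0))
print(sorted(leaves(tree)))  # [0, 1, 2, 3, 4, 5]
```

With `num_candidates=1`, this reduces to a plain random nested dichotomy. Larger values trade extra evaluation cost for better splits at each node.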