October 20, 2019

2932 words 14 mins read

Paper Group AWR 322

COCO-CN for Cross-Lingual Image Tagging, Captioning and Retrieval. Approximate Fisher Information Matrix to Characterise the Training of Deep Neural Networks. Machine learning 2.0 : Engineering Data Driven AI Products. SegMap: 3D Segment Mapping using Data-Driven Descriptors. PreCo: A Large-scale Dataset in Preschool Vocabulary for Coreference Reso …

COCO-CN for Cross-Lingual Image Tagging, Captioning and Retrieval

Title COCO-CN for Cross-Lingual Image Tagging, Captioning and Retrieval
Authors Xirong Li, Chaoxi Xu, Xiaoxu Wang, Weiyu Lan, Zhengxiong Jia, Gang Yang, Jieping Xu
Abstract This paper contributes to cross-lingual image annotation and retrieval in terms of data and baseline methods. We propose COCO-CN, a novel dataset enriching MS-COCO with manually written Chinese sentences and tags. For more effective annotation acquisition, we develop a recommendation-assisted collective annotation system that automatically provides an annotator with several tags and sentences deemed relevant to the pictorial content. With 20,342 images annotated with 27,218 Chinese sentences and 70,993 tags, COCO-CN is currently the largest Chinese-English dataset that provides a unified and challenging platform for cross-lingual image tagging, captioning and retrieval. We develop conceptually simple yet effective methods per task for learning from cross-lingual resources. Extensive experiments on the three tasks justify the viability of the proposed dataset and methods. Data and code are publicly available at https://github.com/li-xirong/coco-cn
Tasks
Published 2018-05-22
URL http://arxiv.org/abs/1805.08661v2
PDF http://arxiv.org/pdf/1805.08661v2.pdf
PWC https://paperswithcode.com/paper/coco-cn-for-cross-lingual-image-tagging
Repo https://github.com/li-xirong/coco-cn
Framework none

Approximate Fisher Information Matrix to Characterise the Training of Deep Neural Networks

Title Approximate Fisher Information Matrix to Characterise the Training of Deep Neural Networks
Authors Zhibin Liao, Tom Drummond, Ian Reid, Gustavo Carneiro
Abstract In this paper, we introduce a novel methodology for characterising the performance of deep learning networks (ResNets and DenseNet) with respect to training convergence and generalisation, as a function of mini-batch size and learning rate, for image classification. This methodology is based on novel measurements derived from the eigenvalues of the approximate Fisher information matrix, which can be computed efficiently even for high-capacity deep models. Our proposed measurements can help practitioners to monitor and control the training process (by actively tuning the mini-batch size and learning rate) to allow for good training convergence and generalisation. Furthermore, the proposed measurements also allow us to show that the training process can be optimised with a new dynamic sampling approach that continuously and automatically changes the mini-batch size and learning rate during training. Finally, we show that the proposed dynamic sampling approach has a faster training time and a competitive classification accuracy compared to the current state of the art.
Tasks Image Classification
Published 2018-10-16
URL http://arxiv.org/abs/1810.06767v1
PDF http://arxiv.org/pdf/1810.06767v1.pdf
PWC https://paperswithcode.com/paper/approximate-fisher-information-matrix-to
Repo https://github.com/zhibinliao89/fisher.info.mat.torch
Framework pytorch
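
The paper's central quantity, the spectrum of the approximate (empirical) Fisher information matrix, is cheap to estimate because the nonzero eigenvalues of F = (1/N) Σ gᵢgᵢᵀ coincide with those of the small N×N Gram matrix of per-sample gradients. The sketch below is a minimal PyTorch illustration of that trick, not the authors' implementation (their measurements and dynamic sampling schedule are built on top of such eigenvalues):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 10))
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(16, 20)           # toy mini-batch
y = torch.randint(0, 10, (16,))

grads = []
for i in range(x.size(0)):        # per-sample gradients g_i
    model.zero_grad()
    loss_fn(model(x[i:i+1]), y[i:i+1]).backward()
    grads.append(torch.cat([p.grad.flatten() for p in model.parameters()]))

G = torch.stack(grads)            # N x P matrix of per-sample gradients
gram = G @ G.t() / G.size(0)      # N x N; same nonzero spectrum as the empirical Fisher
eigvals = torch.linalg.eigvalsh(gram)
print("largest Fisher eigenvalue estimate:", eigvals[-1].item())
```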

Machine learning 2.0 : Engineering Data Driven AI Products

Title Machine learning 2.0 : Engineering Data Driven AI Products
Authors James Max Kanter, Benjamin Schreck, Kalyan Veeramachaneni
Abstract ML 2.0: In this paper, we propose a paradigm shift from the current practice of creating machine learning models - which requires months-long discovery, exploration and “feasibility report” generation, followed by re-engineering for deployment - in favor of a rapid, 8-week process of development, understanding, validation and deployment that can be executed by developers or subject matter experts (non-ML experts) using reusable APIs. This accomplishes what we call a “minimum viable data-driven model,” delivering a ready-to-use machine learning model for problems that haven’t been solved before using machine learning. We include provisions for the refinement and adaptation of the “model,” with strict enforcement of, and adherence to, both the scaffolding/abstractions and the process. We imagine that this will bring forth the second phase in machine learning, in which discovery is subsumed by the more targeted goals of delivery and impact.
Tasks
Published 2018-07-01
URL http://arxiv.org/abs/1807.00401v1
PDF http://arxiv.org/pdf/1807.00401v1.pdf
PWC https://paperswithcode.com/paper/machine-learning-20-engineering-data-driven
Repo https://github.com/Featuretools/featuretools-docker
Framework none
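
As a hedged illustration of the “reusable APIs” the abstract refers to: the linked repo packages Featuretools in Docker, and a minimal automated-feature-engineering run on the library's bundled demo data looks roughly like this (argument names vary across Featuretools versions):

```python
import featuretools as ft

# Load the library's bundled demo EntitySet (customers, sessions, transactions).
es = ft.demo.load_mock_customer(return_entityset=True)

# Deep Feature Synthesis: automatically derive a feature matrix per customer.
# Note: Featuretools < 1.0 uses `target_entity` instead of `target_dataframe_name`.
feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_dataframe_name="customers",
    max_depth=2,
)
print(feature_matrix.head())
```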

SegMap: 3D Segment Mapping using Data-Driven Descriptors

Title SegMap: 3D Segment Mapping using Data-Driven Descriptors
Authors Renaud Dubé, Andrei Cramariuc, Daniel Dugas, Juan Nieto, Roland Siegwart, Cesar Cadena
Abstract When performing localization and mapping, working at the level of structure can be advantageous in terms of robustness to environmental changes and differences in illumination. This paper presents SegMap: a map representation solution to the localization and mapping problem based on the extraction of segments in 3D point clouds. In addition to facilitating the computationally intensive task of processing 3D point clouds, working at the level of segments addresses the data compression requirements of real-time single- and multi-robot systems. While current methods extract descriptors for the single task of localization, SegMap leverages a data-driven descriptor in order to extract meaningful features that can also be used for reconstructing a dense 3D map of the environment and for extracting semantic information. This is particularly interesting for navigation tasks and for providing visual feedback to end-users such as robot operators, for example in search and rescue scenarios. These capabilities are demonstrated in multiple urban driving and search and rescue experiments. Our method yields a 28.3% increase in the area under the ROC curve over the current state of the art using eigenvalue-based features. We also obtain reconstruction capabilities very similar to those of a model specifically trained for this task. The SegMap implementation will be made available open source, along with easy-to-run demonstrations, at www.github.com/ethz-asl/segmap. A video demonstration is available at https://youtu.be/CMk4w4eRobg.
Tasks
Published 2018-04-25
URL http://arxiv.org/abs/1804.09557v2
PDF http://arxiv.org/pdf/1804.09557v2.pdf
PWC https://paperswithcode.com/paper/segmap-3d-segment-mapping-using-data-driven
Repo https://github.com/ethz-asl/segmap
Framework tf
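
The retrieval step at the heart of segment-based localization can be sketched in a few lines: learned descriptors for map segments are indexed, and segments extracted from the current scan are matched by nearest-neighbour search. This is a simplified stand-in (random vectors instead of SegMap's learned descriptors, and no geometric verification):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
map_descriptors = rng.normal(size=(500, 64))   # descriptors of 500 map segments
scan_descriptors = rng.normal(size=(12, 64))   # segments from the current scan

index = NearestNeighbors(n_neighbors=5).fit(map_descriptors)
distances, candidates = index.kneighbors(scan_descriptors)
# candidates[i] lists the map segments closest to scan segment i in descriptor
# space; SegMap verifies such candidates geometrically before localizing.
print(candidates[0], distances[0])
```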

PreCo: A Large-scale Dataset in Preschool Vocabulary for Coreference Resolution

Title PreCo: A Large-scale Dataset in Preschool Vocabulary for Coreference Resolution
Authors Hong Chen, Zhenhua Fan, Hao Lu, Alan L. Yuille, Shu Rong
Abstract We introduce PreCo, a large-scale English dataset for coreference resolution. The dataset is designed to embody the core challenges in coreference, such as entity representation, by alleviating the challenge of low overlap between training and test sets and enabling separate analysis of mention detection and mention clustering. To strengthen the training-test overlap, we collect a large corpus of about 38K documents and 12.4M words drawn mostly from the vocabulary of English-speaking preschoolers. Experiments show that with higher training-test overlap, error analysis on PreCo is more efficient than on OntoNotes, a popular existing dataset. Furthermore, we annotate singleton mentions, making it possible for the first time to quantify the influence that a mention detector has on coreference resolution performance. The dataset is freely available at https://preschool-lab.github.io/PreCo/.
Tasks Coreference Resolution
Published 2018-10-23
URL http://arxiv.org/abs/1810.09807v1
PDF http://arxiv.org/pdf/1810.09807v1.pdf
PWC https://paperswithcode.com/paper/preco-a-large-scale-dataset-in-preschool
Repo https://github.com/d5555/TagEditor
Framework none
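
One plausible way to compute the training-test overlap the abstract emphasizes is token coverage: the fraction of test-set tokens whose types appear in the training vocabulary. A toy sketch of that reading of the metric (not necessarily the authors' exact definition):

```python
train_tokens = "the cat sat on the mat".split()
test_tokens = "the dog sat on the rug".split()

train_vocab = set(train_tokens)
covered = sum(1 for tok in test_tokens if tok in train_vocab)
print(f"test-token coverage: {covered / len(test_tokens):.0%}")  # 67%
```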

Safe Exploration in Continuous Action Spaces

Title Safe Exploration in Continuous Action Spaces
Authors Gal Dalal, Krishnamurthy Dvijotham, Matej Vecerik, Todd Hester, Cosmin Paduraru, Yuval Tassa
Abstract We address the problem of deploying a reinforcement learning (RL) agent on a physical system such as a datacenter cooling unit or robot, where critical constraints must never be violated. We show how to exploit the typically smooth dynamics of these systems and enable RL algorithms to never violate constraints during learning. Our technique is to directly add to the policy a safety layer that analytically solves an action-correction formulation for each state. This elegant closed-form solution is obtained thanks to a linearized model, learned on past trajectories consisting of arbitrary actions. This mimics real-world circumstances in which data logs were generated by a behavior policy that is infeasible to describe mathematically; such cases render the known safety-aware off-policy methods inapplicable. We demonstrate the efficacy of our approach on new, representative physics-based environments, and prevail where reward shaping fails by maintaining zero constraint violations.
Tasks Safe Exploration
Published 2018-01-26
URL http://arxiv.org/abs/1801.08757v1
PDF http://arxiv.org/pdf/1801.08757v1.pdf
PWC https://paperswithcode.com/paper/safe-exploration-in-continuous-action-spaces
Repo https://github.com/AgrawalAmey/safe-explorer
Framework pytorch
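
The heart of the method is the analytic action correction. For a single linearized constraint c(s) + g(s)ᵀa ≤ C, the closest safe action to the policy's proposal has a closed form; a minimal NumPy sketch follows (with multiple constraints, the paper corrects for the most violated one):

```python
import numpy as np

def safety_layer(action, g, c, C):
    """Project `action` onto the linearized constraint c + g.a <= C.

    g is the learned gradient of the constraint w.r.t. the action,
    c is the current constraint value c(s), and C is the constraint limit.
    """
    lam = max(0.0, (g @ action + c - C) / (g @ g))  # optimal Lagrange multiplier
    return action - lam * g

a = np.array([0.8, -0.2])
g = np.array([1.0, 0.5])
print(safety_layer(a, g, c=0.3, C=0.5))  # [0.4, -0.4]; unchanged when already safe
```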

Single Shot Active Learning using Pseudo Annotators

Title Single Shot Active Learning using Pseudo Annotators
Authors Yazhou Yang, Marco Loog
Abstract Standard myopic active learning assumes that human annotations are always obtainable whenever new samples are selected. This, however, is unrealistic in many real-world applications where human experts are not readily available at all times. In this paper, we consider the single-shot setting: all the required samples must be chosen in a single shot, and no human annotation can be exploited during the selection process. We propose a new method, Active Learning through Random Labeling (ALRL), which replaces the single human annotator with multiple so-called pseudo annotators. These pseudo annotators always provide uniformly random labels whenever new unlabeled samples are queried. This random labeling enables standard active learning algorithms to also exhibit the exploratory behavior needed for single-shot active learning. The exploratory behavior is further enhanced by selecting the most representative samples, i.e. by minimizing the nearest-neighbor distance between unlabeled samples and queried samples. Experiments on real-world datasets demonstrate that the proposed method outperforms several state-of-the-art approaches.
Tasks Active Learning
Published 2018-05-17
URL http://arxiv.org/abs/1805.06660v1
PDF http://arxiv.org/pdf/1805.06660v1.pdf
PWC https://paperswithcode.com/paper/single-shot-active-learning-using-pseudo
Repo https://github.com/YazhouTUD/single_shot_AL
Framework none
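
The representativeness criterion from the abstract, greedily choosing samples that minimize the nearest-neighbour distance between unlabeled and queried points, can be sketched as follows (the pseudo-annotator half of ALRL, which feeds random labels to a standard active learner, is omitted here):

```python
import numpy as np

def greedy_representative(X, budget):
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    nearest = np.full(len(X), np.inf)  # distance from each point to its nearest queried point
    queried = []
    for _ in range(budget):
        # Score of candidate j: mean nearest-queried distance if j were added.
        scores = np.minimum(nearest[None, :], dist).mean(axis=1)
        j = int(np.argmin(scores))
        queried.append(j)
        nearest = np.minimum(nearest, dist[j])
    return queried

X = np.random.default_rng(1).normal(size=(200, 2))
print(greedy_representative(X, budget=5))
```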

A Theoretical Explanation for Perplexing Behaviors of Backpropagation-based Visualizations

Title A Theoretical Explanation for Perplexing Behaviors of Backpropagation-based Visualizations
Authors Weili Nie, Yang Zhang, Ankit Patel
Abstract Backpropagation-based visualizations have been proposed to interpret convolutional neural networks (CNNs); however, a theory justifying their behaviors has been missing: guided backpropagation (GBP) and the deconvolutional network (DeconvNet) generate more human-interpretable but less class-sensitive visualizations than saliency maps. Motivated by this, we develop a theoretical explanation revealing that GBP and DeconvNet are essentially doing (partial) image recovery, which is unrelated to the network decisions. Specifically, our analysis shows that the backward ReLU introduced by GBP and DeconvNet, together with the local connections in CNNs, are the two main causes of compelling visualizations. Extensive experiments are provided that support the theoretical analysis.
Tasks
Published 2018-05-18
URL https://arxiv.org/abs/1805.07039v4
PDF https://arxiv.org/pdf/1805.07039v4.pdf
PWC https://paperswithcode.com/paper/a-theoretical-explanation-for-perplexing
Repo https://github.com/weilinie/BackpropVis
Framework tf
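
The three backward rules the paper contrasts differ only in how gradients pass through ReLU: saliency maps use the forward mask, DeconvNet rectifies the incoming gradient instead, and GBP applies both. A minimal PyTorch sketch of the three rules:

```python
import torch

class ExplainReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, mode):
        ctx.save_for_backward(x)
        ctx.mode = mode
        return x.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        g = grad_out
        if ctx.mode in ("saliency", "guided"):   # forward ReLU mask
            g = g * (x > 0)
        if ctx.mode in ("deconvnet", "guided"):  # backward ReLU: keep positive gradients
            g = g.clamp(min=0)
        return g, None

x = torch.randn(4, requires_grad=True)
ExplainReLU.apply(x, "guided").sum().backward()
print(x.grad)
```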

Compositional Obverter Communication Learning From Raw Visual Input

Title Compositional Obverter Communication Learning From Raw Visual Input
Authors Edward Choi, Angeliki Lazaridou, Nando de Freitas
Abstract One of the distinguishing aspects of human language is its compositionality, which allows us to describe complex environments with limited vocabulary. Previously, it has been shown that neural network agents can learn to communicate in a highly structured, possibly compositional language based on disentangled input (e.g. hand-engineered features). Humans, however, do not learn to communicate based on well-summarized features. In this work, we train neural agents to simultaneously develop visual perception from raw image pixels, and learn to communicate with a sequence of discrete symbols. The agents play an image description game where the image contains factors such as colors and shapes. We train the agents using the obverter technique where an agent introspects to generate messages that maximize its own understanding. Through qualitative analysis, visualization and a zero-shot test, we show that the agents can develop, out of raw image pixels, a language with compositional properties, given a proper pressure from the environment.
Tasks
Published 2018-04-06
URL http://arxiv.org/abs/1804.02341v1
PDF http://arxiv.org/pdf/1804.02341v1.pdf
PWC https://paperswithcode.com/paper/compositional-obverter-communication-learning
Repo https://github.com/benbogin/obverter
Framework pytorch
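
The obverter decoding step itself is easy to sketch: the speaker scores candidate messages with its own listener network and greedily appends the symbol that maximizes its own belief that the message describes the image, stopping once it is convinced. Here `listener_prob` is a hypothetical stand-in for that trained (image, message) → probability network:

```python
import random

VOCAB = list("abcde")

def listener_prob(image, message):
    """Stand-in scorer for illustration; in the paper this is the agent's own network."""
    random.seed(hash((image, message)) % (2 ** 32))
    return random.random()

def obverter_decode(image, max_len=5, threshold=0.95):
    message = ""
    for _ in range(max_len):
        # Greedily append the symbol the speaker itself finds most convincing.
        message += max(VOCAB, key=lambda s: listener_prob(image, message + s))
        if listener_prob(image, message) > threshold:
            break
    return message

print(obverter_decode("red_circle"))
```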

DeepDIVA: A Highly-Functional Python Framework for Reproducible Experiments

Title DeepDIVA: A Highly-Functional Python Framework for Reproducible Experiments
Authors Michele Alberti, Vinaychandran Pondenkandath, Marcel Würsch, Rolf Ingold, Marcus Liwicki
Abstract We introduce DeepDIVA: an infrastructure designed to enable quick and intuitive setup of reproducible experiments with a large range of useful analysis functionality. Reproducing scientific results can be a frustrating experience, not only in document image analysis but in machine learning in general. Using DeepDIVA, a researcher can either reproduce a given experiment with a very limited amount of information or share their own experiments with others. Moreover, the framework offers a large range of functions, such as boilerplate code, keeping track of experiments, hyper-parameter optimization, and visualization of data and results. To demonstrate the effectiveness of this framework, this paper presents case studies in the area of handwritten document analysis where researchers benefit from the integrated functionality. DeepDIVA is implemented in Python and uses the deep learning framework PyTorch. It is completely open source, and accessible as a Web Service through DIVAServices.
Tasks
Published 2018-04-23
URL http://arxiv.org/abs/1805.00329v1
PDF http://arxiv.org/pdf/1805.00329v1.pdf
PWC https://paperswithcode.com/paper/deepdiva-a-highly-functional-python-framework
Repo https://github.com/pr-tandomeijivan/Project-1
Framework pytorch
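
Part of what a framework like DeepDIVA standardizes is the reproducibility boilerplate most PyTorch projects rewrite by hand. As a generic illustration only (this is not DeepDIVA's actual API), fully seeding an experiment looks like:

```python
import random

import numpy as np
import torch

def seed_everything(seed: int = 42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True  # trade speed for determinism
    torch.backends.cudnn.benchmark = False

seed_everything(42)
```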

Ring loss: Convex Feature Normalization for Face Recognition

Title Ring loss: Convex Feature Normalization for Face Recognition
Authors Yutong Zheng, Dipan K. Pal, Marios Savvides
Abstract We motivate and present Ring loss, a simple and elegant feature normalization approach for deep networks designed to augment standard loss functions such as Softmax. We argue that deep feature normalization is an important aspect of supervised classification problems where we require the model to represent each class in a multi-class problem equally well. The direct approach to feature normalization, through the hard normalization operation, results in a non-convex formulation. Instead, Ring loss applies soft normalization, gradually learning to constrain the norm to the scaled unit circle while preserving convexity, leading to more robust features. We apply Ring loss to large-scale face recognition problems and present results on LFW, the challenging protocols of IJB-A Janus, Janus CS3 (a superset of IJB-A Janus), Celebrity Frontal-Profile (CFP) and MegaFace with 1 million distractors. Ring loss outperforms strong baselines, matches state-of-the-art performance on IJB-A Janus and outperforms all other results on the challenging Janus CS3, thereby achieving state-of-the-art results. We also outperform strong baselines in handling extremely low-resolution face matching.
Tasks Face Identification, Face Recognition, Face Verification
Published 2018-02-28
URL http://arxiv.org/abs/1803.00130v1
PDF http://arxiv.org/pdf/1803.00130v1.pdf
PWC https://paperswithcode.com/paper/ring-loss-convex-feature-normalization-for
Repo https://github.com/Paralysis/ringloss
Framework pytorch
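
The loss itself is compact: a soft penalty pulling feature norms toward a single learned radius R, added to the primary Softmax loss. A PyTorch sketch based on the abstract's description (the loss weight here is an arbitrary placeholder, not the paper's tuned value):

```python
import torch
import torch.nn as nn

class RingLoss(nn.Module):
    def __init__(self, loss_weight=0.01):
        super().__init__()
        self.radius = nn.Parameter(torch.tensor(1.0))  # R is learned, not fixed
        self.loss_weight = loss_weight

    def forward(self, features):
        norms = features.norm(p=2, dim=1)              # per-sample feature norms
        return self.loss_weight * ((norms - self.radius) ** 2).mean() / 2

ring = RingLoss()
feats = torch.randn(8, 128)   # embeddings from the network
print(ring(feats))            # add this term to the softmax cross-entropy loss
```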

Feature Denoising for Improving Adversarial Robustness

Title Feature Denoising for Improving Adversarial Robustness
Authors Cihang Xie, Yuxin Wu, Laurens van der Maaten, Alan Yuille, Kaiming He
Abstract Adversarial attacks on image classification systems present challenges to convolutional networks and opportunities for understanding them. This study suggests that adversarial perturbations on images lead to noise in the features constructed by these networks. Motivated by this observation, we develop new network architectures that increase adversarial robustness by performing feature denoising. Specifically, our networks contain blocks that denoise the features using non-local means or other filters; the entire networks are trained end-to-end. When combined with adversarial training, our feature-denoising networks substantially improve the state of the art in adversarial robustness in both white-box and black-box attack settings. On ImageNet, under 10-iteration PGD white-box attacks where prior art has 27.9% accuracy, our method achieves 55.7%; even under extreme 2000-iteration PGD white-box attacks, our method secures 42.6% accuracy. Our method ranked first in the Competition on Adversarial Attacks and Defenses (CAAD) 2018, achieving 50.6% classification accuracy on a secret, ImageNet-like test dataset against 48 unknown attackers and surpassing the runner-up approach by ~10%. Code is available at https://github.com/facebookresearch/ImageNet-Adversarial-Training.
Tasks Adversarial Defense, Image Classification
Published 2018-12-09
URL http://arxiv.org/abs/1812.03411v2
PDF http://arxiv.org/pdf/1812.03411v2.pdf
PWC https://paperswithcode.com/paper/feature-denoising-for-improving-adversarial
Repo https://github.com/facebookresearch/ImageNet-Adversarial-Training
Framework tf
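
A simplified sketch of one denoising block: a non-local, softmax-weighted filter over the feature map, followed by a 1×1 convolution and a residual connection so the block can be dropped into an existing network. This illustrates the idea rather than reproducing the paper's exact block (which explores several filters and scalings):

```python
import torch
import torch.nn as nn

class DenoisingBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        n, c, h, w = x.shape
        flat = x.view(n, c, h * w)                        # n x c x hw
        affinity = torch.bmm(flat.transpose(1, 2), flat)  # n x hw x hw
        weights = torch.softmax(affinity, dim=-1)         # each position attends to all others
        denoised = torch.bmm(flat, weights.transpose(1, 2)).view(n, c, h, w)
        return x + self.conv(denoised)                    # residual connection

block = DenoisingBlock(64)
print(block(torch.randn(2, 64, 16, 16)).shape)  # torch.Size([2, 64, 16, 16])
```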

A Simple Reservoir Model of Working Memory with Real Values

Title A Simple Reservoir Model of Working Memory with Real Values
Authors Anthony Strock, Nicolas Rougier, Xavier Hinaut
Abstract The prefrontal cortex is known to be involved in many high-level cognitive functions, in particular working memory. Here, we study to what extent a group of randomly connected units (namely an Echo State Network, ESN) can store and maintain (as output) an arbitrary real value from a streamed input, i.e. can act as a sustained working memory unit. Furthermore, we explore to what extent such an architecture can take advantage of the stored value in order to produce non-linear computations. Comparison between different architectures (with and without feedback, with and without a working memory unit) shows that an explicit memory improves performance.
Tasks
Published 2018-06-18
URL http://arxiv.org/abs/1806.06545v1
PDF http://arxiv.org/pdf/1806.06545v1.pdf
PWC https://paperswithcode.com/paper/a-simple-reservoir-model-of-working-memory
Repo https://github.com/anthony-strock/ijcnn2018
Framework none
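
An echo state network of the kind studied here fits in a page of NumPy: a fixed random reservoir with input and output-feedback weights, plus a single readout trained by ridge regression. A minimal sketch on a toy target (not the paper's gated working-memory task):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100                                        # reservoir units
W = rng.normal(size=(n, n))
W *= 0.9 / np.abs(np.linalg.eigvals(W)).max()  # spectral radius < 1 (echo state property)
W_in = rng.uniform(-1, 1, size=(n, 1))
W_fb = rng.uniform(-1, 1, size=(n, 1))         # output feedback, as in the paper

def collect_states(inputs, teacher):
    """Run the reservoir while teacher-forcing the output feedback."""
    x = np.zeros((n, 1))
    states = []
    for u, y in zip(inputs, teacher):
        x = np.tanh(W @ x + W_in * u + W_fb * y)
        states.append(x.ravel())
    return np.array(states)

T = 1000
u = rng.uniform(-1, 1, T)                      # streamed input
y = np.cumsum(u) / np.arange(1, T + 1)         # toy target: running mean of the input
S = collect_states(u, y)
W_out = np.linalg.solve(S.T @ S + 1e-6 * np.eye(n), S.T @ y)  # ridge-regression readout
print("train MSE:", np.mean((S @ W_out - y) ** 2))
```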

Improving OCR Accuracy on Early Printed Books by combining Pretraining, Voting, and Active Learning

Title Improving OCR Accuracy on Early Printed Books by combining Pretraining, Voting, and Active Learning
Authors Christian Reul, Uwe Springmann, Christoph Wick, Frank Puppe
Abstract We combine three methods which significantly improve the accuracy of OCR models trained on early printed books: (1) The pretraining method utilizes the information stored in already existing models trained on a variety of typesets (mixed models) instead of starting the training from scratch. (2) Performing cross-fold training on a single set of ground truth data (line images and their transcriptions) with a single OCR engine (OCRopus) produces a committee whose members then vote for the best outcome, also taking the top-N alternatives and their intrinsic confidence values into account. (3) Following the principle of maximal disagreement, we select additional training lines on which the voters disagree most, expecting them to offer the highest information gain for subsequent training (active learning). Evaluations on six early printed books yielded the following results: on average, the combination of pretraining and voting improved the character accuracy by 46% when training five folds starting from the same mixed model. This number rose to 53% when using different models for pretraining, underlining the importance of diverse voters. Incorporating active learning improved the obtained results by another 16% on average (evaluated on three of the six books). Overall, the proposed methods lead to an average error rate of 2.5% when training on only 60 lines. Using a substantial ground truth pool of 1,000 lines brought the error rate down even further, to less than 1% on average.
Tasks Active Learning, Optical Character Recognition
Published 2018-02-27
URL http://arxiv.org/abs/1802.10038v2
PDF http://arxiv.org/pdf/1802.10038v2.pdf
PWC https://paperswithcode.com/paper/improving-ocr-accuracy-on-early-printed-books
Repo https://github.com/chreul/mptv
Framework tf
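
Step (2), confidence voting, reduces to pooling the committee's weighted opinions per aligned position. A heavily simplified illustration (the paper's voting additionally aligns the folds' output sequences and weighs top-N alternatives):

```python
from collections import defaultdict

def vote(candidates):
    """candidates: one (character, confidence) pair per committee member."""
    scores = defaultdict(float)
    for char, conf in candidates:
        scores[char] += conf           # sum confidences per candidate character
    return max(scores, key=scores.get)

# Five cross-fold voters disagree on one aligned position:
print(vote([("e", 0.9), ("c", 0.6), ("e", 0.7), ("o", 0.5), ("e", 0.4)]))  # -> 'e'
```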

UMONS Submission for WMT18 Multimodal Translation Task

Title UMONS Submission for WMT18 Multimodal Translation Task
Authors Jean-Benoit Delbrouck, Stéphane Dupont
Abstract This paper describes the UMONS solution for the Multimodal Machine Translation Task presented at the Third Conference on Machine Translation (WMT18). We explore a novel architecture, called deepGRU, based on recent findings in the related task of Neural Image Captioning (NIC). The models presented in the following sections achieve the best METEOR translation score for both constrained (English, image) -> German and (English, image) -> French sub-tasks.
Tasks Image Captioning, Machine Translation, Multimodal Machine Translation
Published 2018-10-15
URL http://arxiv.org/abs/1810.06233v1
PDF http://arxiv.org/pdf/1810.06233v1.pdf
PWC https://paperswithcode.com/paper/umons-submission-for-wmt18-multimodal
Repo https://github.com/jbdel/WMT18_MNMT
Framework pytorch
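
The deepGRU details are in the paper; as a rough sketch of the NIC-style baseline it builds on (not the submitted architecture itself), a GRU decoder conditioned on pooled image features looks like:

```python
import torch
import torch.nn as nn

class ImageConditionedGRU(nn.Module):
    def __init__(self, vocab_size, img_dim=2048, emb_dim=256, hid_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.init_h = nn.Linear(img_dim, hid_dim)  # image features -> initial hidden state
        self.gru = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, image_feats, tokens):
        h0 = torch.tanh(self.init_h(image_feats)).unsqueeze(0)  # 1 x B x H
        hidden, _ = self.gru(self.embed(tokens), h0)
        return self.out(hidden)                    # per-step vocabulary logits

model = ImageConditionedGRU(vocab_size=10000)
logits = model(torch.randn(4, 2048), torch.randint(0, 10000, (4, 12)))
print(logits.shape)  # torch.Size([4, 12, 10000])
```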