October 20, 2019

2932 words 14 mins read

Paper Group AWR 322

COCO-CN for Cross-Lingual Image Tagging, Captioning and Retrieval. Approximate Fisher Information Matrix to Characterise the Training of Deep Neural Networks. Machine learning 2.0 : Engineering Data Driven AI Products. SegMap: 3D Segment Mapping using Data-Driven Descriptors. PreCo: A Large-scale Dataset in Preschool Vocabulary for Coreference Reso …

COCO-CN for Cross-Lingual Image Tagging, Captioning and Retrieval

Title COCO-CN for Cross-Lingual Image Tagging, Captioning and Retrieval
Authors Xirong Li, Chaoxi Xu, Xiaoxu Wang, Weiyu Lan, Zhengxiong Jia, Gang Yang, Jieping Xu
Abstract This paper contributes to cross-lingual image annotation and retrieval in terms of data and baseline methods. We propose COCO-CN, a novel dataset enriching MS-COCO with manually written Chinese sentences and tags. For more effective annotation acquisition, we develop a recommendation-assisted collective annotation system that automatically provides an annotator with several tags and sentences deemed relevant to the pictorial content. With 20,342 images annotated with 27,218 Chinese sentences and 70,993 tags, COCO-CN is currently the largest Chinese-English dataset that provides a unified and challenging platform for cross-lingual image tagging, captioning and retrieval. We develop conceptually simple yet effective methods per task for learning from cross-lingual resources. Extensive experiments on the three tasks justify the viability of the proposed dataset and methods. Data and code are publicly available at https://github.com/li-xirong/coco-cn
Tasks
Published 2018-05-22
URL http://arxiv.org/abs/1805.08661v2
PDF http://arxiv.org/pdf/1805.08661v2.pdf
PWC https://paperswithcode.com/paper/coco-cn-for-cross-lingual-image-tagging
Repo https://github.com/li-xirong/coco-cn
Framework none

Approximate Fisher Information Matrix to Characterise the Training of Deep Neural Networks

Title Approximate Fisher Information Matrix to Characterise the Training of Deep Neural Networks
Authors Zhibin Liao, Tom Drummond, Ian Reid, Gustavo Carneiro
Abstract In this paper, we introduce a novel methodology for characterising the performance of deep learning networks (ResNets and DenseNet) with respect to training convergence and generalisation, as a function of mini-batch size and learning rate, for image classification. This methodology is based on novel measurements derived from the eigenvalues of the approximate Fisher information matrix, which can be computed efficiently even for high-capacity deep models. Our proposed measurements can help practitioners to monitor and control the training process (by actively tuning the mini-batch size and learning rate) to allow for good training convergence and generalisation. Furthermore, the proposed measurements also allow us to show that the training process can be optimised with a new dynamic sampling approach that continuously and automatically changes the mini-batch size and learning rate during training. Finally, we show that the proposed dynamic sampling approach has a faster training time and a competitive classification accuracy compared to the current state of the art.
Tasks Image Classification
Published 2018-10-16
URL http://arxiv.org/abs/1810.06767v1
PDF http://arxiv.org/pdf/1810.06767v1.pdf
PWC https://paperswithcode.com/paper/approximate-fisher-information-matrix-to
Repo https://github.com/zhibinliao89/fisher.info.mat.torch
Framework pytorch
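
The paper's central quantity, the spectrum of the approximate (empirical) Fisher information matrix, is cheap to estimate because the nonzero eigenvalues of F = (1/N) Σ gᵢgᵢᵀ coincide with those of the small N×N Gram matrix of per-sample gradients. The sketch below is a minimal PyTorch illustration of that trick, not the authors' implementation (their measurements and dynamic sampling schedule are built on top of such eigenvalues):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 10))
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(16, 20)           # toy mini-batch
y = torch.randint(0, 10, (16,))

grads = []
for i in range(x.size(0)):        # per-sample gradients g_i
    model.zero_grad()
    loss_fn(model(x[i:i+1]), y[i:i+1]).backward()
    grads.append(torch.cat([p.grad.flatten() for p in model.parameters()]))

G = torch.stack(grads)            # N x P matrix of per-sample gradients
gram = G @ G.t() / G.size(0)      # N x N; same nonzero spectrum as the empirical Fisher
eigvals = torch.linalg.eigvalsh(gram)
print("largest Fisher eigenvalue estimate:", eigvals[-1].item())
```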

Machine learning 2.0 : Engineering Data Driven AI Products

Title Machine learning 2.0 : Engineering Data Driven AI Products
Authors James Max Kanter, Benjamin Schreck, Kalyan Veeramachaneni
Abstract ML 2.0: In this paper, we propose a paradigm shift from the current practice of creating machine learning models - which requires months-long discovery, exploration and “feasibility report” generation, followed by re-engineering for deployment - in favor of a rapid, 8-week process of development, understanding, validation and deployment that can be executed by developers or subject matter experts (non-ML experts) using reusable APIs. This accomplishes what we call a “minimum viable data-driven model,” delivering a ready-to-use machine learning model for problems that haven’t been solved before using machine learning. We include provisions for the refinement and adaptation of the “model,” with strict enforcement of, and adherence to, both the scaffolding/abstractions and the process. We imagine that this will bring forth the second phase in machine learning, in which discovery is subsumed by the more targeted goals of delivery and impact.
Tasks
Published 2018-07-01
URL http://arxiv.org/abs/1807.00401v1
PDF http://arxiv.org/pdf/1807.00401v1.pdf
PWC https://paperswithcode.com/paper/machine-learning-20-engineering-data-driven
Repo https://github.com/Featuretools/featuretools-docker
Framework none
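
As a hedged illustration of the “reusable APIs” the abstract refers to: the linked repo packages Featuretools in Docker, and a minimal automated-feature-engineering run on the library's bundled demo data looks roughly like this (argument names vary across Featuretools versions):

```python
import featuretools as ft

# Load the library's bundled demo EntitySet (customers, sessions, transactions).
es = ft.demo.load_mock_customer(return_entityset=True)

# Deep Feature Synthesis: automatically derive a feature matrix per customer.
# Note: Featuretools < 1.0 uses `target_entity` instead of `target_dataframe_name`.
feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_dataframe_name="customers",
    max_depth=2,
)
print(feature_matrix.head())
```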

SegMap: 3D Segment Mapping using Data-Driven Descriptors

Title SegMap: 3D Segment Mapping using Data-Driven Descriptors
Authors Renaud Dubé, Andrei Cramariuc, Daniel Dugas, Juan Nieto, Roland Siegwart, Cesar Cadena
Abstract When performing localization and mapping, working at the level of structure can be advantageous in terms of robustness to environmental changes and differences in illumination. This paper presents SegMap: a map representation solution to the localization and mapping problem based on the extraction of segments in 3D point clouds. In addition to facilitating the computationally intensive task of processing 3D point clouds, working at the level of segments addresses the data compression requirements of real-time single- and multi-robot systems. While current methods extract descriptors for the single task of localization, SegMap leverages a data-driven descriptor in order to extract meaningful features that can also be used for reconstructing a dense 3D map of the environment and for extracting semantic information. This is particularly interesting for navigation tasks and for providing visual feedback to end-users such as robot operators, for example in search and rescue scenarios. These capabilities are demonstrated in multiple urban driving and search and rescue experiments. Our method yields a 28.3% increase in the area under the ROC curve over the current state of the art using eigenvalue-based features. We also obtain reconstruction capabilities very similar to those of a model specifically trained for this task. The SegMap implementation will be made available open source, along with easy-to-run demonstrations, at www.github.com/ethz-asl/segmap. A video demonstration is available at https://youtu.be/CMk4w4eRobg.
Tasks
Published 2018-04-25
URL http://arxiv.org/abs/1804.09557v2
PDF http://arxiv.org/pdf/1804.09557v2.pdf
PWC https://paperswithcode.com/paper/segmap-3d-segment-mapping-using-data-driven
Repo https://github.com/ethz-asl/segmap
Framework tf
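
The retrieval step at the heart of segment-based localization can be sketched in a few lines: learned descriptors for map segments are indexed, and segments extracted from the current scan are matched by nearest-neighbour search. This is a simplified stand-in (random vectors instead of SegMap's learned descriptors, and no geometric verification):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
map_descriptors = rng.normal(size=(500, 64))   # descriptors of 500 map segments
scan_descriptors = rng.normal(size=(12, 64))   # segments from the current scan

index = NearestNeighbors(n_neighbors=5).fit(map_descriptors)
distances, candidates = index.kneighbors(scan_descriptors)
# candidates[i] lists the map segments closest to scan segment i in descriptor
# space; SegMap verifies such candidates geometrically before localizing.
print(candidates[0], distances[0])
```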

PreCo: A Large-scale Dataset in Preschool Vocabulary for Coreference Resolution

Title PreCo: A Large-scale Dataset in Preschool Vocabulary for Coreference Resolution
Authors Hong Chen, Zhenhua Fan, Hao Lu, Alan L. Yuille, Shu Rong
Abstract We introduce PreCo, a large-scale English dataset for coreference resolution. The dataset is designed to embody the core challenges in coreference, such as entity representation, by alleviating the challenge of low overlap between training and test sets and enabling separate analysis of mention detection and mention clustering. To strengthen the training-test overlap, we collect a large corpus of about 38K documents and 12.4M words drawn mostly from the vocabulary of English-speaking preschoolers. Experiments show that with higher training-test overlap, error analysis on PreCo is more efficient than on OntoNotes, a popular existing dataset. Furthermore, we annotate singleton mentions, making it possible for the first time to quantify the influence that a mention detector has on coreference resolution performance. The dataset is freely available at https://preschool-lab.github.io/PreCo/.
Tasks Coreference Resolution
Published 2018-10-23
URL http://arxiv.org/abs/1810.09807v1
PDF http://arxiv.org/pdf/1810.09807v1.pdf
PWC https://paperswithcode.com/paper/preco-a-large-scale-dataset-in-preschool
Repo https://github.com/d5555/TagEditor
Framework none
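
One plausible way to compute the training-test overlap the abstract emphasizes is token coverage: the fraction of test-set tokens whose types appear in the training vocabulary. A toy sketch of that reading of the metric (not necessarily the authors' exact definition):

```python
train_tokens = "the cat sat on the mat".split()
test_tokens = "the dog sat on the rug".split()

train_vocab = set(train_tokens)
covered = sum(1 for tok in test_tokens if tok in train_vocab)
print(f"test-token coverage: {covered / len(test_tokens):.0%}")  # 67%
```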

Safe Exploration in Continuous Action Spaces

Title Safe Exploration in Continuous Action Spaces
Authors Gal Dalal, Krishnamurthy Dvijotham, Matej Vecerik, Todd Hester, Cosmin Paduraru, Yuval Tassa
Abstract We address the problem of deploying a reinforcement learning (RL) agent on a physical system such as a datacenter cooling unit or robot, where critical constraints must never be violated. We show how to exploit the typically smooth dynamics of these systems and enable RL algorithms to never violate constraints during learning. Our technique is to directly add to the policy a safety layer that analytically solves an action-correction formulation for each state. This elegant closed-form solution is obtained thanks to a linearized model, learned on past trajectories consisting of arbitrary actions. This mimics real-world circumstances in which data logs were generated by a behavior policy that is infeasible to describe mathematically; such cases render the known safety-aware off-policy methods inapplicable. We demonstrate the efficacy of our approach on new, representative physics-based environments, and prevail where reward shaping fails by maintaining zero constraint violations.
Tasks Safe Exploration
Published 2018-01-26
URL http://arxiv.org/abs/1801.08757v1
PDF http://arxiv.org/pdf/1801.08757v1.pdf
PWC https://paperswithcode.com/paper/safe-exploration-in-continuous-action-spaces
Repo https://github.com/AgrawalAmey/safe-explorer
Framework pytorch
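
The heart of the method is the analytic action correction. For a single linearized constraint c(s) + g(s)ᵀa ≤ C, the closest safe action to the policy's proposal has a closed form; a minimal NumPy sketch follows (with multiple constraints, the paper corrects for the most violated one):

```python
import numpy as np

def safety_layer(action, g, c, C):
    """Project `action` onto the linearized constraint c + g.a <= C.

    g is the learned gradient of the constraint w.r.t. the action,
    c is the current constraint value c(s), and C is the constraint limit.
    """
    lam = max(0.0, (g @ action + c - C) / (g @ g))  # optimal Lagrange multiplier
    return action - lam * g

a = np.array([0.8, -0.2])
g = np.array([1.0, 0.5])
print(safety_layer(a, g, c=0.3, C=0.5))  # [0.4, -0.4]; unchanged when already safe
```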

Single Shot Active Learning using Pseudo Annotators

Title Single Shot Active Learning using Pseudo Annotators
Authors Yazhou Yang, Marco Loog
Abstract Standard myopic active learning assumes that human annotations are always obtainable whenever new samples are selected. This, however, is unrealistic in many real-world applications where human experts are not readily available at all times. In this paper, we consider the single-shot setting: all the required samples must be chosen in a single shot, and no human annotation can be exploited during the selection process. We propose a new method, Active Learning through Random Labeling (ALRL), which replaces the single human annotator with multiple so-called pseudo annotators. These pseudo annotators always provide uniformly random labels whenever new unlabeled samples are queried. This random labeling enables standard active learning algorithms to also exhibit the exploratory behavior needed for single-shot active learning. The exploratory behavior is further enhanced by selecting the most representative samples, i.e. by minimizing the nearest-neighbor distance between unlabeled samples and queried samples. Experiments on real-world datasets demonstrate that the proposed method outperforms several state-of-the-art approaches.
Tasks Active Learning
Published 2018-05-17
URL http://arxiv.org/abs/1805.06660v1
PDF http://arxiv.org/pdf/1805.06660v1.pdf
PWC https://paperswithcode.com/paper/single-shot-active-learning-using-pseudo
Repo https://github.com/YazhouTUD/single_shot_AL
Framework none
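
The representativeness criterion from the abstract, greedily choosing samples that minimize the nearest-neighbour distance between unlabeled and queried points, can be sketched as follows (the pseudo-annotator half of ALRL, which feeds random labels to a standard active learner, is omitted here):

```python
import numpy as np

def greedy_representative(X, budget):
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    nearest = np.full(len(X), np.inf)  # distance from each point to its nearest queried point
    queried = []
    for _ in range(budget):
        # Score of candidate j: mean nearest-queried distance if j were added.
        scores = np.minimum(nearest[None, :], dist).mean(axis=1)
        j = int(np.argmin(scores))
        queried.append(j)
        nearest = np.minimum(nearest, dist[j])
    return queried

X = np.random.default_rng(1).normal(size=(200, 2))
print(greedy_representative(X, budget=5))
```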

A Theoretical Explanation for Perplexing Behaviors of Backpropagation-based Visualizations

Title A Theoretical Explanation for Perplexing Behaviors of Backpropagation-based Visualizations
Authors Weili Nie, Yang Zhang, Ankit Patel
Abstract Backpropagation-based visualizations have been proposed to interpret convolutional neural networks (CNNs); however, a theory justifying their behaviors has been missing: guided backpropagation (GBP) and the deconvolutional network (DeconvNet) generate more human-interpretable but less class-sensitive visualizations than saliency maps. Motivated by this, we develop a theoretical explanation revealing that GBP and DeconvNet are essentially doing (partial) image recovery, which is unrelated to the network decisions. Specifically, our analysis shows that the backward ReLU introduced by GBP and DeconvNet, together with the local connections in CNNs, are the two main causes of compelling visualizations. Extensive experiments are provided that support the theoretical analysis.
Tasks
Published 2018-05-18
URL https://arxiv.org/abs/1805.07039v4
PDF https://arxiv.org/pdf/1805.07039v4.pdf
PWC https://paperswithcode.com/paper/a-theoretical-explanation-for-perplexing
Repo https://github.com/weilinie/BackpropVis
Framework tf
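
The three backward rules the paper contrasts differ only in how gradients pass through ReLU: saliency maps use the forward mask, DeconvNet rectifies the incoming gradient instead, and GBP applies both. A minimal PyTorch sketch of the three rules:

```python
import torch

class ExplainReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, mode):
        ctx.save_for_backward(x)
        ctx.mode = mode
        return x.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        g = grad_out
        if ctx.mode in ("saliency", "guided"):   # forward ReLU mask
            g = g * (x > 0)
        if ctx.mode in ("deconvnet", "guided"):  # backward ReLU: keep positive gradients
            g = g.clamp(min=0)
        return g, None

x = torch.randn(4, requires_grad=True)
ExplainReLU.apply(x, "guided").sum().backward()
print(x.grad)
```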

Compositional Obverter Communication Learning From Raw Visual Input

Title Compositional Obverter Communication Learning From Raw Visual Input
Authors Edward Choi, Angeliki Lazaridou, Nando de Freitas
Abstract One of the distinguishing aspects of human language is its compositionality, which allows us to describe complex environments with limited vocabulary. Previously, it has been shown that neural network agents can learn to communicate in a highly structured, possibly compositional language based on disentangled input (e.g. hand-engineered features). Humans, however, do not learn to communicate based on well-summarized features. In this work, we train neural agents to simultaneously develop visual perception from raw image pixels, and learn to communicate with a sequence of discrete symbols. The agents play an image description game where the image contains factors such as colors and shapes. We train the agents using the obverter technique where an agent introspects to generate messages that maximize its own understanding. Through qualitative analysis, visualization and a zero-shot test, we show that the agents can develop, out of raw image pixels, a language with compositional properties, given a proper pressure from the environment.
Tasks
Published 2018-04-06
URL http://arxiv.org/abs/1804.02341v1
PDF http://arxiv.org/pdf/1804.02341v1.pdf
PWC https://paperswithcode.com/paper/compositional-obverter-communication-learning
Repo https://github.com/benbogin/obverter
Framework pytorch
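
The obverter decoding step itself is easy to sketch: the speaker scores candidate messages with its own listener network and greedily appends the symbol that maximizes its own belief that the message describes the image, stopping once it is convinced. Here `listener_prob` is a hypothetical stand-in for that trained (image, message) → probability network:

```python
import random

VOCAB = list("abcde")

def listener_prob(image, message):
    """Stand-in scorer for illustration; in the paper this is the agent's own network."""
    random.seed(hash((image, message)) % (2 ** 32))
    return random.random()

def obverter_decode(image, max_len=5, threshold=0.95):
    message = ""
    for _ in range(max_len):
        # Greedily append the symbol the speaker itself finds most convincing.
        message += max(VOCAB, key=lambda s: listener_prob(image, message + s))
        if listener_prob(image, message) > threshold:
            break
    return message

print(obverter_decode("red_circle"))
```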

DeepDIVA: A Highly-Functional Python Framework for Reproducible Experiments

Title DeepDIVA: A Highly-Functional Python Framework for Reproducible Experiments
Authors Michele Alberti, Vinaychandran Pondenkandath, Marcel Würsch, Rolf Ingold, Marcus Liwicki
Abstract We introduce DeepDIVA: an infrastructure designed to enable quick and intuitive setup of reproducible experiments with a large range of useful analysis functionality. Reproducing scientific results can be a frustrating experience, not only in document image analysis but in machine learning in general. Using DeepDIVA, a researcher can either reproduce a given experiment with a very limited amount of information or share their own experiments with others. Moreover, the framework offers a large range of functions, such as boilerplate code, keeping track of experiments, hyper-parameter optimization, and visualization of data and results. To demonstrate the effectiveness of this framework, this paper presents case studies in the area of handwritten document analysis where researchers benefit from the integrated functionality. DeepDIVA is implemented in Python and uses the deep learning framework PyTorch. It is completely open source, and accessible as a Web Service through DIVAServices.
Tasks
Published 2018-04-23
URL http://arxiv.org/abs/1805.00329v1
PDF http://arxiv.org/pdf/1805.00329v1.pdf
PWC https://paperswithcode.com/paper/deepdiva-a-highly-functional-python-framework
Repo https://github.com/pr-tandomeijivan/Project-1
Framework pytorch
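
Part of what a framework like DeepDIVA standardizes is the reproducibility boilerplate most PyTorch projects rewrite by hand. As a generic illustration only (this is not DeepDIVA's actual API), fully seeding an experiment looks like:

```python
import random

import numpy as np
import torch

def seed_everything(seed: int = 42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True  # trade speed for determinism
    torch.backends.cudnn.benchmark = False

seed_everything(42)
```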

Ring loss: Convex Feature Normalization for Face Recognition

Title Ring loss: Convex Feature Normalization for Face Recognition
Authors Yutong Zheng, Dipan K. Pal, Marios Savvides
Abstract We motivate and present Ring loss, a simple and elegant feature normalization approach for deep networks designed to augment standard loss functions such as Softmax. We argue that deep feature normalization is an important aspect of supervised classification problems where we require the model to represent each class in a multi-class problem equally well. The direct approach to feature normalization, through the hard normalization operation, results in a non-convex formulation. Instead, Ring loss applies soft normalization, gradually learning to constrain the norm to the scaled unit circle while preserving convexity, leading to more robust features. We apply Ring loss to large-scale face recognition problems and present results on LFW, the challenging protocols of IJB-A Janus, Janus CS3 (a superset of IJB-A Janus), Celebrity Frontal-Profile (CFP) and MegaFace with 1 million distractors. Ring loss outperforms strong baselines, matches state-of-the-art performance on IJB-A Janus and outperforms all other results on the challenging Janus CS3, thereby achieving state-of-the-art results. We also outperform strong baselines in handling extremely low-resolution face matching.
Tasks Face Identification, Face Recognition, Face Verification
Published 2018-02-28
URL http://arxiv.org/abs/1803.00130v1
PDF http://arxiv.org/pdf/1803.00130v1.pdf
PWC https://paperswithcode.com/paper/ring-loss-convex-feature-normalization-for
Repo https://github.com/Paralysis/ringloss
Framework pytorch
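
The loss itself is compact: a soft penalty pulling feature norms toward a single learned radius R, added to the primary Softmax loss. A PyTorch sketch based on the abstract's description (the loss weight here is an arbitrary placeholder, not the paper's tuned value):

```python
import torch
import torch.nn as nn

class RingLoss(nn.Module):
    def __init__(self, loss_weight=0.01):
        super().__init__()
        self.radius = nn.Parameter(torch.tensor(1.0))  # R is learned, not fixed
        self.loss_weight = loss_weight

    def forward(self, features):
        norms = features.norm(p=2, dim=1)              # per-sample feature norms
        return self.loss_weight * ((norms - self.radius) ** 2).mean() / 2

ring = RingLoss()
feats = torch.randn(8, 128)   # embeddings from the network
print(ring(feats))            # add this term to the softmax cross-entropy loss
```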

Feature Denoising for Improving Adversarial Robustness

Title Feature Denoising for Improving Adversarial Robustness
Authors Cihang Xie, Yuxin Wu, Laurens van der Maaten, Alan Yuille, Kaiming He
Abstract Adversarial attacks on image classification systems present challenges to convolutional networks and opportunities for understanding them. This study suggests that adversarial perturbations on images lead to noise in the features constructed by these networks. Motivated by this observation, we develop new network architectures that increase adversarial robustness by performing feature denoising. Specifically, our networks contain blocks that denoise the features using non-local means or other filters; the entire networks are trained end-to-end. When combined with adversarial training, our feature-denoising networks substantially improve the state of the art in adversarial robustness in both white-box and black-box attack settings. On ImageNet, under 10-iteration PGD white-box attacks where prior art has 27.9% accuracy, our method achieves 55.7%; even under extreme 2000-iteration PGD white-box attacks, our method secures 42.6% accuracy. Our method ranked first in the Competition on Adversarial Attacks and Defenses (CAAD) 2018, achieving 50.6% classification accuracy on a secret, ImageNet-like test dataset against 48 unknown attackers and surpassing the runner-up approach by ~10%. Code is available at https://github.com/facebookresearch/ImageNet-Adversarial-Training.
Tasks Adversarial Defense, Image Classification
Published 2018-12-09
URL http://arxiv.org/abs/1812.03411v2
PDF http://arxiv.org/pdf/1812.03411v2.pdf
PWC https://paperswithcode.com/paper/feature-denoising-for-improving-adversarial
Repo https://github.com/facebookresearch/ImageNet-Adversarial-Training
Framework tf
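
A simplified sketch of one denoising block: a non-local, softmax-weighted filter over the feature map, followed by a 1×1 convolution and a residual connection so the block can be dropped into an existing network. This illustrates the idea rather than reproducing the paper's exact block (which explores several filters and scalings):

```python
import torch
import torch.nn as nn

class DenoisingBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        n, c, h, w = x.shape
        flat = x.view(n, c, h * w)                        # n x c x hw
        affinity = torch.bmm(flat.transpose(1, 2), flat)  # n x hw x hw
        weights = torch.softmax(affinity, dim=-1)         # each position attends to all others
        denoised = torch.bmm(flat, weights.transpose(1, 2)).view(n, c, h, w)
        return x + self.conv(denoised)                    # residual connection

block = DenoisingBlock(64)
print(block(torch.randn(2, 64, 16, 16)).shape)  # torch.Size([2, 64, 16, 16])
```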

A Simple Reservoir Model of Working Memory with Real Values

Title A Simple Reservoir Model of Working Memory with Real Values
Authors Anthony Strock, Nicolas Rougier, Xavier Hinaut
Abstract The prefrontal cortex is known to be involved in many high-level cognitive functions, in particular working memory. Here, we study to what extent a group of randomly connected units (namely an Echo State Network, ESN) can store and maintain (as output) an arbitrary real value from a streamed input, i.e. can act as a sustained working memory unit. Furthermore, we explore to what extent such an architecture can take advantage of the stored value in order to produce non-linear computations. Comparison between different architectures (with and without feedback, with and without a working memory unit) shows that an explicit memory improves performance.
Tasks
Published 2018-06-18
URL http://arxiv.org/abs/1806.06545v1
PDF http://arxiv.org/pdf/1806.06545v1.pdf
PWC https://paperswithcode.com/paper/a-simple-reservoir-model-of-working-memory
Repo https://github.com/anthony-strock/ijcnn2018
Framework none
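
An echo state network of the kind studied here fits in a page of NumPy: a fixed random reservoir with input and output-feedback weights, plus a single readout trained by ridge regression. A minimal sketch on a toy target (not the paper's gated working-memory task):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100                                        # reservoir units
W = rng.normal(size=(n, n))
W *= 0.9 / np.abs(np.linalg.eigvals(W)).max()  # spectral radius < 1 (echo state property)
W_in = rng.uniform(-1, 1, size=(n, 1))
W_fb = rng.uniform(-1, 1, size=(n, 1))         # output feedback, as in the paper

def collect_states(inputs, teacher):
    """Run the reservoir while teacher-forcing the output feedback."""
    x = np.zeros((n, 1))
    states = []
    for u, y in zip(inputs, teacher):
        x = np.tanh(W @ x + W_in * u + W_fb * y)
        states.append(x.ravel())
    return np.array(states)

T = 1000
u = rng.uniform(-1, 1, T)                      # streamed input
y = np.cumsum(u) / np.arange(1, T + 1)         # toy target: running mean of the input
S = collect_states(u, y)
W_out = np.linalg.solve(S.T @ S + 1e-6 * np.eye(n), S.T @ y)  # ridge-regression readout
print("train MSE:", np.mean((S @ W_out - y) ** 2))
```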

Improving OCR Accuracy on Early Printed Books by combining Pretraining, Voting, and Active Learning

Title Improving OCR Accuracy on Early Printed Books by combining Pretraining, Voting, and Active Learning
Authors Christian Reul, Uwe Springmann, Christoph Wick, Frank Puppe
Abstract We combine three methods which significantly improve the accuracy of OCR models trained on early printed books: (1) The pretraining method utilizes the information stored in already existing models trained on a variety of typesets (mixed models) instead of starting the training from scratch. (2) Performing cross-fold training on a single set of ground truth data (line images and their transcriptions) with a single OCR engine (OCRopus) produces a committee whose members then vote for the best outcome, also taking the top-N alternatives and their intrinsic confidence values into account. (3) Following the principle of maximal disagreement, we select additional training lines on which the voters disagree most, expecting them to offer the highest information gain for subsequent training (active learning). Evaluations on six early printed books yielded the following results: on average, the combination of pretraining and voting improved the character accuracy by 46% when training five folds starting from the same mixed model. This number rose to 53% when using different models for pretraining, underlining the importance of diverse voters. Incorporating active learning improved the obtained results by another 16% on average (evaluated on three of the six books). Overall, the proposed methods lead to an average error rate of 2.5% when training on only 60 lines. Using a substantial ground truth pool of 1,000 lines brought the error rate down even further, to less than 1% on average.
Tasks Active Learning, Optical Character Recognition
Published 2018-02-27
URL http://arxiv.org/abs/1802.10038v2
PDF http://arxiv.org/pdf/1802.10038v2.pdf
PWC https://paperswithcode.com/paper/improving-ocr-accuracy-on-early-printed-books
Repo https://github.com/chreul/mptv
Framework tf
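
Step (2), confidence voting, reduces to pooling the committee's weighted opinions per aligned position. A heavily simplified illustration (the paper's voting additionally aligns the folds' output sequences and weighs top-N alternatives):

```python
from collections import defaultdict

def vote(candidates):
    """candidates: one (character, confidence) pair per committee member."""
    scores = defaultdict(float)
    for char, conf in candidates:
        scores[char] += conf           # sum confidences per candidate character
    return max(scores, key=scores.get)

# Five cross-fold voters disagree on one aligned position:
print(vote([("e", 0.9), ("c", 0.6), ("e", 0.7), ("o", 0.5), ("e", 0.4)]))  # -> 'e'
```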

UMONS Submission for WMT18 Multimodal Translation Task

Title UMONS Submission for WMT18 Multimodal Translation Task
Authors Jean-Benoit Delbrouck, Stéphane Dupont
Abstract This paper describes the UMONS solution for the Multimodal Machine Translation Task presented at the Third Conference on Machine Translation (WMT18). We explore a novel architecture, called deepGRU, based on recent findings in the related task of Neural Image Captioning (NIC). The models presented in the following sections achieve the best METEOR translation score for both constrained (English, image) -> German and (English, image) -> French sub-tasks.
Tasks Image Captioning, Machine Translation, Multimodal Machine Translation
Published 2018-10-15
URL http://arxiv.org/abs/1810.06233v1
PDF http://arxiv.org/pdf/1810.06233v1.pdf
PWC https://paperswithcode.com/paper/umons-submission-for-wmt18-multimodal
Repo https://github.com/jbdel/WMT18_MNMT
Framework pytorch
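
The deepGRU details are in the paper; as a rough sketch of the NIC-style baseline it builds on (not the submitted architecture itself), a GRU decoder conditioned on pooled image features looks like:

```python
import torch
import torch.nn as nn

class ImageConditionedGRU(nn.Module):
    def __init__(self, vocab_size, img_dim=2048, emb_dim=256, hid_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.init_h = nn.Linear(img_dim, hid_dim)  # image features -> initial hidden state
        self.gru = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, image_feats, tokens):
        h0 = torch.tanh(self.init_h(image_feats)).unsqueeze(0)  # 1 x B x H
        hidden, _ = self.gru(self.embed(tokens), h0)
        return self.out(hidden)                    # per-step vocabulary logits

model = ImageConditionedGRU(vocab_size=10000)
logits = model(torch.randn(4, 2048), torch.randint(0, 10000, (4, 12)))
print(logits.shape)  # torch.Size([4, 12, 10000])
```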