October 20, 2019

3333 words 16 mins read

Paper Group AWR 238

Neural Architecture Search with Bayesian Optimisation and Optimal Transport. Joint Anchor-Feature Refinement for Real-Time Accurate Object Detection in Images and Videos. Machine learning for predicting thermal power consumption of the Mars Express Spacecraft. Yuanfudao at SemEval-2018 Task 11: Three-way Attention and Relational Knowledge for Commonsense Machine Comprehension …

Neural Architecture Search with Bayesian Optimisation and Optimal Transport

Title Neural Architecture Search with Bayesian Optimisation and Optimal Transport
Authors Kirthevasan Kandasamy, Willie Neiswanger, Jeff Schneider, Barnabas Poczos, Eric Xing
Abstract Bayesian Optimisation (BO) refers to a class of methods for global optimisation of a function $f$ which is only accessible via point evaluations. It is typically used in settings where $f$ is expensive to evaluate. A common use case for BO in machine learning is model selection, where it is not possible to analytically model the generalisation performance of a statistical model, and we resort to noisy and expensive training and validation procedures to choose the best model. Conventional BO methods have focused on Euclidean and categorical domains, which, in the context of model selection, only permit tuning scalar hyper-parameters of machine learning algorithms. However, with the surge of interest in deep learning, there is an increasing demand to tune neural network \emph{architectures}. In this work, we develop NASBOT, a Gaussian-process-based BO framework for neural architecture search. To accomplish this, we develop a distance metric in the space of neural network architectures which can be computed efficiently via an optimal transport program. This distance might be of independent interest to the deep learning community as it may find applications outside of BO. We demonstrate that NASBOT outperforms other alternatives for architecture search in several cross-validation-based model selection tasks on multi-layer perceptrons and convolutional neural networks.
Tasks Bayesian Optimisation, Model Selection, Neural Architecture Search
Published 2018-02-11
URL http://arxiv.org/abs/1802.07191v3
PDF http://arxiv.org/pdf/1802.07191v3.pdf
PWC https://paperswithcode.com/paper/neural-architecture-search-with-bayesian
Repo https://github.com/kirthevasank/nasbot
Framework tf
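
To make the architecture-distance idea concrete, here is a toy sketch in the spirit of the paper's optimal-transport distance: each network is a sequence of (layer type, width) pairs, and layers are matched under a cost that penalises type, depth, and mass mismatches. Using an assignment problem instead of a full OT program, and the cost terms themselves, are simplifications for illustration; this is not the paper's actual distance.

```python
# Toy architecture distance, loosely inspired by NASBOT's OT framing.
# All costs and the assignment-based formulation are assumptions.
import numpy as np
from scipy.optimize import linear_sum_assignment

def layer_cost(a, b, depth_a, depth_b):
    """Cost of matching two layers: type + depth + width mismatch."""
    type_penalty = 0.0 if a[0] == b[0] else 1.0
    depth_penalty = abs(depth_a - depth_b)
    mass_penalty = abs(a[1] - b[1]) / max(a[1], b[1])
    return type_penalty + depth_penalty + mass_penalty

def arch_distance(net1, net2):
    """net = list of (layer_type, num_units), ordered input -> output."""
    n, m = len(net1), len(net2)
    # pad to a square matrix; 2.0 is the cost of matching a layer to "nothing"
    C = np.full((max(n, m), max(n, m)), 2.0)
    for i, a in enumerate(net1):
        for j, b in enumerate(net2):
            C[i, j] = layer_cost(a, b, i / n, j / m)
    rows, cols = linear_sum_assignment(C)
    return C[rows, cols].sum()

mlp_a = [("relu", 64), ("relu", 64), ("linear", 10)]
mlp_b = [("relu", 128), ("linear", 10)]
print(arch_distance(mlp_a, mlp_b))
```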

Joint Anchor-Feature Refinement for Real-Time Accurate Object Detection in Images and Videos

Title Joint Anchor-Feature Refinement for Real-Time Accurate Object Detection in Images and Videos
Authors Xingyu Chen, Junzhi Yu, Shihan Kong, Zhengxing Wu, Li Wen
Abstract Object detection has been vigorously investigated for years, but fast, accurate detection for real-world scenes remains a very challenging problem. Overcoming drawbacks of single-stage detectors, we take aim at precisely detecting objects for static and temporal scenes in real time. First, as a dual refinement mechanism, a novel anchor-offset detection is designed, which includes an anchor refinement, a feature location refinement, and a deformable detection head. This new detection mode is able to simultaneously perform two-step regression and capture accurate object features. Based on the anchor-offset detection, a dual refinement network (DRNet) is developed for high-performance static detection, where a multi-deformable head is further designed to leverage contextual information for describing objects. As for temporal detection in videos, temporal refinement networks (TRNet) and temporal dual refinement networks (TDRNet) are developed by propagating the refinement information across time. We also propose a soft refinement strategy to temporally match object motion with the previous refinement. Our proposed methods are evaluated on the PASCAL VOC, COCO, and ImageNet VID datasets. Extensive comparisons on static and temporal detection verify the superiority of DRNet, TRNet, and TDRNet. As a result, our developed approaches run at a fairly fast speed while achieving significantly enhanced detection accuracy, i.e., 84.4% mAP on VOC 2007, 83.6% mAP on VOC 2012, 69.4% mAP on VID 2017, and 42.4% AP on COCO. Finally, our methods are applied to online underwater object detection and grasping with an autonomous system, producing encouraging results. Codes are publicly available at https://github.com/SeanChenxy/TDRN.
Tasks Object Detection, Real-Time Object Detection
Published 2018-07-23
URL https://arxiv.org/abs/1807.08638v6
PDF https://arxiv.org/pdf/1807.08638v6.pdf
PWC https://paperswithcode.com/paper/towards-real-time-accurate-object-detection
Repo https://github.com/SeanChenxy/TDRN
Framework pytorch
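
The anchor-offset idea, reduced to its regression core: one set of predicted offsets refines the anchors, and a second set regresses final boxes from the refined anchors. The sketch below uses the standard (dx, dy, dw, dh) box encoding and invented numbers; the actual DRNet heads, feature-location refinement, and deformable convolutions are not modelled.

```python
# Minimal numpy sketch of two-step ("anchor refinement") box regression.
import numpy as np

def apply_deltas(boxes, deltas):
    """boxes: (N,4) as (cx, cy, w, h); deltas: (N,4) as (dx, dy, dw, dh)."""
    cx = boxes[:, 0] + deltas[:, 0] * boxes[:, 2]
    cy = boxes[:, 1] + deltas[:, 1] * boxes[:, 3]
    w = boxes[:, 2] * np.exp(deltas[:, 2])
    h = boxes[:, 3] * np.exp(deltas[:, 3])
    return np.stack([cx, cy, w, h], axis=1)

anchors = np.array([[50.0, 50.0, 32.0, 32.0]])
step1 = np.array([[0.10, -0.05, 0.2, 0.00]])  # anchor-refinement output
step2 = np.array([[0.02, 0.01, -0.1, 0.05]])  # detection-head output

refined = apply_deltas(anchors, step1)  # refined anchors
final = apply_deltas(refined, step2)    # final boxes from refined anchors
print(final)
```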

Machine learning for predicting thermal power consumption of the Mars Express Spacecraft

Title Machine learning for predicting thermal power consumption of the Mars Express Spacecraft
Authors Matej Petković, Redouane Boumghar, Martin Breskvar, Sašo Džeroski, Dragi Kocev, Jurica Levatić, Luke Lucas, Aljaž Osojnik, Bernard Ženko, Nikola Simidjievski
Abstract The thermal subsystem of the Mars Express (MEX) spacecraft keeps the on-board equipment within its pre-defined operating temperature range. To plan and optimize the scientific operations of MEX, its operators need to estimate in advance, as accurately as possible, the power consumption of the thermal subsystem. The remaining power can then be allocated for scientific purposes. We present a machine learning pipeline for efficiently constructing accurate models that predict the power consumption of the thermal subsystem on board MEX. In particular, we employ state-of-the-art feature engineering approaches to transform raw telemetry data, which is in turn used to construct models with different state-of-the-art machine learning methods. We show that the proposed pipeline considerably improves on our previous (competition-winning) work in terms of time efficiency and predictive performance. Moreover, while achieving superior predictive performance, the constructed models also provide important insight into the spacecraft’s behavior, allowing for further analyses and optimal planning of MEX’s operation.
Tasks Feature Engineering
Published 2018-09-03
URL http://arxiv.org/abs/1809.00542v2
PDF http://arxiv.org/pdf/1809.00542v2.pdf
PWC https://paperswithcode.com/paper/machine-learning-for-predicting-thermal-power
Repo https://github.com/shinjjo/MarsExpress
Framework none
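
A hedged sketch of the pipeline's shape: aggregate raw telemetry over fixed time windows into features, then fit a standard regressor. The column names, window size, and synthetic data below are invented for illustration; the paper's feature engineering and model selection are considerably richer.

```python
# Window-based feature engineering on synthetic "telemetry", then a
# standard regressor. All names and numbers here are placeholders.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
t = pd.date_range("2015-01-01", periods=2000, freq="h")
telemetry = pd.DataFrame({
    "sun_distance": rng.normal(1.5, 0.1, len(t)),
    "eclipse_flag": rng.integers(0, 2, len(t)),
}, index=t)
power = 10 + 5 * telemetry["sun_distance"] + rng.normal(0, 0.5, len(t))

# aggregate each raw channel over a 6-hour window into mean/min/max features
feats = telemetry.rolling("6h").agg(["mean", "min", "max"])
feats.columns = ["_".join(c) for c in feats.columns]

model = GradientBoostingRegressor().fit(feats, power)
print(model.predict(feats.iloc[:3]))
```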

Yuanfudao at SemEval-2018 Task 11: Three-way Attention and Relational Knowledge for Commonsense Machine Comprehension

Title Yuanfudao at SemEval-2018 Task 11: Three-way Attention and Relational Knowledge for Commonsense Machine Comprehension
Authors Liang Wang, Meng Sun, Wei Zhao, Kewei Shen, Jingming Liu
Abstract This paper describes our system for SemEval-2018 Task 11: Machine Comprehension using Commonsense Knowledge. We use Three-way Attentive Networks (TriAN) to model interactions between the passage, question and answers. To incorporate commonsense knowledge, we augment the input with relation embedding from the graph of general knowledge ConceptNet (Speer et al., 2017). As a result, our system achieves state-of-the-art performance with 83.95% accuracy on the official test data. Code is publicly available at https://github.com/intfloat/commonsense-rc
Tasks Reading Comprehension
Published 2018-03-01
URL http://arxiv.org/abs/1803.00191v5
PDF http://arxiv.org/pdf/1803.00191v5.pdf
PWC https://paperswithcode.com/paper/yuanfudao-at-semeval-2018-task-11-three-way
Repo https://github.com/akamath11/Multiple-choice-comprehension
Framework pytorch
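
The building block behind the three-way attention is ordinary sequence-to-sequence attention, applied between each pair of inputs. Below is a minimal numpy sketch assuming plain dot-product scores; the learned projections, relation embeddings from ConceptNet, and RNN encoders of the actual model are omitted.

```python
# Word-level attention between two sequences, the primitive TriAN applies
# between passage, question, and answer. Dimensions are arbitrary.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def seq_attention(query, context):
    """query: (Lq, d), context: (Lc, d) -> (Lq, d) attended context."""
    scores = query @ context.T          # (Lq, Lc) similarity
    weights = softmax(scores, axis=-1)  # attend over context positions
    return weights @ context

rng = np.random.default_rng(0)
passage = rng.normal(size=(30, 64))
question = rng.normal(size=(8, 64))
answer = rng.normal(size=(4, 64))

# three-way: the answer attends to both question and passage, and the
# enriched representations are concatenated downstream
ans_q = seq_attention(answer, question)
ans_p = seq_attention(answer, passage)
print(np.concatenate([answer, ans_q, ans_p], axis=-1).shape)  # (4, 192)
```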

Anonymous Walk Embeddings

Title Anonymous Walk Embeddings
Authors Sergey Ivanov, Evgeny Burnaev
Abstract The task of representing entire graphs has seen a surge of prominent results, mainly due to learning convolutional neural networks (CNNs) on graph-structured data. While CNNs demonstrate state-of-the-art performance on the graph classification task, such methods are supervised and therefore steer away from the original problem of network representation in a task-agnostic manner. Here, we propose an approach for embedding entire graphs and show that our feature representations with an SVM classifier increase the classification accuracy of CNN algorithms and traditional graph kernels. For this we describe a recently discovered graph object, the anonymous walk, on which we design task-independent algorithms for learning graph representations in both explicit and distributed ways. Overall, our work provides a new, scalable, unsupervised approach for learning state-of-the-art representations of entire graphs.
Tasks Graph Classification
Published 2018-05-30
URL http://arxiv.org/abs/1805.11921v3
PDF http://arxiv.org/pdf/1805.11921v3.pdf
PWC https://paperswithcode.com/paper/anonymous-walk-embeddings
Repo https://github.com/nd7141/AWE
Framework tf
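
The anonymous walk itself is simple enough to show in full: each node in a random walk is replaced by the index of its first occurrence, so walks are compared by their revisit pattern rather than node identities. The empirical distribution over anonymous walks below corresponds to the paper's explicit (feature-based) embedding; walk length and sample count are arbitrary.

```python
# Anonymous walks and a sampled distribution over them.
import random
from collections import Counter

def anonymize(walk):
    """[v3, v7, v3, v1] -> (0, 1, 0, 2): index of first occurrence."""
    first_seen, out = {}, []
    for node in walk:
        if node not in first_seen:
            first_seen[node] = len(first_seen)
        out.append(first_seen[node])
    return tuple(out)

def anonymous_walk_distribution(adj, length, n_samples, seed=0):
    rng = random.Random(seed)
    counts = Counter()
    nodes = list(adj)
    for _ in range(n_samples):
        walk = [rng.choice(nodes)]
        for _ in range(length - 1):
            walk.append(rng.choice(adj[walk[-1]]))
        counts[anonymize(walk)] += 1
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
print(anonymous_walk_distribution(triangle, length=4, n_samples=1000))
```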

Benchmarking Framework for Performance-Evaluation of Causal Inference Analysis

Title Benchmarking Framework for Performance-Evaluation of Causal Inference Analysis
Authors Yishai Shimoni, Chen Yanover, Ehud Karavani, Yaara Goldschmidt
Abstract Causal inference analysis is the estimation of the effects of actions on outcomes. In the context of healthcare data, this means estimating the effect of counter-factual treatments (i.e., treatments that were not observed) on a patient’s outcome. Compared to classic machine learning methods, evaluation and validation of causal inference analysis are more challenging because ground-truth data on counter-factual outcomes can never be obtained in any real-world scenario. Here, we present a comprehensive framework for benchmarking algorithms that estimate causal effects. The framework includes unlabeled data for prediction, labeled data for validation, and code for automatic evaluation of algorithm predictions using both established and novel metrics. The data is based on real-world covariates, and the treatment assignments and outcomes are based on simulations, which provides the basis for validation. In this framework we address two questions: one of scaling, and the other of data-censoring. The framework is available as open source code at https://github.com/IBM-HRL-MLHLS/IBM-Causal-Inference-Benchmarking-Framework
Tasks Causal Inference
Published 2018-02-14
URL http://arxiv.org/abs/1802.05046v2
PDF http://arxiv.org/pdf/1802.05046v2.pdf
PWC https://paperswithcode.com/paper/benchmarking-framework-for-performance
Repo https://github.com/IBM-HRL-MLHLS/IBM-Causal-Inference-Benchmarking-Framework
Framework none
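
The key affordance of simulated outcomes is that per-individual effects are known, so an estimator's predictions can be scored directly. A hedged illustration follows; the synthetic data and the two metrics (RMSE on individual effects, bias on the average effect) are stand-ins, not the framework's exact evaluation code.

```python
# Scoring an effect estimator against simulated ground truth.
import numpy as np

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
y0 = x + rng.normal(0, 0.5, n)   # simulated untreated outcome
y1 = y0 + 2.0 + 0.5 * x          # simulated treated outcome
true_ite = y1 - y0               # known only because it is simulated

# stand-in for some model's predicted individual effects
est_ite = true_ite + rng.normal(0, 0.3, n)

rmse_ite = np.sqrt(np.mean((est_ite - true_ite) ** 2))
bias_ate = est_ite.mean() - true_ite.mean()
print(f"ITE RMSE: {rmse_ite:.3f}, ATE bias: {bias_ate:.3f}")
```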

Dense and Diverse Capsule Networks: Making the Capsules Learn Better

Title Dense and Diverse Capsule Networks: Making the Capsules Learn Better
Authors Sai Samarth R Phaye, Apoorva Sikka, Abhinav Dhall, Deepti Bathula
Abstract The past few years have witnessed exponential growth of interest in deep learning methodologies, with rapidly improving accuracies and reduced computational complexity. In particular, architectures using Convolutional Neural Networks (CNNs) have produced state-of-the-art performances for image classification and object recognition tasks. Recently, Capsule Networks (CapsNet) achieved a significant increase in performance by addressing an inherent limitation of CNNs in encoding pose and deformation. Inspired by such advancement, we asked ourselves, can we do better? We propose Dense Capsule Networks (DCNet) and Diverse Capsule Networks (DCNet++). The two proposed frameworks customize CapsNet by replacing the standard convolutional layers with densely connected convolutions. This helps in incorporating feature maps learned by different layers in forming the primary capsules. DCNet essentially adds a deeper convolution network, which leads to learning of discriminative feature maps. Additionally, DCNet++ uses a hierarchical architecture to learn capsules that represent spatial information in a fine-to-coarser manner, which makes it more efficient for learning complex data. Experiments on the image classification task using benchmark datasets demonstrate the efficacy of the proposed architectures. DCNet achieves state-of-the-art performance (99.75%) on the MNIST dataset with a twenty-fold decrease in total training iterations over the conventional CapsNet. Furthermore, DCNet++ performs better than CapsNet on the SVHN dataset (96.90%) and outperforms an ensemble of seven CapsNet models on CIFAR-10 by 0.31% with a seven-fold decrease in the number of parameters.
Tasks Image Classification, Object Recognition
Published 2018-05-10
URL http://arxiv.org/abs/1805.04001v1
PDF http://arxiv.org/pdf/1805.04001v1.pdf
PWC https://paperswithcode.com/paper/dense-and-diverse-capsule-networks-making-the
Repo https://github.com/ssrp/Multi-level-DCNet
Framework tf
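
The central modification, sketched in PyTorch: feature maps from several convolutional layers are concatenated, DenseNet-style, before being reshaped into primary capsules, instead of coming from a single standard convolution. Channel counts, kernel sizes, and the capsule dimension below are placeholders, not the paper's configuration.

```python
# Densely connected convolutions feeding primary capsules (illustrative).
import torch
import torch.nn as nn

class DensePrimaryCaps(nn.Module):
    def __init__(self, caps_dim=8):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, padding=1)
        self.conv2 = nn.Conv2d(32, 32, 3, padding=1)
        self.conv3 = nn.Conv2d(64, 32, 3, padding=1)  # sees conv1 + conv2
        self.caps_dim = caps_dim

    def forward(self, x):
        f1 = torch.relu(self.conv1(x))
        f2 = torch.relu(self.conv2(f1))
        f3 = torch.relu(self.conv3(torch.cat([f1, f2], dim=1)))
        dense = torch.cat([f1, f2, f3], dim=1)  # 96 channels in total
        b, c, h, w = dense.shape
        # group channels from all levels into capsules of dim caps_dim
        return dense.view(b, c // self.caps_dim, self.caps_dim, h, w)

caps = DensePrimaryCaps()(torch.randn(2, 1, 28, 28))
print(caps.shape)  # torch.Size([2, 12, 8, 28, 28])
```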

Auditing Data Provenance in Text-Generation Models

Title Auditing Data Provenance in Text-Generation Models
Authors Congzheng Song, Vitaly Shmatikov
Abstract To help enforce data-protection regulations such as GDPR and detect unauthorized uses of personal data, we develop a new \emph{model auditing} technique that helps users check if their data was used to train a machine learning model. We focus on auditing deep-learning models that generate natural-language text, including word prediction and dialog generation. These models are at the core of popular online services and are often trained on personal data such as users’ messages, searches, chats, and comments. We design and evaluate a black-box auditing method that can detect, with very few queries to a model, if a particular user’s texts were used to train it (among thousands of other users). We empirically show that our method can successfully audit well-generalized models that are not overfitted to the training data. We also analyze how text-generation models memorize word sequences and explain why this memorization makes them amenable to auditing.
Tasks Text Generation
Published 2018-11-01
URL https://arxiv.org/abs/1811.00513v2
PDF https://arxiv.org/pdf/1811.00513v2.pdf
PWC https://paperswithcode.com/paper/the-natural-auditor-how-to-tell-if-someone
Repo https://github.com/csong27/auditing-text-generation
Framework none
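
A toy of the auditing intuition: query the model on a user's texts, record how highly it ranks the true next words, and compare that statistic against the same statistic for users known to be outside the training set. The paper trains an audit classifier over such rank features; here synthetic ranks and a simple percentile threshold stand in.

```python
# Rank-based membership audit sketch; all data here is synthetic.
import numpy as np

rng = np.random.default_rng(0)

def mean_true_word_rank(ranks):
    """ranks: model's rank of each true next word (1 = top prediction)."""
    return np.mean(ranks)

# stand-in ranks: a trained-on user's words tend to be ranked higher
# (memorisation), reference users' words less so
candidate_ranks = rng.geometric(0.30, size=200)               # suspected member
reference_ranks = [rng.geometric(0.05, size=200) for _ in range(50)]

reference_stats = [mean_true_word_rank(r) for r in reference_ranks]
threshold = np.percentile(reference_stats, 5)
is_member = mean_true_word_rank(candidate_ranks) < threshold
print("audit verdict: data", "was" if is_member else "was not", "likely used")
```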

Deep Metric Learning by Online Soft Mining and Class-Aware Attention

Title Deep Metric Learning by Online Soft Mining and Class-Aware Attention
Authors Xinshao Wang, Yang Hua, Elyor Kodirov, Guosheng Hu, Neil M. Robertson
Abstract Deep metric learning aims to learn a deep embedding that can capture the semantic similarity of data points. Given the availability of massive training samples, deep metric learning is known to suffer from slow convergence due to a large fraction of trivial samples. Therefore, most existing methods generally resort to sample mining strategies for selecting nontrivial samples to accelerate convergence and improve performance. In this work, we identify two critical limitations of the sample mining methods, and provide solutions for both of them. First, previous mining methods assign one binary score to each sample, i.e., dropping or keeping it, so they only select a subset of relevant samples in a mini-batch. Therefore, we propose a novel sample mining method, called Online Soft Mining (OSM), which assigns one continuous score to each sample to make use of all samples in the mini-batch. OSM learns extended manifolds that preserve useful intraclass variances by focusing on more similar positives. Second, the existing methods are easily influenced by outliers as they are generally included in the mined subset. To address this, we introduce Class-Aware Attention (CAA) that assigns little attention to abnormal data samples. Furthermore, by combining OSM and CAA, we propose a novel weighted contrastive loss to learn discriminative embeddings. Extensive experiments on two fine-grained visual categorisation datasets and two video-based person re-identification benchmarks show that our method significantly outperforms the state-of-the-art.
Tasks Metric Learning, Person Re-Identification, Semantic Similarity, Semantic Textual Similarity, Video-Based Person Re-Identification
Published 2018-11-04
URL https://arxiv.org/abs/1811.01459v3
PDF https://arxiv.org/pdf/1811.01459v3.pdf
PWC https://paperswithcode.com/paper/deep-metric-learning-by-online-soft-mining
Repo https://github.com/XinshaoAmosWang/OSM_CAA_WeightedContrastiveLoss
Framework none
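
A hedged sketch of the two ingredients: Online Soft Mining replaces the binary keep/drop decision with a continuous per-pair weight, and Class-Aware Attention down-weights likely outliers via the classifier's confidence in each sample's label. The specific weight forms below are illustrative, not the paper's exact formulation.

```python
# Soft-weighted contrastive loss with a class-confidence attention term.
import torch

def soft_weighted_contrastive(dist, same_class, class_conf,
                              margin=0.5, sigma=0.5):
    """dist: (P,) pairwise embedding distances; same_class: (P,) bool;
    class_conf: (P,) classifier confidence for the pair (CAA stand-in)."""
    pos_w = torch.exp(-dist ** 2 / sigma) * class_conf  # soft mining weight
    neg_w = class_conf                                  # attention only
    pos_loss = pos_w * dist ** 2
    neg_loss = neg_w * torch.clamp(margin - dist, min=0) ** 2
    return torch.where(same_class, pos_loss, neg_loss).mean()

dist = torch.tensor([0.2, 0.9, 0.4])
same = torch.tensor([True, False, True])
conf = torch.tensor([0.9, 0.8, 0.2])  # last pair looks like an outlier
print(soft_weighted_contrastive(dist, same, conf))
```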

On Optimally Partitioning Variable-Byte Codes

Title On Optimally Partitioning Variable-Byte Codes
Authors Giulio Ermanno Pibiri, Rossano Venturini
Abstract The ubiquitous Variable-Byte encoding is one of the fastest compressed representations for integer sequences. However, its compression ratio is usually not competitive with that of more sophisticated encoders, especially when the integers to be compressed are small, which is the typical case for inverted indexes. This paper shows that the compression ratio of Variable-Byte can be improved by 2x by adopting a partitioned representation of the inverted lists. This makes Variable-Byte surprisingly competitive in space with the best bit-aligned encoders, hence disproving the folklore belief that Variable-Byte is space-inefficient for inverted index compression. Despite the significant space savings, we show that our optimization comes almost for free: we introduce an optimal partitioning algorithm whose linear-time complexity leaves indexing time unaffected, and we show, through an extensive experimental analysis and comparison with several other state-of-the-art encoders, that the query processing speed of Variable-Byte is preserved.
Tasks
Published 2018-04-29
URL https://arxiv.org/abs/1804.10949v2
PDF https://arxiv.org/pdf/1804.10949v2.pdf
PWC https://paperswithcode.com/paper/on-optimally-partitioning-variable-byte-codes
Repo https://github.com/jermp/opt_vbyte
Framework none
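
For reference, here is plain (unpartitioned) Variable-Byte, the baseline the paper optimises: each integer is split into 7-bit chunks, and a byte's high bit marks the final chunk of an integer. The optimal partitioning algorithm itself is not reproduced here.

```python
# Variable-Byte encoding/decoding of non-negative integers.
def vbyte_encode(values):
    out = bytearray()
    for v in values:
        while v >= 128:
            out.append(v & 0x7F)  # 7 payload bits, continuation bit clear
            v >>= 7
        out.append(v | 0x80)      # high bit set: last byte of this value
    return bytes(out)

def vbyte_decode(data):
    values, v, shift = [], 0, 0
    for byte in data:
        if byte & 0x80:           # terminator byte: finish this integer
            values.append(v | ((byte & 0x7F) << shift))
            v, shift = 0, 0
        else:
            v |= byte << shift
            shift += 7
    return values

docs = [3, 130, 1000000]
assert vbyte_decode(vbyte_encode(docs)) == docs
print(vbyte_encode(docs).hex())
```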

Git Loss for Deep Face Recognition

Title Git Loss for Deep Face Recognition
Authors Alessandro Calefati, Muhammad Kamran Janjua, Shah Nawaz, Ignazio Gallo
Abstract Convolutional Neural Networks (CNNs) have been widely used in computer vision tasks, such as face recognition and verification, and have achieved state-of-the-art results due to their ability to capture discriminative deep features. Conventionally, CNNs have been trained with softmax as the supervision signal to penalize the classification loss. In order to further enhance the discriminative capability of deep features, we introduce a joint supervision signal, Git loss, which leverages the softmax and center loss functions. The aim of our loss function is to minimize the intra-class variations as well as maximize the inter-class distances. Such minimization and maximization of deep features are considered ideal for the face recognition task. We perform experiments on two popular face recognition benchmark datasets, Labeled Faces in the Wild (LFW) and YouTube Faces (YTF), and show that our proposed loss function achieves maximum separability between deep face features of different identities and state-of-the-art accuracy on both. It should be noted that the major objective of Git loss is to achieve maximum separability between deep features of divergent identities.
Tasks Face Identification, Face Recognition, Face Verification
Published 2018-07-23
URL http://arxiv.org/abs/1807.08512v4
PDF http://arxiv.org/pdf/1807.08512v4.pdf
PWC https://paperswithcode.com/paper/git-loss-for-deep-face-recognition
Repo https://github.com/kjanjua26/Git-Loss-For-Deep-Face-Recognition
Framework tf
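
A hedged PyTorch sketch of the joint supervision: cross-entropy plus a pull term toward the sample's own class centre (center loss) plus a push term away from other classes' centres. The exact form of the push term and the weighting coefficients are illustrative assumptions, not the paper's published formulation.

```python
# Softmax + centre-pull + centre-push joint loss (illustrative form).
import torch
import torch.nn.functional as F

def git_style_loss(feats, logits, labels, centers,
                   lam_pull=0.01, lam_push=0.01):
    ce = F.cross_entropy(logits, labels)
    own = centers[labels]                                 # (B, d)
    pull = ((feats - own) ** 2).sum(dim=1).mean()         # intra-class compactness
    d2 = torch.cdist(feats, centers) ** 2                 # (B, C) to every centre
    mask = torch.ones_like(d2).scatter_(1, labels[:, None], 0.0)
    push = ((1.0 / (1.0 + d2)) * mask).sum(dim=1).mean()  # inter-class separation
    return ce + lam_pull * pull + lam_push * push

feats = torch.randn(8, 128)
logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
centers = torch.randn(10, 128)
print(git_style_loss(feats, logits, labels, centers))
```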

Object Detection from Scratch with Deep Supervision

Title Object Detection from Scratch with Deep Supervision
Authors Zhiqiang Shen, Zhuang Liu, Jianguo Li, Yu-Gang Jiang, Yurong Chen, Xiangyang Xue
Abstract We propose Deeply Supervised Object Detectors (DSOD), an object detection framework that can be trained from scratch. Recent advances in object detection heavily depend on off-the-shelf models pre-trained on large-scale classification datasets like ImageNet and OpenImage. However, one problem is that adopting pre-trained models from classification for the detection task may incur learning bias due to the different objective functions and diverse distributions of object categories. Techniques like fine-tuning on the detection task could alleviate this issue to some extent but are still not fundamental. Furthermore, transferring these pre-trained models across discrepant domains will be more difficult (e.g., from RGB to depth images). Thus, a better solution to handle these critical problems is to train object detectors from scratch, which motivates our proposed method. Previous efforts in this direction mainly failed because of limited training data and naive backbone network structures for object detection. In DSOD, we contribute a set of design principles for learning object detectors from scratch. One of the key principles is deep supervision, enabled by layer-wise dense connections in both the backbone networks and the prediction layers, which plays a critical role in learning good detectors from scratch. After incorporating several other principles, we build our DSOD on the single-shot detection framework (SSD). We evaluate our method on the PASCAL VOC 2007, 2012 and COCO datasets. DSOD achieves consistently better results than the state-of-the-art methods with much more compact models. Specifically, DSOD outperforms the baseline method SSD on all three benchmarks while requiring only 1/2 the parameters. We also observe that DSOD can achieve comparable or slightly better results than Mask RCNN + FPN (under similar input size) with only 1/3 the parameters, using no extra data or pre-trained models.
Tasks Object Detection
Published 2018-09-25
URL http://arxiv.org/abs/1809.09294v2
PDF http://arxiv.org/pdf/1809.09294v2.pdf
PWC https://paperswithcode.com/paper/object-detection-from-scratch-with-deep
Repo https://github.com/szq0214/DSOD
Framework caffe2
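
A sketch of the "implicit deep supervision" principle: with DenseNet-style layer-wise concatenation, every intermediate layer has a short gradient path to the objective, so no auxiliary losses are needed. The block below only illustrates that connectivity; DSOD's stem, growth rates, and SSD prediction layers are not reproduced.

```python
# Dense block illustrating layer-wise connections (PyTorch sketch).
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    def __init__(self, in_ch, growth, n_layers):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Conv2d(in_ch + i * growth, growth, 3, padding=1)
            for i in range(n_layers)
        )

    def forward(self, x):
        feats = [x]
        for conv in self.layers:
            # every layer sees all earlier outputs and is seen by all later
            # ones, giving each a direct path to the detection losses
            feats.append(torch.relu(conv(torch.cat(feats, dim=1))))
        return torch.cat(feats, dim=1)

out = DenseBlock(in_ch=32, growth=16, n_layers=4)(torch.randn(1, 32, 64, 64))
print(out.shape)  # torch.Size([1, 96, 64, 64])
```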

ArcFace: Additive Angular Margin Loss for Deep Face Recognition

Title ArcFace: Additive Angular Margin Loss for Deep Face Recognition
Authors Jiankang Deng, Jia Guo, Niannan Xue, Stefanos Zafeiriou
Abstract One of the main challenges in feature learning using Deep Convolutional Neural Networks (DCNNs) for large-scale face recognition is the design of appropriate loss functions that enhance discriminative power. Centre loss penalises the distance between the deep features and their corresponding class centres in the Euclidean space to achieve intra-class compactness. SphereFace assumes that the linear transformation matrix in the last fully connected layer can be used as a representation of the class centres in an angular space and penalises the angles between the deep features and their corresponding weights in a multiplicative way. Recently, a popular line of research is to incorporate margins in well-established loss functions in order to maximise face class separability. In this paper, we propose an Additive Angular Margin Loss (ArcFace) to obtain highly discriminative features for face recognition. The proposed ArcFace has a clear geometric interpretation due to the exact correspondence to the geodesic distance on the hypersphere. We present arguably the most extensive experimental evaluation of all the recent state-of-the-art face recognition methods on over 10 face recognition benchmarks, including a new large-scale image database with trillions of pairs and a large-scale video dataset. We show that ArcFace consistently outperforms the state-of-the-art and can be easily implemented with negligible computational overhead. We release all refined training data, training codes, pre-trained models and training logs, which will help reproduce the results in this paper.
Tasks Face Identification, Face Recognition, Face Verification
Published 2018-01-23
URL http://arxiv.org/abs/1801.07698v3
PDF http://arxiv.org/pdf/1801.07698v3.pdf
PWC https://paperswithcode.com/paper/arcface-additive-angular-margin-loss-for-deep
Repo https://github.com/chenggongliang/arcface
Framework mxnet
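
The core operation is compact enough to write out: logits are cosines between L2-normalised features and class weights, and an additive angular margin m is inserted into the target class's angle before scaling by s. The values s=64 and m=0.5 follow common practice and are an assumption here, not a quote from the paper.

```python
# ArcFace-style logits: cos(theta + m) on the target class, scaled by s.
import torch
import torch.nn.functional as F

def arcface_logits(feats, weights, labels, s=64.0, m=0.5):
    cos = F.normalize(feats) @ F.normalize(weights).t()  # (B, C) cos(theta)
    theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
    target = F.one_hot(labels, weights.size(0)).bool()
    theta = torch.where(target, theta + m, theta)        # additive angular margin
    return s * torch.cos(theta)

feats = torch.randn(4, 512)
weights = torch.randn(10, 512)  # one row per identity
labels = torch.tensor([0, 3, 3, 7])
loss = F.cross_entropy(arcface_logits(feats, weights, labels), labels)
print(loss)
```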

A Transition-based Algorithm for Unrestricted AMR Parsing

Title A Transition-based Algorithm for Unrestricted AMR Parsing
Authors David Vilares, Carlos Gómez-Rodríguez
Abstract Non-projective parsing can be useful to handle cycles and reentrancy in AMR graphs. We explore this idea and introduce a greedy left-to-right non-projective transition-based parser. At each parsing configuration, an oracle decides whether to create a concept or whether to connect a pair of existing concepts. The algorithm handles reentrancy and arbitrary cycles natively, i.e. within the transition system itself. The model is evaluated on the LDC2015E86 corpus, obtaining results close to the state of the art, including a Smatch of 64%, and showing good behavior on reentrant edges.
Tasks Amr Parsing
Published 2018-05-23
URL http://arxiv.org/abs/1805.09007v1
PDF http://arxiv.org/pdf/1805.09007v1.pdf
PWC https://paperswithcode.com/paper/a-transition-based-algorithm-for-unrestricted
Repo https://github.com/aghie/tb-amr
Framework tf
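
A heavily simplified skeleton of the greedy left-to-right mechanics: actions either turn the front of the buffer into a concept or connect two existing concepts, and because edges may connect arbitrary existing concepts, reentrancy (and cycles) come for free. The action set and scripted action sequence below are toy stand-ins for the paper's transition system and trained oracle.

```python
# Toy transition loop: CONFIRM makes a concept, SHIFT drops a token,
# EDGE connects any two existing concepts (enabling reentrancy).
def parse(tokens, actions):
    concepts, edges, buffer = [], [], list(tokens)
    for act in actions:  # an oracle/classifier would choose these
        if act == "CONFIRM":
            concepts.append(buffer.pop(0))
        elif act == "SHIFT":
            buffer.pop(0)
        else:            # ("EDGE", label, head_idx, dep_idx)
            _, label, h, d = act
            edges.append((concepts[h], label, concepts[d]))
    return concepts, edges

# "the boy wants to sleep": 'boy' is reentrant (argument of both verbs)
tokens = ["the", "boy", "wants", "to", "sleep"]
actions = ["SHIFT", "CONFIRM", "CONFIRM", "SHIFT", "CONFIRM",
           ("EDGE", "ARG0", 1, 0), ("EDGE", "ARG1", 1, 2),
           ("EDGE", "ARG0", 2, 0)]  # reentrancy: boy is ARG0 of sleep too
print(parse(tokens, actions))
```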

Data Consistency Approach to Model Validation

Title Data Consistency Approach to Model Validation
Authors Andreas Svensson, Dave Zachariah, Petre Stoica, Thomas B. Schön
Abstract In scientific inference problems, the underlying statistical modeling assumptions have a crucial impact on the end results. There exist, however, only a few automatic means for validating these fundamental modeling assumptions. The contribution in this paper is a general criterion to evaluate the consistency of a set of statistical models with respect to observed data. This is achieved by automatically gauging the models’ ability to generate data that is similar to the observed data. Importantly, the criterion follows from the model class itself and is therefore directly applicable to a broad range of inference problems with varying data types, ranging from independent univariate data to high-dimensional time series. The proposed data consistency criterion is illustrated, evaluated and compared to several well-established methods using three synthetic and two real data sets.
Tasks Time Series
Published 2018-08-17
URL https://arxiv.org/abs/1808.05889v2
PDF https://arxiv.org/pdf/1808.05889v2.pdf
PWC https://paperswithcode.com/paper/data-consistency-approach-to-model-validation
Repo https://github.com/saerdna-se/consistency-criterion
Framework none
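
A hedged sketch of the criterion's core move, in the style of a parametric bootstrap: simulate replicate datasets from the fitted model and check where an observed test statistic falls among the simulated ones. The paper's criterion is more general and follows from the model class itself; this only conveys the flavour.

```python
# Gauging whether a fitted model can generate data like the observed data.
import numpy as np

rng = np.random.default_rng(0)
observed = rng.standard_t(df=2, size=300)  # heavy-tailed "truth"

# candidate model: a Gaussian fitted to the observed data
mu, sigma = observed.mean(), observed.std()

def stat(x):
    return np.abs(x).max()  # statistic sensitive to tails

sims = [stat(rng.normal(mu, sigma, observed.size)) for _ in range(2000)]
p = np.mean([s >= stat(observed) for s in sims])
print(f"consistency p-value: {p:.4f}")  # tiny => model inconsistent with data
```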