April 1, 2020


Paper Group ANR 515


BiDet: An Efficient Binarized Object Detector


Title	BiDet: An Efficient Binarized Object Detector
Authors	Ziwei Wang, Ziyi Wu, Jiwen Lu, Jie Zhou
Abstract	In this paper, we propose a binarized neural network learning method called BiDet for efficient object detection. Conventional network binarization methods directly quantize the weights and activations in one-stage or two-stage detectors with constrained representational capacity, so that the information redundancy in the networks causes numerous false positives and degrades the performance significantly. On the contrary, our BiDet fully utilizes the representational capacity of the binary neural networks for object detection by redundancy removal, through which the detection precision is enhanced with alleviated false positives. Specifically, we generalize the information bottleneck (IB) principle to object detection, where the amount of information in the high-level feature maps is constrained and the mutual information between the feature maps and object detection is maximized. Meanwhile, we learn sparse object priors so that the posteriors are concentrated on informative detection prediction with false positive elimination. Extensive experiments on the PASCAL VOC and COCO datasets show that our method outperforms the state-of-the-art binary neural networks by a sizable margin.
Tasks	Object Detection
Published	2020-03-09
URL	https://arxiv.org/abs/2003.03961v1
PDF	https://arxiv.org/pdf/2003.03961v1.pdf
PWC	https://paperswithcode.com/paper/bidet-an-efficient-binarized-object-detector
Repo
Framework
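The information bottleneck idea in the BiDet abstract above can be written compactly. This is only a sketch of the general form: the symbols $X$ (input image), $F$ (high-level feature map), $C$ and $L$ (predicted classes and locations), $\theta$ (network parameters) and the trade-off weight $\beta$ are notation assumed here for illustration, not taken from the abstract.

```latex
\min_{\theta} \; I(X; F) \;-\; \beta \, I(F; C, L)
```

The first term limits how much information about the input the feature map retains (redundancy removal), while the second rewards feature maps that stay informative about the detection output.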

Structure-Property Maps with Kernel Principal Covariates Regression


Title	Structure-Property Maps with Kernel Principal Covariates Regression
Authors	Benjamin A. Helfrecht, Rose K. Cersonsky, Guillaume Fraux, Michele Ceriotti
Abstract	Data analysis based on linear methods, which look for correlations between the features describing samples in a data set, or between features and properties associated with the samples, constitutes the simplest, most robust, and transparent approach to the automatic processing of large amounts of data for building supervised or unsupervised machine learning models. Principal covariates regression (PCovR) is an under-appreciated method that interpolates between principal component analysis and linear regression, and can be used to conveniently reveal structure-property relations in terms of simple-to-interpret, low-dimensional maps. Here we provide a pedagogic overview of these data analysis schemes, including the use of the kernel trick to introduce an element of non-linearity in the process, while maintaining most of the convenience and the simplicity of linear approaches. We then introduce a kernelized version of PCovR and a sparsified extension, followed by a feature-selection scheme based on the CUR matrix decomposition modified to incorporate the same hybrid loss that underlies PCovR. We demonstrate the performance of these approaches in revealing and predicting structure-property relations in chemistry and materials science.
Tasks	Feature Selection
Published	2020-02-12
URL	https://arxiv.org/abs/2002.05076v1
PDF	https://arxiv.org/pdf/2002.05076v1.pdf
PWC	https://paperswithcode.com/paper/structure-property-maps-with-kernel-principal
Repo
Framework
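As a rough illustration of how PCovR interpolates between principal component analysis and linear regression, the following sketch builds a mixed Gram matrix from the features and the least-squares predictions and takes its leading eigenvectors as the low-dimensional map. The function name `pcovr_projection`, the mixing parameter `alpha`, and the normalization are assumptions made for this example (one common linear, sample-space formulation), not details stated in the abstract.

```python
import numpy as np

def pcovr_projection(X, Y, alpha=0.5, n_components=2):
    """Hypothetical linear PCovR sketch: alpha=1 behaves like PCA,
    alpha=0 keeps only directions that explain the regression output."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float).reshape(len(X), -1)
    # Center features and properties.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # Ordinary least-squares prediction of Y from X.
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    Y_hat = X @ W
    # Mixed Gram matrix: PCA-like term plus regression-like term,
    # each scaled by its squared Frobenius norm so alpha is comparable.
    K = (alpha * (X @ X.T) / np.linalg.norm(X) ** 2
         + (1.0 - alpha) * (Y_hat @ Y_hat.T) / np.linalg.norm(Y) ** 2)
    # Leading eigenvectors of the symmetric matrix give the latent map.
    eigvals, eigvecs = np.linalg.eigh(K)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0.0, None))
```

Sweeping `alpha` from 1 to 0 moves the map from one that preserves the variance of the features to one that is organized by the predicted property.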

Eliminating Search Intent Bias in Learning to Rank


Title	Eliminating Search Intent Bias in Learning to Rank
Authors	Yingcheng Sun, Richard Kolacinski, Kenneth Loparo
Abstract	Click-through data has proven to be a valuable resource for improving search-ranking quality. Search engines can easily collect click data, but biases introduced in the data can make it difficult to use the data effectively. In order to measure the effects of biases, many click models have been proposed in the literature. However, none of the models can explain the observation that users with different search intent (e.g., informational, navigational, etc.) have different click behaviors. In this paper, we study how differences in user search intent can influence click activities and determine that there exists a bias between user search intent and document relevance. Based on this observation, we propose a search intent bias hypothesis that can be applied to most existing click models to improve their ability to learn unbiased relevance. Experimental results demonstrate that after adopting the search intent hypothesis, click models can better interpret user clicks and substantially improve retrieval performance.
Tasks	Learning-To-Rank
Published	2020-02-08
URL	https://arxiv.org/abs/2002.03203v2
PDF	https://arxiv.org/pdf/2002.03203v2.pdf
PWC	https://paperswithcode.com/paper/eliminating-search-intent-bias-in-learning-to
Repo
Framework

Felix: Flexible Text Editing Through Tagging and Insertion


Title	Felix: Flexible Text Editing Through Tagging and Insertion
Authors	Jonathan Mallinson, Aliaksei Severyn, Eric Malmi, Guillermo Garrido
Abstract	We present Felix — a flexible text-editing approach for generation, designed to derive the maximum benefit from the ideas of decoding with bi-directional contexts and self-supervised pre-training. In contrast to conventional sequence-to-sequence (seq2seq) models, Felix is efficient in low-resource settings and fast at inference time, while being capable of modeling flexible input-output transformations. We achieve this by decomposing the text-editing task into two sub-tasks: tagging to decide on the subset of input tokens and their order in the output text and insertion to in-fill the missing tokens in the output not present in the input. The tagging model employs a novel Pointer mechanism, while the insertion model is based on a Masked Language Model. Both of these models are chosen to be non-autoregressive to guarantee faster inference. Felix performs favourably when compared to recent text-editing methods and strong seq2seq baselines when evaluated on four NLG tasks: Sentence Fusion, Machine Translation Automatic Post-Editing, Summarization, and Text Simplification.
Tasks	Automatic Post-Editing, Language Modelling, Machine Translation, Text Simplification
Published	2020-03-24
URL	https://arxiv.org/abs/2003.10687v1
PDF	https://arxiv.org/pdf/2003.10687v1.pdf
PWC	https://paperswithcode.com/paper/felix-flexible-text-editing-through-tagging
Repo
Framework

Cryptanalytic Extraction of Neural Network Models


Title	Cryptanalytic Extraction of Neural Network Models
Authors	Nicholas Carlini, Matthew Jagielski, Ilya Mironov
Abstract	We argue that the machine learning problem of model extraction is actually a cryptanalytic problem in disguise, and should be studied as such. Given oracle access to a neural network, we introduce a differential attack that can efficiently steal the parameters of the remote model up to floating point precision. Our attack relies on the fact that ReLU neural networks are piecewise linear functions, and that queries at the critical points reveal information about the model parameters. We evaluate our attack on multiple neural network models and extract models that are 2^20 times more precise and require 100x fewer queries than prior work. For example, we extract a 100,000 parameter neural network trained on the MNIST digit recognition task with 2^21.5 queries in under an hour, such that the extracted model agrees with the oracle on all inputs up to a worst-case error of 2^-25, or a model with 4,000 parameters in 2^18.5 queries with worst-case error of 2^-40.4.
Tasks
Published	2020-03-10
URL	https://arxiv.org/abs/2003.04884v1
PDF	https://arxiv.org/pdf/2003.04884v1.pdf
PWC	https://paperswithcode.com/paper/cryptanalytic-extraction-of-neural-network
Repo
Framework

Video Caption Dataset for Describing Human Actions in Japanese


Title	Video Caption Dataset for Describing Human Actions in Japanese
Authors	Yutaro Shigeto, Yuya Yoshikawa, Jiaqing Lin, Akikazu Takeuchi
Abstract	In recent years, automatic video caption generation has attracted considerable attention. This paper focuses on the generation of Japanese captions for describing human actions. While most currently available video caption datasets have been constructed for English, there is no equivalent Japanese dataset. To address this, we constructed a large-scale Japanese video caption dataset consisting of 79,822 videos and 399,233 captions. Each caption in our dataset describes a video in the form of “who does what and where.” To describe human actions, it is important to identify the details of a person, place, and action. Indeed, when we describe human actions, we usually mention the scene, person, and action. In our experiments, we evaluated two caption generation methods to obtain benchmark results. Further, we investigated whether those generation methods could specify “who does what and where.”
Tasks
Published	2020-03-10
URL	https://arxiv.org/abs/2003.04865v1
PDF	https://arxiv.org/pdf/2003.04865v1.pdf
PWC	https://paperswithcode.com/paper/video-caption-dataset-for-describing-human
Repo
Framework

Introducing Fuzzy Layers for Deep Learning


Title	Introducing Fuzzy Layers for Deep Learning
Authors	Stanton R. Price, Steven R. Price, Derek T. Anderson
Abstract	Many state-of-the-art technologies developed in recent years have been influenced by machine learning to some extent. Most popular at the time of this writing are artificial intelligence methodologies that fall under the umbrella of deep learning. Deep learning has been shown across many applications to be extremely powerful and capable of handling problems that possess great complexity and difficulty. In this work, we introduce a new layer to deep learning: the fuzzy layer. Traditionally, the network architecture of neural networks is composed of an input layer, some combination of hidden layers, and an output layer. We propose the introduction of fuzzy layers into the deep learning architecture to exploit the powerful aggregation properties expressed through fuzzy methodologies, such as the Choquet and Sugeno fuzzy integrals. To date, fuzzy approaches taken to deep learning have been through the application of various fusion strategies at the decision level to aggregate outputs from state-of-the-art pre-trained models, e.g., AlexNet, VGG16, GoogLeNet, Inception-v3, ResNet-18, etc. While these strategies have been shown to improve accuracy performance for image classification tasks, none have explored the use of fuzzified intermediate, or hidden, layers. Herein, we present a new deep learning strategy that incorporates fuzzy strategies into the deep learning architecture focused on the application of semantic segmentation using per-pixel classification. Experiments are conducted on a benchmark data set as well as a data set collected via an unmanned aerial system at a U.S. Army test site for the task of automatic road segmentation, and preliminary results are promising.
Tasks	Image Classification, Semantic Segmentation
Published	2020-02-21
URL	https://arxiv.org/abs/2003.00880v1
PDF	https://arxiv.org/pdf/2003.00880v1.pdf
PWC	https://paperswithcode.com/paper/introducing-fuzzy-layers-for-deep-learning
Repo
Framework
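The aggregation behaviour of the Choquet integral mentioned in the abstract above can be shown with a short sketch of the standard discrete definition. How the paper wires this into a network layer is not described here; the function name `choquet_integral` and the representation of the fuzzy measure as a Python callable on index sets are assumptions for illustration.

```python
def choquet_integral(values, measure):
    """Discrete Choquet integral of `values` with respect to `measure`.

    `measure` maps a frozenset of source indices to a number in [0, 1],
    with measure(all indices) == 1 and monotonicity assumed by the caller.
    """
    # Visit sources from the largest value to the smallest.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    total, previous, subset = 0.0, 0.0, set()
    for i in order:
        subset.add(i)
        current = measure(frozenset(subset))
        # Weight each value by the measure gained when its source joins.
        total += values[i] * (current - previous)
        previous = current
    return total
```

Depending on the measure, the same operator reduces to the maximum, the minimum, or a weighted average of its inputs, which is why it is attractive as a learnable fusion step.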

Learning Reusable Options for Multi-Task Reinforcement Learning


Title	Learning Reusable Options for Multi-Task Reinforcement Learning
Authors	Francisco M. Garcia, Chris Nota, Philip S. Thomas
Abstract	Reinforcement learning (RL) has become an increasingly active area of research in recent years. Although there are many algorithms that allow an agent to solve tasks efficiently, they often ignore the possibility that prior experience related to the task at hand might be available. For many practical applications, it might be unfeasible for an agent to learn how to solve a task from scratch, given that it is generally a computationally expensive process; however, prior experience could be leveraged to make these problems tractable in practice. In this paper, we propose a framework for exploiting existing experience by learning reusable options. We show that after an agent learns policies for solving a small number of problems, we are able to use the trajectories generated from those policies to learn reusable options that allow an agent to quickly learn how to solve novel and related problems.
Tasks
Published	2020-01-06
URL	https://arxiv.org/abs/2001.01577v1
PDF	https://arxiv.org/pdf/2001.01577v1.pdf
PWC	https://paperswithcode.com/paper/learning-reusable-options-for-multi-task-1
Repo
Framework

Benchmarking TinyML Systems: Challenges and Direction


Title	Benchmarking TinyML Systems: Challenges and Direction
Authors	Colby R. Banbury, Vijay Janapa Reddi, Max Lam, William Fu, Amin Fazel, Jeremy Holleman, Xinyuan Huang, Robert Hurtado, David Kanter, Anton Lokhmotov, David Patterson, Danilo Pau, Jae-sun Seo, Jeff Sieracki, Urmish Thakker, Marian Verhelst, Poonam Yadav
Abstract	Recent advancements in ultra-low-power machine learning (TinyML) hardware promise to unlock an entirely new class of smart applications. However, continued progress is limited by the lack of a widely accepted benchmark for these systems. Benchmarking allows us to measure and thereby systematically compare, evaluate, and improve the performance of systems. In this position paper, we present the current landscape of TinyML and discuss the challenges and direction towards developing a fair and useful hardware benchmark for TinyML workloads. Our viewpoints reflect the collective thoughts of the TinyMLPerf working group, which comprises 30 organizations.
Tasks
Published	2020-03-10
URL	https://arxiv.org/abs/2003.04821v1
PDF	https://arxiv.org/pdf/2003.04821v1.pdf
PWC	https://paperswithcode.com/paper/benchmarking-tinyml-systems-challenges-and
Repo
Framework

Accelerated Analog Neuromorphic Computing


Title	Accelerated Analog Neuromorphic Computing
Authors	Johannes Schemmel, Sebastian Billaudelle, Phillip Dauer, Johannes Weis
Abstract	This paper presents the concepts behind the BrainScales (BSS) accelerated analog neuromorphic computing architecture. It describes the second-generation BrainScales-2 (BSS-2) version and its most recent in-silico realization, the HICANN-X Application Specific Integrated Circuit (ASIC), developed as part of the neuromorphic computing activities within the European Human Brain Project (HBP). While the first generation is implemented in a 180nm process, the second generation uses 65nm technology. This allows the integration of a digital plasticity processing unit, a highly parallel microprocessor specially built for the computational needs of learning in accelerated analog neuromorphic systems. The presented architecture is based upon a continuous-time, analog, physical model implementation of neurons and synapses, resembling an analog neuromorphic accelerator attached to built-in digital compute cores. While the analog part emulates the spike-based dynamics of the neural network in continuous time, the digital cores simulate biological processes happening on a slower time-scale, like structural and parameter changes. Compared to biological time-scales, the emulation is highly accelerated, i.e. all time-constants are several orders of magnitude smaller than in biology. Programmable ion channel emulation and inter-compartmental conductances allow the modeling of nonlinear dendrites, back-propagating action-potentials as well as NMDA and Calcium plateau potentials. To extend the usability of the analog accelerator, it also supports vector-matrix multiplication. Thereby, BSS-2 supports inference of deep convolutional networks as well as local-learning with complex ensembles of spiking neurons within the same substrate.
Tasks
Published	2020-03-26
URL	https://arxiv.org/abs/2003.11996v1
PDF	https://arxiv.org/pdf/2003.11996v1.pdf
PWC	https://paperswithcode.com/paper/accelerated-analog-neuromorphic-computing
Repo
Framework

A Pitfall of Learning from User-generated Data: In-depth Analysis of Subjective Class Problem


Title	A Pitfall of Learning from User-generated Data: In-depth Analysis of Subjective Class Problem
Authors	Kei Nemoto, Shweta Jain
Abstract	Research in the supervised learning algorithms field implicitly assumes that training data is labeled by domain experts or at least semi-professional labelers accessible through crowdsourcing services like Amazon Mechanical Turk. With the advent of the Internet, data has become abundant and a large number of machine learning based systems started being trained with user-generated data, using categorical data as true labels. However, little work has been done in the area of supervised learning with user-defined labels where users are not necessarily experts and might be motivated to provide incorrect labels in order to improve their own utility from the system. In this article, we propose two types of classes in user-defined labels: subjective class and objective class - showing that the objective classes are as reliable as if they were provided by domain experts, whereas the subjective classes are subject to bias and manipulation by the user. We define this as a subjective class issue and provide a framework for detecting subjective labels in a dataset without querying an oracle. Using this framework, data mining practitioners can detect a subjective class at an early stage of their projects, and avoid wasting their precious time and resources by dealing with the subjective class problem with traditional machine learning techniques.
Tasks
Published	2020-03-24
URL	https://arxiv.org/abs/2003.10621v1
PDF	https://arxiv.org/pdf/2003.10621v1.pdf
PWC	https://paperswithcode.com/paper/a-pitfall-of-learning-from-user-generated
Repo
Framework

Multi-Scale Superpatch Matching using Dual Superpixel Descriptors


Title	Multi-Scale Superpatch Matching using Dual Superpixel Descriptors
Authors	Rémi Giraud, Merlin Boyer, Michaël Clément
Abstract	Over-segmentation into superpixels is a very effective dimensionality reduction strategy, enabling fast dense image processing. The main issue of this approach is the inherent irregularity of the image decomposition compared to standard hierarchical multi-resolution schemes, especially when searching for similar neighboring patterns. Several works have attempted to overcome this issue by incorporating the region irregularity into their comparison model. Nevertheless, they remain sub-optimal at providing robust and accurate superpixel neighborhood descriptors, since they only compute features within each region, poorly capturing contour information at superpixel borders. In this work, we address these limitations by introducing the dual superpatch, a novel superpixel neighborhood descriptor. This structure contains features computed in reduced superpixel regions, as well as at the interfaces of multiple superpixels to explicitly capture contour structure information. A fast multi-scale non-local matching framework is also introduced for the search of similar descriptors at different resolution levels in an image dataset. The proposed dual superpatch captures similar structured patterns at different scales more accurately, and we demonstrate the robustness and performance of this new strategy on matching and supervised labeling applications.
Tasks	Dimensionality Reduction
Published	2020-03-09
URL	https://arxiv.org/abs/2003.04428v1
PDF	https://arxiv.org/pdf/2003.04428v1.pdf
PWC	https://paperswithcode.com/paper/multi-scale-superpatch-matching-using-dual
Repo
Framework

Sparsity-Aware Deep Learning for Automatic 4D Facial Expression Recognition


Title	Sparsity-Aware Deep Learning for Automatic 4D Facial Expression Recognition
Authors	Muzammil Behzad, Nhat Vo, Xiaobai Li, Guoying Zhao
Abstract	In this paper, we present a sparsity-aware deep network for automatic 4D facial expression recognition (FER). Given 4D data, we first propose a novel augmentation method to combat the data limitation problem for deep learning. This is achieved by projecting the input data into RGB and depth map images and then iteratively performing channel concatenation. Encoded in the given 3D landmarks, we also introduce TOP-landmarks over multi-views, an effective way to capture the facial muscle movements from three orthogonal planes. Importantly, we then present a sparsity-aware network to compute the sparse representations of convolutional features over multi-views for a significant and computationally convenient deep learning. For training, the TOP-landmarks and sparse representations are used to train a long short-term memory (LSTM) network. The refined predictions are achieved when the learned features collaborate over multi-views. Extensive experimental results achieved on the BU-4DFE dataset show the significance of our method over the state-of-the-art methods by reaching a promising accuracy of 99.69% for 4D FER.
Tasks	Facial Expression Recognition
Published	2020-02-08
URL	https://arxiv.org/abs/2002.03157v1
PDF	https://arxiv.org/pdf/2002.03157v1.pdf
PWC	https://paperswithcode.com/paper/sparsity-aware-deep-learning-for-automatic-4d
Repo
Framework

Truncated Inference for Latent Variable Optimization Problems: Application to Robust Estimation and Learning


Title	Truncated Inference for Latent Variable Optimization Problems: Application to Robust Estimation and Learning
Authors	Christopher Zach, Huu Le
Abstract	Optimization problems with an auxiliary latent variable structure in addition to the main model parameters occur frequently in computer vision and machine learning. The additional latent variables make the underlying optimization task expensive, either in terms of memory (by maintaining the latent variables), or in terms of runtime (repeated exact inference of latent variables). We aim to remove the need to maintain the latent variables and propose two formally justified methods that dynamically adapt the required accuracy of latent variable inference. These methods have applications in large scale robust estimation and in learning energy-based models from labeled data.
Tasks
Published	2020-03-12
URL	https://arxiv.org/abs/2003.05886v1
PDF	https://arxiv.org/pdf/2003.05886v1.pdf
PWC	https://paperswithcode.com/paper/truncated-inference-for-latent-variable
Repo
Framework

Self-supervised ECG Representation Learning for Emotion Recognition


Title	Self-supervised ECG Representation Learning for Emotion Recognition
Authors	Pritam Sarkar, Ali Etemad
Abstract	We present a self-supervised deep multi-task learning framework for electrocardiogram (ECG)-based emotion recognition. The proposed framework consists of two stages of learning: a) learning ECG representations and b) learning to classify emotions. ECG representations are learned by a signal transformation recognition network. The network learns high-level abstract representations from unlabeled ECG data. Six different signal transformations are applied to the ECG signals, and transformation recognition is performed as pretext tasks. Training the model on pretext tasks helps our network learn spatiotemporal representations that generalize well across different datasets and different emotion categories. We transfer the weights of the self-supervised network to an emotion recognition network, where the convolutional layers are kept frozen and the dense layers are trained with labelled ECG data. We show that our proposed method considerably improves the performance compared to a network trained using fully-supervised learning. New state-of-the-art results are set in classification of arousal, valence, affective states, and stress for the four utilized datasets. Extensive experiments are performed, providing interesting insights into the impact of using a multi-task self-supervised structure instead of a single-task model, as well as the optimum level of difficulty required for the pretext self-supervised tasks.
Tasks	Emotion Recognition, Multi-Task Learning, Representation Learning
Published	2020-02-04
URL	https://arxiv.org/abs/2002.03898v1
PDF	https://arxiv.org/pdf/2002.03898v1.pdf
PWC	https://paperswithcode.com/paper/self-supervised-ecg-representation-learning
Repo
Framework
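The pretext stage described in the ECG abstract above, where a network learns to recognize which transformation was applied to an unlabeled signal, can be sketched as a small dataset builder. The abstract does not list the six transformations, so the four used below (negation, time reversal, scaling, added noise), their parameters, the single multi-class label, and the name `make_pretext_dataset` are illustrative assumptions rather than the paper's setup.

```python
import numpy as np

def make_pretext_dataset(signals, rng):
    """Build (transformed signal, transformation label) pairs.

    `signals` is a 2-D array of shape (n_signals, n_samples); no emotion
    labels are needed, since the label is the transformation index.
    """
    # Illustrative transformations only; index 0 is the untouched signal.
    transforms = [
        ("original", lambda s: s),
        ("negation", lambda s: -s),
        ("time_reversal", lambda s: s[::-1]),
        ("scaling", lambda s: 1.5 * s),
        ("noise", lambda s: s + rng.normal(0.0, 0.05, size=s.shape)),
    ]
    inputs, labels = [], []
    for signal in signals:
        for label, (_, transform) in enumerate(transforms):
            inputs.append(transform(signal))
            labels.append(label)
    names = [name for name, _ in transforms]
    return np.stack(inputs), np.array(labels), names
```

A classifier trained on these pairs never sees emotion labels; in the second stage its convolutional layers would be frozen and only new dense layers trained on the labelled data.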