Paper Group ANR 515
BiDet: An Efficient Binarized Object Detector
Title | BiDet: An Efficient Binarized Object Detector |
Authors | Ziwei Wang, Ziyi Wu, Jiwen Lu, Jie Zhou |
Abstract | In this paper, we propose a binarized neural network learning method called BiDet for efficient object detection. Conventional network binarization methods directly quantize the weights and activations in one-stage or two-stage detectors with constrained representational capacity, so that the information redundancy in the networks causes numerous false positives and degrades the performance significantly. In contrast, BiDet fully utilizes the representational capacity of binary neural networks for object detection by removing redundancy, which improves detection precision and reduces false positives. Specifically, we generalize the information bottleneck (IB) principle to object detection, where the amount of information in the high-level feature maps is constrained and the mutual information between the feature maps and object detection is maximized. Meanwhile, we learn sparse object priors so that the posteriors are concentrated on informative detection predictions and false positives are eliminated. Extensive experiments on the PASCAL VOC and COCO datasets show that our method outperforms the state-of-the-art binary neural networks by a sizable margin. |
Tasks | Object Detection |
Published | 2020-03-09 |
URL | https://arxiv.org/abs/2003.03961v1 |
https://arxiv.org/pdf/2003.03961v1.pdf | |
PWC | https://paperswithcode.com/paper/bidet-an-efficient-binarized-object-detector |
Repo | |
Framework | |
Structure-Property Maps with Kernel Principal Covariates Regression
Title | Structure-Property Maps with Kernel Principal Covariates Regression |
Authors | Benjamin A. Helfrecht, Rose K. Cersonsky, Guillaume Fraux, Michele Ceriotti |
Abstract | Data analysis based on linear methods, which look for correlations between the features describing samples in a data set, or between features and properties associated with the samples, constitutes the simplest, most robust, and most transparent approach to the automatic processing of large amounts of data for building supervised or unsupervised machine learning models. Principal covariates regression (PCovR) is an under-appreciated method that interpolates between principal component analysis and linear regression, and can be used to conveniently reveal structure-property relations in terms of simple-to-interpret, low-dimensional maps. Here we provide a pedagogic overview of these data analysis schemes, including the use of the kernel trick to introduce an element of non-linearity in the process, while maintaining most of the convenience and the simplicity of linear approaches. We then introduce a kernelized version of PCovR and a sparsified extension, followed by a feature-selection scheme based on the CUR matrix decomposition modified to incorporate the same hybrid loss that underlies PCovR. We demonstrate the performance of these approaches in revealing and predicting structure-property relations in chemistry and materials science. |
Tasks | Feature Selection |
Published | 2020-02-12 |
URL | https://arxiv.org/abs/2002.05076v1 |
https://arxiv.org/pdf/2002.05076v1.pdf | |
PWC | https://paperswithcode.com/paper/structure-property-maps-with-kernel-principal |
Repo | |
Framework | |
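The interpolation between principal component analysis and linear regression that the abstract mentions can be written as a single mixed loss. The form below is a sketch in assumed notation, not taken from this listing: X is the feature matrix, Y the property matrix, T = X P_XT the low-dimensional projection, and α a mixing parameter in [0, 1]. X and Y are assumed centred and scaled so that the two terms are comparable.

```latex
\ell(\alpha) \;=\; \alpha \,\bigl\lVert X - T\,P_{TX} \bigr\rVert^{2}
\;+\; (1-\alpha)\,\bigl\lVert Y - T\,P_{TY} \bigr\rVert^{2},
\qquad T = X\,P_{XT}
```

Under these assumptions, α = 1 leaves only the reconstruction term (the PCA objective), and α = 0 leaves only the regression term evaluated through the latent space T.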
Eliminating Search Intent Bias in Learning to Rank
Title | Eliminating Search Intent Bias in Learning to Rank |
Authors | Yingcheng Sun, Richard Kolacinski, Kenneth Loparo |
Abstract | Click-through data has proven to be a valuable resource for improving search-ranking quality. Search engines can easily collect click data, but biases introduced in the data can make it difficult to use the data effectively. In order to measure the effects of biases, many click models have been proposed in the literature. However, none of the models can explain the observation that users with different search intent (e.g., informational, navigational, etc.) have different click behaviors. In this paper, we study how differences in user search intent can influence click activities and determine that there exists a bias between user search intent and the relevance of the document. Based on this observation, we propose a search intent bias hypothesis that can be applied to most existing click models to improve their ability to learn unbiased relevance. Experimental results demonstrate that after adopting the search intent hypothesis, click models can better interpret user clicks and substantially improve retrieval performance. |
Tasks | Learning-To-Rank |
Published | 2020-02-08 |
URL | https://arxiv.org/abs/2002.03203v2 |
https://arxiv.org/pdf/2002.03203v2.pdf | |
PWC | https://paperswithcode.com/paper/eliminating-search-intent-bias-in-learning-to |
Repo | |
Framework | |
Felix: Flexible Text Editing Through Tagging and Insertion
Title | Felix: Flexible Text Editing Through Tagging and Insertion |
Authors | Jonathan Mallinson, Aliaksei Severyn, Eric Malmi, Guillermo Garrido |
Abstract | We present Felix — a flexible text-editing approach for generation, designed to derive the maximum benefit from the ideas of decoding with bi-directional contexts and self-supervised pre-training. In contrast to conventional sequence-to-sequence (seq2seq) models, Felix is efficient in low-resource settings and fast at inference time, while being capable of modeling flexible input-output transformations. We achieve this by decomposing the text-editing task into two sub-tasks: tagging to decide on the subset of input tokens and their order in the output text and insertion to in-fill the missing tokens in the output not present in the input. The tagging model employs a novel Pointer mechanism, while the insertion model is based on a Masked Language Model. Both of these models are chosen to be non-autoregressive to guarantee faster inference. Felix performs favourably when compared to recent text-editing methods and strong seq2seq baselines when evaluated on four NLG tasks: Sentence Fusion, Machine Translation Automatic Post-Editing, Summarization, and Text Simplification. |
Tasks | Automatic Post-Editing, Language Modelling, Machine Translation, Text Simplification |
Published | 2020-03-24 |
URL | https://arxiv.org/abs/2003.10687v1 |
https://arxiv.org/pdf/2003.10687v1.pdf | |
PWC | https://paperswithcode.com/paper/felix-flexible-text-editing-through-tagging |
Repo | |
Framework | |
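The two sub-tasks described in the Felix abstract (tagging, then insertion) can be illustrated with a toy sketch. Everything here is hypothetical: the function names, the `[MASK]` placeholder string, and the hand-written tagger outputs stand in for the learned tagging model with its pointer mechanism and for the masked language model. The code shows only how the two stages fit together.

```python
MASK = "[MASK]"  # hypothetical placeholder for a token to be in-filled


def apply_tagging(tokens, keep, order, masks_after):
    """Build the intermediate template from (hypothetical) tagger outputs.

    keep[i]        -- True if source token i survives in the output
    order          -- source indices in the order they should appear
    masks_after[i] -- number of MASK slots to open after source token i
    """
    template = []
    for i in order:
        if not keep[i]:
            continue
        template.append(tokens[i])
        template.extend([MASK] * masks_after.get(i, 0))
    return template


def fill_masks(template, predictions):
    """Stand-in for the insertion model: fill MASK slots from a fixed list."""
    fills = iter(predictions)
    return [next(fills) if tok == MASK else tok for tok in template]


# Toy sentence-fusion example with hand-written tagger decisions.
source = "Turing was born in 1912 . He died in 1954 .".split()
keep = [True] * len(source)
keep[5] = False  # drop the first "."
keep[6] = False  # drop "He"
template = apply_tagging(source, keep, range(len(source)), {4: 1})
fused = fill_masks(template, ["and"])
print(" ".join(fused))
```

In this sketch the source order is kept unchanged; passing a permuted `order` would model the reordering that the abstract attributes to the pointer mechanism.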
Cryptanalytic Extraction of Neural Network Models
Title | Cryptanalytic Extraction of Neural Network Models |
Authors | Nicholas Carlini, Matthew Jagielski, Ilya Mironov |
Abstract | We argue that the machine learning problem of model extraction is actually a cryptanalytic problem in disguise, and should be studied as such. Given oracle access to a neural network, we introduce a differential attack that can efficiently steal the parameters of the remote model up to floating point precision. Our attack relies on the fact that ReLU neural networks are piecewise linear functions, and that queries at the critical points reveal information about the model parameters. We evaluate our attack on multiple neural network models and extract models that are 2^20 times more precise and require 100x fewer queries than prior work. For example, we extract a 100,000 parameter neural network trained on the MNIST digit recognition task with 2^21.5 queries in under an hour, such that the extracted model agrees with the oracle on all inputs up to a worst-case error of 2^-25, or a model with 4,000 parameters in 2^18.5 queries with worst-case error of 2^-40.4. |
Tasks | |
Published | 2020-03-10 |
URL | https://arxiv.org/abs/2003.04884v1 |
https://arxiv.org/pdf/2003.04884v1.pdf | |
PWC | https://paperswithcode.com/paper/cryptanalytic-extraction-of-neural-network |
Repo | |
Framework | |
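The observation that ReLU networks are piecewise linear, with informative queries at the critical points, can be illustrated in one dimension. The sketch below is not the paper's attack: the `oracle` network and the helper `locate_critical_point` are hypothetical, and the helper assumes exactly one ReLU changes sign inside the queried interval. It estimates the slope at each end of the interval from two queries and intersects the two lines to find the kink.

```python
def relu(z):
    return max(0.0, z)


def oracle(x):
    """Hypothetical black-box model: a tiny 1-D ReLU network, two hidden units."""
    return 1.5 * relu(2.0 * x - 1.0) - 0.5 * relu(-x - 3.0) + 0.25


def locate_critical_point(f, lo, hi, eps=1e-6):
    """Locate the single kink of a piecewise linear f on [lo, hi].

    Assumes exactly one ReLU changes sign inside the interval and that the
    kink lies more than eps away from both endpoints.
    """
    slope_lo = (f(lo + eps) - f(lo)) / eps  # slope of the left linear piece
    slope_hi = (f(hi) - f(hi - eps)) / eps  # slope of the right linear piece
    if abs(slope_lo - slope_hi) < 1e-6:
        return None  # f looks linear here: no critical point detected
    # Intersect y = f(lo) + slope_lo (x - lo) with y = f(hi) + slope_hi (x - hi).
    return (f(hi) - f(lo) + slope_lo * lo - slope_hi * hi) / (slope_lo - slope_hi)


kink_a = locate_critical_point(oracle, 0.0, 2.0)    # unit 2x - 1 switches at x = 0.5
kink_b = locate_critical_point(oracle, -5.0, -1.0)  # unit -x - 3 switches at x = -3
print(kink_a, kink_b)
```

Each call uses four oracle queries. The change in slope across a kink depends on the weights of the unit that switched, which is the kind of information the abstract says such queries reveal.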
Video Caption Dataset for Describing Human Actions in Japanese
Title | Video Caption Dataset for Describing Human Actions in Japanese |
Authors | Yutaro Shigeto, Yuya Yoshikawa, Jiaqing Lin, Akikazu Takeuchi |
Abstract | In recent years, automatic video caption generation has attracted considerable attention. This paper focuses on the generation of Japanese captions for describing human actions. While most currently available video caption datasets have been constructed for English, there is no equivalent Japanese dataset. To address this, we constructed a large-scale Japanese video caption dataset consisting of 79,822 videos and 399,233 captions. Each caption in our dataset describes a video in the form of “who does what and where.” To describe human actions, it is important to identify the details of a person, place, and action. Indeed, when we describe human actions, we usually mention the scene, person, and action. In our experiments, we evaluated two caption generation methods to obtain benchmark results. Further, we investigated whether those generation methods could specify “who does what and where.” |
Tasks | |
Published | 2020-03-10 |
URL | https://arxiv.org/abs/2003.04865v1 |
https://arxiv.org/pdf/2003.04865v1.pdf | |
PWC | https://paperswithcode.com/paper/video-caption-dataset-for-describing-human |
Repo | |
Framework | |
Introducing Fuzzy Layers for Deep Learning
Title | Introducing Fuzzy Layers for Deep Learning |
Authors | Stanton R. Price, Steven R. Price, Derek T. Anderson |
Abstract | Many state-of-the-art technologies developed in recent years have been influenced by machine learning to some extent. Most popular at the time of this writing are artificial intelligence methodologies that fall under the umbrella of deep learning. Deep learning has been shown across many applications to be extremely powerful and capable of handling problems that possess great complexity and difficulty. In this work, we introduce a new layer to deep learning: the fuzzy layer. Traditionally, the network architecture of neural networks is composed of an input layer, some combination of hidden layers, and an output layer. We propose the introduction of fuzzy layers into the deep learning architecture to exploit the powerful aggregation properties expressed through fuzzy methodologies, such as the Choquet and Sugeno fuzzy integrals. To date, fuzzy approaches taken to deep learning have been through the application of various fusion strategies at the decision level to aggregate outputs from state-of-the-art pre-trained models, e.g., AlexNet, VGG16, GoogLeNet, Inception-v3, ResNet-18, etc. While these strategies have been shown to improve accuracy for image classification tasks, none have explored the use of fuzzified intermediate, or hidden, layers. Herein, we present a new deep learning strategy that incorporates fuzzy strategies into the deep learning architecture focused on the application of semantic segmentation using per-pixel classification. Experiments are conducted on a benchmark data set as well as a data set collected via an unmanned aerial system at a U.S. Army test site for the task of automatic road segmentation, and preliminary results are promising. |
Tasks | Image Classification, Semantic Segmentation |
Published | 2020-02-21 |
URL | https://arxiv.org/abs/2003.00880v1 |
https://arxiv.org/pdf/2003.00880v1.pdf | |
PWC | https://paperswithcode.com/paper/introducing-fuzzy-layers-for-deep-learning |
Repo | |
Framework | |
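The aggregation behaviour of the Choquet integral mentioned in the abstract can be shown with a small discrete example. This is a minimal sketch of the standard discrete Choquet integral, not the paper's fuzzy layer: the function name, the dictionary of scores, and the three example measures are assumptions chosen for illustration.

```python
def choquet_integral(values, measure):
    """Discrete Choquet integral.

    values  -- dict mapping each source to a score
    measure -- callable taking a frozenset of sources and returning its
               weight; assumed monotone, with measure(all sources) == 1
    """
    # Visit sources from the highest score to the lowest.
    ranked = sorted(values, key=values.get, reverse=True)
    total, previous, seen = 0.0, 0.0, set()
    for source in ranked:
        seen.add(source)
        current = measure(frozenset(seen))
        # Weight each score by the gain in measure from adding its source.
        total += values[source] * (current - previous)
        previous = current
    return total


scores = {"a": 0.9, "b": 0.5, "c": 0.2}

as_max = choquet_integral(scores, lambda s: 1.0 if s else 0.0)
as_min = choquet_integral(scores, lambda s: 1.0 if len(s) == len(scores) else 0.0)
as_mean = choquet_integral(scores, lambda s: len(s) / len(scores))
print(as_max, as_min, as_mean)
```

The three measures recover the maximum, the minimum, and the arithmetic mean of the scores, which shows how one operator covers a family of aggregation rules depending on the chosen measure.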
Learning Reusable Options for Multi-Task Reinforcement Learning
Title | Learning Reusable Options for Multi-Task Reinforcement Learning |
Authors | Francisco M. Garcia, Chris Nota, Philip S. Thomas |
Abstract | Reinforcement learning (RL) has become an increasingly active area of research in recent years. Although there are many algorithms that allow an agent to solve tasks efficiently, they often ignore the possibility that prior experience related to the task at hand might be available. For many practical applications, it might be unfeasible for an agent to learn how to solve a task from scratch, given that it is generally a computationally expensive process; however, prior experience could be leveraged to make these problems tractable in practice. In this paper, we propose a framework for exploiting existing experience by learning reusable options. We show that after an agent learns policies for solving a small number of problems, we are able to use the trajectories generated from those policies to learn reusable options that allow an agent to quickly learn how to solve novel and related problems. |
Tasks | |
Published | 2020-01-06 |
URL | https://arxiv.org/abs/2001.01577v1 |
https://arxiv.org/pdf/2001.01577v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-reusable-options-for-multi-task-1 |
Repo | |
Framework | |
Benchmarking TinyML Systems: Challenges and Direction
Title | Benchmarking TinyML Systems: Challenges and Direction |
Authors | Colby R. Banbury, Vijay Janapa Reddi, Max Lam, William Fu, Amin Fazel, Jeremy Holleman, Xinyuan Huang, Robert Hurtado, David Kanter, Anton Lokhmotov, David Patterson, Danilo Pau, Jae-sun Seo, Jeff Sieracki, Urmish Thakker, Marian Verhelst, Poonam Yadav |
Abstract | Recent advancements in ultra-low-power machine learning (TinyML) hardware promise to unlock an entirely new class of smart applications. However, continued progress is limited by the lack of a widely accepted benchmark for these systems. Benchmarking allows us to measure and thereby systematically compare, evaluate, and improve the performance of systems. In this position paper, we present the current landscape of TinyML and discuss the challenges and direction towards developing a fair and useful hardware benchmark for TinyML workloads. Our viewpoints reflect the collective thoughts of the TinyMLPerf working group, which comprises 30 organizations. |
Tasks | |
Published | 2020-03-10 |
URL | https://arxiv.org/abs/2003.04821v1 |
https://arxiv.org/pdf/2003.04821v1.pdf | |
PWC | https://paperswithcode.com/paper/benchmarking-tinyml-systems-challenges-and |
Repo | |
Framework | |
Accelerated Analog Neuromorphic Computing
Title | Accelerated Analog Neuromorphic Computing |
Authors | Johannes Schemmel, Sebastian Billaudelle, Phillip Dauer, Johannes Weis |
Abstract | This paper presents the concepts behind the BrainScaleS (BSS) accelerated analog neuromorphic computing architecture. It describes the second-generation BrainScaleS-2 (BSS-2) version and its most recent in-silico realization, the HICANN-X Application Specific Integrated Circuit (ASIC), as it has been developed as part of the neuromorphic computing activities within the European Human Brain Project (HBP). While the first generation is implemented in a 180 nm process, the second generation uses 65 nm technology. This allows the integration of a digital plasticity processing unit, a highly parallel microprocessor specially built for the computational needs of learning in accelerated analog neuromorphic systems. The presented architecture is based upon a continuous-time, analog, physical model implementation of neurons and synapses, resembling an analog neuromorphic accelerator attached to built-in digital compute cores. While the analog part emulates the spike-based dynamics of the neural network in continuous time, the digital cores simulate biological processes happening on a slower time-scale, like structural and parameter changes. Compared to biological time-scales, the emulation is highly accelerated, i.e., all time constants are several orders of magnitude smaller than in biology. Programmable ion channel emulation and inter-compartmental conductances allow the modeling of nonlinear dendrites, back-propagating action potentials, as well as NMDA and calcium plateau potentials. To extend the usability of the analog accelerator, it also supports vector-matrix multiplication. Thereby, BSS-2 supports inference of deep convolutional networks as well as local learning with complex ensembles of spiking neurons within the same substrate. |
Tasks | |
Published | 2020-03-26 |
URL | https://arxiv.org/abs/2003.11996v1 |
https://arxiv.org/pdf/2003.11996v1.pdf | |
PWC | https://paperswithcode.com/paper/accelerated-analog-neuromorphic-computing |
Repo | |
Framework | |
A Pitfall of Learning from User-generated Data: In-depth Analysis of Subjective Class Problem
Title | A Pitfall of Learning from User-generated Data: In-depth Analysis of Subjective Class Problem |
Authors | Kei Nemoto, Shweta Jain |
Abstract | Research in the field of supervised learning algorithms implicitly assumes that training data is labeled by domain experts or at least semi-professional labelers accessible through crowdsourcing services like Amazon Mechanical Turk. With the advent of the Internet, data has become abundant, and a large number of machine-learning-based systems have started being trained with user-generated data, using categorical data as true labels. However, little work has been done in the area of supervised learning with user-defined labels, where users are not necessarily experts and might be motivated to provide incorrect labels in order to improve their own utility from the system. In this article, we propose two types of classes in user-defined labels, subjective classes and objective classes, showing that the objective classes are as reliable as if they were provided by domain experts, whereas the subjective classes are subject to bias and manipulation by the user. We define this as the subjective class issue and provide a framework for detecting subjective labels in a dataset without querying an oracle. Using this framework, data mining practitioners can detect a subjective class at an early stage of their projects and avoid wasting time and resources by tackling the subjective class problem with traditional machine learning techniques. |
Tasks | |
Published | 2020-03-24 |
URL | https://arxiv.org/abs/2003.10621v1 |
https://arxiv.org/pdf/2003.10621v1.pdf | |
PWC | https://paperswithcode.com/paper/a-pitfall-of-learning-from-user-generated |
Repo | |
Framework | |
Multi-Scale Superpatch Matching using Dual Superpixel Descriptors
Title | Multi-Scale Superpatch Matching using Dual Superpixel Descriptors |
Authors | Rémi Giraud, Merlin Boyer, Michaël Clément |
Abstract | Over-segmentation into superpixels is a very effective dimensionality reduction strategy, enabling fast dense image processing. The main issue of this approach is the inherent irregularity of the image decomposition compared to standard hierarchical multi-resolution schemes, especially when searching for similar neighboring patterns. Several works have attempted to overcome this issue by taking the region irregularity into account in their comparison model. Nevertheless, they remain sub-optimal at providing robust and accurate superpixel neighborhood descriptors, since they only compute features within each region, poorly capturing contour information at superpixel borders. In this work, we address these limitations by introducing the dual superpatch, a novel superpixel neighborhood descriptor. This structure contains features computed in reduced superpixel regions, as well as at the interfaces of multiple superpixels to explicitly capture contour structure information. A fast multi-scale non-local matching framework is also introduced for the search of similar descriptors at different resolution levels in an image dataset. The proposed dual superpatch captures similar structured patterns at different scales more accurately, and we demonstrate the robustness and performance of this new strategy on matching and supervised labeling applications. |
Tasks | Dimensionality Reduction |
Published | 2020-03-09 |
URL | https://arxiv.org/abs/2003.04428v1 |
https://arxiv.org/pdf/2003.04428v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-scale-superpatch-matching-using-dual |
Repo | |
Framework | |
Sparsity-Aware Deep Learning for Automatic 4D Facial Expression Recognition
Title | Sparsity-Aware Deep Learning for Automatic 4D Facial Expression Recognition |
Authors | Muzammil Behzad, Nhat Vo, Xiaobai Li, Guoying Zhao |
Abstract | In this paper, we present a sparsity-aware deep network for automatic 4D facial expression recognition (FER). Given 4D data, we first propose a novel augmentation method to combat the data limitation problem for deep learning. This is achieved by projecting the input data into RGB and depth map images and then iteratively performing channel concatenation. Encoded in the given 3D landmarks, we also introduce TOP-landmarks over multi-views, an effective way to capture the facial muscle movements from three orthogonal planes. Importantly, we then present a sparsity-aware network to compute the sparse representations of convolutional features over multi-views for effective and computationally convenient deep learning. For training, the TOP-landmarks and sparse representations are used to train a long short-term memory (LSTM) network. Refined predictions are obtained when the learned features collaborate over multi-views. Extensive experimental results on the BU-4DFE dataset show that our method outperforms state-of-the-art methods, reaching an accuracy of 99.69% for 4D FER. |
Tasks | Facial Expression Recognition |
Published | 2020-02-08 |
URL | https://arxiv.org/abs/2002.03157v1 |
https://arxiv.org/pdf/2002.03157v1.pdf | |
PWC | https://paperswithcode.com/paper/sparsity-aware-deep-learning-for-automatic-4d |
Repo | |
Framework | |
Truncated Inference for Latent Variable Optimization Problems: Application to Robust Estimation and Learning
Title | Truncated Inference for Latent Variable Optimization Problems: Application to Robust Estimation and Learning |
Authors | Christopher Zach, Huu Le |
Abstract | Optimization problems with an auxiliary latent variable structure in addition to the main model parameters occur frequently in computer vision and machine learning. The additional latent variables make the underlying optimization task expensive, either in terms of memory (by maintaining the latent variables) or in terms of runtime (repeated exact inference of latent variables). We aim to remove the need to maintain the latent variables and propose two formally justified methods that dynamically adapt the required accuracy of latent variable inference. These methods have applications in large-scale robust estimation and in learning energy-based models from labeled data. |
Tasks | |
Published | 2020-03-12 |
URL | https://arxiv.org/abs/2003.05886v1 |
https://arxiv.org/pdf/2003.05886v1.pdf | |
PWC | https://paperswithcode.com/paper/truncated-inference-for-latent-variable |
Repo | |
Framework | |
Self-supervised ECG Representation Learning for Emotion Recognition
Title | Self-supervised ECG Representation Learning for Emotion Recognition |
Authors | Pritam Sarkar, Ali Etemad |
Abstract | We present a self-supervised deep multi-task learning framework for electrocardiogram (ECG)-based emotion recognition. The proposed framework consists of two stages of learning: a) learning ECG representations and b) learning to classify emotions. ECG representations are learned by a signal transformation recognition network. The network learns high-level abstract representations from unlabeled ECG data. Six different signal transformations are applied to the ECG signals, and transformation recognition is performed as pretext tasks. Training the model on pretext tasks helps our network learn spatiotemporal representations that generalize well across different datasets and different emotion categories. We transfer the weights of the self-supervised network to an emotion recognition network, where the convolutional layers are kept frozen and the dense layers are trained with labelled ECG data. We show that our proposed method considerably improves the performance compared to a network trained using fully-supervised learning. New state-of-the-art results are set in classification of arousal, valence, affective states, and stress for the four utilized datasets. Extensive experiments are performed, providing interesting insights into the impact of using a multi-task self-supervised structure instead of a single-task model, as well as the optimum level of difficulty required for the pretext self-supervised tasks. |
Tasks | Emotion Recognition, Multi-Task Learning, Representation Learning |
Published | 2020-02-04 |
URL | https://arxiv.org/abs/2002.03898v1 |
https://arxiv.org/pdf/2002.03898v1.pdf | |
PWC | https://paperswithcode.com/paper/self-supervised-ecg-representation-learning |
Repo | |
Framework | |
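The pretext stage described in the ECG abstract, where transformations are applied to unlabeled signals and a network learns to recognise which one was applied, can be sketched as a data-generation step. The abstract states that six transformations are used but does not name them in this listing, so the three below (negation, time reversal, amplitude scaling) and all function names are hypothetical placeholders.

```python
def negate(signal):
    return [-x for x in signal]


def reverse_time(signal):
    return signal[::-1]


def scale(signal, factor=1.5):
    return [factor * x for x in signal]


# Label 0 is the untouched signal; the remaining labels are hypothetical
# transformations standing in for the six used in the paper.
TRANSFORMS = [
    ("original", list),
    ("negated", negate),
    ("time_reversed", reverse_time),
    ("scaled", scale),
]


def make_pretext_examples(signal):
    """Return (transformed_signal, label) pairs for one unlabeled signal."""
    return [(fn(signal), label) for label, (_, fn) in enumerate(TRANSFORMS)]


examples = make_pretext_examples([0.0, 1.0, -0.5])
for transformed, label in examples:
    print(TRANSFORMS[label][0], transformed)
```

No emotion labels are needed at this stage: the label of each example is the index of the transformation that produced it, which is what makes the task self-supervised.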