October 16, 2019

Paper Group ANR 996

Interlinked Convolutional Neural Networks for Face Parsing. Generalized Spectral Mixture Kernels for Multi-Task Gaussian Processes. Combining Textual Content and Structure to Improve Dialog Similarity. Solving Large Sequential Games with the Excessive Gap Technique. An FPGA-based Massively Parallel Neuromorphic Cortex Simulator. Multifunctionality …

Interlinked Convolutional Neural Networks for Face Parsing

Title Interlinked Convolutional Neural Networks for Face Parsing
Authors Yisu Zhou, Xiaolin Hu, Bo Zhang
Abstract Face parsing is a basic task in face image analysis. It amounts to labeling each pixel with appropriate facial parts such as eyes and nose. In the paper, we present an interlinked convolutional neural network (iCNN) for solving this problem in an end-to-end fashion. It consists of multiple convolutional neural networks (CNNs) taking input in different scales. A special interlinking layer is designed to allow the CNNs to exchange information, enabling them to integrate local and contextual information efficiently. The hallmark of iCNN is the extensive use of downsampling and upsampling in the interlinking layers, while traditional CNNs usually use downsampling only. A two-stage pipeline is proposed for face parsing and both stages use iCNN. The first stage localizes facial parts in the size-reduced image and the second stage labels the pixels in the identified facial parts in the original image. On a benchmark dataset we have obtained better results than the state-of-the-art methods.
Tasks
Published 2018-06-07
URL http://arxiv.org/abs/1806.02479v1
PDF http://arxiv.org/pdf/1806.02479v1.pdf
PWC https://paperswithcode.com/paper/interlinked-convolutional-neural-networks-for
Repo
Framework
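
The information exchange described in the abstract can be illustrated with a minimal sketch. It assumes two single-channel feature maps (one fine, one at half resolution), 2x2 max pooling for downsampling, and nearest-neighbour upsampling; these choices and all function names are illustrative assumptions, not the authors' implementation.

```python
def downsample(fmap):
    """2x2 max pooling on a 2D list with even height and width."""
    return [
        [max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
         for c in range(0, len(fmap[0]), 2)]
        for r in range(0, len(fmap), 2)
    ]


def upsample(fmap):
    """Nearest-neighbour upsampling by a factor of 2 in both directions."""
    out = []
    for row in fmap:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def interlink(fine, coarse):
    """Exchange information between two scales (hypothetical helper).

    Each branch keeps its own map and receives a resampled copy of the
    other branch's map as an extra channel.
    """
    fine_out = [fine, upsample(coarse)]
    coarse_out = [coarse, downsample(fine)]
    return fine_out, coarse_out


fine = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
coarse = [[0.5, 1.5], [2.5, 3.5]]
fine_out, coarse_out = interlink(fine, coarse)
```

After the exchange, each branch holds two channels at its own resolution, so a following convolution at that scale can mix local detail with context from the other scale.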

Generalized Spectral Mixture Kernels for Multi-Task Gaussian Processes

Title Generalized Spectral Mixture Kernels for Multi-Task Gaussian Processes
Authors Kai Chen, Perry Groot, Jinsong Chen, Elena Marchiori
Abstract Multi-Task Gaussian processes (MTGPs) have shown significant progress both in expressiveness and interpretation of the relatedness between different tasks: from linear combinations of independent single-output Gaussian processes (GPs), through the direct modeling of the cross-covariances such as spectral mixture kernels with phase shift, to the design of multivariate covariance functions based on spectral mixture kernels which model delays among tasks in addition to phase differences, and which provide a parametric interpretation of the relatedness across tasks. In this paper we further extend the expressiveness and interpretability of MTGP models and introduce a new family of kernels capable of modeling nonlinear correlations between tasks as well as dependencies between spectral mixtures, including time and phase delay. Specifically, we use generalized convolution spectral mixture kernels for modeling dependencies at spectral mixture level, and coupling coregionalization for discovering task level correlations. The proposed kernels for MTGP are validated on artificial data and compared with existing MTGP methods on three real-world experiments. Results indicate the benefits of our more expressive representation with respect to performance and interpretability.
Tasks Gaussian Processes
Published 2018-08-03
URL http://arxiv.org/abs/1808.01132v6
PDF http://arxiv.org/pdf/1808.01132v6.pdf
PWC https://paperswithcode.com/paper/generalized-spectral-mixture-kernels-for
Repo
Framework
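
For orientation, the sketch below evaluates the basic single-task, one-dimensional spectral mixture kernel, k(tau) = sum_q w_q exp(-2 pi^2 tau^2 v_q) cos(2 pi mu_q tau), on which this line of work builds. It is not the generalized multi-task kernel proposed in the paper, and the parameter values are made up for illustration.

```python
import math


def spectral_mixture_kernel(tau, weights, means, variances):
    """Basic 1D spectral mixture kernel evaluated at input difference tau.

    weights, means and variances describe the Gaussian components of the
    spectral density (one entry per component).
    """
    return sum(
        w * math.exp(-2.0 * math.pi ** 2 * tau ** 2 * v)
        * math.cos(2.0 * math.pi * mu * tau)
        for w, mu, v in zip(weights, means, variances)
    )


# Illustrative (assumed) hyperparameters for a two-component mixture.
weights = [1.0, 0.5]
means = [0.2, 1.0]
variances = [0.01, 0.05]

k_zero = spectral_mixture_kernel(0.0, weights, means, variances)
k_lag = spectral_mixture_kernel(0.7, weights, means, variances)
```

At zero lag the kernel equals the sum of the weights, and it is symmetric in tau, as expected of a stationary covariance function.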

Combining Textual Content and Structure to Improve Dialog Similarity

Title Combining Textual Content and Structure to Improve Dialog Similarity
Authors Ana Paula Appel, Paulo Rodrigo Cavalin, Marisa Affonso Vasconcelos, Claudio Santos Pinhanez
Abstract Chatbots, taking advantage of the success of the messaging apps and recent advances in Artificial Intelligence, have become very popular, from helping business to improve customer services to chatting to users for the sake of conversation and engagement (celebrity or personal bots). However, developing and improving a chatbot requires understanding the data generated by its users. Dialog data differs in nature from a simple question-and-answer interaction, in that context and temporal properties (turn order) create a different understanding of such data. In this paper, we propose a novel metric to compute dialogs’ similarity based not only on the text content but also on the information related to the dialog structure. Our experimental results performed over the Switchboard dataset show that using evidence from both textual content and the dialog structure leads to more accurate results than using each measure in isolation.
Tasks Chatbot
Published 2018-02-20
URL http://arxiv.org/abs/1802.07117v1
PDF http://arxiv.org/pdf/1802.07117v1.pdf
PWC https://paperswithcode.com/paper/combining-textual-content-and-structure-to
Repo
Framework

Solving Large Sequential Games with the Excessive Gap Technique

Title Solving Large Sequential Games with the Excessive Gap Technique
Authors Christian Kroer, Gabriele Farina, Tuomas Sandholm
Abstract There has been tremendous recent progress on equilibrium-finding algorithms for zero-sum imperfect-information extensive-form games, but there has been a puzzling gap between theory and practice. First-order methods have significantly better theoretical convergence rates than any counterfactual-regret minimization (CFR) variant. Despite this, CFR variants have been favored in practice. Experiments with first-order methods have only been conducted on small- and medium-sized games because those methods are complicated to implement in this setting, and because CFR variants have been enhanced extensively for over a decade they perform well in practice. In this paper we show that a particular first-order method, a state-of-the-art variant of the excessive gap technique—instantiated with the dilated entropy distance function—can efficiently solve large real-world problems competitively with CFR and its variants. We show this on large endgames encountered by the Libratus poker AI, which recently beat top human poker specialist professionals at no-limit Texas hold’em. We show experimental results on our variant of the excessive gap technique as well as a prior version. We introduce a numerically friendly implementation of the smoothed best response computation associated with first-order methods for extensive-form game solving. We present, to our knowledge, the first GPU implementation of a first-order method for extensive-form games. We present comparisons of several excessive gap technique and CFR variants.
Tasks
Published 2018-10-07
URL http://arxiv.org/abs/1810.03063v1
PDF http://arxiv.org/pdf/1810.03063v1.pdf
PWC https://paperswithcode.com/paper/solving-large-sequential-games-with-the
Repo
Framework

An FPGA-based Massively Parallel Neuromorphic Cortex Simulator

Title An FPGA-based Massively Parallel Neuromorphic Cortex Simulator
Authors Runchun Wang, Chetan Singh Thakur, Andre van Schaik
Abstract This paper presents a massively parallel and scalable neuromorphic cortex simulator designed for simulating large and structurally connected spiking neural networks, such as complex models of various areas of the cortex. The main novelty of this work is the abstraction of a neuromorphic architecture into clusters represented by minicolumns and hypercolumns, analogously to the fundamental structural units observed in neurobiology. Without this approach, simulating large-scale fully connected networks needs prohibitively large memory to store look-up tables for point-to-point connections. Instead, we use a novel architecture, based on the structural connectivity in the neocortex, such that all the required parameters and connections can be stored in on-chip memory. The cortex simulator can be easily reconfigured for simulating different neural networks without any change in hardware structure by programming the memory. A hierarchical communication scheme allows one neuron to have a fan-out of up to 200k neurons. As a proof-of-concept, an implementation on one Altera Stratix V FPGA was able to simulate 20 million to 2.6 billion leaky-integrate-and-fire (LIF) neurons in real time. We verified the system by emulating a simplified auditory cortex (with 100 million neurons). This cortex simulator achieved a low power dissipation of 1.62 µW per neuron. With the advent of commercially available FPGA boards, our system offers an accessible and scalable tool for the design, real-time simulation, and analysis of large-scale spiking neural networks.
Tasks
Published 2018-03-08
URL http://arxiv.org/abs/1803.03015v1
PDF http://arxiv.org/pdf/1803.03015v1.pdf
PWC https://paperswithcode.com/paper/an-fpga-based-massively-parallel-neuromorphic
Repo
Framework
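
As background on the neuron model named in the abstract, here is a minimal software sketch of a single leaky-integrate-and-fire neuron integrated with forward Euler steps. All parameter values and the function name are assumptions chosen for illustration; the sketch says nothing about how the FPGA design implements or time-multiplexes its neurons.

```python
def simulate_lif(drive, steps=100, dt=1.0, tau=10.0,
                 v_rest=0.0, v_thresh=1.0, v_reset=0.0):
    """Simulate one LIF neuron under a constant input drive.

    The membrane potential follows dv/dt = (-(v - v_rest) + drive) / tau.
    When it reaches v_thresh the neuron spikes and is reset to v_reset.
    Returns the list of step indices at which spikes occurred.
    """
    v = v_rest
    spike_steps = []
    for step in range(steps):
        v += (dt / tau) * (-(v - v_rest) + drive)
        if v >= v_thresh:
            spike_steps.append(step)
            v = v_reset
    return spike_steps


quiet = simulate_lif(drive=0.5)   # steady state stays below threshold
active = simulate_lif(drive=1.5)  # steady state would exceed threshold
strong = simulate_lif(drive=3.0)
```

A drive whose steady-state potential stays below the threshold never produces a spike, while stronger drives produce higher firing rates.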

Multifunctionality in embodied agents: Three levels of neural reuse

Title Multifunctionality in embodied agents: Three levels of neural reuse
Authors Madhavun Candadai, Eduardo Izquierdo
Abstract The brain in conjunction with the body is able to adapt to new environments and perform multiple behaviors through reuse of neural resources and transfer of existing behavioral traits. Although mechanisms that underlie this ability are not well understood, they are largely attributed to neuromodulation. In this work, we demonstrate that an agent can be multifunctional using the same sensory and motor systems across behaviors, in the absence of modulatory mechanisms. Further, we lay out the different levels at which neural reuse can occur through a dynamical filtering of the brain-body-environment system’s operation: structural network, autonomous dynamics, and transient dynamics. Notably, transient dynamics reuse could only be explained by studying the brain-body-environment system as a whole and not just the brain. The multifunctional agent we present here demonstrates neural reuse at all three levels.
Tasks
Published 2018-02-12
URL http://arxiv.org/abs/1802.03891v4
PDF http://arxiv.org/pdf/1802.03891v4.pdf
PWC https://paperswithcode.com/paper/multifunctionality-in-embodied-agents-three
Repo
Framework

Rotation-Sensitive Regression for Oriented Scene Text Detection

Title Rotation-Sensitive Regression for Oriented Scene Text Detection
Authors Minghui Liao, Zhen Zhu, Baoguang Shi, Gui-song Xia, Xiang Bai
Abstract Text in natural images is of arbitrary orientations, requiring detection in terms of oriented bounding boxes. A multi-oriented text detector typically involves two key tasks: 1) text presence detection, which is a classification problem disregarding text orientation; 2) oriented bounding box regression, which concerns text orientation. Previous methods rely on shared features for both tasks, resulting in degraded performance due to the incompatibility of the two tasks. To address this issue, we propose to perform classification and regression on features of different characteristics, extracted by two network branches of different designs. Concretely, the regression branch extracts rotation-sensitive features by actively rotating the convolutional filters, while the classification branch extracts rotation-invariant features by pooling the rotation-sensitive features. The proposed method named Rotation-sensitive Regression Detector (RRD) achieves state-of-the-art performance on three oriented scene text benchmark datasets, including ICDAR 2015, MSRA-TD500, RCTW-17 and COCO-Text. Furthermore, RRD achieves a significant improvement on a ship collection dataset, demonstrating its generality on oriented object detection.
Tasks Object Detection, Scene Text Detection
Published 2018-03-14
URL http://arxiv.org/abs/1803.05265v1
PDF http://arxiv.org/pdf/1803.05265v1.pdf
PWC https://paperswithcode.com/paper/rotation-sensitive-regression-for-oriented
Repo
Framework
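
The split between rotation-sensitive and rotation-invariant features can be illustrated with a toy example. The sketch assumes a single 3x3 filter rotated only in 90-degree steps and a single 3x3 patch; the paper's actual rotating filters and network branches are more elaborate, and all names here are hypothetical.

```python
def rotate90(matrix):
    """Rotate a square 2D list by 90 degrees clockwise."""
    return [list(row) for row in zip(*matrix[::-1])]


def response(filt, patch):
    """Inner product of a filter with an image patch of the same size."""
    return sum(f * p for frow, prow in zip(filt, patch)
               for f, p in zip(frow, prow))


def oriented_responses(filt, patch):
    """Rotation-sensitive features: one response per filter orientation."""
    out = []
    current = filt
    for _ in range(4):
        out.append(response(current, patch))
        current = rotate90(current)
    return out


def invariant_response(filt, patch):
    """Rotation-invariant feature: max-pool over the orientation axis."""
    return max(oriented_responses(filt, patch))


edge_filter = [[1, 1, 1], [0, 0, 0], [-1, -1, -1]]
patch = [[2, 2, 2], [1, 1, 1], [0, 0, 0]]
rotated_patch = rotate90(patch)
```

Rotating the patch shifts the oriented responses cyclically, which is the information a regression branch can use, while the pooled value is unchanged, which suits a classification branch.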

COFGA: Classification Of Fine-Grained Features In Aerial Images

Title COFGA: Classification Of Fine-Grained Features In Aerial Images
Authors Eran Dahan, Tzvi Diskin
Abstract Classification between thousands of classes in high-resolution images is one of the heavily studied problems in deep learning over the last decade. However, fine-grained multi-class classification of objects in aerial images, especially in low-resource cases, remains challenging and is an active area of research. Solving this problem can give rise to various applications in the field of scene understanding and classification and re-identification of specific objects from aerial images. In this paper, we provide a description of our dataset, COFGA, of multi-class annotated objects in aerial images. We examine the results of existing state-of-the-art models and modified deep neural networks. Finally, we explain in detail the first published competition for solving this task.
Tasks Scene Understanding
Published 2018-08-27
URL http://arxiv.org/abs/1808.09001v1
PDF http://arxiv.org/pdf/1808.09001v1.pdf
PWC https://paperswithcode.com/paper/cofga-classification-of-fine-grained-features
Repo
Framework

Bayesian Quadrature for Multiple Related Integrals

Title Bayesian Quadrature for Multiple Related Integrals
Authors Xiaoyue Xi, François-Xavier Briol, Mark Girolami
Abstract Bayesian probabilistic numerical methods are a set of tools providing posterior distributions on the output of numerical methods. The use of these methods is usually motivated by the fact that they can represent our uncertainty due to incomplete/finite information about the continuous mathematical problem being approximated. In this paper, we demonstrate that this paradigm can provide additional advantages, such as the possibility of transferring information between several numerical methods. This allows users to represent uncertainty in a more faithful manner and, as a by-product, provide increased numerical efficiency. We propose the first such numerical method by extending the well-known Bayesian quadrature algorithm to the case where we are interested in computing the integral of several related functions. We then prove convergence rates for the method in the well-specified and misspecified cases, and demonstrate its efficiency in the context of multi-fidelity models for complex engineering systems and a problem of global illumination in computer graphics.
Tasks
Published 2018-01-12
URL http://arxiv.org/abs/1801.04153v7
PDF http://arxiv.org/pdf/1801.04153v7.pdf
PWC https://paperswithcode.com/paper/bayesian-quadrature-for-multiple-related
Repo
Framework
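
As a point of reference, the sketch below implements standard single-integral Bayesian quadrature, the algorithm the paper extends, not the multi-integral method itself. It assumes a squared-exponential kernel, the uniform measure on [0, 1], and a fixed length scale; the posterior mean of the integral is then z^T K^{-1} f, where z holds the kernel means at the nodes. All names and parameter values are illustrative.

```python
import math


def rbf(x, y, ell):
    """Squared-exponential kernel with length scale ell."""
    return math.exp(-((x - y) ** 2) / (2.0 * ell ** 2))


def kernel_mean(x, ell):
    """Closed-form integral of rbf(x, y, ell) over y in [0, 1]."""
    s = math.sqrt(2.0) * ell
    return ell * math.sqrt(math.pi / 2.0) * (
        math.erf((1.0 - x) / s) + math.erf(x / s))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def bayesian_quadrature(f, nodes, ell=0.5, jitter=1e-10):
    """Posterior mean of the integral of f over [0, 1]."""
    gram = [[rbf(xi, xj, ell) + (jitter if i == j else 0.0)
             for j, xj in enumerate(nodes)]
            for i, xi in enumerate(nodes)]
    # Quadrature weights K^{-1} z (K is symmetric).
    weights = solve(gram, [kernel_mean(x, ell) for x in nodes])
    return sum(w * f(x) for w, x in zip(weights, nodes))


nodes = [0.0, 0.25, 0.5, 0.75, 1.0]
estimate = bayesian_quadrature(lambda x: x * x, nodes)  # true value is 1/3
```

The quadrature weights depend only on the nodes and the kernel, so they can be reused for any integrand evaluated at the same nodes.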

Center Emphasized Visual Saliency and a Contrast-based Full Reference Image Quality Index

Title Center Emphasized Visual Saliency and a Contrast-based Full Reference Image Quality Index
Authors Md Abu Layek, Sanjida Afroz, TaeChoong Chung, Eui-Nam Huh
Abstract Objective image quality assessment (IQA) is imperative in the current multimedia-intensive world, in order to assess the visual quality of an image at close to a human level of ability. Many parameters such as color intensity, structure, sharpness, contrast, presence of an object, etc., draw human attention to an image. Psychological vision research suggests that human vision is biased to the center area of an image and display screen. As a result, if the center part contains any visually salient information, it draws human attention even more and any distortion in that part will be better perceived than other parts. To the best of our knowledge, previous IQA methods have not considered this fact. In this paper, we propose a full reference image quality assessment (FR-IQA) approach using visual saliency and contrast; however, we give extra attention to the center by increasing the sensitivity of the similarity maps in that region. We evaluated our method on three large-scale popular benchmark databases used by most of the current IQA researchers (TID2008, CSIQ and LIVE), having a total of 3345 distorted images with 28 different kinds of distortions. Our method is compared with 13 state-of-the-art approaches. This comparison reveals the stronger correlation of our method with human-evaluated values. The prediction-of-quality score is consistent for distortion specific as well as distortion independent cases. Moreover, faster processing makes it applicable to any real-time application. The MATLAB code is publicly available to test the algorithm and can be found online at http://layek.khu.ac.kr/CEQI.
Tasks Image Quality Assessment
Published 2018-12-28
URL http://arxiv.org/abs/1812.11163v3
PDF http://arxiv.org/pdf/1812.11163v3.pdf
PWC https://paperswithcode.com/paper/center-emphasized-visual-saliency-and-a
Repo
Framework

Federated Learning for Ultra-Reliable Low-Latency V2V Communications

Title Federated Learning for Ultra-Reliable Low-Latency V2V Communications
Authors Sumudu Samarakoon, Mehdi Bennis, Walid Saad, Merouane Debbah
Abstract In this paper, a novel joint transmit power and resource allocation approach for enabling ultra-reliable low-latency communication (URLLC) in vehicular networks is proposed. The objective is to minimize the network-wide power consumption of vehicular users (VUEs) while ensuring high reliability in terms of probabilistic queuing delays. In particular, a reliability measure is defined to characterize extreme events (i.e., when vehicles’ queue lengths exceed a predefined threshold with non-negligible probability) using extreme value theory (EVT). Leveraging principles from federated learning (FL), the distribution of these extreme events corresponding to the tail distribution of queues is estimated by VUEs in a decentralized manner. Finally, Lyapunov optimization is used to find the joint transmit power and resource allocation policies for each VUE in a distributed manner. The proposed solution is validated via extensive simulations using a Manhattan mobility model. It is shown that FL enables the proposed distributed method to estimate the tail distribution of queues with an accuracy that is very close to a centralized solution with up to 79% reductions in the amount of data that need to be exchanged. Furthermore, the proposed method yields up to 60% reductions of VUEs with large queue lengths, without an additional power consumption, compared to an average queue-based baseline. Compared to systems with fixed power consumption and focusing on queue stability while minimizing average power consumption, the reduction in extreme events of the proposed method is about two orders of magnitude.
Tasks
Published 2018-05-11
URL http://arxiv.org/abs/1805.09253v1
PDF http://arxiv.org/pdf/1805.09253v1.pdf
PWC https://paperswithcode.com/paper/federated-learning-for-ultra-reliable-low
Repo
Framework

A Deep-Learning-Based Fashion Attributes Detection Model

Title A Deep-Learning-Based Fashion Attributes Detection Model
Authors Menglin Jia, Yichen Zhou, Mengyun Shi, Bharath Hariharan
Abstract Analyzing fashion attributes is essential in the fashion design process. Current fashion forecasting firms, such as WGSN, utilize information from all around the world (from fashion shows, visual merchandising, blogs, etc.). They gather information by experience, by observation, by media scan, by interviews, and by exposure to new things. Such an information analysis process is called abstracting, which recognizes similarities or differences across all the garments and collections. In fact, such abstraction ability is useful in many fashion careers with different purposes. Fashion forecasters abstract across design collections and across time to identify fashion change and directions; designers, product developers and buyers abstract across a group of garments and collections to develop cohesive and visually appealing lines; sales and marketing executives abstract across product lines each season to recognize selling points; fashion journalists and bloggers abstract across runway photos to recognize symbolic core concepts that can be translated into editorial features. Fashion attribute analysis for such fashion insiders requires much more detailed and in-depth attribute annotation than that for consumers, and requires inference on multiple domains. In this project, we propose a data-driven approach for recognizing fashion attributes. Specifically, a modified version of the Faster R-CNN model is trained on images from a large-scale localization dataset with 594 fine-grained attributes under different scenarios, for example in online stores and street snapshots. This model will then be used to detect garment items and classify clothing attributes for runway photos and fashion illustrations.
Tasks
Published 2018-10-24
URL http://arxiv.org/abs/1810.10148v1
PDF http://arxiv.org/pdf/1810.10148v1.pdf
PWC https://paperswithcode.com/paper/a-deep-learning-based-fashion-attributes
Repo
Framework

Bayesian State Estimation for Unobservable Distribution Systems via Deep Learning

Title Bayesian State Estimation for Unobservable Distribution Systems via Deep Learning
Authors Kursat Rasim Mestav, Jaime Luengo-Rozas, Lang Tong
Abstract The problem of state estimation for unobservable distribution systems is considered. A deep learning approach to Bayesian state estimation is proposed for real-time applications. The proposed technique consists of distribution learning of stochastic power injection, a Monte Carlo technique for the training of a deep neural network for state estimation, and a Bayesian bad-data detection and filtering algorithm. Structural characteristics of the deep neural networks are investigated. Simulations illustrate the accuracy of Bayesian state estimation for unobservable systems and demonstrate the benefit of employing a deep neural network. Numerical results show the robustness of Bayesian state estimation against modeling and estimation errors and the presence of bad and missing data. Compared with pseudo-measurement techniques, direct Bayesian state estimation via a deep learning neural network outperforms existing benchmarks.
Tasks Bayesian Inference
Published 2018-11-07
URL http://arxiv.org/abs/1811.02756v4
PDF http://arxiv.org/pdf/1811.02756v4.pdf
PWC https://paperswithcode.com/paper/bayesian-state-estimation-for-unobservable
Repo
Framework

Automatic Paper Summary Generation from Visual and Textual Information

Title Automatic Paper Summary Generation from Visual and Textual Information
Authors Shintaro Yamamoto, Yoshihiro Fukuhara, Ryota Suzuki, Shigeo Morishima, Hirokatsu Kataoka
Abstract Due to the recent boom in artificial intelligence (AI) research, including computer vision (CV), it has become impossible for researchers in these fields to keep up with the exponentially increasing number of manuscripts. In response to this situation, this paper proposes the paper summary generation (PSG) task using a simple but effective method to automatically generate an academic paper summary from raw PDF data. We realize PSG by combining a vision-based supervised component detector with a language-based unsupervised important-sentence extractor, which is applicable to a trained format of manuscripts. We quantitatively evaluate the simple vision-based component extraction and qualitatively show that our system can extract both visual items and sentences that are helpful for understanding. After processing via our PSG, the 979 manuscripts accepted by the Conference on Computer Vision and Pattern Recognition (CVPR) 2018 are available. It is believed that the proposed method will provide a better way for researchers to stay caught up with important academic papers.
Tasks
Published 2018-11-16
URL http://arxiv.org/abs/1811.06943v1
PDF http://arxiv.org/pdf/1811.06943v1.pdf
PWC https://paperswithcode.com/paper/automatic-paper-summary-generation-from
Repo
Framework

Emotion Recognition in Speech using Cross-Modal Transfer in the Wild

Title Emotion Recognition in Speech using Cross-Modal Transfer in the Wild
Authors Samuel Albanie, Arsha Nagrani, Andrea Vedaldi, Andrew Zisserman
Abstract Obtaining large, human labelled speech datasets to train models for emotion recognition is a notoriously challenging task, hindered by annotation cost and label ambiguity. In this work, we consider the task of learning embeddings for speech classification without access to any form of labelled audio. We base our approach on a simple hypothesis: that the emotional content of speech correlates with the facial expression of the speaker. By exploiting this relationship, we show that annotations of expression can be transferred from the visual domain (faces) to the speech domain (voices) through cross-modal distillation. We make the following contributions: (i) we develop a strong teacher network for facial emotion recognition that achieves the state of the art on a standard benchmark; (ii) we use the teacher to train a student, tabula rasa, to learn representations (embeddings) for speech emotion recognition without access to labelled audio data; and (iii) we show that the speech emotion embedding can be used for speech emotion recognition on external benchmark datasets. Code, models and data are available.
Tasks Emotion Recognition, Speech Emotion Recognition
Published 2018-08-16
URL http://arxiv.org/abs/1808.05561v1
PDF http://arxiv.org/pdf/1808.05561v1.pdf
PWC https://paperswithcode.com/paper/emotion-recognition-in-speech-using-cross
Repo
Framework