October 17, 2019

Paper Group ANR 741

Two-level Attention with Two-stage Multi-task Learning for Facial Emotion Recognition. Non-Volume Preserving-based Feature Fusion Approach to Group-Level Expression Recognition on Crowd Videos. Towards a Continuous Knowledge Learning Engine for Chatbots. Natural Option Critic. Dynamics of Driver’s Gaze: Explorations in Behavior Modeling & Maneuver …

Two-level Attention with Two-stage Multi-task Learning for Facial Emotion Recognition

Title Two-level Attention with Two-stage Multi-task Learning for Facial Emotion Recognition
Authors Xiaohua Wang, Muzi Peng, Lijuan Pan, Min Hu, Chunhua Jin, Fuji Ren
Abstract Compared with facial emotion recognition based on the categorical model, dimensional emotion recognition can describe the numerous emotions of the real world more accurately. Most prior work on dimensional emotion estimation considered only laboratory data and used video, speech, or other multi-modal features. The effect of these methods applied to static images in the real world is unknown. In this paper, a two-level attention with two-stage multi-task learning (2Att-2Mt) framework is proposed for facial emotion estimation on static images only. First, the features of the corresponding regions (position-level features) are extracted and enhanced automatically by a first-level attention mechanism. Then, we utilize a Bi-directional Recurrent Neural Network (Bi-RNN) with self-attention (second-level attention) to make full use of the relationship features of different layers (layer-level features) adaptively. Owing to the inherent complexity of dimensional emotion recognition, we propose a two-stage multi-task learning structure to exploit categorical representations to ameliorate the dimensional representations and to estimate valence and arousal simultaneously, in view of the correlation between the two targets. Quantitative results on the AffectNet dataset show significant improvements in Concordance Correlation Coefficient (CCC) and Root Mean Square Error (RMSE), illustrating the superiority of the proposed framework. Extensive comparative experiments also demonstrate the effectiveness of its different components.
Tasks Emotion Recognition, Multi-Task Learning
Published 2018-11-29
URL http://arxiv.org/abs/1811.12139v1
PDF http://arxiv.org/pdf/1811.12139v1.pdf
PWC https://paperswithcode.com/paper/two-level-attention-with-two-stage-multi-task
Repo
Framework
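The second-level self-attention over layer-level features described in the abstract can be illustrated with a minimal sketch (plain NumPy, with a single hypothetical scoring vector `w` standing in for the learned attention parameters; the actual 2Att-2Mt model is far richer):

```python
import numpy as np

def self_attention_pool(layer_feats, w):
    """Pool a stack of layer-level features with a simple self-attention.

    layer_feats: (L, D) array, one row per network layer.
    w: (D,) scoring vector (a stand-in for the learned attention parameters).
    Returns the attention-weighted sum, shape (D,).
    """
    scores = layer_feats @ w                      # (L,) raw relevance scores
    scores = scores - scores.max()                # numerical stability
    alphas = np.exp(scores) / np.exp(scores).sum()
    return alphas @ layer_feats                   # weighted sum, shape (D,)

feats = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
w = np.array([1.0, 1.0])
pooled = self_attention_pool(feats, w)
```

The softmax weights sum to one, so the pooled vector is a convex combination of the layer features; layers whose features score higher against `w` contribute more.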

Non-Volume Preserving-based Feature Fusion Approach to Group-Level Expression Recognition on Crowd Videos

Title Non-Volume Preserving-based Feature Fusion Approach to Group-Level Expression Recognition on Crowd Videos
Authors Kha Gia Quach, Ngan Le, Khoa Luu, Chi Nhan Duong, Ibsa Jalata, Karl Ricanek
Abstract Group-level emotion recognition (ER) is a growing research area, as the demand for assessing crowds of all sizes is becoming an interest in both the security arena and social media. This work investigates group-level expression recognition on crowd videos, where information is aggregated not only across a variable-length sequence of frames but also over the set of faces within each frame to produce aggregated recognition results. In this paper, we propose an effective deep feature-level fusion mechanism to model the spatio-temporal information in crowd videos. Furthermore, we extend our proposed NVP fusion mechanism to a temporal NVP fusion approach to learn the temporal information between frames. To demonstrate the robustness and effectiveness of each component of the proposed approach, three experiments were conducted: (i) evaluation on the AffectNet database to benchmark the proposed emoNet for recognizing facial expressions; (ii) evaluation on EmotiW2018 to benchmark the proposed deep feature-level fusion mechanism NVPF; and (iii) examination of the proposed TNVPF on a new Group-level Emotion on Crowd Videos (GECV) dataset composed of 627 videos collected from social media. The GECV dataset is a collection of videos, 10 to 20 seconds in duration, of crowds of twenty or more subjects, each labeled as positive, negative, or neutral.
Tasks Emotion Recognition
Published 2018-11-28
URL http://arxiv.org/abs/1811.11849v1
PDF http://arxiv.org/pdf/1811.11849v1.pdf
PWC https://paperswithcode.com/paper/non-volume-preserving-based-feature-fusion
Repo
Framework
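The non-volume-preserving (NVP) building block referenced in the title is, in flow-based models such as RealNVP, an invertible affine coupling layer. A minimal sketch follows, with toy scale/translation functions standing in for learned networks; this illustrates only the coupling mechanism, not the paper's fusion architecture:

```python
import numpy as np

def coupling_forward(x, s_fn, t_fn):
    """One RealNVP-style affine coupling step: half of the features pass
    through unchanged and parameterize an invertible affine transform
    of the other half."""
    d = x.shape[-1] // 2
    x1, x2 = x[..., :d], x[..., d:]
    y2 = x2 * np.exp(s_fn(x1)) + t_fn(x1)
    return np.concatenate([x1, y2], axis=-1)

def coupling_inverse(y, s_fn, t_fn):
    """Exact inverse of coupling_forward, using the same s_fn and t_fn."""
    d = y.shape[-1] // 2
    y1, y2 = y[..., :d], y[..., d:]
    x2 = (y2 - t_fn(y1)) * np.exp(-s_fn(y1))
    return np.concatenate([y1, x2], axis=-1)

# Toy scale/translation functions (stand-ins for learned networks).
s = lambda h: 0.5 * h
t = lambda h: h + 1.0
x = np.array([0.2, -1.0, 3.0, 0.5])
y = coupling_forward(x, s, t)
x_rec = coupling_inverse(y, s, t)
```

Because the transform of the second half is affine given the first half, inversion is exact and cheap, which is what makes such layers attractive for feature fusion.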

Towards a Continuous Knowledge Learning Engine for Chatbots

Title Towards a Continuous Knowledge Learning Engine for Chatbots
Authors Sahisnu Mazumder, Nianzu Ma, Bing Liu
Abstract Although chatbots have been very popular in recent years, they still have some serious weaknesses that limit the scope of their applications. One major weakness is that they cannot learn new knowledge during the conversation process; i.e., their knowledge is fixed beforehand and cannot be expanded or updated during conversation. In this paper, we propose to build a general knowledge learning engine for chatbots to enable them to continuously and interactively learn new knowledge during conversations. Over time, they become more and more knowledgeable and better at learning and conversing. We model the task as an open-world knowledge base completion problem and propose a novel technique called lifelong interactive learning and inference (LiLi) to solve it. LiLi works by imitating how humans acquire knowledge and perform inference during an interactive conversation. Our experimental results show that LiLi is highly promising.
Tasks Knowledge Base Completion
Published 2018-02-16
URL http://arxiv.org/abs/1802.06024v2
PDF http://arxiv.org/pdf/1802.06024v2.pdf
PWC https://paperswithcode.com/paper/towards-a-continuous-knowledge-learning
Repo
Framework

Natural Option Critic

Title Natural Option Critic
Authors Saket Tiwari, Philip S. Thomas
Abstract The recently proposed option-critic architecture (Bacon et al.) provides a stochastic policy gradient approach to hierarchical reinforcement learning. Specifically, it provides a way to estimate the gradient of the expected discounted return with respect to parameters that define a finite number of temporally extended actions, called “options”. In this paper we show how the option-critic architecture can be extended to estimate the natural gradient of the expected discounted return. To this end, the central questions that we consider are: 1) what is the definition of the natural gradient in this context; 2) what is the Fisher information matrix associated with an option’s parameterized policy; 3) what is the Fisher information matrix associated with an option’s parameterized termination function; and 4) how can a compatible function approximation approach be leveraged to obtain natural gradient estimates for both the parameterized policy and the parameterized termination functions of an option, with per-time-step time and space complexity linear in the total number of parameters. Based on the answers to these questions, we introduce the natural option critic algorithm. Experimental results showcase improvement over the vanilla gradient approach.
Tasks Hierarchical Reinforcement Learning
Published 2018-12-04
URL http://arxiv.org/abs/1812.01488v1
PDF http://arxiv.org/pdf/1812.01488v1.pdf
PWC https://paperswithcode.com/paper/natural-option-critic
Repo
Framework
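The core computation behind any natural-gradient method, including the natural option critic, is preconditioning the vanilla gradient by the inverse Fisher information matrix. A minimal sketch, estimating the Fisher matrix from sampled score vectors (illustrative only; the paper derives compatible function approximations precisely to avoid forming F explicitly and to keep per-step complexity linear):

```python
import numpy as np

def natural_gradient(scores, g, damping=1e-3):
    """Natural-gradient direction F^{-1} g, with the Fisher matrix estimated
    as the average outer product of per-sample score vectors
    (d log pi / d theta).

    scores: (N, P) array of sampled score vectors.
    g: (P,) vanilla policy-gradient estimate.
    """
    F = scores.T @ scores / scores.shape[0]   # empirical Fisher matrix
    F += damping * np.eye(F.shape[0])         # keep F invertible
    return np.linalg.solve(F, g)              # solve F x = g

rng = np.random.default_rng(0)
scores = rng.normal(size=(500, 3))            # toy sampled scores
g = np.array([1.0, 0.0, 0.0])
ng = natural_gradient(scores, g)
```

Since the damped Fisher matrix is positive definite, the natural-gradient direction always has positive inner product with the vanilla gradient, so it remains an ascent direction.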

Dynamics of Driver’s Gaze: Explorations in Behavior Modeling & Maneuver Prediction

Title Dynamics of Driver’s Gaze: Explorations in Behavior Modeling & Maneuver Prediction
Authors Sujitha Martin, Sourabh Vora, Kevan Yuen, Mohan M. Trivedi
Abstract The study and modeling of drivers’ gaze dynamics is important because whether and how the driver is monitoring the driving environment is vital for driver assistance in manual mode, for take-over requests in highly automated mode, and for semantic perception of the surroundings in fully autonomous mode. We developed a machine-vision-based framework to classify the driver’s gaze into context-rich zones of interest and model the driver’s gaze behavior by representing gaze dynamics over a time period using gaze accumulation, glance duration, and glance frequencies. As a use case, we explore the driver’s gaze dynamics during maneuvers executed in freeway driving, namely left lane changes, right lane changes, and lane keeping. It is shown that condensing gaze dynamics into durations and frequencies leads to recurring patterns based on driver activities. Furthermore, modeling these patterns shows predictive power in maneuver detection up to a few hundred milliseconds a priori.
Tasks
Published 2018-01-31
URL http://arxiv.org/abs/1802.00066v1
PDF http://arxiv.org/pdf/1802.00066v1.pdf
PWC https://paperswithcode.com/paper/dynamics-of-drivers-gaze-explorations-in
Repo
Framework
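The gaze descriptors named in the abstract (gaze accumulation, glance duration, glance frequency) reduce to simple run-length statistics over a per-frame gaze-zone sequence. A minimal sketch, where the zone names and the 30 fps frame rate are illustrative assumptions:

```python
from collections import Counter
from itertools import groupby

def glance_stats(gaze_zones, dt=1 / 30):
    """Summarize a per-frame gaze-zone sequence into the descriptors the
    paper uses: gaze accumulation (total time per zone), glance durations
    (length of each contiguous run), and glance frequencies (number of
    distinct glances per zone). dt is the frame period (30 fps assumed)."""
    accumulation = {z: n * dt for z, n in Counter(gaze_zones).items()}
    glances = [(z, sum(1 for _ in run) * dt) for z, run in groupby(gaze_zones)]
    frequency = Counter(z for z, _ in glances)
    return accumulation, glances, frequency

zones = ["road", "road", "left_mirror", "road", "road", "road", "left_mirror"]
acc, glances, freq = glance_stats(zones)
```

Here the driver glances twice at the left mirror; the run-length view distinguishes those two short glances from a single long one of equal total duration, which is exactly the information the duration/frequency representation preserves.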

Are good local minima wide in sparse recovery?

Title Are good local minima wide in sparse recovery?
Authors Michael Moeller, Otmar Loffeld, Juergen Gall, Felix Krahmer
Abstract The idea of compressed sensing is to exploit representations in suitable (overcomplete) dictionaries that allow signals to be recovered far beyond the Nyquist rate, provided that they admit a sparse representation in the respective dictionary. The latter gives rise to the sparse recovery problem of finding the best sparse linear approximation of given data in a given generating system. In this paper we analyze the iterative hard thresholding (IHT) algorithm, one of the most popular greedy methods for solving the sparse recovery problem, and demonstrate that systematically perturbing the IHT algorithm by adding noise to intermediate iterates yields improved results. Further improvements can be obtained by entirely rephrasing the problem as a parametric, deep-learning-type optimization problem. By introducing perturbations via dropout, we significantly outperform the classical IHT algorithm, obtaining $3$ to $6$ times lower average objective errors.
Tasks
Published 2018-06-21
URL http://arxiv.org/abs/1806.08296v1
PDF http://arxiv.org/pdf/1806.08296v1.pdf
PWC https://paperswithcode.com/paper/are-good-local-minima-wide-in-sparse-recovery
Repo
Framework
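The perturbed-IHT idea is straightforward to sketch: a standard IHT iteration with optional Gaussian noise injected before the hard-thresholding step. With `sigma=0` this reduces to plain IHT; the paper's dropout-based variant is more elaborate:

```python
import numpy as np

def perturbed_iht(A, y, k, iters=200, sigma=0.01, seed=0):
    """Iterative hard thresholding with noise perturbation: after each
    gradient step, Gaussian noise is added to the iterate before keeping
    the k largest-magnitude entries. sigma=0 recovers plain IHT."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = np.zeros(n)
    step = 1.0 / np.linalg.norm(A, 2) ** 2     # conservative step size
    for _ in range(iters):
        x = x + step * A.T @ (y - A @ x)       # gradient step on ||y - Ax||^2
        x = x + sigma * rng.normal(size=n)     # perturbation of the iterate
        idx = np.argsort(np.abs(x))[:-k]       # indices of all but the top k
        x[idx] = 0.0                           # hard thresholding
    return x

# Toy recovery problem: 3-sparse signal, 40 Gaussian measurements in R^80.
rng = np.random.default_rng(1)
A = rng.normal(size=(40, 80)) / np.sqrt(40)
x_true = np.zeros(80)
x_true[[3, 17, 42]] = [1.0, -2.0, 0.5]
y = A @ x_true
x_hat = perturbed_iht(A, y, k=3, sigma=0.0)
```

On this easy noiseless instance plain IHT already recovers the signal; the paper's point is that on harder instances the noise injection helps escape poor local minima.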

iQIYI-VID: A Large Dataset for Multi-modal Person Identification

Title iQIYI-VID: A Large Dataset for Multi-modal Person Identification
Authors Yuanliu Liu, Bo Peng, Peipei Shi, He Yan, Yong Zhou, Bing Han, Yi Zheng, Chao Lin, Jianbin Jiang, Yin Fan, Tingwei Gao, Ganwen Wang, Jian Liu, Xiangju Lu, Danming Xie
Abstract Person identification in the wild is very challenging due to great variation in poses, face quality, clothes, makeup, and so on. Traditional research, such as face recognition, person re-identification, and speaker recognition, often focuses on a single modality of information, which is inadequate to handle all situations in practice. Multi-modal person identification is a more promising approach, in which face, head, body, audio, and other features can be utilized jointly. In this paper, we introduce iQIYI-VID, the largest video dataset for multi-modal person identification. It is composed of 600K video clips of 5,000 celebrities, extracted from 400K hours of online videos of various types, ranging from movies and variety shows to TV series and news broadcasting. All video clips pass through a careful human annotation process, and the label error rate is lower than 0.2%. We evaluated state-of-the-art models for face recognition, person re-identification, and speaker recognition on the iQIYI-VID dataset. Experimental results show that these models are still far from perfect for the task of person identification in the wild. We propose a Multi-modal Attention module to fuse multi-modal features, which improves person identification considerably. We have released the dataset online to promote multi-modal person identification research.
Tasks Face Recognition, Multi-Modal Person Identification, Person Identification, Person Re-Identification, Speaker Recognition
Published 2018-11-19
URL http://arxiv.org/abs/1811.07548v2
PDF http://arxiv.org/pdf/1811.07548v2.pdf
PWC https://paperswithcode.com/paper/iqiyi-vid-a-large-dataset-for-multi-modal
Repo
Framework

Representing a Partially Observed Non-Rigid 3D Human Using Eigen-Texture and Eigen-Deformation

Title Representing a Partially Observed Non-Rigid 3D Human Using Eigen-Texture and Eigen-Deformation
Authors Ryosuke Kimura, Akihiko Sayo, Fabian Lorenzo Dayrit, Yuta Nakashima, Hiroshi Kawasaki, Ambrosio Blanco, Katsushi Ikeuchi
Abstract Reconstruction of the shape and motion of humans from RGB-D is a challenging problem that has received much attention in recent years. Recent approaches for full-body reconstruction use a statistical shape model, built from accurate full-body scans of people in skin-tight clothes, to complete parts made invisible by occlusion. Such a statistical model may still be fit to an RGB-D measurement with loose clothes but cannot describe its deformations, such as clothing wrinkles. Observed surfaces may be reconstructed precisely from actual measurements, while we have no cues for unobserved surfaces. For full-body reconstruction with loose clothes, we propose to use lower-dimensional embeddings of texture and deformation, referred to as eigen-texture and eigen-deformation, to reproduce views of even unobserved surfaces. Given a full-body reconstruction from a sequence of partial measurements as 3D meshes, the texture and deformation of each triangle are embedded using eigen-decomposition. Combined with neural-network-based coefficient regression, our method synthesizes the texture and deformation from arbitrary viewpoints. We evaluate our method using simulated data and visually demonstrate how it works on real data.
Tasks
Published 2018-07-07
URL http://arxiv.org/abs/1807.02632v1
PDF http://arxiv.org/pdf/1807.02632v1.pdf
PWC https://paperswithcode.com/paper/representing-a-partially-observed-non-rigid
Repo
Framework
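The eigen-decomposition step at the heart of eigen-texture/eigen-deformation is PCA over per-triangle sample vectors: keep a low-dimensional basis and represent each sample by its coefficients. A minimal sketch on synthetic data (illustrative; the paper additionally regresses the coefficients with a neural network to synthesize unobserved views):

```python
import numpy as np

def eigen_basis(samples, k):
    """PCA-style eigen-decomposition of per-triangle texture (or
    deformation) vectors: keep the top-k principal directions of the
    centered data as a low-dimensional basis."""
    mean = samples.mean(axis=0)
    centered = samples - mean
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k]                       # basis rows, shape (k, D)

def project(sample, mean, basis):
    """Embed a sample as k coefficients and reconstruct it from them."""
    coeffs = basis @ (sample - mean)
    return coeffs, mean + basis.T @ coeffs

rng = np.random.default_rng(0)
# Synthetic "textures" living exactly on a 2-D affine subspace of R^6.
Z = rng.normal(size=(50, 2))
W = rng.normal(size=(2, 6))
X = Z @ W + 0.5
mean, basis = eigen_basis(X, k=2)
coeffs, recon = project(X[0], mean, basis)
```

Because the toy data is exactly two-dimensional, the rank-2 basis reconstructs it perfectly; real texture data is only approximately low-rank, and k trades compactness against fidelity.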

Generating Contradictory, Neutral, and Entailing Sentences

Title Generating Contradictory, Neutral, and Entailing Sentences
Authors Yikang Shen, Shawn Tan, Chin-Wei Huang, Aaron Courville
Abstract Learning distributed sentence representations remains an interesting problem in the field of Natural Language Processing (NLP). We want to learn a model that approximates the conditional latent space over the representations of a logical antecedent of a given statement. In this paper, we propose an approach to generating sentences conditioned on an input sentence and a logical inference label. We do this by modeling the different possibilities for the output sentence as a distribution over the latent representation, which we train using an adversarial objective. We evaluate the model using two state-of-the-art models for the Recognizing Textual Entailment (RTE) task, and measure BLEU scores against the actual sentences as a probe for the diversity of sentences produced by our model. The experimental results show that, within our framework, there are clear ways to improve the quality and diversity of generated sentences.
Tasks Natural Language Inference
Published 2018-03-07
URL http://arxiv.org/abs/1803.02710v1
PDF http://arxiv.org/pdf/1803.02710v1.pdf
PWC https://paperswithcode.com/paper/generating-contradictory-neutral-and
Repo
Framework

TROVE Feature Detection for Online Pose Recovery by Binocular Cameras

Title TROVE Feature Detection for Online Pose Recovery by Binocular Cameras
Authors Yuance Liu, Michael Z. Q. Chen
Abstract This paper proposes a new and efficient method to estimate 6-DoF ego-states, attitudes and positions, in real time. The proposed method extracts ego-state information by observing a feature called “TROVE” (Three Rays and One VErtex). TROVE features are projected from structures that are ubiquitous in man-made constructions and objects. The proposed method neither searches for conventional corner-type features nor uses Perspective-n-Point (PnP) methods, and it achieves real-time estimation of attitudes and positions at up to 60 Hz. The accuracy of attitude estimates can reach 0.3 degrees and that of position estimates can reach 2 cm in an indoor environment. The results show a promising approach for unmanned robots to localize in environments rich in man-made structures.
Tasks
Published 2018-12-28
URL http://arxiv.org/abs/1812.10967v1
PDF http://arxiv.org/pdf/1812.10967v1.pdf
PWC https://paperswithcode.com/paper/trove-feature-detection-for-online-pose
Repo
Framework

Towards Explainable Deep Learning for Credit Lending: A Case Study

Title Towards Explainable Deep Learning for Credit Lending: A Case Study
Authors Ceena Modarres, Mark Ibrahim, Melissa Louie, John Paisley
Abstract Deep learning adoption in the financial services industry has been limited due to a lack of model interpretability. However, several techniques have been proposed to explain predictions made by a neural network. We provide an initial investigation into these techniques for the assessment of credit risk with neural networks.
Tasks
Published 2018-11-15
URL http://arxiv.org/abs/1811.06471v2
PDF http://arxiv.org/pdf/1811.06471v2.pdf
PWC https://paperswithcode.com/paper/towards-explainable-deep-learning-for-credit
Repo
Framework
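One of the simplest explanation techniques of the kind the paper investigates is gradient-times-input attribution. A minimal sketch for a logistic model (a toy stand-in; the paper studies explanation methods for neural networks):

```python
import numpy as np

def saliency(x, w):
    """Gradient-times-input attribution for a logistic model.

    For f(x) = sigmoid(w . x), df/dx_j = f(1 - f) * w_j, so the
    attribution of feature j is x_j * f(1 - f) * w_j: how much each
    input, scaled by the local gradient, pushes the prediction."""
    p = 1.0 / (1.0 + np.exp(-(w @ x)))        # model prediction
    return x * (p * (1 - p)) * w              # per-feature attribution

x = np.array([2.0, -1.0, 0.0])                # toy applicant features
w = np.array([0.5, 1.0, -3.0])                # toy model weights
attr = saliency(x, w)
```

A feature with zero value receives zero attribution regardless of its weight, one known limitation of gradient-times-input that more elaborate methods (e.g. integrated gradients) try to address.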

Integrating Project Spatial Coordinates into Pavement Management Prioritization

Title Integrating Project Spatial Coordinates into Pavement Management Prioritization
Authors Omar Elbagalati, Mustafa Hajij
Abstract To date, pavement management software products and studies on optimizing the prioritization of pavement maintenance and rehabilitation (M&R) have focused mainly on three parameters: the pre-treatment pavement condition, the rehabilitation cost, and the available budget. Yet the role of the candidate projects’ spatial characteristics in the decision-making process has not been deeply considered. This limitation predominantly allows the recommended M&R project schedule to involve simultaneously running but spatially scattered construction sites, which are very challenging to monitor and manage. This study introduces a novel approach to integrating pavement segments’ spatial coordinates into the M&R prioritization analysis. The introduced approach aims at combining pavement segments with converged spatial coordinates to be repaired in the same timeframe, without compromising the allocated budget levels or the overall target Pavement Condition Index (PCI). Such a combination would minimize the routing of crews, materials, and other equipment among the construction sites and would provide better collaboration and communication between pavement maintenance teams. Proposed herein is a novel spatial clustering algorithm that automatically finds the projects within certain budget and spatial constraints. The developed algorithm was successfully validated using 1,800 pavement maintenance projects from two real-life examples in the City of Milton, GA and the City of Tyler, TX.
Tasks Decision Making
Published 2018-11-05
URL http://arxiv.org/abs/1811.03437v1
PDF http://arxiv.org/pdf/1811.03437v1.pdf
PWC https://paperswithcode.com/paper/integrating-project-spatial-coordinates-into
Repo
Framework
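The paper's clustering idea, grouping spatially close segments under a budget cap, can be sketched greedily. This is a hypothetical stand-in: the actual algorithm also honors the target PCI, and the `(x, y, cost)` tuple format is an assumption made for illustration:

```python
import math

def cluster_projects(projects, budget):
    """Greedy sketch of budget-constrained spatial clustering: seed a
    cluster with the most expensive remaining segment, then repeatedly
    absorb the nearest segment that still fits under the budget cap.

    projects: list of (x, y, cost) tuples.
    Returns a list of (cluster, total_cost) pairs."""
    remaining = sorted(projects, key=lambda p: p[2], reverse=True)
    clusters = []
    while remaining:
        seed = remaining.pop(0)
        cluster, spent = [seed], seed[2]
        while True:
            affordable = [p for p in remaining if spent + p[2] <= budget]
            if not affordable:
                break
            nearest = min(affordable,
                          key=lambda p: math.dist(p[:2], seed[:2]))
            remaining.remove(nearest)
            cluster.append(nearest)
            spent += nearest[2]
        clusters.append((cluster, spent))
    return clusters

# Two spatial groups of segments; budget forces each to stay local.
projects = [(0, 0, 40), (1, 0, 30), (10, 10, 50), (11, 10, 20)]
clusters = cluster_projects(projects, budget=80)
```

On this toy input the two nearby pairs end up in separate clusters, each under budget, which is the scheduling behavior the paper aims for: co-located work in the same timeframe.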

Patient-Specific 3D Volumetric Reconstruction of Bioresorbable Stents: A Method to Generate 3D Geometries for Computational Analysis of Coronaries Treated with Bioresorbable Stents

Title Patient-Specific 3D Volumetric Reconstruction of Bioresorbable Stents: A Method to Generate 3D Geometries for Computational Analysis of Coronaries Treated with Bioresorbable Stents
Authors Boyi Yang, Marina Piccinelli, Gaetano Esposito, Tianli Han, Yasir Bouchi, Bill Gogas, Don Giddens, Habib Samady, Alessandro Veneziani
Abstract As experts continue to debate the optimal surgical practice for coronary disease, percutaneous coronary intervention (PCI) or coronary artery bypass graft (CABG), computational tools may provide a quantitative assessment of each option. Computational fluid dynamics (CFD) has been used to assess the interplay between hemodynamics and stent struts; it is of particular interest for Bioresorbable Vascular Stents (BVS), since their thicker struts may result in disturbed flow patterns and possible pathological consequences. Many proofs of concept are presented in the literature; however, a practical method for extracting patient-specific stented coronary artery geometries from images over a large number of patients remains an open problem. This work provides a possible pipeline for the reconstruction of the BVS. Using Optical Coherence Tomography (OCT) and Invasive Coronary Angiography (ICA), we reconstruct the 3D geometry of deployed BVS in vivo. We illustrate the stent reconstruction process: (i) automatic strut detection, (ii) identification of stent components, (iii) 3D registration of stent curvature, and (iv) final stent volume reconstruction. The methodology is designed for use on clinical OCT images, as opposed to approaches that rely on a small number of virtually deployed stents. The proposed reconstruction process is validated with a virtual phantom stent, providing a quantitative assessment of the methodology, and with selected clinical cases, confirming feasibility. Using multimodality image analysis, we obtain reliable reconstructions within a reasonable timeframe. This work is the first step toward a fully automated reconstruction and simulation procedure aimed at an extensive quantitative analysis of the impact of BVS struts on hemodynamics via CFD in clinical trials, going beyond the proof-of-concept stage.
Tasks 3D Volumetric Reconstruction
Published 2018-10-08
URL http://arxiv.org/abs/1810.03270v1
PDF http://arxiv.org/pdf/1810.03270v1.pdf
PWC https://paperswithcode.com/paper/patient-specific-3d-volumetric-reconstruction
Repo
Framework

Decentralized Cooperative Stochastic Bandits

Title Decentralized Cooperative Stochastic Bandits
Authors David Martínez-Rubio, Varun Kanade, Patrick Rebeschini
Abstract We study a decentralized cooperative stochastic multi-armed bandit problem with $K$ arms on a network of $N$ agents. In our model, the reward distribution of each arm is the same for each agent and rewards are drawn independently across agents and time steps. In each round, each agent chooses an arm to play and subsequently sends a message to her neighbors. The goal is to minimize the overall regret of the entire network. We design a fully decentralized algorithm that uses an accelerated consensus procedure to compute (delayed) estimates of the average of rewards obtained by all the agents for each arm, and then uses an upper confidence bound (UCB) algorithm that accounts for the delay and error of the estimates. We analyze the regret of our algorithm and also provide a lower bound. The regret is bounded by the optimal centralized regret plus a natural and simple term depending on the spectral gap of the communication matrix. Our algorithm is simpler to analyze than those proposed in prior work and it achieves better regret bounds, while requiring less information about the underlying network. It also performs better empirically.
Tasks Multi-Armed Bandits
Published 2018-10-10
URL https://arxiv.org/abs/1810.04468v2
PDF https://arxiv.org/pdf/1810.04468v2.pdf
PWC https://paperswithcode.com/paper/decentralized-cooperative-stochastic-multi
Repo
Framework
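The algorithm's two ingredients, gossip averaging of per-arm statistics across the network and a UCB rule on the averaged estimates, can be sketched on a toy problem. This is a simplification: the paper uses accelerated consensus and delay-aware confidence widths, both omitted here:

```python
import numpy as np

def run_network_ucb(mu, W, T, seed=0):
    """Toy decentralized UCB: each agent keeps per-arm (reward-sum,
    pull-count) statistics, averages them with neighbors through a
    doubly stochastic gossip matrix W each round, and plays the arm
    with the highest upper confidence bound.

    mu: list of true arm means. W: (N, N) gossip matrix. T: rounds.
    Returns the (N, K) matrix of local play counts."""
    rng = np.random.default_rng(seed)
    N, K = W.shape[0], len(mu)
    sums = np.zeros((N, K))
    counts = np.zeros((N, K))
    plays = np.zeros((N, K), dtype=int)
    for t in range(T):
        for i in range(N):
            if t < K:                          # initialize: play each arm once
                a = t
            else:
                ucb = sums[i] / counts[i] + np.sqrt(
                    2 * np.log(t + 1) / counts[i])
                a = int(np.argmax(ucb))
            r = rng.normal(mu[a], 0.1)         # sample a reward
            sums[i, a] += r
            counts[i, a] += 1
            plays[i, a] += 1
        sums = W @ sums                        # gossip step: mix statistics
        counts = W @ counts
    return plays

mu = [0.2, 0.9, 0.5]
W = np.full((3, 3), 1 / 3)                     # complete-graph averaging
plays = run_network_ucb(mu, W, T=300)
```

Because the gossip step shares statistics, each agent effectively benefits from the whole network's samples, which is why the decentralized regret can approach the centralized one up to a spectral-gap-dependent term.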

Feature selection with optimal coordinate ascent (OCA)

Title Feature selection with optimal coordinate ascent (OCA)
Authors David Saltiel, Eric Benhamou
Abstract In machine learning, Feature Selection (FS) is a major part of building an efficient algorithm: it fuels the algorithm and is the starting block for prediction. In this paper, we present a new method, called Optimal Coordinate Ascent (OCA), that allows us to select features among block and individual features. OCA relies on coordinate ascent to find an optimal solution for the gradient boosting method’s score (the number of correctly classified samples). OCA takes into account the notion of dependencies between variables, which form blocks in our optimization. The coordinate ascent optimization addresses the issue of the NP-hard original problem, in which the number of combinations explodes rapidly, making a grid search infeasible. It considerably reduces the number of iterations, changing this NP-hard problem into a polynomial search. OCA brings substantial differences and improvements compared to the previous coordinate ascent feature selection method: we group variables into blocks and individual variables instead of using a binary selection. Our initial guess is based on the k best group variables, making our initial point more robust. We also introduce new stopping criteria, making our optimization faster. We compare the two methods on our dataset and find that our method outperforms the initial one. We also compare our method to the Recursive Feature Elimination (RFE) method and find that OCA leads to the minimal feature set with the highest score. This is a nice byproduct of our method, as it empirically provides the most compact dataset with optimal performance.
Tasks Feature Selection
Published 2018-11-29
URL http://arxiv.org/abs/1811.12064v3
PDF http://arxiv.org/pdf/1811.12064v3.pdf
PWC https://paperswithcode.com/paper/feature-selection-with-optimal-coordinate
Repo
Framework
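The coordinate-ascent loop at the core of OCA can be sketched as follows: sweep over blocks, toggle each one, and keep the change only if the score improves, stopping when a full sweep changes nothing. This is a simplified stand-in: OCA's k-best initialization and stopping criteria are omitted, and `score_fn` here is an arbitrary toy scorer rather than a gradient-boosting score:

```python
def coordinate_ascent_select(blocks, score_fn, max_rounds=10):
    """Coordinate-ascent feature selection over blocks of features.

    blocks: list of lists of feature indices.
    score_fn: maps a frozenset of selected feature indices to a score.
    Returns (selected feature set, best score)."""
    selected = [True] * len(blocks)

    def active(sel):
        return frozenset(f for s, b in zip(sel, blocks) if s for f in b)

    best = score_fn(active(selected))
    for _ in range(max_rounds):
        improved = False
        for i in range(len(blocks)):
            selected[i] = not selected[i]       # try toggling block i
            cand = score_fn(active(selected))
            if cand > best:
                best, improved = cand, True     # keep the improvement
            else:
                selected[i] = not selected[i]   # revert the toggle
        if not improved:                        # converged: full sweep, no gain
            break
    return active(selected), best

# Toy score: features 0 and 3 help, feature 5 hurts.
useful, harmful = {0, 3}, {5}
score = lambda feats: len(feats & useful) - 2 * len(feats & harmful)
blocks = [[0, 1], [2, 3], [4, 5]]
sel, best = coordinate_ascent_select(blocks, score)
```

On this toy scorer the sweep drops the block containing the harmful feature and keeps the useful ones, illustrating how block-level moves shrink the exponential subset search to a polynomial number of score evaluations.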