January 27, 2020


Paper Group ANR 1130


Memory Integrity of CNNs for Cross-Dataset Facial Expression Recognition


Title	Memory Integrity of CNNs for Cross-Dataset Facial Expression Recognition
Authors	Dylan C. Tannugi, Alceu S. Britto Jr., Alessandro L. Koerich
Abstract	Facial expression recognition is a major problem in the domain of artificial intelligence. One of the best ways to solve this problem is the use of convolutional neural networks (CNNs). However, a large amount of data is required to properly train these networks, but most of the datasets available for facial expression recognition are relatively small. A common way to circumvent the lack of data is to use CNNs trained on large datasets from different domains and fine-tune the layers of such networks to the target domain. However, fine-tuning does not preserve memory integrity, as CNNs tend to forget patterns they have previously learned. In this paper, we evaluate different strategies of fine-tuning a CNN with the aim of assessing the memory integrity of such strategies in a cross-dataset scenario. A CNN pre-trained on a source dataset is used as the baseline, and four adaptation strategies are evaluated: fine-tuning its fully connected layers; fine-tuning its last convolutional layer and its fully connected layers; retraining the CNN on a target dataset; and fusing the source and target datasets and retraining the CNN. Experimental results on four datasets show that fusing the source and target datasets provides the best trade-off between accuracy and memory integrity.
Tasks	Facial Expression Recognition
Published	2019-05-28
URL	https://arxiv.org/abs/1905.12082v1
PDF	https://arxiv.org/pdf/1905.12082v1.pdf
PWC	https://paperswithcode.com/paper/memory-integrity-of-cnns-for-cross-dataset
Repo
Framework

Few-Shot Bayesian Imitation Learning with Logical Program Policies


Title	Few-Shot Bayesian Imitation Learning with Logical Program Policies
Authors	Tom Silver, Kelsey R. Allen, Alex K. Lew, Leslie Pack Kaelbling, Josh Tenenbaum
Abstract	Humans can learn many novel tasks from a very small number (1–5) of demonstrations, in stark contrast to the data requirements of nearly tabula rasa deep learning methods. We propose an expressive class of policies, a strong but general prior, and a learning algorithm that, together, can learn interesting policies from very few examples. We represent policies as logical combinations of programs drawn from a domain-specific language (DSL), define a prior over policies with a probabilistic grammar, and derive an approximate Bayesian inference algorithm to learn policies from demonstrations. In experiments, we study five strategy games played on a 2D grid with one shared DSL. After a few demonstrations of each game, the inferred policies generalize to new game instances that differ substantially from the demonstrations. Our policy learning is 20–1,000x more data efficient than convolutional and fully convolutional policy learning and many orders of magnitude more computationally efficient than vanilla program induction. We argue that the proposed method is an apt choice for tasks that have scarce training data and feature significant, structured variation between task instances.
Tasks	Bayesian Inference, Imitation Learning
Published	2019-04-12
URL	https://arxiv.org/abs/1904.06317v2
PDF	https://arxiv.org/pdf/1904.06317v2.pdf
PWC	https://paperswithcode.com/paper/few-shot-bayesian-imitation-learning-with
Repo
Framework
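The abstract's recipe (programs drawn from a DSL, a grammar-based prior over policies, and Bayesian inference from demonstrations) can be illustrated on a toy grid task. This is a minimal sketch, not the authors' implementation: the three primitive predicates, the prior constants, and the restriction to single conjunctions (the paper uses richer logical combinations) are all assumptions, and the tiny policy space is enumerated exactly instead of using the paper's approximate inference.

```python
import itertools
import math

# Hypothetical primitive "programs": boolean tests on one grid cell.
# The paper's DSL is much richer; these three are illustrative only.
PRIMITIVES = {
    "is_red": lambda cell: cell["color"] == "red",
    "is_top_row": lambda cell: cell["row"] == 0,
    "is_left_col": lambda cell: cell["col"] == 0,
}


def enumerate_policies(max_literals=2):
    """Yield every conjunction of up to max_literals (name, negated) literals."""
    names = sorted(PRIMITIVES)
    for k in range(1, max_literals + 1):
        for combo in itertools.combinations(names, k):
            for signs in itertools.product([False, True], repeat=k):
                yield tuple(zip(combo, signs))


def log_prior(policy, p_literal=0.3):
    """Toy grammar prior: each literal pays for an expansion, a primitive and a sign."""
    per_literal = math.log(p_literal) + math.log(1 / len(PRIMITIVES)) + math.log(0.5)
    return len(policy) * per_literal


def holds(policy, cell):
    """True if every literal of the conjunction is satisfied by the cell."""
    return all(PRIMITIVES[name](cell) != negated for name, negated in policy)


def log_likelihood(policy, demos, eps=1e-3):
    """The policy picks uniformly among accepted cells; eps mass covers noisy demos."""
    total = 0.0
    for cells, chosen in demos:
        allowed = [i for i, cell in enumerate(cells) if holds(policy, cell)]
        n = len(cells)
        if not allowed:
            total += math.log(1 / n)  # silent policy: fall back to uniform
        elif chosen in allowed:
            total += math.log((1 - eps) / len(allowed) + eps / n)
        else:
            total += math.log(eps / n)
    return total


def posterior(demos):
    """Exact posterior over the enumerated policy space (log-sum-exp normalized)."""
    scores = {p: log_prior(p) + log_likelihood(p, demos) for p in enumerate_policies()}
    top = max(scores.values())
    z = sum(math.exp(s - top) for s in scores.values())
    return {p: math.exp(s - top) / z for p, s in scores.items()}


def make_grid(red_at, size=3):
    """Grid with one red cell; returns (cells, index of the red cell)."""
    cells = [{"row": r, "col": c, "color": "red" if (r, c) == red_at else "blue"}
             for r in range(size) for c in range(size)]
    return cells, red_at[0] * size + red_at[1]


if __name__ == "__main__":
    demos = [make_grid((1, 1)), make_grid((0, 2)), make_grid((2, 0))]
    post = posterior(demos)
    best = max(post, key=post.get)
    print(best, round(post[best], 3))
```

With three demonstrations in which the red cell moves around the grid, the single literal `is_red` should dominate the posterior: every other candidate either contradicts a demonstration or pays a larger prior cost for extra literals.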

Extracting urban water by combining deep learning and Google Earth Engine


Title	Extracting urban water by combining deep learning and Google Earth Engine
Authors	Y. D. Wang, Z. W. Li, C. Zeng, G. S. Xia, H. F. Shen
Abstract	Urban water is important for the urban ecosystem. Accurate and efficient detection of urban water with remote sensing data is of great significance for urban management and planning. In this paper, we propose a new method that combines Google Earth Engine (GEE) with a multiscale convolutional neural network (MSCNN) to extract urban water from Landsat images, which we summarize as offline training and online prediction (OTOP). That is, the MSCNN is trained offline, and urban water extraction is implemented on GEE with the trained parameters of the MSCNN. OTOP makes full use of the respective advantages of GEE and CNNs, and makes the use of deep learning methods on GEE more flexible. It can process available satellite images with high performance without downloading or storing data, and its overall performance in urban water extraction is also higher than that of the modified normalized difference water index (MNDWI) and random forest. The mean kappa, F1-score and intersection over union (IoU) of urban water extraction with OTOP in Changchun, Wuhan, Kunming and Guangzhou reached 0.924, 0.930 and 0.869, respectively. The results of the extended validation in other major cities of China also show that OTOP is robust and can be used to extract different types of urban water, which benefits from the structural design and training of the MSCNN. Therefore, OTOP is especially suitable for studying large-scale, long-term urban water change in the context of urbanization.
Tasks
Published	2019-12-23
URL	https://arxiv.org/abs/1912.10726v1
PDF	https://arxiv.org/pdf/1912.10726v1.pdf
PWC	https://paperswithcode.com/paper/extracting-urban-water-by-combining-deep
Repo
Framework
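For reference, the MNDWI baseline mentioned above is the band ratio (Green - SWIR1)/(Green + SWIR1), and kappa, F1-score and IoU can all be computed from a binary confusion matrix. The sketch below assumes per-pixel reflectances are already available (for Landsat 8 OLI, Green and SWIR1 are bands 3 and 6) and thresholds MNDWI at 0, a common but not universal choice. It illustrates the baseline and the evaluation metrics only, not the MSCNN/OTOP pipeline.

```python
def mndwi(green, swir1):
    """Modified normalized difference water index: (G - SWIR1) / (G + SWIR1)."""
    denom = green + swir1
    return 0.0 if denom == 0 else (green - swir1) / denom


def binary_scores(pred, truth):
    """Kappa, F1-score and IoU for binary masks given as flat sequences of 0/1."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    tn = sum(1 for p, t in zip(pred, truth) if not p and not t)
    n = tp + fp + fn + tn
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    iou = tp / (tp + fp + fn) if tp else 0.0
    observed = (tp + tn) / n
    # Agreement expected by chance, from the confusion-matrix marginals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = 1.0 if expected == 1 else (observed - expected) / (1 - expected)
    return {"kappa": kappa, "f1": f1, "iou": iou}


if __name__ == "__main__":
    # Hypothetical reflectances for four pixels.
    green = [0.08, 0.06, 0.10, 0.05]
    swir1 = [0.02, 0.12, 0.04, 0.20]
    water_pred = [1 if mndwi(g, s) > 0 else 0 for g, s in zip(green, swir1)]
    print(water_pred)  # [1, 0, 1, 0]
    print(binary_scores(water_pred, [1, 0, 0, 0]))
```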

Disentangling Style and Content in Anime Illustrations


Title	Disentangling Style and Content in Anime Illustrations
Authors	Sitao Xiang, Hao Li
Abstract	Existing methods for AI-generated artworks still struggle with generating high-quality stylized content, where high-level semantics are preserved, or separating fine-grained styles from various artists. We propose a novel Generative Adversarial Disentanglement Network which can disentangle two complementary factors of variation when only one of them is labelled in general, and fully decompose complex anime illustrations into style and content in particular. Training such a model is challenging, since given a style, various content data may exist but not the other way round. Our approach is divided into two stages: one that encodes an input image into a style-independent content representation, and one based on a dual-conditional generator. We demonstrate the ability to generate high-fidelity anime portraits with a fixed content and a large variety of styles from over a thousand artists, and vice versa, using a single end-to-end network, with applications in style transfer. We show this unique capability as well as superior output compared with the current state of the art.
Tasks	Style Transfer
Published	2019-05-26
URL	https://arxiv.org/abs/1905.10742v3
PDF	https://arxiv.org/pdf/1905.10742v3.pdf
PWC	https://paperswithcode.com/paper/disentangling-style-and-content-in-anime
Repo
Framework

Gaining Extra Supervision via Multi-task learning for Multi-Modal Video Question Answering

Title	Gaining Extra Supervision via Multi-task learning for Multi-Modal Video Question Answering
Authors	Junyeong Kim, Minuk Ma, Kyungsu Kim, Sungjin Kim, Chang D. Yoo
Abstract	This paper proposes a method to gain extra supervision via multi-task learning for multi-modal video question answering. Multi-modal video question answering is an important task that aims at the joint understanding of vision and language. However, building a large-scale dataset for multi-modal video question answering is expensive, and the existing benchmarks are too small to provide sufficient supervision. To overcome this challenge, this paper proposes a multi-task learning method composed of three main components: (1) a multi-modal video question answering network that answers the question based on both video and subtitle features, (2) a temporal retrieval network that predicts the time in the video clip from which the question was generated, and (3) a modality alignment network that solves a metric learning problem to find the correct association of video and subtitle modalities. By simultaneously solving related auxiliary tasks with hierarchically shared intermediate layers, extra synergistic supervision is provided. Motivated by curriculum learning, multi-task ratio scheduling is proposed to learn easier tasks earlier and set an inductive bias at the beginning of training. Experiments on the publicly available TVQA dataset show state-of-the-art results, and ablation studies are conducted to prove statistical validity.
Tasks	Metric Learning, Multi-Task Learning, Question Answering, Video Question Answering
Published	2019-05-28
URL	https://arxiv.org/abs/1905.13540v1
PDF	https://arxiv.org/pdf/1905.13540v1.pdf
PWC	https://paperswithcode.com/paper/190513540
Repo
Framework
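The ratio-scheduling idea, giving easier tasks more weight early and shifting weight toward the main task later, can be sketched as a time-varying weighted sum of the three losses. The linear schedule, the start and end weights, and the task keys below are hypothetical: the abstract does not state the actual schedule or which auxiliary task is treated as the easier one.

```python
def lerp(start, end, frac):
    """Linear interpolation between start and end for frac in [0, 1]."""
    return start + (end - start) * frac


# Hypothetical weights; both dicts sum to 1 so the interpolation does too.
START = {"alignment": 0.5, "retrieval": 0.3, "qa": 0.2}
END = {"alignment": 0.1, "retrieval": 0.2, "qa": 0.7}


def task_weights(epoch, total_epochs):
    """Shift loss weight linearly from the auxiliary tasks to the main QA task."""
    frac = min(max(epoch / max(total_epochs - 1, 1), 0.0), 1.0)
    return {task: lerp(START[task], END[task], frac) for task in START}


def combined_loss(losses, epoch, total_epochs):
    """Weighted sum of per-task losses given as a dict keyed like START."""
    weights = task_weights(epoch, total_epochs)
    return sum(weights[task] * losses[task] for task in weights)


if __name__ == "__main__":
    for epoch in (0, 5, 9):
        print(epoch, task_weights(epoch, 10))
```

A linear ramp is only one possible choice; a step or exponential schedule would express the same curriculum intuition.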

Compressed sensing reconstruction using Expectation Propagation


Title	Compressed sensing reconstruction using Expectation Propagation
Authors	Alfredo Braunstein, Anna Paola Muntoni, Andrea Pagnani, Mirko Pieropan
Abstract	Many interesting problems in fields ranging from telecommunications to computational biology can be formalized in terms of large underdetermined systems of linear equations with additional constraints or regularizers. One of the most studied ones, the Compressed Sensing problem (CS), consists in finding the solution with the smallest number of non-zero components of a given system of linear equations $\boldsymbol y = \mathbf{F} \boldsymbol{w}$ for known measurement vector $\boldsymbol{y}$ and sensing matrix $\mathbf{F}$. Here, we address the compressed sensing problem within a Bayesian inference framework where the sparsity constraint is remapped into a singular prior distribution (called Spike-and-Slab or Bernoulli-Gauss). A solution to the problem is sought by computing marginal distributions via Expectation Propagation (EP), an iterative computational scheme originally developed in Statistical Physics. We show that this strategy is more accurate than the alternatives in solving instances of CS generated from statistically correlated measurement matrices. For computational strategies based on the Bayesian framework, such as variants of Belief Propagation, this is to be expected, as they implicitly rely on the hypothesis of statistical independence among the entries of the sensing matrix. Perhaps surprisingly, the method also uniformly outperforms all the other state-of-the-art methods in our tests.
Tasks	Bayesian Inference
Published	2019-04-10
URL	https://arxiv.org/abs/1904.05777v2
PDF	https://arxiv.org/pdf/1904.05777v2.pdf
PWC	https://paperswithcode.com/paper/compressed-sensing-reconstruction-using
Repo
Framework
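In the Bernoulli-Gauss prior each coefficient follows $P(w_i) = (1-\rho)\,\delta(w_i) + \rho\,\mathcal{N}(w_i; 0, \sigma^2)$. A core EP step is moment matching: multiply this prior by a Gaussian cavity $\mathcal{N}(w_i; m, v)$ and compute the mean and variance of the normalized product. The sketch below shows only that scalar step; the zero-mean slab, the parameter names, and the assumption that the cavity parameters are already given are illustrative choices, and the full iterative scheme over the sensing matrix $\mathbf{F}$ is omitted.

```python
import math


def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def spike_slab_moments(m, v, rho, slab_var):
    """Mean and variance of the normalized product
    [(1 - rho) * delta(w) + rho * N(w; 0, slab_var)] * N(w; m, v)."""
    weight_spike = (1 - rho) * normal_pdf(0.0, m, v)      # mass kept at w = 0
    weight_slab = rho * normal_pdf(0.0, m, slab_var + v)  # evidence for w != 0
    p_nonzero = weight_slab / (weight_spike + weight_slab)
    # Product of the slab and cavity Gaussians is again Gaussian in w.
    slab_post_var = slab_var * v / (slab_var + v)
    slab_post_mean = m * slab_var / (slab_var + v)
    mean = p_nonzero * slab_post_mean
    second_moment = p_nonzero * (slab_post_var + slab_post_mean ** 2)
    return mean, second_moment - mean ** 2


if __name__ == "__main__":
    print(spike_slab_moments(m=0.5, v=0.1, rho=0.2, slab_var=1.0))
```

In a full EP loop, these matched moments would be used to update the Gaussian approximation of each factor before recomputing the cavities.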

A $\nu$-support vector quantile regression model with automatic accuracy control


Title	A $\nu$-support vector quantile regression model with automatic accuracy control
Authors	Pritam Anand, Reshma Rastogi, Suresh Chandra
Abstract	This paper proposes a novel ‘$\nu$-support vector quantile regression’ ($\nu$-SVQR) model for quantile estimation. It facilitates automatic control over accuracy by creating a suitable asymmetric $\epsilon$-insensitive zone according to the variance present in the data. The proposed $\nu$-SVQR model uses a $\nu$ fraction of the training data points to estimate the quantiles. In the $\nu$-SVQR model, training points asymptotically appear above and below the asymmetric $\epsilon$-insensitive tube in the ratio of $1-\tau$ and $\tau$. Further, the proposed $\nu$-SVQR model has other interesting properties, which we briefly describe in this paper. These properties are also verified empirically on artificial and real-world datasets.
Tasks
Published	2019-10-21
URL	https://arxiv.org/abs/1910.09168v1
PDF	https://arxiv.org/pdf/1910.09168v1.pdf
PWC	https://paperswithcode.com/paper/a-support-vector-quantile-regression-model
Repo
Framework
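The $\nu$-SVQR model builds on the standard quantile (pinball) loss $\rho_\tau(u) = u\,(\tau - \mathbb{1}[u < 0])$. The sketch below shows only this loss and checks that minimizing it over a constant recovers the empirical $\tau$-quantile, leaving roughly a $\tau$ fraction of points below the estimate; the asymmetric $\epsilon$-insensitive tube and the $\nu$ parameterization from the paper are not reproduced.

```python
def pinball_loss(residual, tau):
    """Quantile (check) loss: tau * u for u >= 0 and (tau - 1) * u for u < 0."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def best_constant_quantile(ys, tau):
    """Sample value q minimizing sum_i pinball_loss(y_i - q, tau)."""
    return min(ys, key=lambda q: sum(pinball_loss(y - q, tau) for y in ys))


if __name__ == "__main__":
    ys = list(range(1, 101))
    q = best_constant_quantile(ys, 0.25)
    below = sum(y < q for y in ys) / len(ys)
    print(q, below)  # 25 0.24  (q = 26 attains the same loss)
```

The flat region between 25 and 26 reflects that the pinball loss has a set of minimizers whenever $\tau n$ is an integer.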

Efficient Autonomy Validation in Simulation with Adaptive Stress Testing


Title	Efficient Autonomy Validation in Simulation with Adaptive Stress Testing
Authors	Mark Koren, Mykel Kochenderfer
Abstract	During the development of autonomous systems such as driverless cars, it is important to characterize the scenarios that are most likely to result in failure. Adaptive Stress Testing (AST) provides a way to search for the most-likely failure scenario as a Markov decision process (MDP). Our previous work used a deep reinforcement learning (DRL) solver to identify likely failure scenarios. However, the solver’s use of a feed-forward neural network with a discretized space of possible initial conditions poses two major problems. First, the system is not treated as a black box, in that it requires analyzing the internal state of the system, which leads to considerable implementation complexities. Second, in order to simulate realistic settings, a new instance of the solver needs to be run for each initial condition. Running a new solver for each initial condition not only significantly increases the computational complexity, but also disregards the underlying relationship between similar initial conditions. We provide a solution to both problems by employing a recurrent neural network that takes a set of initial conditions from a continuous space as input. This approach enables robust and efficient detection of failures because the solution generalizes across the entire space of initial conditions. By simulating an instance where an autonomous car drives while a pedestrian is crossing a road, we demonstrate the solver is now capable of finding solutions for problems that would have previously been intractable.
Tasks
Published	2019-07-16
URL	https://arxiv.org/abs/1907.06795v1
PDF	https://arxiv.org/pdf/1907.06795v1.pdf
PWC	https://paperswithcode.com/paper/efficient-autonomy-validation-in-simulation
Repo
Framework
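The AST formulation rewards disturbance sequences that are both likely and end in a failure. The following toy sketch assumes a 1-D black-box simulator whose state is a running sum of Gaussian disturbances, scores each rollout by its log-likelihood plus a large penalty when no failure occurs (a simplified form of the reward commonly used in AST), and replaces the paper's recurrent DRL solver with plain Monte Carlo search. The threshold, horizon and penalty values are arbitrary, and continuous initial conditions are not modeled.

```python
import math
import random


def log_normal_pdf(x, sigma=1.0):
    """Log-density of N(0, sigma^2) at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - x ** 2 / (2 * sigma ** 2)


def rollout(disturbances, threshold):
    """Toy black-box simulator: the state is the running sum of disturbances and
    a failure occurs when it reaches the threshold. Returns (failed, steps)."""
    state = 0.0
    for t, x in enumerate(disturbances, start=1):
        state += x
        if state >= threshold:
            return True, t
    return False, len(disturbances)


def ast_return(disturbances, threshold, miss_penalty=-1e4):
    """Log-likelihood of the disturbances up to the end of the episode,
    plus a large penalty if the episode ends without a failure."""
    failed, steps = rollout(disturbances, threshold)
    loglik = sum(log_normal_pdf(x) for x in disturbances[:steps])
    return loglik if failed else loglik + miss_penalty


def random_search(n_samples, horizon, threshold, seed=0):
    """Monte Carlo stand-in for the solver: keep the highest-return sequence."""
    rng = random.Random(seed)
    best_seq, best_ret = None, -math.inf
    for _ in range(n_samples):
        seq = [rng.gauss(0.0, 1.0) for _ in range(horizon)]
        ret = ast_return(seq, threshold)
        if ret > best_ret:
            best_seq, best_ret = seq, ret
    return best_seq, best_ret


if __name__ == "__main__":
    seq, ret = random_search(n_samples=2000, horizon=10, threshold=3.0)
    print(rollout(seq, 3.0), round(ret, 2))
```

Because the penalty dominates, any failing sequence beats every non-failing one, and among failures the search prefers the most likely disturbances.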

The EntOptLayout Cytoscape plug-in for the efficient visualization of major protein complexes in protein-protein interaction and signalling networks


Title	The EntOptLayout Cytoscape plug-in for the efficient visualization of major protein complexes in protein-protein interaction and signalling networks
Authors	Bence Agg, Andrea Csaszar, Mate Szalay-Beko, Daniel V. Veres, Reka Mizsei, Peter Ferdinandy, Peter Csermely, Istvan A. Kovacs
Abstract	Motivation: Network visualizations of complex biological datasets usually result in ‘hairball’ images, which do not discriminate network modules. Results: We present the EntOptLayout Cytoscape plug-in based on a recently developed network representation theory. The plug-in provides an efficient visualization of network modules, which represent major protein complexes in protein-protein interaction and signalling networks. Importantly, the tool gives a quality score of the network visualization by calculating the information loss between the input data and the visual representation showing a 3- to 25-fold improvement over conventional methods. Availability and implementation: The plug-in (running on Windows, Linux, or Mac OS) and its tutorial (both in written and video forms) can be downloaded freely under the terms of the MIT license from: http://apps.cytoscape.org/apps/entoptlayout. Supplementary data are available at Bioinformatics online. Contact: csermely.peter@med.semmelweis-univ.hu
Tasks
Published	2019-04-08
URL	https://arxiv.org/abs/1904.03910v2
PDF	https://arxiv.org/pdf/1904.03910v2.pdf
PWC	https://paperswithcode.com/paper/the-entoptlayout-cytoscape-plug-in-for-the
Repo
Framework
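The quality score is described as the information loss between the input network and its visual representation. A minimal sketch, assuming that loss is a relative entropy between the normalized adjacency matrix and a matrix of Gaussian overlaps computed from node positions: the overlap model, the `width` parameter and the normalization are assumptions, and the plug-in's actual score may be defined differently.

```python
import math


def normalize(matrix):
    """Scale a non-negative matrix so that its entries sum to 1."""
    total = sum(sum(row) for row in matrix)
    return [[x / total for x in row] for row in matrix]


def relative_entropy(a, b):
    """D(A || B) = sum_ij a_ij * ln(a_ij / b_ij) after normalizing both matrices."""
    a, b = normalize(a), normalize(b)
    return sum(x * math.log(x / y)
               for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b) if x > 0)


def overlap_matrix(positions, width=1.0):
    """Hypothetical layout representation: the overlap of two isotropic 2-D
    Gaussians of std `width` is proportional to exp(-d^2 / (4 * width^2))."""
    return [[math.exp(-((xi - xj) ** 2 + (yi - yj) ** 2) / (4 * width ** 2))
             for (xj, yj) in positions]
            for (xi, yi) in positions]


if __name__ == "__main__":
    # Two disconnected pairs of nodes (self-weights on the diagonal).
    adjacency = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
    good = [(0, 0), (0.1, 0), (5, 0), (5.1, 0)]
    bad = [(0, 0), (5, 0), (0.1, 0), (5.1, 0)]
    print(relative_entropy(adjacency, overlap_matrix(good)))
    print(relative_entropy(adjacency, overlap_matrix(bad)))
```

A layout that places connected nodes close together should yield a lower relative entropy than one that pulls them apart.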

On Polyhedral and Second-Order Cone Decompositions of Semidefinite Optimization Problems


Title	On Polyhedral and Second-Order Cone Decompositions of Semidefinite Optimization Problems
Authors	Dimitris Bertsimas, Ryan Cory-Wright
Abstract	We study a cutting-plane method for semidefinite optimization problems (SDOs), and supply a proof of the method’s convergence, under a boundedness assumption. By relating the method’s rate of convergence to an initial outer approximation’s diameter, we argue that the method performs well when initialized with a second-order-cone approximation, instead of a linear approximation. We invoke the method to provide bound gaps of 0.5-6.5% for sparse PCA problems with $1000$s of covariates, and solve nuclear norm problems over 500x500 matrices.
Tasks
Published	2019-10-08
URL	https://arxiv.org/abs/1910.03143v2
PDF	https://arxiv.org/pdf/1910.03143v2.pdf
PWC	https://paperswithcode.com/paper/on-polyhedral-and-second-order-cone
Repo
Framework
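A cutting-plane method for SDOs replaces the constraint $X \succeq 0$ with linear cuts $v^\top X v \ge 0$, adding one whenever the current iterate has an eigenvector $v$ with a negative eigenvalue. For a $2\times 2$ symmetric matrix $\begin{pmatrix} a & b \\ b & c \end{pmatrix}$, positive semidefiniteness is exactly the second-order cone constraint $a + c \ge \lVert (2b,\, a - c) \rVert_2$, which is one way a second-order-cone outer approximation can be built from $2\times 2$ principal minors. The sketch below is restricted to the $2\times 2$ case so that it needs only the standard library; whether the paper's initialization uses exactly these minors is an assumption.

```python
import math


def min_eig_2x2(a, b, c):
    """Smallest eigenvalue and an eigenvector of [[a, b], [b, c]]."""
    lam = (a + c) / 2 - math.hypot((a - c) / 2, b)
    if b != 0:
        vec = (b, lam - a)  # solves (a - lam) * v0 + b * v1 = 0
    else:
        vec = (1.0, 0.0) if a <= c else (0.0, 1.0)
    return lam, vec


def psd_cut(a, b, c, tol=1e-12):
    """Return v with v^T X v < 0, so the cut <v v^T, X> >= 0 separates X;
    return None if X is already PSD (no cut needed)."""
    lam, vec = min_eig_2x2(a, b, c)
    return vec if lam < -tol else None


def soc_psd_2x2(a, b, c):
    """[[a, b], [b, c]] is PSD iff a + c >= ||(2b, a - c)||_2."""
    return a + c >= math.hypot(2 * b, a - c)


if __name__ == "__main__":
    print(psd_cut(1.0, 2.0, 1.0), soc_psd_2x2(1.0, 2.0, 1.0))
    print(psd_cut(2.0, 1.0, 2.0), soc_psd_2x2(2.0, 1.0, 2.0))
```

In an $n\times n$ problem the same cut comes from the eigenvector of the most negative eigenvalue, computed with a proper eigensolver.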

A Blind Multiscale Spatial Regularization Framework for Kernel-based Spectral Unmixing

Title	A Blind Multiscale Spatial Regularization Framework for Kernel-based Spectral Unmixing
Authors	Ricardo Augusto Borsoi, Tales Imbiriba, José Carlos Moreira Bermudez, Cédric Richard
Abstract	Introducing spatial prior information in hyperspectral imaging (HSI) analysis has led to an overall improvement in the performance of many HSI methods applied to denoising, classification, and unmixing. Extending such methodologies to nonlinear settings is not always straightforward, especially for unmixing problems, where the consideration of spatial relationships between neighboring pixels might involve intricate interactions between their fractional abundances and nonlinear contributions. In this paper, we consider a multiscale regularization strategy for nonlinear spectral unmixing with kernels. The proposed methodology splits the unmixing problem into two sub-problems at two different spatial scales: a coarse scale containing low-dimensional structures, and the original fine scale. The coarse spatial domain is defined using superpixels that result from a multiscale transformation. Spectral unmixing is then formulated as the solution of quadratically constrained optimization problems, which are solved efficiently by exploiting their strong duality and a reformulation of their dual cost functions in the form of root-finding problems. Furthermore, we employ a theory-based statistical framework to devise a consistent strategy to estimate all required parameters, including both the regularization parameters of the algorithm and the number of superpixels of the transformation, resulting in a truly blind (from the parameter-setting perspective) unmixing method. Experimental results attest to the superior performance of the proposed method compared with other related state-of-the-art strategies.
Tasks	Denoising
Published	2019-08-19
URL	https://arxiv.org/abs/1908.06925v3
PDF	https://arxiv.org/pdf/1908.06925v3.pdf
PWC	https://paperswithcode.com/paper/a-blind-multiscale-spatial-regularization
Repo
Framework

Abstract categorial grammars with island constraints and effective decidability


Title	Abstract categorial grammars with island constraints and effective decidability
Authors	Sergey Slavnov
Abstract	A well-known approach to treating syntactic island constraints in the setting of Lambek grammars consists in adding specific bracket modalities to the logic. We adapt this approach to abstract categorial grammars (ACG). Thus we define bracketed (implicational) linear logic, bracketed $\lambda$-calculus, and, eventually, bracketed ACG based on bracketed $\lambda$-calculus. This allows us to model at least the simplest island constraints, typically in the context of relativization. Next, we identify specific safely bracketed ACG which, just like ordinary (bracket-free) second-order ACG, generate effectively decidable languages, but are sufficiently flexible to model some higher-order phenomena like relativization and correctly deal with syntactic islands, at least in simple toy examples.
Tasks
Published	2019-07-16
URL	https://arxiv.org/abs/1907.06950v1
PDF	https://arxiv.org/pdf/1907.06950v1.pdf
PWC	https://paperswithcode.com/paper/abstract-categorial-grammars-with-island
Repo
Framework

Pose-adaptive Hierarchical Attention Network for Facial Expression Recognition


Title	Pose-adaptive Hierarchical Attention Network for Facial Expression Recognition
Authors	Yuanyuan Liu, Jiyao Peng, Jiabei Zeng, Shiguang Shan
Abstract	Multi-view facial expression recognition (FER) is a challenging task because the appearance of an expression varies with pose. To alleviate the influence of pose, recent methods either perform pose normalization or learn separate FER classifiers for each pose. However, these methods usually have two stages and rely on good performance of pose estimators. Different from existing methods, we propose a pose-adaptive hierarchical attention network (PhaNet) that can jointly recognize facial expressions and poses in unconstrained environments. Specifically, PhaNet discovers the regions most relevant to the facial expression by an attention mechanism at hierarchical scales, and the most informative scales are then selected to learn pose-invariant and expression-discriminative representations. PhaNet is end-to-end trainable by minimizing the hierarchical attention losses, the FER loss and the pose loss with dynamically learned loss weights. We validate the effectiveness of the proposed PhaNet on three multi-view datasets (BU-3DFE, Multi-pie, and KDEF) and two in-the-wild FER datasets (AffectNet and SFEW). Extensive experiments demonstrate that our framework outperforms state-of-the-art methods under both within-dataset and cross-dataset settings, achieving average accuracies of 84.92%, 93.53%, 88.5%, 54.82% and 31.25%, respectively.
Tasks	Facial Expression Recognition
Published	2019-05-24
URL	https://arxiv.org/abs/1905.10059v1
PDF	https://arxiv.org/pdf/1905.10059v1.pdf
PWC	https://paperswithcode.com/paper/pose-adaptive-hierarchical-attention-network
Repo
Framework

Analyzing Verbal and Nonverbal Features for Predicting Group Performance


Title	Analyzing Verbal and Nonverbal Features for Predicting Group Performance
Authors	Uliyana Kubasova, Gabriel Murray, McKenzie Braley
Abstract	This work analyzes the efficacy of verbal and nonverbal features of group conversation for the task of automatic prediction of group task performance. We describe a new publicly available survival task dataset that was collected and annotated to facilitate this prediction task. In these experiments, the new dataset is merged with an existing survival task dataset, allowing us to compare feature sets on a much larger amount of data than has been used in recent related work. This work is also distinct from related research on social signal processing (SSP) in that we compare verbal and nonverbal features, whereas SSP is almost exclusively concerned with nonverbal aspects of social interaction. A key finding is that nonverbal features from the speech signal are extremely effective for this task, even on their own. However, the most effective individual features are verbal features, and we highlight the most important ones.
Tasks
Published	2019-06-26
URL	https://arxiv.org/abs/1907.01369v2
PDF	https://arxiv.org/pdf/1907.01369v2.pdf
PWC	https://paperswithcode.com/paper/analyzing-verbal-and-nonverbal-features-for
Repo
Framework

Towards Sampling from Nondirected Probabilistic Graphical models using a D-Wave Quantum Annealer


Title	Towards Sampling from Nondirected Probabilistic Graphical models using a D-Wave Quantum Annealer
Authors	Yaroslav Koshka, M. A. Novotny
Abstract	A D-Wave quantum annealer (QA) with a 2048-qubit lattice, with no missing qubits and couplings, allowed embedding of a complete graph of a Restricted Boltzmann Machine (RBM). The OptDigits handwritten digit data set, with 8x7 pixels of visible units, was used to train the RBM using classical Contrastive Divergence. Embedding of the classically trained RBM into the D-Wave lattice was used to demonstrate that the QA offers a high-efficiency alternative to classical Markov Chain Monte Carlo (MCMC) for reconstructing missing labels of the test images as well as a generative model. At any training iteration, the D-Wave-based classification had a classification error less than half that of MCMC. The main goal of this study was to investigate the quality of the sample from the RBM model distribution and its comparison to a classical MCMC sample. For the OptDigits dataset, the states in the D-Wave sample belonged to about two times more local valleys compared to the MCMC sample. All the lowest-energy (highest joint probability) local minima in the MCMC sample were also found by the D-Wave. The D-Wave missed many of the higher-energy local valleys, while finding many “new” local valleys consistently missed by the MCMC. It was established that the “new” local valleys that the D-Wave finds are important for the model distribution in terms of the energy of the corresponding local minima, the width of the local valleys, and the height of the escape barrier.
Tasks
Published	2019-05-01
URL	http://arxiv.org/abs/1905.00159v1
PDF	http://arxiv.org/pdf/1905.00159v1.pdf
PWC	https://paperswithcode.com/paper/towards-sampling-from-nondirected
Repo
Framework
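The comparison of samples in terms of local valleys relies on the RBM energy $E(\mathbf{v}, \mathbf{h}) = -\mathbf{a}^\top\mathbf{v} - \mathbf{b}^\top\mathbf{h} - \mathbf{v}^\top W \mathbf{h}$, where lower energy means higher joint probability. A minimal sketch, assuming a local minimum is a joint state that no single-unit flip can lower: it enumerates all states of a tiny RBM and keeps those minima. This neighbourhood definition is an assumption, and neither contrastive-divergence training nor D-Wave sampling is shown.

```python
import itertools


def rbm_energy(v, h, a, b, W):
    """E(v, h) = -a.v - b.h - v^T W h for binary visible v and hidden h."""
    e = -sum(ai * vi for ai, vi in zip(a, v))
    e -= sum(bj * hj for bj, hj in zip(b, h))
    e -= sum(v[i] * W[i][j] * h[j] for i in range(len(v)) for j in range(len(h)))
    return e


def is_local_min(v, h, a, b, W):
    """True if no single visible or hidden flip strictly lowers the energy."""
    e0 = rbm_energy(v, h, a, b, W)
    for i in range(len(v)):
        v2 = list(v)
        v2[i] = 1 - v2[i]
        if rbm_energy(v2, h, a, b, W) < e0:
            return False
    for j in range(len(h)):
        h2 = list(h)
        h2[j] = 1 - h2[j]
        if rbm_energy(v, h2, a, b, W) < e0:
            return False
    return True


def local_minima(a, b, W):
    """Enumerate all joint states (tiny models only) and keep the local minima."""
    states = itertools.product(itertools.product([0, 1], repeat=len(a)),
                               itertools.product([0, 1], repeat=len(b)))
    return [(v, h) for v, h in states if is_local_min(v, h, a, b, W)]


if __name__ == "__main__":
    a, b = [0.2, -0.1], [0.0]
    W = [[1.0], [-1.5]]
    for v, h in local_minima(a, b, W):
        print(v, h, rbm_energy(v, h, a, b, W))
```

Exhaustive enumeration grows as $2^{n_v + n_h}$, which is why the paper's comparison has to rely on samples instead.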