October 21, 2019

Paper Group AWR 60

NeuralREG: An end-to-end approach to referring expression generation

Title NeuralREG: An end-to-end approach to referring expression generation
Authors Thiago Castro Ferreira, Diego Moussallem, Ákos Kádár, Sander Wubben, Emiel Krahmer
Abstract Traditionally, Referring Expression Generation (REG) models first decide on the form and then on the content of references to discourse entities in text, typically relying on features such as salience and grammatical function. In this paper, we present a new approach (NeuralREG), relying on deep neural networks, which makes decisions about form and content in one go without explicit feature extraction. Using a delexicalized version of the WebNLG corpus, we show that the neural model substantially improves over two strong baselines. Data and models are publicly available.
Tasks
Published 2018-05-21
URL http://arxiv.org/abs/1805.08093v1
PDF http://arxiv.org/pdf/1805.08093v1.pdf
PWC https://paperswithcode.com/paper/neuralreg-an-end-to-end-approach-to-referring
Repo https://github.com/ThiagoCF05/NeuralREG
Framework none

Generating Easy-to-Understand Referring Expressions for Target Identifications

Title Generating Easy-to-Understand Referring Expressions for Target Identifications
Authors Mikihiro Tanaka, Takayuki Itamochi, Kenichi Narioka, Ikuro Sato, Yoshitaka Ushiku, Tatsuya Harada
Abstract This paper addresses the generation of referring expressions that not only refer to objects correctly but also let humans find them quickly. As a target becomes relatively less salient, identifying referred objects itself becomes more difficult. However, the existing studies regarded all sentences that refer to objects correctly as equally good, ignoring whether they are easily understood by humans. If the target is not salient, humans utilize relationships with the salient contexts around it to help listeners to comprehend it better. To derive this information from human annotations, our model is designed to extract information from the target and from the environment. Moreover, we regard easily understood sentences as those that humans comprehend correctly and quickly. We optimized this by using the time humans required to locate the referred objects and their accuracy. To evaluate our system, we created a new referring expression dataset whose images were acquired from Grand Theft Auto V (GTA V), limiting targets to persons. Experimental results show the effectiveness of our approach. Our code and dataset are available at https://github.com/mikittt/easy-to-understand-REG.
Tasks
Published 2018-11-29
URL https://arxiv.org/abs/1811.12104v4
PDF https://arxiv.org/pdf/1811.12104v4.pdf
PWC https://paperswithcode.com/paper/towards-human-friendly-referring-expression
Repo https://github.com/mikittt/easy-to-understand-REG
Framework none

Reparameterization Gradient for Non-differentiable Models

Title Reparameterization Gradient for Non-differentiable Models
Authors Wonyeol Lee, Hangyeol Yu, Hongseok Yang
Abstract We present a new algorithm for stochastic variational inference that targets models with non-differentiable densities. One of the key challenges in stochastic variational inference is to come up with a low-variance estimator of the gradient of a variational objective. We tackle the challenge by generalizing the reparameterization trick, one of the most effective techniques for addressing the variance issue for differentiable models, so that the trick works for non-differentiable models as well. Our algorithm splits the space of latent variables into regions where the density of the variables is differentiable, and their boundaries where the density may fail to be differentiable. For each differentiable region, the algorithm applies the standard reparameterization trick and estimates the gradient restricted to the region. For each potentially non-differentiable boundary, it uses a form of manifold sampling and computes the direction for variational parameters that, if followed, would increase the boundary’s contribution to the variational objective. The sum of all the estimates becomes the gradient estimate of our algorithm. Our estimator enjoys the reduced variance of the reparameterization gradient while remaining unbiased even for non-differentiable models. The experiments with our preliminary implementation confirm the benefit of reduced variance and unbiasedness.
Tasks
Published 2018-06-01
URL http://arxiv.org/abs/1806.00176v2
PDF http://arxiv.org/pdf/1806.00176v2.pdf
PWC https://paperswithcode.com/paper/reparameterization-gradient-for-non
Repo https://github.com/wonyeol/reparam-nondiff
Framework none
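
The abstract builds on the standard reparameterization trick for differentiable models. A minimal sketch of that standard trick only (not the paper's boundary-aware estimator) is shown below, assuming a toy objective E[z^2] with z drawn from a Gaussian; the function name `reparam_grad_mu` and the objective are illustrative choices, not taken from the paper or its repository.

```python
import random


def reparam_grad_mu(mu, sigma, n_samples=10000, seed=0):
    """Monte Carlo estimate of d/dmu E[z^2] for z ~ N(mu, sigma^2).

    Toy objective chosen for illustration; the exact answer is 2 * mu,
    because E[z^2] = mu^2 + sigma^2.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)  # noise that does not depend on mu
        z = mu + sigma * eps       # reparameterized sample
        total += 2.0 * z           # chain rule: d(z^2)/dmu = 2z * dz/dmu, dz/dmu = 1
    return total / n_samples


if __name__ == "__main__":
    print(reparam_grad_mu(1.5, 0.5))  # should be close to 2 * 1.5
```

Because the randomness is moved into `eps`, the gradient can be taken through the sample itself, which is what typically gives this estimator its low variance in the differentiable case.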

Convolutional Social Pooling for Vehicle Trajectory Prediction

Title Convolutional Social Pooling for Vehicle Trajectory Prediction
Authors Nachiket Deo, Mohan M. Trivedi
Abstract Forecasting the motion of surrounding vehicles is a critical ability for an autonomous vehicle deployed in complex traffic. Motion of all vehicles in a scene is governed by the traffic context, i.e., the motion and relative spatial configuration of neighboring vehicles. In this paper we propose an LSTM encoder-decoder model that uses convolutional social pooling as an improvement to social pooling layers for robustly learning interdependencies in vehicle motion. Additionally, our model outputs a multi-modal predictive distribution over future trajectories based on maneuver classes. We evaluate our model using the publicly available NGSIM US-101 and I-80 datasets. Our results show improvement over the state of the art in terms of RMS values of prediction error and negative log-likelihoods of true future trajectories under the model’s predictive distribution. We also present a qualitative analysis of the model’s predicted distributions for various traffic scenarios.
Tasks Trajectory Prediction
Published 2018-05-15
URL http://arxiv.org/abs/1805.06771v1
PDF http://arxiv.org/pdf/1805.06771v1.pdf
PWC https://paperswithcode.com/paper/convolutional-social-pooling-for-vehicle
Repo https://github.com/christian-rncl/TrackNPred
Framework pytorch

CSV: Image Quality Assessment Based on Color, Structure, and Visual System

Title CSV: Image Quality Assessment Based on Color, Structure, and Visual System
Authors Dogancan Temel, Ghassan AlRegib
Abstract This paper presents a full-reference image quality estimator based on color, structure, and visual system characteristics denoted as CSV. In contrast to the majority of existing methods, we quantify perceptual color degradations rather than absolute pixel-wise changes. We use the CIEDE2000 color difference formulation to quantify low-level color degradations and the Earth Mover’s Distance between color name descriptors to measure significant color degradations. In addition to the perceptual color difference, CSV also contains structural and perceptual differences. Structural feature maps are obtained by mean subtraction and divisive normalization, and perceptual feature maps are obtained from contrast sensitivity formulations of retinal ganglion cells. The proposed quality estimator CSV is tested on the LIVE, the Multiply Distorted LIVE, and the TID 2013 databases, and it is always among the top two performing quality estimators in terms of at least ranking, monotonic behavior or linearity.
Tasks Image Quality Assessment
Published 2018-10-15
URL http://arxiv.org/abs/1810.06464v2
PDF http://arxiv.org/pdf/1810.06464v2.pdf
PWC https://paperswithcode.com/paper/csv-image-quality-assessment-based-on-color
Repo https://github.com/olivesgatech/CSV
Framework none

LATE Ain’T Earley: A Faster Parallel Earley Parser

Title LATE Ain’T Earley: A Faster Parallel Earley Parser
Authors Peter Ahrens, John Feser, Robin Hui
Abstract We present the LATE algorithm, an asynchronous variant of the Earley algorithm for parsing context-free grammars. The Earley algorithm is naturally task-based, but is difficult to parallelize because of dependencies between the tasks. We present the LATE algorithm, which uses additional data structures to maintain information about the state of the parse so that work items may be processed in any order. This property allows the LATE algorithm to be sped up using task parallelism. We show that the LATE algorithm can achieve a 120x speedup over the Earley algorithm on a natural language task.
Tasks
Published 2018-07-16
URL http://arxiv.org/abs/1807.05642v1
PDF http://arxiv.org/pdf/1807.05642v1.pdf
PWC https://paperswithcode.com/paper/late-aint-earley-a-faster-parallel-earley
Repo https://github.com/jfeser/earley
Framework none
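
For context, the sequential Earley algorithm that LATE modifies can be sketched as a small recognizer. This is the textbook predict/scan/complete loop, not the LATE algorithm and not code from the linked repository; it assumes a grammar without empty productions, and the names `earley_recognize` and the dictionary grammar format are illustrative.

```python
def earley_recognize(grammar, start, tokens):
    """Return True if `tokens` can be derived from `start`.

    grammar: dict mapping each nonterminal to a list of right-hand sides,
             each a non-empty tuple of symbols. Symbols that are not keys
             of `grammar` are treated as terminals.
    """
    n = len(tokens)
    # An item is (lhs, rhs, dot position, origin index).
    chart = [set() for _ in range(n + 1)]
    for rhs in grammar[start]:
        chart[0].add((start, rhs, 0, 0))

    for i in range(n + 1):
        worklist = list(chart[i])
        while worklist:
            lhs, rhs, dot, origin = worklist.pop()
            if dot < len(rhs):
                symbol = rhs[dot]
                if symbol in grammar:
                    # Predict: expand the nonterminal after the dot.
                    for production in grammar[symbol]:
                        item = (symbol, production, 0, i)
                        if item not in chart[i]:
                            chart[i].add(item)
                            worklist.append(item)
                elif i < n and tokens[i] == symbol:
                    # Scan: consume a matching terminal.
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:
                # Complete: advance every item that was waiting for `lhs`.
                for l2, r2, d2, o2 in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        item = (l2, r2, d2 + 1, o2)
                        if item not in chart[i]:
                            chart[i].add(item)
                            worklist.append(item)

    return any(
        lhs == start and dot == len(rhs) and origin == 0
        for lhs, rhs, dot, origin in chart[n]
    )


if __name__ == "__main__":
    arithmetic = {
        "S": [("S", "+", "M"), ("M",)],
        "M": [("M", "*", "T"), ("T",)],
        "T": [("1",), ("2",), ("3",)],
    }
    print(earley_recognize(arithmetic, "S", list("2+3*1")))
```

The completion step reads items from an earlier chart position, which illustrates the kind of dependency between work items that the abstract says makes the original algorithm hard to parallelize.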

Cross validation residuals for generalised least squares and other correlated data models

Title Cross validation residuals for generalised least squares and other correlated data models
Authors Ingrid Annette Baade
Abstract Cross validation residuals are well known for the ordinary least squares model. Here leave-M-out cross validation is extended to generalised least squares. The relationship between cross validation residuals and Cook’s distance is demonstrated, in terms of an approximation to the difference in the generalised residual sum of squares for a model fit to all the data (training and test) and a model fit to a reduced dataset (training data only). For generalised least squares, as for ordinary least squares, there is no need to refit the model to reduced size datasets as all the values for K fold cross validation are available after fitting the model to all the data.
Tasks
Published 2018-09-05
URL http://arxiv.org/abs/1809.01319v1
PDF http://arxiv.org/pdf/1809.01319v1.pdf
PWC https://paperswithcode.com/paper/cross-validation-residuals-for-generalised
Repo https://github.com/ibaade/CV-for-GLS
Framework none
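
The ordinary least squares case that the abstract calls well known can be illustrated with leave-one-out residuals for a straight-line fit: each held-out residual equals the full-fit residual divided by one minus the point's leverage. The sketch below covers only that OLS special case, not the generalised least squares extension, and the function names are illustrative rather than taken from the linked repository.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


def loo_residuals_shortcut(xs, ys):
    """Leave-one-out residuals from a single fit to all the data."""
    n = len(xs)
    intercept, slope = fit_line(xs, ys)
    x_mean = sum(xs) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    residuals = []
    for x, y in zip(xs, ys):
        leverage = 1.0 / n + (x - x_mean) ** 2 / sxx  # diagonal of the hat matrix
        residuals.append((y - (intercept + slope * x)) / (1.0 - leverage))
    return residuals


def loo_residuals_refit(xs, ys):
    """Leave-one-out residuals computed the slow way, refitting n times."""
    residuals = []
    for i in range(len(xs)):
        intercept, slope = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        residuals.append(ys[i] - (intercept + slope * xs[i]))
    return residuals


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [1.0, 2.9, 5.2, 6.8, 9.1]
    print(loo_residuals_shortcut(xs, ys))
    print(loo_residuals_refit(xs, ys))
```

Both functions should print the same values up to floating-point rounding, which is the "no need to refit" property in its simplest form.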

Scalable Laplacian K-modes

Title Scalable Laplacian K-modes
Authors Imtiaz Masud Ziko, Eric Granger, Ismail Ben Ayed
Abstract We advocate Laplacian K-modes for joint clustering and density mode finding, and propose a concave-convex relaxation of the problem, which yields a parallel algorithm that scales up to large datasets and high dimensions. We optimize a tight bound (auxiliary function) of our relaxation, which, at each iteration, amounts to computing an independent update for each cluster-assignment variable, with guaranteed convergence. Therefore, our bound optimizer can be trivially distributed for large-scale data sets. Furthermore, we show that the density modes can be obtained as byproducts of the assignment variables via simple maximum-value operations whose additional computational cost is linear in the number of data points. Our formulation does not require storing a full affinity matrix or computing its eigenvalue decomposition, nor does it perform expensive projection steps and Lagrangian-dual inner iterates for the simplex constraints of each point. Furthermore, unlike mean-shift, our density-mode estimation does not require inner-loop gradient-ascent iterates. It has a complexity independent of feature-space dimension, yields modes that are valid data points in the input set and is applicable to discrete domains as well as arbitrary kernels. We report comprehensive experiments over various data sets, which show that our algorithm yields very competitive performance in terms of optimization quality (i.e., the value of the discrete-variable objective at convergence) and clustering accuracy.
Tasks
Published 2018-10-31
URL http://arxiv.org/abs/1810.13044v2
PDF http://arxiv.org/pdf/1810.13044v2.pdf
PWC https://paperswithcode.com/paper/scalable-laplacian-k-modes
Repo https://github.com/imtiazziko/SLK
Framework none

Deep Learning Methods for Reynolds-Averaged Navier-Stokes Simulations of Airfoil Flows

Title Deep Learning Methods for Reynolds-Averaged Navier-Stokes Simulations of Airfoil Flows
Authors Nils Thuerey, Konstantin Weissenow, Lukas Prantl, Xiangyu Hu
Abstract With this study we investigate the accuracy of deep learning models for the inference of Reynolds-Averaged Navier-Stokes solutions. We focus on a modernized U-net architecture, and evaluate a large number of trained neural networks with respect to their accuracy for the calculation of pressure and velocity distributions. In particular, we illustrate how training data size and the number of weights influence the accuracy of the solutions. With our best models we arrive at a mean relative pressure and velocity error of less than 3% across a range of previously unseen airfoil shapes. In addition all source code is publicly available in order to ensure reproducibility and to provide a starting point for researchers interested in deep learning methods for physics problems. While this work focuses on RANS solutions, the neural network architecture and learning setup are very generic, and applicable to a wide range of PDE boundary value problems on Cartesian grids.
Tasks
Published 2018-10-18
URL http://arxiv.org/abs/1810.08217v2
PDF http://arxiv.org/pdf/1810.08217v2.pdf
PWC https://paperswithcode.com/paper/deep-learning-methods-for-reynolds-averaged
Repo https://github.com/thunil/Deep-Flow-Prediction
Framework pytorch

A standardized Project Gutenberg corpus for statistical analysis of natural language and quantitative linguistics

Title A standardized Project Gutenberg corpus for statistical analysis of natural language and quantitative linguistics
Authors Martin Gerlach, Francesc Font-Clos
Abstract The use of Project Gutenberg (PG) as a text corpus has been extremely popular in statistical analysis of language for more than 25 years. However, in contrast to other major linguistic datasets of similar importance, no consensual full version of PG exists to date. In fact, most PG studies so far either consider only a small number of manually selected books, leading to potentially biased subsets, or employ vastly different pre-processing strategies (often specified in insufficient detail), raising concerns regarding the reproducibility of published results. In order to address these shortcomings, here we present the Standardized Project Gutenberg Corpus (SPGC), an open science approach to a curated version of the complete PG data containing more than 50,000 books and more than $3 \times 10^9$ word-tokens. Using different sources of annotated metadata, we not only provide a broad characterization of the content of PG, but also show different examples highlighting the potential of SPGC for investigating language variability across time, subjects, and authors. We publish our methodology in detail, the code to download and process the data, as well as the obtained corpus itself on 3 different levels of granularity (raw text, timeseries of word tokens, and counts of words). In this way, we provide a reproducible, pre-processed, full-size version of Project Gutenberg as a new scientific resource for corpus linguistics, natural language processing, and information retrieval.
Tasks Information Retrieval
Published 2018-12-19
URL http://arxiv.org/abs/1812.08092v1
PDF http://arxiv.org/pdf/1812.08092v1.pdf
PWC https://paperswithcode.com/paper/a-standardized-project-gutenberg-corpus-for
Repo https://github.com/pgcorpus/gutenberg
Framework none

Single Image Reflection Separation with Perceptual Losses

Title Single Image Reflection Separation with Perceptual Losses
Authors Xuaner Zhang, Ren Ng, Qifeng Chen
Abstract We present an approach to separating reflection from a single image. The approach uses a fully convolutional network trained end-to-end with losses that exploit low-level and high-level image information. Our loss function includes two perceptual losses: a feature loss from a visual perception network, and an adversarial loss that encodes characteristics of images in the transmission layers. We also propose a novel exclusion loss that enforces pixel-level layer separation. We create a dataset of real-world images with reflection and corresponding ground-truth transmission layers for quantitative evaluation and model training. We validate our method through comprehensive quantitative experiments and show that our approach outperforms state-of-the-art reflection removal methods in PSNR, SSIM, and perceptual user study. We also extend our method to two other image enhancement tasks to demonstrate the generality of our approach.
Tasks Image Enhancement
Published 2018-06-14
URL http://arxiv.org/abs/1806.05376v1
PDF http://arxiv.org/pdf/1806.05376v1.pdf
PWC https://paperswithcode.com/paper/single-image-reflection-separation-with
Repo https://github.com/vinthony/ghost-free-shadow-removal
Framework tf

ACVAE-VC: Non-parallel many-to-many voice conversion with auxiliary classifier variational autoencoder

Title ACVAE-VC: Non-parallel many-to-many voice conversion with auxiliary classifier variational autoencoder
Authors Hirokazu Kameoka, Takuhiro Kaneko, Kou Tanaka, Nobukatsu Hojo
Abstract This paper proposes a non-parallel many-to-many voice conversion (VC) method using a variant of the conditional variational autoencoder (VAE) called an auxiliary classifier VAE (ACVAE). The proposed method has three key features. First, it adopts fully convolutional architectures to construct the encoder and decoder networks so that the networks can learn conversion rules that capture time dependencies in the acoustic feature sequences of source and target speech. Second, it uses an information-theoretic regularization for the model training to ensure that the information in the attribute class label will not be lost in the conversion process. With regular CVAEs, the encoder and decoder are free to ignore the attribute class label input. This can be problematic since in such a situation, the attribute class label will have little effect on controlling the voice characteristics of input speech at test time. Such situations can be avoided by introducing an auxiliary classifier and training the encoder and decoder so that the attribute classes of the decoder outputs are correctly predicted by the classifier. Third, it avoids producing buzzy-sounding speech at test time by simply transplanting the spectral details of the input speech into its converted version. Subjective evaluation experiments revealed that this simple method worked reasonably well in a non-parallel many-to-many speaker identity conversion task.
Tasks Voice Conversion
Published 2018-08-13
URL http://arxiv.org/abs/1808.05092v2
PDF http://arxiv.org/pdf/1808.05092v2.pdf
PWC https://paperswithcode.com/paper/acvae-vc-non-parallel-many-to-many-voice
Repo https://github.com/aoixcat/ACVAE-VC
Framework pytorch

MAMNet: Multi-path Adaptive Modulation Network for Image Super-Resolution

Title MAMNet: Multi-path Adaptive Modulation Network for Image Super-Resolution
Authors Jun-Hyuk Kim, Jun-Ho Choi, Manri Cheon, Jong-Seok Lee
Abstract In recent years, single image super-resolution (SR) methods based on deep convolutional neural networks (CNNs) have made significant progress. However, due to the non-adaptive nature of the convolution operation, they cannot adapt to various characteristics of images, which limits their representational capability and, consequently, results in unnecessarily large model sizes. To address this issue, we propose a novel multi-path adaptive modulation network (MAMNet). Specifically, we propose a multi-path adaptive modulation block (MAMB), which is a lightweight yet effective residual block that adaptively modulates residual feature responses by fully exploiting their information via three paths. The three paths model three types of information suitable for SR: 1) channel-specific information (CSI) using global variance pooling, 2) inter-channel dependencies (ICD) based on the CSI, and 3) channel-specific spatial dependencies (CSD) via depth-wise convolution. We demonstrate that the proposed MAMB is more effective and parameter-efficient for image SR than other feature modulation methods. In addition, experimental results show that our MAMNet outperforms most of the state-of-the-art methods with a relatively small number of parameters.
Tasks Image Super-Resolution, Super-Resolution
Published 2018-11-29
URL https://arxiv.org/abs/1811.12043v2
PDF https://arxiv.org/pdf/1811.12043v2.pdf
PWC https://paperswithcode.com/paper/ram-residual-attention-module-for-single
Repo https://github.com/S-aiueo32/SRRAM
Framework tf
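
The global variance pooling mentioned in the abstract can be read as reducing each channel of a feature map to the variance of its spatial responses. The sketch below shows that reading on plain nested lists; it assumes population variance and a channels-first layout, and the function name is illustrative rather than the authors' implementation.

```python
def global_variance_pooling(feature_map):
    """Reduce each channel to the variance of its spatial responses.

    feature_map: list of channels, each a list of rows of floats
                 (a channels-first layout, assumed for illustration).
    Returns one population variance per channel.
    """
    pooled = []
    for channel in feature_map:
        values = [value for row in channel for value in row]
        mean = sum(values) / len(values)
        pooled.append(sum((value - mean) ** 2 for value in values) / len(values))
    return pooled


if __name__ == "__main__":
    flat_channel = [[1.0, 1.0], [1.0, 1.0]]      # no spatial variation
    striped_channel = [[0.0, 2.0], [0.0, 2.0]]   # alternating responses
    print(global_variance_pooling([flat_channel, striped_channel]))
```

A channel with no spatial variation pools to zero, so this statistic responds to how much a channel's activations vary rather than to their average level.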

Social GAN: Socially Acceptable Trajectories with Generative Adversarial Networks

Title Social GAN: Socially Acceptable Trajectories with Generative Adversarial Networks
Authors Agrim Gupta, Justin Johnson, Li Fei-Fei, Silvio Savarese, Alexandre Alahi
Abstract Understanding human motion behavior is critical for autonomous moving platforms (like self-driving cars and social robots) if they are to navigate human-centric environments. This is challenging because human motion is inherently multimodal: given a history of human motion paths, there are many socially plausible ways that people could move in the future. We tackle this problem by combining tools from sequence prediction and generative adversarial networks: a recurrent sequence-to-sequence model observes motion histories and predicts future behavior, using a novel pooling mechanism to aggregate information across people. We predict socially plausible futures by training adversarially against a recurrent discriminator, and encourage diverse predictions with a novel variety loss. Through experiments on several datasets we demonstrate that our approach outperforms prior work in terms of accuracy, variety, collision avoidance, and computational complexity.
Tasks Self-Driving Cars, Trajectory Prediction
Published 2018-03-29
URL http://arxiv.org/abs/1803.10892v1
PDF http://arxiv.org/pdf/1803.10892v1.pdf
PWC https://paperswithcode.com/paper/social-gan-socially-acceptable-trajectories
Repo https://github.com/christian-rncl/TrackNPred
Framework pytorch
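
The abstract does not define the variety loss. One plausible reading, assumed here, is a best-of-k loss: draw several candidate trajectories and penalize only the one closest to the ground truth, so the remaining samples are free to spread out. The sketch below is a hypothetical illustration of that idea on lists of (x, y) points, not code from the linked repository.

```python
import math


def l2_distance(trajectory_a, trajectory_b):
    """Euclidean distance between two trajectories of (x, y) points."""
    return math.sqrt(
        sum(
            (ax - bx) ** 2 + (ay - by) ** 2
            for (ax, ay), (bx, by) in zip(trajectory_a, trajectory_b)
        )
    )


def best_of_k_loss(ground_truth, samples):
    """Loss of the sampled trajectory that is closest to the ground truth."""
    return min(l2_distance(ground_truth, sample) for sample in samples)


if __name__ == "__main__":
    truth = [(0.0, 0.0), (1.0, 0.0)]
    candidates = [
        [(0.0, 0.0), (1.0, 3.0)],  # veers away from the true path
        [(0.0, 0.0), (1.0, 0.0)],  # matches the true path
    ]
    print(best_of_k_loss(truth, candidates))
```

Only the best candidate contributes to the loss, so a model is not punished for also proposing other plausible futures.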

Semi-Supervised Learning for Face Sketch Synthesis in the Wild

Title Semi-Supervised Learning for Face Sketch Synthesis in the Wild
Authors Chaofeng Chen, Wei Liu, Xiao Tan, Kwan-Yee K. Wong
Abstract Face sketch synthesis has made great progress in the past few years. Recent methods based on deep neural networks are able to generate high quality sketches from face photos. However, due to the lack of training data (photo-sketch pairs), none of these deep learning based methods can be applied successfully to face photos in the wild. In this paper, we propose a semi-supervised deep learning architecture which extends face sketch synthesis to handle face photos in the wild by exploiting additional face photos in training. Instead of supervising the network with ground truth sketches, we first perform patch matching in feature space between the input photo and photos in a small reference set of photo-sketch pairs. We then compose a pseudo sketch feature representation using the corresponding sketch feature patches to supervise our network. With the proposed approach, we can train our networks using a small reference set of photo-sketch pairs together with a large face photo dataset without ground truth sketches. Experiments show that our method achieves state-of-the-art performance both on public benchmarks and face photos in the wild. Code is available at https://github.com/chaofengc/Face-Sketch-Wild.
Tasks Face Sketch Synthesis
Published 2018-12-12
URL https://arxiv.org/abs/1812.04929v2
PDF https://arxiv.org/pdf/1812.04929v2.pdf
PWC https://paperswithcode.com/paper/semi-supervised-learning-for-face-sketch
Repo https://github.com/chaofengc/Face-Sketch-Wild
Framework pytorch