October 21, 2019

2820 words 14 mins read

Paper Group AWR 166

Words Can Shift: Dynamically Adjusting Word Representations Using Nonverbal Behaviors. Spatially Transformed Adversarial Examples. Variational Inference for Policy Gradient. Residual Dense Network for Image Super-Resolution. Learning Joint Semantic Parsers from Disjoint Data. TDAN: Temporally Deformable Alignment Network for Video Super-Resolution. …

Words Can Shift: Dynamically Adjusting Word Representations Using Nonverbal Behaviors

Title Words Can Shift: Dynamically Adjusting Word Representations Using Nonverbal Behaviors
Authors Yansen Wang, Ying Shen, Zhun Liu, Paul Pu Liang, Amir Zadeh, Louis-Philippe Morency
Abstract Humans convey their intentions through the usage of both verbal and nonverbal behaviors during face-to-face communication. Speaker intentions often vary dynamically depending on different nonverbal contexts, such as vocal patterns and facial expressions. As a result, when modeling human language, it is essential to consider not only the literal meaning of the words but also the nonverbal contexts in which these words appear. To better model human language, we first model expressive nonverbal representations by analyzing the fine-grained visual and acoustic patterns that occur during word segments. In addition, we seek to capture the dynamic nature of nonverbal intents by shifting word representations based on the accompanying nonverbal behaviors. To this end, we propose the Recurrent Attended Variation Embedding Network (RAVEN) that models the fine-grained structure of nonverbal subword sequences and dynamically shifts word representations based on nonverbal cues. Our proposed model achieves competitive performance on two publicly available datasets for multimodal sentiment analysis and emotion recognition. We also visualize the shifted word representations in different nonverbal contexts and summarize common patterns regarding multimodal variations of word representations.
Tasks Emotion Recognition, Multimodal Sentiment Analysis, Sentiment Analysis
Published 2018-11-23
URL http://arxiv.org/abs/1811.09362v2
PDF http://arxiv.org/pdf/1811.09362v2.pdf
PWC https://paperswithcode.com/paper/words-can-shift-dynamically-adjusting-word
Repo https://github.com/maochf/CMU-MultimodalSDK
Framework none
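
A minimal sketch of the shifting idea described above: a word embedding is displaced by an attention-weighted summary of the nonverbal subword features, with a gate controlling how far the representation may move. The array shapes, the attention form, and the gating function are illustrative assumptions, not the paper's exact RAVEN architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical inputs: one word embedding and the visual/acoustic features
# observed over the subword segments spanned by that word.
word_emb = rng.normal(size=100)           # e.g. a pretrained word vector
visual_seq = rng.normal(size=(12, 32))    # 12 subword frames, 32-d visual features
acoustic_seq = rng.normal(size=(12, 24))  # 12 subword frames, 24-d acoustic features

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(seq):
    """Attention-pool a nonverbal subword sequence into one vector (illustrative)."""
    W = rng.normal(size=(seq.shape[1], 100)) * 0.1  # project into word space
    projected = seq @ W                             # (frames, 100)
    scores = softmax(projected @ word_emb)          # relevance of each frame to the word
    return scores @ projected                       # weighted summary, (100,)

nonverbal_shift = attend(visual_seq) + attend(acoustic_seq)

# Gate controls how far the word representation is allowed to shift.
gate = 1.0 / (1.0 + np.exp(-word_emb @ nonverbal_shift / 100))
shifted_word_emb = word_emb + gate * nonverbal_shift
print(shifted_word_emb.shape)  # (100,)
```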

Spatially Transformed Adversarial Examples

Title Spatially Transformed Adversarial Examples
Authors Chaowei Xiao, Jun-Yan Zhu, Bo Li, Warren He, Mingyan Liu, Dawn Song
Abstract Recent studies show that widely used deep neural networks (DNNs) are vulnerable to carefully crafted adversarial examples. Many advanced algorithms have been proposed to generate adversarial examples by leveraging the $\mathcal{L}_p$ distance for penalizing perturbations. Researchers have explored different defense methods to defend against such adversarial attacks. While the effectiveness of $\mathcal{L}_p$ distance as a metric of perceptual quality remains an active research area, in this paper we will instead focus on a different type of perturbation, namely spatial transformation, as opposed to manipulating the pixel values directly as in prior works. Perturbations generated through spatial transformation could result in large $\mathcal{L}_p$ distance measures, but our extensive experiments show that such spatially transformed adversarial examples are perceptually realistic and more difficult to defend against with existing defense systems. This potentially provides a new direction in adversarial example generation and the design of corresponding defenses. We visualize the spatial transformation based perturbation for different examples and show that our technique can produce realistic adversarial examples with smooth image deformation. Finally, we visualize the attention of deep networks with different types of adversarial examples to better understand how these examples are interpreted.
Tasks
Published 2018-01-08
URL http://arxiv.org/abs/1801.02612v2
PDF http://arxiv.org/pdf/1801.02612v2.pdf
PWC https://paperswithcode.com/paper/spatially-transformed-adversarial-examples
Repo https://github.com/rakutentech/stAdv
Framework tf
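
To make the spatial-transformation idea concrete, the sketch below warps an image with a small per-pixel flow field using bilinear sampling; in the attack, this flow would be optimized so the warped image flips the classifier's prediction while remaining smooth. The image, flow magnitude, and sampling routine here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.ndimage import map_coordinates

rng = np.random.default_rng(0)
image = rng.random((28, 28))  # hypothetical grayscale image

# A per-pixel flow field (dy, dx); in stAdv this is the variable being optimized
# under a smoothness penalty rather than an L_p pixel penalty.
flow = 0.5 * rng.standard_normal((2, 28, 28))

ys, xs = np.mgrid[0:28, 0:28].astype(float)
coords = np.stack([ys + flow[0], xs + flow[1]])  # where each output pixel samples from

# Bilinear sampling of the input image at the displaced coordinates.
warped = map_coordinates(image, coords, order=1, mode="nearest")
print(warped.shape)  # (28, 28)
```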

Variational Inference for Policy Gradient

Title Variational Inference for Policy Gradient
Authors Tianbing Xu
Abstract Inspired by the seminal work on Stein Variational Inference and Stein Variational Policy Gradient, we derive a method to generate samples from the posterior variational parameter distribution by explicitly minimizing the KL divergence to match the target distribution in an amortized fashion. We then apply this variational inference technique to vanilla policy gradient, TRPO, and PPO with Bayesian Neural Network parameterizations for reinforcement learning problems.
Tasks
Published 2018-02-21
URL http://arxiv.org/abs/1802.07833v2
PDF http://arxiv.org/pdf/1802.07833v2.pdf
PWC https://paperswithcode.com/paper/variational-inference-for-policy-gradient
Repo https://github.com/tianbingsz/MLResearch
Framework none
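
The central ingredient above is a variational distribution over policy parameters trained by minimizing a KL divergence. The sketch below shows only the reparameterized sampling and an analytic KL term for a diagonal Gaussian; it is a deliberate simplification (KL to a standard normal, no Stein machinery, no amortization) meant to illustrate the Bayesian-Neural-Network parameterization, not the paper's algorithm.

```python
import torch

dim = 16  # hypothetical number of policy parameters

# Variational posterior q(theta) = N(mu, diag(sigma^2)) over policy weights.
mu = torch.zeros(dim, requires_grad=True)
log_sigma = torch.full((dim,), -1.0, requires_grad=True)

def sample_theta():
    """Reparameterization trick: theta = mu + sigma * eps, eps ~ N(0, I)."""
    eps = torch.randn(dim)
    return mu + log_sigma.exp() * eps

def kl_to_standard_normal():
    """Analytic KL( N(mu, sigma^2) || N(0, I) ) for a diagonal Gaussian."""
    var = (2 * log_sigma).exp()
    return 0.5 * torch.sum(var + mu ** 2 - 1.0 - 2 * log_sigma)

theta = sample_theta()
# In a full agent, `theta` would parameterize the policy network and the loss
# would combine a policy-gradient surrogate (vanilla PG / TRPO / PPO) with this KL.
loss = kl_to_standard_normal()
loss.backward()
print(mu.grad.shape, log_sigma.grad.shape)
```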

Residual Dense Network for Image Super-Resolution

Title Residual Dense Network for Image Super-Resolution
Authors Yulun Zhang, Yapeng Tian, Yu Kong, Bineng Zhong, Yun Fu
Abstract A very deep convolutional neural network (CNN) has recently achieved great success for image super-resolution (SR) and offered hierarchical features as well. However, most deep CNN based SR models do not make full use of the hierarchical features from the original low-resolution (LR) images, thereby achieving relatively low performance. In this paper, we propose a novel residual dense network (RDN) to address this problem in image SR. We fully exploit the hierarchical features from all the convolutional layers. Specifically, we propose a residual dense block (RDB) to extract abundant local features via densely connected convolutional layers. RDB further allows direct connections from the state of the preceding RDB to all the layers of the current RDB, leading to a contiguous memory (CM) mechanism. Local feature fusion in RDB is then used to adaptively learn more effective features from preceding and current local features and stabilizes the training of the wider network. After fully obtaining dense local features, we use global feature fusion to jointly and adaptively learn global hierarchical features in a holistic way. Extensive experiments on benchmark datasets with different degradation models show that our RDN achieves favorable performance against state-of-the-art methods.
Tasks Image Super-Resolution, Super-Resolution
Published 2018-02-24
URL http://arxiv.org/abs/1802.08797v2
PDF http://arxiv.org/pdf/1802.08797v2.pdf
PWC https://paperswithcode.com/paper/residual-dense-network-for-image-super
Repo https://github.com/puffnjackie/pytorch-super-resolution-implementations
Framework pytorch
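
A compact PyTorch sketch of a residual dense block along the lines described above: densely connected convolutions, local feature fusion with a 1x1 convolution, and a local residual connection. Layer counts and channel widths are illustrative choices rather than the paper's exact configuration; the linked repository is the reference.

```python
import torch
import torch.nn as nn

class ResidualDenseBlock(nn.Module):
    def __init__(self, channels=64, growth=32, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.Conv2d(channels + i * growth, growth, 3, padding=1),
                nn.ReLU(inplace=True),
            ))
        # Local feature fusion: 1x1 conv back to the block's input width.
        self.fuse = nn.Conv2d(channels + num_layers * growth, channels, 1)

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            features.append(layer(torch.cat(features, dim=1)))  # dense connections
        fused = self.fuse(torch.cat(features, dim=1))
        return x + fused  # local residual learning

block = ResidualDenseBlock()
out = block(torch.randn(1, 64, 32, 32))
print(out.shape)  # torch.Size([1, 64, 32, 32])
```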

Learning Joint Semantic Parsers from Disjoint Data

Title Learning Joint Semantic Parsers from Disjoint Data
Authors Hao Peng, Sam Thomson, Swabha Swayamdipta, Noah A. Smith
Abstract We present a new approach to learning semantic parsers from multiple datasets, even when the target semantic formalisms are drastically different, and the underlying corpora do not overlap. We handle such “disjoint” data by treating annotations for unobserved formalisms as latent structured variables. Building on state-of-the-art baselines, we show improvements both in frame-semantic parsing and semantic dependency parsing by modeling them jointly.
Tasks Dependency Parsing, Semantic Dependency Parsing, Semantic Parsing
Published 2018-04-17
URL http://arxiv.org/abs/1804.05990v1
PDF http://arxiv.org/pdf/1804.05990v1.pdf
PWC https://paperswithcode.com/paper/learning-joint-semantic-parsers-from-disjoint
Repo https://github.com/swabhs/coling18tutorial
Framework none

TDAN: Temporally Deformable Alignment Network for Video Super-Resolution

Title TDAN: Temporally Deformable Alignment Network for Video Super-Resolution
Authors Yapeng Tian, Yulun Zhang, Yun Fu, Chenliang Xu
Abstract Video super-resolution (VSR) aims to restore a photo-realistic high-resolution (HR) video frame from both its corresponding low-resolution (LR) frame (reference frame) and multiple neighboring frames (supporting frames). Due to varying motion of cameras or objects, the reference frame and each supporting frame are not aligned. Therefore, temporal alignment is a challenging yet important problem for VSR. Previous VSR methods usually utilize optical flow between the reference frame and each supporting frame to warp the supporting frame for temporal alignment. Therefore, the performance of these image-level warping-based models will highly depend on the prediction accuracy of optical flow, and inaccurate optical flow will lead to artifacts in the warped supporting frames, which will also be propagated into the reconstructed HR video frame. To overcome the limitation, in this paper, we propose a temporal deformable alignment network (TDAN) to adaptively align the reference frame and each supporting frame at the feature level without computing optical flow. The TDAN uses features from both the reference frame and each supporting frame to dynamically predict offsets of sampling convolution kernels. By using the corresponding kernels, TDAN transforms supporting frames to align with the reference frame. To predict the HR video frame, a reconstruction network taking aligned frames and the reference frame is utilized. Experimental results demonstrate the effectiveness of the proposed TDAN-based VSR model.
Tasks Optical Flow Estimation, Super-Resolution, Video Super-Resolution
Published 2018-12-07
URL http://arxiv.org/abs/1812.02898v1
PDF http://arxiv.org/pdf/1812.02898v1.pdf
PWC https://paperswithcode.com/paper/tdan-temporally-deformable-alignment-network
Repo https://github.com/YapengTian/TDAN-VSR
Framework pytorch
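
A rough sketch of feature-level alignment with deformable convolution: the reference and supporting feature maps are concatenated, a convolution predicts sampling offsets, and torchvision's deform_conv2d warps the supporting features. Channel sizes and the single-stage design are simplifying assumptions; the official repo above is the reference implementation.

```python
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

channels, kh, kw = 64, 3, 3
ref_feat = torch.randn(1, channels, 32, 32)   # reference-frame features (hypothetical)
sup_feat = torch.randn(1, channels, 32, 32)   # supporting-frame features

# Predict per-location sampling offsets (2 values per kernel tap) from both frames.
offset_pred = nn.Conv2d(2 * channels, 2 * kh * kw, 3, padding=1)
offsets = offset_pred(torch.cat([ref_feat, sup_feat], dim=1))

# Deformable convolution samples the supporting features at the predicted offsets,
# aligning them with the reference frame without explicit optical flow.
weight = torch.randn(channels, channels, kh, kw) * 0.01
aligned = deform_conv2d(sup_feat, offsets, weight, padding=1)
print(aligned.shape)  # torch.Size([1, 64, 32, 32])
```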

Some of Them Can be Guessed! Exploring the Effect of Linguistic Context in Predicting Quantifiers

Title Some of Them Can be Guessed! Exploring the Effect of Linguistic Context in Predicting Quantifiers
Authors Sandro Pezzelle, Shane Steinert-Threlkeld, Raffaela Bernardi, Jakub Szymanik
Abstract We study the role of linguistic context in predicting quantifiers ('few', 'all'). We collect crowdsourced data from human participants and test various models in a local (single-sentence) and a global (multi-sentence) context condition. Models significantly outperform humans in the former setting and are only slightly better in the latter. While human performance improves with more linguistic context (especially on proportional quantifiers), model performance suffers. Models are very effective in exploiting lexical and morpho-syntactic patterns; humans are better at genuinely understanding the meaning of the (global) context.
Tasks
Published 2018-06-01
URL http://arxiv.org/abs/1806.00354v1
PDF http://arxiv.org/pdf/1806.00354v1.pdf
PWC https://paperswithcode.com/paper/some-of-them-can-be-guessed-exploring-the
Repo https://github.com/sandropezzelle/fill-in-the-quant
Framework none

Bayesian Optimization of Combinatorial Structures

Title Bayesian Optimization of Combinatorial Structures
Authors Ricardo Baptista, Matthias Poloczek
Abstract The optimization of expensive-to-evaluate black-box functions over combinatorial structures is a ubiquitous task in machine learning, engineering and the natural sciences. The combinatorial explosion of the search space and costly evaluations pose challenges for current techniques in discrete optimization and machine learning, and critically require new algorithmic ideas. This article proposes, to the best of our knowledge, the first algorithm to overcome these challenges, based on an adaptive, scalable model that identifies useful combinatorial structure even when data is scarce. Our acquisition function pioneers the use of semidefinite programming to achieve efficiency and scalability. Experimental evaluations demonstrate that this algorithm consistently outperforms other methods from combinatorial and Bayesian optimization.
Tasks
Published 2018-06-22
URL http://arxiv.org/abs/1806.08838v2
PDF http://arxiv.org/pdf/1806.08838v2.pdf
PWC https://paperswithcode.com/paper/bayesian-optimization-of-combinatorial
Repo https://github.com/baptistar/BOCS
Framework none
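
The sketch below illustrates the overall loop on a tiny binary problem: fit a second-order surrogate to the evaluations seen so far, then pick the next point by optimizing it. For illustration it uses ordinary least squares and brute-force enumeration of the acquisition; the paper instead uses a sparse Bayesian regression model and a semidefinite-programming relaxation, so treat this purely as a schematic.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
d = 6  # number of binary variables (small enough to enumerate)

def expensive_black_box(x):
    # Hypothetical objective with pairwise interactions.
    return -np.sum(x) + 2.0 * x[0] * x[3] - 1.5 * x[1] * x[4]

def features(x):
    # Linear terms plus all pairwise products (second-order surrogate).
    pairs = [x[i] * x[j] for i, j in itertools.combinations(range(d), 2)]
    return np.concatenate([[1.0], x, pairs])

# Initial random design.
X = rng.integers(0, 2, size=(8, d)).astype(float)
y = np.array([expensive_black_box(x) for x in X])

for step in range(10):
    Phi = np.array([features(x) for x in X])
    coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)       # surrogate fit (OLS stand-in)
    candidates = np.array(list(itertools.product([0.0, 1.0], repeat=d)))
    preds = np.array([features(c) @ coef for c in candidates])
    x_next = candidates[np.argmin(preds)]                 # acquisition: minimize surrogate
    X = np.vstack([X, x_next])
    y = np.append(y, expensive_black_box(x_next))

print("best value found:", y.min())
```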

Modeling Brain Networks with Artificial Neural Networks

Title Modeling Brain Networks with Artificial Neural Networks
Authors Baran Baris Kivilcim, Itir Onal Ertugrul, Fatos T. Yarman Vural
Abstract In this study, we propose a neural network approach to capture the functional connectivities among anatomic brain regions. The suggested approach estimates a set of brain networks, each of which represents the connectivity patterns of a cognitive process. We employ two different architectures of neural networks to extract directed and undirected brain networks from functional Magnetic Resonance Imaging (fMRI) data. Then, we use the edge weights of the estimated brain networks to train a classifier, namely, Support Vector Machines (SVM), to label the underlying cognitive process. We compare our brain network models with popular models, which generate similar functional brain networks. We observe that both undirected and directed brain networks surpass the performances of the network models used in the fMRI literature. We also observe that directed brain networks offer more discriminative features compared to the undirected ones for recognizing the cognitive processes. The representation power of the suggested brain networks is tested on a task-fMRI dataset of the Human Connectome Project and a Complex Problem Solving dataset.
Tasks
Published 2018-07-22
URL http://arxiv.org/abs/1807.08368v1
PDF http://arxiv.org/pdf/1807.08368v1.pdf
PWC https://paperswithcode.com/paper/modeling-brain-networks-with-artificial
Repo https://github.com/baranbaris/modeling_brain_networks
Framework none
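
A small sketch of the pipeline described above on synthetic data: estimate a directed connectivity matrix for each sample by regressing every region's time series on the others, flatten the edge weights, and feed them to an SVM. The regression-based estimator and the random data are stand-ins; the paper's network models and fMRI datasets are described in the abstract above.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_samples, n_regions, n_timepoints = 40, 10, 50

def directed_connectivity(ts):
    """Regress each region on all other regions; coefficients act as edge weights."""
    W = np.zeros((n_regions, n_regions))
    for r in range(n_regions):
        others = np.delete(ts, r, axis=1)                       # (time, regions-1)
        coef, *_ = np.linalg.lstsq(others, ts[:, r], rcond=None)
        W[r, np.arange(n_regions) != r] = coef
    return W

# Synthetic "fMRI" time series and cognitive-task labels, purely for illustration.
data = rng.standard_normal((n_samples, n_timepoints, n_regions))
labels = rng.integers(0, 2, size=n_samples)

features = np.array([directed_connectivity(ts).ravel() for ts in data])
clf = SVC(kernel="linear").fit(features[:30], labels[:30])
print("held-out accuracy:", clf.score(features[30:], labels[30:]))
```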

Generating Paths with WFC

Title Generating Paths with WFC
Authors Hugo Scurti, Clark Verbrugge
Abstract Motion plans are often randomly generated for minor game NPCs. Repetitive or regular movements, however, require non-trivial programming effort and/or integration with a pathing system. We here describe an example-based approach to path generation that requires little or no additional programming effort. Our work modifies the Wave Function Collapse (WFC) algorithm, adapting it to produce pathing plans similar to an input sketch. We show how simple sketch modifications control path characteristics, and demonstrate feasibility through a usable Unity implementation.
Tasks
Published 2018-08-13
URL http://arxiv.org/abs/1808.04317v1
PDF http://arxiv.org/pdf/1808.04317v1.pdf
PWC https://paperswithcode.com/paper/generating-paths-with-wfc
Repo https://github.com/hugoscurti/path-wfc
Framework none
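
The core of WFC, stripped to its essentials, is shown below on a small tile grid: every cell starts with all tiles possible, the lowest-entropy cell is collapsed to a random tile weighted by how often it appears in the example, and adjacency constraints are propagated to neighbours. The tiles, adjacency rules, and weights here are toy assumptions; the path-specific modifications and the Unity implementation live in the repo above.

```python
import random
from collections import deque

random.seed(0)

# Toy tile set and adjacency rule "learned" from an example sketch:
# path tiles may never touch tree tiles directly; grass can sit next to anything.
TILES = ["path", "grass", "tree"]
WEIGHTS = {"path": 2, "grass": 4, "tree": 1}
PAIRS = {("path", "path"), ("path", "grass"), ("grass", "grass"),
         ("grass", "tree"), ("tree", "tree")}
ALLOWED = PAIRS | {(b, a) for a, b in PAIRS}

W, H = 8, 8
grid = [[set(TILES) for _ in range(W)] for _ in range(H)]  # every tile still possible

def neighbours(y, x):
    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        if 0 <= y + dy < H and 0 <= x + dx < W:
            yield y + dy, x + dx

def propagate(start):
    """Shrink neighbouring domains until the adjacency rule holds everywhere."""
    queue = deque([start])
    while queue:
        y, x = queue.popleft()
        for ny, nx in neighbours(y, x):
            allowed = {t for t in grid[ny][nx]
                       if any((s, t) in ALLOWED for s in grid[y][x])}
            if allowed != grid[ny][nx]:
                grid[ny][nx] = allowed   # may become empty: a full WFC would backtrack
                queue.append((ny, nx))

while True:
    undecided = [(y, x) for y in range(H) for x in range(W) if len(grid[y][x]) > 1]
    if not undecided:
        break
    y, x = min(undecided, key=lambda c: len(grid[c[0]][c[1]]))        # lowest entropy
    options = sorted(grid[y][x])
    grid[y][x] = {random.choices(options, [WEIGHTS[t] for t in options])[0]}
    propagate((y, x))

symbols = {"path": "#", "grass": ".", "tree": "T"}
print("\n".join("".join(symbols[next(iter(grid[y][x]))] if len(grid[y][x]) == 1 else "?"
                        for x in range(W)) for y in range(H)))
```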

PathGAN: Visual Scanpath Prediction with Generative Adversarial Networks

Title PathGAN: Visual Scanpath Prediction with Generative Adversarial Networks
Authors Marc Assens, Xavier Giro-i-Nieto, Kevin McGuinness, Noel E. O’Connor
Abstract We introduce PathGAN, a deep neural network for visual scanpath prediction trained on adversarial examples. A visual scanpath is defined as the sequence of fixation points over an image defined by a human observer with their gaze. PathGAN is composed of two parts, the generator and the discriminator. Both parts extract features from images using off-the-shelf networks, and train recurrent layers to generate or discriminate scanpaths accordingly. In scanpath prediction, the stochastic nature of the data makes it very difficult to generate realistic predictions using supervised learning strategies, but we adopt adversarial training as a suitable alternative. Our experiments show that PathGAN improves the state of the art of visual scanpath prediction on the iSUN and Salient360! datasets. Source code and models are available at https://imatge-upc.github.io/pathgan/
Tasks
Published 2018-09-03
URL http://arxiv.org/abs/1809.00567v1
PDF http://arxiv.org/pdf/1809.00567v1.pdf
PWC https://paperswithcode.com/paper/pathgan-visual-scanpath-prediction-with
Repo https://github.com/imatge-upc/pathgan
Framework tf
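
A bare-bones sketch of the adversarial setup described above: a recurrent generator maps image features and noise to a sequence of fixation points, and a recurrent discriminator scores scanpaths. Feature dimensions, sequence length, and the use of GRUs are illustrative assumptions, not the authors' exact architecture (which builds on off-the-shelf image encoders).

```python
import torch
import torch.nn as nn

class ScanpathGenerator(nn.Module):
    def __init__(self, feat_dim=128, noise_dim=16, hidden=64, steps=12):
        super().__init__()
        self.steps = steps
        self.rnn = nn.GRU(feat_dim + noise_dim, hidden, batch_first=True)
        self.to_xy = nn.Linear(hidden, 2)          # normalized fixation coordinates

    def forward(self, img_feat, noise):
        inp = torch.cat([img_feat, noise], dim=-1)
        inp = inp.unsqueeze(1).repeat(1, self.steps, 1)   # same conditioning each step
        h, _ = self.rnn(inp)
        return torch.sigmoid(self.to_xy(h))               # (batch, steps, 2)

class ScanpathDiscriminator(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(2, hidden, batch_first=True)
        self.score = nn.Linear(hidden, 1)

    def forward(self, scanpath):
        _, h = self.rnn(scanpath)
        return self.score(h[-1])                           # real/fake logit

gen, disc = ScanpathGenerator(), ScanpathDiscriminator()
fake = gen(torch.randn(4, 128), torch.randn(4, 16))
print(fake.shape, disc(fake).shape)   # (4, 12, 2) and (4, 1)
```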

Explainable Neural Computation via Stack Neural Module Networks

Title Explainable Neural Computation via Stack Neural Module Networks
Authors Ronghang Hu, Jacob Andreas, Trevor Darrell, Kate Saenko
Abstract In complex inferential tasks like question answering, machine learning models must confront two challenges: the need to implement a compositional reasoning process, and, in many applications, the need for this reasoning process to be interpretable to assist users in both development and prediction. Existing models designed to produce interpretable traces of their decision-making process typically require these traces to be supervised at training time. In this paper, we present a novel neural modular approach that performs compositional reasoning by automatically inducing a desired sub-task decomposition without relying on strong supervision. Our model allows linking different reasoning tasks through shared modules that handle common routines across tasks. Experiments show that the model is more interpretable to human evaluators compared to other state-of-the-art models: users can better understand the model’s underlying reasoning procedure and predict when it will succeed or fail based on observing its intermediate outputs.
Tasks Decision Making, Question Answering
Published 2018-07-23
URL http://arxiv.org/abs/1807.08556v3
PDF http://arxiv.org/pdf/1807.08556v3.pdf
PWC https://paperswithcode.com/paper/explainable-neural-computation-via-stack
Repo https://github.com/ronghanghu/snmn
Framework tf
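
The key mechanism above is that module selection is soft: at each reasoning step a controller produces a distribution over modules and the network executes a weighted mixture of their outputs, which is what makes the induced layout inspectable without layout supervision. The toy modules and controller below are illustrative stand-ins; the paper additionally uses a differentiable memory stack, omitted here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftModuleStep(nn.Module):
    """One reasoning step: blend the outputs of all modules by controller weights."""
    def __init__(self, dim=32, num_modules=3, ctrl_dim=16):
        super().__init__()
        self.modules_list = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_modules))
        self.controller = nn.Linear(ctrl_dim, num_modules)

    def forward(self, state, ctrl):
        weights = F.softmax(self.controller(ctrl), dim=-1)           # (batch, num_modules)
        outputs = torch.stack([m(state) for m in self.modules_list], dim=1)
        # Soft selection: weighted average instead of a hard, supervised layout.
        blended = (weights.unsqueeze(-1) * outputs).sum(dim=1)
        return blended, weights                                       # weights are inspectable

step = SoftModuleStep()
state, ctrl = torch.randn(2, 32), torch.randn(2, 16)
new_state, module_weights = step(state, ctrl)
print(new_state.shape, module_weights)   # weights show which "module" was (softly) used
```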

Residual Policy Learning

Title Residual Policy Learning
Authors Tom Silver, Kelsey Allen, Josh Tenenbaum, Leslie Kaelbling
Abstract We present Residual Policy Learning (RPL): a simple method for improving nondifferentiable policies using model-free deep reinforcement learning. RPL thrives in complex robotic manipulation tasks where good but imperfect controllers are available. In these tasks, reinforcement learning from scratch remains data-inefficient or intractable, but learning a residual on top of the initial controller can yield substantial improvements. We study RPL in six challenging MuJoCo tasks involving partial observability, sensor noise, model misspecification, and controller miscalibration. For initial controllers, we consider both hand-designed policies and model-predictive controllers with known or learned transition models. By combining learning with control algorithms, RPL can perform long-horizon, sparse-reward tasks for which reinforcement learning alone fails. Moreover, we find that RPL consistently and substantially improves on the initial controllers. We argue that RPL is a promising approach for combining the complementary strengths of deep reinforcement learning and robotic control, pushing the boundaries of what either can achieve independently. Video and code at https://k-r-allen.github.io/residual-policy-learning/.
Tasks
Published 2018-12-15
URL http://arxiv.org/abs/1812.06298v2
PDF http://arxiv.org/pdf/1812.06298v2.pdf
PWC https://paperswithcode.com/paper/residual-policy-learning
Repo https://github.com/k-r-allen/residual-policy-learning
Framework tf
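
The core idea lends itself to a very small sketch: the executed action is the hand-designed controller's action plus a learned residual, and only the residual is trained. The environment, controller, and random-search "training" below are hypothetical placeholders; the actual experiments use MuJoCo manipulation tasks and model-free deep RL.

```python
import numpy as np

rng = np.random.default_rng(0)
obs_dim, act_dim = 8, 2

def initial_controller(obs):
    """A fixed, imperfect hand-designed policy (hypothetical proportional rule)."""
    return 0.1 * obs[:act_dim]

class ResidualPolicy:
    """Linear residual on top of the controller; in RPL this would be a deep net
    trained with model-free RL, not the crude random search used below."""
    def __init__(self):
        self.W = np.zeros((act_dim, obs_dim))

    def act(self, obs):
        return initial_controller(obs) + self.W @ obs   # controller + learned residual

def episode_return(policy, episodes=5):
    total = 0.0
    for _ in range(episodes):
        obs = rng.standard_normal(obs_dim)
        for _ in range(20):
            action = policy.act(obs)
            obs = 0.9 * obs + 0.05 * rng.standard_normal(obs_dim)   # toy dynamics
            total += -np.sum((action - 0.2 * obs[:act_dim]) ** 2)   # toy reward
    return total / episodes

policy = ResidualPolicy()
best = episode_return(policy)
for _ in range(50):                     # random-search stand-in for the RL loop
    backup = policy.W.copy()
    policy.W = policy.W + 0.05 * rng.standard_normal(policy.W.shape)
    score = episode_return(policy)
    if score > best:
        best = score
    else:
        policy.W = backup
print("return with residual:", best)
```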

Joint Autoregressive and Hierarchical Priors for Learned Image Compression

Title Joint Autoregressive and Hierarchical Priors for Learned Image Compression
Authors David Minnen, Johannes Ballé, George Toderici
Abstract Recent models for learned image compression are based on autoencoders, learning approximately invertible mappings from pixels to a quantized latent representation. These are combined with an entropy model, a prior on the latent representation that can be used with standard arithmetic coding algorithms to yield a compressed bitstream. Recently, hierarchical entropy models have been introduced as a way to exploit more structure in the latents than simple fully factorized priors, improving compression performance while maintaining end-to-end optimization. Inspired by the success of autoregressive priors in probabilistic generative models, we examine autoregressive, hierarchical, as well as combined priors as alternatives, weighing their costs and benefits in the context of image compression. While it is well known that autoregressive models come with a significant computational penalty, we find that in terms of compression performance, autoregressive and hierarchical priors are complementary and, together, exploit the probabilistic structure in the latents better than all previous learned models. The combined model yields state-of-the-art rate–distortion performance, providing a 15.8% average reduction in file size over the previous state-of-the-art method based on deep learning, which corresponds to a 59.8% size reduction over JPEG, more than 35% reduction compared to WebP and JPEG2000, and bitstreams 8.4% smaller than BPG, the current state-of-the-art image codec. To the best of our knowledge, our model is the first learning-based method to outperform BPG on both PSNR and MS-SSIM distortion metrics.
Tasks Image Compression
Published 2018-09-08
URL http://arxiv.org/abs/1809.02736v1
PDF http://arxiv.org/pdf/1809.02736v1.pdf
PWC https://paperswithcode.com/paper/joint-autoregressive-and-hierarchical-priors
Repo https://github.com/mengab/Joint-Autoregressive-and-Hierarchical-Priors-for-Learned-Image-Compression
Framework tf
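
The autoregressive part of such an entropy model is typically realized with a masked convolution over the quantized latents, so each latent's distribution is conditioned only on already-decoded neighbours; the hyperprior contributes a second, side-information branch. The masked convolution below is a standard PixelCNN-style sketch, with sizes chosen for illustration rather than taken from the paper.

```python
import torch
import torch.nn as nn

class MaskedConv2d(nn.Conv2d):
    """Causal convolution: the centre position and everything after it are masked out,
    so predictions depend only on latents already decoded in raster-scan order."""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        _, _, kh, kw = self.weight.shape
        mask = torch.ones_like(self.weight)
        mask[:, :, kh // 2, kw // 2:] = 0
        mask[:, :, kh // 2 + 1:, :] = 0
        self.register_buffer("mask", mask)

    def forward(self, x):
        self.weight.data *= self.mask
        return super().forward(x)

latents = torch.randn(1, 192, 16, 16)          # hypothetical quantized latents
context_model = MaskedConv2d(192, 384, kernel_size=5, padding=2)
context = context_model(latents)
# `context` would be combined with the hyperprior features to predict the
# mean/scale of each latent's conditional distribution for arithmetic coding.
print(context.shape)  # torch.Size([1, 384, 16, 16])
```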

Investigation of Multimodal Features, Classifiers and Fusion Methods for Emotion Recognition

Title Investigation of Multimodal Features, Classifiers and Fusion Methods for Emotion Recognition
Authors Zheng Lian, Ya Li, Jianhua Tao, Jian Huang
Abstract Automatic emotion recognition is a challenging task. In this paper, we present our effort for the audio-video based sub-challenge of the Emotion Recognition in the Wild (EmotiW) 2018 challenge, which requires participants to assign a single emotion label to the video clip from the six universal emotions (Anger, Disgust, Fear, Happiness, Sad and Surprise) and Neutral. The proposed multimodal emotion recognition system takes audio, video and text information into account. In addition to handcrafted features, we also extract bottleneck features from deep neural networks (DNNs) via transfer learning. Both temporal classifiers and non-temporal classifiers are evaluated to obtain the best unimodal emotion classification result. Then probabilities are extracted and passed into the Beam Search Fusion (BS-Fusion). We test our method in the EmotiW 2018 challenge and obtain promising results. Compared with the baseline system, there is a significant improvement. We achieve 60.34% accuracy on the testing dataset, which is only 1.5% lower than the winner. It shows that our method is very competitive.
Tasks Emotion Classification, Emotion Recognition, Multimodal Emotion Recognition, Transfer Learning
Published 2018-09-13
URL http://arxiv.org/abs/1809.06225v1
PDF http://arxiv.org/pdf/1809.06225v1.pdf
PWC https://paperswithcode.com/paper/investigation-of-multimodal-features
Repo https://github.com/zeroQiaoba/EmotiW2018
Framework tf
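
The last stage described above fuses per-classifier probabilities. The exact Beam Search Fusion procedure is specific to the paper, so the sketch below only shows the generic shape of such a fusion: beam-searching over which unimodal classifiers to include so that their averaged probabilities maximize validation accuracy. All numbers and classifier outputs are synthetic placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
n_val, n_classes, n_classifiers = 100, 7, 5    # 7 emotion classes, 5 unimodal models

# Synthetic per-classifier class probabilities on a validation set, plus labels.
probs = rng.dirichlet(np.ones(n_classes), size=(n_classifiers, n_val))
labels = rng.integers(0, n_classes, size=n_val)

def accuracy(subset):
    fused = probs[list(subset)].mean(axis=0)             # average the selected models
    return (fused.argmax(axis=1) == labels).mean()

# Beam search over classifier subsets: keep the best `beam_width` subsets per size.
beam_width = 3
beam = sorted(((i,), accuracy((i,))) for i in range(n_classifiers))
beam = sorted(beam, key=lambda b: -b[1])[:beam_width]
best = beam[0]
for _ in range(n_classifiers - 1):
    expanded = []
    for subset, _ in beam:
        for i in range(n_classifiers):
            if i not in subset:
                cand = tuple(sorted(subset + (i,)))
                expanded.append((cand, accuracy(cand)))
    if not expanded:
        break
    beam = sorted(expanded, key=lambda b: -b[1])[:beam_width]
    if beam[0][1] > best[1]:
        best = beam[0]

print("selected classifiers:", best[0], "validation accuracy:", round(best[1], 3))
```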