October 21, 2019

2820 words 14 mins read

Paper Group AWR 166

Words Can Shift: Dynamically Adjusting Word Representations Using Nonverbal Behaviors. Spatially Transformed Adversarial Examples. Variational Inference for Policy Gradient. Residual Dense Network for Image Super-Resolution. Learning Joint Semantic Parsers from Disjoint Data. TDAN: Temporally Deformable Alignment Network for Video Super-Resolution. …

Words Can Shift: Dynamically Adjusting Word Representations Using Nonverbal Behaviors

Title Words Can Shift: Dynamically Adjusting Word Representations Using Nonverbal Behaviors
Authors Yansen Wang, Ying Shen, Zhun Liu, Paul Pu Liang, Amir Zadeh, Louis-Philippe Morency
Abstract Humans convey their intentions through the usage of both verbal and nonverbal behaviors during face-to-face communication. Speaker intentions often vary dynamically depending on different nonverbal contexts, such as vocal patterns and facial expressions. As a result, when modeling human language, it is essential to consider not only the literal meaning of the words but also the nonverbal contexts in which these words appear. To better model human language, we first model expressive nonverbal representations by analyzing the fine-grained visual and acoustic patterns that occur during word segments. In addition, we seek to capture the dynamic nature of nonverbal intents by shifting word representations based on the accompanying nonverbal behaviors. To this end, we propose the Recurrent Attended Variation Embedding Network (RAVEN) that models the fine-grained structure of nonverbal subword sequences and dynamically shifts word representations based on nonverbal cues. Our proposed model achieves competitive performance on two publicly available datasets for multimodal sentiment analysis and emotion recognition. We also visualize the shifted word representations in different nonverbal contexts and summarize common patterns regarding multimodal variations of word representations.
Tasks Emotion Recognition, Multimodal Sentiment Analysis, Sentiment Analysis
Published 2018-11-23
URL http://arxiv.org/abs/1811.09362v2
PDF http://arxiv.org/pdf/1811.09362v2.pdf
PWC https://paperswithcode.com/paper/words-can-shift-dynamically-adjusting-word
Repo https://github.com/maochf/CMU-MultimodalSDK
Framework none
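
A minimal sketch of the shifting idea described above: a word embedding is displaced by an attention-weighted summary of the nonverbal subword features, with a gate controlling how far the representation may move. The array shapes, the attention form, and the gating function are illustrative assumptions, not the paper's exact RAVEN architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical inputs: one word embedding and the visual/acoustic features
# observed over the subword segments spanned by that word.
word_emb = rng.normal(size=100)           # e.g. a pretrained word vector
visual_seq = rng.normal(size=(12, 32))    # 12 subword frames, 32-d visual features
acoustic_seq = rng.normal(size=(12, 24))  # 12 subword frames, 24-d acoustic features

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(seq):
    """Attention-pool a nonverbal subword sequence into one vector (illustrative)."""
    W = rng.normal(size=(seq.shape[1], 100)) * 0.1  # project into word space
    projected = seq @ W                             # (frames, 100)
    scores = softmax(projected @ word_emb)          # relevance of each frame to the word
    return scores @ projected                       # weighted summary, (100,)

nonverbal_shift = attend(visual_seq) + attend(acoustic_seq)

# Gate controls how far the word representation is allowed to shift.
gate = 1.0 / (1.0 + np.exp(-word_emb @ nonverbal_shift / 100))
shifted_word_emb = word_emb + gate * nonverbal_shift
print(shifted_word_emb.shape)  # (100,)
```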

Spatially Transformed Adversarial Examples

Title Spatially Transformed Adversarial Examples
Authors Chaowei Xiao, Jun-Yan Zhu, Bo Li, Warren He, Mingyan Liu, Dawn Song
Abstract Recent studies show that widely used deep neural networks (DNNs) are vulnerable to carefully crafted adversarial examples. Many advanced algorithms have been proposed to generate adversarial examples by leveraging the $\mathcal{L}_p$ distance for penalizing perturbations. Researchers have explored different defense methods to defend against such adversarial attacks. While the effectiveness of $\mathcal{L}_p$ distance as a metric of perceptual quality remains an active research area, in this paper we will instead focus on a different type of perturbation, namely spatial transformation, as opposed to manipulating the pixel values directly as in prior works. Perturbations generated through spatial transformation could result in large $\mathcal{L}_p$ distance measures, but our extensive experiments show that such spatially transformed adversarial examples are perceptually realistic and more difficult to defend against with existing defense systems. This potentially provides a new direction in adversarial example generation and the design of corresponding defenses. We visualize the spatial transformation based perturbation for different examples and show that our technique can produce realistic adversarial examples with smooth image deformation. Finally, we visualize the attention of deep networks with different types of adversarial examples to better understand how these examples are interpreted.
Tasks
Published 2018-01-08
URL http://arxiv.org/abs/1801.02612v2
PDF http://arxiv.org/pdf/1801.02612v2.pdf
PWC https://paperswithcode.com/paper/spatially-transformed-adversarial-examples
Repo https://github.com/rakutentech/stAdv
Framework tf
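
To make the spatial-transformation idea concrete, the sketch below warps an image with a small per-pixel flow field using bilinear sampling; in the attack, this flow would be optimized so the warped image flips the classifier's prediction while remaining smooth. The image, flow magnitude, and sampling routine here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.ndimage import map_coordinates

rng = np.random.default_rng(0)
image = rng.random((28, 28))  # hypothetical grayscale image

# A per-pixel flow field (dy, dx); in stAdv this is the variable being optimized
# under a smoothness penalty rather than an L_p pixel penalty.
flow = 0.5 * rng.standard_normal((2, 28, 28))

ys, xs = np.mgrid[0:28, 0:28].astype(float)
coords = np.stack([ys + flow[0], xs + flow[1]])  # where each output pixel samples from

# Bilinear sampling of the input image at the displaced coordinates.
warped = map_coordinates(image, coords, order=1, mode="nearest")
print(warped.shape)  # (28, 28)
```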

Variational Inference for Policy Gradient

Title Variational Inference for Policy Gradient
Authors Tianbing Xu
Abstract Inspired by the seminal work on Stein Variational Inference and Stein Variational Policy Gradient, we derive a method to generate samples from the posterior variational parameter distribution by explicitly minimizing the KL divergence to match the target distribution in an amortized fashion. We then apply this variational inference technique to vanilla policy gradient, TRPO, and PPO with Bayesian Neural Network parameterizations for reinforcement learning problems.
Tasks
Published 2018-02-21
URL http://arxiv.org/abs/1802.07833v2
PDF http://arxiv.org/pdf/1802.07833v2.pdf
PWC https://paperswithcode.com/paper/variational-inference-for-policy-gradient
Repo https://github.com/tianbingsz/MLResearch
Framework none
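
The central ingredient above is a variational distribution over policy parameters trained by minimizing a KL divergence. The sketch below shows only the reparameterized sampling and an analytic KL term for a diagonal Gaussian; it is a deliberate simplification (KL to a standard normal, no Stein machinery, no amortization) meant to illustrate the Bayesian-Neural-Network parameterization, not the paper's algorithm.

```python
import torch

dim = 16  # hypothetical number of policy parameters

# Variational posterior q(theta) = N(mu, diag(sigma^2)) over policy weights.
mu = torch.zeros(dim, requires_grad=True)
log_sigma = torch.full((dim,), -1.0, requires_grad=True)

def sample_theta():
    """Reparameterization trick: theta = mu + sigma * eps, eps ~ N(0, I)."""
    eps = torch.randn(dim)
    return mu + log_sigma.exp() * eps

def kl_to_standard_normal():
    """Analytic KL( N(mu, sigma^2) || N(0, I) ) for a diagonal Gaussian."""
    var = (2 * log_sigma).exp()
    return 0.5 * torch.sum(var + mu ** 2 - 1.0 - 2 * log_sigma)

theta = sample_theta()
# In a full agent, `theta` would parameterize the policy network and the loss
# would combine a policy-gradient surrogate (vanilla PG / TRPO / PPO) with this KL.
loss = kl_to_standard_normal()
loss.backward()
print(mu.grad.shape, log_sigma.grad.shape)
```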

Residual Dense Network for Image Super-Resolution

Title Residual Dense Network for Image Super-Resolution
Authors Yulun Zhang, Yapeng Tian, Yu Kong, Bineng Zhong, Yun Fu
Abstract A very deep convolutional neural network (CNN) has recently achieved great success for image super-resolution (SR) and offered hierarchical features as well. However, most deep CNN based SR models do not make full use of the hierarchical features from the original low-resolution (LR) images, thereby achieving relatively low performance. In this paper, we propose a novel residual dense network (RDN) to address this problem in image SR. We fully exploit the hierarchical features from all the convolutional layers. Specifically, we propose a residual dense block (RDB) to extract abundant local features via densely connected convolutional layers. RDB further allows direct connections from the state of the preceding RDB to all the layers of the current RDB, leading to a contiguous memory (CM) mechanism. Local feature fusion in RDB is then used to adaptively learn more effective features from preceding and current local features and stabilizes the training of the wider network. After fully obtaining dense local features, we use global feature fusion to jointly and adaptively learn global hierarchical features in a holistic way. Extensive experiments on benchmark datasets with different degradation models show that our RDN achieves favorable performance against state-of-the-art methods.
Tasks Image Super-Resolution, Super-Resolution
Published 2018-02-24
URL http://arxiv.org/abs/1802.08797v2
PDF http://arxiv.org/pdf/1802.08797v2.pdf
PWC https://paperswithcode.com/paper/residual-dense-network-for-image-super
Repo https://github.com/puffnjackie/pytorch-super-resolution-implementations
Framework pytorch
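
A compact PyTorch sketch of a residual dense block along the lines described above: densely connected convolutions, local feature fusion with a 1x1 convolution, and a local residual connection. Layer counts and channel widths are illustrative choices rather than the paper's exact configuration; the linked repository is the reference.

```python
import torch
import torch.nn as nn

class ResidualDenseBlock(nn.Module):
    def __init__(self, channels=64, growth=32, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.Conv2d(channels + i * growth, growth, 3, padding=1),
                nn.ReLU(inplace=True),
            ))
        # Local feature fusion: 1x1 conv back to the block's input width.
        self.fuse = nn.Conv2d(channels + num_layers * growth, channels, 1)

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            features.append(layer(torch.cat(features, dim=1)))  # dense connections
        fused = self.fuse(torch.cat(features, dim=1))
        return x + fused  # local residual learning

block = ResidualDenseBlock()
out = block(torch.randn(1, 64, 32, 32))
print(out.shape)  # torch.Size([1, 64, 32, 32])
```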

Learning Joint Semantic Parsers from Disjoint Data

Title Learning Joint Semantic Parsers from Disjoint Data
Authors Hao Peng, Sam Thomson, Swabha Swayamdipta, Noah A. Smith
Abstract We present a new approach to learning semantic parsers from multiple datasets, even when the target semantic formalisms are drastically different, and the underlying corpora do not overlap. We handle such “disjoint” data by treating annotations for unobserved formalisms as latent structured variables. Building on state-of-the-art baselines, we show improvements both in frame-semantic parsing and semantic dependency parsing by modeling them jointly.
Tasks Dependency Parsing, Semantic Dependency Parsing, Semantic Parsing
Published 2018-04-17
URL http://arxiv.org/abs/1804.05990v1
PDF http://arxiv.org/pdf/1804.05990v1.pdf
PWC https://paperswithcode.com/paper/learning-joint-semantic-parsers-from-disjoint
Repo https://github.com/swabhs/coling18tutorial
Framework none

TDAN: Temporally Deformable Alignment Network for Video Super-Resolution

Title TDAN: Temporally Deformable Alignment Network for Video Super-Resolution
Authors Yapeng Tian, Yulun Zhang, Yun Fu, Chenliang Xu
Abstract Video super-resolution (VSR) aims to restore a photo-realistic high-resolution (HR) video frame from both its corresponding low-resolution (LR) frame (reference frame) and multiple neighboring frames (supporting frames). Due to varying motion of cameras or objects, the reference frame and each supporting frame are not aligned. Therefore, temporal alignment is a challenging yet important problem for VSR. Previous VSR methods usually utilize optical flow between the reference frame and each supporting frame to warp the supporting frame for temporal alignment. Therefore, the performance of these image-level warping-based models will highly depend on the prediction accuracy of optical flow, and inaccurate optical flow will lead to artifacts in the warped supporting frames, which will also be propagated into the reconstructed HR video frame. To overcome the limitation, in this paper, we propose a temporal deformable alignment network (TDAN) to adaptively align the reference frame and each supporting frame at the feature level without computing optical flow. The TDAN uses features from both the reference frame and each supporting frame to dynamically predict offsets of sampling convolution kernels. By using the corresponding kernels, TDAN transforms supporting frames to align with the reference frame. To predict the HR video frame, a reconstruction network taking aligned frames and the reference frame is utilized. Experimental results demonstrate the effectiveness of the proposed TDAN-based VSR model.
Tasks Optical Flow Estimation, Super-Resolution, Video Super-Resolution
Published 2018-12-07
URL http://arxiv.org/abs/1812.02898v1
PDF http://arxiv.org/pdf/1812.02898v1.pdf
PWC https://paperswithcode.com/paper/tdan-temporally-deformable-alignment-network
Repo https://github.com/YapengTian/TDAN-VSR
Framework pytorch
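
A rough sketch of feature-level alignment with deformable convolution: the reference and supporting feature maps are concatenated, a convolution predicts sampling offsets, and torchvision's deform_conv2d warps the supporting features. Channel sizes and the single-stage design are simplifying assumptions; the official repo above is the reference implementation.

```python
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

channels, kh, kw = 64, 3, 3
ref_feat = torch.randn(1, channels, 32, 32)   # reference-frame features (hypothetical)
sup_feat = torch.randn(1, channels, 32, 32)   # supporting-frame features

# Predict per-location sampling offsets (2 values per kernel tap) from both frames.
offset_pred = nn.Conv2d(2 * channels, 2 * kh * kw, 3, padding=1)
offsets = offset_pred(torch.cat([ref_feat, sup_feat], dim=1))

# Deformable convolution samples the supporting features at the predicted offsets,
# aligning them with the reference frame without explicit optical flow.
weight = torch.randn(channels, channels, kh, kw) * 0.01
aligned = deform_conv2d(sup_feat, offsets, weight, padding=1)
print(aligned.shape)  # torch.Size([1, 64, 32, 32])
```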

Some of Them Can be Guessed! Exploring the Effect of Linguistic Context in Predicting Quantifiers

Title Some of Them Can be Guessed! Exploring the Effect of Linguistic Context in Predicting Quantifiers
Authors Sandro Pezzelle, Shane Steinert-Threlkeld, Raffaela Bernardi, Jakub Szymanik
Abstract We study the role of linguistic context in predicting quantifiers ('few', 'all'). We collect crowdsourced data from human participants and test various models in a local (single-sentence) and a global (multi-sentence) context condition. Models significantly outperform humans in the former setting and are only slightly better in the latter. While human performance improves with more linguistic context (especially on proportional quantifiers), model performance suffers. Models are very effective in exploiting lexical and morpho-syntactic patterns; humans are better at genuinely understanding the meaning of the (global) context.
Tasks
Published 2018-06-01
URL http://arxiv.org/abs/1806.00354v1
PDF http://arxiv.org/pdf/1806.00354v1.pdf
PWC https://paperswithcode.com/paper/some-of-them-can-be-guessed-exploring-the
Repo https://github.com/sandropezzelle/fill-in-the-quant
Framework none

Bayesian Optimization of Combinatorial Structures

Title Bayesian Optimization of Combinatorial Structures
Authors Ricardo Baptista, Matthias Poloczek
Abstract The optimization of expensive-to-evaluate black-box functions over combinatorial structures is a ubiquitous task in machine learning, engineering and the natural sciences. The combinatorial explosion of the search space and costly evaluations pose challenges for current techniques in discrete optimization and machine learning, and critically require new algorithmic ideas. This article proposes, to the best of our knowledge, the first algorithm to overcome these challenges, based on an adaptive, scalable model that identifies useful combinatorial structure even when data is scarce. Our acquisition function pioneers the use of semidefinite programming to achieve efficiency and scalability. Experimental evaluations demonstrate that this algorithm consistently outperforms other methods from combinatorial and Bayesian optimization.
Tasks
Published 2018-06-22
URL http://arxiv.org/abs/1806.08838v2
PDF http://arxiv.org/pdf/1806.08838v2.pdf
PWC https://paperswithcode.com/paper/bayesian-optimization-of-combinatorial
Repo https://github.com/baptistar/BOCS
Framework none
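
The sketch below illustrates the overall loop on a tiny binary problem: fit a second-order surrogate to the evaluations seen so far, then pick the next point by optimizing it. For illustration it uses ordinary least squares and brute-force enumeration of the acquisition; the paper instead uses a sparse Bayesian regression model and a semidefinite-programming relaxation, so treat this purely as a schematic.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
d = 6  # number of binary variables (small enough to enumerate)

def expensive_black_box(x):
    # Hypothetical objective with pairwise interactions.
    return -np.sum(x) + 2.0 * x[0] * x[3] - 1.5 * x[1] * x[4]

def features(x):
    # Linear terms plus all pairwise products (second-order surrogate).
    pairs = [x[i] * x[j] for i, j in itertools.combinations(range(d), 2)]
    return np.concatenate([[1.0], x, pairs])

# Initial random design.
X = rng.integers(0, 2, size=(8, d)).astype(float)
y = np.array([expensive_black_box(x) for x in X])

for step in range(10):
    Phi = np.array([features(x) for x in X])
    coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)       # surrogate fit (OLS stand-in)
    candidates = np.array(list(itertools.product([0.0, 1.0], repeat=d)))
    preds = np.array([features(c) @ coef for c in candidates])
    x_next = candidates[np.argmin(preds)]                 # acquisition: minimize surrogate
    X = np.vstack([X, x_next])
    y = np.append(y, expensive_black_box(x_next))

print("best value found:", y.min())
```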

Modeling Brain Networks with Artificial Neural Networks

Title Modeling Brain Networks with Artificial Neural Networks
Authors Baran Baris Kivilcim, Itir Onal Ertugrul, Fatos T. Yarman Vural
Abstract In this study, we propose a neural network approach to capture the functional connectivities among anatomic brain regions. The suggested approach estimates a set of brain networks, each of which represents the connectivity patterns of a cognitive process. We employ two different architectures of neural networks to extract directed and undirected brain networks from functional Magnetic Resonance Imaging (fMRI) data. Then, we use the edge weights of the estimated brain networks to train a classifier, namely, Support Vector Machines (SVM), to label the underlying cognitive process. We compare our brain network models with popular models, which generate similar functional brain networks. We observe that both undirected and directed brain networks surpass the performances of the network models used in the fMRI literature. We also observe that directed brain networks offer more discriminative features compared to the undirected ones for recognizing the cognitive processes. The representation power of the suggested brain networks is tested on a task-fMRI dataset of the Human Connectome Project and a Complex Problem Solving dataset.
Tasks
Published 2018-07-22
URL http://arxiv.org/abs/1807.08368v1
PDF http://arxiv.org/pdf/1807.08368v1.pdf
PWC https://paperswithcode.com/paper/modeling-brain-networks-with-artificial
Repo https://github.com/baranbaris/modeling_brain_networks
Framework none
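
A small sketch of the pipeline described above on synthetic data: estimate a directed connectivity matrix for each sample by regressing every region's time series on the others, flatten the edge weights, and feed them to an SVM. The regression-based estimator and the random data are stand-ins; the paper's network models and fMRI datasets are described in the abstract above.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_samples, n_regions, n_timepoints = 40, 10, 50

def directed_connectivity(ts):
    """Regress each region on all other regions; coefficients act as edge weights."""
    W = np.zeros((n_regions, n_regions))
    for r in range(n_regions):
        others = np.delete(ts, r, axis=1)                       # (time, regions-1)
        coef, *_ = np.linalg.lstsq(others, ts[:, r], rcond=None)
        W[r, np.arange(n_regions) != r] = coef
    return W

# Synthetic "fMRI" time series and cognitive-task labels, purely for illustration.
data = rng.standard_normal((n_samples, n_timepoints, n_regions))
labels = rng.integers(0, 2, size=n_samples)

features = np.array([directed_connectivity(ts).ravel() for ts in data])
clf = SVC(kernel="linear").fit(features[:30], labels[:30])
print("held-out accuracy:", clf.score(features[30:], labels[30:]))
```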

Generating Paths with WFC

Title Generating Paths with WFC
Authors Hugo Scurti, Clark Verbrugge
Abstract Motion plans are often randomly generated for minor game NPCs. Repetitive or regular movements, however, require non-trivial programming effort and/or integration with a pathing system. We here describe an example-based approach to path generation that requires little or no additional programming effort. Our work modifies the Wave Function Collapse (WFC) algorithm, adapting it to produce pathing plans similar to an input sketch. We show how simple sketch modifications control path characteristics, and demonstrate feasibility through a usable Unity implementation.
Tasks
Published 2018-08-13
URL http://arxiv.org/abs/1808.04317v1
PDF http://arxiv.org/pdf/1808.04317v1.pdf
PWC https://paperswithcode.com/paper/generating-paths-with-wfc
Repo https://github.com/hugoscurti/path-wfc
Framework none
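
The core of WFC, stripped to its essentials, is shown below on a small tile grid: every cell starts with all tiles possible, the lowest-entropy cell is collapsed to a random tile weighted by how often it appears in the example, and adjacency constraints are propagated to neighbours. The tiles, adjacency rules, and weights here are toy assumptions; the path-specific modifications and the Unity implementation live in the repo above.

```python
import random
from collections import deque

random.seed(0)

# Toy tile set and adjacency rule "learned" from an example sketch:
# path tiles may never touch tree tiles directly; grass can sit next to anything.
TILES = ["path", "grass", "tree"]
WEIGHTS = {"path": 2, "grass": 4, "tree": 1}
PAIRS = {("path", "path"), ("path", "grass"), ("grass", "grass"),
         ("grass", "tree"), ("tree", "tree")}
ALLOWED = PAIRS | {(b, a) for a, b in PAIRS}

W, H = 8, 8
grid = [[set(TILES) for _ in range(W)] for _ in range(H)]  # every tile still possible

def neighbours(y, x):
    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        if 0 <= y + dy < H and 0 <= x + dx < W:
            yield y + dy, x + dx

def propagate(start):
    """Shrink neighbouring domains until the adjacency rule holds everywhere."""
    queue = deque([start])
    while queue:
        y, x = queue.popleft()
        for ny, nx in neighbours(y, x):
            allowed = {t for t in grid[ny][nx]
                       if any((s, t) in ALLOWED for s in grid[y][x])}
            if allowed != grid[ny][nx]:
                grid[ny][nx] = allowed   # may become empty: a full WFC would backtrack
                queue.append((ny, nx))

while True:
    undecided = [(y, x) for y in range(H) for x in range(W) if len(grid[y][x]) > 1]
    if not undecided:
        break
    y, x = min(undecided, key=lambda c: len(grid[c[0]][c[1]]))        # lowest entropy
    options = sorted(grid[y][x])
    grid[y][x] = {random.choices(options, [WEIGHTS[t] for t in options])[0]}
    propagate((y, x))

symbols = {"path": "#", "grass": ".", "tree": "T"}
print("\n".join("".join(symbols[next(iter(grid[y][x]))] if len(grid[y][x]) == 1 else "?"
                        for x in range(W)) for y in range(H)))
```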

PathGAN: Visual Scanpath Prediction with Generative Adversarial Networks

Title PathGAN: Visual Scanpath Prediction with Generative Adversarial Networks
Authors Marc Assens, Xavier Giro-i-Nieto, Kevin McGuinness, Noel E. O’Connor
Abstract We introduce PathGAN, a deep neural network for visual scanpath prediction trained on adversarial examples. A visual scanpath is defined as the sequence of fixation points over an image defined by a human observer with their gaze. PathGAN is composed of two parts, the generator and the discriminator. Both parts extract features from images using off-the-shelf networks, and train recurrent layers to generate or discriminate scanpaths accordingly. In scanpath prediction, the stochastic nature of the data makes it very difficult to generate realistic predictions using supervised learning strategies, but we adopt adversarial training as a suitable alternative. Our experiments show that PathGAN improves the state of the art of visual scanpath prediction on the iSUN and Salient360! datasets. Source code and models are available at https://imatge-upc.github.io/pathgan/
Tasks
Published 2018-09-03
URL http://arxiv.org/abs/1809.00567v1
PDF http://arxiv.org/pdf/1809.00567v1.pdf
PWC https://paperswithcode.com/paper/pathgan-visual-scanpath-prediction-with
Repo https://github.com/imatge-upc/pathgan
Framework tf
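
A bare-bones sketch of the adversarial setup described above: a recurrent generator maps image features and noise to a sequence of fixation points, and a recurrent discriminator scores scanpaths. Feature dimensions, sequence length, and the use of GRUs are illustrative assumptions, not the authors' exact architecture (which builds on off-the-shelf image encoders).

```python
import torch
import torch.nn as nn

class ScanpathGenerator(nn.Module):
    def __init__(self, feat_dim=128, noise_dim=16, hidden=64, steps=12):
        super().__init__()
        self.steps = steps
        self.rnn = nn.GRU(feat_dim + noise_dim, hidden, batch_first=True)
        self.to_xy = nn.Linear(hidden, 2)          # normalized fixation coordinates

    def forward(self, img_feat, noise):
        inp = torch.cat([img_feat, noise], dim=-1)
        inp = inp.unsqueeze(1).repeat(1, self.steps, 1)   # same conditioning each step
        h, _ = self.rnn(inp)
        return torch.sigmoid(self.to_xy(h))               # (batch, steps, 2)

class ScanpathDiscriminator(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(2, hidden, batch_first=True)
        self.score = nn.Linear(hidden, 1)

    def forward(self, scanpath):
        _, h = self.rnn(scanpath)
        return self.score(h[-1])                           # real/fake logit

gen, disc = ScanpathGenerator(), ScanpathDiscriminator()
fake = gen(torch.randn(4, 128), torch.randn(4, 16))
print(fake.shape, disc(fake).shape)   # (4, 12, 2) and (4, 1)
```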

Explainable Neural Computation via Stack Neural Module Networks

Title Explainable Neural Computation via Stack Neural Module Networks
Authors Ronghang Hu, Jacob Andreas, Trevor Darrell, Kate Saenko
Abstract In complex inferential tasks like question answering, machine learning models must confront two challenges: the need to implement a compositional reasoning process, and, in many applications, the need for this reasoning process to be interpretable to assist users in both development and prediction. Existing models designed to produce interpretable traces of their decision-making process typically require these traces to be supervised at training time. In this paper, we present a novel neural modular approach that performs compositional reasoning by automatically inducing a desired sub-task decomposition without relying on strong supervision. Our model allows linking different reasoning tasks through shared modules that handle common routines across tasks. Experiments show that the model is more interpretable to human evaluators compared to other state-of-the-art models: users can better understand the model’s underlying reasoning procedure and predict when it will succeed or fail based on observing its intermediate outputs.
Tasks Decision Making, Question Answering
Published 2018-07-23
URL http://arxiv.org/abs/1807.08556v3
PDF http://arxiv.org/pdf/1807.08556v3.pdf
PWC https://paperswithcode.com/paper/explainable-neural-computation-via-stack
Repo https://github.com/ronghanghu/snmn
Framework tf
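
The key mechanism above is that module selection is soft: at each reasoning step a controller produces a distribution over modules and the network executes a weighted mixture of their outputs, which is what makes the induced layout inspectable without layout supervision. The toy modules and controller below are illustrative stand-ins; the paper additionally uses a differentiable memory stack, omitted here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftModuleStep(nn.Module):
    """One reasoning step: blend the outputs of all modules by controller weights."""
    def __init__(self, dim=32, num_modules=3, ctrl_dim=16):
        super().__init__()
        self.modules_list = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_modules))
        self.controller = nn.Linear(ctrl_dim, num_modules)

    def forward(self, state, ctrl):
        weights = F.softmax(self.controller(ctrl), dim=-1)           # (batch, num_modules)
        outputs = torch.stack([m(state) for m in self.modules_list], dim=1)
        # Soft selection: weighted average instead of a hard, supervised layout.
        blended = (weights.unsqueeze(-1) * outputs).sum(dim=1)
        return blended, weights                                       # weights are inspectable

step = SoftModuleStep()
state, ctrl = torch.randn(2, 32), torch.randn(2, 16)
new_state, module_weights = step(state, ctrl)
print(new_state.shape, module_weights)   # weights show which "module" was (softly) used
```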

Residual Policy Learning

Title Residual Policy Learning
Authors Tom Silver, Kelsey Allen, Josh Tenenbaum, Leslie Kaelbling
Abstract We present Residual Policy Learning (RPL): a simple method for improving nondifferentiable policies using model-free deep reinforcement learning. RPL thrives in complex robotic manipulation tasks where good but imperfect controllers are available. In these tasks, reinforcement learning from scratch remains data-inefficient or intractable, but learning a residual on top of the initial controller can yield substantial improvements. We study RPL in six challenging MuJoCo tasks involving partial observability, sensor noise, model misspecification, and controller miscalibration. For initial controllers, we consider both hand-designed policies and model-predictive controllers with known or learned transition models. By combining learning with control algorithms, RPL can perform long-horizon, sparse-reward tasks for which reinforcement learning alone fails. Moreover, we find that RPL consistently and substantially improves on the initial controllers. We argue that RPL is a promising approach for combining the complementary strengths of deep reinforcement learning and robotic control, pushing the boundaries of what either can achieve independently. Video and code at https://k-r-allen.github.io/residual-policy-learning/.
Tasks
Published 2018-12-15
URL http://arxiv.org/abs/1812.06298v2
PDF http://arxiv.org/pdf/1812.06298v2.pdf
PWC https://paperswithcode.com/paper/residual-policy-learning
Repo https://github.com/k-r-allen/residual-policy-learning
Framework tf
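
The core idea lends itself to a very small sketch: the executed action is the hand-designed controller's action plus a learned residual, and only the residual is trained. The environment, controller, and random-search "training" below are hypothetical placeholders; the actual experiments use MuJoCo manipulation tasks and model-free deep RL.

```python
import numpy as np

rng = np.random.default_rng(0)
obs_dim, act_dim = 8, 2

def initial_controller(obs):
    """A fixed, imperfect hand-designed policy (hypothetical proportional rule)."""
    return 0.1 * obs[:act_dim]

class ResidualPolicy:
    """Linear residual on top of the controller; in RPL this would be a deep net
    trained with model-free RL, not the crude random search used below."""
    def __init__(self):
        self.W = np.zeros((act_dim, obs_dim))

    def act(self, obs):
        return initial_controller(obs) + self.W @ obs   # controller + learned residual

def episode_return(policy, episodes=5):
    total = 0.0
    for _ in range(episodes):
        obs = rng.standard_normal(obs_dim)
        for _ in range(20):
            action = policy.act(obs)
            obs = 0.9 * obs + 0.05 * rng.standard_normal(obs_dim)   # toy dynamics
            total += -np.sum((action - 0.2 * obs[:act_dim]) ** 2)   # toy reward
    return total / episodes

policy = ResidualPolicy()
best = episode_return(policy)
for _ in range(50):                     # random-search stand-in for the RL loop
    backup = policy.W.copy()
    policy.W = policy.W + 0.05 * rng.standard_normal(policy.W.shape)
    score = episode_return(policy)
    if score > best:
        best = score
    else:
        policy.W = backup
print("return with residual:", best)
```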

Joint Autoregressive and Hierarchical Priors for Learned Image Compression

Title Joint Autoregressive and Hierarchical Priors for Learned Image Compression
Authors David Minnen, Johannes Ballé, George Toderici
Abstract Recent models for learned image compression are based on autoencoders, learning approximately invertible mappings from pixels to a quantized latent representation. These are combined with an entropy model, a prior on the latent representation that can be used with standard arithmetic coding algorithms to yield a compressed bitstream. Recently, hierarchical entropy models have been introduced as a way to exploit more structure in the latents than simple fully factorized priors, improving compression performance while maintaining end-to-end optimization. Inspired by the success of autoregressive priors in probabilistic generative models, we examine autoregressive, hierarchical, as well as combined priors as alternatives, weighing their costs and benefits in the context of image compression. While it is well known that autoregressive models come with a significant computational penalty, we find that in terms of compression performance, autoregressive and hierarchical priors are complementary and, together, exploit the probabilistic structure in the latents better than all previous learned models. The combined model yields state-of-the-art rate–distortion performance, providing a 15.8% average reduction in file size over the previous state-of-the-art method based on deep learning, which corresponds to a 59.8% size reduction over JPEG, more than 35% reduction compared to WebP and JPEG2000, and bitstreams 8.4% smaller than BPG, the current state-of-the-art image codec. To the best of our knowledge, our model is the first learning-based method to outperform BPG on both PSNR and MS-SSIM distortion metrics.
Tasks Image Compression
Published 2018-09-08
URL http://arxiv.org/abs/1809.02736v1
PDF http://arxiv.org/pdf/1809.02736v1.pdf
PWC https://paperswithcode.com/paper/joint-autoregressive-and-hierarchical-priors
Repo https://github.com/mengab/Joint-Autoregressive-and-Hierarchical-Priors-for-Learned-Image-Compression
Framework tf
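
The autoregressive part of such an entropy model is typically realized with a masked convolution over the quantized latents, so each latent's distribution is conditioned only on already-decoded neighbours; the hyperprior contributes a second, side-information branch. The masked convolution below is a standard PixelCNN-style sketch, with sizes chosen for illustration rather than taken from the paper.

```python
import torch
import torch.nn as nn

class MaskedConv2d(nn.Conv2d):
    """Causal convolution: the centre position and everything after it are masked out,
    so predictions depend only on latents already decoded in raster-scan order."""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        _, _, kh, kw = self.weight.shape
        mask = torch.ones_like(self.weight)
        mask[:, :, kh // 2, kw // 2:] = 0
        mask[:, :, kh // 2 + 1:, :] = 0
        self.register_buffer("mask", mask)

    def forward(self, x):
        self.weight.data *= self.mask
        return super().forward(x)

latents = torch.randn(1, 192, 16, 16)          # hypothetical quantized latents
context_model = MaskedConv2d(192, 384, kernel_size=5, padding=2)
context = context_model(latents)
# `context` would be combined with the hyperprior features to predict the
# mean/scale of each latent's conditional distribution for arithmetic coding.
print(context.shape)  # torch.Size([1, 384, 16, 16])
```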

Investigation of Multimodal Features, Classifiers and Fusion Methods for Emotion Recognition

Title Investigation of Multimodal Features, Classifiers and Fusion Methods for Emotion Recognition
Authors Zheng Lian, Ya Li, Jianhua Tao, Jian Huang
Abstract Automatic emotion recognition is a challenging task. In this paper, we present our effort for the audio-video based sub-challenge of the Emotion Recognition in the Wild (EmotiW) 2018 challenge, which requires participants to assign a single emotion label to the video clip from the six universal emotions (Anger, Disgust, Fear, Happiness, Sad and Surprise) and Neutral. The proposed multimodal emotion recognition system takes audio, video and text information into account. In addition to handcrafted features, we also extract bottleneck features from deep neural networks (DNNs) via transfer learning. Both temporal classifiers and non-temporal classifiers are evaluated to obtain the best unimodal emotion classification result. Then probabilities are extracted and passed into the Beam Search Fusion (BS-Fusion). We test our method in the EmotiW 2018 challenge and obtain promising results. Compared with the baseline system, there is a significant improvement. We achieve 60.34% accuracy on the testing dataset, which is only 1.5% lower than the winner. It shows that our method is very competitive.
Tasks Emotion Classification, Emotion Recognition, Multimodal Emotion Recognition, Transfer Learning
Published 2018-09-13
URL http://arxiv.org/abs/1809.06225v1
PDF http://arxiv.org/pdf/1809.06225v1.pdf
PWC https://paperswithcode.com/paper/investigation-of-multimodal-features
Repo https://github.com/zeroQiaoba/EmotiW2018
Framework tf
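
The last stage described above fuses per-classifier probabilities. The exact Beam Search Fusion procedure is specific to the paper, so the sketch below only shows the generic shape of such a fusion: beam-searching over which unimodal classifiers to include so that their averaged probabilities maximize validation accuracy. All numbers and classifier outputs are synthetic placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
n_val, n_classes, n_classifiers = 100, 7, 5    # 7 emotion classes, 5 unimodal models

# Synthetic per-classifier class probabilities on a validation set, plus labels.
probs = rng.dirichlet(np.ones(n_classes), size=(n_classifiers, n_val))
labels = rng.integers(0, n_classes, size=n_val)

def accuracy(subset):
    fused = probs[list(subset)].mean(axis=0)             # average the selected models
    return (fused.argmax(axis=1) == labels).mean()

# Beam search over classifier subsets: keep the best `beam_width` subsets per size.
beam_width = 3
beam = sorted(((i,), accuracy((i,))) for i in range(n_classifiers))
beam = sorted(beam, key=lambda b: -b[1])[:beam_width]
best = beam[0]
for _ in range(n_classifiers - 1):
    expanded = []
    for subset, _ in beam:
        for i in range(n_classifiers):
            if i not in subset:
                cand = tuple(sorted(subset + (i,)))
                expanded.append((cand, accuracy(cand)))
    if not expanded:
        break
    beam = sorted(expanded, key=lambda b: -b[1])[:beam_width]
    if beam[0][1] > best[1]:
        best = beam[0]

print("selected classifiers:", best[0], "validation accuracy:", round(best[1], 3))
```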