October 20, 2019

3249 words 16 mins read

Paper Group AWR 294

Reversible Recurrent Neural Networks. Fast Geometrically-Perturbed Adversarial Faces. Batch-Instance Normalization for Adaptively Style-Invariant Neural Networks. GLStyleNet: Higher Quality Style Transfer Combining Global and Local Pyramid Features. CASCADE: Contextual Sarcasm Detection in Online Discussion Forums. DenseImage Network: Video Spatial …

Reversible Recurrent Neural Networks

Title Reversible Recurrent Neural Networks
Authors Matthew MacKay, Paul Vicol, Jimmy Ba, Roger Grosse
Abstract Recurrent neural networks (RNNs) provide state-of-the-art performance in processing sequential data but are memory intensive to train, limiting the flexibility of RNN models which can be trained. Reversible RNNs—RNNs for which the hidden-to-hidden transition can be reversed—offer a path to reduce the memory requirements of training, as hidden states need not be stored and instead can be recomputed during backpropagation. We first show that perfectly reversible RNNs, which require no storage of the hidden activations, are fundamentally limited because they cannot forget information from their hidden state. We then provide a scheme for storing a small number of bits in order to allow perfect reversal with forgetting. Our method achieves comparable performance to traditional models while reducing the activation memory cost by a factor of 10–15. We extend our technique to attention-based sequence-to-sequence models, where it maintains performance while reducing activation memory cost by a factor of 5–10 in the encoder, and a factor of 10–15 in the decoder.
Tasks
Published 2018-10-25
URL http://arxiv.org/abs/1810.10999v1
PDF http://arxiv.org/pdf/1810.10999v1.pdf
PWC https://paperswithcode.com/paper/reversible-recurrent-neural-networks
Repo https://github.com/matthewjmackay/reversible-rnn
Framework pytorch
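
The core trick behind reversible RNNs is a hidden-to-hidden transition that can be inverted exactly, so hidden states can be recomputed instead of stored. Below is a minimal, hedged sketch of that idea using an additive-coupling cell; the paper's actual RevGRU/RevLSTM use gated updates plus a small bit-buffer to make forgetting reversible, so this is an illustration of the principle rather than their architecture.

```python
# Minimal sketch of an exactly reversible hidden-to-hidden transition via
# additive coupling (illustrative; not the paper's RevGRU/RevLSTM).
import torch
import torch.nn as nn

class RevCell(nn.Module):
    def __init__(self, hidden_size, input_size):
        super().__init__()
        half = hidden_size // 2
        self.f = nn.Linear(half + input_size, half)
        self.g = nn.Linear(half + input_size, half)

    def forward(self, h, x):
        h1, h2 = h.chunk(2, dim=-1)
        h2_new = h2 + torch.tanh(self.f(torch.cat([h1, x], dim=-1)))
        h1_new = h1 + torch.tanh(self.g(torch.cat([h2_new, x], dim=-1)))
        return torch.cat([h1_new, h2_new], dim=-1)

    def reverse(self, h_new, x):
        # Recompute the previous hidden state exactly, so activations need
        # not be stored during the forward pass.
        h1_new, h2_new = h_new.chunk(2, dim=-1)
        h1 = h1_new - torch.tanh(self.g(torch.cat([h2_new, x], dim=-1)))
        h2 = h2_new - torch.tanh(self.f(torch.cat([h1, x], dim=-1)))
        return torch.cat([h1, h2], dim=-1)

cell = RevCell(hidden_size=8, input_size=4)
h, x = torch.zeros(1, 8), torch.randn(1, 4)
h_next = cell(h, x)
assert torch.allclose(cell.reverse(h_next, x), h, atol=1e-6)
```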

Fast Geometrically-Perturbed Adversarial Faces

Title Fast Geometrically-Perturbed Adversarial Faces
Authors Ali Dabouei, Sobhan Soleymani, Jeremy Dawson, Nasser M. Nasrabadi
Abstract The state-of-the-art performance of deep learning algorithms has led to a considerable increase in the utilization of machine learning in security-sensitive and critical applications. However, it has recently been shown that a small and carefully crafted perturbation in the input space can completely fool a deep model. In this study, we explore the extent to which face recognition systems are vulnerable to geometrically-perturbed adversarial faces. We propose a fast landmark manipulation method for generating adversarial faces, which is approximately 200 times faster than previous geometric attacks and obtains a 99.86% success rate on state-of-the-art face recognition models. To further force the generated samples to be natural, we introduce a second attack constrained on the semantic structure of the face, which runs at half the speed of the first attack but achieves a success rate of 99.96%. Both attacks are extremely robust against state-of-the-art defense methods, with success rates of 53.59% or greater. Code is available at https://github.com/alldbi/FLM
Tasks Face Recognition
Published 2018-09-24
URL http://arxiv.org/abs/1809.08999v2
PDF http://arxiv.org/pdf/1809.08999v2.pdf
PWC https://paperswithcode.com/paper/fast-geometrically-perturbed-adversarial
Repo https://github.com/alldbi/FLM
Framework tf
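
A geometric attack perturbs where pixels come from rather than their values. The hedged sketch below optimizes a small displacement field with the sign of the recognition-loss gradient and warps the face with `grid_sample`; the paper's FLM attack specifically manipulates facial landmarks, so the dense flow field, the generic `model`, and the step sizes here are illustrative stand-ins.

```python
# Hedged sketch of a warping-based (geometric) adversarial perturbation.
import torch
import torch.nn.functional as F

def identity_grid(h, w):
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing="ij")
    return torch.stack([xs, ys], dim=-1)  # (H, W, 2) in grid_sample coordinates

def geometric_attack(model, image, true_label, steps=10, step_size=0.005):
    # image: (N, 3, H, W); true_label: (N,) class indices of the identities
    n, _, h, w = image.shape
    base = identity_grid(h, w).unsqueeze(0).expand(n, -1, -1, -1)
    flow = torch.zeros_like(base, requires_grad=True)   # learned displacement field
    for _ in range(steps):
        warped = F.grid_sample(image, base + flow, align_corners=True)
        loss = F.cross_entropy(model(warped), true_label)
        grad, = torch.autograd.grad(loss, flow)
        with torch.no_grad():
            flow += step_size * grad.sign()              # ascend the recognition loss
    return F.grid_sample(image, base + flow, align_corners=True).detach()
```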

Batch-Instance Normalization for Adaptively Style-Invariant Neural Networks

Title Batch-Instance Normalization for Adaptively Style-Invariant Neural Networks
Authors Hyeonseob Nam, Hyo-Eun Kim
Abstract Real-world image recognition is often challenged by the variability of visual styles, including object textures, lighting conditions, filter effects, etc. Although these variations have been deemed to be implicitly handled by more training data and deeper networks, recent advances in image style transfer suggest that it is also possible to explicitly manipulate the style information. Extending this idea to general visual recognition problems, we present Batch-Instance Normalization (BIN) to explicitly normalize unnecessary styles from images. Considering that certain style features play an essential role in discriminative tasks, BIN learns to selectively normalize only disturbing styles while preserving useful styles. The proposed normalization module is easily incorporated into existing network architectures such as Residual Networks, and surprisingly improves recognition performance in various scenarios. Furthermore, experiments verify that BIN effectively adapts to completely different tasks such as object classification and style transfer, by controlling the trade-off between preserving and removing style variations. BIN can be implemented with only a few lines of code using popular deep learning frameworks.
Tasks Object Classification, Style Transfer
Published 2018-05-21
URL http://arxiv.org/abs/1805.07925v3
PDF http://arxiv.org/pdf/1805.07925v3.pdf
PWC https://paperswithcode.com/paper/batch-instance-normalization-for-adaptively
Repo https://github.com/hyeonseob-nam/Batch-Instance-Normalization
Framework pytorch
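
The abstract's "few lines of code" claim is easy to make concrete: BIN blends batch-normalized and instance-normalized activations with a learnable per-channel gate. This is a hedged sketch of that blend; details such as where the gate is clipped and how the affine parameters are placed follow the paper only loosely.

```python
# Hedged sketch of Batch-Instance Normalization: rho gates BN vs. IN per channel.
import torch
import torch.nn as nn

class BatchInstanceNorm2d(nn.Module):
    def __init__(self, num_features, eps=1e-5, momentum=0.1):
        super().__init__()
        self.bn = nn.BatchNorm2d(num_features, eps=eps, momentum=momentum, affine=False)
        self.inorm = nn.InstanceNorm2d(num_features, eps=eps, affine=False)
        self.rho = nn.Parameter(torch.full((1, num_features, 1, 1), 0.5))
        self.gamma = nn.Parameter(torch.ones(1, num_features, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, num_features, 1, 1))

    def forward(self, x):
        rho = self.rho.clamp(0.0, 1.0)             # keep the gate in [0, 1]
        out = rho * self.bn(x) + (1 - rho) * self.inorm(x)
        return out * self.gamma + self.beta        # shared affine transform

x = torch.randn(4, 16, 8, 8)
print(BatchInstanceNorm2d(16)(x).shape)            # torch.Size([4, 16, 8, 8])
```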

GLStyleNet: Higher Quality Style Transfer Combining Global and Local Pyramid Features

Title GLStyleNet: Higher Quality Style Transfer Combining Global and Local Pyramid Features
Authors Zhizhong Wang, Lei Zhao, Wei Xing, Dongming Lu
Abstract Recent studies using deep neural networks have shown remarkable success in style transfer, especially for artistic and photo-realistic images. However, approaches using global feature correlations fail to capture small, intricate textures and maintain correct texture scales of the artworks, while approaches based on local patches fail to preserve the global effect. In this paper, we present a novel feature pyramid fusion neural network, dubbed GLStyleNet, which takes multi-scale and multi-level pyramid features into consideration by aggregating layers across a VGG network, and performs style transfer hierarchically with multiple losses at different scales. Our proposed method retains high-frequency pixel information and low-frequency structural information of images from two aspects: loss function constraints and feature fusion. Our approach is not only flexible in adjusting the trade-off between content and style, but also controllable between global and local. Compared to state-of-the-art methods, our method can transfer not just large-scale, obvious style cues but also subtle, exquisite ones, and dramatically improves the quality of style transfer. We demonstrate the effectiveness of our approach on portrait style transfer, artistic style transfer, photo-realistic style transfer and Chinese ancient painting style transfer tasks. Experimental results indicate that our unified approach improves image style transfer quality over previous state-of-the-art methods, while also accelerating the whole process to a certain extent. Our code is available at https://github.com/EndyWon/GLStyleNet.
Tasks Style Transfer
Published 2018-11-18
URL http://arxiv.org/abs/1811.07260v1
PDF http://arxiv.org/pdf/1811.07260v1.pdf
PWC https://paperswithcode.com/paper/glstylenet-higher-quality-style-transfer
Repo https://github.com/EndyWon/GLStyleNet
Framework tf
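
The "global feature correlations" the abstract refers to are typically Gram matrices of VGG features. The hedged sketch below computes a Gram-matrix style loss over several VGG layers as a stand-in for the paper's fused pyramid features; GLStyleNet additionally uses local patch-based losses, which are omitted here, and the chosen layers are illustrative.

```python
# Hedged sketch of a multi-level Gram-matrix style loss on VGG-19 features.
import torch
import torch.nn as nn
from torchvision.models import vgg19

def gram(feat):
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

class MultiLevelStyleLoss(nn.Module):
    def __init__(self, layer_ids=(3, 8, 17, 26)):   # relu1_2, relu2_2, relu3_4, relu4_4
        super().__init__()
        self.features = vgg19(weights="IMAGENET1K_V1").features.eval()
        for p in self.features.parameters():
            p.requires_grad_(False)
        self.layer_ids = set(layer_ids)

    def extract(self, x):
        feats = []
        for i, layer in enumerate(self.features):
            x = layer(x)
            if i in self.layer_ids:
                feats.append(x)
        return feats

    def forward(self, generated, style):
        loss = 0.0
        for fg, fs in zip(self.extract(generated), self.extract(style)):
            loss = loss + torch.mean((gram(fg) - gram(fs)) ** 2)
        return loss
```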

CASCADE: Contextual Sarcasm Detection in Online Discussion Forums

Title CASCADE: Contextual Sarcasm Detection in Online Discussion Forums
Authors Devamanyu Hazarika, Soujanya Poria, Sruthi Gorantla, Erik Cambria, Roger Zimmermann, Rada Mihalcea
Abstract The literature in automated sarcasm detection has mainly focused on lexical, syntactic and semantic-level analysis of text. However, a sarcastic sentence can be expressed with contextual presumptions, background and commonsense knowledge. In this paper, we propose CASCADE (a ContextuAl SarCasm DEtector) that adopts a hybrid approach of both content and context-driven modeling for sarcasm detection in online social media discussions. For the latter, CASCADE aims at extracting contextual information from the discourse of a discussion thread. Also, since the sarcastic nature and form of expression can vary from person to person, CASCADE utilizes user embeddings that encode stylometric and personality features of the users. When used along with content-based feature extractors such as Convolutional Neural Networks (CNNs), we see a significant boost in the classification performance on a large Reddit corpus.
Tasks Sarcasm Detection
Published 2018-05-16
URL http://arxiv.org/abs/1805.06413v1
PDF http://arxiv.org/pdf/1805.06413v1.pdf
PWC https://paperswithcode.com/paper/cascade-contextual-sarcasm-detection-in
Repo https://github.com/SenticNet/CASCADE--ContextuAl-SarCAsm-DEtector
Framework tf
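
CASCADE's hybrid design pairs a content encoder with user-level context. A hedged sketch of that combination follows: a multi-width CNN encodes the comment and a learned user embedding stands in for the stylometric/personality features, with the two concatenated before classification. Vocabulary sizes, filter counts, and dimensions here are illustrative, not the paper's configuration.

```python
# Hedged sketch of content (CNN) + context (user embedding) sarcasm classification.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CascadeLike(nn.Module):
    def __init__(self, vocab_size, n_users, emb_dim=100, user_dim=100, n_filters=128):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, emb_dim)
        self.user_emb = nn.Embedding(n_users, user_dim)
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, n_filters, k) for k in (3, 4, 5)]
        )
        self.classifier = nn.Linear(3 * n_filters + user_dim, 2)

    def forward(self, tokens, user_ids):
        x = self.word_emb(tokens).transpose(1, 2)           # (B, emb, T)
        pooled = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        content = torch.cat(pooled, dim=1)                  # content-based features
        context = self.user_emb(user_ids)                   # user context features
        return self.classifier(torch.cat([content, context], dim=1))

model = CascadeLike(vocab_size=5000, n_users=200)
logits = model(torch.randint(0, 5000, (8, 40)), torch.randint(0, 200, (8,)))
print(logits.shape)  # torch.Size([8, 2])
```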

DenseImage Network: Video Spatial-Temporal Evolution Encoding and Understanding

Title DenseImage Network: Video Spatial-Temporal Evolution Encoding and Understanding
Authors Xiaokai Chen, Ke Gao
Abstract Many of the leading approaches to video understanding are data-hungry and time-consuming, failing to capture the gist of spatial-temporal evolution in an efficient manner. The latest research shows that CNNs can reason about static relations between entities in images. To further exploit this capacity for dynamic evolution reasoning, we introduce a novel network module called the DenseImage Network (DIN) with two main contributions. 1) A novel compact representation of video which distills its significant spatial-temporal evolution into a matrix called a DenseImage, primed for efficient video encoding. 2) A simple yet powerful learning strategy based on the DenseImage and a temporal-order-preserving CNN is proposed for video understanding, which contains a local temporal correlation constraint capturing temporal evolution at multiple time scales with different filter widths. Extensive experiments on two recent challenging benchmarks demonstrate that our DenseImage Network can accurately capture the common spatial-temporal evolution between similar actions, even with enormous visual variations or different time scales. Moreover, we obtain state-of-the-art results in action and gesture recognition with much lower time and memory cost, indicating its immense potential in video representation and understanding.
Tasks Gesture Recognition, Video Understanding
Published 2018-05-19
URL http://arxiv.org/abs/1805.07550v1
PDF http://arxiv.org/pdf/1805.07550v1.pdf
PWC https://paperswithcode.com/paper/denseimage-network-video-spatial-temporal
Repo https://github.com/yliu1021/HandGestureClassifier
Framework tf
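
The DenseImage idea can be read as stacking per-frame feature vectors into a (feature x time) matrix and sliding 1-D convolutions of several temporal widths along the time axis. The sketch below follows that reading under stated assumptions: the frame encoder, feature size, and filter widths are placeholders rather than the paper's settings.

```python
# Hedged sketch of a DenseImage-style temporal head over per-frame features.
import torch
import torch.nn as nn

class DenseImageHead(nn.Module):
    def __init__(self, feat_dim=256, n_classes=10, widths=(3, 5, 7), n_filters=64):
        super().__init__()
        # Each branch slides only along the time axis, preserving temporal order.
        self.branches = nn.ModuleList(
            [nn.Conv1d(feat_dim, n_filters, w, padding=w // 2) for w in widths]
        )
        self.fc = nn.Linear(len(widths) * n_filters, n_classes)

    def forward(self, frame_feats):                         # (B, T, feat_dim)
        dense_image = frame_feats.transpose(1, 2)           # (B, feat_dim, T) "DenseImage"
        multi_scale = [b(dense_image).mean(dim=2) for b in self.branches]
        return self.fc(torch.cat(multi_scale, dim=1))

feats = torch.randn(2, 16, 256)                             # 16 frames of 256-d features
print(DenseImageHead()(feats).shape)                        # torch.Size([2, 10])
```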

Future semantic segmentation of time-lapsed videos with large temporal displacement

Title Future semantic segmentation of time-lapsed videos with large temporal displacement
Authors Talha Siddiqui, Samarth Bharadwaj
Abstract An important aspect of video understanding is the ability to predict the evolution of its content in the future. This paper presents a future-frame semantic segmentation technique for predicting semantic masks of the current and future frames in a time-lapsed video. We specifically focus on time-lapsed videos with large temporal displacement to highlight the model’s ability to capture large motions in time. We first introduce a unique semantic segmentation prediction dataset with over 120,000 time-lapsed sky-video frames and all corresponding semantic masks, captured over a span of five years in the North America region. The dataset has immense practical value for cloud cover analysis, where clouds are treated as non-rigid objects of interest. Next, our proposed recurrent network architecture departs from the existing trend of using temporal convolutional networks (TCNs) or feed-forward networks by explicitly learning an internal representation for the evolution of video content with time. Experimental evaluation shows an improvement in mean IoU over TCNs in the segmentation task of 10.8% for predictions 10 minutes ahead of time (21% for 60 minutes). Further, our model simultaneously measures both the current and future solar irradiance from the same video frames with a normalized MAE of 10.5% over two years. These results indicate that recurrent memory networks with an attention mechanism are able to capture the complex advective and diffused flow characteristics of dense fluids even with sparse temporal sampling, and are more suitable for future-frame prediction tasks on longer-duration videos.
Tasks Semantic Segmentation, Video Understanding
Published 2018-12-27
URL http://arxiv.org/abs/1812.10786v1
PDF http://arxiv.org/pdf/1812.10786v1.pdf
PWC https://paperswithcode.com/paper/future-semantic-segmentation-of-time-lapsed
Repo https://github.com/samarth-b/skycamIrr
Framework none
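
A minimal way to picture a recurrent future-frame predictor is a convolutional recurrent cell that rolls an internal state over the observed frames and a small head that decodes the future mask. The sketch below is just that, assuming a single ConvGRU cell; the paper's model adds an attention mechanism and also regresses solar irradiance, neither of which is reproduced here.

```python
# Hedged sketch of a ConvGRU-based future segmentation predictor.
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.gates = nn.Conv2d(in_ch + hid_ch, 2 * hid_ch, k, padding=k // 2)
        self.cand = nn.Conv2d(in_ch + hid_ch, hid_ch, k, padding=k // 2)

    def forward(self, x, h):
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], 1))).chunk(2, dim=1)
        h_tilde = torch.tanh(self.cand(torch.cat([x, r * h], 1)))
        return (1 - z) * h + z * h_tilde

class FutureSegPredictor(nn.Module):
    def __init__(self, in_ch=3, hid_ch=32, n_classes=2):
        super().__init__()
        self.hid_ch = hid_ch
        self.cell = ConvGRUCell(in_ch, hid_ch)
        self.head = nn.Conv2d(hid_ch, n_classes, 1)

    def forward(self, frames):                        # (B, T, C, H, W)
        b, t, c, h, w = frames.shape
        state = frames.new_zeros(b, self.hid_ch, h, w)
        for i in range(t):                            # roll the state over past frames
            state = self.cell(frames[:, i], state)
        return self.head(state)                       # logits for the future mask

video = torch.randn(1, 8, 3, 64, 64)
print(FutureSegPredictor()(video).shape)              # torch.Size([1, 2, 64, 64])
```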

Learning-based Video Motion Magnification

Title Learning-based Video Motion Magnification
Authors Tae-Hyun Oh, Ronnachai Jaroensri, Changil Kim, Mohamed Elgharib, Frédo Durand, William T. Freeman, Wojciech Matusik
Abstract Video motion magnification techniques allow us to see small motions previously invisible to the naked eye, such as those of vibrating airplane wings or swaying buildings under the influence of the wind. Because the motion is small, the magnification results are prone to noise or excessive blurring. The state of the art relies on hand-designed filters to extract representations that may not be optimal. In this paper, we seek to learn the filters directly from examples using deep convolutional neural networks. To make training tractable, we carefully design a synthetic dataset that captures small motion well, and use two-frame input for training. We show that the learned filters achieve high-quality results on real videos, with fewer ringing artifacts and better noise characteristics than previous methods. While our model is not trained with temporal filters, we found that temporal filters can be used with our extracted representations up to a moderate magnification, enabling frequency-based motion selection. Finally, we analyze the learned filters and show that they behave similarly to the derivative filters used in previous works. Our code, trained model, and datasets will be available online.
Tasks
Published 2018-04-08
URL http://arxiv.org/abs/1804.02684v3
PDF http://arxiv.org/pdf/1804.02684v3.pdf
PWC https://paperswithcode.com/paper/learning-based-video-motion-magnification
Repo https://github.com/12dmodel/deep_motion_mag
Framework tf
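
The two-frame setup boils down to a simple manipulation rule: encode both frames, amplify the difference of their representations by a factor alpha, and decode. The hedged sketch below shows that rule; the tiny convolutional encoder/decoder and the single "shape" representation are placeholders for the paper's learned networks, which also separate texture from shape.

```python
# Hedged sketch of two-frame motion magnification in representation space.
import torch
import torch.nn as nn

class TwoFrameMagnifier(nn.Module):
    def __init__(self, ch=16, alpha=10.0):
        super().__init__()
        self.alpha = alpha
        self.encoder = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(),
                                     nn.Conv2d(ch, ch, 3, padding=1))
        self.decoder = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
                                     nn.Conv2d(ch, 3, 3, padding=1))

    def forward(self, frame_a, frame_b):
        shape_a = self.encoder(frame_a)
        shape_b = self.encoder(frame_b)
        # Magnify the (small) motion-induced change in representation space.
        magnified = shape_a + self.alpha * (shape_b - shape_a)
        return self.decoder(magnified)

a, b = torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64)
print(TwoFrameMagnifier()(a, b).shape)  # torch.Size([1, 3, 64, 64])
```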

Focal Visual-Text Attention for Visual Question Answering

Title Focal Visual-Text Attention for Visual Question Answering
Authors Junwei Liang, Lu Jiang, Liangliang Cao, Li-Jia Li, Alexander Hauptmann
Abstract Recent insights on language and vision with neural networks have been successfully applied to simple single-image visual question answering. However, to tackle real-life question answering problems on multimedia collections such as personal photos, we have to look at whole collections with sequences of photos or videos. When answering questions from a large collection, a natural problem is to identify snippets that support the answer. In this paper, we describe a novel neural network called the Focal Visual-Text Attention network (FVTA) for collective reasoning in visual question answering, where both visual and text sequence information, such as images and text metadata, is presented. FVTA introduces an end-to-end approach that uses a hierarchical process to dynamically determine which media and which moments to focus on in the sequential data to answer the question. FVTA not only answers the questions well but also provides the justifications upon which its answers are based. FVTA achieves state-of-the-art performance on the MemexQA dataset and competitive results on the MovieQA dataset.
Tasks Memex Question Answering, Question Answering, Visual Question Answering
Published 2018-06-05
URL https://arxiv.org/abs/1806.01873v2
PDF https://arxiv.org/pdf/1806.01873v2.pdf
PWC https://paperswithcode.com/paper/focal-visual-text-attention-for-visual
Repo https://github.com/JunweiLiang/FVTA_memoryqa
Framework tf
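
At its simplest, question-guided attention over a photo sequence scores each (visual, text-metadata) time step against the question and takes a softmax-weighted sum as evidence. The sketch below shows only that single-layer version under illustrative dimensions; FVTA's hierarchical, "focal" structure over both the intra-sequence and cross-modal axes is not reproduced.

```python
# Hedged sketch of question-guided attention over a visual/text sequence.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SequenceAttention(nn.Module):
    def __init__(self, vis_dim=512, txt_dim=300, q_dim=300, hid=256):
        super().__init__()
        self.proj = nn.Linear(vis_dim + txt_dim, hid)
        self.q_proj = nn.Linear(q_dim, hid)

    def forward(self, vis_seq, txt_seq, question):
        # vis_seq: (B, T, vis_dim), txt_seq: (B, T, txt_dim), question: (B, q_dim)
        steps = self.proj(torch.cat([vis_seq, txt_seq], dim=-1))      # (B, T, hid)
        scores = torch.bmm(steps, self.q_proj(question).unsqueeze(2)).squeeze(2)
        weights = F.softmax(scores, dim=1)       # which moments to focus on
        evidence = torch.bmm(weights.unsqueeze(1), steps).squeeze(1)  # (B, hid)
        return evidence, weights                 # weights double as justification

att = SequenceAttention()
evidence, w = att(torch.randn(2, 20, 512), torch.randn(2, 20, 300), torch.randn(2, 300))
print(evidence.shape, w.shape)  # torch.Size([2, 256]) torch.Size([2, 20])
```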

DP-GAN: Diversity-Promoting Generative Adversarial Network for Generating Informative and Diversified Text

Title DP-GAN: Diversity-Promoting Generative Adversarial Network for Generating Informative and Diversified Text
Authors Jingjing Xu, Xuancheng Ren, Junyang Lin, Xu Sun
Abstract Existing text generation methods tend to produce repeated and “boring” expressions. To tackle this problem, we propose a new text generation model, called Diversity-Promoting Generative Adversarial Network (DP-GAN). The proposed model assigns low reward for repeatedly generated text and high reward for “novel” and fluent text, encouraging the generator to produce diverse and informative text. Moreover, we propose a novel language-model based discriminator, which can better distinguish novel text from repeated text without the saturation problem compared with existing classifier-based discriminators. The experimental results on review generation and dialogue generation tasks demonstrate that our model can generate substantially more diverse and informative text than existing baselines. The code is available at https://github.com/lancopku/DPGAN
Tasks Dialogue Generation, Language Modelling, Text Generation
Published 2018-02-05
URL http://arxiv.org/abs/1802.01345v3
PDF http://arxiv.org/pdf/1802.01345v3.pdf
PWC https://paperswithcode.com/paper/dp-gan-diversity-promoting-generative
Repo https://github.com/AIJoris/DPAC-DialogueGAN
Framework pytorch
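
One plausible reading of the abstract's language-model discriminator is sketched below: the discriminator is an LM, and a generated sentence's reward is derived from the per-token negative log-likelihood the LM assigns, so repetitive, easily predicted text receives low reward. The LSTM discriminator, vocabulary size, and this exact reward shaping are assumptions for illustration, not the paper's formulation.

```python
# Hedged sketch of a language-model discriminator turned into a per-sentence reward.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LMDiscriminator(nn.Module):
    def __init__(self, vocab_size=5000, emb=128, hid=256):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb)
        self.rnn = nn.LSTM(emb, hid, batch_first=True)
        self.out = nn.Linear(hid, vocab_size)

    def forward(self, tokens):                    # (B, T)
        h, _ = self.rnn(self.emb(tokens))
        return self.out(h)                        # next-token logits

    def reward(self, tokens):
        logits = self(tokens[:, :-1])             # predict token t+1 from the prefix
        nll = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                              tokens[:, 1:].reshape(-1), reduction="none")
        return nll.view(tokens.size(0), -1).mean(dim=1)   # high = harder-to-predict text

disc = LMDiscriminator()
sentences = torch.randint(0, 5000, (4, 12))
print(disc.reward(sentences))                     # one scalar reward per sentence
```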

k-Space Deep Learning for Accelerated MRI

Title k-Space Deep Learning for Accelerated MRI
Authors Yoseob Han, Leonard Sunwoo, Jong Chul Ye
Abstract The annihilating filter-based low-rank Hankel matrix approach (ALOHA) is one of the state-of-the-art compressed sensing approaches that directly interpolates the missing k-space data using low-rank Hankel matrix completion. The success of ALOHA is due to the concise signal representation in the k-space domain, thanks to the duality between structured low-rankness in the k-space domain and sparsity in the image domain. Inspired by the recent mathematical discovery that links convolutional neural networks to Hankel matrix decomposition using a data-driven framelet basis, here we propose a fully data-driven deep learning algorithm for k-space interpolation. Our network can also be easily applied to non-Cartesian k-space trajectories by simply adding an additional regridding layer. Extensive numerical experiments show that the proposed deep learning method consistently outperforms the existing image-domain deep learning approaches.
Tasks Matrix Completion
Published 2018-05-10
URL https://arxiv.org/abs/1805.03779v3
PDF https://arxiv.org/pdf/1805.03779v3.pdf
PWC https://paperswithcode.com/paper/k-space-deep-learning-for-accelerated-mri
Repo https://github.com/hanyoseob/k-space-deep-learning
Framework none
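
The key departure from image-domain approaches is that the network operates on k-space itself. A hedged sketch of that pipeline: feed undersampled k-space to a small CNN as real/imaginary channels, re-impose the measured samples (data consistency), and recover the image with an inverse FFT. The tiny network and the hard data-consistency step are illustrative placeholders for the paper's architecture.

```python
# Hedged sketch of CNN-based k-space interpolation followed by an inverse FFT.
import torch
import torch.nn as nn

class KSpaceNet(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(2, ch, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(ch, 2, 3, padding=1))

    def forward(self, kspace_under, mask):
        # kspace_under: complex (B, H, W); mask: (B, H, W), 1 where k-space was sampled
        x = torch.stack([kspace_under.real, kspace_under.imag], dim=1)
        out = self.net(x)
        k_filled = torch.complex(out[:, 0], out[:, 1])
        # Data consistency: keep measured samples, use the CNN only for the gaps.
        k_dc = torch.where(mask.bool(), kspace_under, k_filled)
        return torch.fft.ifft2(k_dc).abs()        # reconstructed magnitude image

net = KSpaceNet()
k = torch.randn(1, 64, 64, dtype=torch.complex64)
m = (torch.rand(1, 64, 64) > 0.5).float()
print(net(k * m, m).shape)                         # torch.Size([1, 64, 64])
```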

Neural Network Encapsulation

Title Neural Network Encapsulation
Authors Hongyang Li, Xiaoyang Guo, Bo Dai, Wanli Ouyang, Xiaogang Wang
Abstract A capsule is a collection of neurons which represents different variants of a pattern in the network. The routing scheme ensures that only those capsules in the higher layer which resemble their lower counterparts are activated. However, the computational complexity becomes a bottleneck for scaling up to larger networks, as lower capsules need to correspond to each and every higher capsule. To resolve this limitation, we approximate the routing process with two branches: a master branch which collects primary information from its direct contact in the lower layer, and an aide branch that replenishes the master based on pattern variants encoded in other lower capsules. Compared with the previous iterative and unsupervised routing scheme, these two branches communicate in a fast, supervised, single-pass fashion. The complexity and runtime of the model are therefore decreased by a large margin. Motivated by routing’s goal of making higher capsules agree with lower capsules, we extend the mechanism as a compensation for the rapid loss of information in nearby layers. We devise a feedback agreement unit that sends higher capsules back as feedback; it can be regarded as an additional regularization of the network. The feedback agreement is achieved by comparing the optimal transport divergence between two distributions (lower and higher capsules). Such an add-on yields a unanimous gain in both capsule and vanilla networks. Our proposed EncapNet performs favorably against previous state-of-the-art methods on CIFAR-10/100, SVHN and a subset of ImageNet.
Tasks
Published 2018-08-11
URL http://arxiv.org/abs/1808.03749v1
PDF http://arxiv.org/pdf/1808.03749v1.pdf
PWC https://paperswithcode.com/paper/neural-network-encapsulation
Repo https://github.com/hli2020/nn_encapsulation
Framework pytorch
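
A very rough picture of the two-branch approximation described above: each lower capsule feeds its directly connected higher capsule through a "master" map, while an "aide" map adds a learned summary of the other lower capsules, replacing iterative routing with one supervised pass. The capsule shapes, the mean-of-others summary, and the linear maps below are illustrative assumptions, not the paper's layer.

```python
# Hedged sketch of a master + aide approximation to capsule routing.
import torch
import torch.nn as nn

class MasterAideLayer(nn.Module):
    def __init__(self, n_caps=8, cap_dim=16):
        super().__init__()
        self.master = nn.Linear(cap_dim, cap_dim)   # direct-contact branch
        self.aide = nn.Linear(cap_dim, cap_dim)     # replenishing branch

    def forward(self, lower_caps):                   # (B, n_caps, cap_dim)
        master = self.master(lower_caps)
        # Each higher capsule also sees a summary of the *other* lower capsules.
        totals = lower_caps.sum(dim=1, keepdim=True)            # (B, 1, cap_dim)
        others = (totals - lower_caps) / (lower_caps.size(1) - 1)
        return master + self.aide(others)

caps = torch.randn(2, 8, 16)
print(MasterAideLayer()(caps).shape)                 # torch.Size([2, 8, 16])
```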

Learning to Exploit the Prior Network Knowledge for Weakly-Supervised Semantic Segmentation

Title Learning to Exploit the Prior Network Knowledge for Weakly-Supervised Semantic Segmentation
Authors Carolina Redondo-Cabrera, Marcos Baptista-Ríos, Roberto J. López-Sastre
Abstract Training a Convolutional Neural Network (CNN) for semantic segmentation typically requires collecting a large amount of accurate pixel-level annotations, a hard and expensive task. In contrast, simple image tags are easier to gather. In this paper we introduce a novel weakly-supervised semantic segmentation model able to learn from image labels, and just image labels. Our model uses the prior knowledge of a network trained for image recognition, employing these image annotations as an attention mechanism to identify semantic regions in the images. We then present a methodology that builds accurate class-specific segmentation masks from these regions, where neither external objectness nor saliency algorithms are required. We describe how to incorporate this mask generation strategy into a fully end-to-end trainable process where the network jointly learns to classify and segment images. Our experiments on the PASCAL VOC 2012 dataset show that exploiting these generated class-specific masks in conjunction with our novel end-to-end learning process outperforms several recent weakly-supervised semantic segmentation methods that use image tags only, and even some models that leverage additional supervision or training data.
Tasks Semantic Segmentation, Weakly-Supervised Semantic Segmentation
Published 2018-04-13
URL http://arxiv.org/abs/1804.04882v2
PDF http://arxiv.org/pdf/1804.04882v2.pdf
PWC https://paperswithcode.com/paper/learning-to-exploit-the-prior-network
Repo https://github.com/gramuah/weakly-supervised-segmentation
Framework none
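
Using a classification network's prior knowledge as an attention mechanism is commonly illustrated with class activation maps (CAM): the classifier's weights highlight which regions drove an image-level label, and thresholding gives a rough class-specific mask. The sketch below uses CAM on a standard ResNet-50 purely as a stand-in; the paper's exact attention mechanism and mask-refinement steps are not reproduced.

```python
# Hedged sketch: class activation maps as rough class-specific segmentation seeds.
import torch
import torch.nn.functional as F
from torchvision.models import resnet50

model = resnet50(weights="IMAGENET1K_V1").eval()

def class_activation_map(image, class_idx, threshold=0.5):
    # Convolutional features before global average pooling: (1, 2048, h, w)
    feats = torch.nn.Sequential(*list(model.children())[:-2])(image)
    weights = model.fc.weight[class_idx]                       # (2048,)
    cam = torch.einsum("c,bchw->bhw", weights, feats)
    cam = F.interpolate(cam.unsqueeze(1), size=image.shape[-2:],
                        mode="bilinear", align_corners=False).squeeze(1)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)   # normalize to [0, 1]
    return cam > threshold                                      # rough binary mask

img = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    mask = class_activation_map(img, class_idx=281)             # e.g. ImageNet "tabby cat"
print(mask.shape, mask.dtype)                                   # (1, 224, 224) bool
```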

Color Constancy by Reweighting Image Feature Maps

Title Color Constancy by Reweighting Image Feature Maps
Authors Jueqin Qiu, Haisong Xu, Zhengnan Ye
Abstract In this study, a novel illuminant color estimation framework is proposed for color constancy, which incorporates the high representational capacity of deep-learning-based models and the great interpretability of assumption-based models. The well-designed building block, the feature map reweight unit (ReWU), helps achieve competitive accuracy on benchmark datasets with respect to prior state-of-the-art models while requiring only 1%-5% of the model size and 8%-20% of the computational cost. In addition to local color estimation, a confidence estimation branch is also included, so that the model is able to produce a point estimate and its uncertainty estimate simultaneously, which provides useful clues for local estimate aggregation and multiple-illumination estimation. The source code and the dataset are available at https://github.com/QiuJueqin/Reweight-CC.
Tasks Color Constancy
Published 2018-06-25
URL http://arxiv.org/abs/1806.09248v2
PDF http://arxiv.org/pdf/1806.09248v2.pdf
PWC https://paperswithcode.com/paper/color-constancy-by-reweighting-image-feature
Repo https://github.com/QiuJueqin/Reweight-CC
Framework none
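
The point-plus-confidence output suggests a simple aggregation rule: weight each local illuminant estimate by its confidence when forming the global estimate. The sketch below shows that rule under stated assumptions; the ReWU building block and the confidence branch themselves are not reproduced, and the toy inputs are illustrative.

```python
# Hedged sketch of confidence-weighted aggregation of local illuminant estimates.
import torch

def aggregate_illuminant(local_estimates, confidences):
    # local_estimates: (N, 3) RGB illuminant estimates for N patches
    # confidences:     (N,)   non-negative confidence per patch
    weights = confidences / (confidences.sum() + 1e-8)
    global_est = (weights.unsqueeze(1) * local_estimates).sum(dim=0)
    return global_est / global_est.norm()        # unit-norm illuminant color

est = torch.tensor([[0.9, 0.7, 0.4], [0.8, 0.8, 0.5], [0.2, 0.9, 0.9]])
conf = torch.tensor([0.9, 0.8, 0.1])             # last patch is deemed unreliable
print(aggregate_illuminant(est, conf))
```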

To learn image super-resolution, use a GAN to learn how to do image degradation first

Title To learn image super-resolution, use a GAN to learn how to do image degradation first
Authors Adrian Bulat, Jing Yang, Georgios Tzimiropoulos
Abstract This paper is on image and face super-resolution. The vast majority of prior work for this problem focuses on how to increase the resolution of low-resolution images which are artificially generated by simple bilinear down-sampling (or, in a few cases, by blurring followed by down-sampling). We show that such methods fail to produce good results when applied to real-world low-resolution, low-quality images. To circumvent this problem, we propose a two-stage process which first trains a High-to-Low Generative Adversarial Network (GAN) to learn how to degrade and downsample high-resolution images, requiring, during training, only unpaired high- and low-resolution images. Once this is achieved, the output of this network is used to train a Low-to-High GAN for image super-resolution, this time using paired low- and high-resolution images. Our main result is that this network can now be used to effectively increase the quality of real-world low-resolution images. We have applied the proposed pipeline to the problem of face super-resolution, where we report a large improvement over baselines and prior work, although the proposed method is potentially applicable to other object categories.
Tasks Image Super-Resolution, Super-Resolution
Published 2018-07-30
URL http://arxiv.org/abs/1807.11458v1
PDF http://arxiv.org/pdf/1807.11458v1.pdf
PWC https://paperswithcode.com/paper/to-learn-image-super-resolution-use-a-gan-to
Repo https://github.com/jingyang2017/Face-and-Image-super-resolution
Framework pytorch
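
The two-stage pipeline is easiest to see as a training skeleton: first a High-to-Low GAN learns realistic degradation from unpaired HR and LR images, then its outputs supply (LR, HR) pairs for training the Low-to-High SR network. The skeleton below is a hedged sketch only; all networks, critics, losses, and data loaders are placeholders, not the paper's objectives.

```python
# Hedged skeleton of the two-stage High-to-Low / Low-to-High training pipeline.
import torch

def train_two_stage(high_to_low, low_to_high, disc_lr, disc_hr,
                    hr_loader, real_lr_loader, opt_h2l, opt_l2h, epochs=1):
    for _ in range(epochs):
        # Stage 1: learn realistic degradation from unpaired HR and real LR images.
        for hr, real_lr in zip(hr_loader, real_lr_loader):
            fake_lr = high_to_low(hr)
            loss_h2l = -disc_lr(fake_lr).mean()          # fool the LR critic
            opt_h2l.zero_grad(); loss_h2l.backward(); opt_h2l.step()

        # Stage 2: degraded outputs provide (LR, HR) pairs for the SR network.
        for hr, _ in zip(hr_loader, real_lr_loader):
            with torch.no_grad():
                fake_lr = high_to_low(hr)
            sr = low_to_high(fake_lr)
            loss_l2h = torch.nn.functional.l1_loss(sr, hr) - disc_hr(sr).mean()
            opt_l2h.zero_grad(); loss_l2h.backward(); opt_l2h.step()
```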