Paper Group AWR 294
Reversible Recurrent Neural Networks. Fast Geometrically-Perturbed Adversarial Faces. Batch-Instance Normalization for Adaptively Style-Invariant Neural Networks. GLStyleNet: Higher Quality Style Transfer Combining Global and Local Pyramid Features. CASCADE: Contextual Sarcasm Detection in Online Discussion Forums. DenseImage Network: Video Spatial …
Reversible Recurrent Neural Networks
Title | Reversible Recurrent Neural Networks |
Authors | Matthew MacKay, Paul Vicol, Jimmy Ba, Roger Grosse |
Abstract | Recurrent neural networks (RNNs) provide state-of-the-art performance in processing sequential data but are memory intensive to train, limiting the flexibility of RNN models which can be trained. Reversible RNNs—RNNs for which the hidden-to-hidden transition can be reversed—offer a path to reduce the memory requirements of training, as hidden states need not be stored and instead can be recomputed during backpropagation. We first show that perfectly reversible RNNs, which require no storage of the hidden activations, are fundamentally limited because they cannot forget information from their hidden state. We then provide a scheme for storing a small number of bits in order to allow perfect reversal with forgetting. Our method achieves comparable performance to traditional models while reducing the activation memory cost by a factor of 10–15. We extend our technique to attention-based sequence-to-sequence models, where it maintains performance while reducing activation memory cost by a factor of 5–10 in the encoder, and a factor of 10–15 in the decoder. |
Tasks | |
Published | 2018-10-25 |
URL | http://arxiv.org/abs/1810.10999v1 |
http://arxiv.org/pdf/1810.10999v1.pdf | |
PWC | https://paperswithcode.com/paper/reversible-recurrent-neural-networks |
Repo | https://github.com/matthewjmackay/reversible-rnn |
Framework | pytorch |
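The memory saving comes from recomputing hidden states during the backward pass instead of storing them. Below is a minimal sketch of a perfectly reversible cell with additive coupling (the zero-forgetting case the paper identifies as limited); the paper's actual RevGRU/RevLSTM cells use gated updates plus a scheme for storing the few bits lost to forgetting. The class and layer choices here are assumptions, not the authors' code.

```python
# Additive-coupling reversible recurrence (illustrative only).
import torch
import torch.nn as nn

class ReversibleCell(nn.Module):
    def __init__(self, hidden_size, input_size):
        super().__init__()
        half = hidden_size // 2          # hidden_size must be even
        self.f = nn.Linear(half + input_size, half)
        self.g = nn.Linear(half + input_size, half)

    def forward(self, x, h):
        h1, h2 = h.chunk(2, dim=-1)
        h1 = h1 + torch.tanh(self.f(torch.cat([h2, x], dim=-1)))
        h2 = h2 + torch.tanh(self.g(torch.cat([h1, x], dim=-1)))
        return torch.cat([h1, h2], dim=-1)

    def reverse(self, x, h_next):
        # Recompute the previous hidden state exactly, so it need not be stored.
        h1, h2 = h_next.chunk(2, dim=-1)
        h2 = h2 - torch.tanh(self.g(torch.cat([h1, x], dim=-1)))
        h1 = h1 - torch.tanh(self.f(torch.cat([h2, x], dim=-1)))
        return torch.cat([h1, h2], dim=-1)
```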
Fast Geometrically-Perturbed Adversarial Faces
Title | Fast Geometrically-Perturbed Adversarial Faces |
Authors | Ali Dabouei, Sobhan Soleymani, Jeremy Dawson, Nasser M. Nasrabadi |
Abstract | The state-of-the-art performance of deep learning algorithms has led to a considerable increase in the utilization of machine learning in security-sensitive and critical applications. However, it has recently been shown that a small and carefully crafted perturbation in the input space can completely fool a deep model. In this study, we explore the extent to which face recognition systems are vulnerable to geometrically-perturbed adversarial faces. We propose a fast landmark manipulation method for generating adversarial faces, which is approximately 200 times faster than previous geometric attacks and obtains a 99.86% success rate on state-of-the-art face recognition models. To further force the generated samples to be natural, we introduce a second attack, constrained by the semantic structure of the face, which runs at half the speed of the first attack and achieves a success rate of 99.96%. Both attacks are extremely robust against state-of-the-art defense methods, with success rates of at least 53.59%. Code is available at https://github.com/alldbi/FLM |
Tasks | Face Recognition |
Published | 2018-09-24 |
URL | http://arxiv.org/abs/1809.08999v2 |
http://arxiv.org/pdf/1809.08999v2.pdf | |
PWC | https://paperswithcode.com/paper/fast-geometrically-perturbed-adversarial |
Repo | https://github.com/alldbi/FLM |
Framework | tf |
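As a purely illustrative sketch (not the paper's fast landmark manipulation algorithm), the step below perturbs facial landmark offsets with the sign of the gradient of the recognition loss, spreads the sparse offsets into a dense flow field with a Gaussian kernel, and warps the face; the kernel, step rule, and all function names are assumptions.

```python
# Hypothetical single step of a landmark-displacement attack.
import torch
import torch.nn.functional as F

def warp_by_landmarks(img, landmarks, offsets, sigma=0.05):
    # img: (1,3,H,W); landmarks/offsets: (K,2) in normalized [-1,1] (x, y) coordinates.
    _, _, H, W = img.shape
    grid_y, grid_x = torch.meshgrid(torch.linspace(-1, 1, H),
                                    torch.linspace(-1, 1, W), indexing="ij")
    grid = torch.stack([grid_x, grid_y], dim=-1)           # (H,W,2)
    d2 = ((grid.unsqueeze(2) - landmarks) ** 2).sum(-1)    # (H,W,K) squared distances
    w = torch.softmax(-d2 / (2 * sigma ** 2), dim=-1)      # soft assignment to landmarks
    flow = (w.unsqueeze(-1) * offsets).sum(2)              # (H,W,2) dense displacement
    return F.grid_sample(img, (grid + flow).unsqueeze(0), align_corners=True)

def attack_step(model, img, landmarks, offsets, true_id, lr=0.01):
    delta = offsets.clone().requires_grad_(True)
    logits = model(warp_by_landmarks(img, landmarks, delta))
    loss = -F.cross_entropy(logits, true_id)               # push away from the true identity
    grad, = torch.autograd.grad(loss, delta)
    return (delta - lr * grad.sign()).detach()             # signed-gradient step on the offsets
```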
Batch-Instance Normalization for Adaptively Style-Invariant Neural Networks
Title | Batch-Instance Normalization for Adaptively Style-Invariant Neural Networks |
Authors | Hyeonseob Nam, Hyo-Eun Kim |
Abstract | Real-world image recognition is often challenged by the variability of visual styles including object textures, lighting conditions, filter effects, etc. Although these variations have been deemed to be implicitly handled by more training data and deeper networks, recent advances in image style transfer suggest that it is also possible to explicitly manipulate the style information. Extending this idea to general visual recognition problems, we present Batch-Instance Normalization (BIN) to explicitly normalize unnecessary styles from images. Considering that certain style features play an essential role in discriminative tasks, BIN learns to selectively normalize only disturbing styles while preserving useful styles. The proposed normalization module is easily incorporated into existing network architectures such as Residual Networks, and surprisingly improves the recognition performance in various scenarios. Furthermore, experiments verify that BIN effectively adapts to completely different tasks such as object classification and style transfer by controlling the trade-off between preserving and removing style variations. BIN can be implemented with only a few lines of code using popular deep learning frameworks. |
Tasks | Object Classification, Style Transfer |
Published | 2018-05-21 |
URL | http://arxiv.org/abs/1805.07925v3 |
http://arxiv.org/pdf/1805.07925v3.pdf | |
PWC | https://paperswithcode.com/paper/batch-instance-normalization-for-adaptively |
Repo | https://github.com/hyeonseob-nam/Batch-Instance-Normalization |
Framework | pytorch |
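The abstract notes that BIN can be implemented in a few lines. A sketch following the published formulation, a per-channel gate ρ ∈ [0, 1] blending batch-normalized and instance-normalized activations followed by a shared affine transform, could look like this:

```python
import torch
import torch.nn as nn

class BatchInstanceNorm2d(nn.Module):
    def __init__(self, num_features, eps=1e-5, momentum=0.1):
        super().__init__()
        self.bn = nn.BatchNorm2d(num_features, eps=eps, momentum=momentum, affine=False)
        self.inorm = nn.InstanceNorm2d(num_features, eps=eps, affine=False)
        self.rho = nn.Parameter(torch.full((1, num_features, 1, 1), 0.5))  # learnable gate
        self.gamma = nn.Parameter(torch.ones(1, num_features, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, num_features, 1, 1))

    def forward(self, x):
        rho = self.rho.clamp(0, 1)                       # the gate is constrained to [0, 1]
        out = rho * self.bn(x) + (1 - rho) * self.inorm(x)
        return out * self.gamma + self.beta
```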
GLStyleNet: Higher Quality Style Transfer Combining Global and Local Pyramid Features
Title | GLStyleNet: Higher Quality Style Transfer Combining Global and Local Pyramid Features |
Authors | Zhizhong Wang, Lei Zhao, Wei Xing, Dongming Lu |
Abstract | Recent studies using deep neural networks have shown remarkable success in style transfer, especially for artistic and photo-realistic images. However, approaches using global feature correlations fail to capture small, intricate textures and to maintain the correct texture scales of artworks, while approaches based on local patches fail to preserve the global effect. In this paper, we present a novel feature pyramid fusion neural network, dubbed GLStyleNet, which takes multi-scale and multi-level pyramid features into consideration by aggregating layers across a VGG network, and performs style transfer hierarchically with multiple losses at different scales. Our proposed method retains both high-frequency pixel information and low-frequency structural information of images through two mechanisms: loss function constraints and feature fusion. Our approach is not only flexible in adjusting the trade-off between content and style, but also controllable between global and local. Compared to state-of-the-art methods, our method can transfer not just large-scale, obvious style cues but also subtle, exquisite ones, and dramatically improves the quality of style transfer. We demonstrate the effectiveness of our approach on portrait style transfer, artistic style transfer, photo-realistic style transfer and Chinese ancient painting style transfer tasks. Experimental results indicate that our unified approach improves image style transfer quality over previous state-of-the-art methods, while also accelerating the whole process to a certain extent. Our code is available at https://github.com/EndyWon/GLStyleNet. |
Tasks | Style Transfer |
Published | 2018-11-18 |
URL | http://arxiv.org/abs/1811.07260v1 |
http://arxiv.org/pdf/1811.07260v1.pdf | |
PWC | https://paperswithcode.com/paper/glstylenet-higher-quality-style-transfer |
Repo | https://github.com/EndyWon/GLStyleNet |
Framework | tf |
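A minimal sketch of the two loss ingredients being combined, global Gram-matrix statistics and local patch matching on VGG features; the pyramid fusion, layer selection, and loss weights of GLStyleNet itself are not reproduced here.

```python
import torch
import torch.nn.functional as F

def gram_loss(feat, style_feat):
    def gram(f):
        b, c, h, w = f.shape
        f = f.view(b, c, h * w)
        return f @ f.transpose(1, 2) / (c * h * w)       # channel correlations (global style)
    return F.mse_loss(gram(feat), gram(style_feat))

def local_patch_loss(feat, style_feat, patch=3):
    # Match every patch of the generated features to its most similar style patch.
    fp = F.unfold(feat, patch).transpose(1, 2)           # (B, N, C*k*k)
    sp = F.unfold(style_feat, patch).transpose(1, 2)     # (B, M, C*k*k)
    sim = F.normalize(fp, dim=-1) @ F.normalize(sp, dim=-1).transpose(1, 2)
    nearest = sim.argmax(dim=-1)                         # index of the best style patch
    target = torch.gather(sp, 1, nearest.unsqueeze(-1).expand(-1, -1, sp.size(-1)))
    return F.mse_loss(fp, target.detach())

def style_loss(feat, style_feat, w_global=1.0, w_local=1.0):
    return w_global * gram_loss(feat, style_feat) + w_local * local_patch_loss(feat, style_feat)
```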
CASCADE: Contextual Sarcasm Detection in Online Discussion Forums
Title | CASCADE: Contextual Sarcasm Detection in Online Discussion Forums |
Authors | Devamanyu Hazarika, Soujanya Poria, Sruthi Gorantla, Erik Cambria, Roger Zimmermann, Rada Mihalcea |
Abstract | The literature in automated sarcasm detection has mainly focused on lexical, syntactic and semantic-level analysis of text. However, a sarcastic sentence can be expressed with contextual presumptions, background and commonsense knowledge. In this paper, we propose CASCADE (a ContextuAl SarCasm DEtector) that adopts a hybrid approach of both content and context-driven modeling for sarcasm detection in online social media discussions. For the latter, CASCADE aims at extracting contextual information from the discourse of a discussion thread. Also, since the sarcastic nature and form of expression can vary from person to person, CASCADE utilizes user embeddings that encode stylometric and personality features of the users. When used along with content-based feature extractors such as Convolutional Neural Networks (CNNs), we see a significant boost in the classification performance on a large Reddit corpus. |
Tasks | Sarcasm Detection |
Published | 2018-05-16 |
URL | http://arxiv.org/abs/1805.06413v1 |
http://arxiv.org/pdf/1805.06413v1.pdf | |
PWC | https://paperswithcode.com/paper/cascade-contextual-sarcasm-detection-in |
Repo | https://github.com/SenticNet/CASCADE--ContextuAl-SarCAsm-DEtector |
Framework | tf |
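A minimal sketch of the hybrid content/context model the abstract describes: a standard text CNN extracts content features, which are concatenated with pre-computed user and discourse embeddings before classification. All dimensions and the two-class head are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SarcasmClassifier(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, n_filters=100,
                 user_dim=100, disc_dim=100, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, n_filters, k) for k in (3, 4, 5)])
        self.fc = nn.Linear(3 * n_filters + user_dim + disc_dim, n_classes)

    def forward(self, tokens, user_emb, discourse_emb):
        x = self.embed(tokens).transpose(1, 2)           # (B, emb_dim, T)
        content = torch.cat(
            [torch.relu(c(x)).max(dim=2).values for c in self.convs], dim=1)
        return self.fc(torch.cat([content, user_emb, discourse_emb], dim=1))
```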
DenseImage Network: Video Spatial-Temporal Evolution Encoding and Understanding
Title | DenseImage Network: Video Spatial-Temporal Evolution Encoding and Understanding |
Authors | Xiaokai Chen, Ke Gao |
Abstract | Many of the leading approaches for video understanding are data-hungry and time-consuming, failing to capture the gist of spatial-temporal evolution in an efficient manner. Recent research shows that CNNs can reason about static relations between entities in images. To further exploit this capacity for dynamic evolution reasoning, we introduce a novel network module called the DenseImage Network (DIN) with two main contributions. 1) A novel compact representation of video which distills its significant spatial-temporal evolution into a matrix called a DenseImage, primed for efficient video encoding. 2) A simple yet powerful learning strategy for video understanding, based on the DenseImage and a temporal-order-preserving CNN, which contains a local temporal correlation constraint capturing temporal evolution at multiple time scales with different filter widths. Extensive experiments on two recent challenging benchmarks demonstrate that the DenseImage Network can accurately capture the common spatial-temporal evolution between similar actions, even with enormous visual variations or different time scales. Moreover, we obtain state-of-the-art results in action and gesture recognition with much lower time and memory cost, indicating its immense potential for video representation and understanding. |
Tasks | Gesture Recognition, Video Understanding |
Published | 2018-05-19 |
URL | http://arxiv.org/abs/1805.07550v1 |
http://arxiv.org/pdf/1805.07550v1.pdf | |
PWC | https://paperswithcode.com/paper/denseimage-network-video-spatial-temporal |
Repo | https://github.com/yliu1021/HandGestureClassifier |
Framework | tf |
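A rough sketch of one reading of the DenseImage idea: per-frame feature vectors form the columns of a matrix, and temporal-order-preserving convolutions with several filter widths capture evolution at multiple time scales. The feature extractor, filter counts, and widths are assumptions, not the authors' settings.

```python
import torch
import torch.nn as nn

class DenseImageNet(nn.Module):
    def __init__(self, feat_dim, n_classes, n_filters=128, widths=(3, 5, 7)):
        super().__init__()
        # Each filter covers the full feature dimension and a window of `w` frames.
        self.convs = nn.ModuleList(
            [nn.Conv2d(1, n_filters, kernel_size=(feat_dim, w)) for w in widths])
        self.fc = nn.Linear(n_filters * len(widths), n_classes)

    def forward(self, frame_feats):
        # frame_feats: (B, feat_dim, T) -- one column per frame (the "DenseImage").
        x = frame_feats.unsqueeze(1)                     # (B, 1, feat_dim, T)
        pooled = [torch.relu(c(x)).squeeze(2).max(dim=2).values for c in self.convs]
        return self.fc(torch.cat(pooled, dim=1))
```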
Future semantic segmentation of time-lapsed videos with large temporal displacement
Title | Future semantic segmentation of time-lapsed videos with large temporal displacement |
Authors | Talha Siddiqui, Samarth Bharadwaj |
Abstract | An important aspect of video understanding is the ability to predict the evolution of its content in the future. This paper presents a future frame semantic segmentation technique for predicting semantic masks of the current and future frames in a time-lapsed video. We specifically focus on time-lapsed videos with large temporal displacement to highlight the model’s ability to capture large motions in time. We first introduce a unique semantic segmentation prediction dataset with over 120,000 time-lapsed sky-video frames and all corresponding semantic masks, captured over a span of five years in the North America region. The dataset has immense practical value for cloud cover analysis, where clouds are treated as non-rigid objects of interest. Next, our proposed recurrent network architecture departs from the existing trend of using temporal convolutional networks (TCNs) (or feed-forward networks) by explicitly learning an internal representation for the evolution of video content over time. Experimental evaluation shows an improvement in mean IoU over TCNs on the segmentation task of 10.8% for predictions 10 minutes ahead (21% for 60 minutes ahead). Further, our model simultaneously measures both the current and future solar irradiance from the same video frames with a normalized MAE of 10.5% over two years. These results indicate that recurrent memory networks with an attention mechanism are able to capture the complex advective and diffusive flow characteristics of dense fluids even with sparse temporal sampling, and are more suitable for future frame prediction tasks on longer-duration videos. |
Tasks | Semantic Segmentation, Video Understanding |
Published | 2018-12-27 |
URL | http://arxiv.org/abs/1812.10786v1 |
http://arxiv.org/pdf/1812.10786v1.pdf | |
PWC | https://paperswithcode.com/paper/future-semantic-segmentation-of-time-lapsed |
Repo | https://github.com/samarth-b/skycamIrr |
Framework | none |
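A minimal sketch of a recurrent future-segmentation predictor in the spirit described above, built from a convolutional LSTM cell; the paper's attention mechanism, irradiance head, and exact architecture are omitted, and the layer sizes are placeholders.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.hid_ch = hid_ch
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

class FuturePredictor(nn.Module):
    def __init__(self, in_ch=3, hid_ch=32, n_classes=2):
        super().__init__()
        self.encoder = nn.Conv2d(in_ch, hid_ch, 3, padding=1)
        self.cell = ConvLSTMCell(hid_ch, hid_ch)
        self.decoder = nn.Conv2d(hid_ch, n_classes, 1)

    def forward(self, frames):
        # frames: (B, T, C, H, W); returns per-pixel class logits for a future frame.
        B, T, C, H, W = frames.shape
        h = frames.new_zeros(B, self.cell.hid_ch, H, W)
        c = torch.zeros_like(h)
        for t in range(T):
            h, c = self.cell(torch.relu(self.encoder(frames[:, t])), (h, c))
        return self.decoder(h)
```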
Learning-based Video Motion Magnification
Title | Learning-based Video Motion Magnification |
Authors | Tae-Hyun Oh, Ronnachai Jaroensri, Changil Kim, Mohamed Elgharib, Frédo Durand, William T. Freeman, Wojciech Matusik |
Abstract | Video motion magnification techniques allow us to see small motions previously invisible to the naked eye, such as those of vibrating airplane wings or buildings swaying under the influence of the wind. Because the motion is small, the magnification results are prone to noise or excessive blurring. The state of the art relies on hand-designed filters to extract representations that may not be optimal. In this paper, we seek to learn the filters directly from examples using deep convolutional neural networks. To make training tractable, we carefully design a synthetic dataset that captures small motion well, and use two-frame input for training. We show that the learned filters achieve high-quality results on real videos, with fewer ringing artifacts and better noise characteristics than previous methods. While our model is not trained with temporal filters, we found that temporal filters can be used with our extracted representations up to a moderate magnification, enabling frequency-based motion selection. Finally, we analyze the learned filters and show that they behave similarly to the derivative filters used in previous works. Our code, trained model, and datasets will be available online. |
Tasks | |
Published | 2018-04-08 |
URL | http://arxiv.org/abs/1804.02684v3 |
http://arxiv.org/pdf/1804.02684v3.pdf | |
PWC | https://paperswithcode.com/paper/learning-based-video-motion-magnification |
Repo | https://github.com/12dmodel/deep_motion_mag |
Framework | tf |
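A minimal two-frame sketch of the magnification pipeline the abstract describes: an encoder separates texture and shape (motion) representations, the shape difference between the frames is scaled by the magnification factor, and a decoder reconstructs the frame. The layer sizes are placeholders, not the published architecture.

```python
import torch
import torch.nn as nn

class MagnifierNet(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.texture_enc = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(),
                                         nn.Conv2d(ch, ch, 3, padding=1))
        self.shape_enc = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(),
                                       nn.Conv2d(ch, ch, 3, padding=1))
        self.decoder = nn.Sequential(nn.Conv2d(2 * ch, ch, 3, padding=1), nn.ReLU(),
                                     nn.Conv2d(ch, 3, 3, padding=1))

    def forward(self, frame_a, frame_b, alpha):
        shape_a, shape_b = self.shape_enc(frame_a), self.shape_enc(frame_b)
        texture = self.texture_enc(frame_b)
        # Magnify the motion (shape) difference between the two frames by alpha.
        magnified = shape_a + alpha * (shape_b - shape_a)
        return self.decoder(torch.cat([texture, magnified], dim=1))
```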
Focal Visual-Text Attention for Visual Question Answering
Title | Focal Visual-Text Attention for Visual Question Answering |
Authors | Junwei Liang, Lu Jiang, Liangliang Cao, Li-Jia Li, Alexander Hauptmann |
Abstract | Recent insights on language and vision with neural networks have been successfully applied to simple single-image visual question answering. However, to tackle real-life question answering problems on multimedia collections such as personal photos, we have to look at whole collections with sequences of photos or videos. When answering questions from a large collection, a natural problem is to identify snippets to support the answer. In this paper, we describe a novel neural network called the Focal Visual-Text Attention network (FVTA) for collective reasoning in visual question answering, where both visual and textual sequence information, such as images and text metadata, is present. FVTA introduces an end-to-end approach that uses a hierarchical process to dynamically determine what media and what time to focus on in the sequential data to answer the question. FVTA not only answers the questions well but also provides the justifications on which its answers are based. FVTA achieves state-of-the-art performance on the MemexQA dataset and competitive results on the MovieQA dataset. |
Tasks | Memex Question Answering, Question Answering, Visual Question Answering |
Published | 2018-06-05 |
URL | https://arxiv.org/abs/1806.01873v2 |
https://arxiv.org/pdf/1806.01873v2.pdf | |
PWC | https://paperswithcode.com/paper/focal-visual-text-attention-for-visual |
Repo | https://github.com/JunweiLiang/FVTA_memoryqa |
Framework | tf |
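A schematic two-level attention in the spirit of FVTA: attend over time within each sequence (photos, text metadata) conditioned on the question, then attend across the sequences themselves. The scoring functions and pooling are assumptions, not the authors' exact kernel.

```python
import torch
import torch.nn.functional as F

def focal_attention(question, sequences):
    # question: (B, D); sequences: list of (B, T_i, D) tensors, one per modality.
    pooled, seq_scores = [], []
    for seq in sequences:
        scores = torch.einsum("bd,btd->bt", question, seq) / seq.size(-1) ** 0.5
        alpha = F.softmax(scores, dim=1)                   # within-sequence (temporal) attention
        ctx = torch.einsum("bt,btd->bd", alpha, seq)
        pooled.append(ctx)
        seq_scores.append((question * ctx).sum(-1, keepdim=True))
    beta = F.softmax(torch.cat(seq_scores, dim=1), dim=1)  # cross-sequence attention
    stacked = torch.stack(pooled, dim=1)                   # (B, M, D)
    return torch.einsum("bm,bmd->bd", beta, stacked)
```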
DP-GAN: Diversity-Promoting Generative Adversarial Network for Generating Informative and Diversified Text
Title | DP-GAN: Diversity-Promoting Generative Adversarial Network for Generating Informative and Diversified Text |
Authors | Jingjing Xu, Xuancheng Ren, Junyang Lin, Xu Sun |
Abstract | Existing text generation methods tend to produce repeated and “boring” expressions. To tackle this problem, we propose a new text generation model, called the Diversity-Promoting Generative Adversarial Network (DP-GAN). The proposed model assigns low reward to repeatedly generated text and high reward to “novel” and fluent text, encouraging the generator to produce diverse and informative text. Moreover, we propose a novel language-model-based discriminator, which can better distinguish novel text from repeated text without the saturation problem of existing classifier-based discriminators. Experimental results on review generation and dialogue generation tasks demonstrate that our model can generate substantially more diverse and informative text than existing baselines. The code is available at https://github.com/lancopku/DPGAN |
Tasks | Dialogue Generation, Language Modelling, Text Generation |
Published | 2018-02-05 |
URL | http://arxiv.org/abs/1802.01345v3 |
http://arxiv.org/pdf/1802.01345v3.pdf | |
PWC | https://paperswithcode.com/paper/dp-gan-diversity-promoting-generative |
Repo | https://github.com/AIJoris/DPAC-DialogueGAN |
Framework | pytorch |
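A sketch of the reward signal described above, assuming a language-model discriminator whose per-token cross-entropy serves as a novelty score, fed into a REINFORCE-style generator loss; the clipping value and baseline are illustrative choices, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def lm_reward(discriminator_lm, tokens):
    # tokens: (B, T) generated token ids; the discriminator LM predicts each next token.
    logits = discriminator_lm(tokens[:, :-1])              # assumed output: (B, T-1, V)
    nll = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                          tokens[:, 1:].reshape(-1), reduction="none")
    reward = nll.view(tokens.size(0), -1).mean(dim=1)      # surprising text => high reward
    return reward.clamp(max=10.0)                          # avoid rewarding pure gibberish

def policy_gradient_loss(generator_log_probs, reward, baseline=None):
    # REINFORCE: scale the generator's log-probabilities by the (centred) reward.
    advantage = reward - (baseline if baseline is not None else reward.mean())
    return -(advantage.detach().unsqueeze(1) * generator_log_probs).mean()
```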
k-Space Deep Learning for Accelerated MRI
Title | k-Space Deep Learning for Accelerated MRI |
Authors | Yoseob Han, Leonard Sunwoo, Jong Chul Ye |
Abstract | The annihilating filter-based low-rank Hankel matrix approach (ALOHA) is one of the state-of-the-art compressed sensing approaches that directly interpolate the missing k-space data using low-rank Hankel matrix completion. The success of ALOHA is due to the concise signal representation in the k-space domain, thanks to the duality between structured low-rankness in the k-space domain and sparsity in the image domain. Inspired by the recent mathematical discovery that links convolutional neural networks to Hankel matrix decomposition using a data-driven framelet basis, here we propose a fully data-driven deep learning algorithm for k-space interpolation. Our network can also be easily applied to non-Cartesian k-space trajectories by simply adding an additional regridding layer. Extensive numerical experiments show that the proposed deep learning method consistently outperforms existing image-domain deep learning approaches. |
Tasks | Matrix Completion |
Published | 2018-05-10 |
URL | https://arxiv.org/abs/1805.03779v3 |
https://arxiv.org/pdf/1805.03779v3.pdf | |
PWC | https://paperswithcode.com/paper/k-space-deep-learning-for-accelerated-mri |
Repo | https://github.com/hanyoseob/k-space-deep-learning |
Framework | none |
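A conceptual sketch of CNN-based k-space interpolation: the zero-filled complex k-space enters as real/imaginary channels, the network predicts the missing samples, measured samples are re-inserted, and an inverse FFT returns the image. The tiny network and data-consistency step are assumptions standing in for the paper's architecture.

```python
import torch
import torch.nn as nn

class KSpaceNet(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(2, ch, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(ch, 2, 3, padding=1))

    def forward(self, kspace, mask):
        # kspace: complex (B, H, W), zero-filled; mask: (B, H, W), 1 at sampled locations.
        x = torch.stack([kspace.real, kspace.imag], dim=1)     # (B, 2, H, W)
        out = self.net(x)
        filled = torch.complex(out[:, 0], out[:, 1])
        filled = mask * kspace + (1 - mask) * filled           # keep the measured samples
        return torch.fft.ifft2(filled).abs()                   # back to the image domain
```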
Neural Network Encapsulation
Title | Neural Network Encapsulation |
Authors | Hongyang Li, Xiaoyang Guo, Bo Dai, Wanli Ouyang, Xiaogang Wang |
Abstract | A capsule is a collection of neurons that represents different variants of a pattern in the network. The routing scheme ensures that only those capsules in the higher layer which resemble their lower-layer counterparts are activated. However, the computational complexity becomes a bottleneck for scaling up to larger networks, as lower capsules need to correspond to each and every higher capsule. To resolve this limitation, we approximate the routing process with two branches: a master branch which collects primary information from its direct contact in the lower layer, and an aide branch that replenishes the master based on pattern variants encoded in other lower capsules. Compared with the previous iterative and unsupervised routing scheme, these two branches communicate in a fast, supervised, single-pass fashion. The complexity and runtime of the model are therefore decreased by a large margin. Motivated by routing that makes higher capsules agree with lower capsules, we extend the mechanism to compensate for the rapid loss of information in nearby layers. We devise a feedback agreement unit that sends higher capsules back as feedback, which can be regarded as an additional regularization on the network. The feedback agreement is achieved by comparing the optimal transport divergence between two distributions (lower and higher capsules). Such an add-on yields consistent gains in both capsule and vanilla networks. Our proposed EncapNet performs favorably against previous state-of-the-art methods on CIFAR-10/100, SVHN and a subset of ImageNet. |
Tasks | |
Published | 2018-08-11 |
URL | http://arxiv.org/abs/1808.03749v1 |
http://arxiv.org/pdf/1808.03749v1.pdf | |
PWC | https://paperswithcode.com/paper/neural-network-encapsulation |
Repo | https://github.com/hli2020/nn_encapsulation |
Framework | pytorch |
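A rough sketch of the two-branch approximation to routing: a master branch maps each lower capsule to its direct higher counterpart, while an aide branch replenishes it from the remaining lower capsules in a single supervised pass. Capsule dimensions and the mixing rule are assumptions; the feedback agreement unit is omitted.

```python
import torch
import torch.nn as nn

class ApproxRouting(nn.Module):
    def __init__(self, n_lower, d_lower, n_higher, d_higher):
        super().__init__()
        self.master = nn.Linear(d_lower, d_higher)               # one-to-one "direct contact"
        self.aide = nn.Linear(n_lower * d_lower, n_higher * d_higher)
        self.n_higher, self.d_higher = n_higher, d_higher

    def forward(self, lower):
        # lower: (B, n_lower, d_lower); assumes n_lower == n_higher for the master path.
        master = self.master(lower)
        aide = self.aide(lower.flatten(1)).view(-1, self.n_higher, self.d_higher)
        return master + aide                                     # single pass, no routing iterations
```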
Learning to Exploit the Prior Network Knowledge for Weakly-Supervised Semantic Segmentation
Title | Learning to Exploit the Prior Network Knowledge for Weakly-Supervised Semantic Segmentation |
Authors | Carolina Redondo-Cabrera, Marcos Baptista-Ríos, Roberto J. López-Sastre |
Abstract | Training a Convolutional Neural Network (CNN) for semantic segmentation typically requires collecting a large amount of accurate pixel-level annotations, a hard and expensive task. In contrast, simple image tags are easier to gather. In this paper, we introduce a novel weakly-supervised semantic segmentation model able to learn from image labels, and just image labels. Our model uses the prior knowledge of a network trained for image recognition, employing these image annotations as an attention mechanism to identify semantic regions in the images. We then present a methodology that builds accurate class-specific segmentation masks from these regions, requiring neither external objectness nor saliency algorithms. We describe how to incorporate this mask generation strategy into a fully end-to-end trainable process where the network jointly learns to classify and segment images. Our experiments on the PASCAL VOC 2012 dataset show that exploiting these generated class-specific masks in conjunction with our novel end-to-end learning process outperforms several recent weakly-supervised semantic segmentation methods that use image tags only, and even some models that leverage additional supervision or training data. |
Tasks | Semantic Segmentation, Weakly-Supervised Semantic Segmentation |
Published | 2018-04-13 |
URL | http://arxiv.org/abs/1804.04882v2 |
http://arxiv.org/pdf/1804.04882v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-exploit-the-prior-network |
Repo | https://github.com/gramuah/weakly-supervised-segmentation |
Framework | none |
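The attention mechanism described above can be illustrated with a generic class-activation-map computation: the classifier's weights re-weight the final convolutional features to localize each image-level label, and thresholding yields rough class-specific masks. This is a CAM-style sketch, not the authors' full mask-generation pipeline; the threshold is an assumption.

```python
import torch
import torch.nn.functional as F

def class_activation_masks(features, classifier_weight, image_labels, threshold=0.3):
    # features: (B, C, h, w) from the last conv layer; classifier_weight: (n_classes, C);
    # image_labels: (B, n_classes) multi-hot image-level tags.
    cams = torch.einsum("kc,bchw->bkhw", classifier_weight, features)
    cams = F.relu(cams) * image_labels[:, :, None, None]        # keep only present classes
    cams = cams / (cams.amax(dim=(2, 3), keepdim=True) + 1e-6)  # normalize each map to [0, 1]
    return (cams > threshold).float()                           # rough class-specific masks
```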
Color Constancy by Reweighting Image Feature Maps
Title | Color Constancy by Reweighting Image Feature Maps |
Authors | Jueqin Qiu, Haisong Xu, Zhengnan Ye |
Abstract | In this study, a novel illuminant color estimation framework is proposed for color constancy, which combines the high representational capacity of deep-learning-based models with the strong interpretability of assumption-based models. The well-designed building block, the feature map reweight unit (ReWU), helps achieve accuracy comparable to prior state-of-the-art models on benchmark datasets while requiring only 1%-5% of the model size and 8%-20% of the computational cost. In addition to local color estimation, a confidence estimation branch is also included, so that the model is able to produce a point estimate and its uncertainty estimate simultaneously, which provides useful clues for local estimate aggregation and multiple illumination estimation. The source code and the dataset are available at https://github.com/QiuJueqin/Reweight-CC. |
Tasks | Color Constancy |
Published | 2018-06-25 |
URL | http://arxiv.org/abs/1806.09248v2 |
http://arxiv.org/pdf/1806.09248v2.pdf | |
PWC | https://paperswithcode.com/paper/color-constancy-by-reweighting-image-feature |
Repo | https://github.com/QiuJueqin/Reweight-CC |
Framework | none |
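A heavily simplified sketch of reweighting feature maps for illuminant estimation: a gating branch produces per-pixel weights, the weighted features are pooled globally, and a linear head regresses a unit-norm RGB illuminant. The actual ReWU block and confidence branch in the paper differ in their details; everything here is an assumption.

```python
import torch
import torch.nn as nn

class ReweightEstimator(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.features = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.ReLU())
        self.gate = nn.Sequential(nn.Conv2d(ch, 1, 1), nn.Sigmoid())   # spatial reweighting
        self.head = nn.Linear(ch, 3)                                   # RGB illuminant

    def forward(self, img):
        f = self.features(img)
        w = self.gate(f)
        pooled = (f * w).sum(dim=(2, 3)) / (w.sum(dim=(2, 3)) + 1e-6)  # weighted average pool
        illuminant = self.head(pooled)
        return illuminant / illuminant.norm(dim=1, keepdim=True)       # unit-norm estimate
```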
To learn image super-resolution, use a GAN to learn how to do image degradation first
Title | To learn image super-resolution, use a GAN to learn how to do image degradation first |
Authors | Adrian Bulat, Jing Yang, Georgios Tzimiropoulos |
Abstract | This paper is on image and face super-resolution. The vast majority of prior work on this problem focuses on how to increase the resolution of low-resolution images which are artificially generated by simple bilinear down-sampling (or, in a few cases, by blurring followed by down-sampling). We show that such methods fail to produce good results when applied to real-world low-resolution, low-quality images. To circumvent this problem, we propose a two-stage process which first trains a High-to-Low Generative Adversarial Network (GAN) to learn how to degrade and downsample high-resolution images, requiring, during training, only unpaired high- and low-resolution images. Once this is achieved, the output of this network is used to train a Low-to-High GAN for image super-resolution, this time using paired low- and high-resolution images. Our main result is that this network can now be used to effectively increase the quality of real-world low-resolution images. We have applied the proposed pipeline to the problem of face super-resolution, where we report a large improvement over baselines and prior work, although the proposed method is potentially applicable to other object categories. |
Tasks | Image Super-Resolution, Super-Resolution |
Published | 2018-07-30 |
URL | http://arxiv.org/abs/1807.11458v1 |
http://arxiv.org/pdf/1807.11458v1.pdf | |
PWC | https://paperswithcode.com/paper/to-learn-image-super-resolution-use-a-gan-to |
Repo | https://github.com/jingyang2017/Face-and-Image-super-resolution |
Framework | pytorch |
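An outline of the two-stage training described above, under assumed GAN losses and optimizers: stage one learns to degrade HR images into realistic LR images from unpaired data; stage two trains the SR network on the resulting pairs.

```python
import torch
import torch.nn.functional as F

def train_high_to_low(gen_h2l, disc_low, high_res, real_low, opt_g, opt_d):
    # Stage 1: degrade HR images so they resemble real-world LR images (unpaired sets).
    fake_low = gen_h2l(high_res)
    real_score, fake_score = disc_low(real_low), disc_low(fake_low.detach())
    d_loss = (F.binary_cross_entropy_with_logits(real_score, torch.ones_like(real_score))
              + F.binary_cross_entropy_with_logits(fake_score, torch.zeros_like(fake_score)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    fake_score = disc_low(fake_low)
    g_loss = F.binary_cross_entropy_with_logits(fake_score, torch.ones_like(fake_score))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return fake_low.detach()                 # paired (fake_low, high_res) data for stage 2

def train_low_to_high(gen_l2h, fake_low, high_res, opt_sr):
    # Stage 2: supervised SR on the generated pairs (the paper also adds a GAN term here).
    sr = gen_l2h(fake_low)
    loss = F.l1_loss(sr, high_res)
    opt_sr.zero_grad(); loss.backward(); opt_sr.step()
    return loss.item()
```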