February 1, 2020


Paper Group AWR 104

Gated-GAN: Adversarial Gated Networks for Multi-Collection Style Transfer. Attention-aware Multi-stroke Style Transfer. Polyglot Contextual Representations Improve Crosslingual Transfer. Billion-scale semi-supervised learning for image classification. HELP: A Dataset for Identifying Shortcomings of Neural Models in Monotonicity Reasoning. Recommend …

Gated-GAN: Adversarial Gated Networks for Multi-Collection Style Transfer


Title	Gated-GAN: Adversarial Gated Networks for Multi-Collection Style Transfer
Authors	Xinyuan Chen, Chang Xu, Xiaokang Yang, Li Song, Dacheng Tao
Abstract	Style transfer describes the rendering of an image's semantic content in different artistic styles. Recently, generative adversarial networks (GANs) have emerged as an effective approach to style transfer by adversarially training the generator to synthesize convincing counterfeits. However, traditional GANs suffer from the mode collapse issue, resulting in unstable training and making style transfer quality difficult to guarantee. In addition, the GAN generator is only compatible with one style, so a series of GANs must be trained to provide users with choices to transfer more than one kind of style. In this paper, we focus on tackling these challenges and limitations to improve style transfer. We propose adversarial gated networks (Gated GAN) to transfer multiple styles in a single model. The generative networks have three modules: an encoder, a gated transformer, and a decoder. Different styles can be achieved by passing input images through different branches of the gated transformer. To stabilize training, the encoder and decoder are combined as an autoencoder to reconstruct the input images. The discriminative networks are used to distinguish whether the input image is a stylized or genuine image. An auxiliary classifier is used to recognize the style categories of transferred images, thereby helping the generative networks generate images in multiple styles. In addition, Gated GAN makes it possible to explore a new style by investigating styles learned from artists or genres. Our extensive experiments demonstrate the stability and effectiveness of the proposed model for multistyle transfer.
Tasks	Style Transfer
Published	2019-04-04
URL	http://arxiv.org/abs/1904.02296v1
PDF	http://arxiv.org/pdf/1904.02296v1.pdf
PWC	https://paperswithcode.com/paper/gated-gan-adversarial-gated-networks-for
Repo	https://github.com/colemiller94/gatedgan
Framework	pytorch
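
The encoder, gated transformer, and decoder described in the abstract can be pictured as a simple routing pattern. The following is a minimal sketch of that structure only, not the authors' implementation: the class `GatedGenerator`, its method names, the style keys, and the toy callables standing in for the three modules are all hypothetical.

```python
from typing import Callable, Dict, List

Features = List[float]


class GatedGenerator:
    """Hypothetical sketch: encoder -> one style branch -> decoder."""

    def __init__(
        self,
        encoder: Callable[[Features], Features],
        branches: Dict[str, Callable[[Features], Features]],
        decoder: Callable[[Features], Features],
    ) -> None:
        self.encoder = encoder
        self.branches = branches
        self.decoder = decoder

    def stylize(self, image: Features, style: str) -> Features:
        # The gate selects exactly one transformer branch for the requested style.
        branch = self.branches[style]
        return self.decoder(branch(self.encoder(image)))

    def reconstruct(self, image: Features) -> Features:
        # Autoencoder path: encoder and decoder only, no style branch.
        return self.decoder(self.encoder(image))


# Toy stand-ins (assumptions) so the sketch runs without a deep learning library.
generator = GatedGenerator(
    encoder=lambda img: [2.0 * v for v in img],
    branches={
        "style_a": lambda feats: [v + 1.0 for v in feats],
        "style_b": lambda feats: [-v for v in feats],
    },
    decoder=lambda feats: [v / 2.0 for v in feats],
)
```

In a real model the branches would be learned sub-networks, so adding a style would mean adding one branch while the encoder and decoder stay shared.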

Attention-aware Multi-stroke Style Transfer


Title	Attention-aware Multi-stroke Style Transfer
Authors	Yuan Yao, Jianqiang Ren, Xuansong Xie, Weidong Liu, Yong-Jin Liu, Jun Wang
Abstract	Neural style transfer has drawn considerable attention from both academia and industry. Although visual effect and efficiency have been significantly improved, existing methods are unable to coordinate the spatial distribution of visual attention between the content image and the stylized image, or to render diverse levels of detail via different brush strokes. In this paper, we tackle these limitations by developing an attention-aware multi-stroke style transfer model. We first propose to incorporate a self-attention mechanism into a style-agnostic reconstruction autoencoder framework, from which the attention map of a content image can be derived. By performing multi-scale style swap on content features and style features, we produce multiple feature maps reflecting different stroke patterns. A flexible fusion strategy is further presented to incorporate the salient characteristics from the attention map, which allows integrating multiple stroke patterns into different spatial regions of the output image harmoniously. We demonstrate the effectiveness of our method and generate stylized images with multiple stroke patterns that are comparable to those of state-of-the-art methods.
Tasks	Style Transfer
Published	2019-01-16
URL	http://arxiv.org/abs/1901.05127v1
PDF	http://arxiv.org/pdf/1901.05127v1.pdf
PWC	https://paperswithcode.com/paper/attention-aware-multi-stroke-style-transfer
Repo	https://github.com/JianqiangRen/AAMS
Framework	tf

Polyglot Contextual Representations Improve Crosslingual Transfer


Title	Polyglot Contextual Representations Improve Crosslingual Transfer
Authors	Phoebe Mulcaire, Jungo Kasai, Noah A. Smith
Abstract	We introduce Rosita, a method to produce multilingual contextual word representations by training a single language model on text from multiple languages. Our method combines the advantages of contextual word representations with those of multilingual representation learning. We produce language models from dissimilar language pairs (English/Arabic and English/Chinese) and use them in dependency parsing, semantic role labeling, and named entity recognition, with comparisons to monolingual and non-contextual variants. Our results provide further evidence for the benefits of polyglot learning, in which representations are shared across multiple languages.
Tasks	Dependency Parsing, Language Modelling, Named Entity Recognition, Representation Learning, Semantic Role Labeling
Published	2019-02-26
URL	http://arxiv.org/abs/1902.09697v2
PDF	http://arxiv.org/pdf/1902.09697v2.pdf
PWC	https://paperswithcode.com/paper/polyglot-contextual-representations-improve
Repo	https://github.com/pmulcaire/rosita
Framework	pytorch

Billion-scale semi-supervised learning for image classification


Title	Billion-scale semi-supervised learning for image classification
Authors	I. Zeki Yalniz, Hervé Jégou, Kan Chen, Manohar Paluri, Dhruv Mahajan
Abstract	This paper presents a study of semi-supervised learning with large convolutional networks. We propose a pipeline, based on a teacher/student paradigm, that leverages a large collection of unlabelled images (up to 1 billion). Our main goal is to improve the performance for a given target architecture, like ResNet-50 or ResNext. We provide an extensive analysis of the success factors of our approach, which leads us to formulate some recommendations to produce high-accuracy models for image classification with semi-supervised learning. As a result, our approach brings important gains to standard architectures for image, video and fine-grained classification. For instance, by leveraging one billion unlabelled images, our learned vanilla ResNet-50 achieves 81.2% top-1 accuracy on the ImageNet benchmark.
Tasks	Image Classification, Video Classification
Published	2019-05-02
URL	http://arxiv.org/abs/1905.00546v1
PDF	http://arxiv.org/pdf/1905.00546v1.pdf
PWC	https://paperswithcode.com/paper/billion-scale-semi-supervised-learning-for
Repo	https://github.com/leaderj1001/Billion-scale-semi-supervised-learning
Framework	pytorch
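
One way to picture the teacher/student pipeline is the step where the teacher's predictions on unlabelled images are turned into a pseudo-labelled training set for the student. The sketch below assumes a top-k-per-class selection rule; the function name `select_pseudo_labels`, the input layout, and the choice of using only each image's highest-scoring class are illustrative assumptions rather than details stated in the abstract.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def select_pseudo_labels(
    teacher_scores: Dict[Hashable, Sequence[float]], k: int
) -> Dict[int, List[Hashable]]:
    """Keep, for each class, the k unlabelled images the teacher scores highest.

    teacher_scores maps an image id to the teacher's class probabilities.
    Assumption: each image is considered only for its top-scoring class.
    """
    per_class = defaultdict(list)
    for image_id, probs in teacher_scores.items():
        best_class = max(range(len(probs)), key=probs.__getitem__)
        per_class[best_class].append((probs[best_class], image_id))

    selected = {}
    for cls, scored in per_class.items():
        # Rank images within the class by teacher confidence, highest first.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        selected[cls] = [image_id for _, image_id in scored[:k]]
    return selected


scores = {"a": [0.9, 0.1], "b": [0.6, 0.4], "c": [0.2, 0.8], "d": [0.7, 0.3]}
pseudo = select_pseudo_labels(scores, k=2)
```

The student would then be trained on the selected images with their class indices as labels and fine-tuned on the original labelled data; that training loop is omitted here.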

HELP: A Dataset for Identifying Shortcomings of Neural Models in Monotonicity Reasoning


Title	HELP: A Dataset for Identifying Shortcomings of Neural Models in Monotonicity Reasoning
Authors	Hitomi Yanaka, Koji Mineshima, Daisuke Bekki, Kentaro Inui, Satoshi Sekine, Lasha Abzianidze, Johan Bos
Abstract	Large crowdsourced datasets are widely used for training and evaluating neural models on natural language inference (NLI). Despite these efforts, neural models have a hard time capturing logical inferences, including those licensed by phrase replacements, so-called monotonicity reasoning. Since no large dataset has been developed for monotonicity reasoning, it is still unclear whether the main obstacle is the size of datasets or the model architectures themselves. To investigate this issue, we introduce a new dataset, called HELP, for handling entailments with lexical and logical phenomena. We add it to training data for the state-of-the-art neural models and evaluate them on test sets for monotonicity phenomena. The results showed that our data augmentation improved the overall accuracy. We also find that the improvement is better on monotonicity inferences with lexical replacements than on downward inferences with disjunction and modification. This suggests that some types of inferences can be improved by our data augmentation while others are immune to it.
Tasks	Data Augmentation, Natural Language Inference
Published	2019-04-27
URL	http://arxiv.org/abs/1904.12166v1
PDF	http://arxiv.org/pdf/1904.12166v1.pdf
PWC	https://paperswithcode.com/paper/help-a-dataset-for-identifying-shortcomings
Repo	https://github.com/verypluming/HELP
Framework	none

Recommender Systems with Heterogeneous Side Information


Title	Recommender Systems with Heterogeneous Side Information
Authors	Tianqiao Liu, Zhiwei Wang, Jiliang Tang, Songfan Yang, Gale Yan Huang, Zitao Liu
Abstract	In modern recommender systems, both users and items are associated with rich side information, which can help understand users and items. Such information is typically heterogeneous and can be roughly categorized into flat and hierarchical side information. While side information has been proved to be valuable, the majority of existing systems have exploited either only flat side information or only hierarchical side information due to the challenges brought by the heterogeneity. In this paper, we investigate the problem of exploiting heterogeneous side information for recommendations. Specifically, we propose a novel framework that jointly captures flat and hierarchical side information with mathematical coherence. We demonstrate the effectiveness of the proposed framework via extensive experiments on various real-world datasets. Empirical results show that our approach is able to lead to a significant performance gain over the state-of-the-art methods.
Tasks	Recommendation Systems
Published	2019-07-18
URL	https://arxiv.org/abs/1907.08679v1
PDF	https://arxiv.org/pdf/1907.08679v1.pdf
PWC	https://paperswithcode.com/paper/recommender-systems-with-heterogeneous-side
Repo	https://github.com/tal-ai/Recommender-Systems-with-Heterogeneous-Side-Information
Framework	none

Audio-Visual Model Distillation Using Acoustic Images


Title	Audio-Visual Model Distillation Using Acoustic Images
Authors	Andrés F. Pérez, Valentina Sanguineti, Pietro Morerio, Vittorio Murino
Abstract	In this paper, we investigate how to learn rich and robust feature representations for audio classification from visual data and acoustic images, a novel audio data modality. Former models learn audio representations from raw signals or spectral data acquired by a single microphone, with remarkable results in classification and retrieval. However, such representations are not so robust towards variable environmental sound conditions. We tackle this drawback by exploiting a new multimodal labeled action recognition dataset acquired by a hybrid audio-visual sensor that provides RGB video, raw audio signals, and spatialized acoustic data, also known as acoustic images, where the visual and acoustic images are aligned in space and synchronized in time. Using this richer information, we train audio deep learning models in a teacher-student fashion. In particular, we distill knowledge into audio networks from both visual and acoustic image teachers. Our experiments suggest that the learned representations are more powerful and have better generalization capabilities than the features learned from models trained using just single-microphone audio data.
Tasks	Audio Classification, Temporal Action Localization
Published	2019-04-16
URL	https://arxiv.org/abs/1904.07933v2
PDF	https://arxiv.org/pdf/1904.07933v2.pdf
PWC	https://paperswithcode.com/paper/audio-visual-model-distillation-using
Repo	https://github.com/afperezm/acoustic-images-distillation
Framework	tf

MPC-Net: A First Principles Guided Policy Search


Title	MPC-Net: A First Principles Guided Policy Search
Authors	Jan Carius, Farbod Farshidian, Marco Hutter
Abstract	We present an Imitation Learning approach for the control of dynamical systems with a known model. Our policy search method is guided by solutions from MPC. Typical policy search methods of this kind minimize a distance metric between the guiding demonstrations and the learned policy. Our loss function, however, corresponds to the minimization of the control Hamiltonian, which derives from the principle of optimality. Therefore, our algorithm directly attempts to solve the optimality conditions with a parameterized class of control laws. Additionally, the proposed loss function explicitly encodes the constraints of the optimal control problem and we provide numerical evidence that its minimization achieves improved constraint satisfaction. We train a mixture-of-expert neural network architecture for controlling a quadrupedal robot and show that this policy structure is well suited for such multimodal systems. The learned policy can successfully stabilize different gaits on the real walking robot from less than 10 min of demonstration data.
Tasks	Imitation Learning
Published	2019-09-11
URL	https://arxiv.org/abs/1909.05197v2
PDF	https://arxiv.org/pdf/1909.05197v2.pdf
PWC	https://paperswithcode.com/paper/mpc-net-a-first-principles-guided-policy
Repo	https://github.com/leggedrobotics/MPC-Net
Framework	pytorch

Targeted sampling from massive Blockmodel graphs with personalized PageRank


Title	Targeted sampling from massive Blockmodel graphs with personalized PageRank
Authors	Fan Chen, Yini Zhang, Karl Rohe
Abstract	This paper provides statistical theory and intuition for Personalized PageRank (PPR), a popular technique that samples a small community from a massive network. We study a setting where the entire network is expensive to thoroughly obtain or maintain, but we can start from a seed node of interest and “crawl” the network to find other nodes through their connections. By crawling the graph in a designed way, the PPR vector can be approximated without querying the entire massive graph, making it an alternative to snowball sampling. Using the Degree-Corrected Stochastic Blockmodel, we study whether the PPR vector can select nodes that belong to the same block as the seed node. We provide a simple and interpretable form for the PPR vector, highlighting its biases towards high degree nodes outside of the target block. We examine a simple adjustment based on node degrees and establish consistency results for PPR clustering that allows for directed graphs. We illustrate the method with the Twitter friendship graph and find that (i) the adjusted and unadjusted PPR techniques are complementary approaches, where the adjustment makes the results particularly localized around the seed node and (ii) the bias adjustment greatly benefits from degree regularization.
Tasks
Published	2019-10-04
URL	https://arxiv.org/abs/1910.12937v1
PDF	https://arxiv.org/pdf/1910.12937v1.pdf
PWC	https://paperswithcode.com/paper/targeted-sampling-from-massive-blockmodel
Repo	https://github.com/RoheLab/aPPR
Framework	none
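
To make the PPR vector and the degree-based adjustment concrete, here is a minimal sketch on a small undirected graph. It uses plain power iteration over the whole graph rather than the crawling approximation the abstract describes, and it divides each PPR score by the node's degree without any regularization. The function names, the teleportation constant `alpha`, and the toy graph are assumptions made for illustration.

```python
from typing import Dict, Hashable, List

Graph = Dict[Hashable, List[Hashable]]


def personalized_pagerank(
    adjacency: Graph, seed: Hashable, alpha: float = 0.15, iterations: int = 100
) -> Dict[Hashable, float]:
    """Power iteration for PPR: teleport to the seed with probability alpha."""
    p = {v: 1.0 if v == seed else 0.0 for v in adjacency}
    for _ in range(iterations):
        nxt = {v: (alpha if v == seed else 0.0) for v in adjacency}
        for u, neighbours in adjacency.items():
            if not neighbours:
                # Dangling node: send its walking mass back to the seed.
                nxt[seed] += (1.0 - alpha) * p[u]
                continue
            share = (1.0 - alpha) * p[u] / len(neighbours)
            for v in neighbours:
                nxt[v] += share
        p = nxt
    return p


def degree_adjusted(ppr: Dict[Hashable, float], adjacency: Graph) -> Dict[Hashable, float]:
    """Divide each score by the node degree to offset the bias toward hubs."""
    return {v: ppr[v] / len(adjacency[v]) for v in ppr if adjacency[v]}


# Toy graph: a triangle around the seed "s", attached to a hub "h".
graph = {
    "s": ["a", "b"],
    "a": ["s", "b"],
    "b": ["s", "a", "h"],
    "h": ["b", "x1", "x2", "x3"],
    "x1": ["h"],
    "x2": ["h"],
    "x3": ["h"],
}
ppr = personalized_pagerank(graph, seed="s")
adjusted = degree_adjusted(ppr, graph)
```

Sorting nodes by `adjusted` instead of `ppr` gives the more localized ranking around the seed that the abstract attributes to the adjustment.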

Filter Response Normalization Layer: Eliminating Batch Dependence in the Training of Deep Neural Networks


Title	Filter Response Normalization Layer: Eliminating Batch Dependence in the Training of Deep Neural Networks
Authors	Saurabh Singh, Shankar Krishnan
Abstract	Batch Normalization (BN) uses mini-batch statistics to normalize the activations during training, introducing dependence between mini-batch elements. This dependency can hurt the performance if the mini-batch size is too small, or if the elements are correlated. Several alternatives, such as Batch Renormalization and Group Normalization (GN), have been proposed to address this issue. However, they either do not match the performance of BN for large batches, or still exhibit degradation in performance for smaller batches, or introduce artificial constraints on the model architecture. In this paper we propose the Filter Response Normalization (FRN) layer, a novel combination of a normalization and an activation function, that can be used as a replacement for other normalizations and activations. Our method operates on each activation channel of each batch element independently, eliminating the dependency on other batch elements. Our method outperforms BN and other alternatives in a variety of settings for all batch sizes. FRN layer performs $\approx 0.7-1.0\%$ better than BN on top-1 validation accuracy with large mini-batch sizes for Imagenet classification using InceptionV3 and ResnetV2-50 architectures. Further, it performs $>1\%$ better than GN on the same problem in the small mini-batch size regime. For object detection problem on COCO dataset, FRN layer outperforms all other methods by at least $0.3-0.5\%$ in all batch size regimes.
Tasks	Image Classification, Object Detection
Published	2019-11-21
URL	https://arxiv.org/abs/1911.09737v2
PDF	https://arxiv.org/pdf/1911.09737v2.pdf
PWC	https://paperswithcode.com/paper/filter-response-normalization-layer
Repo	https://github.com/CarloLepelaars/filter_response_normalization_keras
Framework	none
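
The per-channel, per-example behaviour described in the abstract can be sketched in a few lines. The version below normalizes each channel by the root of its mean squared activation and then applies a thresholded activation `max(y, tau)`; this follows the commonly cited FRN formulation, but the function names, the nested-list tensor layout, and the default parameter values are assumptions for illustration, not the authors' code.

```python
import math
from typing import List, Sequence

Grid = List[List[float]]  # one channel of one example: H x W


def frn_channel(
    x: Grid, gamma: float = 1.0, beta: float = 0.0, tau: float = 0.0, eps: float = 1e-6
) -> Grid:
    """Normalize a single channel of a single example, then threshold it."""
    values = [v for row in x for v in row]
    # nu2 is the mean squared activation over the spatial extent only.
    nu2 = sum(v * v for v in values) / len(values)
    scale = gamma / math.sqrt(nu2 + eps)
    # Thresholded activation: max(y, tau) with a per-channel threshold tau.
    return [[max(scale * v + beta, tau) for v in row] for row in x]


def frn_layer(
    batch: List[List[Grid]],
    gamma: Sequence[float],
    beta: Sequence[float],
    tau: Sequence[float],
    eps: float = 1e-6,
) -> List[List[Grid]]:
    """Apply frn_channel to batch[n][c]; no statistic crosses examples."""
    return [
        [frn_channel(ch, gamma[c], beta[c], tau[c], eps) for c, ch in enumerate(sample)]
        for sample in batch
    ]


single = frn_layer([[[[1.0, 2.0]]]], gamma=[1.0], beta=[0.0], tau=[0.0])
paired = frn_layer(
    [[[[1.0, 2.0]]], [[[100.0, -50.0]]]], gamma=[1.0], beta=[0.0], tau=[0.0]
)
```

Because `single[0]` and `paired[0]` are identical, the output for one example does not depend on what else is in the batch, which is the property the abstract emphasizes.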

Highly Parallelized Data-driven MPC for Minimal Intervention Shared Control


Title	Highly Parallelized Data-driven MPC for Minimal Intervention Shared Control
Authors	Alexander Broad, Todd Murphey, Brenna Argall
Abstract	We present a shared control paradigm that improves a user’s ability to operate complex, dynamic systems in potentially dangerous environments without a priori knowledge of the user’s objective. In this paradigm, the role of the autonomous partner is to improve the general safety of the system without constraining the user’s ability to achieve unspecified behaviors. Our approach relies on a data-driven, model-based representation of the joint human-machine system to evaluate, in parallel, a significant number of potential inputs that the user may wish to provide. These samples are used to (1) predict the safety of the system over a receding horizon, and (2) minimize the influence of the autonomous partner. The resulting shared control algorithm maximizes the authority allocated to the human partner to improve their sense of agency, while improving safety. We evaluate the efficacy of our shared control algorithm with a human subjects study (n=20) conducted in two simulated environments: a balance bot and a race car. During the experiment, users are free to operate each system however they would like (i.e., there is no specified task) and are only asked to try to avoid unsafe regions of the state space. Using modern computational resources (i.e., GPUs) our approach is able to consider more than 10,000 potential trajectories at each time step in a control loop running at 100Hz for the balance bot and 60Hz for the race car. The results of the study show that our shared control paradigm improves system safety without knowledge of the user’s goal, while maintaining high levels of user satisfaction and low levels of frustration. Our code is available online at https://github.com/asbroad/mpmi_shared_control.
Tasks
Published	2019-06-05
URL	https://arxiv.org/abs/1906.02318v1
PDF	https://arxiv.org/pdf/1906.02318v1.pdf
PWC	https://paperswithcode.com/paper/highly-parallelized-data-driven-mpc-for
Repo	https://github.com/asbroad/mpmi_shared_control
Framework	none
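
The two uses of the sampled inputs, predicting safety over a horizon and minimizing the autonomous partner's influence, can be illustrated with a toy selection rule: among sampled inputs whose predicted rollout stays safe, pick the one closest to what the user asked for. This is a sequential, scalar sketch under several assumptions (a constant input over the horizon, Gaussian sampling around the user's input, a hand-written model); the function `minimal_intervention` and its parameters are hypothetical and do not reproduce the authors' GPU implementation.

```python
import random
from typing import Callable, Optional


def minimal_intervention(
    state: float,
    user_input: float,
    step: Callable[[float, float], float],
    is_safe: Callable[[float], bool],
    horizon: int = 20,
    samples: int = 1000,
    spread: float = 1.0,
    seed: int = 0,
) -> Optional[float]:
    """Return the safe candidate input nearest to the user's input, if any."""
    rng = random.Random(seed)
    # The user's own input is always a candidate, so it wins whenever it is safe.
    candidates = [user_input] + [
        user_input + rng.gauss(0.0, spread) for _ in range(samples)
    ]
    safe_inputs = []
    for u in candidates:
        s = state
        for _ in range(horizon):
            s = step(s, u)  # roll the model forward under a constant input
            if not is_safe(s):
                break
        else:
            safe_inputs.append(u)
    if not safe_inputs:
        return None
    return min(safe_inputs, key=lambda u: abs(u - user_input))


# Toy system (assumption): position integrates the input; |position| <= 1 is safe.
def toy_step(s: float, u: float) -> float:
    return s + 0.1 * u


def toy_safe(s: float) -> bool:
    return abs(s) <= 1.0


kept = minimal_intervention(0.0, 0.2, toy_step, toy_safe)
corrected = minimal_intervention(0.0, 2.0, toy_step, toy_safe)
```

Here `kept` equals the user's input because it is already safe, while `corrected` is pulled back toward the safe range; each candidate rollout is independent, which is what makes the evaluation easy to parallelize.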

A Neural-Network-Based Model Predictive Control of Three-Phase Inverter With an Output LC Filter


Title	A Neural-Network-Based Model Predictive Control of Three-Phase Inverter With an Output LC Filter
Authors	Ihab S. Mohamed, Stefano Rovetta, Ton Duc Do, Tomislav Dragicevic, Ahmed A. Zaki Diab
Abstract	Model predictive control (MPC) has become one of the well-established modern control methods for three-phase inverters with an output LC filter, where a high-quality voltage with low total harmonic distortion (THD) is needed. Although it is an intuitive controller, easy to understand and implement, it has the significant disadvantage of requiring a large number of online calculations for solving the optimization problem. On the other hand, the application of model-free approaches such as those based on artificial neural networks is currently growing rapidly in the area of power electronics and drives. This paper presents a new control scheme for a two-level converter based on combining MPC and feed-forward ANN, with the aim of getting lower THD and improving the steady and dynamic performance of the system for different types of loads. First, MPC is used, as an expert, in the training phase to generate data required for training the proposed neural network. Then, once the neural network is fine-tuned, it can be successfully used online for voltage tracking, without the need of using MPC. The proposed ANN-based control strategy is validated through simulation, using MATLAB/Simulink tools, taking into account different load conditions. Moreover, the performance of the ANN-based controller is evaluated, on several samples of linear and non-linear loads under various operating conditions, and compared to that of MPC, demonstrating the excellent steady-state and dynamic performance of the proposed ANN-based control strategy.
Tasks
Published	2019-02-22
URL	https://arxiv.org/abs/1902.09964v3
PDF	https://arxiv.org/pdf/1902.09964v3.pdf
PWC	https://paperswithcode.com/paper/a-neural-network-based-model-predictive
Repo	https://github.com/IhabMohamed/ANN-MPC
Framework	none

Relational Knowledge Distillation


Title	Relational Knowledge Distillation
Authors	Wonpyo Park, Dongju Kim, Yan Lu, Minsu Cho
Abstract	Knowledge distillation aims at transferring knowledge acquired in one model (a teacher) to another model (a student) that is typically smaller. Previous approaches can be expressed as a form of training the student to mimic output activations of individual data examples represented by the teacher. We introduce a novel approach, dubbed relational knowledge distillation (RKD), that transfers mutual relations of data examples instead. For concrete realizations of RKD, we propose distance-wise and angle-wise distillation losses that penalize structural differences in relations. Experiments conducted on different tasks show that the proposed method improves educated student models by a significant margin. In particular for metric learning, it allows students to outperform their teachers’ performance, achieving state-of-the-art results on standard benchmark datasets.
Tasks	Metric Learning
Published	2019-04-10
URL	http://arxiv.org/abs/1904.05068v2
PDF	http://arxiv.org/pdf/1904.05068v2.pdf
PWC	https://paperswithcode.com/paper/relational-knowledge-distillation
Repo	https://github.com/lenscloth/RKD
Framework	pytorch
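
A distance-wise relational loss can be sketched as follows: compute all pairwise distances among a batch of teacher embeddings and among the matching student embeddings, rescale each set by its own mean, and penalize the differences. The sketch assumes a Huber penalty and a mean over pairs; the function names are hypothetical, and the angle-wise loss mentioned in the abstract is not covered.

```python
import math
from itertools import combinations
from typing import List, Sequence

Embedding = Sequence[float]


def normalized_pairwise_distances(embeddings: Sequence[Embedding]) -> List[float]:
    """Pairwise Euclidean distances divided by their mean.

    Needs at least two distinct points, otherwise the mean distance is zero.
    """
    dists = [math.dist(a, b) for a, b in combinations(embeddings, 2)]
    mean = sum(dists) / len(dists)
    return [d / mean for d in dists]


def huber(x: float, y: float) -> float:
    """Quadratic for small differences, linear for large ones."""
    diff = abs(x - y)
    return 0.5 * diff * diff if diff <= 1.0 else diff - 0.5


def rkd_distance_loss(
    teacher: Sequence[Embedding], student: Sequence[Embedding]
) -> float:
    """Penalize mismatch between the teacher's and student's distance structure."""
    t = normalized_pairwise_distances(teacher)
    s = normalized_pairwise_distances(student)
    return sum(huber(a, b) for a, b in zip(t, s)) / len(t)


teacher_emb = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
scaled_student = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
distorted_student = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
```

Because each set of distances is divided by its own mean, `rkd_distance_loss(teacher_emb, scaled_student)` is zero: the student may use a different scale, or even a different embedding dimension, as long as the relative geometry matches.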

Stabilizing Transformers for Reinforcement Learning


Title	Stabilizing Transformers for Reinforcement Learning
Authors	Emilio Parisotto, H. Francis Song, Jack W. Rae, Razvan Pascanu, Caglar Gulcehre, Siddhant M. Jayakumar, Max Jaderberg, Raphael Lopez Kaufman, Aidan Clark, Seb Noury, Matthew M. Botvinick, Nicolas Heess, Raia Hadsell
Abstract	Owing to their ability to both effectively integrate information over long time horizons and scale to massive amounts of data, self-attention architectures have recently shown breakthrough success in natural language processing (NLP), achieving state-of-the-art results in domains such as language modeling and machine translation. Harnessing the transformer’s ability to process long time horizons of information could provide a similar performance boost in partially observable reinforcement learning (RL) domains, but the large-scale transformers used in NLP have yet to be successfully applied to the RL setting. In this work we demonstrate that the standard transformer architecture is difficult to optimize, which was previously observed in the supervised learning setting but becomes especially pronounced with RL objectives. We propose architectural modifications that substantially improve the stability and learning speed of the original Transformer and XL variant. The proposed architecture, the Gated Transformer-XL (GTrXL), surpasses LSTMs on challenging memory environments and achieves state-of-the-art results on the multi-task DMLab-30 benchmark suite, exceeding the performance of an external memory architecture. We show that the GTrXL, trained using the same losses, has stability and performance that consistently matches or exceeds a competitive LSTM baseline, including on more reactive tasks where memory is less critical. GTrXL offers an easy-to-train, simple-to-implement but substantially more expressive architectural alternative to the standard multi-layer LSTM ubiquitously used for RL agents in partially observable environments.
Tasks	Language Modelling, Machine Translation
Published	2019-10-13
URL	https://arxiv.org/abs/1910.06764v1
PDF	https://arxiv.org/pdf/1910.06764v1.pdf
PWC	https://paperswithcode.com/paper/stabilizing-transformers-for-reinforcement-1
Repo	https://github.com/jdenalil/Gated-Transformer-XL
Framework	pytorch


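Hateful People or Hateful Bots? Detection and Characterization of Bots Spreading Religious Hatred in Arabic Social Media


Title	Hateful People or Hateful Bots? Detection and Characterization of Bots Spreading Religious Hatred in Arabic Social Media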
Authors	Nuha Albadi, Maram Kurdi, Shivakant Mishra
Abstract	Arabic Twitter space is crawling with bots that fuel political feuds, spread misinformation, and proliferate sectarian rhetoric. While efforts have long existed to analyze and detect English bots, Arabic bot detection and characterization remain largely understudied. In this work, we contribute new insights into the role of bots in spreading religious hatred on Arabic Twitter and introduce a novel regression model that can accurately identify Arabic language bots. Our assessment shows that existing tools that are highly accurate in detecting English bots don’t perform as well on Arabic bots. We identify the possible reasons for this poor performance, perform a thorough analysis of linguistic, content, behavioral and network features, and report on the most informative features that distinguish Arabic bots from humans as well as the differences between Arabic and English bots. Our results mark an important step toward understanding the behavior of malicious bots on Arabic Twitter and pave the way for more effective Arabic bot detection tools.
Tasks
Published	2019-08-01
URL	https://arxiv.org/abs/1908.00153v2
PDF	https://arxiv.org/pdf/1908.00153v2.pdf
PWC	https://paperswithcode.com/paper/hateful-people-or-hateful-bots-detection-and
Repo	https://github.com/nuhaalbadi/ArabicBots
Framework	none