February 1, 2020

Paper Group AWR 304

Federated Learning with Additional Mechanisms on Clients to Reduce Communication Costs. Only sparsity based loss function for learning representations. Towards Learning of Filter-Level Heterogeneous Compression of Convolutional Neural Networks. Learning Parallax Attention for Stereo Image Super-Resolution. adVAE: A self-adversarial variational auto …

Federated Learning with Additional Mechanisms on Clients to Reduce Communication Costs


Title	Federated Learning with Additional Mechanisms on Clients to Reduce Communication Costs
Authors	Xin Yao, Tianchi Huang, Chenglei Wu, Rui-Xiao Zhang, Lifeng Sun
Abstract	Federated learning (FL) enables on-device training over distributed networks consisting of a massive number of modern smart devices, such as smartphones and IoT (Internet of Things) devices. However, the leading optimization algorithm in such settings, i.e., federated averaging (FedAvg), suffers from heavy communication costs and an inevitable performance drop, especially when the local data is distributed in a non-IID way. To alleviate this problem, we propose two potential solutions by introducing additional mechanisms to the on-device training. The first (FedMMD) adopts a two-stream model with the MMD (Maximum Mean Discrepancy) constraint, instead of the single model in vanilla FedAvg, to be trained on devices. Experiments show that the proposed method outperforms baselines, especially in non-IID FL settings, with a reduction of more than 20% in required communication rounds. The second is FL with feature fusion (FedFusion). By aggregating the features from both the local and global models, we achieve higher accuracy at lower communication cost. Furthermore, the feature fusion modules offer better initialization for newly incoming clients and thus speed up the process of convergence. Experiments in popular FL scenarios show that our FedFusion outperforms baselines in both accuracy and generalization ability while reducing the number of required communication rounds by more than 60%.
Tasks
Published	2019-08-16
URL	https://arxiv.org/abs/1908.05891v2
PDF	https://arxiv.org/pdf/1908.05891v2.pdf
PWC	https://paperswithcode.com/paper/federated-learning-with-additional-mechanisms
Repo	https://github.com/thu-media/FedFusion
Framework	pytorch
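
A minimal Python sketch of two ingredients named in the abstract above: FedAvg's size-weighted parameter averaging and a Maximum Mean Discrepancy estimate between two feature sets. It is an illustration under stated assumptions, not the authors' implementation (the linked repo holds that). The names `fedavg`, `gaussian_kernel`, and `mmd2` are hypothetical, the Gaussian kernel and its bandwidth `sigma` are assumed choices, and parameters are plain lists rather than model tensors.

```python
import math


def fedavg(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local sample count."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel; the kernel choice is an assumption of this sketch."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2(xs, ys, sigma=1.0):
    """Biased estimate of the squared MMD between two sets of feature vectors."""
    k_xx = sum(gaussian_kernel(a, b, sigma) for a in xs for b in xs) / len(xs) ** 2
    k_yy = sum(gaussian_kernel(a, b, sigma) for a in ys for b in ys) / len(ys) ** 2
    k_xy = sum(gaussian_kernel(a, b, sigma) for a in xs for b in ys) / (len(xs) * len(ys))
    return k_xx + k_yy - 2.0 * k_xy
```

In a two-stream setup like the one the abstract describes, a term such as `mmd2(local_features, global_features)` could be added to the local training loss; how the paper weights that term is not stated in the abstract.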

Only sparsity based loss function for learning representations


Title	Only sparsity based loss function for learning representations
Authors	Vivek Bakaraju, Kishore Reddy Konda
Abstract	We study the emergence of sparse representations in neural networks. We show that in unsupervised models with regularization, the emergence of sparsity is the result of the input data samples being distributed along a highly non-linear or discontinuous manifold. We also derive a similar argument for discriminatively trained networks and present experiments to support this hypothesis. Based on our study of sparsity, we introduce a new loss function which can be used as a regularization term for models like autoencoders and MLPs. Further, the same loss function can also be used as a cost function for an unsupervised single-layered neural network model for learning efficient representations.
Tasks
Published	2019-03-07
URL	http://arxiv.org/abs/1903.02893v1
PDF	http://arxiv.org/pdf/1903.02893v1.pdf
PWC	https://paperswithcode.com/paper/only-sparsity-based-loss-function-for
Repo	https://github.com/Vivek-B/OVR
Framework	tf

Towards Learning of Filter-Level Heterogeneous Compression of Convolutional Neural Networks


Title	Towards Learning of Filter-Level Heterogeneous Compression of Convolutional Neural Networks
Authors	Yochai Zur, Chaim Baskin, Evgenii Zheltonozhskii, Brian Chmiel, Itay Evron, Alex M. Bronstein, Avi Mendelson
Abstract	Recently, deep learning has become a de facto standard in machine learning with convolutional neural networks (CNNs) demonstrating spectacular success on a wide variety of tasks. However, CNNs are typically very demanding computationally at inference time. One of the ways to alleviate this burden on certain hardware platforms is quantization relying on the use of low-precision arithmetic representation for the weights and the activations. Another popular method is the pruning of the number of filters in each layer. While mainstream deep learning methods train the neural network weights while keeping the network architecture fixed, the emerging neural architecture search (NAS) techniques make the latter also amenable to training. In this paper, we formulate optimal arithmetic bit length allocation and neural network pruning as a NAS problem, searching for the configurations satisfying a computational complexity budget while maximizing the accuracy. We use a differentiable search method based on the continuous relaxation of the search space proposed by Liu et al. (arXiv:1806.09055). We show, by grid search, that heterogeneous quantized networks suffer from a high variance which renders the benefit of the search questionable. For pruning, improvement over homogeneous cases is possible, but it is still challenging to find those configurations with the proposed method. The code is publicly available at https://github.com/yochaiz/Slimmable and https://github.com/yochaiz/darts-UNIQ
Tasks	Network Pruning, Neural Architecture Search, Quantization
Published	2019-04-22
URL	https://arxiv.org/abs/1904.09872v4
PDF	https://arxiv.org/pdf/1904.09872v4.pdf
PWC	https://paperswithcode.com/paper/towards-learning-of-filter-level
Repo	https://github.com/yochaiz/darts-UNIQ
Framework	pytorch
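
To make the "arithmetic bit length" idea in the abstract above concrete, here is a small Python sketch of plain uniform quantization of a weight list to a given number of bits. It is a generic illustration, not the quantizer used in the paper or its repos; the function name `quantize_uniform` and the min/max range choice are assumptions.

```python
def quantize_uniform(weights, bits):
    """Snap each weight to the nearest of 2**bits evenly spaced levels
    spanning [min(weights), max(weights)]."""
    lo, hi = min(weights), max(weights)
    if hi == lo:
        # A constant vector needs no quantization grid.
        return list(weights)
    levels = 2 ** bits - 1            # number of intervals between grid points
    step = (hi - lo) / levels
    return [lo + round((w - lo) / step) * step for w in weights]
```

A heterogeneous allocation in the sense of the abstract would call such a function with a different `bits` value per layer or per filter, subject to a total complexity budget.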

Learning Parallax Attention for Stereo Image Super-Resolution


Title	Learning Parallax Attention for Stereo Image Super-Resolution
Authors	Longguang Wang, Yingqian Wang, Zhengfa Liang, Zaiping Lin, Jungang Yang, Wei An, Yulan Guo
Abstract	Stereo image pairs can be used to improve the performance of super-resolution (SR) since additional information is provided from a second viewpoint. However, it is challenging to incorporate this information for SR since disparities between stereo images vary significantly. In this paper, we propose a parallax-attention stereo super-resolution network (PASSRnet) to integrate the information from a stereo image pair for SR. Specifically, we introduce a parallax-attention mechanism with a global receptive field along the epipolar line to handle different stereo images with large disparity variations. We also propose a new and the largest dataset for stereo image SR (namely, Flickr1024). Extensive experiments demonstrate that the parallax-attention mechanism can capture correspondence between stereo images to improve SR performance with a small computational and memory cost. Comparative results show that our PASSRnet achieves state-of-the-art performance on the Middlebury, KITTI 2012 and KITTI 2015 datasets.
Tasks	Image Super-Resolution, Stereo Image Super-Resolution, Super-Resolution
Published	2019-03-14
URL	http://arxiv.org/abs/1903.05784v3
PDF	http://arxiv.org/pdf/1903.05784v3.pdf
PWC	https://paperswithcode.com/paper/learning-parallax-attention-for-stereo-image
Repo	https://github.com/YingqianWang/Flickr1024
Framework	none
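
The abstract above describes attention with a global receptive field along the epipolar line. For a rectified stereo pair that line is an image row, so a rough Python sketch is dot-product attention from each left-view pixel to every right-view pixel in the same row. This is a simplified illustration, not PASSRnet itself: the learned feature transforms are omitted, and the names `softmax` and `row_attention` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def row_attention(left_row, right_row):
    """For each left-view pixel feature, attend over all right-view pixel
    features in the same row and return the attention-weighted sum."""
    dim = len(right_row[0])
    out = []
    for query in left_row:
        scores = [sum(a * b for a, b in zip(query, key)) for key in right_row]
        weights = softmax(scores)
        out.append(
            [sum(w * key[d] for w, key in zip(weights, right_row)) for d in range(dim)]
        )
    return out
```

Because every pixel in the row is a candidate, the attention weights can place mass at any disparity, which is the property the abstract points to for handling large disparity variations.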

adVAE: A self-adversarial variational autoencoder with Gaussian anomaly prior knowledge for anomaly detection


Title	adVAE: A self-adversarial variational autoencoder with Gaussian anomaly prior knowledge for anomaly detection
Authors	Xuhong Wang, Ying Du, Shijie Lin, Ping Cui, Yuntian Shen, Yupu Yang
Abstract	Recently, deep generative models have become increasingly popular in unsupervised anomaly detection. However, deep generative models aim at recovering the data distribution rather than detecting anomalies. Besides, deep generative models have the risk of overfitting training samples, which has disastrous effects on anomaly detection performance. To solve the above two problems, we propose a Self-adversarial Variational Autoencoder with a Gaussian anomaly prior assumption. We assume that both the anomalous and the normal prior distribution are Gaussian and have overlaps in the latent space. Therefore, a Gaussian transformer net T is trained to synthesize anomalous but near-normal latent variables. In addition to keeping the original training objective of the Variational Autoencoder, the generator G tries to distinguish between the normal latent variables and the anomalous ones synthesized by T, and the encoder E is trained to discriminate whether the output of G is real. These new objectives we added not only give both G and E the ability to discriminate but also introduce additional regularization to prevent overfitting. Compared with the SOTA baselines, the proposed model achieves significant improvements in extensive experiments. Datasets and our model are available at a GitHub repository.
Tasks	Anomaly Detection, Unsupervised Anomaly Detection
Published	2019-03-03
URL	https://arxiv.org/abs/1903.00904v3
PDF	https://arxiv.org/pdf/1903.00904v3.pdf
PWC	https://paperswithcode.com/paper/self-adversarial-variational-autoencoder-with
Repo	https://github.com/YeongHyeon/adVAE
Framework	tf

DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue


Title	DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue
Authors	Xiaoze Jiang, Jing Yu, Zengchang Qin, Yingying Zhuang, Xingxing Zhang, Yue Hu, Qi Wu
Abstract	Unlike the Visual Question Answering task, which requires answering only one question about an image, Visual Dialogue involves multiple questions which cover a broad range of visual content that could be related to any objects, relationships or semantics. The key challenge in the Visual Dialogue task is thus to learn a more comprehensive and semantic-rich image representation which may have adaptive attentions on the image for different questions. In this research, we propose a novel model to depict an image from both visual and semantic perspectives. Specifically, the visual view helps capture the appearance-level information, including objects and their relationships, while the semantic view enables the agent to understand high-level visual semantics from the whole image to the local regions. Furthermore, on top of such multi-view image features, we propose a feature selection framework which is able to adaptively capture question-relevant information hierarchically at a fine-grained level. The proposed method achieved state-of-the-art results on benchmark Visual Dialogue datasets. More importantly, we can tell which modality (visual or semantic) contributes more to answering the current question by visualizing the gate values. This gives us insight into human cognition in Visual Dialogue.
Tasks	Feature Selection, Question Answering, Visual Dialog, Visual Question Answering
Published	2019-11-17
URL	https://arxiv.org/abs/1911.07251v1
PDF	https://arxiv.org/pdf/1911.07251v1.pdf
PWC	https://paperswithcode.com/paper/dualvd-an-adaptive-dual-encoding-model-for
Repo	https://github.com/JXZe/DualVD
Framework	pytorch

A parallel Fortran framework for neural networks and deep learning


Title	A parallel Fortran framework for neural networks and deep learning
Authors	Milan Curcic
Abstract	This paper describes neural-fortran, a parallel Fortran framework for neural networks and deep learning. It features a simple interface to construct feed-forward neural networks of arbitrary structure and size, several activation functions, and stochastic gradient descent as the default optimization algorithm. Neural-fortran also leverages the Fortran 2018 standard collective subroutines to achieve data-based parallelism on shared- or distributed-memory machines. First, I describe the implementation of neural networks with Fortran derived types, whole-array arithmetic, and collective sum and broadcast operations to achieve parallelism. Second, I demonstrate the use of neural-fortran in an example of recognizing hand-written digits from images. Finally, I evaluate the computational performance in both serial and parallel modes. Ease of use and computational performance are similar to an existing popular machine learning framework, making neural-fortran a viable candidate for further development and use in production.
Tasks
Published	2019-02-18
URL	http://arxiv.org/abs/1902.06714v2
PDF	http://arxiv.org/pdf/1902.06714v2.pdf
PWC	https://paperswithcode.com/paper/a-parallel-fortran-framework-for-neural
Repo	https://github.com/milancurcic/neural-fortran-paper
Framework	tf

BINet: Multi-perspective Business Process Anomaly Classification


Title	BINet: Multi-perspective Business Process Anomaly Classification
Authors	Timo Nolle, Stefan Luettgen, Alexander Seeliger, Max Mühlhäuser
Abstract	In this paper, we introduce BINet, a neural network architecture for real-time multi-perspective anomaly detection in business process event logs. BINet is designed to handle both the control flow and the data perspective of a business process. Additionally, we propose a set of heuristics for setting the threshold of an anomaly detection algorithm automatically. We demonstrate that BINet can be used to detect anomalies in event logs not only on a case level but also on event attribute level. Finally, we demonstrate that a simple set of rules can be used to utilize the output of BINet for anomaly classification. We compare BINet to eight other state-of-the-art anomaly detection algorithms and evaluate their performance on an elaborate data corpus of 29 synthetic and 15 real-life event logs. BINet outperforms all other methods both on the synthetic as well as on the real-life datasets.
Tasks	Anomaly Detection
Published	2019-02-08
URL	http://arxiv.org/abs/1902.03155v1
PDF	http://arxiv.org/pdf/1902.03155v1.pdf
PWC	https://paperswithcode.com/paper/binet-multi-perspective-business-process
Repo	https://github.com/tnolle/binet
Framework	tf

Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog


Title	Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog
Authors	Natasha Jaques, Asma Ghandeharioun, Judy Hanwen Shen, Craig Ferguson, Agata Lapedriza, Noah Jones, Shixiang Gu, Rosalind Picard
Abstract	Most deep reinforcement learning (RL) systems are not able to learn effectively from off-policy data, especially if they cannot explore online in the environment. These are critical shortcomings for applying RL to real-world problems where collecting data is expensive, and models must be tested offline before being deployed to interact with the environment – e.g. systems that learn from human interaction. Thus, we develop a novel class of off-policy batch RL algorithms, which are able to effectively learn offline, without exploring, from a fixed batch of human interaction data. We leverage models pre-trained on data as a strong prior, and use KL-control to penalize divergence from this prior during RL training. We also use dropout-based uncertainty estimates to lower bound the target Q-values as a more efficient alternative to Double Q-Learning. The algorithms are tested on the problem of open-domain dialog generation – a challenging reinforcement learning problem with a 20,000-dimensional action space. Using our Way Off-Policy algorithm, we can extract multiple different reward functions post-hoc from collected human interaction data, and learn effectively from all of these. We test the real-world generalization of these systems by deploying them live to converse with humans in an open-domain setting, and demonstrate that our algorithm achieves significant improvements over prior methods in off-policy batch RL.
Tasks	Q-Learning
Published	2019-06-30
URL	https://arxiv.org/abs/1907.00456v2
PDF	https://arxiv.org/pdf/1907.00456v2.pdf
PWC	https://paperswithcode.com/paper/way-off-policy-batch-deep-reinforcement
Repo	https://github.com/natashamjaques/neural_chat
Framework	pytorch
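
The abstract above mentions KL-control, i.e. penalizing divergence from a pre-trained prior during RL training. The following Python sketch shows the idea for a single state with a discrete action set. It relies on the standard result that maximizing expected value minus `c` times KL(pi || prior) yields a policy proportional to `prior * exp(Q / c)`. It is not the paper's algorithm: the names `kl_divergence` and `kl_regularized_policy` and the temperature `c` are assumptions, and the prior must assign non-zero probability to every action.

```python
import math


def kl_divergence(pi, prior):
    """KL(pi || prior) for two discrete distributions over the same actions."""
    return sum(p * math.log(p / q) for p, q in zip(pi, prior) if p > 0.0)


def kl_regularized_policy(q_values, prior, c=1.0):
    """Policy maximizing E_pi[Q] - c * KL(pi || prior) for one state.

    Larger c keeps the policy closer to the prior.
    """
    logits = [math.log(p) + q / c for p, q in zip(prior, q_values)]
    m = max(logits)                       # subtract the max for stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]
```

With equal Q-values the result is exactly the prior; as `c` shrinks, the policy moves toward the greedy action.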

Towards Empathic Deep Q-Learning


Title	Towards Empathic Deep Q-Learning
Authors	Bart Bussmann, Jacqueline Heinerman, Joel Lehman
Abstract	As reinforcement learning (RL) scales to solve increasingly complex tasks, interest continues to grow in the fields of AI safety and machine ethics. As a contribution to these fields, this paper introduces an extension to Deep Q-Networks (DQNs), called Empathic DQN, that is loosely inspired both by empathy and the golden rule (“Do unto others as you would have them do unto you”). Empathic DQN aims to help mitigate negative side effects to other agents resulting from myopic goal-directed behavior. We assume a setting where a learning agent coexists with other independent agents (who receive unknown rewards), where some types of reward (e.g. negative rewards from physical harm) may generalize across agents. Empathic DQN combines the typical (self-centered) value with the estimated value of other agents, by imagining (by its own standards) the value of it being in the other’s situation (by considering constructed states where both agents are swapped). Proof-of-concept results in two gridworld environments highlight the approach’s potential to decrease collateral harms. While extending Empathic DQN to complex environments is non-trivial, we believe that this first step highlights the potential of bridge-work between machine ethics and RL to contribute useful priors for norm-abiding RL agents.
Tasks	Q-Learning
Published	2019-06-26
URL	https://arxiv.org/abs/1906.10918v1
PDF	https://arxiv.org/pdf/1906.10918v1.pdf
PWC	https://paperswithcode.com/paper/towards-empathic-deep-q-learning
Repo	https://github.com/bartbussmann/EmpathicDQN
Framework	none
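
The abstract above describes combining the agent's own value with the value it imagines for the other agent, obtained by evaluating a constructed state in which the two agents are swapped. A small Python sketch of that combination follows. The state encoding as a `(self_pos, other_pos)` pair, the names `swap_agents` and `empathic_value`, and the linear mixing weight `selfishness` are all assumptions for illustration, not details confirmed by the abstract.

```python
def swap_agents(state):
    """Build the constructed state in which the two agents trade places."""
    self_pos, other_pos = state
    return (other_pos, self_pos)


def empathic_value(value_fn, state, selfishness=0.5):
    """Blend the agent's own value with the value of being in the other's place.

    value_fn is the agent's own value estimate, so the other agent's
    situation is judged by the learner's own standards.
    """
    own = value_fn(state)
    imagined = value_fn(swap_agents(state))
    return selfishness * own + (1.0 - selfishness) * imagined
```

Setting `selfishness=1.0` recovers the ordinary self-centered value.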

A Story of Two Streams: Reinforcement Learning Models from Human Behavior and Neuropsychiatry


Title	A Story of Two Streams: Reinforcement Learning Models from Human Behavior and Neuropsychiatry
Authors	Baihan Lin, Guillermo Cecchi, Djallel Bouneffouf, Jenna Reinen, Irina Rish
Abstract	Drawing inspiration from behavioral studies of human decision making, we propose here a more general and flexible parametric framework for reinforcement learning that extends standard Q-learning to a two-stream model for processing positive and negative rewards, and allows incorporating a wide range of reward-processing biases – an important component of human decision making which can help us better understand a wide spectrum of multi-agent interactions in complex real-world socioeconomic systems, as well as various neuropsychiatric conditions associated with disruptions in normal reward processing. From the computational perspective, we observe that the proposed Split-QL model and its clinically inspired variants consistently outperform standard Q-Learning and SARSA methods, as well as recently proposed Double Q-Learning approaches, on simulated tasks with particular reward distributions, a real-world dataset capturing human decision-making in gambling tasks, and the Pac-Man game in a lifelong learning setting across different reward stationarities.
Tasks	Decision Making, Q-Learning, Recommendation Systems
Published	2019-06-21
URL	https://arxiv.org/abs/1906.11286v6
PDF	https://arxiv.org/pdf/1906.11286v6.pdf
PWC	https://paperswithcode.com/paper/reinforcement-learning-models-of-human
Repo	https://github.com/doerlbh/mentalRL
Framework	pytorch
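
A rough Python sketch of the two-stream idea from the abstract above: one tabular Q-table learns from positive rewards, another from negative rewards, and actions are ranked by their sum. This is a simplified illustration, not the paper's Split-QL update. The class name `SplitQ`, the reward weights `w_pos` and `w_neg`, and the choice to bootstrap both streams from the greedy action of the combined value are assumptions.

```python
from collections import defaultdict


class SplitQ:
    """Tabular Q-learning with separate positive and negative reward streams."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, w_pos=1.0, w_neg=1.0):
        self.actions = list(actions)
        self.alpha = alpha        # learning rate
        self.gamma = gamma        # discount factor
        self.w_pos = w_pos        # hypothetical weight on positive rewards
        self.w_neg = w_neg        # hypothetical weight on negative rewards
        self.q_pos = defaultdict(float)
        self.q_neg = defaultdict(float)

    def combined(self, state, action):
        """Value used for action selection: sum of both streams."""
        return self.q_pos[(state, action)] + self.q_neg[(state, action)]

    def greedy(self, state):
        return max(self.actions, key=lambda a: self.combined(state, a))

    def update(self, state, action, reward, next_state):
        """Route the reward to its stream and apply a Q-learning step to each."""
        nxt = self.greedy(next_state)
        r_pos = self.w_pos * max(reward, 0.0)
        r_neg = self.w_neg * min(reward, 0.0)
        for table, r in ((self.q_pos, r_pos), (self.q_neg, r_neg)):
            target = r + self.gamma * table[(next_state, nxt)]
            table[(state, action)] += self.alpha * (target - table[(state, action)])
```

Changing `w_pos` and `w_neg` is one simple way to model a reward-processing bias, such as an agent that under-weights losses.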

Efficient Model-free Reinforcement Learning in Metric Spaces


Title	Efficient Model-free Reinforcement Learning in Metric Spaces
Authors	Zhao Song, Wen Sun
Abstract	Model-free Reinforcement Learning (RL) algorithms such as Q-learning [Watkins, Dayan 92] have been widely used in practice and can achieve human level performance in applications such as video games [Mnih et al. 15]. Recently, equipped with the idea of optimism in the face of uncertainty, Q-learning algorithms [Jin, Allen-Zhu, Bubeck, Jordan 18] can be proven to be sample efficient for discrete tabular Markov Decision Processes (MDPs) which have a finite number of states and actions. In this work, we present an efficient model-free Q-learning based algorithm in MDPs with a natural metric on the state-action space, hence extending efficient model-free Q-learning algorithms to continuous state-action space. Compared to previous model-based RL algorithms for metric spaces [Kakade, Kearns, Langford 03], our algorithm does not require access to a black-box planning oracle.
Tasks	Q-Learning
Published	2019-05-01
URL	http://arxiv.org/abs/1905.00475v1
PDF	http://arxiv.org/pdf/1905.00475v1.pdf
PWC	https://paperswithcode.com/paper/efficient-model-free-reinforcement-learning-1
Repo	https://github.com/seanrsinclair/AdaptiveQLearning
Framework	none

Automatic vocal tract landmark localization from midsagittal MRI data


Title	Automatic vocal tract landmark localization from midsagittal MRI data
Authors	Mohammad Eslami, Christiane Neuschaefer-Rube, Antoine Serrurier
Abstract	The various speech sounds of a language are obtained by varying the shape and position of the articulators surrounding the vocal tract. Analyzing their variations is crucial for understanding speech production, diagnosing speech disorders and planning therapy. Identifying key anatomical landmarks of these structures on medical images is a pre-requisite for any quantitative analysis and the rising amount of data generated in the field calls for an automatic solution. The challenge lies in the high inter- and intra-speaker variability, the mutual interaction between the articulators and the moderate quality of the images. This study addresses this issue for the first time and tackles it by means of Deep Learning. It proposes a dedicated network architecture named Flat-net, whose performance is evaluated and compared with eleven state-of-the-art methods from the literature. The dataset contains midsagittal anatomical Magnetic Resonance Images for 9 speakers sustaining 62 articulations with 21 annotated anatomical landmarks per image. Results show that the Flat-net approach outperforms the former methods, leading to an overall Root Mean Square Error of 3.6 pixels/0.36 cm obtained in a leave-one-out procedure over the speakers. The implementation codes are also shared publicly on GitHub.
Tasks	Face Alignment, Pose Estimation
Published	2019-07-18
URL	https://arxiv.org/abs/1907.07951v2
PDF	https://arxiv.org/pdf/1907.07951v2.pdf
PWC	https://paperswithcode.com/paper/automatic-vocal-tract-landmark-localization
Repo	https://github.com/mohaEs/Train-Predict-Landmarks-by-flat-net
Framework	tf

Transfer Learning for Image-Based Malware Classification


Title	Transfer Learning for Image-Based Malware Classification
Authors	Niket Bhodia, Pratikkumar Prajapati, Fabio Di Troia, Mark Stamp
Abstract	In this paper, we consider the problem of malware detection and classification based on image analysis. We convert executable files to images and apply image recognition using deep learning (DL) models. To train these models, we employ transfer learning based on existing DL models that have been pre-trained on massive image datasets. We carry out various experiments with this technique and compare its performance to that of an extremely simple machine learning technique, namely, k-nearest neighbors (k-NN). For our k-NN experiments, we use features extracted directly from executables, rather than image analysis. While our image-based DL technique performs well in the experiments, surprisingly, it is outperformed by k-NN. We show that DL models are better able to generalize the data, in the sense that they outperform k-NN in simulated zero-day experiments.
Tasks	Malware Classification, Malware Detection, Transfer Learning
Published	2019-01-21
URL	http://arxiv.org/abs/1903.11551v1
PDF	http://arxiv.org/pdf/1903.11551v1.pdf
PWC	https://paperswithcode.com/paper/transfer-learning-for-image-based-malware
Repo	https://github.com/pratikpv/malware_classification
Framework	none
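
The abstract above says executables are converted to images before classification. One common way to do this, sketched below in Python, is to read the raw bytes as grayscale pixel values (0 to 255) and wrap them into rows of a fixed width. Whether the paper uses this exact layout is not stated in the abstract; the function name `bytes_to_image`, the default width, and the zero padding of the last row are assumptions.

```python
def bytes_to_image(data, width=16):
    """Lay out raw bytes as a 2D grid of grayscale values (0-255).

    The final row is padded with zeros so that every row has `width` pixels.
    """
    rows = -(-len(data) // width)                 # ceiling division
    padded = data + bytes(rows * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(rows)]
```

For a real file, `data` would come from something like `open(path, "rb").read()`, and the resulting grid would typically be resized to the input shape expected by the pre-trained image model.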

Estimating Attention Flow in Online Video Networks


Title	Estimating Attention Flow in Online Video Networks
Authors	Siqi Wu, Marian-Andrei Rizoiu, Lexing Xie
Abstract	Online videos have shown a tremendous increase in Internet traffic. Most video hosting sites implement recommender systems, which connect the videos into a directed network and conceptually act as a source of pathways for users to navigate. At present, little is known about how human attention is allocated over such large-scale networks, and about the impacts of the recommender systems. In this paper, we first construct the Vevo network – a YouTube video network with 60,740 music videos interconnected by the recommendation links, and we collect their associated viewing dynamics. This results in a total of 310 million views every day over a period of 9 weeks. Next, we present large-scale measurements that connect the structure of the recommendation network and the video attention dynamics. We use the bow-tie structure to characterize the Vevo network and we find that its core component (23.1% of the videos), which occupies most of the attention (82.6% of the views), is made out of videos that are mainly recommended among themselves. This is indicative of the links between video recommendation and the inequality of attention allocation. Finally, we address the task of estimating the attention flow in the video recommendation network. We propose a model that accounts for the network effects for predicting video popularity, and we show it consistently outperforms the baselines. This model also identifies a group of artists gaining attention because of the recommendation network. Altogether, our observations and our models provide a new set of tools to better understand the impacts of recommender systems on collective social attention.
Tasks	Recommendation Systems
Published	2019-08-20
URL	https://arxiv.org/abs/1908.07123v3
PDF	https://arxiv.org/pdf/1908.07123v3.pdf
PWC	https://paperswithcode.com/paper/estimating-attention-flow-in-online-video
Repo	https://github.com/avalanchesiqi/networked-popularity
Framework	none