Paper Group ANR 783
Deep Adversarial Belief Networks
Title | Deep Adversarial Belief Networks |
Authors | Yuming Huang, Ashkan Panahi, Hamid Krim, Yiyi Yu, Spencer L. Smith |
Abstract | We present a novel adversarial framework for training deep belief networks (DBNs), which includes replacing the generator network in the methodology of generative adversarial networks (GANs) with a DBN and developing a highly parallelizable numerical algorithm for training the resulting architecture in a stochastic manner. Unlike the existing techniques, this framework can be applied to the most general form of DBNs with no requirement for back propagation. As such, it lays a new foundation for developing DBNs on a par with GANs with various regularization units, such as pooling and normalization. Foregoing back-propagation, our framework also exhibits superior scalability as compared to other DBN and GAN learning techniques. We present a number of numerical experiments in computer vision as well as neurosciences to illustrate the main advantages of our approach. |
Tasks | |
Published | 2019-09-13 |
URL | https://arxiv.org/abs/1909.06134v2 |
https://arxiv.org/pdf/1909.06134v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-adversarial-belief-networks |
Repo | |
Framework | |
Data Augmentation for Deep Learning-based Radio Modulation Classification
Title | Data Augmentation for Deep Learning-based Radio Modulation Classification |
Authors | Liang Huang, Weijian Pan, You Zhang, LiPing Qian, Nan Gao, Yuan Wu |
Abstract | Deep learning has recently been applied to automatically classify the modulation categories of received radio signals without relying on manual expertise. However, training deep learning models requires a massive volume of data, and insufficient training data causes serious overfitting and degrades classification accuracy. To cope with small datasets, data augmentation has been widely used in image processing to expand the dataset and improve the robustness of deep learning models. In wireless communications, however, the effect of different data augmentation methods on radio modulation classification has not yet been studied. In this paper, we evaluate different data augmentation methods via a state-of-the-art deep learning-based modulation classifier. Based on the characteristics of modulated signals, three augmentation methods are considered, i.e., rotation, flip, and Gaussian noise, which can be applied in both the training phase and the inference phase of the deep learning algorithm. Numerical results show that all three augmentation methods improve classification accuracy: rotation outperforms flip, and both achieve higher classification accuracy than Gaussian noise. Given only 12.5% of the training dataset, a joint rotation and flip augmentation policy achieves even higher classification accuracy than the baseline trained on the full dataset without augmentation. Furthermore, with data augmentation, radio modulation categories can be successfully classified using shorter radio samples, leading to a simpler deep learning model and a shorter classification response time. |
Tasks | Data Augmentation |
Published | 2019-12-06 |
URL | https://arxiv.org/abs/1912.03026v2 |
https://arxiv.org/pdf/1912.03026v2.pdf | |
PWC | https://paperswithcode.com/paper/data-augmentation-for-deep-learning-based |
Repo | |
Framework | |
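A minimal sketch of the three augmentations this abstract describes, applied to a complex-baseband radio sample stored as a 2×N array of I/Q components. The array layout, rotation angles, and SNR level are assumptions for illustration, not values taken from the paper.

```python
import numpy as np

def rotate(iq, angle):
    """Rotate the I/Q constellation by `angle` radians (e.g. multiples of pi/2)."""
    c = (iq[0] + 1j * iq[1]) * np.exp(1j * angle)
    return np.stack([c.real, c.imag])

def flip(iq, axis="I"):
    """Flip the signal by negating the I or the Q component."""
    out = iq.copy()
    out[0 if axis == "I" else 1] *= -1.0
    return out

def add_gaussian_noise(iq, snr_db=20.0):
    """Add white Gaussian noise at a target SNR (in dB) relative to signal power."""
    power = np.mean(iq ** 2)
    noise_power = power / (10.0 ** (snr_db / 10.0))
    return iq + np.random.normal(0.0, np.sqrt(noise_power), size=iq.shape)

# Example: augment one radio sample of 128 complex points (shape 2 x 128).
sample = np.random.randn(2, 128)
augmented = [rotate(sample, k * np.pi / 2) for k in range(4)] + \
            [flip(sample, "I"), flip(sample, "Q"), add_gaussian_noise(sample)]
```

Because rotation and flip preserve the modulation class, the same transforms can also be applied at inference time and the per-transform predictions averaged.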
Agent Modeling as Auxiliary Task for Deep Reinforcement Learning
Title | Agent Modeling as Auxiliary Task for Deep Reinforcement Learning |
Authors | Pablo Hernandez-Leal, Bilal Kartal, Matthew E. Taylor |
Abstract | In this paper we explore how actor-critic methods in deep reinforcement learning, in particular Asynchronous Advantage Actor-Critic (A3C), can be extended with agent modeling. Inspired by recent works on representation learning and multiagent deep reinforcement learning, we propose two architectures to perform agent modeling: the first one based on parameter sharing, and the second one based on agent policy features. Both architectures aim to learn other agents’ policies as auxiliary tasks, besides the standard actor (policy) and critic (values). We performed experiments in both cooperative and competitive domains. The former is a problem of coordinated multiagent object transportation and the latter is a two-player mini version of the Pommerman game. Our results show that the proposed architectures stabilize learning and outperform the standard A3C architecture when learning a best response in terms of expected rewards. |
Tasks | Representation Learning |
Published | 2019-07-22 |
URL | https://arxiv.org/abs/1907.09597v1 |
https://arxiv.org/pdf/1907.09597v1.pdf | |
PWC | https://paperswithcode.com/paper/agent-modeling-as-auxiliary-task-for-deep |
Repo | |
Framework | |
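A rough sketch of the parameter-sharing variant described above: a shared torso feeds the standard actor and critic heads plus an auxiliary head that predicts the other agent's action. Layer sizes, the loss coefficients, and the single-opponent setup are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AgentModelingA3C(nn.Module):
    """Shared torso with actor, critic, and opponent-policy (auxiliary) heads."""
    def __init__(self, obs_dim, n_actions, n_opp_actions, hidden=128):
        super().__init__()
        self.torso = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, hidden), nn.ReLU())
        self.actor = nn.Linear(hidden, n_actions)          # policy logits
        self.critic = nn.Linear(hidden, 1)                 # state value
        self.opponent = nn.Linear(hidden, n_opp_actions)   # auxiliary: opponent policy logits

    def forward(self, obs):
        h = self.torso(obs)
        return self.actor(h), self.critic(h), self.opponent(h)

def a3c_with_agent_modeling_loss(model, obs, actions, returns, opp_actions, aux_weight=0.5):
    logits, values, opp_logits = model(obs)
    logp = F.log_softmax(logits, dim=-1)
    advantage = returns - values.squeeze(-1)
    policy_loss = -(logp.gather(1, actions.unsqueeze(1)).squeeze(1) * advantage.detach()).mean()
    value_loss = advantage.pow(2).mean()
    # Auxiliary agent-modeling term: supervised prediction of the other agent's action.
    aux_loss = F.cross_entropy(opp_logits, opp_actions)
    return policy_loss + 0.5 * value_loss + aux_weight * aux_loss
```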
Direct Automatic Coronary Calcium Scoring in Cardiac and Chest CT
Title | Direct Automatic Coronary Calcium Scoring in Cardiac and Chest CT |
Authors | Bob D. de Vos, Jelmer M. Wolterink, Tim Leiner, Pim A. de Jong, Nikolas Lessmann, Ivana Isgum |
Abstract | Cardiovascular disease (CVD) is the global leading cause of death. A strong risk factor for CVD events is the amount of coronary artery calcium (CAC). To meet demands of the increasing interest in quantification of CAC, i.e. coronary calcium scoring, especially as an unrequested finding for screening and research, automatic methods have been proposed. Current automatic calcium scoring methods are relatively computationally expensive and only provide scores for one type of CT. To address this, we propose a computationally efficient method that employs two ConvNets: the first performs registration to align the fields of view of input CTs and the second performs direct regression of the calcium score, thereby circumventing time-consuming intermediate CAC segmentation. Optional decision feedback provides insight into the regions that contributed to the calcium score. Experiments were performed using 903 cardiac CT and 1,687 chest CT scans. The method predicted calcium scores in less than 0.3 s. The intra-class correlation coefficient between predicted and manual calcium scores was 0.98 for both cardiac and chest CT. The method showed almost perfect agreement between automatic and manual CVD risk categorization in both datasets, with a linearly weighted Cohen’s kappa of 0.95 in cardiac CT and 0.93 in chest CT. Performance is similar to that of state-of-the-art methods, but the proposed method is hundreds of times faster. By providing visual feedback, the method gives insight into its decision process, making it readily implementable in clinical and research settings. |
Tasks | |
Published | 2019-02-12 |
URL | http://arxiv.org/abs/1902.05408v1 |
http://arxiv.org/pdf/1902.05408v1.pdf | |
PWC | https://paperswithcode.com/paper/direct-automatic-coronary-calcium-scoring-in |
Repo | |
Framework | |
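A very small sketch of the second of the two ConvNets described above: direct regression of a calcium score from a (pre-registered) CT volume, with no intermediate segmentation. The layer sizes are assumptions, and the registration ConvNet and decision-feedback mechanism are omitted.

```python
import torch
import torch.nn as nn

class CalciumScoreRegressor(nn.Module):
    """Direct regression of a calcium score from a registered CT volume."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.AdaptiveAvgPool3d(1))
        self.head = nn.Linear(32, 1)

    def forward(self, volume):               # volume: (batch, 1, D, H, W)
        h = self.features(volume).flatten(1)
        return self.head(h).squeeze(-1)      # predicted calcium score per scan
```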
Explaining Sequence-Level Knowledge Distillation as Data-Augmentation for Neural Machine Translation
Title | Explaining Sequence-Level Knowledge Distillation as Data-Augmentation for Neural Machine Translation |
Authors | Mitchell A. Gordon, Kevin Duh |
Abstract | Sequence-level knowledge distillation (SLKD) is a model compression technique that leverages large, accurate teacher models to train smaller, under-parameterized student models. Why does pre-processing MT data with SLKD help us train smaller models? We test the common hypothesis that SLKD addresses a capacity deficiency in students by “simplifying” noisy data points and find it unlikely in our case. Models trained on concatenations of original and “simplified” datasets generalize just as well as baseline SLKD. We then propose an alternative hypothesis under the lens of data augmentation and regularization. We try various augmentation strategies and observe that dropout regularization can become unnecessary. Our methods achieve BLEU gains of 0.7-1.2 on TED Talks. |
Tasks | Data Augmentation, Machine Translation, Model Compression |
Published | 2019-12-06 |
URL | https://arxiv.org/abs/1912.03334v1 |
https://arxiv.org/pdf/1912.03334v1.pdf | |
PWC | https://paperswithcode.com/paper/explaining-sequence-level-knowledge |
Repo | |
Framework | |
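The data-augmentation view tested above amounts to training the student on the concatenation of the original corpus and the teacher-decoded ("simplified") corpus over the same source sentences. A tiny sketch of that preprocessing step follows; file names and layout are assumed for illustration.

```python
def build_concatenated_corpus(src_path, ref_path, teacher_out_path,
                              out_src="train.concat.src", out_tgt="train.concat.tgt"):
    """Write each source sentence twice: once with its reference, once with the
    teacher model's output as the target (sequence-level distillation pair)."""
    with open(src_path) as s, open(ref_path) as r, open(teacher_out_path) as t, \
         open(out_src, "w") as fs, open(out_tgt, "w") as ft:
        for src, ref, teach in zip(s, r, t):
            fs.write(src)   # original pair
            ft.write(ref)
            fs.write(src)   # distilled pair
            ft.write(teach)
```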
Widening the Representation Bottleneck in Neural Machine Translation with Lexical Shortcuts
Title | Widening the Representation Bottleneck in Neural Machine Translation with Lexical Shortcuts |
Authors | Denis Emelin, Ivan Titov, Rico Sennrich |
Abstract | The transformer is a state-of-the-art neural translation model that uses attention to iteratively refine lexical representations with information drawn from the surrounding context. Lexical features are fed into the first layer and propagated through a deep network of hidden layers. We argue that the need to represent and propagate lexical features in each layer limits the model’s capacity for learning and representing other information relevant to the task. To alleviate this bottleneck, we introduce gated shortcut connections between the embedding layer and each subsequent layer within the encoder and decoder. This enables the model to access relevant lexical content dynamically, without expending limited resources on storing it within intermediate states. We show that the proposed modification yields consistent improvements over a baseline transformer on standard WMT translation tasks in 5 translation directions (0.9 BLEU on average) and reduces the amount of lexical information passed along the hidden layers. We furthermore evaluate different ways to integrate lexical connections into the transformer architecture and present ablation experiments exploring the effect of proposed shortcuts on model behavior. |
Tasks | Machine Translation |
Published | 2019-06-28 |
URL | https://arxiv.org/abs/1906.12284v1 |
https://arxiv.org/pdf/1906.12284v1.pdf | |
PWC | https://paperswithcode.com/paper/widening-the-representation-bottleneck-in |
Repo | |
Framework | |
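A compact sketch of the gated lexical shortcut described above: at each layer, the hidden state is mixed with the token embedding through a learned sigmoid gate, so the layer can re-access lexical content without storing it in intermediate states. The gating formula and dimensions are assumptions based on the abstract, not the paper's exact parameterisation.

```python
import torch
import torch.nn as nn

class LexicalShortcut(nn.Module):
    """Gated shortcut from the embedding layer into a transformer layer's input."""
    def __init__(self, d_model):
        super().__init__()
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, hidden, embeddings):
        # hidden, embeddings: (batch, seq_len, d_model)
        g = torch.sigmoid(self.gate(torch.cat([hidden, embeddings], dim=-1)))
        # Convex mix: the gate decides how much lexical content to re-inject.
        return g * embeddings + (1.0 - g) * hidden
```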
SPIN: A High Speed, High Resolution Vision Dataset for Tracking and Action Recognition in Ping Pong
Title | SPIN: A High Speed, High Resolution Vision Dataset for Tracking and Action Recognition in Ping Pong |
Authors | Steven Schwarcz, Peng Xu, David D’Ambrosio, Juhana Kangaspunta, Anelia Angelova, Huong Phan, Navdeep Jaitly |
Abstract | We introduce a new high resolution, high frame rate stereo video dataset, which we call SPIN, for tracking and action recognition in the game of ping pong. The corpus consists of ping pong play with three main annotation streams that can be used to learn tracking and action recognition models – tracking of the ping pong ball and poses of humans in the videos and the spin of the ball being hit by humans. The training corpus consists of 53 hours of data with labels derived from previous models in a semi-supervised method. The testing corpus contains 1 hour of data with the same information, except that crowd compute was used to obtain human annotations of the ball position, from which ball spin has been derived. Along with the dataset we introduce several baseline models that were trained on this data. The models were specifically chosen to be able to perform inference at the same rate as the images are generated – specifically 150 fps. We explore the advantages of multi-task training on this data, and also show interesting properties of ping pong ball trajectories that are derived from our observational data, rather than from prior physics models. To our knowledge this is the first large scale dataset of ping pong; we offer it to the community as a rich dataset that can be used for a large variety of machine learning and vision tasks such as tracking, pose estimation, semi-supervised and unsupervised learning and generative modeling. |
Tasks | Pose Estimation |
Published | 2019-12-13 |
URL | https://arxiv.org/abs/1912.06640v1 |
https://arxiv.org/pdf/1912.06640v1.pdf | |
PWC | https://paperswithcode.com/paper/spin-a-high-speed-high-resolution-vision |
Repo | |
Framework | |
Predicting Variable Types in Dynamically Typed Programming Languages
Title | Predicting Variable Types in Dynamically Typed Programming Languages |
Authors | Abhinav Jangda, Gaurav Anand |
Abstract | Dynamically typed programming languages are quite popular because they increase the programmer’s productivity. However, the absence of types in the source code makes programs written in these languages difficult to understand, and the virtual machines that execute them cannot produce optimized code. To overcome this challenge, we develop a technique to predict the types of all identifiers, including variables and function return types. We propose the first implementation of $2^{nd}$ order Inside Outside Recursive Neural Networks with two variants, (i) Child-Sum Tree-LSTMs and (ii) N-ary RNNs, that can handle a large number of tree branches. We predict the types of all identifiers given the Abstract Syntax Tree by performing just two passes over the tree, bottom-up and top-down, keeping both the content and context representations for all nodes of the tree. This allows these representations to interact by combining different paths from the parent, siblings and children, which is crucial for predicting types. Our best model achieves 44.33% accuracy across 21 classes and a top-3 accuracy of 71.5% on a Python dataset gathered from popular Python benchmarks. |
Tasks | |
Published | 2019-01-16 |
URL | http://arxiv.org/abs/1901.05138v1 |
http://arxiv.org/pdf/1901.05138v1.pdf | |
PWC | https://paperswithcode.com/paper/predicting-variable-types-in-dynamically |
Repo | |
Framework | |
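The bottom-up pass over the AST can be implemented with a standard Child-Sum Tree-LSTM cell, the first of the two variants mentioned above. The following is a generic sketch of that cell, not the paper's 2nd-order inside-outside formulation; dimensions are assumptions.

```python
import torch
import torch.nn as nn

class ChildSumTreeLSTMCell(nn.Module):
    """One bottom-up step: combine a node's embedding with its children's states."""
    def __init__(self, x_dim, h_dim):
        super().__init__()
        self.iou_x = nn.Linear(x_dim, 3 * h_dim)
        self.iou_h = nn.Linear(h_dim, 3 * h_dim, bias=False)
        self.f_x = nn.Linear(x_dim, h_dim)
        self.f_h = nn.Linear(h_dim, h_dim, bias=False)

    def forward(self, x, child_h, child_c):
        # x: (x_dim,); child_h, child_c: (num_children, h_dim)
        h_sum = child_h.sum(dim=0)
        i, o, u = torch.chunk(self.iou_x(x) + self.iou_h(h_sum), 3, dim=-1)
        i, o, u = torch.sigmoid(i), torch.sigmoid(o), torch.tanh(u)
        # One forget gate per child, as in the Child-Sum formulation.
        f = torch.sigmoid(self.f_x(x).unsqueeze(0) + self.f_h(child_h))
        c = i * u + (f * child_c).sum(dim=0)
        return o * torch.tanh(c), c
```

Applying the cell from the leaves upward yields a "content" state per AST node; a symmetric top-down pass would then produce the "context" state used for type prediction.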
On the Heavy-Tailed Theory of Stochastic Gradient Descent for Deep Neural Networks
Title | On the Heavy-Tailed Theory of Stochastic Gradient Descent for Deep Neural Networks |
Authors | Umut Şimşekli, Mert Gürbüzbalaban, Thanh Huy Nguyen, Gaël Richard, Levent Sagun |
Abstract | The gradient noise (GN) in the stochastic gradient descent (SGD) algorithm is often considered to be Gaussian in the large data regime by assuming that the classical central limit theorem (CLT) kicks in. This assumption is often made for mathematical convenience, since it enables SGD to be analyzed as a stochastic differential equation (SDE) driven by a Brownian motion. We argue that the Gaussianity assumption might fail to hold in deep learning settings and hence render the Brownian motion-based analyses inappropriate. Inspired by non-Gaussian natural phenomena, we consider the GN in a more general context and invoke the generalized CLT, which suggests that the GN converges to a heavy-tailed $\alpha$-stable random vector, where the tail-index $\alpha$ determines the heavy-tailedness of the distribution. Accordingly, we propose to analyze SGD as a discretization of an SDE driven by a Lévy motion. Such SDEs can incur 'jumps', which force the SDE and its discretization to transition from narrow minima to wider minima, as proven by existing metastability theory and the extensions that we proved recently. In this study, under the $\alpha$-stable GN assumption, we further establish an explicit connection between the convergence rate of SGD to a local minimum and the tail-index $\alpha$. To validate the $\alpha$-stable assumption, we conduct experiments on common deep learning scenarios and show that in all settings, the GN is highly non-Gaussian and admits heavy tails. We investigate the tail behavior in varying network architectures and sizes, loss functions, and datasets. Our results open up a different perspective and shed more light on the belief that SGD prefers wide minima. |
Tasks | |
Published | 2019-11-29 |
URL | https://arxiv.org/abs/1912.00018v1 |
https://arxiv.org/pdf/1912.00018v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-heavy-tailed-theory-of-stochastic |
Repo | |
Framework | |
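A small simulation sketch of the viewpoint above: SGD iterates perturbed by symmetric α-stable noise instead of Gaussian noise, generated with the Chambers-Mallows-Stuck method. The step size, noise scale, and quadratic toy objective are assumptions for illustration, not the paper's experimental setup.

```python
import numpy as np

def symmetric_alpha_stable(alpha, size, rng):
    """Chambers-Mallows-Stuck sampler for symmetric alpha-stable noise (beta = 0)."""
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)
    w = rng.exponential(1.0, size)
    return (np.sin(alpha * u) / np.cos(u) ** (1 / alpha)) * \
           (np.cos((1 - alpha) * u) / w) ** ((1 - alpha) / alpha)

def sgd_with_levy_noise(grad, w0, eta=0.01, sigma=0.1, alpha=1.5, steps=1000, seed=0):
    """Discretized SDE view of SGD with heavy-tailed (alpha-stable) gradient noise."""
    rng = np.random.default_rng(seed)
    w = np.array(w0, dtype=float)
    for _ in range(steps):
        noise = symmetric_alpha_stable(alpha, w.shape, rng)
        # eta**(1/alpha) is the scaling consistent with a Levy-driven SDE;
        # alpha = 2 recovers the usual Brownian (Gaussian) scaling.
        w = w - eta * grad(w) + sigma * eta ** (1.0 / alpha) * noise
    return w

# Toy quadratic objective f(w) = 0.5 * ||w||^2, so grad(w) = w.
w_final = sgd_with_levy_noise(lambda w: w, w0=np.ones(10))
```

With smaller α the iterates occasionally take very large "jumps", which is the mechanism the paper ties to escaping narrow minima.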
Finding Robust Itemsets Under Subsampling
Title | Finding Robust Itemsets Under Subsampling |
Authors | Nikolaj Tatti, Fabian Moerchen, Toon Calders |
Abstract | Mining frequent patterns is plagued by the problem of pattern explosion making pattern reduction techniques a key challenge in pattern mining. In this paper we propose a novel theoretical framework for pattern reduction. We do this by measuring the robustness of a property of an itemset such as closedness or non-derivability. The robustness of a property is the probability that this property holds on random subsets of the original data. We study four properties: if an itemset is closed, free, non-derivable or totally shattered, and demonstrate how to compute the robustness analytically without actually sampling the data. Our concept of robustness has many advantages: Unlike statistical approaches for reducing patterns, we do not assume a null hypothesis or any noise model and in contrast to noise tolerant or approximate patterns, the robust patterns for a given property are always a subset of the patterns with this property. If the underlying property is monotonic, then the measure is also monotonic, allowing us to efficiently mine robust itemsets. We further derive a parameter-free technique for ranking itemsets that can be used for top-$k$ approaches. Our experiments demonstrate that we can successfully use the robustness measure to reduce the number of patterns and that ranking yields interesting itemsets. |
Tasks | |
Published | 2019-02-18 |
URL | http://arxiv.org/abs/1902.06743v2 |
http://arxiv.org/pdf/1902.06743v2.pdf | |
PWC | https://paperswithcode.com/paper/finding-robust-itemsets-under-subsampling |
Repo | |
Framework | |
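A brief Monte Carlo sketch of the robustness notion defined above: the probability that a property such as closedness holds on random subsets of the data. The paper computes this analytically without sampling, so the sampler below is only an empirical sanity check, and the `is_closed` helper is an illustrative implementation.

```python
import random

def support(itemset, transactions):
    return sum(1 for t in transactions if itemset <= t)

def is_closed(itemset, transactions, all_items):
    """Closed: no proper superset has the same support."""
    s = support(itemset, transactions)
    return all(support(itemset | {i}, transactions) < s
               for i in all_items - itemset)

def robustness(itemset, transactions, all_items, keep_prob=0.5, trials=1000, prop=is_closed):
    """Estimate P(property holds) when each transaction is kept independently with keep_prob."""
    hits = 0
    for _ in range(trials):
        sub = [t for t in transactions if random.random() < keep_prob]
        if prop(itemset, sub, all_items):
            hits += 1
    return hits / trials

# Tiny example: transactions as frozensets of items.
data = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("bc")]
items = frozenset("abc")
print(robustness(frozenset("ab"), data, items))
```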
Let’s Get Dirty: GAN Based Data Augmentation for Soiling and Adverse Weather Classification in Autonomous Driving
Title | Let’s Get Dirty: GAN Based Data Augmentation for Soiling and Adverse Weather Classification in Autonomous Driving |
Authors | Michal Uricar, Ganesh Sistu, Hazem Rashed, Antonin Vobecky, Pavel Krizek, Fabian Burger, Senthil Yogamani |
Abstract | Cameras are becoming more and more important in autonomous driving. Wide-angle fisheye cameras are relatively cheap sensors and very suitable for automated parking and low-speed navigation tasks. Four such cameras form a surround-view system that provides a complete and detailed view around the vehicle. These cameras are usually directly exposed to harsh environmental settings and therefore can get soiled very easily by mud, dust, water, frost, etc. Soiling on the camera lens has a direct impact on the further processing of the images they provide. While adverse weather conditions, such as rain, have recently been getting attention, there is limited work on lens soiling. We believe that one of the reasons is that it is difficult to build a diverse dataset for this task, which is moreover expensive to annotate. We propose a novel GAN based algorithm for generating artificial soiling data along with the corresponding annotation masks. The manually annotated soiling dataset and the generated augmentation dataset will be made public. We demonstrate the generalization of our fisheye trained soiling GAN model on the Cityscapes dataset. Additionally, we provide an empirical evaluation of the degradation of a semantic segmentation algorithm on the soiled data. |
Tasks | Autonomous Driving, Data Augmentation, Semantic Segmentation |
Published | 2019-12-04 |
URL | https://arxiv.org/abs/1912.02249v1 |
https://arxiv.org/pdf/1912.02249v1.pdf | |
PWC | https://paperswithcode.com/paper/lets-get-dirty-gan-based-data-augmentation |
Repo | |
Framework | |
Unsupervised Representation Learning of DNA Sequences
Title | Unsupervised Representation Learning of DNA Sequences |
Authors | Vishal Agarwal, N Jayanth Kumar Reddy, Ashish Anand |
Abstract | Recently, several deep learning models have been used for DNA sequence based classification tasks. Such tasks often require long and variable length DNA sequences as input. In this work, we use a sequence-to-sequence autoencoder model to learn a latent representation of a fixed dimension for long and variable length DNA sequences in an unsupervised manner. We evaluate the learned latent representation both quantitatively and qualitatively on a supervised task of splice site classification. The quantitative evaluation is done under two different settings. Our experiments show that these representations can be used as features or priors in closely related tasks such as splice site classification. Further, in our qualitative analysis, we use the model attribution technique Integrated Gradients to infer significant sequence signatures influencing the classification accuracy. We show that the identified splice signatures align well with existing knowledge. |
Tasks | Representation Learning, Unsupervised Representation Learning |
Published | 2019-06-07 |
URL | https://arxiv.org/abs/1906.03087v1 |
https://arxiv.org/pdf/1906.03087v1.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-representation-learning-of-dna |
Repo | |
Framework | |
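A compact sketch of a sequence-to-sequence autoencoder for DNA along the lines described above: an LSTM encoder compresses a nucleotide-index sequence into a fixed-size latent vector, and an LSTM decoder reconstructs the sequence from it. Vocabulary handling, dimensions, and teacher forcing are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class DNAAutoencoder(nn.Module):
    """Seq2seq autoencoder: variable-length DNA -> fixed latent -> reconstruction."""
    def __init__(self, vocab_size=5, embed_dim=16, latent_dim=128):  # A, C, G, T, N
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.LSTM(embed_dim, latent_dim, batch_first=True)
        self.decoder = nn.LSTM(embed_dim, latent_dim, batch_first=True)
        self.out = nn.Linear(latent_dim, vocab_size)

    def encode(self, seq):
        # seq: (batch, length) of nucleotide indices; final hidden state = representation.
        _, (h, _) = self.encoder(self.embed(seq))
        return h[-1]                               # (batch, latent_dim)

    def forward(self, seq):
        z = self.encode(seq)
        # Teacher-forced reconstruction, conditioning the decoder on the latent state.
        h0 = z.unsqueeze(0)                        # (1, batch, latent_dim)
        c0 = torch.zeros_like(h0)
        dec_out, _ = self.decoder(self.embed(seq), (h0, c0))
        return self.out(dec_out)                   # (batch, length, vocab_size) logits

# The fixed-dimension representation used for downstream tasks such as
# splice-site classification would be model.encode(batch_of_sequences).
```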
Incorporating System-Level Objectives into Recommender Systems
Title | Incorporating System-Level Objectives into Recommender Systems |
Authors | Himan Abdollahpouri |
Abstract | One of the most essential parts of any recommender system is personalization: how acceptable the recommendations are from the user’s perspective. However, in many real-world applications, there are other stakeholders whose needs and interests should be taken into account. In this work, we define the problem of multistakeholder recommendation and focus on finding algorithms for a special case where the recommender system itself is also a stakeholder. In addition, we explore the idea of incrementally incorporating system-level objectives into recommender systems over time, to address the limitations of optimization techniques that only optimize individual users’ lists. |
Tasks | Recommendation Systems |
Published | 2019-05-31 |
URL | https://arxiv.org/abs/1906.01435v1 |
https://arxiv.org/pdf/1906.01435v1.pdf | |
PWC | https://paperswithcode.com/paper/incorporating-system-level-objectives-into |
Repo | |
Framework | |
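One simple way to read the idea above is as a re-ranking step that blends user-personalized relevance with a system-level utility, ramping the system weight up over time. The scoring blend and schedule below are assumptions for illustration, not the paper's algorithm.

```python
def rerank(candidates, user_score, system_score, lam):
    """Blend personalization with a system-level objective.

    candidates   : list of item ids
    user_score   : item -> predicted relevance for this user
    system_score : item -> value of recommending the item for the system
    lam          : weight on the system objective, in [0, 1]
    """
    scored = [((1 - lam) * user_score(i) + lam * system_score(i), i) for i in candidates]
    return [i for _, i in sorted(scored, key=lambda x: x[0], reverse=True)]

def incremental_lambda(t, t_max, lam_max=0.3):
    """Ramp the system-level weight up over time instead of imposing it all at once."""
    return lam_max * min(t / t_max, 1.0)
```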
Byzantine-Resilient Stochastic Gradient Descent for Distributed Learning: A Lipschitz-Inspired Coordinate-wise Median Approach
Title | Byzantine-Resilient Stochastic Gradient Descent for Distributed Learning: A Lipschitz-Inspired Coordinate-wise Median Approach |
Authors | Haibo Yang, Xin Zhang, Minghong Fang, Jia Liu |
Abstract | In this work, we consider the resilience of distributed algorithms based on stochastic gradient descent (SGD) in distributed learning with potentially Byzantine attackers, who could send arbitrary information to the parameter server to disrupt the training process. Toward this end, we propose a new Lipschitz-inspired coordinate-wise median approach (LICM-SGD) to mitigate Byzantine attacks. We show that our LICM-SGD algorithm can resist up to half of the workers being Byzantine attackers, while still converging almost surely to a stationary region in non-convex settings. Also, our LICM-SGD method does not require any information about the number of attackers and the Lipschitz constant, which makes it attractive for practical implementations. Moreover, our LICM-SGD method enjoys the optimal $O(md)$ computational time-complexity in the sense that the time-complexity is the same as that of the standard SGD under no attacks. We conduct extensive experiments to show that our LICM-SGD algorithm consistently outperforms existing methods in training multi-class logistic regression and convolutional neural networks with MNIST and CIFAR-10 datasets. In our experiments, LICM-SGD also achieves a much faster running time thanks to its low computational time-complexity. |
Tasks | |
Published | 2019-09-10 |
URL | https://arxiv.org/abs/1909.04532v1 |
https://arxiv.org/pdf/1909.04532v1.pdf | |
PWC | https://paperswithcode.com/paper/byzantine-resilient-stochastic-gradient |
Repo | |
Framework | |
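A minimal sketch of the coordinate-wise-median aggregation at the heart of LICM-SGD: the parameter server collects one stochastic gradient per worker and takes the median of each coordinate, which tolerates up to half of the workers being Byzantine. The Lipschitz-inspired selection step from the paper is omitted, so this is only the median core, with shapes assumed.

```python
import numpy as np

def coordinate_wise_median(worker_grads):
    """Aggregate gradients from m workers (shape m x d) coordinate by coordinate.

    The median of each coordinate is unaffected as long as fewer than half of
    the workers send arbitrary (Byzantine) values for that coordinate.
    """
    return np.median(np.asarray(worker_grads), axis=0)

def robust_sgd_step(w, worker_grads, lr=0.1):
    return w - lr * coordinate_wise_median(worker_grads)

# Example: 3 honest workers and 2 Byzantine workers on a 4-dimensional model.
honest = [np.array([1.0, 1.0, 1.0, 1.0])] * 3
byzantine = [np.array([1e6, -1e6, 1e6, -1e6])] * 2
w = np.zeros(4)
w = robust_sgd_step(w, honest + byzantine)   # the median ignores the outliers
```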
Determining the Scale of Impact from Denial-of-Service Attacks in Real Time Using Twitter
Title | Determining the Scale of Impact from Denial-of-Service Attacks in Real Time Using Twitter |
Authors | Chi Zhang, Bryan Wilkinson, Ashwinkumar Ganesan, Tim Oates |
Abstract | Denial of Service (DoS) attacks are common in on-line and mobile services such as Twitter, Facebook and banking. As the scale and frequency of Distributed Denial of Service (DDoS) attacks increase, there is an urgent need for determining the impact of the attack. Two central challenges of the task are to get feedback from a large number of users and to get it in a timely manner. In this paper, we present a weakly-supervised model that does not need annotated data to measure the impact of DoS issues, by applying Latent Dirichlet Allocation and symmetric Kullback-Leibler divergence to tweets. The weakly-supervised module has a limitation: it assumes that the event detected in a time window is a DoS attack event. This becomes less of a problem as more non-attack Twitter events are collected, since a new window is then less likely to be identified as a new event. Alternatively, an optional classification layer, trained on manually annotated DoS attack tweets, can filter out non-attack tweets to increase precision at the expense of recall. Experimental results show that we can learn weakly-supervised models that achieve precision comparable to supervised ones and generalize across entities in the same industry. |
Tasks | |
Published | 2019-09-12 |
URL | https://arxiv.org/abs/1909.05890v1 |
https://arxiv.org/pdf/1909.05890v1.pdf | |
PWC | https://paperswithcode.com/paper/determining-the-scale-of-impact-from-denial |
Repo | |
Framework | |
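A rough sketch of the detection idea described above: fit LDA over the tweets, summarise each time window by its average topic distribution, and flag a window whose symmetric KL divergence from the previous window is large. The threshold, window construction, and number of topics are illustrative assumptions.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def symmetric_kl(p, q, eps=1e-12):
    p, q = np.asarray(p) + eps, np.asarray(q) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

def window_topic_profiles(windows, n_topics=10):
    """windows: list of lists of tweet strings (one list per time window)."""
    vec = CountVectorizer(stop_words="english")
    X = vec.fit_transform([t for w in windows for t in w])
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    doc_topics = lda.fit_transform(X)
    profiles, start = [], 0
    for w in windows:
        profiles.append(doc_topics[start:start + len(w)].mean(axis=0))
        start += len(w)
    return profiles

def flag_candidate_events(windows, threshold=1.0):
    profiles = window_topic_profiles(windows)
    # A large topic shift relative to the previous window marks a candidate DoS event.
    return [i for i in range(1, len(profiles))
            if symmetric_kl(profiles[i - 1], profiles[i]) > threshold]
```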