Paper Group ANR 1017
The Newton Scheme for Deep Learning. DeepTriage: Exploring the Effectiveness of Deep Learning for Bug Triaging. Disentangling Latent Factors of Variational Auto-Encoder with Whitening. Learning to Walk via Deep Reinforcement Learning. Style Separation and Synthesis via Generative Adversarial Networks. Two geometric input transformation methods for …
The Newton Scheme for Deep Learning
Title | The Newton Scheme for Deep Learning |
Authors | Junqing Qiu, Guoren Zhong, Yihua Lu, Kun Xin, Huihuan Qian, Xi Zhu |
Abstract | We introduce a neural network (NN) strictly governed by Newton's laws, with the required basis functions derived from fundamental classical mechanics. By recasting training as a quick procedure of 'force pattern' recognition, we develop the Newton-physics-based NS scheme. Once the force pattern is confirmed, the network simply checks 'pattern stability' instead of performing continuous, computationally expensive big-data-driven fitting. Within a given physical law system, once the field is confirmed, the mathematical bases describing the force field are not divergent but denumerable, which spares the function representation from searching an inexhaustible space of available bases. In this work, we embed Newton's laws into deep learning and propose the Newton Scheme (NS). Under NS, the user first identifies the path pattern, such as constant-acceleration motion. Object recognition technology first loads mass information; NS then finds the matching physical pattern and describes and predicts the trajectory of the movement with nearly zero error. We compare NS against TCN, GRU, and other physics-inspired 'FIND-PDE' methods to demonstrate fundamental and extended applications: free fall, pendulums, and curving soccer balls. The NS methodology opens further opportunities for future advances in deep learning. |
Tasks | Object Recognition |
Published | 2018-10-16 |
URL | http://arxiv.org/abs/1810.07550v1 |
http://arxiv.org/pdf/1810.07550v1.pdf | |
PWC | https://paperswithcode.com/paper/the-newton-scheme-for-deep-learning |
Repo | |
Framework | |
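The 'force pattern' idea above can be illustrated with a hedged sketch: under a constant-acceleration pattern the trajectory basis is {1, t, t²}, so confirming the pattern reduces to a small least-squares fit followed by a residual ('pattern stability') check. The basis set, threshold, and function names below are illustrative assumptions, not the paper's implementation.

```python
# A minimal sketch of Newtonian "force pattern" fitting for a 1-D free fall;
# basis, threshold, and names are assumptions, not the authors' code.
import numpy as np

def fit_constant_acceleration(t, x):
    """Fit x(t) = x0 + v0*t + 0.5*a*t**2 by least squares; return coefficients and RMSE."""
    A = np.stack([np.ones_like(t), t, 0.5 * t**2], axis=1)
    coef, *_ = np.linalg.lstsq(A, x, rcond=None)
    rmse = np.sqrt(np.mean((A @ coef - x) ** 2))
    return coef, rmse

# Noisy free-fall observations: x(t) = 100 - 0.5 * 9.81 * t^2
t = np.linspace(0.0, 3.0, 30)
x = 100.0 - 0.5 * 9.81 * t**2 + np.random.normal(0.0, 0.01, t.shape)

(x0, v0, a), rmse = fit_constant_acceleration(t, x)
if rmse < 0.1:  # "pattern stability" check: the Newtonian basis explains the data
    t_f = 4.0
    print(f"a = {a:.2f} m/s^2, predicted x({t_f}) = {x0 + v0*t_f + 0.5*a*t_f**2:.2f}")
```

Once the pattern is confirmed, prediction is closed-form, which is where the near-zero error claimed for free fall comes from.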
DeepTriage: Exploring the Effectiveness of Deep Learning for Bug Triaging
Title | DeepTriage: Exploring the Effectiveness of Deep Learning for Bug Triaging |
Authors | Senthil Mani, Anush Sankaran, Rahul Aralikatte |
Abstract | For a given software bug report, identifying an appropriate developer who could potentially fix the bug is the primary task of a bug triaging process. A bug title (summary) and a detailed description are present in most bug tracking systems. Automatic bug triaging can be formulated as a classification problem, with the bug title and description as the input, mapping it to one of the available developers (classes). The major challenge is that the bug description usually contains a combination of free unstructured text, code snippets, and stack traces, making the input data noisy. The existing bag-of-words (BOW) feature models do not consider the syntactical and sequential word information available in the unstructured text. We propose a novel bug report representation algorithm using an attention based deep bidirectional recurrent neural network (DBRNN-A) model that learns syntactic and semantic features from long word sequences in an unsupervised manner. Instead of BOW features, the DBRNN-A based bug representation is then used for training the classifier. Using an attention mechanism enables the model to learn the context representation over a long word sequence, as in a bug report. To provide a large amount of data for training the feature learning model, the unfixed bug reports (~70% of bugs in an open source bug tracking system) are leveraged, which were completely ignored in previous studies. Another contribution is to make this research reproducible by making the source code available and creating a public benchmark dataset of bug reports from three open source bug tracking systems: Google Chromium (383,104 bug reports), Mozilla Core (314,388 bug reports), and Mozilla Firefox (162,307 bug reports). Experimentally, we compare our approach with the BOW model and machine learning approaches and observe that DBRNN-A provides a higher rank-10 average accuracy. |
Tasks | |
Published | 2018-01-04 |
URL | http://arxiv.org/abs/1801.01275v1 |
http://arxiv.org/pdf/1801.01275v1.pdf | |
PWC | https://paperswithcode.com/paper/deeptriage-exploring-the-effectiveness-of |
Repo | |
Framework | |
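A minimal PyTorch sketch of an attention-based deep bidirectional RNN classifier in the spirit of DBRNN-A follows; the layer sizes, the form of the attention, and all names are assumptions rather than the authors' released code.

```python
# Hedged sketch: bidirectional LSTM + attention pooling -> developer classifier.
import torch
import torch.nn as nn

class DBRNNA(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden=256, n_developers=100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.rnn = nn.LSTM(embed_dim, hidden, num_layers=2,
                           bidirectional=True, batch_first=True)
        self.attn = nn.Linear(2 * hidden, 1)       # scores each time step
        self.classify = nn.Linear(2 * hidden, n_developers)

    def forward(self, token_ids):                  # (batch, seq_len)
        h, _ = self.rnn(self.embed(token_ids))     # (batch, seq_len, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)     # attention over the long sequence
        context = (w * h).sum(dim=1)               # weighted sum -> bug representation
        return self.classify(context)              # logits over developers

model = DBRNNA(vocab_size=50_000)
logits = model(torch.randint(1, 50_000, (4, 200)))  # 4 bug reports, 200 tokens each
print(logits.shape)  # torch.Size([4, 100])
```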
Disentangling Latent Factors of Variational Auto-Encoder with Whitening
Title | Disentangling Latent Factors of Variational Auto-Encoder with Whitening |
Authors | Sangchul Hahn, Heeyoul Choi |
Abstract | After deep generative models were successfully applied to image generation tasks, learning disentangled latent variables of data has become a crucial part of deep generative model research. Many models have been proposed to learn an interpretable and factorized representation of the latent variables by modifying their objective function or model architecture. To disentangle the latent variables, some models sacrifice the quality of reconstructed images, while others increase model complexity, making them hard to train. In this paper, we propose a simple disentangling method based on a traditional whitening process. The proposed method is applied to the latent variables of a variational auto-encoder (VAE), although it can be applied to any generative model with latent variables. In experiments, we apply the proposed method to simple VAE models, and the results confirm that our method finds more interpretable factors in the latent space while keeping the reconstruction error the same as the conventional VAE's. |
Tasks | Image Generation |
Published | 2018-11-08 |
URL | https://arxiv.org/abs/1811.03444v2 |
https://arxiv.org/pdf/1811.03444v2.pdf | |
PWC | https://paperswithcode.com/paper/disentangling-latent-factors-with-whitening |
Repo | |
Framework | |
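The whitening step itself is classical and easy to sketch; below is a hedged numpy example of ZCA whitening applied to a batch of latent codes. Applying it to VAE latent means, and the choice of ZCA over PCA whitening, are assumptions for illustration.

```python
# Hedged sketch: decorrelate latent codes so the factors become axis-aligned.
import numpy as np

def zca_whiten(z, eps=1e-5):
    """Return whitened codes with (approximately) identity covariance."""
    z_centered = z - z.mean(axis=0)
    cov = np.cov(z_centered, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)
    W = vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T  # ZCA whitening matrix
    return z_centered @ W

# Correlated 2-D "latent codes" standing in for VAE posterior means (assumption).
z = np.random.multivariate_normal([0, 0], [[2.0, 1.5], [1.5, 2.0]], size=1000)
z_w = zca_whiten(z)
print(np.cov(z_w, rowvar=False).round(2))  # ~identity: factors decorrelated
```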
Learning to Walk via Deep Reinforcement Learning
Title | Learning to Walk via Deep Reinforcement Learning |
Authors | Tuomas Haarnoja, Sehoon Ha, Aurick Zhou, Jie Tan, George Tucker, Sergey Levine |
Abstract | Deep reinforcement learning (deep RL) holds the promise of automating the acquisition of complex controllers that can map sensory inputs directly to low-level actions. In the domain of robotic locomotion, deep RL could enable learning locomotion skills with minimal engineering and without an explicit model of the robot dynamics. Unfortunately, applying deep RL to real-world robotic tasks is exceptionally difficult, primarily due to poor sample complexity and sensitivity to hyperparameters. While hyperparameters can be easily tuned in simulated domains, tuning may be prohibitively expensive on physical systems, such as legged robots, that can be damaged through extensive trial-and-error learning. In this paper, we propose a sample-efficient deep RL algorithm based on maximum entropy RL that requires minimal per-task tuning and only a modest number of trials to learn neural network policies. We apply this method to learning walking gaits on a real-world Minitaur robot. Our method can acquire a stable gait from scratch directly in the real world in about two hours, without relying on any model or simulation, and the resulting policy is robust to moderate variations in the environment. We further show that our algorithm achieves state-of-the-art performance on simulated benchmarks with a single set of hyperparameters. Videos of training and the learned policy can be found on the project website. |
Tasks | Legged Robots |
Published | 2018-12-26 |
URL | https://arxiv.org/abs/1812.11103v3 |
https://arxiv.org/pdf/1812.11103v3.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-walk-via-deep-reinforcement |
Repo | |
Framework | |
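As a hedged sketch of the maximum entropy RL machinery underlying such methods (in the style of soft actor-critic), the entropy-regularized bootstrap target can be written as follows; the temperature alpha, twin critics, and tensor shapes are illustrative assumptions rather than the paper's exact algorithm.

```python
# Hedged sketch of the soft (entropy-regularized) Q-learning target.
import torch

def soft_q_target(reward, next_q1, next_q2, next_logp,
                  alpha=0.2, gamma=0.99, done=None):
    """y = r + gamma * (min(Q1', Q2') - alpha * log pi(a'|s')).
    The entropy bonus (-alpha * log pi) encourages exploration, which is one
    reason maximum entropy RL is comparatively robust to hyperparameters."""
    next_v = torch.min(next_q1, next_q2) - alpha * next_logp
    mask = 1.0 if done is None else (1.0 - done)
    return reward + gamma * mask * next_v

y = soft_q_target(torch.tensor([1.0]), torch.tensor([5.0]),
                  torch.tensor([4.8]), torch.tensor([-1.3]))
print(y)  # reward plus discounted soft value of the next state
```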
Style Separation and Synthesis via Generative Adversarial Networks
Title | Style Separation and Synthesis via Generative Adversarial Networks |
Authors | Rui Zhang, Sheng Tang, Yu Li, Junbo Guo, Yongdong Zhang, Jintao Li, Shuicheng Yan |
Abstract | Style synthesis has attracted great interest recently, while few works focus on its dual problem, 'style separation'. In this paper, we propose the Style Separation and Synthesis Generative Adversarial Network (S3-GAN) to simultaneously perform style separation and style synthesis on object photographs of specific categories. Based on the assumption that the object photographs lie on a manifold, and that contents and styles are independent, we employ S3-GAN to build mappings between the manifold and a latent vector space for separating and synthesizing contents and styles. The S3-GAN consists of an encoder network, a generator network, and an adversarial network. The encoder network performs style separation by mapping an object photograph to a latent vector; two halves of the latent vector represent the content and style, respectively. The generator network performs style synthesis by taking a concatenated vector as input, containing the style half-vector of the style target image and the content half-vector of the content target image. Once images are obtained from the generator network, an adversarial network is imposed to make them more photo-realistic. Experiments on the CelebA and UT Zappos 50K datasets demonstrate that S3-GAN can perform style separation and synthesis simultaneously and can capture various styles in a single model. |
Tasks | |
Published | 2018-11-07 |
URL | http://arxiv.org/abs/1811.02740v1 |
http://arxiv.org/pdf/1811.02740v1.pdf | |
PWC | https://paperswithcode.com/paper/style-separation-and-synthesis-via-generative |
Repo | |
Framework | |
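The half-vector swap at the core of S3-GAN can be sketched as follows; the encoder and generator are stand-in modules and the 50/50 split point is an assumption for illustration.

```python
# Hedged sketch: swap the style half of one latent code with the content half
# of another, then decode the concatenation.
import torch
import torch.nn as nn

latent_dim = 128
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, latent_dim))
generator = nn.Sequential(nn.Linear(latent_dim, 3 * 64 * 64), nn.Tanh())

content_img = torch.rand(1, 3, 64, 64)   # provides the content half
style_img = torch.rand(1, 3, 64, 64)     # provides the style half

z_content = encoder(content_img)
z_style = encoder(style_img)
half = latent_dim // 2
z_mix = torch.cat([z_content[:, :half],        # content half of one image...
                   z_style[:, half:]], dim=1)  # ...style half of the other
synthesized = generator(z_mix).view(1, 3, 64, 64)
print(synthesized.shape)  # the adversarial network would then refine realism
```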
Two geometric input transformation methods for fast online reinforcement learning with neural nets
Title | Two geometric input transformation methods for fast online reinforcement learning with neural nets |
Authors | Sina Ghiassian, Huizhen Yu, Banafsheh Rafiee, Richard S. Sutton |
Abstract | We apply neural nets with ReLU gates in online reinforcement learning. Our goal is to train these networks in an incremental manner, without the computationally expensive experience replay. By studying how individual neural nodes behave in online training, we recognize that the global nature of ReLU gates can cause undesirable learning interference in each node's learning behavior. We propose reducing such interference with two efficient input transformation methods that are geometric in nature and match the geometric properties of ReLU gates well. The first is tile coding, a classic binary encoding scheme originally designed for local generalization based on the topological structure of the input space. The second (EmECS) is a new method we introduce; it is based on geometric properties of convex sets and topological embedding of the input space into the boundary of a convex set. We discuss the behavior of the network when it operates on the transformed inputs. We also compare it experimentally with neural nets that do not use these input transformations, and with the classic algorithm of tile coding plus a linear function approximator; on several online reinforcement learning tasks, we show that a neural net with tile coding or EmECS can achieve not only faster learning but also more accurate approximations. Our results strongly suggest that geometric input transformations of this type can be effective for interference reduction and take us a step closer to fully incremental reinforcement learning with neural nets. |
Tasks | |
Published | 2018-05-18 |
URL | http://arxiv.org/abs/1805.07476v2 |
http://arxiv.org/pdf/1805.07476v2.pdf | |
PWC | https://paperswithcode.com/paper/two-geometric-input-transformation-methods |
Repo | |
Framework | |
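Tile coding, the first of the two transformations, can be sketched directly; the tiling counts, input range, and offset scheme below are illustrative choices rather than the paper's exact configuration.

```python
# Hedged sketch of tile coding for a scalar input: several overlapping,
# offset tilings each contribute one active binary feature.
import numpy as np

def tile_code(x, lo=0.0, hi=1.0, n_tilings=8, tiles_per_tiling=10):
    """Map a scalar to a sparse binary vector: one active tile per tiling,
    with each tiling staggered by a fraction of the tile width."""
    width = (hi - lo) / tiles_per_tiling
    features = np.zeros(n_tilings * tiles_per_tiling)
    for k in range(n_tilings):
        offset = k * width / n_tilings            # stagger the tilings
        idx = int((x - lo + offset) / width)
        idx = min(max(idx, 0), tiles_per_tiling - 1)
        features[k * tiles_per_tiling + idx] = 1.0
    return features

phi = tile_code(0.37)
print(phi.nonzero()[0])  # 8 active tiles, one per tiling: local generalization
```

Because only nearby inputs share active tiles, an update at one input barely disturbs the function elsewhere, which is exactly the interference reduction the paper is after.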
Speech and Speaker Recognition from Raw Waveform with SincNet
Title | Speech and Speaker Recognition from Raw Waveform with SincNet |
Authors | Mirco Ravanelli, Yoshua Bengio |
Abstract | Deep neural networks can learn complex and abstract representations that are progressively obtained by combining simpler ones. A recent trend in speech and speaker recognition consists of discovering these representations directly from raw audio samples. Unlike standard hand-crafted features such as MFCCs or FBANK, the raw waveform can potentially help neural networks discover better and more customized representations. The high-dimensional raw inputs, however, can make training significantly more challenging. This paper summarizes our recent efforts to develop a neural architecture that efficiently processes speech from audio waveforms. In particular, we propose SincNet, a novel Convolutional Neural Network (CNN) that encourages the first layer to discover meaningful filters by exploiting parametrized sinc functions. In contrast to standard CNNs, which learn all the elements of each filter, only the low and high cutoff frequencies of band-pass filters are directly learned from data. This inductive bias offers a very compact way to derive a customized front-end that depends only on a few parameters with a clear physical meaning. Our experiments, conducted on both speaker and speech recognition, show that the proposed architecture converges faster, performs better, and is more computationally efficient than standard CNNs. |
Tasks | Speaker Recognition, Speech Recognition |
Published | 2018-12-13 |
URL | http://arxiv.org/abs/1812.05920v1 |
http://arxiv.org/pdf/1812.05920v1.pdf | |
PWC | https://paperswithcode.com/paper/speech-and-speaker-recognition-from-raw |
Repo | |
Framework | |
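The parametrized sinc filter is compact enough to sketch in numpy: a band-pass kernel is the difference of two low-pass sinc filters, fully determined by two cutoff frequencies (learnable in SincNet, fixed here). The kernel size and window are illustrative assumptions.

```python
# Hedged sketch of a SincNet-style first-layer filter: 2 parameters per filter.
import numpy as np

def sinc_bandpass(f1, f2, kernel_size=251, fs=16000):
    """Band-pass = difference of two low-pass sinc filters with cutoffs f1 < f2 (Hz)."""
    n = np.arange(kernel_size) - (kernel_size - 1) / 2
    f1, f2 = f1 / fs, f2 / fs                        # normalize by the sampling rate
    h = 2 * f2 * np.sinc(2 * f2 * n) - 2 * f1 * np.sinc(2 * f1 * n)
    return h * np.hamming(kernel_size)               # window to reduce ripple

kernel = sinc_bandpass(300.0, 3400.0)   # e.g. a telephone-band filter
print(kernel.shape)  # (251,) -- yet only 2 learnable parameters instead of 251
```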
Large-scale Distance Metric Learning with Uncertainty
Title | Large-scale Distance Metric Learning with Uncertainty |
Authors | Qi Qian, Jiasheng Tang, Hao Li, Shenghuo Zhu, Rong Jin |
Abstract | Distance metric learning (DML) has been studied extensively in the past decades for its superior performance with distance-based algorithms. Most existing methods propose to learn a distance metric with pairwise or triplet constraints. However, the number of constraints is quadratic or even cubic in the number of original examples, which makes it challenging for DML to handle large-scale data sets. Besides, real-world data may contain various kinds of uncertainty, especially image data. The uncertainty can mislead the learning procedure and cause performance degradation. By investigating image data, we find that the original data can be viewed as observations of a small set of clean latent examples under different distortions. In this work, we propose a margin preserving metric learning framework to learn the distance metric and latent examples simultaneously. By leveraging the ideal properties of latent examples, training efficiency can be improved significantly while the learned metric also becomes robust to the uncertainty in the original data. Furthermore, we show that although the metric is learned from latent examples only, it preserves the large margin property even for the original data. An empirical study on benchmark image data sets demonstrates the efficacy and efficiency of the proposed method. |
Tasks | Metric Learning |
Published | 2018-05-25 |
URL | http://arxiv.org/abs/1805.10384v1 |
http://arxiv.org/pdf/1805.10384v1.pdf | |
PWC | https://paperswithcode.com/paper/large-scale-distance-metric-learning-with |
Repo | |
Framework | |
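For context, a generic sketch of the triplet-constrained Mahalanobis metric learning setting the paper starts from is shown below; the hinge loss is the standard formulation, not the authors' margin-preserving objective over latent examples.

```python
# Hedged sketch of standard triplet-based DML: learn M = L^T L so that
# d_M(anchor, positive) + margin <= d_M(anchor, negative).
import torch

def triplet_metric_loss(L, anchor, positive, negative, margin=1.0):
    d_pos = ((anchor - positive) @ L.T).pow(2).sum(dim=1)
    d_neg = ((anchor - negative) @ L.T).pow(2).sum(dim=1)
    return torch.relu(d_pos - d_neg + margin).mean()

dim = 16
L = torch.randn(dim, dim, requires_grad=True)
a, p, n = (torch.randn(32, dim) for _ in range(3))
loss = triplet_metric_loss(L, a, p, n)
loss.backward()  # with O(N^3) possible triplets, working over a small set of
print(loss.item())  # latent examples instead is what makes training tractable
```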
Analysis and Interpretation of Deep CNN Representations as Perceptual Quality Features
Title | Analysis and Interpretation of Deep CNN Representations as Perceptual Quality Features |
Authors | Taimoor Tariq, Munchurl Kim |
Abstract | Pre-trained Deep Convolutional Neural Network (CNN) features have popularly been used as full-reference perceptual quality features for CNN based image quality assessment, super-resolution, image restoration, and a variety of image-to-image translation problems. In this paper, to gain more insight, we link basic human visual perception to characteristics of learned deep CNN representations, as a first attempt to interpret them. We characterize the frequency and orientation tuning of channels in trained object detection deep CNNs (e.g., VGG-16) by applying grating stimuli of different spatial frequencies and orientations as input. We observe that the behavior of CNN channels as spatial frequency and orientation selective filters can be used to link basic human visual perception models to their characteristics. In doing so, we develop a theory that gives more insight into deep CNN representations as perceptual quality features. We conclude that sensitivity to spatial frequencies that have lower contrast masking thresholds in human visual perception, together with definite and strong orientation selectivity, are important attributes of deep CNN channels that deliver better perceptual quality features. |
Tasks | Image Quality Assessment, Image Restoration, Image-to-Image Translation, Object Detection, Super-Resolution |
Published | 2018-12-02 |
URL | https://arxiv.org/abs/1812.00412v3 |
https://arxiv.org/pdf/1812.00412v3.pdf | |
PWC | https://paperswithcode.com/paper/a-psychovisual-analysis-on-deep-cnn-features |
Repo | |
Framework | |
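The grating-probe methodology can be sketched with torchvision's VGG-16; the frequency grid, the probed layer, and the use of mean channel activation as the response are assumptions (and pretrained weights should be loaded in practice — `weights=None` below merely avoids a download, so the responses here are from a random network).

```python
# Hedged sketch: probe channel tuning with sinusoidal grating stimuli.
import torch
import numpy as np
from torchvision.models import vgg16

def grating(freq_cpp, theta, size=224):
    """Sinusoidal grating: frequency in cycles/pixel, orientation theta in radians."""
    y, x = np.mgrid[0:size, 0:size]
    wave = np.sin(2 * np.pi * freq_cpp * (x * np.cos(theta) + y * np.sin(theta)))
    img = np.repeat(wave[None], 3, axis=0).astype(np.float32)
    return torch.from_numpy(img)[None]              # (1, 3, H, W)

model = vgg16(weights=None).features[:5].eval()     # up to an early conv block
with torch.no_grad():
    for f in (0.02, 0.05, 0.1, 0.2):
        resp = model(grating(f, theta=np.pi / 4))   # mean activation per channel
        print(f"freq {f}: strongest channel {resp.mean(dim=(2, 3)).argmax().item()}")
```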
A deep neural network to enhance prediction of 1-year mortality using echocardiographic videos of the heart
Title | A deep neural network to enhance prediction of 1-year mortality using echocardiographic videos of the heart |
Authors | Alvaro Ulloa, Linyuan Jing, Christopher W Good, David P vanMaanen, Sushravya Raghunath, Jonathan D Suever, Christopher D Nevius, Gregory J Wehner, Dustin Hartzel, Joseph B Leader, Amro Alsaid, Aalpen A Patel, H Lester Kirchner, Marios S Pattichis, Christopher M Haggerty, Brandon K Fornwalt |
Abstract | Predicting future clinical events helps physicians guide appropriate intervention. Machine learning has tremendous promise to assist physicians with predictions based on the discovery of complex patterns from historical data, such as large, longitudinal electronic health records (EHR). This study is a first attempt to demonstrate such capabilities using raw echocardiographic videos of the heart. We show that a large dataset of 723,754 clinically-acquired echocardiographic videos (~45 million images) linked to longitudinal follow-up data in 27,028 patients can be used to train a deep neural network to predict 1-year mortality with good accuracy (area under the curve (AUC) in an independent test set = 0.839). Prediction accuracy was further improved by adding EHR data (AUC = 0.858). Finally, we demonstrate that the trained neural network was more accurate in mortality prediction than two expert cardiologists. These results highlight the potential of neural networks to add new power to clinical predictions. |
Tasks | Mortality Prediction |
Published | 2018-11-26 |
URL | https://arxiv.org/abs/1811.10553v2 |
https://arxiv.org/pdf/1811.10553v2.pdf | |
PWC | https://paperswithcode.com/paper/a-deep-neural-network-predicts-survival-after |
Repo | |
Framework | |
Unified Graph based Multi-Cue Feature Fusion for Robust Visual Tracking
Title | Unified Graph based Multi-Cue Feature Fusion for Robust Visual Tracking |
Authors | Kapil Sharma, Himanshu Ahuja, Ashish Kumar, Nipun Bansal, Gurjit Singh Walia |
Abstract | Visual tracking is a complex problem due to unconstrained appearance variations and dynamic environments. Extracting complementary information from the object environment via multiple features, and adapting to the target's appearance variations, are the key problems addressed in this work. To this end, we propose a robust object tracking framework based on Unified Graph Fusion (UGF) of multiple cues to adapt to the object's appearance. The proposed cross-diffusion of sparse and dense features not only suppresses the individual feature deficiencies but also extracts the complementary information from the multiple cues. This iterative process builds robust unified features which are invariant to object deformations, fast motion, and occlusion. The robustness of the unified feature also enables the random forest classifier to precisely distinguish the foreground from the background, adding resilience to background clutter. In addition, we present a novel kernel-based adaptation strategy using outlier detection and a transductive reliability metric. |
Tasks | Object Tracking, Outlier Detection, Visual Tracking |
Published | 2018-12-16 |
URL | https://arxiv.org/abs/1812.06407v3 |
https://arxiv.org/pdf/1812.06407v3.pdf | |
PWC | https://paperswithcode.com/paper/unified-graph-based-multi-cue-feature-fusion |
Repo | |
Framework | |
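A hedged sketch of cross-diffusing two cue affinity matrices, the general mechanism behind the unified fusion above, is given below; the normalization, the absence of kNN sparsification, and the iteration count are assumptions, not the paper's exact scheme.

```python
# Hedged sketch: each cue's affinities are diffused through the other cue's
# graph, so agreement between cues is reinforced and cue-specific noise decays.
import numpy as np

def row_normalize(W):
    return W / W.sum(axis=1, keepdims=True)

def cross_diffuse(W1, W2, iters=10):
    S1, S2 = row_normalize(W1), row_normalize(W2)   # fixed transition kernels
    P1, P2 = S1.copy(), S2.copy()
    for _ in range(iters):
        P1, P2 = S1 @ P2 @ S1.T, S2 @ P1 @ S2.T     # diffuse each cue via the other
    return (P1 + P2) / 2                            # unified affinity over both cues

rng = np.random.default_rng(0)
A, B = rng.random((5, 5)), rng.random((5, 5))
W1, W2 = A @ A.T, B @ B.T                           # two symmetric cue affinities
print(cross_diffuse(W1, W2).round(3))
```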
Unsupervised Learning of Neural Networks to Explain Neural Networks
Title | Unsupervised Learning of Neural Networks to Explain Neural Networks |
Authors | Quanshi Zhang, Yu Yang, Yuchen Liu, Ying Nian Wu, Song-Chun Zhu |
Abstract | This paper presents an unsupervised method to learn a neural network, namely an explainer, to interpret a pre-trained convolutional neural network (CNN), i.e., explaining knowledge representations hidden in middle conv-layers of the CNN. Given feature maps of a certain conv-layer of the CNN, the explainer performs like an auto-encoder, which first disentangles the feature maps into object-part features and then inverts object-part features back to features of higher conv-layers of the CNN. More specifically, the explainer contains interpretable conv-layers, where each filter disentangles the representation of a specific object part from chaotic input feature maps. As a paraphrase of CNN features, the disentangled representations of object parts help people understand the logic inside the CNN. We also learn the explainer to use object-part features to reconstruct features of higher CNN layers, in order to minimize loss of information during the feature disentanglement. More crucially, we learn the explainer via network distillation without using any annotations of sample labels, object parts, or textures for supervision. We have applied our method to different types of CNNs for evaluation, and explainers have significantly boosted the interpretability of CNN features. |
Tasks | |
Published | 2018-05-18 |
URL | http://arxiv.org/abs/1805.07468v1 |
http://arxiv.org/pdf/1805.07468v1.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-learning-of-neural-networks-to-1 |
Repo | |
Framework | |
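A schematic sketch of the explainer idea follows: an auto-encoder over a frozen CNN's mid-layer feature maps, trained by distillation with no labels. The layer shapes, number of parts, and plain MSE loss are assumptions for illustration, not the paper's interpretable-filter losses.

```python
# Hedged sketch: disentangle conv feature maps into part features, then invert
# them back toward CNN features to minimize information loss (distillation).
import torch
import torch.nn as nn

class Explainer(nn.Module):
    def __init__(self, channels=256, parts=16):
        super().__init__()
        # "interpretable" conv-layer: one map per candidate object part
        self.encode = nn.Conv2d(channels, parts, kernel_size=3, padding=1)
        # invert part features back toward features of the frozen CNN
        self.decode = nn.Conv2d(parts, channels, kernel_size=3, padding=1)

    def forward(self, feat):
        part_maps = torch.relu(self.encode(feat))
        return self.decode(part_maps), part_maps

explainer = Explainer()
feat = torch.randn(2, 256, 28, 28)           # mid-layer features of a frozen CNN
recon, parts = explainer(feat)
loss = nn.functional.mse_loss(recon, feat)   # no sample labels or part annotations
loss.backward()
print(parts.shape)  # (2, 16, 28, 28): one disentangled map per part
```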
Multiple Object Tracking in Urban Traffic Scenes with a Multiclass Object Detector
Title | Multiple Object Tracking in Urban Traffic Scenes with a Multiclass Object Detector |
Authors | Hui-Lee Ooi, Guillaume-Alexandre Bilodeau, Nicolas Saunier, David-Alexandre Beaupré |
Abstract | Multiple object tracking (MOT) in urban traffic aims to produce the trajectories of the different road users that move across the field of view with different directions and speeds and that can have varying appearances and sizes. Occlusions and interactions among the different objects are expected and common due to the nature of urban road traffic. In this work, a tracking framework employing classification label information from a deep learning detection approach is used for associating the different objects, in addition to object position and appearance. We investigate the performance of a modern multiclass object detector for the MOT task in traffic scenes. Results show that the object labels improve tracking performance, but that the outputs of object detectors are not always reliable. |
Tasks | Multiple Object Tracking, Object Tracking |
Published | 2018-09-06 |
URL | http://arxiv.org/abs/1809.02073v1 |
http://arxiv.org/pdf/1809.02073v1.pdf | |
PWC | https://paperswithcode.com/paper/multiple-object-tracking-in-urban-traffic |
Repo | |
Framework | |
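Folding detector class labels into the association cost, alongside position and appearance, can be sketched as follows; the weights and the Hungarian assignment are standard choices, not the paper's exact values.

```python
# Hedged sketch: track-detection association cost with a class-label term.
import numpy as np
from scipy.optimize import linear_sum_assignment

def association_cost(tracks, detections, w_pos=1.0, w_app=1.0, w_label=2.0):
    C = np.zeros((len(tracks), len(detections)))
    for i, t in enumerate(tracks):
        for j, d in enumerate(detections):
            C[i, j] = (w_pos * np.linalg.norm(t["pos"] - d["pos"])
                       + w_app * np.linalg.norm(t["app"] - d["app"])
                       + w_label * (t["label"] != d["label"]))  # penalize class switch
    return C

tracks = [{"pos": np.array([10., 20.]), "app": np.zeros(8), "label": "car"}]
dets = [{"pos": np.array([12., 21.]), "app": np.zeros(8), "label": "car"},
        {"pos": np.array([11., 20.]), "app": np.ones(8), "label": "pedestrian"}]
rows, cols = linear_sum_assignment(association_cost(tracks, dets))
print(list(zip(rows, cols)))  # the car detection wins despite the larger offset
```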
Learning The Sequential Temporal Information with Recurrent Neural Networks
Title | Learning The Sequential Temporal Information with Recurrent Neural Networks |
Authors | Pushparaja Murugan |
Abstract | Recurrent networks are among the most powerful and promising artificial neural network algorithms for processing sequential data such as natural language, sound, and time series. Unlike a traditional feed-forward network, a recurrent network has an inherent feedback loop that allows it to store temporal context information and pass state along the entire sequence of events. This helps achieve state-of-the-art performance in many important tasks such as language modeling, stock market prediction, image captioning, speech recognition, machine translation, and object tracking. However, training a fully connected RNN and managing gradient flow are complicated processes, and many studies have been carried out to address these limitations. This article aims to provide brief details on recurrent neurons, their variants, and tips and tricks for training fully recurrent neural networks. This review was carried out as part of the 'Multiple Object Tracking' module of our IPO studio software. |
Tasks | Image Captioning, Language Modelling, Machine Translation, Multiple Object Tracking, Object Tracking, Speech Recognition, Stock Market Prediction, Time Series |
Published | 2018-07-08 |
URL | http://arxiv.org/abs/1807.02857v1 |
http://arxiv.org/pdf/1807.02857v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-the-sequential-temporal-information |
Repo | |
Framework | |
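The recurrent neuron the review describes can be sketched in a few lines of numpy: the hidden state is the feedback loop that carries temporal context across the sequence. Dimensions and initialization below are illustrative.

```python
# Hedged sketch of a vanilla recurrent neuron unrolled over a sequence.
import numpy as np

def rnn_step(x_t, h_prev, Wxh, Whh, b):
    """One step of a vanilla RNN: h_t = tanh(Wxh x_t + Whh h_{t-1} + b)."""
    return np.tanh(Wxh @ x_t + Whh @ h_prev + b)

rng = np.random.default_rng(0)
input_dim, hidden_dim, seq_len = 4, 8, 5
Wxh = rng.normal(0, 0.1, (hidden_dim, input_dim))
Whh = rng.normal(0, 0.1, (hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)                # initial state
for t in range(seq_len):                # the feedback loop: h feeds back into itself
    h = rnn_step(rng.normal(size=input_dim), h, Wxh, Whh, b)
print(h.round(3))                       # final state summarizes the whole sequence
```

Repeated multiplication by Whh during backpropagation through time is what makes the gradient flow hard to manage, motivating the gated variants the article surveys.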
Mitigating Sybils in Federated Learning Poisoning
Title | Mitigating Sybils in Federated Learning Poisoning |
Authors | Clement Fung, Chris J. M. Yoon, Ivan Beschastnikh |
Abstract | Machine learning (ML) over distributed multi-party data is required for a variety of domains. Existing approaches, such as federated learning, collect the outputs computed by a group of devices at a central aggregator and run iterative algorithms to train a globally shared model. Unfortunately, such approaches are susceptible to a variety of attacks, including model poisoning, which is made substantially worse in the presence of sybils. In this paper we first evaluate the vulnerability of federated learning to sybil-based poisoning attacks. We then describe \emph{FoolsGold}, a novel defense to this problem that identifies poisoning sybils based on the diversity of client updates in the distributed learning process. Unlike prior work, our system does not bound the expected number of attackers, requires no auxiliary information outside of the learning process, and makes fewer assumptions about clients and their data. In our evaluation we show that FoolsGold exceeds the capabilities of existing state of the art approaches to countering sybil-based label-flipping and backdoor poisoning attacks. Our results hold for different distributions of client data, varying poisoning targets, and various sybil strategies. Code can be found at: https://github.com/DistributedML/FoolsGold |
Tasks | |
Published | 2018-08-14 |
URL | https://arxiv.org/abs/1808.04866v4 |
https://arxiv.org/pdf/1808.04866v4.pdf | |
PWC | https://paperswithcode.com/paper/mitigating-sybils-in-federated-learning |
Repo | |
Framework | |
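A simplified sketch of the FoolsGold signal follows: sybils pushing a common poisoning objective produce unusually similar updates, so clients whose cumulative updates are too similar get their contributions downweighted. The pardoning and logit-rescaling steps of the full algorithm are omitted here for brevity; see the linked repository for the authors' implementation.

```python
# Hedged sketch: downweight clients by the cosine similarity of their
# cumulative (historical) updates.
import numpy as np

def foolsgold_weights(history):
    """history: (n_clients, dim) cumulative update per client."""
    norm = history / np.linalg.norm(history, axis=1, keepdims=True)
    cs = norm @ norm.T
    np.fill_diagonal(cs, -np.inf)
    max_sim = cs.max(axis=1)            # how similar is each client's closest peer?
    w = np.clip(1.0 - max_sim, 0.0, 1.0)
    return w / w.max()                  # honest, diverse clients keep weight ~1

rng = np.random.default_rng(1)
honest = rng.normal(size=(3, 10))
sybil = rng.normal(size=10)             # shared poisoning direction
updates = np.vstack([honest, sybil + 0.01 * rng.normal(size=(2, 10))])  # 2 sybils
print(foolsgold_weights(updates).round(2))  # the two colluding sybils -> near zero
```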