Paper Group ANR 1017
The Newton Scheme for Deep Learning. DeepTriage: Exploring the Effectiveness of Deep Learning for Bug Triaging. Disentangling Latent Factors of Variational Auto-Encoder with Whitening. Learning to Walk via Deep Reinforcement Learning. Style Separation and Synthesis via Generative Adversarial Networks. Two geometric input transformation methods for …
The Newton Scheme for Deep Learning
Title | The Newton Scheme for Deep Learning |
Authors | Junqing Qiu, Guoren Zhong, Yihua Lu, Kun Xin, Huihuan Qian, Xi Zhu |
Abstract | We introduce a neural network (NN) strictly governed by Newton's laws, with the required basis functions derived from fundamental classical mechanics. By recasting training as a quick procedure of 'force pattern' recognition, we develop the Newton-physics-based NS scheme. Once the force pattern is confirmed, the network simply checks 'pattern stability' instead of performing continuous, computationally expensive big-data-driven fitting. Within a given physical law system, once the field is confirmed, the mathematical bases describing the force field are not divergent but denumerable, which spares the function representation from searching an inexhaustible space of available bases. In this work, we embed Newton's laws into deep learning and propose the Newton Scheme (NS). Under NS, the user first identifies the path pattern, such as constant-acceleration motion. Object recognition technology first loads mass information; NS then finds the matching physical pattern and describes and predicts the trajectory of the movement with nearly zero error. We compare NS against TCN, GRU, and other physics-inspired 'FIND-PDE' methods to demonstrate fundamental and extended applications: free fall, pendulums, and curving soccer balls. The NS methodology opens further opportunities for future advances in deep learning. |
Tasks | Object Recognition |
Published | 2018-10-16 |
URL | http://arxiv.org/abs/1810.07550v1 |
http://arxiv.org/pdf/1810.07550v1.pdf | |
PWC | https://paperswithcode.com/paper/the-newton-scheme-for-deep-learning |
Repo | |
Framework | |
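The 'force pattern' idea above can be illustrated with a hedged sketch: under a constant-acceleration pattern the trajectory basis is {1, t, t²}, so confirming the pattern reduces to a small least-squares fit followed by a residual ('pattern stability') check. The basis set, threshold, and function names below are illustrative assumptions, not the paper's implementation.

```python
# A minimal sketch of Newtonian "force pattern" fitting for a 1-D free fall;
# basis, threshold, and names are assumptions, not the authors' code.
import numpy as np

def fit_constant_acceleration(t, x):
    """Fit x(t) = x0 + v0*t + 0.5*a*t**2 by least squares; return coefficients and RMSE."""
    A = np.stack([np.ones_like(t), t, 0.5 * t**2], axis=1)
    coef, *_ = np.linalg.lstsq(A, x, rcond=None)
    rmse = np.sqrt(np.mean((A @ coef - x) ** 2))
    return coef, rmse

# Noisy free-fall observations: x(t) = 100 - 0.5 * 9.81 * t^2
t = np.linspace(0.0, 3.0, 30)
x = 100.0 - 0.5 * 9.81 * t**2 + np.random.normal(0.0, 0.01, t.shape)

(x0, v0, a), rmse = fit_constant_acceleration(t, x)
if rmse < 0.1:  # "pattern stability" check: the Newtonian basis explains the data
    t_f = 4.0
    print(f"a = {a:.2f} m/s^2, predicted x({t_f}) = {x0 + v0*t_f + 0.5*a*t_f**2:.2f}")
```

Once the pattern is confirmed, prediction is closed-form, which is where the near-zero error claimed for free fall comes from.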
DeepTriage: Exploring the Effectiveness of Deep Learning for Bug Triaging
Title | DeepTriage: Exploring the Effectiveness of Deep Learning for Bug Triaging |
Authors | Senthil Mani, Anush Sankaran, Rahul Aralikatte |
Abstract | For a given software bug report, identifying an appropriate developer who could potentially fix the bug is the primary task of a bug triaging process. A bug title (summary) and a detailed description are present in most bug tracking systems. Automatic bug triaging can be formulated as a classification problem, with the bug title and description as the input, mapping it to one of the available developers (classes). The major challenge is that the bug description usually contains a combination of free unstructured text, code snippets, and stack traces, making the input data noisy. The existing bag-of-words (BOW) feature models do not consider the syntactical and sequential word information available in the unstructured text. We propose a novel bug report representation algorithm using an attention based deep bidirectional recurrent neural network (DBRNN-A) model that learns syntactic and semantic features from long word sequences in an unsupervised manner. Instead of BOW features, the DBRNN-A based bug representation is then used for training the classifier. Using an attention mechanism enables the model to learn the context representation over a long word sequence, as in a bug report. To provide a large amount of data for training the feature learning model, the unfixed bug reports (~70% of bugs in an open source bug tracking system) are leveraged, which were completely ignored in previous studies. Another contribution is to make this research reproducible by making the source code available and creating a public benchmark dataset of bug reports from three open source bug tracking systems: Google Chromium (383,104 bug reports), Mozilla Core (314,388 bug reports), and Mozilla Firefox (162,307 bug reports). Experimentally, we compare our approach with the BOW model and machine learning approaches and observe that DBRNN-A provides a higher rank-10 average accuracy. |
Tasks | |
Published | 2018-01-04 |
URL | http://arxiv.org/abs/1801.01275v1 |
http://arxiv.org/pdf/1801.01275v1.pdf | |
PWC | https://paperswithcode.com/paper/deeptriage-exploring-the-effectiveness-of |
Repo | |
Framework | |
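A minimal PyTorch sketch of an attention-based deep bidirectional RNN classifier in the spirit of DBRNN-A follows; the layer sizes, the form of the attention, and all names are assumptions rather than the authors' released code.

```python
# Hedged sketch: bidirectional LSTM + attention pooling -> developer classifier.
import torch
import torch.nn as nn

class DBRNNA(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden=256, n_developers=100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.rnn = nn.LSTM(embed_dim, hidden, num_layers=2,
                           bidirectional=True, batch_first=True)
        self.attn = nn.Linear(2 * hidden, 1)       # scores each time step
        self.classify = nn.Linear(2 * hidden, n_developers)

    def forward(self, token_ids):                  # (batch, seq_len)
        h, _ = self.rnn(self.embed(token_ids))     # (batch, seq_len, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)     # attention over the long sequence
        context = (w * h).sum(dim=1)               # weighted sum -> bug representation
        return self.classify(context)              # logits over developers

model = DBRNNA(vocab_size=50_000)
logits = model(torch.randint(1, 50_000, (4, 200)))  # 4 bug reports, 200 tokens each
print(logits.shape)  # torch.Size([4, 100])
```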
Disentangling Latent Factors of Variational Auto-Encoder with Whitening
Title | Disentangling Latent Factors of Variational Auto-Encoder with Whitening |
Authors | Sangchul Hahn, Heeyoul Choi |
Abstract | After deep generative models were successfully applied to image generation tasks, learning disentangled latent variables of data has become a crucial part of deep generative model research. Many models have been proposed to learn an interpretable and factorized representation of the latent variables by modifying their objective function or model architecture. To disentangle the latent variables, some models sacrifice the quality of reconstructed images, while others increase model complexity, making them hard to train. In this paper, we propose a simple disentangling method based on a traditional whitening process. The proposed method is applied to the latent variables of a variational auto-encoder (VAE), although it can be applied to any generative model with latent variables. In experiments, we apply the proposed method to simple VAE models, and the results confirm that our method finds more interpretable factors in the latent space while keeping the reconstruction error the same as the conventional VAE's. |
Tasks | Image Generation |
Published | 2018-11-08 |
URL | https://arxiv.org/abs/1811.03444v2 |
https://arxiv.org/pdf/1811.03444v2.pdf | |
PWC | https://paperswithcode.com/paper/disentangling-latent-factors-with-whitening |
Repo | |
Framework | |
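The whitening step itself is classical and easy to sketch; below is a hedged numpy example of ZCA whitening applied to a batch of latent codes. Applying it to VAE latent means, and the choice of ZCA over PCA whitening, are assumptions for illustration.

```python
# Hedged sketch: decorrelate latent codes so the factors become axis-aligned.
import numpy as np

def zca_whiten(z, eps=1e-5):
    """Return whitened codes with (approximately) identity covariance."""
    z_centered = z - z.mean(axis=0)
    cov = np.cov(z_centered, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)
    W = vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T  # ZCA whitening matrix
    return z_centered @ W

# Correlated 2-D "latent codes" standing in for VAE posterior means (assumption).
z = np.random.multivariate_normal([0, 0], [[2.0, 1.5], [1.5, 2.0]], size=1000)
z_w = zca_whiten(z)
print(np.cov(z_w, rowvar=False).round(2))  # ~identity: factors decorrelated
```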
Learning to Walk via Deep Reinforcement Learning
Title | Learning to Walk via Deep Reinforcement Learning |
Authors | Tuomas Haarnoja, Sehoon Ha, Aurick Zhou, Jie Tan, George Tucker, Sergey Levine |
Abstract | Deep reinforcement learning (deep RL) holds the promise of automating the acquisition of complex controllers that can map sensory inputs directly to low-level actions. In the domain of robotic locomotion, deep RL could enable learning locomotion skills with minimal engineering and without an explicit model of the robot dynamics. Unfortunately, applying deep RL to real-world robotic tasks is exceptionally difficult, primarily due to poor sample complexity and sensitivity to hyperparameters. While hyperparameters can be easily tuned in simulated domains, tuning may be prohibitively expensive on physical systems, such as legged robots, that can be damaged through extensive trial-and-error learning. In this paper, we propose a sample-efficient deep RL algorithm based on maximum entropy RL that requires minimal per-task tuning and only a modest number of trials to learn neural network policies. We apply this method to learning walking gaits on a real-world Minitaur robot. Our method can acquire a stable gait from scratch directly in the real world in about two hours, without relying on any model or simulation, and the resulting policy is robust to moderate variations in the environment. We further show that our algorithm achieves state-of-the-art performance on simulated benchmarks with a single set of hyperparameters. Videos of training and the learned policy can be found on the project website. |
Tasks | Legged Robots |
Published | 2018-12-26 |
URL | https://arxiv.org/abs/1812.11103v3 |
https://arxiv.org/pdf/1812.11103v3.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-walk-via-deep-reinforcement |
Repo | |
Framework | |
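As a hedged sketch of the maximum entropy RL machinery underlying such methods (in the style of soft actor-critic), the entropy-regularized bootstrap target can be written as follows; the temperature alpha, twin critics, and tensor shapes are illustrative assumptions rather than the paper's exact algorithm.

```python
# Hedged sketch of the soft (entropy-regularized) Q-learning target.
import torch

def soft_q_target(reward, next_q1, next_q2, next_logp,
                  alpha=0.2, gamma=0.99, done=None):
    """y = r + gamma * (min(Q1', Q2') - alpha * log pi(a'|s')).
    The entropy bonus (-alpha * log pi) encourages exploration, which is one
    reason maximum entropy RL is comparatively robust to hyperparameters."""
    next_v = torch.min(next_q1, next_q2) - alpha * next_logp
    mask = 1.0 if done is None else (1.0 - done)
    return reward + gamma * mask * next_v

y = soft_q_target(torch.tensor([1.0]), torch.tensor([5.0]),
                  torch.tensor([4.8]), torch.tensor([-1.3]))
print(y)  # reward plus discounted soft value of the next state
```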
Style Separation and Synthesis via Generative Adversarial Networks
Title | Style Separation and Synthesis via Generative Adversarial Networks |
Authors | Rui Zhang, Sheng Tang, Yu Li, Junbo Guo, Yongdong Zhang, Jintao Li, Shuicheng Yan |
Abstract | Style synthesis has attracted great interest recently, while few works focus on its dual problem, 'style separation'. In this paper, we propose the Style Separation and Synthesis Generative Adversarial Network (S3-GAN) to simultaneously perform style separation and style synthesis on object photographs of specific categories. Based on the assumption that the object photographs lie on a manifold, and that contents and styles are independent, we employ S3-GAN to build mappings between the manifold and a latent vector space for separating and synthesizing contents and styles. The S3-GAN consists of an encoder network, a generator network, and an adversarial network. The encoder network performs style separation by mapping an object photograph to a latent vector; two halves of the latent vector represent the content and style, respectively. The generator network performs style synthesis by taking a concatenated vector as input, containing the style half-vector of the style target image and the content half-vector of the content target image. Once images are obtained from the generator network, an adversarial network is imposed to make them more photo-realistic. Experiments on the CelebA and UT Zappos 50K datasets demonstrate that S3-GAN can perform style separation and synthesis simultaneously and can capture various styles in a single model. |
Tasks | |
Published | 2018-11-07 |
URL | http://arxiv.org/abs/1811.02740v1 |
http://arxiv.org/pdf/1811.02740v1.pdf | |
PWC | https://paperswithcode.com/paper/style-separation-and-synthesis-via-generative |
Repo | |
Framework | |
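The half-vector swap at the core of S3-GAN can be sketched as follows; the encoder and generator are stand-in modules and the 50/50 split point is an assumption for illustration.

```python
# Hedged sketch: swap the style half of one latent code with the content half
# of another, then decode the concatenation.
import torch
import torch.nn as nn

latent_dim = 128
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, latent_dim))
generator = nn.Sequential(nn.Linear(latent_dim, 3 * 64 * 64), nn.Tanh())

content_img = torch.rand(1, 3, 64, 64)   # provides the content half
style_img = torch.rand(1, 3, 64, 64)     # provides the style half

z_content = encoder(content_img)
z_style = encoder(style_img)
half = latent_dim // 2
z_mix = torch.cat([z_content[:, :half],        # content half of one image...
                   z_style[:, half:]], dim=1)  # ...style half of the other
synthesized = generator(z_mix).view(1, 3, 64, 64)
print(synthesized.shape)  # the adversarial network would then refine realism
```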
Two geometric input transformation methods for fast online reinforcement learning with neural nets
Title | Two geometric input transformation methods for fast online reinforcement learning with neural nets |
Authors | Sina Ghiassian, Huizhen Yu, Banafsheh Rafiee, Richard S. Sutton |
Abstract | We apply neural nets with ReLU gates in online reinforcement learning. Our goal is to train these networks in an incremental manner, without the computationally expensive experience replay. By studying how individual neural nodes behave in online training, we recognize that the global nature of ReLU gates can cause undesirable learning interference in each node's learning behavior. We propose reducing such interference with two efficient input transformation methods that are geometric in nature and match the geometric properties of ReLU gates well. The first is tile coding, a classic binary encoding scheme originally designed for local generalization based on the topological structure of the input space. The second (EmECS) is a new method we introduce; it is based on geometric properties of convex sets and topological embedding of the input space into the boundary of a convex set. We discuss the behavior of the network when it operates on the transformed inputs. We also compare it experimentally with neural nets that do not use these input transformations, and with the classic algorithm of tile coding plus a linear function approximator; on several online reinforcement learning tasks, we show that a neural net with tile coding or EmECS can achieve not only faster learning but also more accurate approximations. Our results strongly suggest that geometric input transformations of this type can be effective for interference reduction and take us a step closer to fully incremental reinforcement learning with neural nets. |
Tasks | |
Published | 2018-05-18 |
URL | http://arxiv.org/abs/1805.07476v2 |
http://arxiv.org/pdf/1805.07476v2.pdf | |
PWC | https://paperswithcode.com/paper/two-geometric-input-transformation-methods |
Repo | |
Framework | |
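Tile coding, the first of the two transformations, can be sketched directly; the tiling counts, input range, and offset scheme below are illustrative choices rather than the paper's exact configuration.

```python
# Hedged sketch of tile coding for a scalar input: several overlapping,
# offset tilings each contribute one active binary feature.
import numpy as np

def tile_code(x, lo=0.0, hi=1.0, n_tilings=8, tiles_per_tiling=10):
    """Map a scalar to a sparse binary vector: one active tile per tiling,
    with each tiling staggered by a fraction of the tile width."""
    width = (hi - lo) / tiles_per_tiling
    features = np.zeros(n_tilings * tiles_per_tiling)
    for k in range(n_tilings):
        offset = k * width / n_tilings            # stagger the tilings
        idx = int((x - lo + offset) / width)
        idx = min(max(idx, 0), tiles_per_tiling - 1)
        features[k * tiles_per_tiling + idx] = 1.0
    return features

phi = tile_code(0.37)
print(phi.nonzero()[0])  # 8 active tiles, one per tiling: local generalization
```

Because only nearby inputs share active tiles, an update at one input barely disturbs the function elsewhere, which is exactly the interference reduction the paper is after.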
Speech and Speaker Recognition from Raw Waveform with SincNet
Title | Speech and Speaker Recognition from Raw Waveform with SincNet |
Authors | Mirco Ravanelli, Yoshua Bengio |
Abstract | Deep neural networks can learn complex and abstract representations that are progressively obtained by combining simpler ones. A recent trend in speech and speaker recognition consists of discovering these representations directly from raw audio samples. Unlike standard hand-crafted features such as MFCCs or FBANK, the raw waveform can potentially help neural networks discover better and more customized representations. The high-dimensional raw inputs, however, can make training significantly more challenging. This paper summarizes our recent efforts to develop a neural architecture that efficiently processes speech from audio waveforms. In particular, we propose SincNet, a novel Convolutional Neural Network (CNN) that encourages the first layer to discover meaningful filters by exploiting parametrized sinc functions. In contrast to standard CNNs, which learn all the elements of each filter, only the low and high cutoff frequencies of band-pass filters are directly learned from data. This inductive bias offers a very compact way to derive a customized front-end that depends only on a few parameters with a clear physical meaning. Our experiments, conducted on both speaker and speech recognition, show that the proposed architecture converges faster, performs better, and is more computationally efficient than standard CNNs. |
Tasks | Speaker Recognition, Speech Recognition |
Published | 2018-12-13 |
URL | http://arxiv.org/abs/1812.05920v1 |
http://arxiv.org/pdf/1812.05920v1.pdf | |
PWC | https://paperswithcode.com/paper/speech-and-speaker-recognition-from-raw |
Repo | |
Framework | |
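The parametrized sinc filter is compact enough to sketch in numpy: a band-pass kernel is the difference of two low-pass sinc filters, fully determined by two cutoff frequencies (learnable in SincNet, fixed here). The kernel size and window are illustrative assumptions.

```python
# Hedged sketch of a SincNet-style first-layer filter: 2 parameters per filter.
import numpy as np

def sinc_bandpass(f1, f2, kernel_size=251, fs=16000):
    """Band-pass = difference of two low-pass sinc filters with cutoffs f1 < f2 (Hz)."""
    n = np.arange(kernel_size) - (kernel_size - 1) / 2
    f1, f2 = f1 / fs, f2 / fs                        # normalize by the sampling rate
    h = 2 * f2 * np.sinc(2 * f2 * n) - 2 * f1 * np.sinc(2 * f1 * n)
    return h * np.hamming(kernel_size)               # window to reduce ripple

kernel = sinc_bandpass(300.0, 3400.0)   # e.g. a telephone-band filter
print(kernel.shape)  # (251,) -- yet only 2 learnable parameters instead of 251
```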
Large-scale Distance Metric Learning with Uncertainty
Title | Large-scale Distance Metric Learning with Uncertainty |
Authors | Qi Qian, Jiasheng Tang, Hao Li, Shenghuo Zhu, Rong Jin |
Abstract | Distance metric learning (DML) has been studied extensively in the past decades for its superior performance with distance-based algorithms. Most existing methods propose to learn a distance metric with pairwise or triplet constraints. However, the number of constraints is quadratic or even cubic in the number of original examples, which makes it challenging for DML to handle large-scale data sets. Besides, real-world data may contain various kinds of uncertainty, especially image data. The uncertainty can mislead the learning procedure and cause performance degradation. By investigating image data, we find that the original data can be viewed as observations of a small set of clean latent examples under different distortions. In this work, we propose a margin preserving metric learning framework to learn the distance metric and latent examples simultaneously. By leveraging the ideal properties of latent examples, training efficiency can be improved significantly while the learned metric also becomes robust to the uncertainty in the original data. Furthermore, we show that although the metric is learned from latent examples only, it preserves the large margin property even for the original data. An empirical study on benchmark image data sets demonstrates the efficacy and efficiency of the proposed method. |
Tasks | Metric Learning |
Published | 2018-05-25 |
URL | http://arxiv.org/abs/1805.10384v1 |
http://arxiv.org/pdf/1805.10384v1.pdf | |
PWC | https://paperswithcode.com/paper/large-scale-distance-metric-learning-with |
Repo | |
Framework | |
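For context, a generic sketch of the triplet-constrained Mahalanobis metric learning setting the paper starts from is shown below; the hinge loss is the standard formulation, not the authors' margin-preserving objective over latent examples.

```python
# Hedged sketch of standard triplet-based DML: learn M = L^T L so that
# d_M(anchor, positive) + margin <= d_M(anchor, negative).
import torch

def triplet_metric_loss(L, anchor, positive, negative, margin=1.0):
    d_pos = ((anchor - positive) @ L.T).pow(2).sum(dim=1)
    d_neg = ((anchor - negative) @ L.T).pow(2).sum(dim=1)
    return torch.relu(d_pos - d_neg + margin).mean()

dim = 16
L = torch.randn(dim, dim, requires_grad=True)
a, p, n = (torch.randn(32, dim) for _ in range(3))
loss = triplet_metric_loss(L, a, p, n)
loss.backward()  # with O(N^3) possible triplets, working over a small set of
print(loss.item())  # latent examples instead is what makes training tractable
```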
Analysis and Interpretation of Deep CNN Representations as Perceptual Quality Features
Title | Analysis and Interpretation of Deep CNN Representations as Perceptual Quality Features |
Authors | Taimoor Tariq, Munchurl Kim |
Abstract | Pre-trained Deep Convolutional Neural Network (CNN) features have popularly been used as full-reference perceptual quality features for CNN based image quality assessment, super-resolution, image restoration, and a variety of image-to-image translation problems. In this paper, to gain more insight, we link basic human visual perception to characteristics of learned deep CNN representations, as a first attempt to interpret them. We characterize the frequency and orientation tuning of channels in trained object detection deep CNNs (e.g., VGG-16) by applying grating stimuli of different spatial frequencies and orientations as input. We observe that the behavior of CNN channels as spatial frequency and orientation selective filters can be used to link basic human visual perception models to their characteristics. In doing so, we develop a theory that gives more insight into deep CNN representations as perceptual quality features. We conclude that sensitivity to spatial frequencies that have lower contrast masking thresholds in human visual perception, together with definite and strong orientation selectivity, are important attributes of deep CNN channels that deliver better perceptual quality features. |
Tasks | Image Quality Assessment, Image Restoration, Image-to-Image Translation, Object Detection, Super-Resolution |
Published | 2018-12-02 |
URL | https://arxiv.org/abs/1812.00412v3 |
https://arxiv.org/pdf/1812.00412v3.pdf | |
PWC | https://paperswithcode.com/paper/a-psychovisual-analysis-on-deep-cnn-features |
Repo | |
Framework | |
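The grating-probe methodology can be sketched with torchvision's VGG-16; the frequency grid, the probed layer, and the use of mean channel activation as the response are assumptions (and pretrained weights should be loaded in practice — `weights=None` below merely avoids a download, so the responses here are from a random network).

```python
# Hedged sketch: probe channel tuning with sinusoidal grating stimuli.
import torch
import numpy as np
from torchvision.models import vgg16

def grating(freq_cpp, theta, size=224):
    """Sinusoidal grating: frequency in cycles/pixel, orientation theta in radians."""
    y, x = np.mgrid[0:size, 0:size]
    wave = np.sin(2 * np.pi * freq_cpp * (x * np.cos(theta) + y * np.sin(theta)))
    img = np.repeat(wave[None], 3, axis=0).astype(np.float32)
    return torch.from_numpy(img)[None]              # (1, 3, H, W)

model = vgg16(weights=None).features[:5].eval()     # up to an early conv block
with torch.no_grad():
    for f in (0.02, 0.05, 0.1, 0.2):
        resp = model(grating(f, theta=np.pi / 4))   # mean activation per channel
        print(f"freq {f}: strongest channel {resp.mean(dim=(2, 3)).argmax().item()}")
```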
A deep neural network to enhance prediction of 1-year mortality using echocardiographic videos of the heart
Title | A deep neural network to enhance prediction of 1-year mortality using echocardiographic videos of the heart |
Authors | Alvaro Ulloa, Linyuan Jing, Christopher W Good, David P vanMaanen, Sushravya Raghunath, Jonathan D Suever, Christopher D Nevius, Gregory J Wehner, Dustin Hartzel, Joseph B Leader, Amro Alsaid, Aalpen A Patel, H Lester Kirchner, Marios S Pattichis, Christopher M Haggerty, Brandon K Fornwalt |
Abstract | Predicting future clinical events helps physicians guide appropriate intervention. Machine learning has tremendous promise to assist physicians with predictions based on the discovery of complex patterns from historical data, such as large, longitudinal electronic health records (EHR). This study is a first attempt to demonstrate such capabilities using raw echocardiographic videos of the heart. We show that a large dataset of 723,754 clinically-acquired echocardiographic videos (~45 million images) linked to longitudinal follow-up data in 27,028 patients can be used to train a deep neural network to predict 1-year mortality with good accuracy (area under the curve (AUC) in an independent test set = 0.839). Prediction accuracy was further improved by adding EHR data (AUC = 0.858). Finally, we demonstrate that the trained neural network was more accurate in mortality prediction than two expert cardiologists. These results highlight the potential of neural networks to add new power to clinical predictions. |
Tasks | Mortality Prediction |
Published | 2018-11-26 |
URL | https://arxiv.org/abs/1811.10553v2 |
https://arxiv.org/pdf/1811.10553v2.pdf | |
PWC | https://paperswithcode.com/paper/a-deep-neural-network-predicts-survival-after |
Repo | |
Framework | |
Unified Graph based Multi-Cue Feature Fusion for Robust Visual Tracking
Title | Unified Graph based Multi-Cue Feature Fusion for Robust Visual Tracking |
Authors | Kapil Sharma, Himanshu Ahuja, Ashish Kumar, Nipun Bansal, Gurjit Singh Walia |
Abstract | Visual tracking is a complex problem due to unconstrained appearance variations and dynamic environments. Extracting complementary information from the object environment via multiple features, and adapting to the target's appearance variations, are the key problems addressed in this work. To this end, we propose a robust object tracking framework based on Unified Graph Fusion (UGF) of multiple cues to adapt to the object's appearance. The proposed cross-diffusion of sparse and dense features not only suppresses the individual feature deficiencies but also extracts the complementary information from the multiple cues. This iterative process builds robust unified features which are invariant to object deformations, fast motion, and occlusion. The robustness of the unified feature also enables the random forest classifier to precisely distinguish the foreground from the background, adding resilience to background clutter. In addition, we present a novel kernel-based adaptation strategy using outlier detection and a transductive reliability metric. |
Tasks | Object Tracking, Outlier Detection, Visual Tracking |
Published | 2018-12-16 |
URL | https://arxiv.org/abs/1812.06407v3 |
https://arxiv.org/pdf/1812.06407v3.pdf | |
PWC | https://paperswithcode.com/paper/unified-graph-based-multi-cue-feature-fusion |
Repo | |
Framework | |
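A hedged sketch of cross-diffusing two cue affinity matrices, the general mechanism behind the unified fusion above, is given below; the normalization, the absence of kNN sparsification, and the iteration count are assumptions, not the paper's exact scheme.

```python
# Hedged sketch: each cue's affinities are diffused through the other cue's
# graph, so agreement between cues is reinforced and cue-specific noise decays.
import numpy as np

def row_normalize(W):
    return W / W.sum(axis=1, keepdims=True)

def cross_diffuse(W1, W2, iters=10):
    S1, S2 = row_normalize(W1), row_normalize(W2)   # fixed transition kernels
    P1, P2 = S1.copy(), S2.copy()
    for _ in range(iters):
        P1, P2 = S1 @ P2 @ S1.T, S2 @ P1 @ S2.T     # diffuse each cue via the other
    return (P1 + P2) / 2                            # unified affinity over both cues

rng = np.random.default_rng(0)
A, B = rng.random((5, 5)), rng.random((5, 5))
W1, W2 = A @ A.T, B @ B.T                           # two symmetric cue affinities
print(cross_diffuse(W1, W2).round(3))
```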
Unsupervised Learning of Neural Networks to Explain Neural Networks
Title | Unsupervised Learning of Neural Networks to Explain Neural Networks |
Authors | Quanshi Zhang, Yu Yang, Yuchen Liu, Ying Nian Wu, Song-Chun Zhu |
Abstract | This paper presents an unsupervised method to learn a neural network, namely an explainer, to interpret a pre-trained convolutional neural network (CNN), i.e., explaining knowledge representations hidden in middle conv-layers of the CNN. Given feature maps of a certain conv-layer of the CNN, the explainer performs like an auto-encoder, which first disentangles the feature maps into object-part features and then inverts object-part features back to features of higher conv-layers of the CNN. More specifically, the explainer contains interpretable conv-layers, where each filter disentangles the representation of a specific object part from chaotic input feature maps. As a paraphrase of CNN features, the disentangled representations of object parts help people understand the logic inside the CNN. We also learn the explainer to use object-part features to reconstruct features of higher CNN layers, in order to minimize loss of information during the feature disentanglement. More crucially, we learn the explainer via network distillation without using any annotations of sample labels, object parts, or textures for supervision. We have applied our method to different types of CNNs for evaluation, and explainers have significantly boosted the interpretability of CNN features. |
Tasks | |
Published | 2018-05-18 |
URL | http://arxiv.org/abs/1805.07468v1 |
http://arxiv.org/pdf/1805.07468v1.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-learning-of-neural-networks-to-1 |
Repo | |
Framework | |
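A schematic sketch of the explainer idea follows: an auto-encoder over a frozen CNN's mid-layer feature maps, trained by distillation with no labels. The layer shapes, number of parts, and plain MSE loss are assumptions for illustration, not the paper's interpretable-filter losses.

```python
# Hedged sketch: disentangle conv feature maps into part features, then invert
# them back toward CNN features to minimize information loss (distillation).
import torch
import torch.nn as nn

class Explainer(nn.Module):
    def __init__(self, channels=256, parts=16):
        super().__init__()
        # "interpretable" conv-layer: one map per candidate object part
        self.encode = nn.Conv2d(channels, parts, kernel_size=3, padding=1)
        # invert part features back toward features of the frozen CNN
        self.decode = nn.Conv2d(parts, channels, kernel_size=3, padding=1)

    def forward(self, feat):
        part_maps = torch.relu(self.encode(feat))
        return self.decode(part_maps), part_maps

explainer = Explainer()
feat = torch.randn(2, 256, 28, 28)           # mid-layer features of a frozen CNN
recon, parts = explainer(feat)
loss = nn.functional.mse_loss(recon, feat)   # no sample labels or part annotations
loss.backward()
print(parts.shape)  # (2, 16, 28, 28): one disentangled map per part
```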
Multiple Object Tracking in Urban Traffic Scenes with a Multiclass Object Detector
Title | Multiple Object Tracking in Urban Traffic Scenes with a Multiclass Object Detector |
Authors | Hui-Lee Ooi, Guillaume-Alexandre Bilodeau, Nicolas Saunier, David-Alexandre Beaupré |
Abstract | Multiple object tracking (MOT) in urban traffic aims to produce the trajectories of the different road users that move across the field of view with different directions and speeds and that can have varying appearances and sizes. Occlusions and interactions among the different objects are expected and common due to the nature of urban road traffic. In this work, a tracking framework employing classification label information from a deep learning detection approach is used for associating the different objects, in addition to object position and appearance. We investigate the performance of a modern multiclass object detector for the MOT task in traffic scenes. Results show that the object labels improve tracking performance, but that the outputs of object detectors are not always reliable. |
Tasks | Multiple Object Tracking, Object Tracking |
Published | 2018-09-06 |
URL | http://arxiv.org/abs/1809.02073v1 |
http://arxiv.org/pdf/1809.02073v1.pdf | |
PWC | https://paperswithcode.com/paper/multiple-object-tracking-in-urban-traffic |
Repo | |
Framework | |
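Folding detector class labels into the association cost, alongside position and appearance, can be sketched as follows; the weights and the Hungarian assignment are standard choices, not the paper's exact values.

```python
# Hedged sketch: track-detection association cost with a class-label term.
import numpy as np
from scipy.optimize import linear_sum_assignment

def association_cost(tracks, detections, w_pos=1.0, w_app=1.0, w_label=2.0):
    C = np.zeros((len(tracks), len(detections)))
    for i, t in enumerate(tracks):
        for j, d in enumerate(detections):
            C[i, j] = (w_pos * np.linalg.norm(t["pos"] - d["pos"])
                       + w_app * np.linalg.norm(t["app"] - d["app"])
                       + w_label * (t["label"] != d["label"]))  # penalize class switch
    return C

tracks = [{"pos": np.array([10., 20.]), "app": np.zeros(8), "label": "car"}]
dets = [{"pos": np.array([12., 21.]), "app": np.zeros(8), "label": "car"},
        {"pos": np.array([11., 20.]), "app": np.ones(8), "label": "pedestrian"}]
rows, cols = linear_sum_assignment(association_cost(tracks, dets))
print(list(zip(rows, cols)))  # the car detection wins despite the larger offset
```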
Learning The Sequential Temporal Information with Recurrent Neural Networks
Title | Learning The Sequential Temporal Information with Recurrent Neural Networks |
Authors | Pushparaja Murugan |
Abstract | Recurrent networks are among the most powerful and promising artificial neural network algorithms for processing sequential data such as natural language, sound, and time series. Unlike a traditional feed-forward network, a recurrent network has an inherent feedback loop that allows it to store temporal context information and pass state along the entire sequence of events. This helps achieve state-of-the-art performance in many important tasks such as language modeling, stock market prediction, image captioning, speech recognition, machine translation, and object tracking. However, training a fully connected RNN and managing gradient flow are complicated processes, and many studies have been carried out to address these limitations. This article aims to provide brief details on recurrent neurons, their variants, and tips and tricks for training fully recurrent neural networks. This review was carried out as part of the 'Multiple Object Tracking' module of our IPO studio software. |
Tasks | Image Captioning, Language Modelling, Machine Translation, Multiple Object Tracking, Object Tracking, Speech Recognition, Stock Market Prediction, Time Series |
Published | 2018-07-08 |
URL | http://arxiv.org/abs/1807.02857v1 |
http://arxiv.org/pdf/1807.02857v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-the-sequential-temporal-information |
Repo | |
Framework | |
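The recurrent neuron the review describes can be sketched in a few lines of numpy: the hidden state is the feedback loop that carries temporal context across the sequence. Dimensions and initialization below are illustrative.

```python
# Hedged sketch of a vanilla recurrent neuron unrolled over a sequence.
import numpy as np

def rnn_step(x_t, h_prev, Wxh, Whh, b):
    """One step of a vanilla RNN: h_t = tanh(Wxh x_t + Whh h_{t-1} + b)."""
    return np.tanh(Wxh @ x_t + Whh @ h_prev + b)

rng = np.random.default_rng(0)
input_dim, hidden_dim, seq_len = 4, 8, 5
Wxh = rng.normal(0, 0.1, (hidden_dim, input_dim))
Whh = rng.normal(0, 0.1, (hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)                # initial state
for t in range(seq_len):                # the feedback loop: h feeds back into itself
    h = rnn_step(rng.normal(size=input_dim), h, Wxh, Whh, b)
print(h.round(3))                       # final state summarizes the whole sequence
```

Repeated multiplication by Whh during backpropagation through time is what makes the gradient flow hard to manage, motivating the gated variants the article surveys.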
Mitigating Sybils in Federated Learning Poisoning
Title | Mitigating Sybils in Federated Learning Poisoning |
Authors | Clement Fung, Chris J. M. Yoon, Ivan Beschastnikh |
Abstract | Machine learning (ML) over distributed multi-party data is required for a variety of domains. Existing approaches, such as federated learning, collect the outputs computed by a group of devices at a central aggregator and run iterative algorithms to train a globally shared model. Unfortunately, such approaches are susceptible to a variety of attacks, including model poisoning, which is made substantially worse in the presence of sybils. In this paper we first evaluate the vulnerability of federated learning to sybil-based poisoning attacks. We then describe \emph{FoolsGold}, a novel defense to this problem that identifies poisoning sybils based on the diversity of client updates in the distributed learning process. Unlike prior work, our system does not bound the expected number of attackers, requires no auxiliary information outside of the learning process, and makes fewer assumptions about clients and their data. In our evaluation we show that FoolsGold exceeds the capabilities of existing state of the art approaches to countering sybil-based label-flipping and backdoor poisoning attacks. Our results hold for different distributions of client data, varying poisoning targets, and various sybil strategies. Code can be found at: https://github.com/DistributedML/FoolsGold |
Tasks | |
Published | 2018-08-14 |
URL | https://arxiv.org/abs/1808.04866v4 |
https://arxiv.org/pdf/1808.04866v4.pdf | |
PWC | https://paperswithcode.com/paper/mitigating-sybils-in-federated-learning |
Repo | |
Framework | |
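A simplified sketch of the FoolsGold signal follows: sybils pushing a common poisoning objective produce unusually similar updates, so clients whose cumulative updates are too similar get their contributions downweighted. The pardoning and logit-rescaling steps of the full algorithm are omitted here for brevity; see the linked repository for the authors' implementation.

```python
# Hedged sketch: downweight clients by the cosine similarity of their
# cumulative (historical) updates.
import numpy as np

def foolsgold_weights(history):
    """history: (n_clients, dim) cumulative update per client."""
    norm = history / np.linalg.norm(history, axis=1, keepdims=True)
    cs = norm @ norm.T
    np.fill_diagonal(cs, -np.inf)
    max_sim = cs.max(axis=1)            # how similar is each client's closest peer?
    w = np.clip(1.0 - max_sim, 0.0, 1.0)
    return w / w.max()                  # honest, diverse clients keep weight ~1

rng = np.random.default_rng(1)
honest = rng.normal(size=(3, 10))
sybil = rng.normal(size=10)             # shared poisoning direction
updates = np.vstack([honest, sybil + 0.01 * rng.normal(size=(2, 10))])  # 2 sybils
print(foolsgold_weights(updates).round(2))  # the two colluding sybils -> near zero
```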