Paper Group ANR 318
Detection of Paroxysmal Atrial Fibrillation using Attention-based Bidirectional Recurrent Neural Networks
Title | Detection of Paroxysmal Atrial Fibrillation using Attention-based Bidirectional Recurrent Neural Networks |
Authors | Supreeth P. Shashikumar, Amit J. Shah, Gari D. Clifford, Shamim Nemati |
Abstract | Detection of atrial fibrillation (AF), a type of cardiac arrhythmia, is difficult since many cases of AF are clinically silent and undiagnosed. In particular, paroxysmal AF is a form of AF that occurs occasionally and has a higher probability of going undetected. In this work, we present an attention-based deep learning framework for detection of paroxysmal AF episodes from a sequence of windows. Time-frequency representations of 30-second recording windows, over a 10-minute data segment, are fed sequentially into a deep convolutional neural network for image-based feature extraction; the extracted features are then presented to a bidirectional recurrent neural network with an attention layer for AF detection. To demonstrate the effectiveness of the proposed framework for transient AF detection, we use a database of 24-hour Holter electrocardiogram (ECG) recordings acquired from 2850 patients at the University of Virginia heart station. The algorithm achieves an AUC of 0.94 on the testing set, which exceeds the performance of baseline models. We also demonstrate the cross-domain generalizability of the approach by adapting the learned model parameters from one recording modality (ECG) to another (photoplethysmogram) with improved AF detection performance. The proposed high-accuracy, low-false-alarm algorithm for detecting paroxysmal AF has potential applications in long-term monitoring using wearable sensors. |
Tasks | Atrial Fibrillation Detection, Electrocardiography (ECG) |
Published | 2018-05-07 |
URL | http://arxiv.org/abs/1805.09133v1 |
PDF | http://arxiv.org/pdf/1805.09133v1.pdf |
PWC | https://paperswithcode.com/paper/detection-of-paroxysmal-atrial-fibrillation |
Repo | |
Framework | |
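As a rough illustration of the attention layer mentioned in the abstract above, the sketch below pools a sequence of per-window feature vectors into a single context vector using softmax weights. It is a minimal sketch, not the authors' model: the dot-product scoring, the `score_vector` parameter, and the toy numbers are all assumptions made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(hidden_states, score_vector):
    """Weight each window's feature vector by its softmax-normalized score.

    hidden_states: one feature vector per window (e.g. recurrent outputs).
    score_vector: hypothetical learned vector; score = dot(h, score_vector).
    """
    scores = [sum(h * v for h, v in zip(state, score_vector))
              for state in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(w * state[d] for w, state in zip(weights, hidden_states))
               for d in range(dim)]
    return context, weights

# Three toy windows with 2-D features; the second one scores highest.
windows = [[0.1, 0.0], [2.0, 1.0], [0.3, 0.2]]
context, weights = attention_pool(windows, score_vector=[1.0, 1.0])
```

The weights sum to one, so the context vector is a convex combination of the window features, dominated by the windows the scoring function considers most relevant.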
Foresee: Attentive Future Projections of Chaotic Road Environments with Online Training
Title | Foresee: Attentive Future Projections of Chaotic Road Environments with Online Training |
Authors | Anil Sharma, Prabhat Kumar |
Abstract | In this paper, we train a recurrent neural network to learn the dynamics of a chaotic road environment and to project the future of the environment on an image. Future projection can be used to anticipate an unseen environment, for example in autonomous driving. The road environment is highly dynamic and complex due to the interaction among traffic participants such as vehicles and pedestrians. Even in this complex environment, a human driver is able to drive safely on chaotic roads irrespective of the number of traffic participants. The proliferation of deep learning research has shown the efficacy of neural networks in learning this human behavior. In the same direction, we investigate recurrent neural networks to understand the chaotic road environment, which is shared by pedestrians, vehicles (cars, trucks, bicycles, etc.), and sometimes animals as well. We propose Foresee, a unidirectional gated recurrent unit (GRU) network with attention, to project the future of the environment in the form of images. We have collected several videos on Delhi roads covering various traffic participants and background and infrastructure differences (like 3D pedestrian crossings) at various times on various days. We train Foresee in an unsupervised way and use online training to project frames up to 0.5 seconds in advance. We show that our proposed model performs better than state-of-the-art methods (PredNet and Enc. Dec. LSTM), and finally we show that our trained model generalizes to a public dataset for future projections. |
Tasks | Autonomous Driving |
Published | 2018-05-30 |
URL | http://arxiv.org/abs/1805.11861v1 |
PDF | http://arxiv.org/pdf/1805.11861v1.pdf |
PWC | https://paperswithcode.com/paper/foresee-attentive-future-projections-of |
Repo | |
Framework | |
Connections between physics, mathematics and deep learning
Title | Connections between physics, mathematics and deep learning |
Authors | Jean Thierry-Mieg |
Abstract | Starting from Fermat’s principle of least action, which governs classical and quantum mechanics, and from the theory of exterior differential forms, which governs the geometry of curved manifolds, we show how to derive the equations governing neural networks in an intrinsic, coordinate-invariant way, where the loss function plays the role of the Hamiltonian. To be covariant, these equations imply a layer metric which is instrumental in pretraining and explains the role of conjugation when using complex numbers. The differential formalism also clarifies the relation of the gradient descent optimizer with Aristotelian and Newtonian mechanics and why large learning steps break the logic of the linearization procedure. We hope that this formal presentation of the differential geometry of neural networks will encourage some physicists to dive into deep learning and, reciprocally, that the specialists of deep learning will better appreciate the close interconnection of their subject with the foundations of classical and quantum field theory. |
Tasks | |
Published | 2018-11-01 |
URL | https://arxiv.org/abs/1811.00576v3 |
PDF | https://arxiv.org/pdf/1811.00576v3.pdf |
PWC | https://paperswithcode.com/paper/how-the-fundamental-concepts-of-mathematics |
Repo | |
Framework | |
Transfer Learning Using Classification Layer Features of CNN
Title | Transfer Learning Using Classification Layer Features of CNN |
Authors | Tasfia Shermin, Manzur Murshed, Guojun Lu, Shyh Wei Teng |
Abstract | Although CNNs have gained the ability to transfer learned knowledge from a source task to a target task by virtue of large annotated datasets, they consume huge processing time to fine-tune without a GPU. In this paper, we propose a new computationally efficient transfer learning approach using classification layer features of pre-trained CNNs by appending a layer after the existing classification layer. We demonstrate that fine-tuning the appended layer together with the existing classification layer for a new task converges much faster than the baseline and on average outperforms the baseline classification accuracy. Furthermore, we execute thorough experiments to examine the influence of quantity, similarity, and dissimilarity of training sets on our classification outcomes to demonstrate the transferability of classification layer features. |
Tasks | Transfer Learning |
Published | 2018-11-19 |
URL | http://arxiv.org/abs/1811.07459v2 |
PDF | http://arxiv.org/pdf/1811.07459v2.pdf |
PWC | https://paperswithcode.com/paper/an-efficient-transfer-learning-technique-by |
Repo | |
Framework | |
AUNet: Attention-guided dense-upsampling networks for breast mass segmentation in whole mammograms
Title | AUNet: Attention-guided dense-upsampling networks for breast mass segmentation in whole mammograms |
Authors | Hui Sun, Cheng Li, Boqiang Liu, Hairong Zheng, David Dagan Feng, Shanshan Wang |
Abstract | Mammography is one of the most commonly applied tools for early breast cancer screening. Automatic segmentation of breast masses in mammograms is essential but challenging due to the low signal-to-noise ratio and the wide variety of mass shapes and sizes. Existing methods deal with these challenges mainly by extracting mass-centered image patches manually or automatically. However, manual patch extraction is time-consuming, and automatic patch extraction introduces errors that cannot be compensated for in the subsequent segmentation step. In this study, we propose a novel attention-guided dense-upsampling network (AUNet) for accurate breast mass segmentation directly in whole mammograms. In AUNet, we employ an asymmetrical encoder-decoder structure and propose an effective upsampling block, the attention-guided dense-upsampling block (AU block). In particular, the AU block is designed to have three merits. Firstly, it compensates for the information loss of bilinear upsampling by dense upsampling. Secondly, it provides a more effective method to fuse high- and low-level features. Thirdly, it includes a channel-attention function to highlight information-rich channels. We evaluated the proposed method on two publicly available datasets, CBIS-DDSM and INbreast. Compared to three state-of-the-art fully convolutional networks, AUNet achieved the best performance, with an average Dice similarity coefficient of 81.8% for CBIS-DDSM and 79.1% for INbreast. |
Tasks | Breast Mass Segmentation In Whole Mammograms |
Published | 2018-10-24 |
URL | https://arxiv.org/abs/1810.10151v3 |
PDF | https://arxiv.org/pdf/1810.10151v3.pdf |
PWC | https://paperswithcode.com/paper/aunet-attention-guided-dense-upsampling |
Repo | |
Framework | |
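The AUNet abstract above reports results as a Dice similarity coefficient. For readers unfamiliar with the metric, here is a minimal sketch of the standard definition, Dice = 2|A ∩ B| / (|A| + |B|), applied to binary masks. The flattened 0/1 list representation and the empty-mask convention are assumptions of this sketch, not details taken from the paper.

```python
def dice_coefficient(pred, target):
    """Dice similarity between two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels marked as mass in both the prediction and the ground truth.
    intersection = sum(p & t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total

# Toy 6-pixel masks: 3 predicted pixels, 3 true pixels, 2 in common.
pred = [0, 1, 1, 1, 0, 0]
target = [0, 1, 1, 0, 0, 1]
score = dice_coefficient(pred, target)  # 2 * 2 / (3 + 3)
```

A score of 1.0 means the masks coincide exactly and 0.0 means they do not overlap at all.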
Translation of Algorithmic Descriptions of Discrete Functions to SAT with Applications to Cryptanalysis Problems
Title | Translation of Algorithmic Descriptions of Discrete Functions to SAT with Applications to Cryptanalysis Problems |
Authors | Alexander Semenov, Ilya Otpuschennikov, Irina Gribanova, Oleg Zaikin, Stepan Kochemazov |
Abstract | In the present paper, we propose a technology for translating algorithmic descriptions of discrete functions to SAT. The proposed technology is aimed at applications in algebraic cryptanalysis. We describe how cryptanalysis problems are reduced to SAT in such a way that it should be perceived as natural by the cryptographic community. In the theoretical part of the paper we justify the main principles of general reduction to SAT for discrete functions from a class containing the majority of functions employed in cryptography. Then, we describe the Transalg software tool developed based on these principles with SAT-based cryptanalysis specifics in mind. We demonstrate the results of applications of Transalg to the construction of a number of attacks on various cryptographic functions. Some of the corresponding attacks are state of the art. We compare the functional capabilities of the proposed tool with those of other domain-specific software tools which can be used to reduce cryptanalysis problems to SAT, and also with the CBMC system widely employed in symbolic verification. The paper also presents extensive experimental data, obtained using the SAT solvers that took first place at the SAT competitions in recent years. |
Tasks | Cryptanalysis |
Published | 2018-05-17 |
URL | https://arxiv.org/abs/1805.07239v5 |
PDF | https://arxiv.org/pdf/1805.07239v5.pdf |
PWC | https://paperswithcode.com/paper/translation-of-algorithmic-descriptions-of |
Repo | |
Framework | |
Breaking Transferability of Adversarial Samples with Randomness
Title | Breaking Transferability of Adversarial Samples with Randomness |
Authors | Yan Zhou, Murat Kantarcioglu, Bowei Xi |
Abstract | We investigate the role of transferability of adversarial attacks in the observed vulnerabilities of Deep Neural Networks (DNNs). We demonstrate that introducing randomness to the DNN models is sufficient to defeat adversarial attacks, given that the adversary does not have an unlimited attack budget. Instead of making one specific DNN model robust to perfect knowledge attacks (a.k.a. white box attacks), creating randomness within an army of DNNs completely eliminates the possibility of perfect knowledge acquisition, resulting in a significantly more robust DNN ensemble against the strongest form of attacks. We also show that when the adversary has an unlimited budget of data perturbation, all defensive techniques would eventually break down as the budget increases. Therefore, it is important to understand the game saddle point where the adversary would not further pursue this endeavor. Furthermore, we explore the relationship between attack severity and decision boundary robustness in the version space. We empirically demonstrate that by simply adding small Gaussian random noise to the learned weights, a DNN model can increase its resilience to adversarial attacks by as much as 74.2%. More importantly, we show that by randomly activating/revealing a model from a pool of pre-trained DNNs at each query request, we can put a tremendous strain on the adversary’s attack strategies. We compare our randomization techniques to the Ensemble Adversarial Training technique and show that our randomization techniques are superior under different attack budget constraints. |
Tasks | |
Published | 2018-05-11 |
URL | http://arxiv.org/abs/1805.04613v2 |
PDF | http://arxiv.org/pdf/1805.04613v2.pdf |
PWC | https://paperswithcode.com/paper/breaking-transferability-of-adversarial |
Repo | |
Framework | |
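The two randomization ideas named in the abstract above (Gaussian noise on learned weights, and answering each query with a model drawn at random from a pool) can be sketched in a few lines. This is only a toy illustration under loud assumptions: a two-weight linear classifier stands in for a DNN, and `add_gaussian_noise`, `RandomizedPool`, `sigma=0.01`, and the pool size are hypothetical names and values, not the authors' implementation.

```python
import random

def add_gaussian_noise(weights, sigma, rng):
    """Return a copy of the weights with i.i.d. N(0, sigma^2) noise added."""
    return [w + rng.gauss(0.0, sigma) for w in weights]

def predict(weights, x):
    """Toy linear classifier standing in for a trained DNN."""
    score = sum(w * xi for w, xi in zip(weights, x))
    return 1 if score >= 0.0 else 0

class RandomizedPool:
    """Answers each query with a model drawn at random from a pool."""

    def __init__(self, models, rng):
        self.models = models
        self.rng = rng

    def query(self, x):
        # The caller cannot tell which pool member produced the answer.
        return predict(self.rng.choice(self.models), x)

rng = random.Random(0)
base_weights = [1.0, -1.0]  # hypothetical "learned" weights
pool = RandomizedPool(
    [add_gaussian_noise(base_weights, sigma=0.01, rng=rng) for _ in range(5)],
    rng,
)
labels = [pool.query([2.0, 0.5]) for _ in range(10)]
```

For an input far from the decision boundary, as here, every pool member gives the same label, so the small noise leaves the prediction unchanged while the exact weights behind any single answer stay hidden from the caller.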
ALE: Additive Latent Effect Models for Grade Prediction
Title | ALE: Additive Latent Effect Models for Grade Prediction |
Authors | Zhiyun Ren, Xia Ning, Huzefa Rangwala |
Abstract | The past decade has seen a growth in the development and deployment of educational technologies for assisting college-going students in choosing majors, selecting courses and acquiring feedback based on past academic performance. Grade prediction methods seek to estimate a grade that a student may achieve in a course that she may take in the future (e.g., next term). Accurate and timely prediction of students’ academic grades is important for developing effective degree planners and early warning systems, and ultimately improving educational outcomes. Existing grade prediction methods mostly focus on modeling the knowledge components associated with each course and student, and often overlook other factors such as the difficulty of each knowledge component, course instructors, student interest, capabilities and effort. In this paper, we propose additive latent effect models that incorporate these factors to predict students’ next-term grades. Specifically, the proposed models take into account four factors: (i) student’s academic level, (ii) course instructors, (iii) student global latent factor, and (iv) latent knowledge factors. We compared the new models with several state-of-the-art methods on students of various characteristics (e.g., whether a student transferred in or not). The experimental results demonstrate that the proposed methods significantly outperform the baselines on the grade prediction problem. Moreover, we perform a thorough analysis of the importance of different factors and how these factors can practically assist students in course selection, and finally improve their academic performance. |
Tasks | |
Published | 2018-01-17 |
URL | http://arxiv.org/abs/1801.05535v1 |
PDF | http://arxiv.org/pdf/1801.05535v1.pdf |
PWC | https://paperswithcode.com/paper/ale-additive-latent-effect-models-for-grade |
Repo | |
Framework | |
Distributed Learning over Unreliable Networks
Title | Distributed Learning over Unreliable Networks |
Authors | Chen Yu, Hanlin Tang, Cedric Renggli, Simon Kassing, Ankit Singla, Dan Alistarh, Ce Zhang, Ji Liu |
Abstract | Most of today’s distributed machine learning systems assume reliable networks: whenever two machines exchange information (e.g., gradients or models), the network should guarantee the delivery of the message. At the same time, recent work exhibits the impressive tolerance of machine learning algorithms to errors or noise arising from relaxed communication or synchronization. In this paper, we connect these two trends, and consider the following question: Can we design machine learning systems that are tolerant to network unreliability during training? With this motivation, we focus on a theoretical problem of independent interest: given a standard distributed parameter server architecture, if every communication between the worker and the server has a non-zero probability p of being dropped, does there exist an algorithm that still converges, and at what speed? The technical contribution of this paper is a novel theoretical analysis proving that distributed learning over an unreliable network can achieve a convergence rate comparable to centralized or distributed learning over reliable networks. Further, we prove that the influence of the packet drop rate diminishes with the growth of the number of parameter servers. We map this theoretical result onto a real-world scenario, training deep neural networks over an unreliable network layer, and conduct network simulation to validate the system improvement by allowing the networks to be unreliable. |
Tasks | |
Published | 2018-10-17 |
URL | https://arxiv.org/abs/1810.07766v4 |
PDF | https://arxiv.org/pdf/1810.07766v4.pdf |
PWC | https://paperswithcode.com/paper/distributed-learning-over-unreliable-networks |
Repo | |
Framework | |
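To make the setting in the abstract above concrete, here is a toy simulation of gradient descent in which every worker-to-server message is dropped independently with probability p. It is a minimal sketch, not the algorithm analyzed in the paper: the single server, the quadratic objective, the `train` function, and all constants are assumptions chosen so the example stays self-contained.

```python
import random

def train(targets, drop_prob, steps=400, lr=0.05, seed=0):
    """Minimize the average of 0.5 * (w - c_i)^2 over workers.

    Worker i sends its gradient (w - c_i) to the server; every message
    is dropped independently with probability drop_prob.
    """
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        received = [w - c for c in targets if rng.random() >= drop_prob]
        if not received:
            continue  # every message was lost this round; skip the update
        # The server averages whatever gradients actually arrived.
        w -= lr * sum(received) / len(received)
    return w

targets = [float(i) for i in range(20)]  # the optimum is their mean, 9.5
w_reliable = train(targets, drop_prob=0.0)
w_lossy = train(targets, drop_prob=0.2)
```

In this toy problem the lossy run still settles close to the same optimum as the reliable run, with some residual jitter caused by averaging over a random subset of workers each round.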
Residual Reinforcement Learning for Robot Control
Title | Residual Reinforcement Learning for Robot Control |
Authors | Tobias Johannink, Shikhar Bahl, Ashvin Nair, Jianlan Luo, Avinash Kumar, Matthias Loskyll, Juan Aparicio Ojea, Eugen Solowjow, Sergey Levine |
Abstract | Conventional feedback control methods can solve various types of robot control problems very efficiently by capturing the structure with explicit models, such as rigid body equations of motion. However, many control problems in modern manufacturing deal with contacts and friction, which are difficult to capture with first-order physical modeling. Hence, applying control design methodologies to these kinds of problems often results in brittle and inaccurate controllers, which have to be manually tuned for deployment. Reinforcement learning (RL) methods have been demonstrated to be capable of learning continuous robot controllers from interactions with the environment, even for problems that include friction and contacts. In this paper, we study how we can solve difficult control problems in the real world by decomposing them into a part that is solved efficiently by conventional feedback control methods, and the residual which is solved with RL. The final control policy is a superposition of both control signals. We demonstrate our approach by training an agent to successfully perform a real-world block assembly task involving contacts and unstable objects. |
Tasks | |
Published | 2018-12-07 |
URL | http://arxiv.org/abs/1812.03201v2 |
PDF | http://arxiv.org/pdf/1812.03201v2.pdf |
PWC | https://paperswithcode.com/paper/residual-reinforcement-learning-for-robot |
Repo | |
Framework | |
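The abstract above describes the final control policy as a superposition of a conventional feedback signal and a learned residual. A minimal sketch of that composition follows. The proportional controller, the fixed linear `residual_policy` standing in for a trained RL policy, and the gain values are all hypothetical placeholders rather than the authors' setup.

```python
def p_controller(state, target, gain=1.0):
    """Conventional feedback term: proportional to the tracking error."""
    return [gain * (t - s) for s, t in zip(state, target)]

def residual_policy(state):
    """Placeholder for a learned RL policy (here a fixed linear correction)."""
    return [-0.1 * s for s in state]

def combined_action(state, target):
    """Final control signal: feedback controller output plus learned residual."""
    u_feedback = p_controller(state, target)
    u_residual = residual_policy(state)
    return [a + b for a, b in zip(u_feedback, u_residual)]

action = combined_action(state=[1.0, 2.0], target=[0.0, 0.0])
```

Because the residual is added to the controller output rather than replacing it, a residual of zero recovers the hand-designed controller exactly, so learning only has to account for what that controller gets wrong.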
Hardware based Spatio-Temporal Neural Processing Backend for Imaging Sensors: Towards a Smart Camera
Title | Hardware based Spatio-Temporal Neural Processing Backend for Imaging Sensors: Towards a Smart Camera |
Authors | Samiran Ganguly, Yunfei Gu, Mircea R. Stan, Avik W. Ghosh |
Abstract | In this work we show how we can build a technology platform for cognitive imaging sensors using recent advances in recurrent neural network architectures and training methods inspired by biology. We demonstrate learning and processing tasks specific to imaging sensors, including enhancement of sensitivity and signal-to-noise ratio (SNR) purely through neural filtering beyond the fundamental limits of sensor materials, and the inferencing and spatio-temporal pattern recognition capabilities of these networks, with applications in object detection, motion tracking and prediction. We then show designs of unit hardware cells built using complementary metal-oxide semiconductor (CMOS) and emerging materials technologies for ultra-compact and energy-efficient embedded neural processors for smart cameras. |
Tasks | Object Detection |
Published | 2018-03-23 |
URL | http://arxiv.org/abs/1803.08635v1 |
PDF | http://arxiv.org/pdf/1803.08635v1.pdf |
PWC | https://paperswithcode.com/paper/hardware-based-spatio-temporal-neural |
Repo | |
Framework | |
Residual Memory Networks: Feed-forward approach to learn long temporal dependencies
Title | Residual Memory Networks: Feed-forward approach to learn long temporal dependencies |
Authors | Murali Karthick Baskar, Martin Karafiat, Lukas Burget, Karel Vesely, Frantisek Grezl, Jan Honza Cernocky |
Abstract | Training deep recurrent neural network (RNN) architectures is complicated due to the increased network complexity. This disrupts the learning of higher-order abstractions using deep RNNs. In the case of feed-forward networks, training deep structures is simpler and faster, but learning long-term temporal information is not possible. In this paper we propose a residual memory neural network (RMN) architecture to model short-time dependencies using deep feed-forward layers having residual and time-delayed connections. The residual connection paves the way to construct deeper networks by enabling unhindered flow of gradients, and the time-delay units capture temporal information with shared weights. The number of layers in RMN signifies both the hierarchical processing depth and the temporal depth. The computational complexity of training RMN is significantly less than that of deep recurrent networks. RMN is further extended as bi-directional RMN (BRMN) to capture both past and future information. Experimental analysis is done on the AMI corpus to substantiate the capability of RMN in learning long-term information and hierarchical information. Recognition performance of RMN trained with 300 hours of the Switchboard corpus is compared with various state-of-the-art LVCSR systems. The results indicate that RMN and BRMN gain 6% and 3.8% relative improvement over LSTM and BLSTM networks. |
Tasks | Large Vocabulary Continuous Speech Recognition |
Published | 2018-08-06 |
URL | http://arxiv.org/abs/1808.01916v1 |
PDF | http://arxiv.org/pdf/1808.01916v1.pdf |
PWC | https://paperswithcode.com/paper/residual-memory-networks-feed-forward |
Repo | |
Framework | |
Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks
Title | Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks |
Authors | Charles Eckert, Xiaowei Wang, Jingcheng Wang, Arun Subramaniyan, Ravi Iyer, Dennis Sylvester, David Blaauw, Reetuparna Das |
Abstract | This paper presents the Neural Cache architecture, which re-purposes cache structures to transform them into massively parallel compute units capable of running inferences for Deep Neural Networks. Techniques to do in-situ arithmetic in SRAM arrays, create efficient data mappings, and reduce data movement are proposed. The Neural Cache architecture is capable of fully executing convolutional, fully connected, and pooling layers in-cache. The proposed architecture also supports quantization in-cache. Our experimental results show that the proposed architecture can improve inference latency by 18.3x over a state-of-the-art multi-core CPU (Xeon E5) and 7.7x over a server-class GPU (Titan Xp) for the Inception v3 model. Neural Cache improves inference throughput by 12.4x over CPU (2.2x over GPU), while reducing power consumption by 50% over CPU (53% over GPU). |
Tasks | Quantization |
Published | 2018-05-09 |
URL | http://arxiv.org/abs/1805.03718v1 |
PDF | http://arxiv.org/pdf/1805.03718v1.pdf |
PWC | https://paperswithcode.com/paper/neural-cache-bit-serial-in-cache-acceleration |
Repo | |
Framework | |
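The Neural Cache title and abstract above refer to bit-serial, in-SRAM arithmetic. As a software analogy only, the sketch below adds many numbers in parallel by processing one bit position per step over a transposed ("bit-slice") layout, with one carry bit per lane. The function names, the list-based layout, and the 8-bit width are assumptions of this sketch; it models the general bit-serial idea, not the hardware design in the paper.

```python
def to_bit_slices(values, width):
    """Transpose: slices[b][lane] holds bit b of values[lane]."""
    return [[(v >> b) & 1 for v in values] for b in range(width)]

def bit_serial_add(a_slices, b_slices):
    """Add two operands lane-wise, one bit position per step (LSB first)."""
    lanes = len(a_slices[0])
    carry = [0] * lanes
    out = []
    for a_bits, b_bits in zip(a_slices, b_slices):
        # Full-adder logic applied to every lane at once.
        out.append([a ^ b ^ c for a, b, c in zip(a_bits, b_bits, carry)])
        carry = [(a & b) | (c & (a ^ b))
                 for a, b, c in zip(a_bits, b_bits, carry)]
    out.append(carry)  # the final carry becomes the extra top bit
    return out

def from_bit_slices(slices):
    """Inverse of to_bit_slices: rebuild one integer per lane."""
    lanes = len(slices[0])
    return [sum(slices[b][lane] << b for b in range(len(slices)))
            for lane in range(lanes)]

a = to_bit_slices([3, 5, 250], width=8)
b = to_bit_slices([4, 7, 10], width=8)
sums = from_bit_slices(bit_serial_add(a, b))
```

The number of steps depends on the bit width rather than on how many lanes are processed, which is why this style of arithmetic pays off when very many operands can be handled side by side.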
Feature Space Transfer for Data Augmentation
Title | Feature Space Transfer for Data Augmentation |
Authors | Bo Liu, Xudong Wang, Mandar Dixit, Roland Kwitt, Nuno Vasconcelos |
Abstract | The problem of data augmentation in feature space is considered. A new architecture, denoted the FeATure TransfEr Network (FATTEN), is proposed for the modeling of feature trajectories induced by variations of object pose. This architecture exploits a parametrization of the pose manifold in terms of pose and appearance. This leads to a deep encoder/decoder network architecture, where the encoder factors into an appearance and a pose predictor. Unlike previous attempts at trajectory transfer, FATTEN can be efficiently trained end-to-end, with no need to train separate feature transfer functions. This is realized by supplying the decoder with information about a target pose and by using a multi-task loss that penalizes category and pose mismatches. As a result, FATTEN discourages discontinuous or non-smooth trajectories that fail to capture the structure of the pose manifold, and generalizes well on object recognition tasks involving large pose variation. Experimental results on the artificial ModelNet database show that it can successfully learn to map source features to target features of a desired pose, while preserving class identity. Most notably, by using feature space transfer for data augmentation (w.r.t. pose and depth) on SUN-RGBD objects, we demonstrate considerable performance improvements on one/few-shot object recognition in a transfer learning setup, compared to current state-of-the-art methods. |
Tasks | Data Augmentation, Object Recognition, Transfer Learning |
Published | 2018-01-13 |
URL | http://arxiv.org/abs/1801.04356v3 |
PDF | http://arxiv.org/pdf/1801.04356v3.pdf |
PWC | https://paperswithcode.com/paper/feature-space-transfer-for-data-augmentation |
Repo | |
Framework | |
Video Time: Properties, Encoders and Evaluation
Title | Video Time: Properties, Encoders and Evaluation |
Authors | Amir Ghodrati, Efstratios Gavves, Cees G. M. Snoek |
Abstract | Time-aware encoding of frame sequences in a video is a fundamental problem in video understanding. While many have attempted to model time in videos, an explicit study on quantifying video time is missing. To fill this lacuna, we aim to evaluate video time explicitly. We describe three properties of video time, namely a) temporal asymmetry, b) temporal continuity and c) temporal causality. Based on each, we formulate a task able to quantify the associated property. This allows assessing the effectiveness of modern video encoders, like C3D and LSTM, in their ability to model time. Our analysis provides insights about existing encoders while also leading us to propose a new video time encoder, which is better suited for the video time recognition tasks than C3D and LSTM. We believe the proposed meta-analysis can provide a reasonable baseline to assess video time encoders on equal grounds on a set of temporal-aware tasks. |
Tasks | Video Understanding |
Published | 2018-07-18 |
URL | http://arxiv.org/abs/1807.06980v1 |
PDF | http://arxiv.org/pdf/1807.06980v1.pdf |
PWC | https://paperswithcode.com/paper/video-time-properties-encoders-and-evaluation |
Repo | |
Framework | |