Paper Group ANR 16
Adversarial Music: Real World Audio Adversary Against Wake-word Detection System
Title | Adversarial Music: Real World Audio Adversary Against Wake-word Detection System |
Authors | Juncheng B. Li, Shuhui Qu, Xinjian Li, Joseph Szurley, J. Zico Kolter, Florian Metze |
Abstract | Voice Assistants (VAs) such as Amazon Alexa or Google Assistant rely on wake-word detection to respond to people’s commands, which could potentially be vulnerable to audio adversarial examples. In this work, we target our attack on the wake-word detection system, jamming the model with some inconspicuous background music to deactivate the VAs while our audio adversary is present. We implemented an emulated wake-word detection system of Amazon Alexa based on recent publications. We validated our models against the real Alexa in terms of wake-word detection accuracy. Then we computed our audio adversaries with consideration of expectation over transform and we implemented our audio adversary with a differentiable synthesizer. Next, we verified our audio adversaries digitally on hundreds of samples of utterances collected from the real world. Our experiments show that we can effectively reduce the recognition F1 score of our emulated model from 93.4% to 11.0%. Finally, we tested our audio adversary over the air, and verified it works effectively against Alexa, reducing its F1 score from 92.5% to 11.0%. We also verified that non-adversarial music does not disable Alexa as effectively as our music at the same sound level. To the best of our knowledge, this is the first real-world adversarial attack against a commercial-grade VA wake-word detection system. Our code and demo videos can be accessed at https://www.junchengbillyli.com/AdversarialMusic |
Tasks | Adversarial Attack |
Published | 2019-10-31 |
URL | https://arxiv.org/abs/1911.00126v3 |
https://arxiv.org/pdf/1911.00126v3.pdf | |
PWC | https://paperswithcode.com/paper/adversarial-music-real-world-audio-adversary |
Repo | |
Framework | |
LaplacianNet: Learning on 3D Meshes with Laplacian Encoding and Pooling
Title | LaplacianNet: Learning on 3D Meshes with Laplacian Encoding and Pooling |
Authors | Yi-Ling Qiao, Lin Gao, Jie Yang, Paul L. Rosin, Yu-Kun Lai, Xilin Chen |
Abstract | 3D models are commonly used in computer vision and graphics. With the wider availability of mesh data, an efficient and intrinsic deep learning approach to processing 3D meshes is in great need. Unlike images, 3D meshes have irregular connectivity, requiring careful design to capture relations in the data. To utilize the topology information while staying robust under different triangulation, we propose to encode mesh connectivity using Laplacian spectral analysis, along with Mesh Pooling Blocks (MPBs) that can split the surface domain into local pooling patches and aggregate global information among them. We build a mesh hierarchy from fine to coarse using Laplacian spectral clustering, which is flexible under isometric transformation. Inside the MPBs there are pooling layers to collect local information and multi-layer perceptrons to compute vertex features with increasing complexity. To obtain the relationships among different clusters, we introduce a Correlation Net to compute a correlation matrix, which can aggregate the features globally by matrix multiplication with cluster features. Our network architecture is flexible enough to be used on meshes with different numbers of vertices. We conduct several experiments including shape segmentation and classification, and our LaplacianNet outperforms state-of-the-art algorithms for these tasks on ShapeNet and COSEG datasets. |
Tasks | |
Published | 2019-10-30 |
URL | https://arxiv.org/abs/1910.14063v1 |
https://arxiv.org/pdf/1910.14063v1.pdf | |
PWC | https://paperswithcode.com/paper/laplaciannet-learning-on-3d-meshes-with |
Repo | |
Framework | |
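For readers unfamiliar with the mesh Laplacian mentioned in the abstract above, here is a minimal sketch of its simplest (combinatorial) form, L = D - A, built from triangle faces. It is an illustration only: the helper name `mesh_laplacian` and the tetrahedron input are hypothetical, and the paper may well use a different weighting (for example cotangent weights) before its spectral analysis.

```python
def mesh_laplacian(num_vertices, faces):
    """Combinatorial graph Laplacian L = D - A of a triangle mesh.

    `faces` is a list of (i, j, k) vertex-index triples. Hypothetical helper
    for illustration; not taken from the LaplacianNet code.
    """
    # Collect undirected edges; the set avoids double-counting shared edges.
    edges = set()
    for i, j, k in faces:
        for a, b in ((i, j), (j, k), (k, i)):
            edges.add((min(a, b), max(a, b)))

    L = [[0] * num_vertices for _ in range(num_vertices)]
    for a, b in edges:
        L[a][b] -= 1  # off-diagonal entry: -1 for each edge
        L[b][a] -= 1
        L[a][a] += 1  # diagonal entry: vertex degree
        L[b][b] += 1
    return L


# A tetrahedron: 4 vertices, every pair of vertices shares an edge.
tetra_faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
L = mesh_laplacian(4, tetra_faces)
for row in L:
    print(row)
```

Every row of this matrix sums to zero, so the constant vector is an eigenvector with eigenvalue 0; spectral methods work with the remaining low-frequency eigenvectors.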
A Self-Correcting Deep Learning Approach to Predict Acute Conditions in Critical Care
Title | A Self-Correcting Deep Learning Approach to Predict Acute Conditions in Critical Care |
Authors | Ziyuan Pan, Hao Du, Kee Yuan Ngiam, Fei Wang, Ping Shum, Mengling Feng |
Abstract | In critical care, intensivists are required to continuously monitor high dimensional vital signs and lab measurements to detect and diagnose acute patient conditions. This has always been a challenging task. In this study, we propose a novel self-correcting deep learning prediction approach to address this challenge. We focus on an example of the prediction of acute kidney injury (AKI). Compared with the existing models, our method has a number of distinct features: we utilized the accumulative data of patients in ICU; we developed a self-correcting mechanism that feeds errors from the previous predictions back into the network; we also proposed a regularization method that takes into account not only the model’s prediction error on the label but also its estimation errors on the input data. This mechanism is applied in both regression and classification tasks. We compared the performance of our proposed method with the conventional deep learning models on two real-world clinical datasets and demonstrated that our proposed model constantly outperforms these baseline models. In particular, the proposed model achieved area under ROC curve at 0.893 on the MIMIC III dataset, and 0.871 on the Philips eICU dataset. |
Tasks | |
Published | 2019-01-14 |
URL | http://arxiv.org/abs/1901.04364v1 |
http://arxiv.org/pdf/1901.04364v1.pdf | |
PWC | https://paperswithcode.com/paper/a-self-correcting-deep-learning-approach-to |
Repo | |
Framework | |
Negative eigenvalues of the Hessian in deep neural networks
Title | Negative eigenvalues of the Hessian in deep neural networks |
Authors | Guillaume Alain, Nicolas Le Roux, Pierre-Antoine Manzagol |
Abstract | The loss function of deep networks is known to be non-convex but the precise nature of this nonconvexity is still an active area of research. In this work, we study the loss landscape of deep networks through the eigendecompositions of their Hessian matrix. In particular, we examine how important the negative eigenvalues are and the benefits one can observe in handling them appropriately. |
Tasks | |
Published | 2019-02-06 |
URL | http://arxiv.org/abs/1902.02366v1 |
http://arxiv.org/pdf/1902.02366v1.pdf | |
PWC | https://paperswithcode.com/paper/negative-eigenvalues-of-the-hessian-in-deep |
Repo | |
Framework | |
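To make the notion of a negative Hessian eigenvalue concrete, the sketch below estimates the Hessian of a toy two-parameter loss by finite differences and computes its eigenvalues in closed form. The function `toy_loss` and both helpers are hypothetical and are not taken from the paper, which studies real deep networks rather than a two-dimensional saddle.

```python
import math


def hessian_2d(f, x, y, h=1e-4):
    """Central finite-difference Hessian entries (fxx, fxy, fyy) of f at (x, y)."""
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    return fxx, fxy, fyy


def eigenvalues_sym_2x2(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]], smallest first."""
    mean = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    return mean - radius, mean + radius


def toy_loss(x, y):
    # Hypothetical loss with a saddle point at the origin.
    return x**2 - y**2


lam_min, lam_max = eigenvalues_sym_2x2(*hessian_2d(toy_loss, 0.0, 0.0))
print(lam_min, lam_max)  # approximately -2 and 2
```

The negative eigenvalue corresponds to the y direction, along which the loss decreases when moving away from the origin; the positive one corresponds to the x direction, along which it increases.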
Deep Reinforcement Learning for Clinical Decision Support: A Brief Survey
Title | Deep Reinforcement Learning for Clinical Decision Support: A Brief Survey |
Authors | Siqi Liu, Kee Yuan Ngiam, Mengling Feng |
Abstract | Owing to the recent advancements in Artificial Intelligence, especially deep learning, many data-driven decision support systems have been implemented to assist medical doctors in delivering personalized care. We focus on the deep reinforcement learning (DRL) models in this paper. DRL models have demonstrated human-level or even superior performance in tasks such as computer vision and game playing, for example Go and Atari games. However, the adoption of deep reinforcement learning techniques in clinical decision optimization is still rare. We present the first survey that summarizes reinforcement learning algorithms with Deep Neural Networks (DNN) on clinical decision support. We also discuss some case studies, where different DRL algorithms were applied to address various clinical challenges. We further compare and contrast the advantages and limitations of various DRL algorithms and present a preliminary guide on how to choose the appropriate DRL algorithm for particular clinical applications. |
Tasks | |
Published | 2019-07-22 |
URL | https://arxiv.org/abs/1907.09475v1 |
https://arxiv.org/pdf/1907.09475v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-reinforcement-learning-for-clinical |
Repo | |
Framework | |
Toward Explainable Fashion Recommendation
Title | Toward Explainable Fashion Recommendation |
Authors | Pongsate Tangseng, Takayuki Okatani |
Abstract | Many studies have been conducted so far to build systems for recommending fashion items and outfits. Although they achieve good performance in their respective tasks, most of them cannot explain their judgments to the users, which compromises their usefulness. Toward explainable fashion recommendation, this study proposes a system that is able not only to provide a goodness score for an outfit but also to explain the score by providing the reason behind it. For this purpose, we propose a method for quantifying how influential each feature of each item is to the score. Using this influence value, we can identify which item and what feature make the outfit good or bad. We represent the image of each item with a combination of human-interpretable features, and thereby the identification of the most influential item-feature pair gives a useful explanation of the output score. To evaluate the performance of this approach, we design an experiment that can be performed without human annotation; we replace a single item-feature pair in an outfit so that the score will decrease, and then we test if the proposed method can detect the replaced item correctly using the above influence values. The experimental results show that the proposed method can accurately detect the bad items that lower an outfit's score. |
Tasks | |
Published | 2019-01-15 |
URL | https://arxiv.org/abs/1901.04870v3 |
https://arxiv.org/pdf/1901.04870v3.pdf | |
PWC | https://paperswithcode.com/paper/toward-explainable-fashion-recommendation |
Repo | |
Framework | |
Predicting ice flow using machine learning
Title | Predicting ice flow using machine learning |
Authors | Yimeng Min, S. Karthik Mukkavilli, Yoshua Bengio |
Abstract | Though machine learning has achieved notable success in modeling sequential and spatial data for speech recognition and in computer vision, applications to remote sensing and climate science problems are seldom considered. In this paper, we demonstrate techniques from unsupervised learning of future video frame prediction, to increase the accuracy of ice flow tracking in multi-spectral satellite images. As the volume of cryosphere data increases in coming years, this is an interesting and important opportunity for machine learning to address a global challenge for climate change, risk management from floods, and conserving freshwater resources. Future frame prediction of ice melt and tracking the optical flow of ice dynamics presents modeling difficulties, due to uncertainties in global temperature increase, changing precipitation patterns, occlusion from cloud cover, rapid melting and glacier retreat due to black carbon aerosol deposition, from wildfires or human fossil emissions. We show the adversarial learning method helps improve the accuracy of tracking the optical flow of ice dynamics compared to existing methods in climate science. We present a dataset, IceNet, to encourage machine learning research and to help facilitate further applications in the areas of cryospheric science and climate change. |
Tasks | Optical Flow Estimation, Speech Recognition |
Published | 2019-10-20 |
URL | https://arxiv.org/abs/1910.08922v1 |
https://arxiv.org/pdf/1910.08922v1.pdf | |
PWC | https://paperswithcode.com/paper/predicting-ice-flow-using-machine-learning |
Repo | |
Framework | |
Moving Indoor: Unsupervised Video Depth Learning in Challenging Environments
Title | Moving Indoor: Unsupervised Video Depth Learning in Challenging Environments |
Authors | Junsheng Zhou, Yuwang Wang, Kaihuai Qin, Wenjun Zeng |
Abstract | Recently, unsupervised learning of depth from videos has made remarkable progress and the results are comparable to fully supervised methods in outdoor scenes like KITTI. However, there still exist great challenges when directly applying this technology in indoor environments, e.g., large non-textured regions like white walls, more complex ego-motion of a handheld camera, transparent glass and shiny objects. To overcome these problems, we propose a new optical-flow based training paradigm which reduces the difficulty of unsupervised learning by providing a clearer training target and handles the non-textured regions. Our experimental evaluation demonstrates that the result of our method is comparable to fully supervised methods on the NYU Depth V2 benchmark. To the best of our knowledge, this is the first quantitative result of a purely unsupervised learning method reported on indoor datasets. |
Tasks | Optical Flow Estimation |
Published | 2019-10-20 |
URL | https://arxiv.org/abs/1910.08898v1 |
https://arxiv.org/pdf/1910.08898v1.pdf | |
PWC | https://paperswithcode.com/paper/moving-indoor-unsupervised-video-depth |
Repo | |
Framework | |
Real-time monitoring of driver drowsiness on mobile platforms using 3D neural networks
Title | Real-time monitoring of driver drowsiness on mobile platforms using 3D neural networks |
Authors | Jasper S. Wijnands, Jason Thompson, Kerry A. Nice, Gideon D. P. A. Aschwanden, Mark Stevenson |
Abstract | Driver drowsiness increases crash risk, leading to substantial road trauma each year. Drowsiness detection methods have received considerable attention, but few studies have investigated the implementation of a detection approach on a mobile phone. Phone applications reduce the need for specialised hardware and hence, enable a cost-effective roll-out of the technology across the driving population. While it has been shown that three-dimensional (3D) operations are more suitable for spatiotemporal feature learning, current methods for drowsiness detection commonly use frame-based, multi-step approaches. However, computationally expensive techniques that achieve superior results on action recognition benchmarks (e.g. 3D convolutions, optical flow extraction) create bottlenecks for real-time, safety-critical applications on mobile devices. Here, we show how depthwise separable 3D convolutions, combined with an early fusion of spatial and temporal information, can achieve a balance between high prediction accuracy and real-time inference requirements. In particular, increased accuracy is achieved when assessment requires motion information, for example, when sunglasses conceal the eyes. Further, a custom TensorFlow-based smartphone application shows the true impact of various approaches on inference times and demonstrates the effectiveness of real-time monitoring based on out-of-sample data to alert a drowsy driver. Our model is pre-trained on ImageNet and Kinetics and fine-tuned on a publicly available Driver Drowsiness Detection dataset. Fine-tuning on large naturalistic driving datasets could further improve accuracy to obtain robust in-vehicle performance. Overall, our research is a step towards practical deep learning applications, potentially preventing micro-sleeps and reducing road trauma. |
Tasks | Optical Flow Estimation |
Published | 2019-10-15 |
URL | https://arxiv.org/abs/1910.06540v1 |
https://arxiv.org/pdf/1910.06540v1.pdf | |
PWC | https://paperswithcode.com/paper/real-time-monitoring-of-driver-drowsiness-on |
Repo | |
Framework | |
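The abstract above credits depthwise separable 3D convolutions with making real-time inference feasible. As a rough illustration of why, the sketch below compares weight counts for a standard 3D convolution and a depthwise-plus-pointwise factorisation. The layer sizes are hypothetical and are not taken from the paper's architecture; biases are ignored.

```python
def conv3d_params(c_in, c_out, k):
    """Weights in a standard 3D convolution with a k x k x k kernel (no bias)."""
    return k**3 * c_in * c_out


def separable_conv3d_params(c_in, c_out, k):
    """Weights in a depthwise k x k x k conv followed by a 1x1x1 pointwise conv."""
    depthwise = k**3 * c_in   # one k^3 filter per input channel
    pointwise = c_in * c_out  # 1x1x1 convolution that mixes channels
    return depthwise + pointwise


c_in, c_out, k = 64, 128, 3  # hypothetical layer sizes
standard = conv3d_params(c_in, c_out, k)
separable = separable_conv3d_params(c_in, c_out, k)
print(standard, separable, round(standard / separable, 1))  # 221184 9920 22.3
```

For these assumed sizes the factorised layer has roughly 22 times fewer weights. Actual speed-ups on a phone also depend on memory access patterns and on how well the runtime supports the operation.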
OmniTrack: Real-time detection and tracking of objects, text and logos in video
Title | OmniTrack: Real-time detection and tracking of objects, text and logos in video |
Authors | Hannes Fassold, Ridouane Ghermi |
Abstract | The automatic detection and tracking of general objects (like persons, animals or cars), text and logos in a video is crucial for many video understanding tasks, and usually real-time processing is required. We propose OmniTrack, an efficient and robust algorithm which is able to automatically detect and track objects, text as well as brand logos in real-time. It combines a powerful deep learning based object detector (YoloV3) with high-quality optical flow methods. Based on the reference YoloV3 C++ implementation, we did some important performance optimizations which will be described. The major steps in the training procedure for the combined detector for text and logo will be presented. We will then describe the OmniTrack algorithm, consisting of the phases preprocessing, feature calculation, prediction, matching and update. Several performance optimizations have been implemented there as well, like doing the object detection and optical flow calculation asynchronously. Experiments show that the proposed algorithm runs in real-time for standard definition (720x576) video on a PC with a Quadro RTX 5000 GPU. |
Tasks | Object Detection, Optical Flow Estimation, Video Understanding |
Published | 2019-10-14 |
URL | https://arxiv.org/abs/1910.06017v1 |
https://arxiv.org/pdf/1910.06017v1.pdf | |
PWC | https://paperswithcode.com/paper/omnitrack-real-time-detection-and-tracking-of |
Repo | |
Framework | |
Fiducia: A Personalized Food Recommender System for Zomato
Title | Fiducia: A Personalized Food Recommender System for Zomato |
Authors | Mansi Goel, Ayush Agarwal, Deepak Thukral, Tanmoy Chakraborty |
Abstract | This paper presents Fiducia, a food review system involving a pipeline which processes restaurant-related reviews obtained from Zomato (India’s largest restaurant search and discovery service). Fiducia is specific to popular cafe food items and manages to identify relevant information pertaining to each item separately in the reviews. It uses a sentiment check on these pieces of text and accordingly suggests an appropriate restaurant for the particular item depending on user-item and item-item similarity. Experimental results show that the sentiment analyzer module of Fiducia achieves an accuracy of over 85% and our final recommender system achieves an RMSE of about 1.01 beating other baselines. |
Tasks | Recommendation Systems |
Published | 2019-03-25 |
URL | http://arxiv.org/abs/1903.10117v1 |
http://arxiv.org/pdf/1903.10117v1.pdf | |
PWC | https://paperswithcode.com/paper/fiducia-a-personalized-food-recommender |
Repo | |
Framework | |
BIG MOOD: Relating Transformers to Explicit Commonsense Knowledge
Title | BIG MOOD: Relating Transformers to Explicit Commonsense Knowledge |
Authors | Jeff Da |
Abstract | We introduce a simple yet effective method of integrating contextual embeddings with commonsense graph embeddings, dubbed BERT Infused Graphs: Matching Over Other embeDdings. First, we introduce a preprocessing method to improve the speed of querying knowledge bases. Then, we develop a method of creating knowledge embeddings from each knowledge base. We introduce a method of aligning tokens between two misaligned tokenization methods. Finally, we contribute a method of contextualizing BERT after combining with knowledge base embeddings. We also show BERT's tendency to correct lower accuracy question types. Our model achieves a higher accuracy than BERT, and we score fifth on the official leaderboard of the shared task and score the highest without any additional language model pretraining. |
Tasks | Language Modelling, Tokenization |
Published | 2019-10-17 |
URL | https://arxiv.org/abs/1910.07713v1 |
https://arxiv.org/pdf/1910.07713v1.pdf | |
PWC | https://paperswithcode.com/paper/big-mood-relating-transformers-to-explicit |
Repo | |
Framework | |
GQ-STN: Optimizing One-Shot Grasp Detection based on Robustness Classifier
Title | GQ-STN: Optimizing One-Shot Grasp Detection based on Robustness Classifier |
Authors | Alexandre Gariépy, Jean-Christophe Ruel, Brahim Chaib-draa, Philippe Giguère |
Abstract | Grasping is a fundamental robotic task needed for the deployment of household robots or furthering warehouse automation. However, few approaches are able to perform grasp detection in real time (frame rate). To this effect, we present Grasp Quality Spatial Transformer Network (GQ-STN), a one-shot grasp detection network. Being based on the Spatial Transformer Network (STN), it produces not only a grasp configuration, but also directly outputs a depth image centered at this configuration. By connecting our architecture to an externally-trained grasp robustness evaluation network, we can train efficiently to satisfy a robustness metric via the backpropagation of the gradient emanating from the evaluation network. This removes the difficulty of training detection networks on sparsely annotated databases, a common issue in grasping. We further propose to use this robustness classifier to compare approaches, being more reliable than the traditional rectangle metric. Our GQ-STN is able to detect robust grasps on the depth images of the Dex-Net 2.0 dataset with 92.4 % accuracy in a single pass of the network. We finally demonstrate in a physical benchmark that our method can propose robust grasps more often than previous sampling-based methods, while being more than 60 times faster. |
Tasks | |
Published | 2019-03-06 |
URL | https://arxiv.org/abs/1903.02489v2 |
https://arxiv.org/pdf/1903.02489v2.pdf | |
PWC | https://paperswithcode.com/paper/gq-stn-optimizing-one-shot-grasp-detection |
Repo | |
Framework | |
On the Global Convergence of (Fast) Incremental Expectation Maximization Methods
Title | On the Global Convergence of (Fast) Incremental Expectation Maximization Methods |
Authors | Belhal Karimi, Hoi-To Wai, Eric Moulines, Marc Lavielle |
Abstract | The EM algorithm is one of the most popular algorithms for inference in latent data models. The original formulation of the EM algorithm does not scale to large data sets, because the whole data set is required at each iteration of the algorithm. To alleviate this problem, Neal and Hinton have proposed an incremental version of the EM (iEM) in which at each iteration the conditional expectation of the latent data (E-step) is updated only for a mini-batch of observations. Another approach has been proposed by Cappé and Moulines in which the E-step is replaced by a stochastic approximation step, closely related to stochastic gradient. In this paper, we analyze incremental and stochastic versions of the EM algorithm as well as the variance-reduced version of Chen et al. in a common unifying framework. We also introduce a new incremental version, inspired by the SAGA algorithm by Defazio et al. We establish non-asymptotic convergence bounds for global convergence. Numerical applications are presented in this article to illustrate our findings. |
Tasks | |
Published | 2019-10-28 |
URL | https://arxiv.org/abs/1910.12521v1 |
https://arxiv.org/pdf/1910.12521v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-global-convergence-of-fast-incremental |
Repo | |
Framework | |
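The incremental E-step described in the abstract above can be illustrated on a toy problem. The sketch below fits a two-component, unit-variance, one-dimensional Gaussian mixture, refreshing responsibilities for one mini-batch at a time and re-running the M-step from the aggregated sufficient statistics. It is a simplified illustration in the spirit of the Neal and Hinton scheme, not the authors' algorithm or analysis; all names, the synthetic data and the hyperparameters are assumptions.

```python
import math
import random


def responsibilities(x, weights, means):
    """E-step for one observation: posterior over components (unit variance)."""
    dens = [w * math.exp(-0.5 * (x - m) ** 2) for w, m in zip(weights, means)]
    total = sum(dens)
    return [d / total for d in dens]


def incremental_em(data, means, weights, batch_size=20, epochs=20):
    n, k = len(data), len(means)
    # Per-observation responsibilities, refreshed one mini-batch at a time.
    resp = [responsibilities(x, weights, means) for x in data]
    # Aggregated sufficient statistics: sum of r_ij and sum of r_ij * x_i.
    s0 = [sum(r[j] for r in resp) for j in range(k)]
    s1 = [sum(r[j] * x for r, x in zip(resp, data)) for j in range(k)]
    for _ in range(epochs):
        for start in range(0, n, batch_size):
            for i in range(start, min(start + batch_size, n)):
                new = responsibilities(data[i], weights, means)
                for j in range(k):
                    # Swap observation i's old contribution for its new one.
                    s0[j] += new[j] - resp[i][j]
                    s1[j] += (new[j] - resp[i][j]) * data[i]
                resp[i] = new
            # M-step from the aggregated statistics after each mini-batch.
            weights = [s / n for s in s0]
            means = [s1[j] / s0[j] for j in range(k)]
    return means, weights


random.seed(0)
# Synthetic data: two well-separated clusters (hypothetical example).
data = ([random.gauss(-2.0, 1.0) for _ in range(200)]
        + [random.gauss(3.0, 1.0) for _ in range(200)])
random.shuffle(data)
means, weights = incremental_em(data, means=[-1.0, 1.0], weights=[0.5, 0.5])
print(means, weights)
```

Only the mini-batch's responsibilities are recomputed between parameter updates, so each update costs time proportional to the batch size rather than to the full data set, at the price of storing one responsibility vector per observation.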
Exploring Unlabeled Faces for Novel Attribute Discovery
Title | Exploring Unlabeled Faces for Novel Attribute Discovery |
Authors | Hyojin Bahng, Sunghyo Chung, Seungjoo Yoo, Jaegul Choo |
Abstract | Despite remarkable success in unpaired image-to-image translation, existing systems still require a large amount of labeled images. This is a bottleneck for their real-world applications; in practice, a model trained on labeled CelebA dataset does not work well for test images from a different distribution – greatly limiting their application to unlabeled images of a much larger quantity. In this paper, we attempt to alleviate this necessity for labeled data in the facial image translation domain. We aim to explore the degree to which you can discover novel attributes from unlabeled faces and perform high-quality translation. To this end, we use prior knowledge about the visual world as guidance to discover novel attributes and transfer them via a novel normalization method. Experiments show that our method trained on unlabeled data produces high-quality translations, preserves identity, and is perceptually realistic, performing as well as, or better than, state-of-the-art methods trained on labeled data. |
Tasks | Image-to-Image Translation |
Published | 2019-12-06 |
URL | https://arxiv.org/abs/1912.03085v1 |
https://arxiv.org/pdf/1912.03085v1.pdf | |
PWC | https://paperswithcode.com/paper/exploring-unlabeled-faces-for-novel-attribute |
Repo | |
Framework | |