January 31, 2020

Paper Group ANR 16

Adversarial Music: Real World Audio Adversary Against Wake-word Detection System


Title	Adversarial Music: Real World Audio Adversary Against Wake-word Detection System
Authors	Juncheng B. Li, Shuhui Qu, Xinjian Li, Joseph Szurley, J. Zico Kolter, Florian Metze
Abstract	Voice Assistants (VAs) such as Amazon Alexa or Google Assistant rely on wake-word detection to respond to people’s commands, which could potentially be vulnerable to audio adversarial examples. In this work, we target our attack on the wake-word detection system, jamming the model with some inconspicuous background music to deactivate the VAs while our audio adversary is present. We implemented an emulated wake-word detection system of Amazon Alexa based on recent publications. We validated our models against the real Alexa in terms of wake-word detection accuracy. Then we computed our audio adversaries with consideration of expectation over transform and we implemented our audio adversary with a differentiable synthesizer. Next, we verified our audio adversaries digitally on hundreds of samples of utterances collected from the real world. Our experiments show that we can effectively reduce the recognition F1 score of our emulated model from 93.4% to 11.0%. Finally, we tested our audio adversary over the air, and verified it works effectively against Alexa, reducing its F1 score from 92.5% to 11.0%. We also verified that non-adversarial music does not disable Alexa as effectively as our music at the same sound level. To the best of our knowledge, this is the first real-world adversarial attack against a commercial-grade VA wake-word detection system. Our code and demo videos can be accessed at https://www.junchengbillyli.com/AdversarialMusic
Tasks	Adversarial Attack
Published	2019-10-31
URL	https://arxiv.org/abs/1911.00126v3
PDF	https://arxiv.org/pdf/1911.00126v3.pdf
PWC	https://paperswithcode.com/paper/adversarial-music-real-world-audio-adversary
Repo
Framework
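
The abstract above reports results as F1 scores. As a reminder of what that metric measures, here is a minimal sketch of the standard F1 formula; the detection counts are hypothetical illustration values, not numbers from the paper.

```python
def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall, written in terms of counts."""
    return 2 * tp / (2 * tp + fp + fn)

# Hypothetical counts for a wake-word detector (assumed for illustration only):
# 90 correct detections, 5 false alarms, 10 missed wake words.
f1 = f1_score(tp=90, fp=5, fn=10)
print(round(f1, 3))
```

Because F1 combines precision and recall, a jamming attack that makes the detector miss most wake words lowers the score even if false alarms stay rare.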

LaplacianNet: Learning on 3D Meshes with Laplacian Encoding and Pooling


Title	LaplacianNet: Learning on 3D Meshes with Laplacian Encoding and Pooling
Authors	Yi-Ling Qiao, Lin Gao, Jie Yang, Paul L. Rosin, Yu-Kun Lai, Xilin Chen
Abstract	3D models are commonly used in computer vision and graphics. With the wider availability of mesh data, an efficient and intrinsic deep learning approach to processing 3D meshes is in great need. Unlike images, 3D meshes have irregular connectivity, requiring careful design to capture relations in the data. To utilize the topology information while staying robust under different triangulation, we propose to encode mesh connectivity using Laplacian spectral analysis, along with Mesh Pooling Blocks (MPBs) that can split the surface domain into local pooling patches and aggregate global information among them. We build a mesh hierarchy from fine to coarse using Laplacian spectral clustering, which is flexible under isometric transformation. Inside the MPBs there are pooling layers to collect local information and multi-layer perceptrons to compute vertex features with increasing complexity. To obtain the relationships among different clusters, we introduce a Correlation Net to compute a correlation matrix, which can aggregate the features globally by matrix multiplication with cluster features. Our network architecture is flexible enough to be used on meshes with different numbers of vertices. We conduct several experiments including shape segmentation and classification, and our LaplacianNet outperforms state-of-the-art algorithms for these tasks on ShapeNet and COSEG datasets.
Tasks
Published	2019-10-30
URL	https://arxiv.org/abs/1910.14063v1
PDF	https://arxiv.org/pdf/1910.14063v1.pdf
PWC	https://paperswithcode.com/paper/laplaciannet-learning-on-3d-meshes-with
Repo
Framework
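
The abstract above builds its mesh hierarchy with Laplacian spectral clustering. The sketch below is not the authors' pipeline; it only illustrates the underlying idea on a tiny hypothetical graph, splitting vertices by the sign of the Fiedler vector (the eigenvector of the graph Laplacian with the second-smallest eigenvalue). The function name and the toy edge list are assumptions, and the eigenvector is found with a simple shifted power iteration to keep the example dependency-free.

```python
def fiedler_partition(edges, n, iters=200):
    """Split n vertices into two clusters by the sign of the Fiedler vector."""
    # Combinatorial Laplacian L = D - A as a dense list of lists.
    lap = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        lap[i][i] += 1.0
        lap[j][j] += 1.0
        lap[i][j] -= 1.0
        lap[j][i] -= 1.0
    # Eigenvalues of L lie in [0, 2 * max degree], so shift * I - L is
    # positive semidefinite and its top eigenvectors are L's bottom ones.
    shift = 2.0 * max(lap[i][i] for i in range(n))
    v = [float(i) for i in range(n)]
    for _ in range(iters):
        # Project out the constant vector (eigenvalue 0 of L).
        mean = sum(v) / n
        v = [x - mean for x in v]
        # One power-iteration step with the shifted matrix.
        v = [shift * v[i] - sum(lap[i][j] * v[j] for j in range(n))
             for i in range(n)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
    return [x > 0 for x in v]

# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
labels = fiedler_partition(edges, 6)
print(labels)
```

On this toy graph the split falls on the bridging edge, so each triangle ends up in its own cluster. A real mesh would use a sparse Laplacian and a proper eigensolver.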

A Self-Correcting Deep Learning Approach to Predict Acute Conditions in Critical Care


Title	A Self-Correcting Deep Learning Approach to Predict Acute Conditions in Critical Care
Authors	Ziyuan Pan, Hao Du, Kee Yuan Ngiam, Fei Wang, Ping Shum, Mengling Feng
Abstract	In critical care, intensivists are required to continuously monitor high dimensional vital signs and lab measurements to detect and diagnose acute patient conditions. This has always been a challenging task. In this study, we propose a novel self-correcting deep learning prediction approach to address this challenge. We focus on an example of the prediction of acute kidney injury (AKI). Compared with the existing models, our method has a number of distinct features: we utilized the accumulative data of patients in ICU; we developed a self-correcting mechanism that feeds errors from the previous predictions back into the network; we also proposed a regularization method that takes into account not only the model’s prediction error on the label but also its estimation errors on the input data. This mechanism is applied in both regression and classification tasks. We compared the performance of our proposed method with the conventional deep learning models on two real-world clinical datasets and demonstrated that our proposed model consistently outperforms these baseline models. In particular, the proposed model achieved an area under the ROC curve of 0.893 on the MIMIC III dataset, and 0.871 on the Philips eICU dataset.
Tasks
Published	2019-01-14
URL	http://arxiv.org/abs/1901.04364v1
PDF	http://arxiv.org/pdf/1901.04364v1.pdf
PWC	https://paperswithcode.com/paper/a-self-correcting-deep-learning-approach-to
Repo
Framework

Negative eigenvalues of the Hessian in deep neural networks


Title	Negative eigenvalues of the Hessian in deep neural networks
Authors	Guillaume Alain, Nicolas Le Roux, Pierre-Antoine Manzagol
Abstract	The loss function of deep networks is known to be non-convex but the precise nature of this nonconvexity is still an active area of research. In this work, we study the loss landscape of deep networks through the eigendecompositions of their Hessian matrix. In particular, we examine how important the negative eigenvalues are and the benefits one can observe in handling them appropriately.
Tasks
Published	2019-02-06
URL	http://arxiv.org/abs/1902.02366v1
PDF	http://arxiv.org/pdf/1902.02366v1.pdf
PWC	https://paperswithcode.com/paper/negative-eigenvalues-of-the-hessian-in-deep
Repo
Framework
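
To make the idea of negative Hessian eigenvalues concrete, here is a small sketch on a toy loss that is not taken from the paper: L(a, b) = (a*b - 1)^2, which can be read as a two-layer linear "network" with one weight per layer. The function names are hypothetical. The Hessian is written out analytically and its eigenvalues are computed with the closed form for symmetric 2x2 matrices.

```python
import math

def hessian(a, b):
    """Analytic Hessian of the toy loss L(a, b) = (a*b - 1)**2."""
    off_diagonal = 4 * a * b - 2
    return [[2 * b * b, off_diagonal],
            [off_diagonal, 2 * a * a]]

def eigenvalues_sym2(h):
    """Eigenvalues of a symmetric 2x2 matrix, smallest first."""
    p, q, r = h[0][0], h[0][1], h[1][1]
    mean = (p + r) / 2
    radius = math.hypot((p - r) / 2, q)
    return mean - radius, mean + radius

# At the origin the gradient vanishes and one eigenvalue is negative,
# so the origin is a saddle point of this loss.
saddle = eigenvalues_sym2(hessian(0.0, 0.0))
# At (1, 1) the loss is zero and no eigenvalue is negative.
minimum = eigenvalues_sym2(hessian(1.0, 1.0))
print(saddle, minimum)
```

The zero eigenvalue at (1, 1) reflects the flat direction along the curve a*b = 1, where the loss stays at zero.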

Deep Reinforcement Learning for Clinical Decision Support: A Brief Survey


Title	Deep Reinforcement Learning for Clinical Decision Support: A Brief Survey
Authors	Siqi Liu, Kee Yuan Ngiam, Mengling Feng
Abstract	Owing to the recent advancements in Artificial Intelligence, especially deep learning, many data-driven decision support systems have been implemented to assist medical doctors in delivering personalized care. We focus on the deep reinforcement learning (DRL) models in this paper. DRL models have demonstrated human-level or even superior performance in computer vision tasks and in game playing, such as Go and Atari games. However, the adoption of deep reinforcement learning techniques in clinical decision optimization is still rare. We present the first survey that summarizes reinforcement learning algorithms with Deep Neural Networks (DNN) on clinical decision support. We also discuss some case studies, where different DRL algorithms were applied to address various clinical challenges. We further compare and contrast the advantages and limitations of various DRL algorithms and present a preliminary guide on how to choose the appropriate DRL algorithm for particular clinical applications.
Tasks
Published	2019-07-22
URL	https://arxiv.org/abs/1907.09475v1
PDF	https://arxiv.org/pdf/1907.09475v1.pdf
PWC	https://paperswithcode.com/paper/deep-reinforcement-learning-for-clinical
Repo
Framework

Toward Explainable Fashion Recommendation


Title	Toward Explainable Fashion Recommendation
Authors	Pongsate Tangseng, Takayuki Okatani
Abstract	Many studies have been conducted so far to build systems for recommending fashion items and outfits. Although they achieve good performances in their respective tasks, most of them cannot explain their judgments to the users, which compromises their usefulness. Toward explainable fashion recommendation, this study proposes a system that is able not only to provide a goodness score for an outfit but also to explain the score by providing reason behind it. For this purpose, we propose a method for quantifying how influential each feature of each item is to the score. Using this influence value, we can identify which item and what feature make the outfit good or bad. We represent the image of each item with a combination of human-interpretable features, and thereby the identification of the most influential item-feature pair gives useful explanation of the output score. To evaluate the performance of this approach, we design an experiment that can be performed without human annotation; we replace a single item-feature pair in an outfit so that the score will decrease, and then we test if the proposed method can detect the replaced item correctly using the above influence values. The experimental results show that the proposed method can accurately detect bad items in outfits lowering their scores.
Tasks
Published	2019-01-15
URL	https://arxiv.org/abs/1901.04870v3
PDF	https://arxiv.org/pdf/1901.04870v3.pdf
PWC	https://paperswithcode.com/paper/toward-explainable-fashion-recommendation
Repo
Framework

Predicting ice flow using machine learning


Title	Predicting ice flow using machine learning
Authors	Yimeng Min, S. Karthik Mukkavilli, Yoshua Bengio
Abstract	Though machine learning has achieved notable success in modeling sequential and spatial data for speech recognition and in computer vision, applications to remote sensing and climate science problems are seldom considered. In this paper, we demonstrate techniques from unsupervised learning of future video frame prediction, to increase the accuracy of ice flow tracking in multi-spectral satellite images. As the volume of cryosphere data increases in coming years, this is an interesting and important opportunity for machine learning to address a global challenge for climate change, risk management from floods, and conserving freshwater resources. Future frame prediction of ice melt and tracking the optical flow of ice dynamics presents modeling difficulties, due to uncertainties in global temperature increase, changing precipitation patterns, occlusion from cloud cover, rapid melting and glacier retreat due to black carbon aerosol deposition, from wildfires or human fossil emissions. We show the adversarial learning method helps improve the accuracy of tracking the optical flow of ice dynamics compared to existing methods in climate science. We present a dataset, IceNet, to encourage machine learning research and to help facilitate further applications in the areas of cryospheric science and climate change.
Tasks	Optical Flow Estimation, Speech Recognition
Published	2019-10-20
URL	https://arxiv.org/abs/1910.08922v1
PDF	https://arxiv.org/pdf/1910.08922v1.pdf
PWC	https://paperswithcode.com/paper/predicting-ice-flow-using-machine-learning
Repo
Framework

Moving Indoor: Unsupervised Video Depth Learning in Challenging Environments


Title	Moving Indoor: Unsupervised Video Depth Learning in Challenging Environments
Authors	Junsheng Zhou, Yuwang Wang, Kaihuai Qin, Wenjun Zeng
Abstract	Recently unsupervised learning of depth from videos has made remarkable progress and the results are comparable to fully supervised methods in outdoor scenes like KITTI. However, there still exist great challenges when directly applying this technology in indoor environments, e.g., large areas of non-texture regions like white wall, more complex ego-motion of handheld camera, transparent glasses and shiny objects. To overcome these problems, we propose a new optical-flow based training paradigm which reduces the difficulty of unsupervised learning by providing a clearer training target and handles the non-texture regions. Our experimental evaluation demonstrates that the result of our method is comparable to fully supervised methods on the NYU Depth V2 benchmark. To the best of our knowledge, this is the first quantitative result of purely unsupervised learning method reported on indoor datasets.
Tasks	Optical Flow Estimation
Published	2019-10-20
URL	https://arxiv.org/abs/1910.08898v1
PDF	https://arxiv.org/pdf/1910.08898v1.pdf
PWC	https://paperswithcode.com/paper/moving-indoor-unsupervised-video-depth
Repo
Framework

Real-time monitoring of driver drowsiness on mobile platforms using 3D neural networks


Title	Real-time monitoring of driver drowsiness on mobile platforms using 3D neural networks
Authors	Jasper S. Wijnands, Jason Thompson, Kerry A. Nice, Gideon D. P. A. Aschwanden, Mark Stevenson
Abstract	Driver drowsiness increases crash risk, leading to substantial road trauma each year. Drowsiness detection methods have received considerable attention, but few studies have investigated the implementation of a detection approach on a mobile phone. Phone applications reduce the need for specialised hardware and hence, enable a cost-effective roll-out of the technology across the driving population. While it has been shown that three-dimensional (3D) operations are more suitable for spatiotemporal feature learning, current methods for drowsiness detection commonly use frame-based, multi-step approaches. However, computationally expensive techniques that achieve superior results on action recognition benchmarks (e.g. 3D convolutions, optical flow extraction) create bottlenecks for real-time, safety-critical applications on mobile devices. Here, we show how depthwise separable 3D convolutions, combined with an early fusion of spatial and temporal information, can achieve a balance between high prediction accuracy and real-time inference requirements. In particular, increased accuracy is achieved when assessment requires motion information, for example, when sunglasses conceal the eyes. Further, a custom TensorFlow-based smartphone application shows the true impact of various approaches on inference times and demonstrates the effectiveness of real-time monitoring based on out-of-sample data to alert a drowsy driver. Our model is pre-trained on ImageNet and Kinetics and fine-tuned on a publicly available Driver Drowsiness Detection dataset. Fine-tuning on large naturalistic driving datasets could further improve accuracy to obtain robust in-vehicle performance. Overall, our research is a step towards practical deep learning applications, potentially preventing micro-sleeps and reducing road trauma.
Tasks	Optical Flow Estimation
Published	2019-10-15
URL	https://arxiv.org/abs/1910.06540v1
PDF	https://arxiv.org/pdf/1910.06540v1.pdf
PWC	https://paperswithcode.com/paper/real-time-monitoring-of-driver-drowsiness-on
Repo
Framework
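
The abstract above relies on depthwise separable 3D convolutions to cut computation. The arithmetic below is a generic sketch of why that helps, not the paper's architecture: it compares weight counts for a standard 3D convolution and a depthwise-plus-pointwise factorization. The kernel size and channel counts are assumed example values, and biases are ignored.

```python
def conv3d_params(k, c_in, c_out):
    """Weights in a standard 3D convolution with a k x k x k kernel."""
    return k ** 3 * c_in * c_out

def separable_conv3d_params(k, c_in, c_out):
    """Weights in a depthwise k x k x k convolution plus a 1x1x1 pointwise one."""
    depthwise = k ** 3 * c_in   # one k x k x k filter per input channel
    pointwise = c_in * c_out    # 1x1x1 convolution that mixes channels
    return depthwise + pointwise

# Assumed example layer: 3x3x3 kernel, 64 input channels, 128 output channels.
standard = conv3d_params(3, 64, 128)
separable = separable_conv3d_params(3, 64, 128)
print(standard, separable, round(standard / separable, 1))
```

For this example layer the factorized version needs roughly 22 times fewer weights, which is the kind of saving that matters for real-time inference on a phone.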

OmniTrack: Real-time detection and tracking of objects, text and logos in video


Title	OmniTrack: Real-time detection and tracking of objects, text and logos in video
Authors	Hannes Fassold, Ridouane Ghermi
Abstract	The automatic detection and tracking of general objects (like persons, animals or cars), text and logos in a video is crucial for many video understanding tasks, and usually real-time processing is required. We propose OmniTrack, an efficient and robust algorithm which is able to automatically detect and track objects, text as well as brand logos in real-time. It combines a powerful deep learning based object detector (YoloV3) with high-quality optical flow methods. Based on the reference YoloV3 C++ implementation, we did some important performance optimizations which will be described. The major steps in the training procedure for the combined detector for text and logo will be presented. We will then describe the OmniTrack algorithm, consisting of the phases preprocessing, feature calculation, prediction, matching and update. Several performance optimizations have been implemented there as well, like doing the object detection and optical flow calculation asynchronously. Experiments show that the proposed algorithm runs in real-time for standard definition (720x576) video on a PC with a Quadro RTX 5000 GPU.
Tasks	Object Detection, Optical Flow Estimation, Video Understanding
Published	2019-10-14
URL	https://arxiv.org/abs/1910.06017v1
PDF	https://arxiv.org/pdf/1910.06017v1.pdf
PWC	https://paperswithcode.com/paper/omnitrack-real-time-detection-and-tracking-of
Repo
Framework

Fiducia: A Personalized Food Recommender System for Zomato


Title	Fiducia: A Personalized Food Recommender System for Zomato
Authors	Mansi Goel, Ayush Agarwal, Deepak Thukral, Tanmoy Chakraborty
Abstract	This paper presents Fiducia, a food review system involving a pipeline which processes restaurant-related reviews obtained from Zomato (India’s largest restaurant search and discovery service). Fiducia is specific to popular cafe food items and manages to identify relevant information pertaining to each item separately in the reviews. It uses a sentiment check on these pieces of text and accordingly suggests an appropriate restaurant for the particular item depending on user-item and item-item similarity. Experimental results show that the sentiment analyzer module of Fiducia achieves an accuracy of over 85% and our final recommender system achieves an RMSE of about 1.01 beating other baselines.
Tasks	Recommendation Systems
Published	2019-03-25
URL	http://arxiv.org/abs/1903.10117v1
PDF	http://arxiv.org/pdf/1903.10117v1.pdf
PWC	https://paperswithcode.com/paper/fiducia-a-personalized-food-recommender
Repo
Framework

BIG MOOD: Relating Transformers to Explicit Commonsense Knowledge


Title	BIG MOOD: Relating Transformers to Explicit Commonsense Knowledge
Authors	Jeff Da
Abstract	We introduce a simple yet effective method of integrating contextual embeddings with commonsense graph embeddings, dubbed BERT Infused Graphs: Matching Over Other embeDdings. First, we introduce a preprocessing method to improve the speed of querying knowledge bases. Then, we develop a method of creating knowledge embeddings from each knowledge base. We introduce a method of aligning tokens between two misaligned tokenization methods. Finally, we contribute a method of contextualizing BERT after combining with knowledge base embeddings. We also show BERT's tendency to correct lower accuracy question types. Our model achieves a higher accuracy than BERT, and we score fifth on the official leaderboard of the shared task and score the highest without any additional language model pretraining.
Tasks	Language Modelling, Tokenization
Published	2019-10-17
URL	https://arxiv.org/abs/1910.07713v1
PDF	https://arxiv.org/pdf/1910.07713v1.pdf
PWC	https://paperswithcode.com/paper/big-mood-relating-transformers-to-explicit
Repo
Framework

GQ-STN: Optimizing One-Shot Grasp Detection based on Robustness Classifier


Title	GQ-STN: Optimizing One-Shot Grasp Detection based on Robustness Classifier
Authors	Alexandre Gariépy, Jean-Christophe Ruel, Brahim Chaib-draa, Philippe Giguère
Abstract	Grasping is a fundamental robotic task needed for the deployment of household robots or furthering warehouse automation. However, few approaches are able to perform grasp detection in real time (frame rate). To this effect, we present Grasp Quality Spatial Transformer Network (GQ-STN), a one-shot grasp detection network. Being based on the Spatial Transformer Network (STN), it produces not only a grasp configuration, but also directly outputs a depth image centered at this configuration. By connecting our architecture to an externally-trained grasp robustness evaluation network, we can train efficiently to satisfy a robustness metric via the backpropagation of the gradient emanating from the evaluation network. This removes the difficulty of training detection networks on sparsely annotated databases, a common issue in grasping. We further propose to use this robustness classifier to compare approaches, being more reliable than the traditional rectangle metric. Our GQ-STN is able to detect robust grasps on the depth images of the Dex-Net 2.0 dataset with 92.4% accuracy in a single pass of the network. We finally demonstrate in a physical benchmark that our method can propose robust grasps more often than previous sampling-based methods, while being more than 60 times faster.
Tasks
Published	2019-03-06
URL	https://arxiv.org/abs/1903.02489v2
PDF	https://arxiv.org/pdf/1903.02489v2.pdf
PWC	https://paperswithcode.com/paper/gq-stn-optimizing-one-shot-grasp-detection
Repo
Framework

On the Global Convergence of (Fast) Incremental Expectation Maximization Methods


Title	On the Global Convergence of (Fast) Incremental Expectation Maximization Methods
Authors	Belhal Karimi, Hoi-To Wai, Eric Moulines, Marc Lavielle
Abstract	The EM algorithm is one of the most popular algorithms for inference in latent data models. The original formulation of the EM algorithm does not scale to large data sets, because the whole data set is required at each iteration of the algorithm. To alleviate this problem, Neal and Hinton have proposed an incremental version of the EM (iEM) in which at each iteration the conditional expectation of the latent data (E-step) is updated only for a mini-batch of observations. Another approach has been proposed by Cappé and Moulines in which the E-step is replaced by a stochastic approximation step, closely related to stochastic gradient. In this paper, we analyze incremental and stochastic versions of the EM algorithm as well as the variance-reduced version of Chen et al. in a common unifying framework. We also introduce a new incremental version, inspired by the SAGA algorithm by Defazio et al. We establish non-asymptotic convergence bounds for global convergence. Numerical applications are presented in this article to illustrate our findings.
Tasks
Published	2019-10-28
URL	https://arxiv.org/abs/1910.12521v1
PDF	https://arxiv.org/pdf/1910.12521v1.pdf
PWC	https://paperswithcode.com/paper/on-the-global-convergence-of-fast-incremental
Repo
Framework
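
The incremental EM (iEM) idea described in the abstract above, where the E-step is refreshed only for a mini-batch while the M-step uses statistics accumulated over all samples, can be sketched on a deliberately simplified model. The example below is an illustration under stated assumptions, not the paper's algorithm or experiments: a two-component 1D Gaussian mixture with unit variances and equal weights, where only the two means are estimated. The data, function name, and initial values are all hypothetical.

```python
import math

def incremental_em(data, mu, batch_size=2, passes=20):
    """Incremental EM for the means of a two-component, unit-variance mixture."""
    n = len(data)
    mu0, mu1 = mu
    # Stored responsibility of component 0 for every sample (uninformative start).
    resp = [0.5] * n
    # Running sufficient statistics: sum of responsibilities and weighted sums.
    s0 = sum(resp)
    s0x = sum(r * x for r, x in zip(resp, data))
    s1 = n - s0
    s1x = sum((1.0 - r) * x for r, x in zip(resp, data))
    for _ in range(passes):
        for start in range(0, n, batch_size):
            # Partial E-step: refresh responsibilities for one mini-batch only.
            for i in range(start, min(start + batch_size, n)):
                x = data[i]
                log_odds = ((x - mu1) ** 2 - (x - mu0) ** 2) / 2.0
                new_r = 1.0 / (1.0 + math.exp(-log_odds))
                delta = new_r - resp[i]
                s0 += delta
                s0x += delta * x
                s1 -= delta
                s1x -= delta * x
                resp[i] = new_r
            # M-step: uses the statistics of all samples, fresh or stale.
            mu0, mu1 = s0x / s0, s1x / s1
    return mu0, mu1

# Hypothetical data: four points near -2 and four points near +2.
data = [-2.2, -1.9, -2.1, -1.8, 2.0, 2.3, 1.7, 2.1]
mu0, mu1 = incremental_em(data, mu=(-1.0, 1.0))
print(round(mu0, 2), round(mu1, 2))
```

Each iteration touches only `batch_size` samples in the E-step, which is what lets this family of methods scale where full-batch EM has to revisit the whole data set.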

Exploring Unlabeled Faces for Novel Attribute Discovery


Title	Exploring Unlabeled Faces for Novel Attribute Discovery
Authors	Hyojin Bahng, Sunghyo Chung, Seungjoo Yoo, Jaegul Choo
Abstract	Despite remarkable success in unpaired image-to-image translation, existing systems still require a large amount of labeled images. This is a bottleneck for their real-world applications; in practice, a model trained on the labeled CelebA dataset does not work well for test images from a different distribution – greatly limiting their application to unlabeled images of a much larger quantity. In this paper, we attempt to alleviate this necessity for labeled data in the facial image translation domain. We aim to explore the degree to which one can discover novel attributes from unlabeled faces and perform high-quality translation. To this end, we use prior knowledge about the visual world as guidance to discover novel attributes and transfer them via a novel normalization method. Experiments show that our method trained on unlabeled data produces high-quality translations that preserve identity and are perceptually realistic, as good as or better than state-of-the-art methods trained on labeled data.
Tasks	Image-to-Image Translation
Published	2019-12-06
URL	https://arxiv.org/abs/1912.03085v1
PDF	https://arxiv.org/pdf/1912.03085v1.pdf
PWC	https://paperswithcode.com/paper/exploring-unlabeled-faces-for-novel-attribute
Repo
Framework