Paper Group ANR 286
ODN: Opening the Deep Network for Open-set Action Recognition
Title | ODN: Opening the Deep Network for Open-set Action Recognition |
Authors | Yu Shu, Yemin Shi, Yaowei Wang, Yixiong Zou, Qingsheng Yuan, Yonghong Tian |
Abstract | In recent years, the performance of action recognition has been significantly improved with the help of deep neural networks. Most existing action recognition works hold the closed-set assumption that all action categories are known beforehand and that deep networks can be well trained for these categories. However, action recognition in the real world is essentially an open-set problem: it is impossible to know all action categories beforehand and consequently infeasible to prepare sufficient training samples for emerging categories. In this case, applying closed-set recognition methods will inevitably lead to unseen-category errors. To address this challenge, we propose the Open Deep Network (ODN) for the open-set action recognition task. Technically, ODN detects new categories by applying a multi-class triplet thresholding method, and then dynamically reconstructs the classification layer and “opens” the deep network by continually adding predictors for new categories. To transfer the learned knowledge to the new category, two novel methods, Emphasis Initialization and Allometry Training, are adopted to initialize and incrementally train the new predictor so that only a few samples are needed to fine-tune the model. Extensive experiments show that ODN can effectively detect and recognize new categories with little human intervention, and is thus applicable to open-set action recognition tasks in the real world. Moreover, ODN can even achieve performance comparable to some closed-set methods. |
Tasks | Temporal Action Localization |
Published | 2019-01-23 |
URL | http://arxiv.org/abs/1901.07757v1 |
http://arxiv.org/pdf/1901.07757v1.pdf | |
PWC | https://paperswithcode.com/paper/odn-opening-the-deep-network-for-open-set |
Repo | |
Framework | |
Towards Corner Case Detection for Autonomous Driving
Title | Towards Corner Case Detection for Autonomous Driving |
Authors | Jan-Aike Bolte, Andreas Bär, Daniel Lipinski, Tim Fingscheidt |
Abstract | The progress in autonomous driving is due in part to the increased availability of vast amounts of training data for the underlying machine learning approaches. Machine learning systems are generally known to lack robustness, e.g., if the training data rarely or never covered critical situations. The challenging task of corner case detection in video, which is related to unusual event or anomaly detection, aims at detecting these unusual situations, which could become critical, and communicating them to the autonomous driving system (online use case). Such a system, however, could also be used in offline mode to screen vast amounts of data and select only the relevant situations for storing and (re)training machine learning algorithms. So far, the approaches for corner case detection have been limited to videos recorded from a fixed camera, mostly for security surveillance. In this paper, we provide a formal definition of a corner case and propose a system framework for both the online and the offline use case that can handle video signals from front cameras of a naturally moving vehicle and can output a corner case score. |
Tasks | Anomaly Detection, Autonomous Driving |
Published | 2019-02-25 |
URL | http://arxiv.org/abs/1902.09184v2 |
http://arxiv.org/pdf/1902.09184v2.pdf | |
PWC | https://paperswithcode.com/paper/towards-corner-case-detection-for-autonomous |
Repo | |
Framework | |
Perils of Zero-Interaction Security in the Internet of Things
Title | Perils of Zero-Interaction Security in the Internet of Things |
Authors | Mikhail Fomichev, Max Maass, Lars Almon, Alejandro Molina, Matthias Hollick |
Abstract | The Internet of Things (IoT) demands authentication systems which can provide both security and usability. Recent research utilizes the rich sensing capabilities of smart devices to build security schemes operating without human interaction, such as zero-interaction pairing (ZIP) and zero-interaction authentication (ZIA). Prior work proposed a number of ZIP and ZIA schemes and reported promising results. However, those schemes were often evaluated under conditions which do not reflect realistic IoT scenarios. In addition, drawing any comparison among the existing schemes is impossible due to the lack of a common public dataset and unavailability of scheme implementations. In this paper, we address these challenges by conducting the first large-scale comparative study of ZIP and ZIA schemes, carried out under realistic conditions. We collect and release the most comprehensive dataset in the domain to date, containing over 4250 hours of audio recordings and 1 billion sensor readings from three different scenarios, and evaluate five state-of-the-art schemes based on these data. Our study reveals that the effectiveness of the existing proposals is highly dependent on the scenario they are used in. In particular, we show that these schemes are subject to error rates between 0.6% and 52.8%. |
Tasks | |
Published | 2019-01-22 |
URL | http://arxiv.org/abs/1901.07255v2 |
http://arxiv.org/pdf/1901.07255v2.pdf | |
PWC | https://paperswithcode.com/paper/perils-of-zero-interaction-security-in-the |
Repo | |
Framework | |
Forecasting People Trajectories and Head Poses by Jointly Reasoning on Tracklets and Vislets
Title | Forecasting People Trajectories and Head Poses by Jointly Reasoning on Tracklets and Vislets |
Authors | Irtiza Hasan, Francesco Setti, Theodore Tsesmelis, Vasileios Belagiannis, Sikandar Amin, Alessio Del Bue, Marco Cristani, Fabio Galasso |
Abstract | In this work, we explore the correlation between people's trajectories and their head orientations. We argue that trajectory and head pose forecasting can be modelled as a joint problem. Recent approaches to trajectory forecasting leverage short-term trajectories (aka tracklets) of pedestrians to predict their future paths. In addition, sociological cues, such as expected destination or pedestrian interaction, are often combined with tracklets. In this paper, we propose MiXing-LSTM (MX-LSTM) to capture the interplay between positions and head orientations (vislets) thanks to a joint unconstrained optimization of full covariance matrices during the LSTM backpropagation. We additionally exploit the head orientations as a proxy for visual attention when modeling social interactions. MX-LSTM predicts pedestrians' future locations and head poses, extending the standard capabilities of current approaches to long-term trajectory forecasting. Compared to the state-of-the-art, our approach shows better performance on an extensive set of public benchmarks. MX-LSTM is particularly effective when people move slowly, i.e. the most challenging scenario for all other models. The proposed approach also allows for accurate predictions on a longer time horizon. |
Tasks | |
Published | 2019-01-07 |
URL | https://arxiv.org/abs/1901.02000v2 |
https://arxiv.org/pdf/1901.02000v2.pdf | |
PWC | https://paperswithcode.com/paper/forecasting-people-trajectories-and-head |
Repo | |
Framework | |
Peer-to-peer Federated Learning on Graphs
Title | Peer-to-peer Federated Learning on Graphs |
Authors | Anusha Lalitha, Osman Cihan Kilinc, Tara Javidi, Farinaz Koushanfar |
Abstract | We consider the problem of training a machine learning model over a network of nodes in a fully decentralized framework. The nodes take a Bayesian-like approach via the introduction of a belief over the model parameter space. We propose a distributed learning algorithm in which nodes update their belief by aggregating information from their one-hop neighbors to learn a model that best fits the observations over the entire network. In addition, we obtain sufficient conditions to ensure that the probability of error is small for every node in the network. We discuss approximations required for applying this algorithm to train Deep Neural Networks (DNNs). Experiments on training a linear regression model and on training a DNN show that the proposed learning rule provides a significant improvement in accuracy compared to the case where nodes learn without cooperation. |
Tasks | |
Published | 2019-01-31 |
URL | http://arxiv.org/abs/1901.11173v1 |
http://arxiv.org/pdf/1901.11173v1.pdf | |
PWC | https://paperswithcode.com/paper/peer-to-peer-federated-learning-on-graphs |
Repo | |
Framework | |
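The belief update described in the abstract above can be illustrated with a small sketch. It assumes a finite set of candidate models, a local Bayesian step, and a weighted geometric average over one-hop neighbors; the function names, the averaging rule, and the weights are illustrative assumptions, not details taken from the abstract.

```python
import math


def local_bayes_update(belief, likelihoods):
    """Multiply a belief over candidate models by local likelihoods, then renormalize."""
    unnorm = [b * l for b, l in zip(belief, likelihoods)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def aggregate_neighbors(beliefs, weights):
    """Weighted geometric average of neighbor beliefs (an assumed consensus rule).

    Assumes every belief entry is strictly positive and the weights sum to 1.
    """
    n_models = len(beliefs[0])
    log_mix = [
        sum(w * math.log(b[k]) for b, w in zip(beliefs, weights))
        for k in range(n_models)
    ]
    peak = max(log_mix)  # subtract the peak for numerical stability
    unnorm = [math.exp(v - peak) for v in log_mix]
    total = sum(unnorm)
    return [u / total for u in unnorm]
```

A node would call `local_bayes_update` on its own observations and then `aggregate_neighbors` on the beliefs received from its neighbors (including its own), repeating both steps each round.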
Evaluating Text-to-Image Matching using Binary Image Selection (BISON)
Title | Evaluating Text-to-Image Matching using Binary Image Selection (BISON) |
Authors | Hexiang Hu, Ishan Misra, Laurens van der Maaten |
Abstract | Providing systems the ability to relate linguistic and visual content is one of the hallmarks of computer vision. Tasks such as text-based image retrieval and image captioning were designed to test this ability but come with evaluation measures that have a high variance or are difficult to interpret. We study an alternative task for systems that match text and images: given a text query, the system is asked to select the image that best matches the query from a pair of semantically similar images. The system’s accuracy on this Binary Image SelectiON (BISON) task is interpretable, eliminates the reliability problems of retrieval evaluations, and focuses on the system’s ability to understand fine-grained visual structure. We gather a BISON dataset that complements the COCO dataset and use it to evaluate modern text-based image retrieval and image captioning systems. Our results provide novel insights into the performance of these systems. The COCO-BISON dataset and corresponding evaluation code are publicly available from http://hexianghu.com/bison/. |
Tasks | Image Captioning, Image Retrieval |
Published | 2019-01-19 |
URL | http://arxiv.org/abs/1901.06595v2 |
http://arxiv.org/pdf/1901.06595v2.pdf | |
PWC | https://paperswithcode.com/paper/binary-image-selection-bison-interpretable |
Repo | |
Framework | |
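The binary selection task in the abstract above reduces to a simple accuracy computation, sketched below. The triple layout, the `overlap_score` toy scorer, and the tag sets are hypothetical stand-ins for a real text-image matching model and are not part of the released evaluation code.

```python
def bison_accuracy(examples, score):
    """Fraction of (query, positive, distractor) triples where the positive image scores higher.

    Ties are counted as errors.
    """
    correct = sum(1 for query, pos, neg in examples if score(query, pos) > score(query, neg))
    return correct / len(examples)


def overlap_score(query, image_tags):
    """Toy scorer (assumption): number of query words that appear among an image's tags."""
    return len(set(query.lower().split()) & set(image_tags))


examples = [
    ("a dog on a red couch", {"dog", "couch", "red"}, {"dog", "couch", "blue"}),
    ("two cats on a bed", {"cat", "bed"}, {"cats", "bed", "two"}),
]
accuracy = bison_accuracy(examples, overlap_score)
```

In this toy data the scorer picks the right image for the first query and the wrong one for the second, so the accuracy is one half. Because each decision is between exactly two images, chance level is 50%, which is part of what makes the metric easy to read.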
Optimal Approach for Image Recognition using Deep Convolutional Architecture
Title | Optimal Approach for Image Recognition using Deep Convolutional Architecture |
Authors | Parth Shah, Vishvajit Bakrola, Supriya Pati |
Abstract | In recent times, deep learning has achieved huge popularity due to its performance in various machine learning tasks. Deep learning, as hierarchical or structured learning, attempts to model high-level abstractions in data by using a group of processing layers. The foundation of deep learning architectures is inspired by the understanding of information processing and neural responses in the human brain. The architectures are created by stacking multiple linear or non-linear operations. The article mainly focuses on state-of-the-art deep learning models and training methods specific to various real-world applications. Selecting an optimal architecture for a specific problem is a challenging task; at the closing stage of the article, we propose an optimal approach to deep convolutional architecture for the application of image recognition. |
Tasks | |
Published | 2019-04-25 |
URL | http://arxiv.org/abs/1904.11187v1 |
http://arxiv.org/pdf/1904.11187v1.pdf | |
PWC | https://paperswithcode.com/paper/optimal-approach-for-image-recognition-using |
Repo | |
Framework | |
AI for Earth: Rainforest Conservation by Acoustic Surveillance
Title | AI for Earth: Rainforest Conservation by Acoustic Surveillance |
Authors | Yuan Liu, Zhongwei Cheng, Jie Liu, Bourhan Yassin, Zhe Nan, Jiebo Luo |
Abstract | Saving rainforests is key to halting adverse climate change. In this paper, we introduce an innovative solution built on acoustic surveillance and machine learning technologies to help rainforest conservation. In particular, we propose new convolutional neural network (CNN) models for environmental sound classification and achieve promising preliminary results on two datasets, a public audio dataset and our real rainforest sound dataset. The proposed audio classification models can be easily extended in an automated machine learning paradigm and integrated in cloud-based services for real-world deployment. |
Tasks | Audio Classification, Environmental Sound Classification |
Published | 2019-08-20 |
URL | https://arxiv.org/abs/1908.07517v1 |
https://arxiv.org/pdf/1908.07517v1.pdf | |
PWC | https://paperswithcode.com/paper/ai-for-earth-rainforest-conservation-by |
Repo | |
Framework | |
Deep Neural Baselines for Computational Paralinguistics
Title | Deep Neural Baselines for Computational Paralinguistics |
Authors | Daniel Elsner, Stefan Langer, Fabian Ritz, Robert Müller, Steffen Illium |
Abstract | Detecting sleepiness from spoken language is an ambitious task, which is addressed by the Interspeech 2019 Computational Paralinguistics Challenge (ComParE). We propose an end-to-end deep learning approach to detect and classify patterns reflecting sleepiness in the human voice. Our approach is based solely on a moderately complex deep neural network architecture. It may be applied directly to the audio data without requiring any specific feature engineering, thus remaining transferable to other audio classification tasks. Nevertheless, our approach performs similarly to state-of-the-art machine learning models. |
Tasks | Audio Classification, Feature Engineering |
Published | 2019-07-05 |
URL | https://arxiv.org/abs/1907.02864v1 |
https://arxiv.org/pdf/1907.02864v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-neural-baselines-for-computational |
Repo | |
Framework | |
On the performance of residual block design alternatives in convolutional neural networks for end-to-end audio classification
Title | On the performance of residual block design alternatives in convolutional neural networks for end-to-end audio classification |
Authors | Javier Naranjo-Alcazar, Sergi Perez-Castanos, Irene Martin-Morato, Pedro Zuccarello, Maximo Cobos |
Abstract | Residual learning is a recently proposed learning framework to facilitate the training of very deep neural networks. Residual blocks or units are made of a set of stacked layers, where the inputs are added back to their outputs with the aim of creating identity mappings. In practice, such identity mappings are accomplished by means of the so-called skip or residual connections. However, multiple implementation alternatives arise with respect to where such skip connections are applied within the set of stacked layers that make up a residual block. While ResNet architectures for image classification using convolutional neural networks (CNNs) have been widely discussed in the literature, few works have adopted ResNet architectures so far for 1D audio classification tasks. Thus, the suitability of different residual block designs for raw audio classification is partly unknown. The purpose of this paper is to analyze and discuss the performance of several residual block implementations within a state-of-the-art CNN-based architecture for end-to-end audio classification using raw audio waveforms. For comparison purposes, we analyze as well the performance of the residual blocks under a similar 2D architecture using a conventional time-frequency audio representation as input. The results show that the achieved accuracy is considerably dependent, not only on the specific residual block implementation, but also on the selected input normalization. |
Tasks | Audio Classification, Image Classification |
Published | 2019-06-26 |
URL | https://arxiv.org/abs/1906.10891v3 |
https://arxiv.org/pdf/1906.10891v3.pdf | |
PWC | https://paperswithcode.com/paper/on-the-performance-of-residual-block-design |
Repo | |
Framework | |
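The question of where the skip connection sits inside a residual block, raised in the abstract above, can be made concrete with a minimal 1D sketch. The two variants below (activation after the addition, and activation before the convolution) are common designs chosen here for illustration; the abstract does not say which variants the paper compares, and the single-convolution block and helper names are assumptions.

```python
def relu(xs):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in xs]


def conv1d_same(xs, kernel):
    """1D cross-correlation with zero padding; assumes an odd kernel length."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(xs) + [0.0] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(xs))
    ]


def block_post_activation(xs, kernel):
    """Skip connection added before the final activation: relu(conv(x) + x)."""
    return relu([c + x for c, x in zip(conv1d_same(xs, kernel), xs)])


def block_pre_activation(xs, kernel):
    """Activation applied first, skip connection added last: conv(relu(x)) + x."""
    return [c + x for c, x in zip(conv1d_same(relu(xs), kernel), xs)]
```

With the identity kernel `[0.0, 1.0, 0.0]` and input `[1.0, -2.0, 3.0]`, the first variant clips the negative sample to zero while the second passes it through the skip path unchanged, which shows how the placement alone changes what the block can represent.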
VoteNet: A Deep Learning Label Fusion Method for Multi-Atlas Segmentation
Title | VoteNet: A Deep Learning Label Fusion Method for Multi-Atlas Segmentation |
Authors | Zhipeng Ding, Xu Han, Marc Niethammer |
Abstract | Deep learning (DL) approaches are state-of-the-art for many medical image segmentation tasks. They offer a number of advantages: they can be trained for specific tasks, computations are fast at test time, and segmentation quality is typically high. In contrast, previously popular multi-atlas segmentation (MAS) methods are relatively slow (as they rely on costly registrations) and even though sophisticated label fusion strategies have been proposed, DL approaches generally outperform MAS. In this work, we propose a DL-based label fusion strategy (VoteNet) which locally selects a set of reliable atlases whose labels are then fused via plurality voting. Experiments on 3D brain MRI data show that, by selecting a good initial atlas set, MAS with VoteNet significantly outperforms a number of other label fusion strategies as well as a direct DL segmentation approach. We also provide an experimental analysis of the upper performance bound achievable by our method. While unlikely to be achievable in practice, this bound suggests room for further performance improvements. Lastly, to address the runtime disadvantage of standard MAS, all our results make use of a fast DL registration approach. |
Tasks | Medical Image Segmentation, Semantic Segmentation |
Published | 2019-04-18 |
URL | https://arxiv.org/abs/1904.08963v2 |
https://arxiv.org/pdf/1904.08963v2.pdf | |
PWC | https://paperswithcode.com/paper/votenet-a-deep-learning-label-fusion-method |
Repo | |
Framework | |
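The fusion step in the abstract above (plurality voting over locally selected atlases) can be sketched as follows. The reliability masks are taken as given here, whereas in the paper they would come from the trained network; the flat voxel lists and the fallback to all atlases when none is flagged reliable are assumptions made for illustration.

```python
from collections import Counter


def fuse_labels(atlas_labels, reliable):
    """Per-voxel plurality vote over the atlases flagged reliable at that voxel.

    atlas_labels[a][v] is the label atlas a proposes for voxel v;
    reliable[a][v] is True when atlas a is considered trustworthy at voxel v.
    """
    fused = []
    for v in range(len(atlas_labels[0])):
        votes = [labels[v] for labels, mask in zip(atlas_labels, reliable) if mask[v]]
        if not votes:  # assumed fallback: use every atlas when none is flagged
            votes = [labels[v] for labels in atlas_labels]
        fused.append(Counter(votes).most_common(1)[0][0])
    return fused


atlas_labels = [[1, 0, 2], [1, 1, 2], [0, 1, 0]]
reliable = [[True, True, False], [True, True, False], [False, True, True]]
fused = fuse_labels(atlas_labels, reliable)
```

At the third voxel a plain vote over all atlases would pick label 2, but only the third atlas is flagged reliable there, so the fused label is 0. This is the effect local atlas selection is meant to have.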
Novelty Messages Filtering for Multi Agent Privacy-preserving Planning
Title | Novelty Messages Filtering for Multi Agent Privacy-preserving Planning |
Authors | Alfonso E. Gerevini, Nir Lipovetzky, Nico Peli, Francesco Percassi, Alessandro Saetti, Ivan Serina |
Abstract | In multi-agent planning, agents jointly compute a plan that achieves mutual goals, keeping certain information private to the individual agents. Agents’ coordination is achieved through the transmission of messages. These messages can be a source of privacy leakage as they can permit a malicious agent to collect information about other agents’ actions and search states. In this paper, we investigate the usage of novelty techniques in the context of (decentralised) multi-agent privacy-preserving planning, addressing the challenges related to the agents’ privacy and performance. In particular, we show that the use of novelty based techniques can significantly reduce the number of messages transmitted among agents, better preserving their privacy and improving their performance. An experimental study analyses the effectiveness of our techniques and compares them with the state-of-the-art. Finally, we evaluate the robustness of our approach, considering different delays in the transmission of messages as they would occur in overloaded networks, due for example to massive attacks or critical situations. |
Tasks | |
Published | 2019-06-18 |
URL | https://arxiv.org/abs/1906.08061v1 |
https://arxiv.org/pdf/1906.08061v1.pdf | |
PWC | https://paperswithcode.com/paper/novelty-messages-filtering-for-multi-agent |
Repo | |
Framework | |
A Robust Approach for Securing Audio Classification Against Adversarial Attacks
Title | A Robust Approach for Securing Audio Classification Against Adversarial Attacks |
Authors | Mohammad Esmaeilpour, Patrick Cardinal, Alessandro Lameiras Koerich |
Abstract | Adversarial audio attacks can be considered as small perturbations, imperceptible to human ears, that are intentionally added to the audio signal and cause a machine learning model to make mistakes. This poses a security concern about the safety of machine learning models, since adversarial attacks can fool such models into making wrong predictions. In this paper, we first review some strong adversarial attacks that may affect both audio signals and their 2D representations, and evaluate the resiliency of the most common machine learning models, namely deep learning models and support vector machines (SVM) trained on 2D audio representations such as short time Fourier transform (STFT), discrete wavelet transform (DWT) and cross recurrent plot (CRP), against several state-of-the-art adversarial attacks. Next, we propose a novel approach based on a pre-processed DWT representation of audio signals and an SVM to secure audio systems against adversarial attacks. The proposed architecture has several preprocessing modules for generating and enhancing spectrograms, including dimension reduction and smoothing. We extract features from small patches of the spectrograms using the speeded up robust feature (SURF) algorithm, which are further used to generate a codebook using the K-Means++ algorithm. Finally, codewords are used to train an SVM on the codebook of the SURF-generated vectors. All these steps yield a novel approach for audio classification that provides a good trade-off between accuracy and resilience. Experimental results on three environmental sound datasets show the competitive performance of the proposed approach compared to deep neural networks, both in terms of accuracy and robustness against strong adversarial attacks. |
Tasks | Audio Classification, Dimensionality Reduction |
Published | 2019-04-24 |
URL | https://arxiv.org/abs/1904.10990v2 |
https://arxiv.org/pdf/1904.10990v2.pdf | |
PWC | https://paperswithcode.com/paper/a-robust-approach-for-securing-audio |
Repo | |
Framework | |
CT Data Curation for Liver Patients: Phase Recognition in Dynamic Contrast-Enhanced CT
Title | CT Data Curation for Liver Patients: Phase Recognition in Dynamic Contrast-Enhanced CT |
Authors | Bo Zhou, Adam P. Harrison, Jiawen Yao, Chi-Tung Cheng, Jing Xiao, Chien-Hung Liao, Le Lu |
Abstract | As the demand for more descriptive machine learning models grows within medical imaging, bottlenecks due to data paucity will worsen. Thus, collecting enough large-scale data will require automated tools to harvest data/label pairs from messy and real-world datasets, such as hospital PACS. This is the focus of our work, where we present a principled data curation tool to extract multi-phase CT liver studies and identify each scan’s phase from a real-world and heterogeneous hospital PACS dataset. Emulating a typical deployment scenario, we first obtain a set of noisy labels from our institutional partners that are text mined using simple rules from DICOM tags. We train a deep learning system, using a customized and streamlined 3D SE architecture, to identify non-contrast, arterial, venous, and delay phase dynamic CT liver scans, filtering out anything else, including other types of liver contrast studies. To exploit as much training data as possible, we also introduce an aggregated cross entropy loss that can learn from scans only identified as “contrast”. Extensive experiments on a dataset of 43K scans of 7680 patient imaging studies demonstrate that our 3DSE architecture, armed with our aggregated loss, can achieve a mean F1 of 0.977 and can correctly harvest up to 92.7% of studies, which significantly outperforms the text-mined and standard-loss approach, and also outperforms other, more complex, model architectures. |
Tasks | |
Published | 2019-09-05 |
URL | https://arxiv.org/abs/1909.02511v2 |
https://arxiv.org/pdf/1909.02511v2.pdf | |
PWC | https://paperswithcode.com/paper/ct-data-curation-for-liver-patients-phase |
Repo | |
Framework | |
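One way to read the aggregated cross entropy loss mentioned in the abstract above is that a scan labeled only as "contrast" is scored on the summed probability of the individual contrast phases. The sketch below follows that reading; the class list, the choice of which phases count as contrast, and the function names are assumptions rather than details confirmed by the abstract.

```python
import math

# Assumed class layout: four dynamic phases plus a catch-all class.
PHASES = ["non-contrast", "arterial", "venous", "delay", "other"]
# Assumption: a coarse "contrast" label covers these three phases.
CONTRAST = {"arterial", "venous", "delay"}


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregated_cross_entropy(logits, label):
    """Cross entropy that also accepts the coarse label "contrast".

    For a fine label, this is the usual negative log-probability of that class.
    For "contrast", the probabilities of all contrast phases are summed first.
    """
    probs = softmax(logits)
    if label == "contrast":
        p = sum(prob for name, prob in zip(PHASES, probs) if name in CONTRAST)
    else:
        p = probs[PHASES.index(label)]
    return -math.log(p)
```

Under this reading, a coarsely labeled scan still pushes probability mass away from the non-contrast and catch-all classes without forcing the model to commit to a specific phase.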
Verifiably Safe Off-Model Reinforcement Learning
Title | Verifiably Safe Off-Model Reinforcement Learning |
Authors | Nathan Fulton, Andre Platzer |
Abstract | The desire to use reinforcement learning in safety-critical settings has inspired a recent interest in formal methods for learning algorithms. Existing formal methods for learning and optimization primarily consider the problem of constrained learning or constrained optimization. Given a single correct model and associated safety constraint, these approaches guarantee efficient learning while provably avoiding behaviors outside the safety constraint. Acting well given an accurate environmental model is an important pre-requisite for safe learning, but is ultimately insufficient for systems that operate in complex heterogeneous environments. This paper introduces verification-preserving model updates, the first approach toward obtaining formal safety guarantees for reinforcement learning in settings where multiple environmental models must be taken into account. Through a combination of design-time model updates and runtime model falsification, we provide a first approach toward obtaining formal safety proofs for autonomous systems acting in heterogeneous environments. |
Tasks | |
Published | 2019-02-14 |
URL | http://arxiv.org/abs/1902.05632v1 |
http://arxiv.org/pdf/1902.05632v1.pdf | |
PWC | https://paperswithcode.com/paper/verifiably-safe-off-model-reinforcement |
Repo | |
Framework | |
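The idea of runtime model falsification from the abstract above can be illustrated with a toy sketch: keep several candidate environment models, discard those contradicted by an observed transition, and only allow actions that every remaining model predicts to be safe. This is a simplified illustration with hypothetical names and a one-dimensional state; it does not reproduce the paper's formal verification machinery.

```python
def falsify(models, state, action, observed_next, tol=1e-6):
    """Keep only the candidate models whose prediction matches the observed transition."""
    return [m for m in models if abs(m(state, action) - observed_next) <= tol]


def safe_actions(models, state, actions, is_safe):
    """Actions whose predicted next state is safe under every remaining model."""
    return [a for a in actions if all(is_safe(m(state, a)) for m in models)]


# Toy candidate models (assumptions): the action moves the state by 1x or 2x its value.
def slow_model(state, action):
    return state + action


def fast_model(state, action):
    return state + 2 * action


def below_limit(state):
    """Toy safety constraint: the position must not exceed 10."""
    return state <= 10


candidates = [slow_model, fast_model]
before = safe_actions(candidates, 8, [0, 1, 2], below_limit)
remaining = falsify(candidates, 8, 1, observed_next=9)
after = safe_actions(remaining, 8, [0, 1, 2], below_limit)
```

Before any observation, action 2 is excluded because the fast model predicts it would cross the limit. After observing a transition that only the slow model explains, the fast model is dropped and action 2 becomes admissible.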