Paper Group ANR 1226
CheckNet: Secure Inference on Untrusted Devices
Title | CheckNet: Secure Inference on Untrusted Devices |
Authors | Marcus Comiter, Surat Teerapittayanon, H. T. Kung |
Abstract | We introduce CheckNet, a method for secure inference with deep neural networks on untrusted devices. CheckNet is like a checksum for neural network inference: it verifies the integrity of the inference computation performed by untrusted devices to 1) ensure the inference has actually been performed, and 2) ensure the inference has not been manipulated by an attacker. CheckNet is completely transparent to the third party running the computation, applicable to all types of neural networks, does not require specialized hardware, adds little overhead, and has negligible impact on model performance. CheckNet can be configured to provide different levels of security depending on application needs and compute/communication budgets. We present both empirical and theoretical validation of CheckNet on multiple popular deep neural network models, showing excellent attack detection (0.88-0.99 AUC) and attack success bounds. |
Tasks | |
Published | 2019-06-17 |
URL | https://arxiv.org/abs/1906.07148v1 |
https://arxiv.org/pdf/1906.07148v1.pdf | |
PWC | https://paperswithcode.com/paper/checknet-secure-inference-on-untrusted |
Repo | |
Framework | |
Learning to Grasp from 2.5D images: a Deep Reinforcement Learning Approach
Title | Learning to Grasp from 2.5D images: a Deep Reinforcement Learning Approach |
Authors | Alessia Bertugli, Paolo Galeone |
Abstract | In this paper, we propose a deep reinforcement learning (DRL) solution to the grasping problem using 2.5D images as the only source of information. In particular, we developed a simulated environment where a robot equipped with a vacuum gripper has the aim of reaching blocks with planar surfaces. These blocks can have different dimensions, shapes, positions and orientations. Unity 3D allowed us to simulate a real-world setup, where a depth camera is placed in a fixed position and the stream of images is used by our policy network to learn how to solve the task. We explored different DRL algorithms and problem configurations. The experiments demonstrated the effectiveness of the proposed DRL algorithm applied to grasp tasks guided by visual depth camera inputs. When using the proper policy, the proposed method estimates a robot tool configuration that reaches the object surface with negligible position and orientation errors. This is, to the best of our knowledge, the first successful attempt at using only 2.5D images as the input of a DRL algorithm to solve the grasping problem by regressing 3D world coordinates. |
Tasks | |
Published | 2019-08-08 |
URL | https://arxiv.org/abs/1908.03440v1 |
https://arxiv.org/pdf/1908.03440v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-grasp-from-25d-images-a-deep |
Repo | |
Framework | |
A Probabilistic Framework for Learning Domain Specific Hierarchical Word Embeddings
Title | A Probabilistic Framework for Learning Domain Specific Hierarchical Word Embeddings |
Authors | Lahari Poddar, Gyorgy Szarvas, Lea Frermann |
Abstract | The meaning of a word often varies depending on its usage in different domains. The standard word embedding models struggle to represent this variation, as they learn a single global representation for a word. We propose a method to learn domain-specific word embeddings, from text organized into hierarchical domains, such as reviews in an e-commerce website, where products follow a taxonomy. Our structured probabilistic model allows vector representations for the same word to drift away from each other for distant domains in the taxonomy, to accommodate its domain-specific meanings. By learning sets of domain-specific word representations jointly, our model can leverage domain relationships, and it scales well with the number of domains. Using large real-world review datasets, we demonstrate the effectiveness of our model compared to state-of-the-art approaches, in learning domain-specific word embeddings that are both intuitive to humans and benefit downstream NLP tasks. |
Tasks | Word Embeddings |
Published | 2019-10-16 |
URL | https://arxiv.org/abs/1910.07333v2 |
https://arxiv.org/pdf/1910.07333v2.pdf | |
PWC | https://paperswithcode.com/paper/a-probabilistic-framework-for-learning-domain |
Repo | |
Framework | |
Modern Problems Require Modern Solutions: Hybrid Concepts for Industrial Intrusion Detection
Title | Modern Problems Require Modern Solutions: Hybrid Concepts for Industrial Intrusion Detection |
Authors | Simon D. Duque Anton, Mathias Strufe, Hans Dieter Schotten |
Abstract | The concept of Industry 4.0 brings a disruption into the processing industry. It is characterised by a high degree of intercommunication and embedded computation, resulting in a decentralised and distributed handling of data. Additionally, cloud-storage and Software-as-a-Service (SaaS) approaches enhance a centralised storage and handling of data. This often takes place in third-party networks. Furthermore, Industry 4.0 is driven by novel business cases: lot sizes of one, customer-individual production, real-time observation of process state and progress, and remote maintenance, to name a few. All of these new business cases make use of the novel technologies. However, cyber security has not been an issue in industry. Industrial networks have been considered physically separated from public networks. Additionally, the high level of uniqueness of any industrial network was said to prevent attackers from exploiting flaws. Those assumptions are inherently broken by the concept of Industry 4.0. As a result, an abundance of attack vectors is created. In the past, attackers have used those attack vectors in spectacular fashions. Especially Small and Medium-sized Enterprises (SMEs) in Germany struggle to adapt to these challenges. Reasons are the cost required for technical solutions and security professionals. In order to enable SMEs to cope with the growing threat in cyberspace, the research project IUNO Insec aims at providing and improving security solutions that can be used without specialised security knowledge. The project IUNO Insec is briefly introduced in this work. Furthermore, contributions in the field of intrusion detection, especially machine learning-based solutions, for industrial environments provided by the authors are presented and set into context. |
Tasks | Intrusion Detection |
Published | 2019-05-15 |
URL | https://arxiv.org/abs/1905.05984v2 |
https://arxiv.org/pdf/1905.05984v2.pdf | |
PWC | https://paperswithcode.com/paper/modern-problems-require-modern-solutions |
Repo | |
Framework | |
Real-Time Object Tracking via Meta-Learning: Efficient Model Adaptation and One-Shot Channel Pruning
Title | Real-Time Object Tracking via Meta-Learning: Efficient Model Adaptation and One-Shot Channel Pruning |
Authors | Ilchae Jung, Kihyun You, Hyeonwoo Noh, Minsu Cho, Bohyung Han |
Abstract | We propose a novel meta-learning framework for real-time object tracking with efficient model adaptation and channel pruning. Given an object tracker, our framework learns to fine-tune its model parameters in only a few iterations of gradient-descent during tracking while pruning its network channels using the target ground-truth at the first frame. Such a learning problem is formulated as a meta-learning task, where a meta-tracker is trained by updating its meta-parameters for initial weights, learning rates, and pruning masks through carefully designed tracking simulations. The integrated meta-tracker greatly improves tracking performance by accelerating the convergence of online learning and reducing the cost of feature computation. Experimental evaluation on the standard datasets demonstrates its outstanding accuracy and speed compared to the state-of-the-art methods. |
Tasks | Meta-Learning, Object Tracking |
Published | 2019-11-25 |
URL | https://arxiv.org/abs/1911.11170v3 |
https://arxiv.org/pdf/1911.11170v3.pdf | |
PWC | https://paperswithcode.com/paper/real-time-object-tracking-via-meta-learning |
Repo | |
Framework | |
Fooling a Real Car with Adversarial Traffic Signs
Title | Fooling a Real Car with Adversarial Traffic Signs |
Authors | Nir Morgulis, Alexander Kreines, Shachar Mendelowitz, Yuval Weisglass |
Abstract | The attacks on the neural-network-based classifiers using adversarial images have gained a lot of attention recently. An adversary can purposely generate an image that is indistinguishable from an innocent image for a human being but is incorrectly classified by the neural networks. The adversarial images do not need to be tuned to a particular architecture of the classifier - an image that fools one network can fool another one with a certain success rate. The published works mostly concentrate on the use of modified image files for attacks against the classifiers trained on the model databases. Although there exists a general understanding that such attacks can be carried out in the real world as well, the works considering the real-world attacks are scarce. Moreover, to the best of our knowledge, there have been no reports on the attacks against real production-grade image classification systems. In our work we present a robust pipeline for reproducible production of adversarial traffic signs that can fool a wide range of classifiers, both open-source and production-grade, in the real world. The efficiency of the attacks was checked both with the neural-network-based classifiers and legacy computer vision systems. Most of the attacks have been performed in the black-box mode, e.g. the adversarial signs produced for a particular classifier were used to attack a variety of other classifiers. The efficiency was confirmed in drive-by experiments with a production-grade traffic sign recognition system of a real car. |
Tasks | Image Classification, Traffic Sign Recognition |
Published | 2019-06-30 |
URL | https://arxiv.org/abs/1907.00374v1 |
https://arxiv.org/pdf/1907.00374v1.pdf | |
PWC | https://paperswithcode.com/paper/fooling-a-real-car-with-adversarial-traffic |
Repo | |
Framework | |
Flow Based Self-supervised Pixel Embedding for Image Segmentation
Title | Flow Based Self-supervised Pixel Embedding for Image Segmentation |
Authors | Bin Ma, Shubao Liu, Yingxuan Zhi, Qi Song |
Abstract | We propose a new self-supervised approach to image feature learning from motion cues. This new approach leverages recent advances in deep learning in two directions: 1) the success of training deep neural networks to estimate optical flow in real data using synthetic flow data; and 2) emerging work in learning image features from motion cues, such as optical flow. Building on these, we demonstrate that image features can be learned in a self-supervised manner by first training an optical flow estimator with synthetic flow data, and then learning image features from the estimated flows in real motion data. We demonstrate and evaluate this approach on an image segmentation task. Using the learned image feature representation, the network performs significantly better than the ones trained from scratch in few-shot segmentation tasks. |
Tasks | Optical Flow Estimation, Semantic Segmentation |
Published | 2019-01-02 |
URL | http://arxiv.org/abs/1901.00520v2 |
http://arxiv.org/pdf/1901.00520v2.pdf | |
PWC | https://paperswithcode.com/paper/flow-based-self-supervised-pixel-embedding |
Repo | |
Framework | |
Lipper: Synthesizing Thy Speech using Multi-View Lipreading
Title | Lipper: Synthesizing Thy Speech using Multi-View Lipreading |
Authors | Yaman Kumar, Rohit Jain, Khwaja Mohd. Salik, Rajiv Ratn Shah, Yifang Yin, Roger Zimmermann |
Abstract | Lipreading has a lot of potential applications, such as in the domains of surveillance and video conferencing. Despite this, most of the work in building lipreading systems has been limited to classifying silent videos into classes representing text phrases. However, there are multiple problems associated with making lipreading a text-based classification task, such as its dependence on a particular language and vocabulary mapping. Thus, in this paper we propose a multi-view lipreading-to-audio system, namely Lipper, which models it as a regression task. The model takes silent videos as input and produces speech as the output. With multi-view silent videos, we observe an improvement over single-view speech reconstruction results. We show this by presenting an exhaustive set of experiments for speaker-dependent, out-of-vocabulary and speaker-independent settings. Further, we compare the delay values of Lipper with other speechreading systems in order to show the real-time nature of the audio produced. We also perform a user study of the audio produced in order to understand how comprehensible the audio produced using Lipper is. |
Tasks | Lipreading |
Published | 2019-06-28 |
URL | https://arxiv.org/abs/1907.01367v1 |
https://arxiv.org/pdf/1907.01367v1.pdf | |
PWC | https://paperswithcode.com/paper/lipper-synthesizing-thy-speech-using-multi |
Repo | |
Framework | |
A novel spike-and-wave automatic detection in EEG signals
Title | A novel spike-and-wave automatic detection in EEG signals |
Authors | Antonio Quintero-Rincón, Valeria Muro, Carlos D’Giano, Jorge Prendes, Hadj Batatia |
Abstract | Spike-and-wave discharge (SWD) pattern classification in electroencephalography (EEG) signals is a key problem in signal processing. It is particularly important to develop an SWD automatic detection method in long-term EEG recordings since the task of marking the patterns manually is time consuming, difficult and error-prone. This paper presents a new detection method with a low computational complexity that can be easily trained if standard medical protocols are respected. The detection procedure is as follows: First, each EEG signal is divided into several time segments and for each time segment, the Morlet 1-D decomposition is applied. Then three parameters are extracted from the wavelet coefficients of each segment: scale (using a generalized Gaussian statistical model), variance and median. This is followed by a k-nearest neighbors (k-NN) classifier to detect the spike-and-wave pattern in each EEG channel from these three parameters. A total of 106 spike-and-wave and 106 non-spike-and-wave segments were used for training, while 69 new annotated EEG segments from six subjects were used for classification. In these circumstances, the proposed methodology achieved 100% accuracy. These results generate new research opportunities for the underlying causes of the so-called absence epilepsy in long-term EEG recordings. |
Tasks | EEG |
Published | 2019-12-15 |
URL | https://arxiv.org/abs/1912.07123v1 |
https://arxiv.org/pdf/1912.07123v1.pdf | |
PWC | https://paperswithcode.com/paper/a-novel-spike-and-wave-automatic-detection-in |
Repo | |
Framework | |
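The final classification step described in the abstract above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: it assumes the wavelet coefficients of each segment have already been computed, it uses only two of the three features (variance and median, omitting the generalized Gaussian scale fit), and the coefficient values, labels and helper names (`segment_features`, `knn_predict`) are made up for the example.

```python
from collections import Counter
from math import dist
from statistics import median, pvariance


def segment_features(coeffs):
    """Map one segment's (precomputed) wavelet coefficients to (variance, median)."""
    return (pvariance(coeffs), median(coeffs))


def knn_predict(train, query, k=3):
    """Majority vote among the k training points closest to `query` (Euclidean)."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical coefficient lists: large swings stand in for spike-and-wave
# segments, small ones for background activity.
swd_like = [[5.0, -6.0, 7.0, -5.0, 6.0],
            [6.0, -7.0, 5.0, -6.0, 7.0],
            [4.0, -5.0, 6.0, -6.0, 5.0]]
background = [[0.2, -0.1, 0.1, -0.2, 0.0],
              [0.3, -0.2, 0.1, 0.0, -0.1],
              [0.1, 0.0, -0.1, 0.2, -0.2]]

train = [(segment_features(c), "swd") for c in swd_like]
train += [(segment_features(c), "other") for c in background]

pred_swd = knn_predict(train, segment_features([5.5, -6.5, 6.0, -5.0, 6.5]))
pred_other = knn_predict(train, segment_features([0.1, -0.1, 0.2, 0.0, -0.2]))
```

In practice the features would usually be rescaled before computing distances, since variance and median can live on very different scales.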
A Robust Visual System for Small Target Motion Detection Against Cluttered Moving Backgrounds
Title | A Robust Visual System for Small Target Motion Detection Against Cluttered Moving Backgrounds |
Authors | Hongxin Wang, Jigen Peng, Xuqiang Zheng, Shigang Yue |
Abstract | Monitoring small objects against cluttered moving backgrounds is a huge challenge to future robotic vision systems. As a source of inspiration, insects are quite adept at searching for mates and tracking prey – which always appear as small dim speckles in the visual field. The exquisite sensitivity of insects for small target motion, as revealed recently, comes from a class of specific neurons called small target motion detectors (STMDs). Although a few STMD-based models have been proposed, these existing models only use motion information for small target detection and cannot discriminate small targets from small-target-like background features (referred to as fake features). To address this problem, this paper proposes a novel visual system model (STMD+) for small target motion detection, which is composed of four subsystems – ommatidia, motion pathway, contrast pathway and mushroom body. Compared to existing STMD-based models, the additional contrast pathway extracts directional contrast from luminance signals to eliminate false positive background motion. The directional contrast and the motion information extracted by the motion pathway are integrated in the mushroom body for small target discrimination. Extensive experiments showed significant and consistent improvements of the proposed visual system model over existing STMD-based models against fake features. |
Tasks | Motion Detection |
Published | 2019-04-08 |
URL | http://arxiv.org/abs/1904.04363v1 |
http://arxiv.org/pdf/1904.04363v1.pdf | |
PWC | https://paperswithcode.com/paper/a-robust-visual-system-for-small-target |
Repo | |
Framework | |
Can Transfer Entropy Infer Information Flow in Neuronal Circuits for Cognitive Processing?
Title | Can Transfer Entropy Infer Information Flow in Neuronal Circuits for Cognitive Processing? |
Authors | Ali Tehrani-Saleh, Christoph Adami |
Abstract | To infer information flow in any network of agents, it is important first and foremost to establish causal temporal relations between the nodes. Practical and automated methods that can infer causality are difficult to find, and the subject of ongoing research. While Shannon information only detects correlation, there are several information-theoretic notions of “directed information” that have successfully detected causality in some systems, in particular in the neuroscience community. However, recent work has shown that some directed information measures can sometimes inadequately estimate the extent of causal relations, or even fail to identify existing cause-effect relations between components of systems, especially if neurons contribute in a cryptographic manner to influence the effector neuron. Here, we test how often cryptographic logic emerges in an evolutionary process that generates artificial neural circuits for two fundamental cognitive tasks: motion detection and sound localization. We also test whether activity time-series recorded from behaving digital brains can infer information flow using the transfer entropy concept, when compared to a ground-truth model of causal influence constructed from connectivity and circuit logic. Our results suggest that transfer entropy will sometimes fail to infer causality when it exists, and sometimes suggest a causal connection when there is none. However, the extent of incorrect inference strongly depends on the cognitive task considered. These results emphasize the importance of understanding the fundamental logic processes that contribute to information flow in cognitive processing, and quantifying their relevance in any given nervous system. |
Tasks | Motion Detection, Time Series |
Published | 2019-01-22 |
URL | https://arxiv.org/abs/1901.07589v2 |
https://arxiv.org/pdf/1901.07589v2.pdf | |
PWC | https://paperswithcode.com/paper/can-transfer-entropy-infer-causality-in |
Repo | |
Framework | |
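To make the failure mode mentioned in the abstract above concrete, here is a small sketch of a plug-in transfer entropy estimate for discrete time series with history length 1, i.e. TE(X→Y) = Σ p(y_{t+1}, y_t, x_t) log2 [ p(y_{t+1} | y_t, x_t) / p(y_{t+1} | y_t) ]. The estimator, the synthetic signals and the names below are assumptions chosen for illustration, not the paper's experimental setup. When the target simply copies the source, the estimate is close to 1 bit; when the target is the XOR of the source and a second hidden input (a "cryptographic" relation), the pairwise estimate is close to zero even though the source causally influences the target.

```python
import random
from collections import Counter
from math import log2


def transfer_entropy(source, target):
    """Plug-in estimate of TE(source -> target) in bits, history length 1."""
    triples = Counter()   # counts of (y_next, y_now, x_now)
    pairs_yx = Counter()  # counts of (y_now, x_now)
    pairs_yy = Counter()  # counts of (y_next, y_now)
    singles = Counter()   # counts of y_now
    n = len(target) - 1
    for t in range(n):
        y_next, y_now, x_now = target[t + 1], target[t], source[t]
        triples[(y_next, y_now, x_now)] += 1
        pairs_yx[(y_now, x_now)] += 1
        pairs_yy[(y_next, y_now)] += 1
        singles[y_now] += 1
    te = 0.0
    for (y_next, y_now, x_now), count in triples.items():
        p_joint = count / n
        p_given_both = count / pairs_yx[(y_now, x_now)]
        p_given_self = pairs_yy[(y_next, y_now)] / singles[y_now]
        te += p_joint * log2(p_given_both / p_given_self)
    return te


rng = random.Random(0)  # fixed seed so the example is reproducible
n = 5000
x = [rng.randint(0, 1) for _ in range(n)]
z = [rng.randint(0, 1) for _ in range(n)]  # second, hidden input

# Target copies the source with a one-step delay.
y_copy = [0] + x[:-1]
# Target is the XOR of the two inputs at the previous step.
y_xor = [0] + [a ^ b for a, b in zip(x[:-1], z[:-1])]

te_copy = transfer_entropy(x, y_copy)
te_xor = transfer_entropy(x, y_xor)
```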
Privileged Features Distillation at Taobao Recommendations
Title | Privileged Features Distillation at Taobao Recommendations |
Authors | Chen Xu, Quan Li, Junfeng Ge, Jinyang Gao, Xiaoyong Yang, Changhua Pei, Fei Sun, Jian Wu, Hanxiao Sun, Wenwu Ou |
Abstract | Features play an important role in the prediction tasks of e-commerce recommendations. To guarantee the consistency of off-line training and on-line serving, we usually utilize the same features that are available in both. However, the consistency in turn neglects some discriminative features. For example, when estimating the conversion rate (CVR), i.e., the probability that a user would purchase the item if she clicked it, features like dwell time on the item detail page are informative. However, CVR prediction should be conducted for on-line ranking before the click happens. Thus we cannot get such post-event features during serving. We define the features that are discriminative but only available during training as the privileged features. Inspired by the distillation techniques which bridge the gap between training and inference, in this work, we propose privileged features distillation (PFD). We train two models, i.e., a student model that is the same as the original one and a teacher model that additionally utilizes the privileged features. Knowledge distilled from the more accurate teacher is transferred to the student to improve its accuracy. During serving, only the student part is extracted and it relies on no privileged features. We conduct experiments on two fundamental prediction tasks at Taobao recommendations, i.e., click-through rate (CTR) at coarse-grained ranking and CVR at fine-grained ranking. By distilling the interacted features that are prohibited during serving for CTR and the post-event features for CVR, we achieve significant improvements over their strong baselines. During the on-line A/B tests, the click metric is improved by +5.0% in the CTR task, and the conversion metric is improved by +2.3% in the CVR task. Besides, by addressing several issues of training PFD, we obtain comparable training speed as the baselines without any distillation. |
Tasks | |
Published | 2019-07-11 |
URL | https://arxiv.org/abs/1907.05171v2 |
https://arxiv.org/pdf/1907.05171v2.pdf | |
PWC | https://paperswithcode.com/paper/privileged-features-distillation-for-e |
Repo | |
Framework | |
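The student/teacher setup described in the abstract above is commonly trained with a loss that mixes the ordinary label loss with a term pulling the student towards the teacher's soft prediction. The sketch below shows one such form for a binary target; the weighting `lam`, the function names and the numeric values are assumptions for illustration, and the abstract does not confirm that this is the exact objective the authors use.

```python
from math import log


def bce(target, prob, eps=1e-12):
    """Binary cross-entropy between a target in [0, 1] and a predicted probability."""
    prob = min(max(prob, eps), 1.0 - eps)  # guard against log(0)
    return -(target * log(prob) + (1.0 - target) * log(1.0 - prob))


def student_loss(label, student_prob, teacher_prob, lam=0.5):
    """Hard-label loss plus a distillation term weighted by `lam` (hypothetical weight)."""
    hard = bce(label, student_prob)         # fit the observed click/purchase label
    soft = bce(teacher_prob, student_prob)  # match the teacher's soft prediction
    return hard + lam * soft


# The teacher sees privileged features (e.g. dwell time) and is more confident.
# A student that agrees with the teacher incurs a lower loss than one that does not.
loss_close = student_loss(label=1, student_prob=0.9, teacher_prob=0.9)
loss_far = student_loss(label=1, student_prob=0.7, teacher_prob=0.9)
```

At serving time only `student_prob` is computed, so no privileged feature is needed.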
On Finding Local Nash Equilibria (and Only Local Nash Equilibria) in Zero-Sum Games
Title | On Finding Local Nash Equilibria (and Only Local Nash Equilibria) in Zero-Sum Games |
Authors | Eric V. Mazumdar, Michael I. Jordan, S. Shankar Sastry |
Abstract | We propose local symplectic surgery, a two-timescale procedure for finding local Nash equilibria in two-player zero-sum games. We first show that previous gradient-based algorithms cannot guarantee convergence to local Nash equilibria due to the existence of non-Nash stationary points. By taking advantage of the differential structure of the game, we construct an algorithm for which the local Nash equilibria are the only attracting fixed points. We also show that the algorithm exhibits no oscillatory behaviors in neighborhoods of equilibria and show that it has the same per-iteration complexity as other recently proposed algorithms. We conclude by validating the algorithm on two numerical examples: a toy example with multiple Nash equilibria and a non-Nash equilibrium, and the training of a small generative adversarial network (GAN). |
Tasks | |
Published | 2019-01-03 |
URL | http://arxiv.org/abs/1901.00838v2 |
http://arxiv.org/pdf/1901.00838v2.pdf | |
PWC | https://paperswithcode.com/paper/on-finding-local-nash-equilibria-and-only |
Repo | |
Framework | |
One-Pass Sparsified Gaussian Mixtures
Title | One-Pass Sparsified Gaussian Mixtures |
Authors | Eric Kightley, Stephen Becker |
Abstract | We present a one-pass sparsified Gaussian mixture model (SGMM). Given $N$ data points in $P$ dimensions, $X$, the model fits $K$ Gaussian distributions to $X$ and (softly) classifies each point to these clusters. After paying an up-front cost of $\mathcal{O}(NP\log P)$ to precondition the data, we subsample $Q$ entries of each data point and discard the full $P$-dimensional data. SGMM operates in $\mathcal{O}(KNQ)$ time per iteration for diagonal or spherical covariances, independent of $P$, while estimating the model parameters in the full $P$-dimensional space, making it one-pass and hence suitable for streaming data. We derive the maximum likelihood estimators for the parameters in the sparsified regime, demonstrate clustering on synthetic and real data, and show that SGMM is faster than GMM while preserving accuracy. |
Tasks | |
Published | 2019-03-10 |
URL | https://arxiv.org/abs/1903.04056v2 |
https://arxiv.org/pdf/1903.04056v2.pdf | |
PWC | https://paperswithcode.com/paper/one-pass-sparsified-gaussian-mixtures |
Repo | |
Framework | |
On the Robustness of Projection Neural Networks For Efficient Text Representation: An Empirical Study
Title | On the Robustness of Projection Neural Networks For Efficient Text Representation: An Empirical Study |
Authors | Chinnadhurai Sankar, Sujith Ravi, Zornitsa Kozareva |
Abstract | Recently, there has been strong interest in developing natural language applications that live on personal devices such as mobile phones, watches and IoT, with the objective of preserving user privacy and keeping memory usage low. Advances in Locality-Sensitive Hashing (LSH)-based projection networks have demonstrated state-of-the-art performance without any embedding lookup tables and instead computing on-the-fly text representations. However, previous works have not investigated “What makes projection neural networks effective at capturing compact representations for text classification?” and “Are these projection models resistant to perturbations and misspellings in input text?”. In this paper, we analyze and answer these questions through perturbation analyses and by running experiments on multiple dialog act prediction tasks. Our results show that the projections are resistant to perturbations and misspellings compared to widely-used recurrent architectures that use word embeddings. On the ATIS intent prediction task, when evaluated with perturbed input data, we observe that the performance of recurrent models that use word embeddings drops significantly by more than 30% compared to just 5% with projection networks, showing that LSH-based projection representations are robust and consistently lead to high quality performance. |
Tasks | Text Classification, Word Embeddings |
Published | 2019-08-14 |
URL | https://arxiv.org/abs/1908.05763v1 |
https://arxiv.org/pdf/1908.05763v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-robustness-of-projection-neural |
Repo | |
Framework | |
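The idea of computing a text representation on the fly, without an embedding table, can be illustrated with a SimHash-style sign projection over character trigrams. This is a simplified stand-in and not the projection function used in the paper: the choice of trigrams, the SHA-256-based pseudo-random signs, the 256-bit width and the example sentences are all assumptions. Because a misspelling changes only a few trigrams, most bits of the projection are expected to stay the same, whereas an unrelated sentence should differ in roughly half of them.

```python
import hashlib


def trigrams(text):
    """Character trigrams of the lower-cased, padded text."""
    padded = f"  {text.lower()} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def project(text, bits=256):
    """Binary projection: each bit is the sign of a pseudo-random +/-1 sum over trigrams."""
    grams = trigrams(text)
    out = []
    for b in range(bits):
        total = 0
        for g in grams:
            # A deterministic hash of (bit index, trigram) stands in for a random +/-1 weight.
            digest = hashlib.sha256(f"{b}:{g}".encode("utf-8")).digest()
            total += 1 if digest[0] & 1 else -1
        out.append(1 if total >= 0 else 0)
    return out


def hamming(a, b):
    """Number of positions where two bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


clean = project("show flights to boston")
typo = project("show flihgts to boston")        # transposed letters
other = project("what is the cheapest fare")    # unrelated request

d_typo = hamming(clean, typo)
d_other = hamming(clean, other)
```

No lookup table is stored: the only state is the hashing scheme, which is what keeps the memory footprint small.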