Paper Group ANR 696
Is Deep Learning Safe for Robot Vision? Adversarial Examples against the iCub Humanoid. Compression-Based Regularization with an Application to Multi-Task Learning. Deep Learning the Ising Model Near Criticality. Nonsmooth Analysis and Subgradient Methods for Averaging in Dynamic Time Warping Spaces. Fishing for Clickbaits in Social Images and Text …
Is Deep Learning Safe for Robot Vision? Adversarial Examples against the iCub Humanoid
Title | Is Deep Learning Safe for Robot Vision? Adversarial Examples against the iCub Humanoid |
Authors | Marco Melis, Ambra Demontis, Battista Biggio, Gavin Brown, Giorgio Fumera, Fabio Roli |
Abstract | Deep neural networks have been widely adopted in recent years, exhibiting impressive performances in several application domains. It has however been shown that they can be fooled by adversarial examples, i.e., images altered by a barely-perceivable adversarial noise, carefully crafted to mislead classification. In this work, we aim to evaluate the extent to which robot-vision systems embodying deep-learning algorithms are vulnerable to adversarial examples, and propose a computationally efficient countermeasure to mitigate this threat, based on rejecting classification of anomalous inputs. We then provide a clearer understanding of the safety properties of deep networks through an intuitive empirical analysis, showing that the mapping learned by such networks essentially violates the smoothness assumption of learning algorithms. We finally discuss the main limitations of this work, including the creation of real-world adversarial examples, and sketch promising research directions. |
Tasks | |
Published | 2017-08-23 |
URL | http://arxiv.org/abs/1708.06939v1 |
PDF | http://arxiv.org/pdf/1708.06939v1.pdf |
PWC | https://paperswithcode.com/paper/is-deep-learning-safe-for-robot-vision |
Repo | |
Framework | |
Compression-Based Regularization with an Application to Multi-Task Learning
Title | Compression-Based Regularization with an Application to Multi-Task Learning |
Authors | Matías Vera, Leonardo Rey Vega, Pablo Piantanida |
Abstract | This paper investigates, from information theoretic grounds, a learning problem based on the principle that any regularity in a given dataset can be exploited to extract compact features from data, i.e., using fewer bits than needed to fully describe the data itself, in order to build meaningful representations of a relevant content (multiple labels). We begin by introducing the noisy lossy source coding paradigm with the log-loss fidelity criterion which provides the fundamental tradeoffs between the cross-entropy loss (average risk) and the information rate of the features (model complexity). Our approach allows an information theoretic formulation of the multi-task learning (MTL) problem which is a supervised learning framework in which the prediction models for several related tasks are learned jointly from common representations to achieve better generalization performance. Then, we present an iterative algorithm for computing the optimal tradeoffs and its global convergence is proven provided that some conditions hold. An important property of this algorithm is that it provides a natural safeguard against overfitting, because it minimizes the average risk taking into account a penalization induced by the model complexity. Remarkably, empirical results illustrate that there exists an optimal information rate minimizing the excess risk which depends on the nature and the amount of available training data. An application to hierarchical text categorization is also investigated, extending previous works. |
Tasks | Multi-Task Learning, Text Categorization |
Published | 2017-11-19 |
URL | http://arxiv.org/abs/1711.07099v1 |
PDF | http://arxiv.org/pdf/1711.07099v1.pdf |
PWC | https://paperswithcode.com/paper/compression-based-regularization-with-an |
Repo | |
Framework | |
Deep Learning the Ising Model Near Criticality
Title | Deep Learning the Ising Model Near Criticality |
Authors | Alan Morningstar, Roger G. Melko |
Abstract | It is well established that neural networks with deep architectures perform better than shallow networks for many tasks in machine learning. In statistical physics, while there has been recent interest in representing physical data with generative modelling, the focus has been on shallow neural networks. A natural question to ask is whether deep neural networks hold any advantage over shallow networks in representing such data. We investigate this question by using unsupervised, generative graphical models to learn the probability distribution of a two-dimensional Ising system. Deep Boltzmann machines, deep belief networks, and deep restricted Boltzmann networks are trained on thermal spin configurations from this system, and compared to the shallow architecture of the restricted Boltzmann machine. We benchmark the models, focussing on the accuracy of generating energetic observables near the phase transition, where these quantities are most difficult to approximate. Interestingly, after training the generative networks, we observe that the accuracy essentially depends only on the number of neurons in the first hidden layer of the network, and not on other model details such as network depth or model type. This is evidence that shallow networks are more efficient than deep networks at representing physical probability distributions associated with Ising systems near criticality. |
Tasks | |
Published | 2017-08-15 |
URL | http://arxiv.org/abs/1708.04622v1 |
PDF | http://arxiv.org/pdf/1708.04622v1.pdf |
PWC | https://paperswithcode.com/paper/deep-learning-the-ising-model-near |
Repo | |
Framework | |
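The energetic observables mentioned in the abstract above are derived from the energy of each spin configuration. As a minimal sketch, assuming the standard zero-field nearest-neighbour Ising energy E = -J Σ s_i s_j on a square lattice with periodic boundaries (the function name, the `coupling` parameter, and the boundary choice are illustrative assumptions, not taken from the paper):

```python
def ising_energy(spins, coupling=1.0):
    """Energy of a 2D Ising configuration with periodic boundaries.

    spins: list of equal-length rows with entries +1 or -1.
    coupling: hypothetical coupling constant J (assumed 1.0 here).
    """
    n_rows = len(spins)
    n_cols = len(spins[0])
    energy = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            s = spins[i][j]
            # Count each bond once: right and down neighbours only.
            right = spins[i][(j + 1) % n_cols]
            down = spins[(i + 1) % n_rows][j]
            energy -= coupling * s * (right + down)
    return energy


# A fully aligned 4x4 lattice has 32 bonds, each contributing -J.
aligned = [[1] * 4 for _ in range(4)]
print(ising_energy(aligned))
```

Averaging this quantity over configurations sampled from a trained model, and comparing against the same average over the training configurations, is one way such a benchmark could be set up.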
Nonsmooth Analysis and Subgradient Methods for Averaging in Dynamic Time Warping Spaces
Title | Nonsmooth Analysis and Subgradient Methods for Averaging in Dynamic Time Warping Spaces |
Authors | David Schultz, Brijnesh Jain |
Abstract | Time series averaging in dynamic time warping (DTW) spaces has been successfully applied to improve pattern recognition systems. This article proposes and analyzes subgradient methods for the problem of finding a sample mean in DTW spaces. The class of subgradient methods generalizes existing sample mean algorithms such as DTW Barycenter Averaging (DBA). We show that DBA is a majorize-minimize algorithm that converges to necessary conditions of optimality after finitely many iterations. Empirical results show that for increasing sample sizes the proposed stochastic subgradient (SSG) algorithm is more stable and finds better solutions in shorter time than the DBA algorithm on average. Therefore, SSG is useful in online settings and for non-small sample sizes. The theoretical and empirical results open new paths for devising sample mean algorithms: nonsmooth optimization methods and modified variants of pairwise averaging methods. |
Tasks | Time Series, Time Series Averaging |
Published | 2017-01-23 |
URL | http://arxiv.org/abs/1701.06393v1 |
PDF | http://arxiv.org/pdf/1701.06393v1.pdf |
PWC | https://paperswithcode.com/paper/nonsmooth-analysis-and-subgradient-methods |
Repo | |
Framework | |
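Sample-mean methods in DTW spaces are built on the DTW distance between two series. The following is a minimal sketch of that distance for 1-D sequences using the classic dynamic-programming recursion; the squared-difference local cost and the function name are assumptions for illustration, and this is not the subgradient or DBA algorithm from the paper.

```python
def dtw_distance(x, y):
    """DTW cost between two 1-D sequences with squared-difference local cost."""
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] holds the best warping cost aligning x[:i] with y[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = (x[i - 1] - y[j - 1]) ** 2
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


# Warping absorbs the repeated value, so these two series match exactly.
print(dtw_distance([0, 1, 1, 2], [0, 1, 2]))
```

A sample mean in this setting would minimise the sum of such costs between a candidate series and every series in the sample.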
Fishing for Clickbaits in Social Images and Texts with Linguistically-Infused Neural Network Models
Title | Fishing for Clickbaits in Social Images and Texts with Linguistically-Infused Neural Network Models |
Authors | Maria Glenski, Ellyn Ayton, Dustin Arendt, Svitlana Volkova |
Abstract | This paper presents the results and conclusions of our participation in the Clickbait Challenge 2017 on automatic clickbait detection in social media. We first describe linguistically-infused neural network models and identify informative representations to predict the level of clickbaiting present in Twitter posts. Our models allow to answer the question not only whether a post is a clickbait or not, but to what extent it is a clickbait post e.g., not at all, slightly, considerably, or heavily clickbaity using a score ranging from 0 to 1. We evaluate the predictive power of models trained on varied text and image representations extracted from tweets. Our best performing model that relies on the tweet text and linguistic markers of biased language extracted from the tweet and the corresponding page yields mean squared error (MSE) of 0.04, mean absolute error (MAE) of 0.16 and R2 of 0.43 on the held-out test data. For the binary classification setup (clickbait vs. non-clickbait), our model achieved F1 score of 0.69. We have not found that image representations combined with text yield significant performance improvement yet. Nevertheless, this work is the first to present preliminary analysis of objects extracted using Google Tensorflow object detection API from images in clickbait vs. non-clickbait Twitter posts. Finally, we outline several steps to improve model performance as a part of the future work. |
Tasks | Clickbait Detection, Object Detection |
Published | 2017-10-17 |
URL | http://arxiv.org/abs/1710.06390v1 |
PDF | http://arxiv.org/pdf/1710.06390v1.pdf |
PWC | https://paperswithcode.com/paper/fishing-for-clickbaits-in-social-images-and |
Repo | |
Framework | |
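The abstract above reports MSE, MAE, and R2 for clickbait scores in the range 0 to 1. As a minimal sketch, these three metrics can be computed from their standard definitions as follows; the function name and the example scores are made up for illustration and are not data from the paper.

```python
def regression_metrics(y_true, y_pred):
    """Return (MSE, MAE, R2) for two equal-length lists of scores."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    # ss_tot is zero when all true scores are equal; R2 is undefined then.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return mse, mae, r2


# Hypothetical clickbait scores for three posts.
true_scores = [0.0, 0.5, 1.0]
predicted = [0.1, 0.4, 0.8]
print(regression_metrics(true_scores, predicted))
```

An R2 of 0 corresponds to always predicting the mean score, which gives a reference point for reading the reported value.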
KU-ISPL Speaker Recognition Systems under Language mismatch condition for NIST 2016 Speaker Recognition Evaluation
Title | KU-ISPL Speaker Recognition Systems under Language mismatch condition for NIST 2016 Speaker Recognition Evaluation |
Authors | Suwon Shon, Hanseok Ko |
Abstract | Korea University Intelligent Signal Processing Lab. (KU-ISPL) developed a speaker recognition system for the SRE16 fixed training condition. Data for the evaluation trials are collected from outside North America and spoken in Tagalog and Cantonese, while the training data is spoken only in English. Thus, the main issue for SRE16 is compensating for the discrepancy between languages. Using the development dataset, which is spoken in Cebuano and Mandarin, we prepared for the evaluation trials through preliminary experiments on compensating for the language-mismatched condition. Our team developed 4 different approaches to extract i-vectors and applied state-of-the-art techniques as the backend. To compensate for language mismatch, we investigated methods such as unsupervised language clustering, inter-language variability compensation, and gender/language-dependent score normalization. |
Tasks | Speaker Recognition |
Published | 2017-02-03 |
URL | http://arxiv.org/abs/1702.00956v2 |
PDF | http://arxiv.org/pdf/1702.00956v2.pdf |
PWC | https://paperswithcode.com/paper/ku-ispl-speaker-recognition-systems-under |
Repo | |
Framework | |
Causal Data Science for Financial Stress Testing
Title | Causal Data Science for Financial Stress Testing |
Authors | Gelin Gao, Bud Mishra, Daniele Ramazzotti |
Abstract | The most recent financial upheavals have cast doubt on the adequacy of some of the conventional quantitative risk management strategies, such as VaR (Value at Risk), in many common situations. Consequently, there has been an increasing need for verisimilar financial stress testings, namely simulating and analyzing financial portfolios in extreme, albeit rare scenarios. Unlike conventional risk management which exploits statistical correlations among financial instruments, here we focus our analysis on the notion of probabilistic causation, which is embodied by Suppes-Bayes Causal Networks (SBCNs); SBCNs are probabilistic graphical models that have many attractive features in terms of more accurate causal analysis for generating financial stress scenarios. In this paper, we present a novel approach for conducting stress testing of financial portfolios based on SBCNs in combination with classical machine learning classification tools. The resulting method is shown to be capable of correctly discovering the causal relationships among financial factors that affect the portfolios and thus, simulating stress testing scenarios with a higher accuracy and lower computational complexity than conventional Monte Carlo Simulations. |
Tasks | |
Published | 2017-03-08 |
URL | http://arxiv.org/abs/1703.03076v2 |
PDF | http://arxiv.org/pdf/1703.03076v2.pdf |
PWC | https://paperswithcode.com/paper/causal-data-science-for-financial-stress |
Repo | |
Framework | |
Where and Who? Automatic Semantic-Aware Person Composition
Title | Where and Who? Automatic Semantic-Aware Person Composition |
Authors | Fuwen Tan, Crispin Bernier, Benjamin Cohen, Vicente Ordonez, Connelly Barnes |
Abstract | Image compositing is a method used to generate realistic yet fake imagery by inserting contents from one image to another. Previous work in compositing has focused on improving appearance compatibility of a user selected foreground segment and a background image (i.e. color and illumination consistency). In this work, we instead develop a fully automated compositing model that additionally learns to select and transform compatible foreground segments from a large collection given only an input image background. To simplify the task, we restrict our problem by focusing on human instance composition, because human segments exhibit strong correlations with their background and because of the availability of large annotated data. We develop a novel branching Convolutional Neural Network (CNN) that jointly predicts candidate person locations given a background image. We then use pre-trained deep feature representations to retrieve person instances from a large segment database. Experimental results show that our model can generate composite images that look visually convincing. We also develop a user interface to demonstrate the potential application of our method. |
Tasks | |
Published | 2017-06-04 |
URL | http://arxiv.org/abs/1706.01021v2 |
PDF | http://arxiv.org/pdf/1706.01021v2.pdf |
PWC | https://paperswithcode.com/paper/where-and-who-automatic-semantic-aware-person |
Repo | |
Framework | |
Quadruplet Network with One-Shot Learning for Fast Visual Object Tracking
Title | Quadruplet Network with One-Shot Learning for Fast Visual Object Tracking |
Authors | Xingping Dong, Jianbing Shen, Dongming Wu, Kan Guo, Xiaogang Jin, Fatih Porikli |
Abstract | In the same vein of discriminative one-shot learning, Siamese networks allow recognizing an object from a single exemplar with the same class label. However, they do not take advantage of the underlying structure of the data and the relationship among the multitude of samples as they only rely on pairs of instances for training. In this paper, we propose a new quadruplet deep network to examine the potential connections among the training instances, aiming to achieve a more powerful representation. We design four shared networks that receive multi-tuple of instances as inputs and are connected by a novel loss function consisting of pair-loss and triplet-loss. According to the similarity metric, we select the most similar and the most dissimilar instances as the positive and negative inputs of triplet loss from each multi-tuple. We show that this scheme improves the training performance. Furthermore, we introduce a new weight layer to automatically select suitable combination weights, which will avoid the conflict between triplet and pair loss leading to worse performance. We evaluate our quadruplet framework by model-free tracking-by-detection of objects from a single initial exemplar in several Visual Object Tracking benchmarks. Our extensive experimental analysis demonstrates that our tracker achieves superior performance with a real-time processing speed of 78 frames-per-second (fps). |
Tasks | Object Tracking, One-Shot Learning, Visual Object Tracking |
Published | 2017-05-19 |
URL | https://arxiv.org/abs/1705.07222v3 |
PDF | https://arxiv.org/pdf/1705.07222v3.pdf |
PWC | https://paperswithcode.com/paper/quadruplet-network-with-one-shot-learning-for |
Repo | |
Framework | |
RETUYT in TASS 2017: Sentiment Analysis for Spanish Tweets using SVM and CNN
Title | RETUYT in TASS 2017: Sentiment Analysis for Spanish Tweets using SVM and CNN |
Authors | Aiala Rosá, Luis Chiruzzo, Mathias Etcheverry, Santiago Castro |
Abstract | This article presents classifiers based on SVM and Convolutional Neural Networks (CNN) for the TASS 2017 challenge on tweets sentiment analysis. The classifier with the best performance in general uses a combination of SVM and CNN. The use of word embeddings was particularly useful for improving the classifiers performance. |
Tasks | Sentiment Analysis, Word Embeddings |
Published | 2017-10-17 |
URL | http://arxiv.org/abs/1710.06393v1 |
PDF | http://arxiv.org/pdf/1710.06393v1.pdf |
PWC | https://paperswithcode.com/paper/retuyt-in-tass-2017-sentiment-analysis-for |
Repo | |
Framework | |
CMCGAN: A Uniform Framework for Cross-Modal Visual-Audio Mutual Generation
Title | CMCGAN: A Uniform Framework for Cross-Modal Visual-Audio Mutual Generation |
Authors | Wangli Hao, Zhaoxiang Zhang, He Guan |
Abstract | Visual and audio modalities are two symbiotic modalities underlying videos, which contain both common and complementary information. If they can be mined and fused sufficiently, performances of related video tasks can be significantly enhanced. However, due to the environmental interference or sensor fault, sometimes, only one modality exists while the other is abandoned or missing. By recovering the missing modality from the existing one based on the common information shared between them and the prior information of the specific modality, great bonus will be gained for various vision tasks. In this paper, we propose a Cross-Modal Cycle Generative Adversarial Network (CMCGAN) to handle cross-modal visual-audio mutual generation. Specifically, CMCGAN is composed of four kinds of subnetworks: audio-to-visual, visual-to-audio, audio-to-audio and visual-to-visual subnetworks respectively, which are organized in a cycle architecture. CMCGAN has several remarkable advantages. Firstly, CMCGAN unifies visual-audio mutual generation into a common framework by a joint corresponding adversarial loss. Secondly, through introducing a latent vector with Gaussian distribution, CMCGAN can handle dimension and structure asymmetry over visual and audio modalities effectively. Thirdly, CMCGAN can be trained end-to-end to achieve better convenience. Benefiting from CMCGAN, we develop a dynamic multimodal classification network to handle the modality missing problem. Abundant experiments have been conducted and validate that CMCGAN obtains the state-of-the-art cross-modal visual-audio generation results. Furthermore, it is shown that the generated modality achieves comparable effects with those of original modality, which demonstrates the effectiveness and advantages of our proposed method. |
Tasks | Audio Generation |
Published | 2017-11-22 |
URL | http://arxiv.org/abs/1711.08102v2 |
PDF | http://arxiv.org/pdf/1711.08102v2.pdf |
PWC | https://paperswithcode.com/paper/cmcgan-a-uniform-framework-for-cross-modal |
Repo | |
Framework | |
Model Selection in Bayesian Neural Networks via Horseshoe Priors
Title | Model Selection in Bayesian Neural Networks via Horseshoe Priors |
Authors | Soumya Ghosh, Finale Doshi-Velez |
Abstract | Bayesian Neural Networks (BNNs) have recently received increasing attention for their ability to provide well-calibrated posterior uncertainties. However, model selection—even choosing the number of nodes—remains an open question. In this work, we apply a horseshoe prior over node pre-activations of a Bayesian neural network, which effectively turns off nodes that do not help explain the data. We demonstrate that our prior prevents the BNN from under-fitting even when the number of nodes required is grossly over-estimated. Moreover, this model selection over the number of nodes doesn’t come at the expense of predictive or computational performance; in fact, we learn smaller networks with comparable predictive performance to current approaches. |
Tasks | Model Selection |
Published | 2017-05-29 |
URL | http://arxiv.org/abs/1705.10388v1 |
PDF | http://arxiv.org/pdf/1705.10388v1.pdf |
PWC | https://paperswithcode.com/paper/model-selection-in-bayesian-neural-networks |
Repo | |
Framework | |
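To illustrate how a horseshoe prior can switch off whole nodes, the sketch below draws one half-Cauchy scale per node and shares it across that node's incoming weights, so a small scale shrinks every weight of the node together. This is a hedged illustration of the generic horseshoe construction, not the paper's exact model: the function name, the fixed `global_scale`, and the per-node grouping are assumptions.

```python
import math
import random


def sample_horseshoe_weights(n_nodes, n_inputs, global_scale=1.0, seed=0):
    """Draw incoming weights for n_nodes hidden nodes from a horseshoe prior.

    global_scale is a hypothetical fixed value; a full model would usually
    place a prior on it as well.
    """
    rng = random.Random(seed)
    weights = []
    for _ in range(n_nodes):
        # Half-Cauchy(0, 1) draw via the inverse CDF of a standard Cauchy.
        local_scale = abs(math.tan(math.pi * (rng.random() - 0.5)))
        std = local_scale * global_scale
        # All weights into this node share one scale, so they shrink together.
        weights.append([rng.gauss(0.0, std) for _ in range(n_inputs)])
    return weights


for row in sample_horseshoe_weights(n_nodes=3, n_inputs=4):
    print([round(w, 3) for w in row])
```

The heavy tail of the half-Cauchy lets some nodes keep large weights while its mass near zero pushes others toward being inactive.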
Neural Networks for Safety-Critical Applications - Challenges, Experiments and Perspectives
Title | Neural Networks for Safety-Critical Applications - Challenges, Experiments and Perspectives |
Authors | Chih-Hong Cheng, Frederik Diehl, Yassine Hamza, Gereon Hinz, Georg Nührenberg, Markus Rickert, Harald Ruess, Michael Troung-Le |
Abstract | We propose a methodology for designing dependable Artificial Neural Networks (ANN) by extending the concepts of understandability, correctness, and validity that are crucial ingredients in existing certification standards. We apply the concept in a concrete case study in designing a high-way ANN-based motion predictor to guarantee safety properties such as impossibility for the ego vehicle to suggest moving to the right lane if there exists another vehicle on its right. |
Tasks | |
Published | 2017-09-04 |
URL | http://arxiv.org/abs/1709.00911v1 |
PDF | http://arxiv.org/pdf/1709.00911v1.pdf |
PWC | https://paperswithcode.com/paper/neural-networks-for-safety-critical |
Repo | |
Framework | |
Toward Incorporation of Relevant Documents in word2vec
Title | Toward Incorporation of Relevant Documents in word2vec |
Authors | Navid Rekabsaz, Bhaskar Mitra, Mihai Lupu, Allan Hanbury |
Abstract | Recent advances in neural word embedding provide significant benefit to various information retrieval tasks. However as shown by recent studies, adapting the embedding models for the needs of IR tasks can bring considerable further improvements. The embedding models in general define the term relatedness by exploiting the terms’ co-occurrences in short-window contexts. An alternative (and well-studied) approach in IR for related terms to a query is using local information i.e. a set of top-retrieved documents. In view of these two methods of term relatedness, in this work, we report our study on incorporating the local information of the query in the word embeddings. One main challenge in this direction is that the dense vectors of word embeddings and their estimation of term-to-term relatedness remain difficult to interpret and hard to analyze. As an alternative, explicit word representations propose vectors whose dimensions are easily interpretable, and recent methods show competitive performance to the dense vectors. We introduce a neural-based explicit representation, rooted in the conceptual ideas of the word2vec Skip-Gram model. The method provides interpretable explicit vectors while keeping the effectiveness of the Skip-Gram model. The evaluation of various explicit representations on word association collections shows that the newly proposed method outperforms the state-of-the-art explicit representations when tasked with ranking highly similar terms. Based on the introduced explicit representation, we discuss our approaches on integrating local documents in globally-trained embedding models and discuss the preliminary results. |
Tasks | Information Retrieval, Word Embeddings |
Published | 2017-07-20 |
URL | http://arxiv.org/abs/1707.06598v2 |
PDF | http://arxiv.org/pdf/1707.06598v2.pdf |
PWC | https://paperswithcode.com/paper/toward-incorporation-of-relevant-documents-in |
Repo | |
Framework | |
A Deep Learning Based 6 Degree-of-Freedom Localization Method for Endoscopic Capsule Robots
Title | A Deep Learning Based 6 Degree-of-Freedom Localization Method for Endoscopic Capsule Robots |
Authors | Mehmet Turan, Yasin Almalioglu, Ender Konukoglu, Metin Sitti |
Abstract | We present a robust deep learning based 6 degrees-of-freedom (DoF) localization system for endoscopic capsule robots. Our system mainly focuses on localization of endoscopic capsule robots inside the GI tract using only visual information captured by a mono camera integrated to the robot. The proposed system is a 23-layer deep convolutional neural network (CNN) that is capable to estimate the pose of the robot in real time using a standard CPU. The dataset for the evaluation of the system was recorded inside a surgical human stomach model with realistic surface texture, softness, and surface liquid properties so that the pre-trained CNN architecture can be transferred confidently into a real endoscopic scenario. An average error of 7.1% and 3.4% for translation and rotation has been obtained, respectively. The results accomplished from the experiments demonstrate that a CNN pre-trained with raw 2D endoscopic images performs accurately inside the GI tract and is robust to various challenges posed by reflection distortions, lens imperfections, vignetting, noise, motion blur, low resolution, and lack of unique landmarks to track. |
Tasks | |
Published | 2017-05-15 |
URL | http://arxiv.org/abs/1705.05435v1 |
PDF | http://arxiv.org/pdf/1705.05435v1.pdf |
PWC | https://paperswithcode.com/paper/a-deep-learning-based-6-degree-of-freedom |
Repo | |
Framework | |