Paper Group ANR 553
Class Mean Vector Component and Discriminant Analysis for Kernel Subspace Learning
Title | Class Mean Vector Component and Discriminant Analysis for Kernel Subspace Learning |
Authors | Alexandros Iosifidis |
Abstract | The kernel matrix used in kernel methods encodes all the information required for solving complex nonlinear problems defined on data representations in the input space using simple, but implicitly defined, solutions. Spectral analysis on the kernel matrix defines an explicit nonlinear mapping of the input data representations to a subspace of the kernel space, which can be used for directly applying linear methods. However, the selection of the kernel subspace is crucial for the performance of the subsequent processing steps. In this paper, we propose a component analysis method for kernel-based dimensionality reduction that optimally preserves the pair-wise distances of the class means in the feature space. We provide extensive analysis on the connection of the proposed criterion to those used in kernel principal component analysis and kernel discriminant analysis, leading to a discriminant analysis version of the proposed method. Our analysis also provides further insight into the properties of the feature spaces obtained by applying these methods. |
Tasks | Dimensionality Reduction |
Published | 2018-12-14 |
URL | http://arxiv.org/abs/1812.05988v2 |
http://arxiv.org/pdf/1812.05988v2.pdf | |
PWC | https://paperswithcode.com/paper/class-mean-vector-component-and-discriminant |
Repo | |
Framework | |
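The abstract above is built around spectral analysis of the kernel matrix. The paper's class-mean-preserving criterion is not reproduced here; the following is only a minimal kernel PCA sketch (NumPy, RBF kernel) illustrating the explicit nonlinear mapping to a kernel subspace that such methods start from. The kernel choice, gamma, and toy data are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    # pairwise squared Euclidean distances -> RBF kernel matrix
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

def kernel_pca(K, n_components=2):
    n = K.shape[0]
    # double-center the kernel matrix
    one = np.ones((n, n)) / n
    Kc = K - one @ K - K @ one + one @ K @ one
    # eigendecompose and keep the leading eigenpairs
    w, V = np.linalg.eigh(Kc)
    idx = np.argsort(w)[::-1][:n_components]
    w, V = w[idx], V[:, idx]
    # embeddings of the training points in the kernel subspace
    return V * np.sqrt(np.maximum(w, 0.0))

X = np.random.default_rng(0).normal(size=(100, 5))
Z = kernel_pca(rbf_kernel(X, gamma=0.5), n_components=2)
print(Z.shape)  # (100, 2)
```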
A Deep Learning Approach to Fast, Format-Agnostic Detection of Malicious Web Content
Title | A Deep Learning Approach to Fast, Format-Agnostic Detection of Malicious Web Content |
Authors | Joshua Saxe, Richard Harang, Cody Wild, Hillary Sanders |
Abstract | Malicious web content is a serious problem on the Internet today. In this paper we propose a deep learning approach to detecting malevolent web pages. While past work on web content detection has relied on syntactic parsing or on emulation of HTML and Javascript to extract features, our approach operates directly on a language-agnostic stream of tokens extracted from static HTML files with a simple regular expression. This makes it fast enough to operate in high-frequency data contexts like firewalls and web proxies, and allows it to avoid the attack surface exposure of complex parsing and emulation code. Unlike well-known approaches such as bag-of-words models, which ignore spatial information, our neural network examines content at hierarchical spatial scales, allowing our model to capture locality and yielding superior accuracy compared to bag-of-words baselines. Our proposed architecture achieves a 97.5% detection rate at a 0.1% false positive rate, and classifies small-batched web pages at a rate of over 100 per second on commodity hardware. The speed and accuracy of our approach makes it appropriate for deployment to endpoints, firewalls, and web proxies. |
Tasks | |
Published | 2018-04-13 |
URL | http://arxiv.org/abs/1804.05020v1 |
http://arxiv.org/pdf/1804.05020v1.pdf | |
PWC | https://paperswithcode.com/paper/a-deep-learning-approach-to-fast-format |
Repo | |
Framework | |
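The paper operates on a token stream pulled from raw HTML with a simple regular expression. The exact regex, hashing scheme, and network are not given in the abstract, so the snippet below is a hypothetical illustration of the feature-extraction idea only: split the static file on whitespace-like boundaries and hash tokens into a fixed-length count vector that a downstream model could consume. The token pattern and vector size are assumptions.

```python
import re
import zlib
import numpy as np

# Hypothetical token pattern, not the paper's: split on whitespace and common HTML punctuation.
TOKEN_RE = re.compile(rb"[^\s<>\"'=]+")

def hashed_token_counts(html_bytes, dim=1024):
    """Hash raw HTML tokens into a fixed-length count vector (feature hashing)."""
    counts = np.zeros(dim, dtype=np.float32)
    for tok in TOKEN_RE.findall(html_bytes):
        counts[zlib.crc32(tok) % dim] += 1.0
    return np.log1p(counts)  # dampen very frequent tokens

page = b"<html><body><script>eval(atob('ZXZpbA=='))</script></body></html>"
features = hashed_token_counts(page)
print(features.shape, int(features.astype(bool).sum()), "non-zero buckets")
```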
Towards a Theory-Guided Benchmarking Suite for Discrete Black-Box Optimization Heuristics: Profiling $(1+λ)$ EA Variants on OneMax and LeadingOnes
Title | Towards a Theory-Guided Benchmarking Suite for Discrete Black-Box Optimization Heuristics: Profiling $(1+λ)$ EA Variants on OneMax and LeadingOnes |
Authors | Carola Doerr, Furong Ye, Sander van Rijn, Hao Wang, Thomas Bäck |
Abstract | Theoretical and empirical research on evolutionary computation methods complement each other by providing two fundamentally different approaches towards a better understanding of black-box optimization heuristics. In discrete optimization, both streams developed rather independently of each other, but we observe today an increasing interest in reconciling these two sub-branches. In continuous optimization, the COCO (COmparing Continuous Optimisers) benchmarking suite has established itself as an important platform that theoreticians and practitioners use to exchange research ideas and questions. No widely accepted equivalent exists in the research domain of discrete black-box optimization. Marking an important step towards filling this gap, we adjust the COCO software to pseudo-Boolean optimization problems, and obtain from this a benchmarking environment that allows a fine-grained empirical analysis of discrete black-box heuristics. In this documentation we demonstrate how this test bed can be used to profile the performance of evolutionary algorithms. More concretely, we study the optimization behavior of several $(1+\lambda)$ EA variants on the two benchmark problems OneMax and LeadingOnes. This comparison motivates a refined analysis for the optimization time of the $(1+\lambda)$ EA on LeadingOnes. |
Tasks | |
Published | 2018-08-17 |
URL | http://arxiv.org/abs/1808.05850v1 |
http://arxiv.org/pdf/1808.05850v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-a-theory-guided-benchmarking-suite |
Repo | |
Framework | |
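As a concrete reference point for the configuration the study profiles, here is a textbook (1+λ) EA with standard bit mutation (flip each bit independently with probability 1/n) on OneMax. It is a minimal sketch, not the paper's benchmarking environment, which builds on the COCO software.

```python
import random

def onemax(x):
    return sum(x)

def one_plus_lambda_ea(n=100, lam=8, budget=100_000, seed=1):
    rng = random.Random(seed)
    parent = [rng.randint(0, 1) for _ in range(n)]
    fit, evals = onemax(parent), 1
    while fit < n and evals < budget:
        best_child, best_fit = None, -1
        for _ in range(lam):  # lambda offspring via standard bit mutation (p = 1/n)
            child = [1 - b if rng.random() < 1.0 / n else b for b in parent]
            f = onemax(child)
            evals += 1
            if f > best_fit:
                best_child, best_fit = child, f
        if best_fit >= fit:  # elitist (1+lambda) selection
            parent, fit = best_child, best_fit
    return evals

print(one_plus_lambda_ea())  # evaluations until the optimum (or until the budget runs out)
```

The parent is replaced whenever the best of the λ offspring is at least as good, which is the standard elitist (1+λ) scheme profiled on OneMax and LeadingOnes.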
When will you do what? - Anticipating Temporal Occurrences of Activities
Title | When will you do what? - Anticipating Temporal Occurrences of Activities |
Authors | Yazan Abu Farha, Alexander Richard, Juergen Gall |
Abstract | Analyzing human actions in videos has gained increased attention recently. While most works focus on classifying and labeling observed video frames or anticipating the very recent future, making long-term predictions over more than just a few seconds is a task with many practical applications that has not yet been addressed. In this paper, we propose two methods to predict a considerable number of future actions and their durations. Both a CNN and an RNN are trained to learn future video labels based on previously seen content. We show that our methods generate accurate predictions of the future even for long videos with a large number of different actions and can even deal with noisy or erroneous input information. |
Tasks | |
Published | 2018-04-03 |
URL | http://arxiv.org/abs/1804.00892v1 |
http://arxiv.org/pdf/1804.00892v1.pdf | |
PWC | https://paperswithcode.com/paper/when-will-you-do-what-anticipating-temporal |
Repo | |
Framework | |
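The entry above mentions an RNN trained to emit future video labels from observed content. The architecture details are not in the abstract, so the PyTorch snippet below is a hypothetical, minimal autoregressive label predictor in that spirit, not the authors' model; layer sizes, the softmax feedback, and the toy data are assumptions.

```python
import torch
import torch.nn as nn

class FutureLabelRNN(nn.Module):
    def __init__(self, num_classes, hidden=64):
        super().__init__()
        self.rnn = nn.LSTM(num_classes, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, observed_onehot, future_steps):
        # encode the observed per-frame labels
        _, (h, c) = self.rnn(observed_onehot)
        # autoregressively roll out future label predictions
        inp = observed_onehot[:, -1:, :]
        outputs = []
        for _ in range(future_steps):
            out, (h, c) = self.rnn(inp, (h, c))
            logits = self.head(out)
            outputs.append(logits)
            inp = torch.softmax(logits, dim=-1)  # feed the prediction back in
        return torch.cat(outputs, dim=1)

# usage: 2 videos, 50 observed frames, 10 action classes, predict 20 future frames
x = torch.nn.functional.one_hot(torch.randint(0, 10, (2, 50)), 10).float()
model = FutureLabelRNN(num_classes=10)
print(model(x, future_steps=20).shape)  # torch.Size([2, 20, 10])
```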
Verification of deep probabilistic models
Title | Verification of deep probabilistic models |
Authors | Krishnamurthy Dvijotham, Marta Garnelo, Alhussein Fawzi, Pushmeet Kohli |
Abstract | Probabilistic models are a critical part of the modern deep learning toolbox - ranging from generative models (VAEs, GANs), sequence-to-sequence models used in machine translation and speech processing to models over functional spaces (conditional neural processes, neural processes). Given the size and complexity of these models, safely deploying them in applications requires the development of tools to analyze their behavior rigorously and provide some guarantees that these models are consistent with a list of desirable properties or specifications. For example, a machine translation model should produce semantically equivalent outputs for innocuous changes in the input to the model. A functional regression model that is learning a distribution over monotonic functions should predict a larger value at a larger input. Verification of these properties requires a new framework that goes beyond notions of verification studied in deterministic feedforward networks, since requiring worst-case guarantees in probabilistic models is likely to produce conservative or vacuous results. We propose a novel formulation of verification for deep probabilistic models that take in conditioning inputs and sample latent variables in the course of producing an output: We require that the output of the model satisfies a linear constraint with high probability over the sampling of latent variables and for every choice of conditioning input to the model. We show that rigorous lower bounds on the probability that the constraint is satisfied can be obtained efficiently. Experiments with neural processes show that several properties of interest while modeling functional spaces can be expressed within this framework (monotonicity, convexity) and verified efficiently using our algorithms. |
Tasks | Machine Translation |
Published | 2018-12-06 |
URL | http://arxiv.org/abs/1812.02795v1 |
http://arxiv.org/pdf/1812.02795v1.pdf | |
PWC | https://paperswithcode.com/paper/verification-of-deep-probabilistic-models |
Repo | |
Framework | |
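The paper derives rigorous lower bounds on the probability that a linear output constraint holds over the sampling of latent variables. Those bounds are not reproduced here; as a point of contrast, the sketch below shows the naive statistical alternative: Monte Carlo sampling of the latents plus a Hoeffding-style lower confidence bound on the satisfaction probability for one fixed conditioning input. The toy "model" and constraint are invented for illustration.

```python
import numpy as np

def satisfaction_lower_bound(model, x, c, b, n_samples=10_000, delta=1e-3, seed=0):
    """Estimate P[c . model(x, z) >= b] over latent z, with a Hoeffding lower bound."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_samples):
        z = rng.standard_normal(2)          # sample the latent variables
        y = model(x, z)                     # stochastic forward pass
        hits += float(c @ y >= b)           # check the linear output constraint
    p_hat = hits / n_samples
    # with probability >= 1 - delta, the true satisfaction probability is at least this
    return p_hat - np.sqrt(np.log(1.0 / delta) / (2.0 * n_samples))

# toy conditional model: output depends on the input x and a Gaussian latent z
toy_model = lambda x, z: np.array([x + z[0], 0.5 * x + 0.1 * z[1]])
print(satisfaction_lower_bound(toy_model, x=1.0, c=np.array([1.0, 0.0]), b=-1.0))
```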
A Basic Compositional Model for Spiking Neural Networks
Title | A Basic Compositional Model for Spiking Neural Networks |
Authors | Nancy Lynch, Cameron Musco |
Abstract | This paper is part of a project on developing an algorithmic theory of brain networks, based on stochastic Spiking Neural Network (SNN) models. Inspired by tasks that seem to be solved in actual brains, we are defining abstract problems to be solved by these networks. In our work so far, we have developed models and algorithms for the Winner-Take-All problem from computational neuroscience [LMP17a,Mus18], and problems of similarity detection and neural coding [LMP17b]. We plan to consider many other problems and networks, including both static networks and networks that learn. This paper is about basic theory for the stochastic SNN model. In particular, we define a simple version of the model. This version assumes that the neurons’ only state is a Boolean, indicating whether the neuron is firing or not. In later work, we plan to develop variants of the model with more elaborate state. We also define an external behavior notion for SNNs, which can be used for stating requirements to be satisfied by the networks. We then define a composition operator for SNNs. We prove that our external behavior notion is “compositional”, in the sense that the external behavior of a composed network depends only on the external behaviors of the component networks. We also define a hiding operator that reclassifies some output behavior of an SNN as internal. We give basic results for hiding. Finally, we give a formal definition of a problem to be solved by an SNN, and give basic results showing how composition and hiding of networks affect the problems that they solve. We illustrate our definitions with three examples: building a circuit out of gates, building an “Attention” network out of a “Winner-Take-All” network and a “Filter” network, and a toy example involving combining two networks in a cyclic fashion. |
Tasks | |
Published | 2018-08-12 |
URL | http://arxiv.org/abs/1808.03884v1 |
http://arxiv.org/pdf/1808.03884v1.pdf | |
PWC | https://paperswithcode.com/paper/a-basic-compositional-model-for-spiking |
Repo | |
Framework | |
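In the basic model described above, a neuron's only state is a Boolean firing indicator and updates are stochastic. The snippet below is a minimal sketch of one synchronous update round for such a network, with firing probability given by a sigmoid of the weighted incoming spikes; the weights, bias, and sigmoid choice are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def snn_step(firing, weights, bias, rng):
    """One synchronous round: each neuron fires with probability sigmoid(potential)."""
    potential = weights @ firing + bias          # weighted sum of incoming spikes
    p_fire = 1.0 / (1.0 + np.exp(-potential))    # stochastic firing probability
    return (rng.random(firing.shape) < p_fire).astype(float)

rng = np.random.default_rng(0)
n = 5
weights = rng.normal(scale=2.0, size=(n, n))     # synapse weights (hypothetical)
bias = -1.0 * np.ones(n)                         # firing thresholds (hypothetical)
state = np.zeros(n); state[0] = 1.0              # only neuron 0 fires initially
for t in range(3):
    state = snn_step(state, weights, bias, rng)
    print(t, state)
```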
Twitter User Geolocation using Deep Multiview Learning
Title | Twitter User Geolocation using Deep Multiview Learning |
Authors | Tien Huu Do, Duc Minh Nguyen, Evaggelia Tsiligianni, Bruno Cornelis, Nikos Deligiannis |
Abstract | Predicting the geographical location of users on social networks like Twitter is an active research topic with plenty of methods proposed so far. Most of the existing work follows either a content-based or a network-based approach. The former is based on user-generated content while the latter exploits the structure of the network of users. In this paper, we propose a more generic approach, which incorporates not only both content-based and network-based features, but also other available information into a unified model. Our approach, named Multi-Entry Neural Network (MENET), leverages the latest advances in deep learning and multiview learning. A realization of MENET with textual, network and metadata features results in an effective method for Twitter user geolocation, achieving the state of the art on two well-known datasets. |
Tasks | Multiview Learning |
Published | 2018-05-11 |
URL | http://arxiv.org/abs/1805.04612v1 |
http://arxiv.org/pdf/1805.04612v1.pdf | |
PWC | https://paperswithcode.com/paper/twitter-user-geolocation-using-deep-multiview |
Repo | |
Framework | |
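MENET combines content, network, and metadata features in one model. Its exact architecture is not in the abstract, so the PyTorch sketch below only shows the generic multi-entry idea: each view gets its own small entry branch, and the branch outputs are concatenated before a shared classifier over geographic regions. All dimensions and layer sizes are invented.

```python
import torch
import torch.nn as nn

class MultiEntryNet(nn.Module):
    def __init__(self, view_dims, hidden=64, num_regions=50):
        super().__init__()
        # one small entry branch per feature view (text, network, metadata, ...)
        self.branches = nn.ModuleList(
            [nn.Sequential(nn.Linear(d, hidden), nn.ReLU()) for d in view_dims]
        )
        self.classifier = nn.Linear(hidden * len(view_dims), num_regions)

    def forward(self, views):
        z = torch.cat([branch(v) for branch, v in zip(self.branches, views)], dim=1)
        return self.classifier(z)

# usage: batch of 4 users with a 300-d text view, 128-d network view, 10-d metadata view
views = [torch.randn(4, 300), torch.randn(4, 128), torch.randn(4, 10)]
model = MultiEntryNet([300, 128, 10])
print(model(views).shape)  # torch.Size([4, 50])
```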
Deep Learning for Singing Processing: Achievements, Challenges and Impact on Singers and Listeners
Title | Deep Learning for Singing Processing: Achievements, Challenges and Impact on Singers and Listeners |
Authors | Emilia Gómez, Merlijn Blaauw, Jordi Bonada, Pritish Chandna, Helena Cuesta |
Abstract | This paper summarizes some recent advances on a set of tasks related to the processing of singing using state-of-the-art deep learning techniques. We discuss their achievements in terms of accuracy and sound quality, and the current challenges, such as availability of data and computing resources. We also discuss the impact that these advances do and will have on listeners and singers when they are integrated in commercial applications. |
Tasks | |
Published | 2018-07-09 |
URL | http://arxiv.org/abs/1807.03046v1 |
http://arxiv.org/pdf/1807.03046v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-for-singing-processing |
Repo | |
Framework | |
Enhanced-alignment Measure for Binary Foreground Map Evaluation
Title | Enhanced-alignment Measure for Binary Foreground Map Evaluation |
Authors | Deng-Ping Fan, Cheng Gong, Yang Cao, Bo Ren, Ming-Ming Cheng, Ali Borji |
Abstract | Existing binary foreground map (FM) measures address various types of errors in either pixel-wise or structural ways. These measures consider pixel-level match or image-level information independently, while cognitive vision studies have shown that human vision is highly sensitive to both global information and local details in scenes. In this paper, we take a detailed look at current binary FM evaluation measures and propose a novel and effective E-measure (Enhanced-alignment measure). Our measure combines local pixel values with the image-level mean value in one term, jointly capturing image-level statistics and local pixel matching information. We demonstrate the superiority of our measure over the available measures on 4 popular datasets via 5 meta-measures, including ranking models for applications, demoting generic, random Gaussian noise maps, ground-truth switch, as well as human judgments. We find large improvements in almost all the meta-measures. For instance, in terms of application ranking, we observe improvements ranging from 9.08% to 19.65% compared with other popular measures. |
Tasks | |
Published | 2018-05-26 |
URL | http://arxiv.org/abs/1805.10421v2 |
http://arxiv.org/pdf/1805.10421v2.pdf | |
PWC | https://paperswithcode.com/paper/enhanced-alignment-measure-for-binary |
Repo | |
Framework | |
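For reference, the sketch below implements the enhanced-alignment computation as commonly formulated: subtract the image-level mean from both maps, combine the resulting bias matrices into an alignment matrix, apply a quadratic enhancement, and average. Treat the exact formula and the epsilon handling as assumptions drawn from the standard E-measure description rather than a verified reimplementation.

```python
import numpy as np

def e_measure(fm, gt):
    """Enhanced-alignment measure between a binary foreground map and binary ground truth."""
    fm = fm.astype(np.float64)
    gt = gt.astype(np.float64)
    # bias matrices: local values minus the image-level mean
    phi_fm = fm - fm.mean()
    phi_gt = gt - gt.mean()
    # alignment matrix couples local matching with global (image-level) statistics
    align = 2.0 * phi_gt * phi_fm / (phi_gt**2 + phi_fm**2 + 1e-12)
    enhanced = (align + 1.0) ** 2 / 4.0   # quadratic enhancement
    return enhanced.mean()

pred = (np.random.default_rng(0).random((64, 64)) > 0.5).astype(np.uint8)
gt = np.zeros((64, 64), dtype=np.uint8); gt[16:48, 16:48] = 1
print(e_measure(pred, pred), e_measure(pred, gt))  # a perfect match scores ~1.0
```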
Perceptual Context in Cognitive Hierarchies
Title | Perceptual Context in Cognitive Hierarchies |
Authors | Bernhard Hengst, Maurice Pagnucco, David Rajaratnam, Claude Sammut, Michael Thielscher |
Abstract | Cognition does not only depend on bottom-up sensor feature abstraction, but also relies on contextual information being passed top-down. Context is higher level information that helps to predict belief states at lower levels. The main contribution of this paper is to provide a formalisation of perceptual context and its integration into a new process model for cognitive hierarchies. Several simple instantiations of a cognitive hierarchy are used to illustrate the role of context. Notably, we demonstrate the use of context in a novel approach to visually track the pose of rigid objects with just a 2D camera. |
Tasks | |
Published | 2018-01-07 |
URL | http://arxiv.org/abs/1801.02270v1 |
http://arxiv.org/pdf/1801.02270v1.pdf | |
PWC | https://paperswithcode.com/paper/perceptual-context-in-cognitive-hierarchies |
Repo | |
Framework | |
Water from Two Rocks: Maximizing the Mutual Information
Title | Water from Two Rocks: Maximizing the Mutual Information |
Authors | Yuqing Kong, Grant Schoenebeck |
Abstract | We build a natural connection between the learning problem, co-training, and forecast elicitation without verification (related to peer-prediction) and address them simultaneously using the same information theoretic approach. In co-training/multiview learning, the goal is to aggregate two views of data into a prediction for a latent label. We show how to optimally combine two views of data by reducing the problem to an optimization problem. Our work gives a unified and rigorous approach to the general setting. In forecast elicitation without verification we seek to design a mechanism that elicits high quality forecasts from agents in the setting where the mechanism does not have access to the ground truth. By assuming the agents’ information is independent conditioned on the outcome, we propose mechanisms where truth-telling is a strict equilibrium for both the single-task and multi-task settings. Our multi-task mechanism additionally has the property that the truth-telling equilibrium pays better than any other strategy profile and strictly better than any other “non-permutation” strategy profile when the prior satisfies some mild conditions. |
Tasks | Multiview Learning |
Published | 2018-02-24 |
URL | http://arxiv.org/abs/1802.08887v3 |
http://arxiv.org/pdf/1802.08887v3.pdf | |
PWC | https://paperswithcode.com/paper/water-from-two-rocks-maximizing-the-mutual |
Repo | |
Framework | |
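The co-training side of the paper reduces combining two views to an information-theoretic optimization. The paper's mechanisms are not reproduced here; the snippet below only shows the standard plug-in estimate of the mutual information between two discrete views from paired samples, which is the quantity the title refers to maximizing. The noisy-view toy data is invented.

```python
import numpy as np
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in nats from paired discrete samples."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log( p(x,y) / (p(x) p(y)) ), written with raw counts
        mi += (c / n) * np.log(c * n / (px[x] * py[y]))
    return mi

# two noisy views of the same latent binary label
rng = np.random.default_rng(0)
latent = rng.integers(0, 2, size=5000)
view_a = np.where(rng.random(5000) < 0.9, latent, 1 - latent)
view_b = np.where(rng.random(5000) < 0.8, latent, 1 - latent)
print(mutual_information(view_a.tolist(), view_b.tolist()))
```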
Predicting Action Tubes
Title | Predicting Action Tubes |
Authors | Gurkirt Singh, Suman Saha, Fabio Cuzzolin |
Abstract | In this work, we present a method to predict an entire ‘action tube’ (a set of temporally linked bounding boxes) in a trimmed video just by observing a smaller subset of it. Predicting where an action is going to take place in the near future is essential to many computer vision based applications such as autonomous driving or surgical robotics. Importantly, it has to be done in real-time and in an online fashion. We propose a Tube Prediction network (TPnet) which jointly predicts the past, present and future bounding boxes along with their action classification scores. At test time TPnet is used in a (temporal) sliding window setting, and its predictions are put into a tube estimation framework to construct/predict video-long action tubes not only for the observed part of the video but also for the unobserved part. Additionally, the proposed action tube predictor helps in completing action tubes for unobserved segments of the video. We quantitatively demonstrate the latter ability, and the fact that TPnet improves state-of-the-art detection performance, on one of the standard action detection benchmarks - the J-HMDB-21 dataset. |
Tasks | Action Classification, Action Detection, Autonomous Driving |
Published | 2018-08-23 |
URL | http://arxiv.org/abs/1808.07712v1 |
http://arxiv.org/pdf/1808.07712v1.pdf | |
PWC | https://paperswithcode.com/paper/predicting-action-tubes |
Repo | |
Framework | |
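TPnet itself is not described in enough detail in the abstract to reproduce; as a contrasting baseline, the sketch below represents an action tube as a list of per-frame boxes and extrapolates future boxes under a constant-velocity assumption. It illustrates the prediction target (future temporally linked boxes), not the paper's method.

```python
import numpy as np

def extrapolate_tube(observed_boxes, future_frames):
    """Constant-velocity extrapolation of an action tube.

    observed_boxes: (T, 4) array of [x1, y1, x2, y2] per observed frame.
    Returns a (future_frames, 4) array of predicted future boxes.
    """
    observed_boxes = np.asarray(observed_boxes, dtype=np.float64)
    velocity = (observed_boxes[-1] - observed_boxes[0]) / (len(observed_boxes) - 1)
    steps = np.arange(1, future_frames + 1)[:, None]
    return observed_boxes[-1] + steps * velocity

tube = [[10, 10, 50, 80], [12, 10, 52, 80], [14, 11, 54, 81]]  # 3 observed frames
print(extrapolate_tube(tube, future_frames=2))
```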
Action Detection from a Robot-Car Perspective
Title | Action Detection from a Robot-Car Perspective |
Authors | Valentina Fontana, Gurkirt Singh, Stephen Akrigg, Manuele Di Maio, Suman Saha, Fabio Cuzzolin |
Abstract | We present the new Road Event and Activity Detection (READ) dataset, designed and created from an autonomous vehicle perspective to take action detection challenges to autonomous driving. READ will give scholars in computer vision, smart cars and machine learning at large the opportunity to conduct research into exciting new problems such as understanding complex (road) activities, discerning the behaviour of sentient agents, and predicting both the label and the location of future actions and events, with the final goal of supporting autonomous decision making. |
Tasks | Action Detection, Activity Detection, Autonomous Driving, Decision Making |
Published | 2018-07-30 |
URL | http://arxiv.org/abs/1807.11332v1 |
http://arxiv.org/pdf/1807.11332v1.pdf | |
PWC | https://paperswithcode.com/paper/action-detection-from-a-robot-car-perspective |
Repo | |
Framework | |
Video2Shop: Exact Matching Clothes in Videos to Online Shopping Images
Title | Video2Shop: Exact Matching Clothes in Videos to Online Shopping Images |
Authors | Zhi-Qi Cheng, Xiao Wu, Yang Liu, Xian-Sheng Hua |
Abstract | In recent years, both online retail and video hosting services have been growing exponentially. In this paper, we explore a new cross-domain task, Video2Shop, which targets matching clothes appearing in videos to the exact same items in online shops. A novel deep neural network, called AsymNet, is proposed to explore this problem. For the image side, well-established methods are used to detect and extract features for clothing patches with arbitrary sizes. For the video side, deep visual features are extracted from detected object regions in each frame, and further fed into a Long Short-Term Memory (LSTM) framework for sequence modeling, which captures the temporal dynamics in videos. To conduct exact matching between videos and online shopping images, LSTM hidden states, representing the video, and image features, which represent static object images, are jointly modeled under the similarity network with a reconfigurable deep tree structure. Moreover, an approximate training method is proposed to improve training efficiency. Extensive experiments conducted on a large cross-domain dataset have demonstrated the effectiveness and efficiency of the proposed AsymNet, which outperforms the state-of-the-art methods. |
Tasks | |
Published | 2018-04-14 |
URL | http://arxiv.org/abs/1804.05287v2 |
http://arxiv.org/pdf/1804.05287v2.pdf | |
PWC | https://paperswithcode.com/paper/video2shop-exact-matching-clothes-in-videos |
Repo | |
Framework | |
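AsymNet's reconfigurable deep-tree similarity network is not spelled out in the abstract, so the PyTorch sketch below only shows the asymmetric-matching skeleton it builds on: per-frame video features run through an LSTM, and the final hidden state is compared against a static shop-image feature with a small scoring head. The feature dimensions and the scoring head are assumptions.

```python
import torch
import torch.nn as nn

class VideoToShopMatcher(nn.Module):
    def __init__(self, feat_dim=512, hidden=256):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)  # temporal dynamics of the video side
        self.score = nn.Sequential(
            nn.Linear(hidden + feat_dim, 128), nn.ReLU(), nn.Linear(128, 1)
        )

    def forward(self, video_feats, image_feat):
        # video_feats: (B, T, feat_dim) per-frame features; image_feat: (B, feat_dim)
        _, (h, _) = self.lstm(video_feats)
        pair = torch.cat([h[-1], image_feat], dim=1)
        return torch.sigmoid(self.score(pair)).squeeze(1)  # match probability per pair

video = torch.randn(2, 30, 512)   # 2 clips, 30 frames each
shop = torch.randn(2, 512)        # 2 candidate shop-image features
print(VideoToShopMatcher()(video, shop))  # tensor of 2 match scores
```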
Occluded Joints Recovery in 3D Human Pose Estimation based on Distance Matrix
Title | Occluded Joints Recovery in 3D Human Pose Estimation based on Distance Matrix |
Authors | Xiang Guo, Yuchao Dai |
Abstract | Despite the recent progress in single-image 3D human pose estimation due to convolutional neural networks, it is still challenging to handle real scenarios such as highly occluded scenes. In this paper, we propose to address the problem of single-image 3D human pose estimation with occluded measurements by exploiting the Euclidean distance matrix (EDM). Specifically, we present two approaches based on EDM, which could effectively handle occluded joints in 2D images. The first approach is based on 2D-to-2D distance matrix regression achieved by a simple CNN architecture. The second approach is based on sparse coding along with a learned over-complete dictionary. Experiments on the Human3.6M dataset show the excellent performance of these two approaches in recovering occluded observations and demonstrate the improvements in accuracy for 3D human pose estimation with occluded joints. |
Tasks | 3D Human Pose Estimation, Pose Estimation |
Published | 2018-07-30 |
URL | http://arxiv.org/abs/1807.11147v1 |
http://arxiv.org/pdf/1807.11147v1.pdf | |
PWC | https://paperswithcode.com/paper/occluded-joints-recovery-in-3d-human-pose |
Repo | |
Framework | |
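The core object in the approach above is the Euclidean distance matrix of body joints. The helper below builds an EDM from 2D joint coordinates and zeroes out entries that involve occluded joints, the kind of partially observed input the paper's regressors recover from; the zero-masking convention is an assumption for illustration, not the paper's exact preprocessing.

```python
import numpy as np

def joint_edm(joints, visible=None):
    """Euclidean distance matrix of body joints.

    joints: (J, D) array of joint coordinates (D = 2 or 3).
    visible: optional boolean mask of length J; entries touching an
             occluded joint are set to 0 (a hypothetical convention).
    """
    diff = joints[:, None, :] - joints[None, :, :]
    edm = np.linalg.norm(diff, axis=-1)
    if visible is not None:
        edm = edm * np.outer(visible, visible)
    return edm

joints_2d = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 2.0], [0.5, 3.0]])
visible = np.array([True, True, False, True])  # third joint occluded
print(joint_edm(joints_2d, visible))
```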