October 18, 2019

3166 words 15 mins read

Paper Group ANR 664


I Have Seen Enough: A Teacher Student Network for Video Classification Using Fewer Frames. Visual Data Synthesis via GAN for Zero-Shot Video Classification. Multimodal Affective Analysis Using Hierarchical Attention Strategy with Word-Level Alignment. signProx: One-Bit Proximal Algorithm for Nonconvex Stochastic Optimization. Ensemble computation a …

I Have Seen Enough: A Teacher Student Network for Video Classification Using Fewer Frames

Title I Have Seen Enough: A Teacher Student Network for Video Classification Using Fewer Frames
Authors Shweta Bhardwaj, Mitesh M. Khapra
Abstract Over the past few years, various tasks involving videos such as classification, description, summarization and question answering have received a lot of attention. Current models for these tasks compute an encoding of the video by treating it as a sequence of images and going over every image in the sequence. However, for longer videos this is very time consuming. In this paper, we focus on the task of video classification and aim to reduce the computational time by using the idea of distillation. Specifically, we first train a teacher network which looks at all the frames in a video and computes a representation for the video. We then train a student network whose objective is to process only a small fraction of the frames in the video and still produce a representation which is very close to the representation computed by the teacher network. This smaller student network involving fewer computations can then be employed at inference time for video classification. We experiment with the YouTube-8M dataset and show that the proposed student network can reduce the inference time by up to 30% with a very small drop in performance.
Tasks Question Answering, Video Classification
Published 2018-05-12
URL http://arxiv.org/abs/1805.04668v1
PDF http://arxiv.org/pdf/1805.04668v1.pdf
PWC https://paperswithcode.com/paper/i-have-seen-enough-a-teacher-student-network
Repo
Framework
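
The distillation setup above lends itself to a compact sketch. Below is a minimal, hypothetical PyTorch version: a teacher GRU encodes every frame, a student GRU encodes a uniformly sampled subset, and the student is trained to match the teacher's video representation while classifying. Module names, dimensions, the GRU encoder and the MSE matching loss are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class FrameEncoder(nn.Module):
    """Encodes a sequence of per-frame features into one video representation."""
    def __init__(self, feat_dim=1024, hidden=512):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)

    def forward(self, frames):                 # frames: (batch, num_frames, feat_dim)
        _, h = self.rnn(frames)
        return h.squeeze(0)                    # video representation: (batch, hidden)

num_classes = 1000                             # placeholder label space
teacher, student = FrameEncoder(), FrameEncoder()
classifier = nn.Linear(512, num_classes)

def student_loss(frames, labels, keep_ratio=0.3):
    with torch.no_grad():
        t_repr = teacher(frames)                              # teacher sees all frames
    k = max(1, int(frames.size(1) * keep_ratio))
    idx = torch.linspace(0, frames.size(1) - 1, k).long()     # uniform frame subset
    s_repr = student(frames[:, idx])                          # student sees few frames
    match = nn.functional.mse_loss(s_repr, t_repr)            # match teacher representation
    clf = nn.functional.binary_cross_entropy_with_logits(classifier(s_repr), labels)
    return clf + match
```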

Visual Data Synthesis via GAN for Zero-Shot Video Classification

Title Visual Data Synthesis via GAN for Zero-Shot Video Classification
Authors Chenrui Zhang, Yuxin Peng
Abstract Zero-Shot Learning (ZSL) in video classification is a promising research direction, which aims to tackle the challenge from the explosive growth of video categories. Most existing methods exploit seen-to-unseen correlation via learning a projection between visual and semantic spaces. However, such projection-based paradigms cannot fully utilize the discriminative information implied in the data distribution, and commonly suffer from the information degradation issue caused by the “heterogeneity gap”. In this paper, we propose a visual data synthesis framework via GAN to address these problems. Specifically, both semantic knowledge and visual distribution are leveraged to synthesize video features of unseen categories, and ZSL can be turned into a typical supervised problem with the synthetic features. First, we propose multi-level semantic inference to boost video feature synthesis, which captures the discriminative information implied in the joint visual-semantic distribution via feature-level and label-level semantic inference. Second, we propose Matching-aware Mutual Information Correlation to overcome the information degradation issue, which captures seen-to-unseen correlation in matched and mismatched visual-semantic pairs by mutual information, providing the zero-shot synthesis procedure with robust guidance signals. Experimental results on four video datasets demonstrate that our approach can improve zero-shot video classification performance significantly.
Tasks Video Classification, Zero-Shot Learning
Published 2018-04-26
URL http://arxiv.org/abs/1804.10073v1
PDF http://arxiv.org/pdf/1804.10073v1.pdf
PWC https://paperswithcode.com/paper/visual-data-synthesis-via-gan-for-zero-shot
Repo
Framework
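
As a rough illustration of the synthesis idea (not the paper's full framework, which adds multi-level semantic inference and a mutual-information correlation term), the sketch below conditions a generator on class semantics to produce visual features, so unseen classes can be handled by an ordinary supervised classifier trained on synthetic features. Dimensions, architectures and the plain conditional-GAN losses are assumptions.

```python
import torch
import torch.nn as nn

sem_dim, feat_dim, noise_dim = 300, 2048, 100    # illustrative sizes

# conditional generator and discriminator over (semantics, visual feature) pairs
G = nn.Sequential(nn.Linear(sem_dim + noise_dim, 1024), nn.ReLU(),
                  nn.Linear(1024, feat_dim))
D = nn.Sequential(nn.Linear(sem_dim + feat_dim, 1024), nn.ReLU(),
                  nn.Linear(1024, 1))
bce = nn.BCEWithLogitsLoss()

def generator_step(sem, opt_g):
    z = torch.randn(sem.size(0), noise_dim)
    fake = G(torch.cat([sem, z], dim=1))                 # synthetic visual features
    score = D(torch.cat([sem, fake], dim=1))
    loss = bce(score, torch.ones_like(score))            # try to fool the discriminator
    opt_g.zero_grad(); loss.backward(); opt_g.step()
    return loss.item()

def synthesize_unseen(unseen_sem, per_class=200):
    # Sample synthetic features for each unseen class; a standard classifier can
    # then be trained on (features, class labels) as in ordinary supervised learning.
    sem = unseen_sem.repeat_interleave(per_class, dim=0)
    z = torch.randn(sem.size(0), noise_dim)
    return G(torch.cat([sem, z], dim=1)).detach(), sem
```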

Multimodal Affective Analysis Using Hierarchical Attention Strategy with Word-Level Alignment

Title Multimodal Affective Analysis Using Hierarchical Attention Strategy with Word-Level Alignment
Authors Yue Gu, Kangning Yang, Shiyu Fu, Shuhong Chen, Xinyu Li, Ivan Marsic
Abstract Multimodal affective computing, learning to recognize and interpret human affects and subjective information from multiple data sources, is still challenging because: (i) it is hard to extract informative features to represent human affects from heterogeneous inputs; (ii) current fusion strategies only fuse different modalities at an abstract level, ignoring time-dependent interactions between modalities. Addressing such issues, we introduce a hierarchical multimodal architecture with attention and word-level fusion to classify utterance-level sentiment and emotion from text and audio data. Our introduced model outperforms the state-of-the-art approaches on published datasets and we demonstrate that our model is able to visualize and interpret the synchronized attention over modalities.
Tasks
Published 2018-05-22
URL http://arxiv.org/abs/1805.08660v1
PDF http://arxiv.org/pdf/1805.08660v1.pdf
PWC https://paperswithcode.com/paper/multimodal-affective-analysis-using
Repo
Framework
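
To make the word-level fusion idea concrete, here is a much-simplified, hypothetical sketch: word-aligned text and audio features are projected into a shared space, a per-word attention weight is computed for each modality, and the fused sequence feeds an utterance-level GRU classifier. Feature dimensions, class count, the single attention layer and the GRU are placeholders; the paper's hierarchical attention has more structure.

```python
import torch
import torch.nn as nn

class WordLevelFusion(nn.Module):
    def __init__(self, text_dim=300, audio_dim=74, hidden=128, n_classes=7):
        super().__init__()
        self.proj_t = nn.Linear(text_dim, hidden)
        self.proj_a = nn.Linear(audio_dim, hidden)
        self.attn = nn.Linear(hidden, 1)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_classes)

    def forward(self, text, audio):               # (B, T, text_dim), (B, T, audio_dim)
        t, a = self.proj_t(text), self.proj_a(audio)
        both = torch.stack([t, a], dim=2)          # (B, T, 2, hidden)
        w = torch.softmax(self.attn(both), dim=2)  # modality attention per word
        fused = (w * both).sum(dim=2)              # (B, T, hidden)
        _, h = self.rnn(fused)
        return self.out(h.squeeze(0))              # utterance-level logits
```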

signProx: One-Bit Proximal Algorithm for Nonconvex Stochastic Optimization

Title signProx: One-Bit Proximal Algorithm for Nonconvex Stochastic Optimization
Authors Xiaojian Xu, Ulugbek S. Kamilov
Abstract Stochastic gradient descent (SGD) is one of the most widely used optimization methods for parallel and distributed processing of large datasets. One of the key limitations of distributed SGD is the need to regularly communicate the gradients between different computation nodes. To reduce this communication bottleneck, recent work has considered a one-bit variant of SGD, where only the sign of each gradient element is used in optimization. In this paper, we extend this idea by proposing a stochastic variant of the proximal-gradient method that also uses one bit per update element. We prove the theoretical convergence of the method for non-convex optimization under a set of explicit assumptions. Our results indicate that the compressed method can match the convergence rate of the uncompressed one, making the proposed method potentially appealing for distributed processing of large datasets.
Tasks Stochastic Optimization
Published 2018-07-20
URL http://arxiv.org/abs/1807.08023v2
PDF http://arxiv.org/pdf/1807.08023v2.pdf
PWC https://paperswithcode.com/paper/signprox-one-bit-proximal-algorithm-for
Repo
Framework
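
A hedged sketch of the sign-compressed proximal idea follows: each stochastic gradient coordinate is reduced to its sign before the descent step, and a proximal operator (here soft-thresholding for an L1 penalty, chosen only as a concrete example) is applied afterwards. Step sizes, the regularizer and the toy least-squares problem are illustrative assumptions, not the paper's exact algorithm or analysis.

```python
import numpy as np

def soft_threshold(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def signprox_step(w, stoch_grad, lr=1e-3, lam=1e-4):
    w = w - lr * np.sign(stoch_grad)        # one bit per coordinate is communicated
    return soft_threshold(w, lr * lam)      # prox of the L1 penalty

# toy usage: sparse least squares on random data
rng = np.random.default_rng(0)
A, x_true = rng.normal(size=(200, 50)), rng.normal(size=50)
y = A @ x_true
w = np.zeros(50)
for _ in range(1000):
    i = rng.integers(0, 200, size=16)                  # mini-batch of rows
    g = A[i].T @ (A[i] @ w - y[i]) / len(i)            # stochastic gradient
    w = signprox_step(w, g)
```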

Ensemble computation approach to the Hough transform

Title Ensemble computation approach to the Hough transform
Authors Timur M. Khanipov
Abstract It is demonstrated that the classical Hough transform with shift-elevation parametrization of digital straight lines has additive complexity of at most $\mathcal{O}(n^3 / \log n)$ on an $n\times n$ image. The proof is constructive and uses the ensemble computation approach to build summation circuits. The proposed method has similarities with the fast Hough transform (FHT) and may be considered a form of the “divide and conquer” technique. It is based on the fact that lines with close slopes can be decomposed into common components, allowing generalization for other pattern families. When applied to FHT patterns, the algorithm yields exactly the $\Theta(n^2\log n)$ FHT asymptotics, which might suggest that the actual classical Hough transform circuits could have smaller size than $\Theta(n^3/ \log n)$.
Tasks
Published 2018-02-19
URL http://arxiv.org/abs/1802.06619v1
PDF http://arxiv.org/pdf/1802.06619v1.pdf
PWC https://paperswithcode.com/paper/ensemble-computation-approach-to-the-hough
Repo
Framework
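
For reference, the brute-force classical Hough transform under a shift-elevation style parametrization looks roughly like the sketch below: each (shift, elevation) pair defines a rounded digital line whose pixel values are summed into an accumulator, which is the $\Theta(n^3)$-addition computation the paper's circuit construction improves on by sharing partial sums between lines with close slopes. The parameter ranges and rounding convention are assumptions.

```python
import numpy as np

def hough_shift_elevation(img):
    # Accumulate sums along (mostly horizontal) digital lines
    # y(x) = round(shift + elevation * x / (n - 1)); naive Theta(n^3) additions.
    n = img.shape[0]                                   # assume an n x n image
    xs = np.arange(n)
    acc = np.zeros((2 * n, 2 * n - 1))                 # shifts s, elevations t
    for si, s in enumerate(range(-n // 2, 3 * n // 2)):
        for ti, t in enumerate(range(-(n - 1), n)):
            ys = np.rint(s + t * xs / max(n - 1, 1)).astype(int)
            ok = (ys >= 0) & (ys < n)                  # keep pixels inside the image
            acc[si, ti] = img[ys[ok], xs[ok]].sum()
    return acc
```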

Better and Faster: Knowledge Transfer from Multiple Self-supervised Learning Tasks via Graph Distillation for Video Classification

Title Better and Faster: Knowledge Transfer from Multiple Self-supervised Learning Tasks via Graph Distillation for Video Classification
Authors Chenrui Zhang, Yuxin Peng
Abstract Video representation learning is a vital problem for the classification task. Recently, a promising unsupervised paradigm termed self-supervised learning has emerged, which explores inherent supervisory signals implied in massive data for feature learning via solving auxiliary tasks. However, existing methods in this regard suffer from two limitations when extended to video classification. First, they focus only on a single task, while ignoring the complementarity among different task-specific features, thus resulting in suboptimal video representations. Second, high computational and memory cost hinders their application in real-world scenarios. In this paper, we propose a graph-based distillation framework to address these problems: (1) We propose a logits graph and a representation graph to transfer knowledge from multiple self-supervised tasks, where the former distills classifier-level knowledge by solving a multi-distribution joint matching problem, and the latter distills internal feature knowledge from pairwise ensembled representations while tackling the challenge of heterogeneity among different features; (2) The proposal, which adopts a teacher-student framework, can dramatically reduce the redundancy of knowledge learnt from teachers, leading to a lighter student model that solves the classification task more efficiently. Experimental results on 3 video datasets validate that our proposal not only helps learn better video representations but also compresses the model for faster inference.
Tasks Representation Learning, Transfer Learning, Video Classification
Published 2018-04-26
URL http://arxiv.org/abs/1804.10069v1
PDF http://arxiv.org/pdf/1804.10069v1.pdf
PWC https://paperswithcode.com/paper/better-and-faster-knowledge-transfer-from
Repo
Framework
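
As a concrete (and much simplified) stand-in for the classifier-level part of the framework, the sketch below distills a student from several self-supervised teachers by matching a softened combination of their logits with a KL term plus the usual cross-entropy. The uniform teacher weighting, temperature and loss mixing are assumptions; the paper's graph-based joint matching and representation-level distillation are not reproduced.

```python
import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits_list, labels, T=4.0, alpha=0.5):
    # Softened student distribution
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    # Uniformly averaged softened teacher distributions (one tensor per teacher)
    p_teachers = torch.stack(
        [F.softmax(t / T, dim=1) for t in teacher_logits_list]).mean(dim=0)
    kd = F.kl_div(log_p_student, p_teachers, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce
```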

Dropout as a Structured Shrinkage Prior

Title Dropout as a Structured Shrinkage Prior
Authors Eric Nalisnick, José Miguel Hernández-Lobato, Padhraic Smyth
Abstract Dropout regularization of deep neural networks has been a mysterious yet effective tool to prevent overfitting. Explanations for its success range from the prevention of “co-adapted” weights to it being a form of cheap Bayesian inference. We propose a novel framework for understanding multiplicative noise in neural networks, considering continuous distributions as well as Bernoulli noise (i.e. dropout). We show that multiplicative noise induces structured shrinkage priors on a network’s weights. We derive the equivalence through reparametrization properties of scale mixtures and without invoking any approximations. Given the equivalence, we then show that dropout’s Monte Carlo training objective approximates marginal MAP estimation. We leverage these insights to propose a novel shrinkage framework for resnets, terming the prior ‘automatic depth determination’ as it is the natural analog of automatic relevance determination for network depth. Lastly, we investigate two inference strategies that improve upon the aforementioned MAP approximation in regression benchmarks.
Tasks Bayesian Inference
Published 2018-10-09
URL https://arxiv.org/abs/1810.04045v3
PDF https://arxiv.org/pdf/1810.04045v3.pdf
PWC https://paperswithcode.com/paper/dropout-as-a-structured-shrinkage-prior
Repo
Framework
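
The core reparametrization argument can be checked numerically: multiplying a layer's inputs by noise z is the same computation as absorbing z into the corresponding rows of the weight matrix, i.e. the weights effectively carry random structured scales, which is the scale-mixture reading of multiplicative noise. The shapes and noise variance below are arbitrary; this only illustrates the identity, not the paper's full prior construction.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))              # batch of inputs
W = rng.normal(size=(16, 32))             # weight matrix of a linear layer
alpha = 0.5
z = 1.0 + np.sqrt(alpha) * rng.normal(size=16)   # one noise value per input unit

out_noisy_input = (x * z) @ W             # multiplicative noise on the activations
out_scaled_weight = x @ (z[:, None] * W)  # same noise absorbed into the weight rows

assert np.allclose(out_noisy_input, out_scaled_weight)
```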

Multi-Agent Deep Reinforcement Learning for Dynamic Power Allocation in Wireless Networks

Title Multi-Agent Deep Reinforcement Learning for Dynamic Power Allocation in Wireless Networks
Authors Yasar Sinan Nasir, Dongning Guo
Abstract This work demonstrates the potential of deep reinforcement learning techniques for transmit power control in wireless networks. Existing techniques typically find near-optimal power allocations by solving a challenging optimization problem. Most of these algorithms are not scalable to large networks in real-world scenarios because of their computational complexity and instantaneous cross-cell channel state information (CSI) requirement. In this paper, a distributively executed dynamic power allocation scheme is developed based on model-free deep reinforcement learning. Each transmitter collects CSI and quality of service (QoS) information from several neighbors and adapts its own transmit power accordingly. The objective is to maximize a weighted sum-rate utility function, which can be particularized to achieve maximum sum-rate or proportionally fair scheduling. Both random variations and delays in the CSI are inherently addressed using deep Q-learning. For a typical network architecture, the proposed algorithm is shown to achieve near-optimal power allocation in real time based on delayed CSI measurements available to the agents. The proposed scheme is especially suitable for practical scenarios where the system model is inaccurate and CSI delay is non-negligible.
Tasks Q-Learning
Published 2018-08-01
URL http://arxiv.org/abs/1808.00490v3
PDF http://arxiv.org/pdf/1808.00490v3.pdf
PWC https://paperswithcode.com/paper/multi-agent-deep-reinforcement-learning-for-1
Repo
Framework
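
A minimal sketch of the per-transmitter deep Q-learning loop is given below: each agent maps its local (possibly delayed) CSI/QoS observation to Q-values over a few discrete power levels, acts epsilon-greedily, and is trained with a standard TD target. The observation size, power levels, network width and reward handling are placeholders rather than the paper's exact design.

```python
import torch
import torch.nn as nn

POWER_LEVELS = torch.tensor([0.0, 0.01, 0.05, 0.1, 0.2])   # watts, illustrative

class PowerAgent(nn.Module):
    def __init__(self, obs_dim=32, n_actions=len(POWER_LEVELS)):
        super().__init__()
        self.q = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                               nn.Linear(128, n_actions))

    def act(self, obs, eps=0.1):
        # epsilon-greedy choice of a discrete transmit power
        if torch.rand(()) < eps:
            a = torch.randint(len(POWER_LEVELS), ())
        else:
            a = self.q(obs).argmax()
        return POWER_LEVELS[a], a

def td_loss(agent, target, obs, action, reward, next_obs, gamma=0.95):
    # one-step TD error against a frozen target network
    q = agent.q(obs).gather(1, action.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        q_next = target.q(next_obs).max(dim=1).values
    return nn.functional.mse_loss(q, reward + gamma * q_next)
```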

Joint Entity Extraction and Assertion Detection for Clinical Text

Title Joint Entity Extraction and Assertion Detection for Clinical Text
Authors Parminder Bhatia, Busra Celikkaya, Mohammed Khalilia
Abstract Negative medical findings are prevalent in clinical reports, yet discriminating them from positive findings remains a challenging task for information extraction. Most of the existing systems treat this task as a pipeline of two separate tasks, i.e., named entity recognition (NER) and rule-based negation detection. We consider this as a multi-task problem and present a novel end-to-end neural model to jointly extract entities and negations. We extend a standard hierarchical encoder-decoder NER model and first adopt a shared encoder followed by separate decoders for the two tasks. This architecture performs considerably better than the previous rule-based and machine learning-based systems. To overcome the problem of increased parameter size, especially in low-resource settings, we propose the Conditional Softmax Shared Decoder architecture, which achieves state-of-the-art results for NER and negation detection on the 2010 i2b2/VA challenge dataset and a proprietary de-identified clinical dataset.
Tasks Entity Extraction, Named Entity Recognition, Negation Detection
Published 2018-12-13
URL https://arxiv.org/abs/1812.05270v5
PDF https://arxiv.org/pdf/1812.05270v5.pdf
PWC https://paperswithcode.com/paper/end-to-end-joint-entity-extraction-and
Repo
Framework
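
The shared-encoder / two-decoder layout can be sketched as follows: one BiLSTM encodes each clinical sentence and two linear heads emit per-token entity tags and negation tags, trained with a summed cross-entropy. Vocabulary size, tag sets and the plain linear heads are illustrative assumptions; the paper's Conditional Softmax Shared Decoder is not reproduced here.

```python
import torch
import torch.nn as nn

class JointNerNegation(nn.Module):
    def __init__(self, vocab=20000, emb=100, hidden=128,
                 n_entity_tags=9, n_negation_tags=3):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.encoder = nn.LSTM(emb, hidden, batch_first=True, bidirectional=True)
        self.ner_head = nn.Linear(2 * hidden, n_entity_tags)
        self.neg_head = nn.Linear(2 * hidden, n_negation_tags)

    def forward(self, tokens):                      # (batch, seq_len) of token ids
        h, _ = self.encoder(self.embed(tokens))     # shared representation
        return self.ner_head(h), self.neg_head(h)   # two task-specific outputs

def joint_loss(ner_logits, neg_logits, ner_tags, neg_tags):
    # per-token cross-entropy for both tasks, summed
    ce = nn.functional.cross_entropy
    return (ce(ner_logits.transpose(1, 2), ner_tags) +
            ce(neg_logits.transpose(1, 2), neg_tags))
```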

Terrain RL Simulator

Title Terrain RL Simulator
Authors Glen Berseth, Xue Bin Peng, Michiel van de Panne
Abstract We provide $89$ challenging simulation environments that range in difficulty. The difficulty of solving a task is linked not only to the number of dimensions in the action space but also to the size and shape of the distribution of configurations the agent experiences. Therefore, we are releasing a number of simulation environments that include randomly generated terrain. The library also provides simple mechanisms to create new environments with different agent morphologies and the option to modify the distribution of generated terrain. We believe using these and other more complex simulations will help push the field closer to creating human-level intelligence.
Tasks
Published 2018-04-17
URL http://arxiv.org/abs/1804.06424v1
PDF http://arxiv.org/pdf/1804.06424v1.pdf
PWC https://paperswithcode.com/paper/terrain-rl-simulator
Repo
Framework

Towards Explaining Anomalies: A Deep Taylor Decomposition of One-Class Models

Title Towards Explaining Anomalies: A Deep Taylor Decomposition of One-Class Models
Authors Jacob Kauffmann, Klaus-Robert Müller, Grégoire Montavon
Abstract A common machine learning task is to discriminate between normal and anomalous data points. In practice, it is not always sufficient to reach high accuracy at this task; one would also like to understand why a given data point has been predicted in a certain way. We present a new principled approach for one-class SVMs that decomposes outlier predictions in terms of input variables. The method first recomposes the one-class model as a neural network with distance functions and min-pooling, and then performs a deep Taylor decomposition (DTD) of the model output. The proposed One-Class DTD is applicable to a number of common distance-based SVM kernels and is able to reliably explain a wide set of data anomalies. Furthermore, it outperforms baselines such as sensitivity analysis, nearest neighbor, or simple edge detection.
Tasks Edge Detection
Published 2018-05-16
URL http://arxiv.org/abs/1805.06230v1
PDF http://arxiv.org/pdf/1805.06230v1.pdf
PWC https://paperswithcode.com/paper/towards-explaining-anomalies-a-deep-taylor
Repo
Framework
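
For context only, the sketch below is not the paper's One-Class DTD; it implements the kind of simple gradient-based sensitivity attribution the paper compares against, on a hand-rolled RBF one-class score (a point far from all support vectors scores as anomalous). The support vectors, coefficients and kernel width are toy placeholders.

```python
import numpy as np

def outlier_score(x, support, alpha, gamma=0.5):
    d2 = ((x - support) ** 2).sum(axis=1)          # squared distances to support vectors
    return -(alpha * np.exp(-gamma * d2)).sum()    # higher value => more anomalous

def sensitivity_attribution(x, support, alpha, gamma=0.5):
    # squared partial derivatives of the score: a sensitivity-analysis baseline
    d2 = ((x - support) ** 2).sum(axis=1)
    w = alpha * np.exp(-gamma * d2)                # contribution of each support vector
    grad = (2 * gamma * w[:, None] * (x - support)).sum(axis=0)
    return grad ** 2                               # per-feature sensitivity

# toy usage
rng = np.random.default_rng(0)
support = rng.normal(size=(20, 5))
alpha = np.full(20, 1.0 / 20)
x = np.array([3.0, 0.0, 0.0, 0.0, 0.0])
print(outlier_score(x, support, alpha), sensitivity_attribution(x, support, alpha))
```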

Extraction of V2V Encountering Scenarios from Naturalistic Driving Database

Title Extraction of V2V Encountering Scenarios from Naturalistic Driving Database
Authors Zhaobin Mo, Sisi Li, Diange Yang, Ding Zhao
Abstract It is necessary to thoroughly evaluate the effectiveness and safety of Connected Vehicle (CV) algorithms before their release and deployment. The current evaluation approach mainly relies on simulation platforms with single-vehicle driving models; its main drawback is the lack of network realism. To overcome this problem, we extract naturalistic V2V encounter data from the database, and then separate the primary vehicle encounter categories by clustering. A fast mining algorithm is proposed that can be applied to parallel queries for further process acceleration. 4,500 encounters are mined from a 275 GB database collected in the Safety Pilot Model Program in Ann Arbor, Michigan, USA. K-means and Dynamic Time Warping (DTW) are used in clustering. Results show this method can quickly mine and cluster primary driving scenarios from a large database. Our results separate car-following, intersection and by-passing encounters, which are the primary categories of vehicle encounters. We anticipate that the work in this paper can become a general method to effectively extract vehicle encounters from any existing database that contains vehicular GPS information. Moreover, the naturalistic data of different vehicle encounters can be applied in Connected Vehicle evaluation.
Tasks
Published 2018-02-27
URL http://arxiv.org/abs/1802.09917v2
PDF http://arxiv.org/pdf/1802.09917v2.pdf
PWC https://paperswithcode.com/paper/extraction-of-v2v-encountering-scenarios-from
Repo
Framework
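
A hedged sketch of the trajectory-clustering step: a plain dynamic-programming DTW distance between two encounter trajectories, plus a simple medoid-style clustering over the pairwise DTW matrix. How exactly K-means and DTW are combined in the paper is not reproduced here; the per-step features and the medoid update are assumptions.

```python
import numpy as np

def dtw(a, b):
    # a, b: (len, 2) arrays of per-step features (e.g. relative positions)
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def cluster_encounters(trajs, k=3, iters=10, seed=0):
    rng = np.random.default_rng(seed)
    dist = np.array([[dtw(a, b) for b in trajs] for a in trajs])
    medoids = rng.choice(len(trajs), size=k, replace=False)
    for _ in range(iters):
        labels = dist[:, medoids].argmin(axis=1)          # assign to nearest medoid
        for c in range(k):
            members = np.where(labels == c)[0]
            if len(members):
                within = dist[np.ix_(members, members)].sum(axis=1)
                medoids[c] = members[within.argmin()]      # most central member
    return labels, medoids
```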

Unravelling Robustness of Deep Learning based Face Recognition Against Adversarial Attacks

Title Unravelling Robustness of Deep Learning based Face Recognition Against Adversarial Attacks
Authors Gaurav Goswami, Nalini Ratha, Akshay Agarwal, Richa Singh, Mayank Vatsa
Abstract Deep neural network (DNN) architecture based models have high expressive power and learning capacity. However, they are essentially a black box method since it is not easy to mathematically formulate the functions that are learned within its many layers of representation. Realizing this, many researchers have started to design methods to exploit the drawbacks of deep learning based algorithms questioning their robustness and exposing their singularities. In this paper, we attempt to unravel three aspects related to the robustness of DNNs for face recognition: (i) assessing the impact of deep architectures for face recognition in terms of vulnerabilities to attacks inspired by commonly observed distortions in the real world that are well handled by shallow learning methods along with learning based adversaries; (ii) detecting the singularities by characterizing abnormal filter response behavior in the hidden layers of deep networks; and (iii) making corrections to the processing pipeline to alleviate the problem. Our experimental evaluation using multiple open-source DNN-based face recognition networks, including OpenFace and VGG-Face, and two publicly available databases (MEDS and PaSC) demonstrates that the performance of deep learning based face recognition algorithms can suffer greatly in the presence of such distortions. The proposed method is also compared with existing detection algorithms and the results show that it is able to detect the attacks with very high accuracy by suitably designing a classifier using the response of the hidden layers in the network. Finally, we present several effective countermeasures to mitigate the impact of adversarial attacks and improve the overall robustness of DNN-based face recognition.
Tasks Face Recognition
Published 2018-02-22
URL http://arxiv.org/abs/1803.00401v1
PDF http://arxiv.org/pdf/1803.00401v1.pdf
PWC https://paperswithcode.com/paper/unravelling-robustness-of-deep-learning-based
Repo
Framework
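
The detection idea (classifying hidden-layer responses) can be sketched as follows: intermediate activations of a face network are collected for clean and distorted inputs, and a binary SVM is trained on those responses. The stand-in backbone, input sizes and random tensors are placeholders; in practice the activations would come from a pretrained face model such as those evaluated in the paper.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.svm import SVC

# stand-in feature extractor; in practice this would be a layer of a face CNN
backbone = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2), nn.ReLU(),
                         nn.Conv2d(16, 32, 3, stride=2), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(4), nn.Flatten())

def hidden_response(images):
    with torch.no_grad():
        return backbone(images).numpy()        # hidden-layer responses as features

X_clean = torch.randn(32, 3, 112, 112)         # placeholder clean face crops
X_attacked = torch.randn(32, 3, 112, 112)      # placeholder distorted/attacked crops
feats = np.concatenate([hidden_response(X_clean), hidden_response(X_attacked)])
labels = np.array([0] * 32 + [1] * 32)
detector = SVC(kernel="rbf").fit(feats, labels)   # flags likely-attacked inputs
```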

ESIR: End-to-end Scene Text Recognition via Iterative Image Rectification

Title ESIR: End-to-end Scene Text Recognition via Iterative Image Rectification
Authors Fangneng Zhan, Shijian Lu
Abstract Automated recognition of texts in scenes has been a research challenge for years, largely due to the arbitrary variation of text appearances in perspective distortion, text line curvature, text styles and different types of imaging artifacts. The recent deep networks are capable of learning robust representations with respect to imaging artifacts and text style changes, but still face various problems while dealing with scene texts with perspective and curvature distortions. This paper presents an end-to-end trainable scene text recognition system (ESIR) that iteratively removes perspective distortion and text line curvature as driven by better scene text recognition performance. An innovative rectification network is developed which employs a novel line-fitting transformation to estimate the pose of text lines in scenes. In addition, an iterative rectification pipeline is developed where scene text distortions are corrected iteratively towards a fronto-parallel view. The ESIR is also robust to parameter initialization and the training needs only scene text images and word-level annotations as required by most scene text recognition systems. Extensive experiments over a number of public datasets show that the proposed ESIR is capable of rectifying scene text distortions accurately, achieving superior recognition performance for both normal scene text images and those suffering from perspective and curvature distortions.
Tasks Scene Text Recognition
Published 2018-12-14
URL http://arxiv.org/abs/1812.05824v3
PDF http://arxiv.org/pdf/1812.05824v3.pdf
PWC https://paperswithcode.com/paper/esir-end-to-end-scene-text-recognition-via
Repo
Framework

A visual approach for age and gender identification on Twitter

Title A visual approach for age and gender identification on Twitter
Authors Miguel A. Alvarez-Carmona, Luis Pellegrin, Manuel Montes-y-Gómez, Fernando Sánchez-Vega, Hugo Jair Escalante, A. Pastor López-Monroy, Luis Villaseñor-Pineda, Esaú Villatoro-Tello
Abstract The goal of Author Profiling (AP) is to identify demographic aspects (e.g., age, gender) from a given set of authors by analyzing their written texts. Recently, the AP task has gained interest in many problems related to computer forensics, psychology and marketing, but especially in those related to social media exploitation. As is known, social media data is shared through a wide range of modalities (e.g., text, images and audio), representing rich information to be exploited for extracting valuable insights from users. Nevertheless, most of the current work in AP using social media data has been devoted to analyzing textual information only, and there are very few works that have started exploring gender identification using visual information. In contrast, this paper focuses on exploiting the visual modality to perform both age and gender identification in social media, specifically on Twitter. Our goal is to evaluate the pertinence of using visual information in solving the AP task. Accordingly, we have extended the Twitter corpus from PAN 2014, incorporating posted images from all the users and making a distinction between tweeted and retweeted images. The performed experiments provide interesting evidence on the usefulness of visual information in comparison with traditional textual representations for the AP task.
Tasks
Published 2018-05-28
URL http://arxiv.org/abs/1805.11166v1
PDF http://arxiv.org/pdf/1805.11166v1.pdf
PWC https://paperswithcode.com/paper/a-visual-approach-for-age-and-gender
Repo
Framework