January 26, 2020

2856 words 14 mins read

Paper Group ANR 1390

Constructing a provably adversarially-robust classifier from a high accuracy one

Title Constructing a provably adversarially-robust classifier from a high accuracy one
Authors Grzegorz Głuch, Rüdiger Urbanke
Abstract Modern machine learning models with very high accuracy have been shown to be vulnerable to small, adversarially chosen perturbations of the input. Given black-box access to a high-accuracy classifier $f$, we show how to construct a new classifier $g$ that has high accuracy and is also robust to adversarial $\ell_2$-bounded perturbations. Our algorithm builds upon the framework of randomized smoothing that has been recently shown to outperform all previous defenses against $\ell_2$-bounded adversaries. Using techniques like random partitions and doubling dimension, we are able to bound the adversarial error of $g$ in terms of the optimum error. In this paper we focus on our conceptual contribution, but we do present two examples to illustrate our framework. We will argue that, under some assumptions, our bounds are optimal for these cases.
Tasks
Published 2019-12-16
URL https://arxiv.org/abs/1912.07561v1
PDF https://arxiv.org/pdf/1912.07561v1.pdf
PWC https://paperswithcode.com/paper/constructing-a-provably-adversarially-robust
Repo
Framework
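
For context, a minimal sketch of the randomized-smoothing prediction rule the paper builds on: the smoothed classifier $g$ predicts the class that the black-box base classifier $f$ outputs most often under Gaussian perturbations of the input. The base classifier, noise level, and sample count below are illustrative stand-ins, not the paper's construction.

```python
# Randomized smoothing: majority vote of a black-box classifier f over
# Gaussian perturbations of the input. Toy values for sigma and n_samples.
import numpy as np

def smoothed_predict(f, x, sigma=0.25, n_samples=1000, rng=None):
    """Return the class f outputs most often under N(0, sigma^2) noise."""
    rng = rng or np.random.default_rng(0)
    noise = rng.normal(0.0, sigma, size=(n_samples,) + x.shape)
    votes = np.array([f(x + eps) for eps in noise])
    classes, counts = np.unique(votes, return_counts=True)
    return classes[np.argmax(counts)]

# Toy base classifier: sign of the first coordinate.
f = lambda z: int(z[0] > 0)
print(smoothed_predict(f, np.array([0.1, -2.0])))  # stable under small l2 noise
```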

Source Coding Based mmWave Channel Estimation with Deep Learning Based Decoding

Title Source Coding Based mmWave Channel Estimation with Deep Learning Based Decoding
Authors Yahia Shabara, Eylem Ekici, C. Emre Koksal
Abstract mmWave technology is set to become a main feature of next generation wireless networks, e.g., 5G mobile and WiFi 802.11ad/ay. Among the basic and most fundamental challenges facing mmWave is the ability to overcome its unfavorable propagation characteristics using energy efficient solutions. This has been addressed using innovative transceiver architectures. However, these architectures have their own limitations when it comes to channel estimation. This paper focuses on channel estimation and poses it as a source compression problem, where channel measurements are designed to mimic an encoded (compressed) version of the channel. We show that linear source codes can significantly reduce the number of channel measurements required to discover all channel paths. We also propose a deep-learning-based approach for decoding the obtained measurements, which enables high-speed and efficient channel discovery.
Tasks
Published 2019-04-30
URL http://arxiv.org/abs/1905.00124v1
PDF http://arxiv.org/pdf/1905.00124v1.pdf
PWC https://paperswithcode.com/paper/source-coding-based-mmwave-channel-estimation
Repo
Framework
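
A hedged sketch of the paper's framing: channel measurements act like a linear (compressed) encoding y = Ah of a sparse path vector h, and a learned decoder recovers which taps carry a path. The random measurement matrix, dimensions, and decoder architecture below are illustrative stand-ins, not the paper's linear source code or network design.

```python
# Sparse channel discovery posed as compressed encoding + learned decoding.
import torch
import torch.nn as nn

N, M, K = 64, 16, 2            # channel taps, measurements, active paths
A = torch.randn(M, N)          # stand-in for a linear source-code matrix

def sample_channel(batch):
    """Draw K-sparse channel vectors (K random taps carry a path)."""
    h = torch.zeros(batch, N)
    for b in range(batch):
        idx = torch.randperm(N)[:K]
        h[b, idx] = torch.randn(K)
    return h

decoder = nn.Sequential(nn.Linear(M, 128), nn.ReLU(), nn.Linear(128, N))
opt = torch.optim.Adam(decoder.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()    # predict which taps carry a path

for step in range(200):
    h = sample_channel(256)
    y = h @ A.T                     # "encoded" (compressed) measurements
    support = (h != 0).float()
    loss = loss_fn(decoder(y), support)
    opt.zero_grad(); loss.backward(); opt.step()
```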

Fighting Quantization Bias With Bias

Title Fighting Quantization Bias With Bias
Authors Alexander Finkelstein, Uri Almog, Mark Grobman
Abstract Low-precision representation of deep neural networks (DNNs) is critical for efficient deployment of deep learning applications on embedded platforms; however, converting the network to low precision degrades its performance. Crucially, networks that are designed for embedded applications usually suffer from increased degradation since they have less redundancy. This is most evident for the ubiquitous MobileNet architecture, which requires a costly quantization-aware training cycle to achieve acceptable performance when quantized to 8 bits. In this paper, we trace the source of the degradation in MobileNets to a shift in the mean activation value. This shift is caused by an inherent bias in the quantization process which builds up across layers, shifting all network statistics away from the learned distribution. We show that this phenomenon happens in other architectures as well. We propose a simple remedy: compensating for the quantization-induced shift by adding a constant to the additive bias term of each channel. We develop two simple methods for estimating the correction constants: one using iterative evaluation of the quantized network, and one where the constants are set using a short training phase. Both methods are fast and require only a small amount of unlabeled data, making them appealing for rapid deployment of neural networks. Using the above methods we are able to match the performance of training-based quantization of MobileNets at a fraction of the cost.
Tasks Quantization
Published 2019-06-07
URL https://arxiv.org/abs/1906.03193v1
PDF https://arxiv.org/pdf/1906.03193v1.pdf
PWC https://paperswithcode.com/paper/fighting-quantization-bias-with-bias
Repo
Framework
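
A minimal sketch of the first remedy described above: estimate the per-channel mean shift that quantization induces at a layer's output using a small batch of unlabeled data, then subtract it from the layer's bias term. The toy quantizer, layer, and calibration batch are placeholders.

```python
# Per-channel bias correction for quantization-induced mean shift.
import torch
import torch.nn as nn

def fake_quant(w, bits=8):
    # Simple symmetric uniform quantizer as a stand-in for a real pipeline.
    scale = w.abs().max() / (2 ** (bits - 1) - 1)
    return torch.round(w / scale) * scale

float_layer = nn.Conv2d(16, 32, 3, padding=1, bias=True)
quant_layer = nn.Conv2d(16, 32, 3, padding=1, bias=True)
quant_layer.load_state_dict(float_layer.state_dict())
with torch.no_grad():
    quant_layer.weight.copy_(fake_quant(quant_layer.weight))

x = torch.randn(64, 16, 8, 8)          # small unlabeled calibration batch
with torch.no_grad():
    # Mean output shift per channel between quantized and float layers.
    shift = (quant_layer(x) - float_layer(x)).mean(dim=(0, 2, 3))
    quant_layer.bias -= shift           # fold the correction into the bias
```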

Target-Specific Action Classification for Automated Assessment of Human Motor Behavior from Video

Title Target-Specific Action Classification for Automated Assessment of Human Motor Behavior from Video
Authors Behnaz Rezaei, Yiorgos Christakis, Bryan Ho, Kevin Thomas, Kelley Erb, Sarah Ostadabbas, Shyamal Patel
Abstract Objective monitoring and assessment of human motor behavior can improve the diagnosis and management of several medical conditions. Over the past decade, significant advances have been made in the use of wearable technology for continuously monitoring human motor behavior in free-living conditions. However, wearable technology remains ill-suited for applications which require monitoring and interpretation of complex motor behaviors (e.g., involving interactions with the environment). Recent advances in computer vision and deep learning have opened up new possibilities for extracting information from video recordings. In this paper, we present a hierarchical vision-based behavior phenotyping method for classification of basic human actions in video recordings made with a single RGB camera. Our method addresses challenges associated with tracking multiple human actors and classification of actions in videos recorded in changing environments with different fields of view. We implement a cascaded pose tracker that uses temporal relationships between detections for short-term tracking and appearance-based tracklet fusion for long-term tracking. Furthermore, for action classification, we use pose evolution maps derived from the cascaded pose tracker as low-dimensional and interpretable representations of the movement sequences for training a convolutional neural network. The cascaded pose tracker achieves an average accuracy of 88% in tracking the target human actor in our video recordings, and the overall system achieves an average test accuracy of 84% for target-specific action classification in untrimmed video recordings.
Tasks Action Classification, Action Recognition In Videos
Published 2019-09-20
URL https://arxiv.org/abs/1909.09566v1
PDF https://arxiv.org/pdf/1909.09566v1.pdf
PWC https://paperswithcode.com/paper/target-specific-action-classification-for
Repo
Framework
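
A hedged sketch of the classification stage only: a tracked pose sequence (T frames × J joints × 2 coordinates) is rearranged into a compact image-like map and classified with a small CNN. The exact construction of the paper's pose evolution maps differs; the layout below (channels = x/y, rows = joints, columns = time) is an assumption for illustration.

```python
# Pose sequence -> compact 2D map -> CNN action classifier (illustrative).
import torch
import torch.nn as nn

T, J, n_actions = 32, 17, 5
poses = torch.rand(8, T, J, 2)                    # batch of tracked poses
maps = poses.permute(0, 3, 2, 1)                  # (B, 2, J, T), image-like

cnn = nn.Sequential(
    nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, n_actions),
)
logits = cnn(maps)                                # (8, n_actions)
```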

Neural Response Generation with Meta-Words

Title Neural Response Generation with Meta-Words
Authors Can Xu, Wei Wu, Chongyang Tao, Huang Hu, Matt Schuerman, Ying Wang
Abstract We present open domain response generation with meta-words. A meta-word is a structured record that describes various attributes of a response, and thus allows us to explicitly model the one-to-many relationship within open domain dialogues and perform response generation in an explainable and controllable manner. To incorporate meta-words into generation, we enhance the sequence-to-sequence architecture with a goal tracking memory network that formalizes meta-word expression as a goal and manages the generation process to achieve the goal with a state memory panel and a state controller. Experimental results on two large-scale datasets indicate that our model can significantly outperform several state-of-the-art generation models in terms of response relevance, response diversity, accuracy of one-to-many modeling, accuracy of meta-word expression, and human evaluation.
Tasks
Published 2019-06-14
URL https://arxiv.org/abs/1906.06050v1
PDF https://arxiv.org/pdf/1906.06050v1.pdf
PWC https://paperswithcode.com/paper/neural-response-generation-with-meta-words
Repo
Framework
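
A minimal sketch of conditioning generation on a meta-word: each attribute of the structured record is embedded, and the concatenated vector is fed to the decoder at every step. The paper's goal-tracking memory network, state memory panel, and state controller are far richer; the attribute names and sizes below are invented for illustration.

```python
# Meta-word (structured attribute record) conditioning a seq2seq decoder.
import torch
import torch.nn as nn

vocab, d = 1000, 64
attr_sizes = {"length": 4, "speech_act": 6, "specificity": 3}  # hypothetical

attr_emb = nn.ModuleDict({k: nn.Embedding(n, d) for k, n in attr_sizes.items()})
tok_emb = nn.Embedding(vocab, d)
decoder = nn.GRU(d * (1 + len(attr_sizes)), 128, batch_first=True)
out = nn.Linear(128, vocab)

meta = {"length": torch.tensor([2]), "speech_act": torch.tensor([1]),
        "specificity": torch.tensor([0])}
meta_vec = torch.cat([attr_emb[k](v) for k, v in meta.items()], dim=-1)

prev_tokens = torch.randint(0, vocab, (1, 10))
steps = tok_emb(prev_tokens)                                  # (1, 10, d)
cond = meta_vec.unsqueeze(1).expand(-1, 10, -1)               # repeat per step
logits = out(decoder(torch.cat([steps, cond], dim=-1))[0])    # (1, 10, vocab)
```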

HexagDLy - Processing hexagonally sampled data with CNNs in PyTorch

Title HexagDLy - Processing hexagonally sampled data with CNNs in PyTorch
Authors Constantin Steppa, Tim Lukas Holch
Abstract HexagDLy is a Python library extending the PyTorch deep learning framework with convolution and pooling operations on hexagonal grids. It aims to ease access to convolutional neural networks for applications that rely on hexagonally sampled data, as commonly found in, for example, ground-based astroparticle physics experiments.
Tasks
Published 2019-03-05
URL http://arxiv.org/abs/1903.01814v1
PDF http://arxiv.org/pdf/1903.01814v1.pdf
PWC https://paperswithcode.com/paper/hexagdly-processing-hexagonally-sampled-data
Repo
Framework
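
A hedged sketch of one hexagonal operation on an offset-coordinate square grid: a cell's six hexagonal neighbours depend on column parity, so the kernel is applied as masked shifts rather than a single square window. This illustrates the underlying idea only; HexagDLy's actual layers and API are not reproduced here.

```python
# Hexagonal neighbour aggregation on an offset (column-shifted) grid.
import torch
import torch.nn.functional as F

def hex_neighbor_sum(x):
    """Sum each cell's six hexagonal neighbours; x has shape (H, W)."""
    H, W = x.shape
    p = F.pad(x, (1, 1, 1, 1))                   # pad width, then height
    up, down = p[0:H, 1:W+1], p[2:H+2, 1:W+1]
    left, right = p[1:H+1, 0:W], p[1:H+1, 2:W+2]
    # Diagonal neighbours shift up for even columns, down for odd ones.
    even = torch.arange(W) % 2 == 0
    diag_l = torch.where(even, p[0:H, 0:W], p[2:H+2, 0:W])
    diag_r = torch.where(even, p[0:H, 2:W+2], p[2:H+2, 2:W+2])
    return up + down + left + right + diag_l + diag_r

print(hex_neighbor_sum(torch.ones(5, 5)))        # interior cells sum to 6
```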

Fine-grained Action Segmentation using the Semi-Supervised Action GAN

Title Fine-grained Action Segmentation using the Semi-Supervised Action GAN
Authors Harshala Gammulle, Simon Denman, Sridha Sridharan, Clinton Fookes
Abstract In this paper we address the problem of continuous fine-grained action segmentation, in which multiple actions are present in an unsegmented video stream. The challenge for this task lies in the need to represent the hierarchical nature of the actions and to detect the transitions between actions, allowing us to localise the actions within the video effectively. We propose a novel recurrent semi-supervised Generative Adversarial Network (GAN) model for continuous fine-grained human action segmentation. Temporal context information is captured via a novel Gated Context Extractor (GCE) module, composed of gated attention units, that directs the queued context information through the generator model for enhanced action segmentation. The GAN learns features in a semi-supervised manner, enabling the model to perform action classification jointly with the standard, unsupervised GAN learning procedure. We perform extensive evaluations on different architectural variants to demonstrate the importance of the proposed network architecture, and show that it is capable of outperforming the current state-of-the-art on three challenging datasets: 50 Salads, MERL Shopping and Georgia Tech Egocentric Activities.
Tasks Action Classification, action segmentation
Published 2019-09-20
URL https://arxiv.org/abs/1909.09269v1
PDF https://arxiv.org/pdf/1909.09269v1.pdf
PWC https://paperswithcode.com/paper/fine-grained-action-segmentation-using-the
Repo
Framework
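
A hedged sketch of the semi-supervised GAN objective such a model can build on (in the style of Salimans et al.): the discriminator has K action classes plus one extra "fake" class, so labelled samples train the classifier while real/fake discrimination is learned from unlabelled and generated samples. The paper's recurrent generator and Gated Context Extractor are omitted.

```python
# Semi-supervised GAN discriminator loss with a (K+1)-class head.
import torch
import torch.nn as nn
import torch.nn.functional as F

K, d = 5, 32                            # action classes, feature dim
D = nn.Linear(d, K + 1)                 # stand-in discriminator; index K = "fake"

feats_lab = torch.randn(16, d)          # labelled real features
y = torch.randint(0, K, (16,))
feats_unlab = torch.randn(16, d)        # unlabelled real features
feats_fake = torch.randn(16, d)         # generator-output stand-in

sup = F.cross_entropy(D(feats_lab), y)  # supervised action classification
p_fake_real = F.softmax(D(feats_unlab), dim=1)[:, K]
p_fake_gen = F.softmax(D(feats_fake), dim=1)[:, K]
# Push real samples away from the fake class, generated samples toward it.
unsup = -(torch.log1p(-p_fake_real) + torch.log(p_fake_gen)).mean()
loss_D = sup + unsup                    # joint semi-supervised objective
```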

Deep Concept-wise Temporal Convolutional Networks for Action Localization

Title Deep Concept-wise Temporal Convolutional Networks for Action Localization
Authors Xin Li, Tianwei Lin, Xiao Liu, Chuang Gan, Wangmeng Zuo, Chao Li, Xiang Long, Dongliang He, Fu Li, Shilei Wen
Abstract Existing action localization approaches adopt shallow temporal convolutional networks (i.e., TCNs) on 1D feature maps extracted from video frames. In this paper, we empirically find that stacking more conventional temporal convolution layers actually deteriorates action classification performance, possibly because all channels of the 1D feature map, which generally are highly abstract and can be regarded as latent concepts, are excessively recombined in temporal convolution. To address this issue, we introduce a novel concept-wise temporal convolution (CTC) layer as an alternative to the conventional temporal convolution layer for training deeper action localization networks. Instead of recombining latent concepts, the CTC layer deploys a number of temporal filters to each concept separately, with filter parameters shared across concepts. It can thus capture common temporal patterns of different concepts and significantly enrich representation ability. By stacking CTC layers, we propose a deep concept-wise temporal convolutional network (C-TCN), which boosts the state-of-the-art action localization performance on THUMOS’14 from 42.8 to 52.1 in terms of mAP (%), achieving a relative improvement of 21.7%. Favorable results are also obtained on ActivityNet.
Tasks Action Classification, Action Localization
Published 2019-08-26
URL https://arxiv.org/abs/1908.09442v1
PDF https://arxiv.org/pdf/1908.09442v1.pdf
PWC https://paperswithcode.com/paper/deep-concept-wise-temporal-convolutional
Repo
Framework
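
A minimal sketch of a concept-wise temporal convolution layer as the abstract describes it: one shared bank of temporal filters is applied to each channel ("concept") separately, so no cross-concept recombination occurs. Shapes and filter counts are illustrative.

```python
# Concept-wise temporal convolution: shared temporal filters, no channel mixing.
import torch
import torch.nn as nn

class ConceptWiseTemporalConv(nn.Module):
    def __init__(self, n_filters=4, kernel_size=3):
        super().__init__()
        # in_channels=1: each concept is filtered alone with shared weights.
        self.conv = nn.Conv1d(1, n_filters, kernel_size, padding=kernel_size // 2)

    def forward(self, x):                       # x: (B, C, T)
        B, C, T = x.shape
        y = self.conv(x.reshape(B * C, 1, T))   # fold concepts into the batch
        return y.reshape(B, C * self.conv.out_channels, T)

x = torch.randn(2, 64, 100)                 # 64 concepts over 100 time steps
print(ConceptWiseTemporalConv()(x).shape)   # torch.Size([2, 256, 100])
```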

Delving into 3D Action Anticipation from Streaming Videos

Title Delving into 3D Action Anticipation from Streaming Videos
Authors Hongsong Wang, Jiashi Feng
Abstract Action anticipation, which aims to recognize the action with a partial observation, becomes increasingly popular due to a wide range of applications. In this paper, we investigate the problem of 3D action anticipation from streaming videos with the target of understanding best practices for solving this problem. We first introduce several complementary evaluation metrics and present a basic model based on frame-wise action classification. To achieve better performance, we then investigate two important factors, i.e., the length of the training clip and clip sampling method. We also explore multi-task learning strategies by incorporating auxiliary information from two aspects: the full action representation and the class-agnostic action label. Our comprehensive experiments uncover the best practices for 3D action anticipation, and accordingly we propose a novel method with a multi-task loss. The proposed method considerably outperforms the recent methods and exhibits the state-of-the-art performance on standard benchmarks.
Tasks Action Classification, Multi-Task Learning
Published 2019-06-15
URL https://arxiv.org/abs/1906.06521v1
PDF https://arxiv.org/pdf/1906.06521v1.pdf
PWC https://paperswithcode.com/paper/delving-into-3d-action-anticipation-from
Repo
Framework
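
A hedged sketch of the multi-task loss idea: a frame-wise model is trained jointly with the two auxiliary targets the abstract mentions, a full-action representation and a class-agnostic action label. The heads, loss weights, and targets below are illustrative stand-ins.

```python
# Frame-wise anticipation model with two auxiliary heads and a joint loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

d, n_classes = 128, 10
backbone = nn.GRU(48, d, batch_first=True)     # frame features -> hidden states
cls_head = nn.Linear(d, n_classes)             # anticipated action class
repr_head = nn.Linear(d, d)                    # regress full-action embedding
agn_head = nn.Linear(d, 2)                     # class-agnostic action label

frames = torch.randn(4, 20, 48)                # partially observed clips
h, _ = backbone(frames)
y = torch.randint(0, n_classes, (4,))
full_repr = torch.randn(4, d)                  # embedding of the full action
agnostic = torch.randint(0, 2, (4,))

last = h[:, -1]                                # anticipate from the partial clip
loss = (F.cross_entropy(cls_head(last), y)
        + 0.5 * F.mse_loss(repr_head(last), full_repr)
        + 0.5 * F.cross_entropy(agn_head(last), agnostic))
```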

Deep Learning via Dynamical Systems: An Approximation Perspective

Title Deep Learning via Dynamical Systems: An Approximation Perspective
Authors Qianxiao Li, Ting Lin, Zuowei Shen
Abstract We build on the dynamical systems approach to deep learning, where deep residual networks are idealized as continuous-time dynamical systems. Although theoretical foundations have been developed on the optimization side through mean-field optimal control theory, the function approximation properties of such models remain largely unexplored, especially when the dynamical systems are controlled by functions of low complexity. In this paper, we establish some basic results on the approximation capabilities of deep learning models in the form of dynamical systems. In particular, we derive general sufficient conditions for universal approximation of functions in $L^p$ using flow maps of dynamical systems, and we also deduce some results on their approximation rates for specific cases. Overall, these results reveal that composition function approximation through flow maps presents a new paradigm in approximation theory and contributes to building a useful mathematical framework to investigate deep learning.
Tasks
Published 2019-12-22
URL https://arxiv.org/abs/1912.10382v1
PDF https://arxiv.org/pdf/1912.10382v1.pdf
PWC https://paperswithcode.com/paper/deep-learning-via-dynamical-systems-an
Repo
Framework
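
A minimal sketch of the dynamical-systems view: a deep residual network is an Euler discretization $x_{k+1} = x_k + h\, f(x_k, \theta_k)$ of a continuous-time system, and the approximant is the flow map after all steps. The vector field below is an arbitrary small network, chosen only for illustration.

```python
# Residual network as the Euler-discretized flow map of a controlled ODE.
import torch
import torch.nn as nn

class FlowMap(nn.Module):
    def __init__(self, dim=2, steps=20, h=0.05):
        super().__init__()
        self.h = h
        # One vector field per time step (a piecewise-constant control).
        self.fields = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 16), nn.Tanh(), nn.Linear(16, dim))
            for _ in range(steps))

    def forward(self, x):
        for f in self.fields:      # Euler steps: x <- x + h * f(x)
            x = x + self.h * f(x)
        return x

print(FlowMap()(torch.randn(5, 2)).shape)   # torch.Size([5, 2])
```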

How Machine (Deep) Learning Helps Us Understand Human Learning: the Value of Big Ideas

Title How Machine (Deep) Learning Helps Us Understand Human Learning: the Value of Big Ideas
Authors Marc Maliar
Abstract I use simulations of two multilayer neural networks to gain intuition into the determinants of human learning. The first network, the teacher, is trained to achieve high accuracy in handwritten digit recognition. The second network, the student, learns to reproduce the output of the first network. I show that learning from the teacher is more effective than learning from the data under the appropriate degree of regularization. Regularization allows the teacher to distinguish the trends and to deliver “big ideas” to the student. I also model other learning situations such as expert and novice teachers, high- and low-ability students, and biased learning experience due to, e.g., poverty and trauma. The results from computer simulation accord remarkably well with findings of the modern psychological literature. The code is written in MATLAB and will be publicly available from the author’s web page.
Tasks Handwritten Digit Recognition
Published 2019-02-16
URL http://arxiv.org/abs/1903.03408v2
PDF http://arxiv.org/pdf/1903.03408v2.pdf
PWC https://paperswithcode.com/paper/how-machine-deep-learning-helps-us-understand
Repo
Framework
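
The paper's own code is MATLAB; a hedged Python sketch of the teacher-student setup it simulates is below. The student is trained to reproduce the teacher's soft outputs rather than hard labels; architectures and data are placeholders.

```python
# Teacher-student learning: the student matches the teacher's soft outputs.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
student = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 10))
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

x = torch.randn(128, 784)                 # stand-in for digit images
with torch.no_grad():
    soft_targets = F.softmax(teacher(x), dim=1)

# Learn from the teacher's output distribution, not from hard data labels.
loss = F.kl_div(F.log_softmax(student(x), dim=1), soft_targets,
                reduction="batchmean")
opt.zero_grad(); loss.backward(); opt.step()
```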

Temporal Factorization of 3D Convolutional Kernels

Title Temporal Factorization of 3D Convolutional Kernels
Authors Gabriëlle Ras, Luca Ambrogioni, Umut Güçlü, Marcel A. J. van Gerven
Abstract 3D convolutional neural networks are difficult to train because they are parameter-expensive and data-hungry. To solve these problems we propose a simple technique for learning 3D convolutional kernels efficiently that requires less training data. We achieve this by factorizing the 3D kernel along the temporal dimension, reducing the number of parameters and making training from data more efficient. Additionally, we introduce a novel dataset called Video-MNIST to demonstrate the performance of our method. Our method significantly outperforms the conventional 3D convolution in the low data regime (1 to 5 videos per class). Finally, our model achieves competitive results in the high data regime (>10 videos per class) using up to 45% fewer parameters.
Tasks
Published 2019-12-09
URL https://arxiv.org/abs/1912.04075v1
PDF https://arxiv.org/pdf/1912.04075v1.pdf
PWC https://paperswithcode.com/paper/temporal-factorization-of-3d-convolutional
Repo
Framework
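
A hedged sketch of factorizing a 3D kernel along the temporal dimension: learn a 2D spatial kernel and a 1D temporal weight vector per filter and take their outer product. Whether this rank-1 form matches the paper's exact factorization is an assumption; it does show the parameter saving (T + k·k versus T·k·k values per filter).

```python
# 3D convolution with a temporally factorized (rank-1 in time) kernel.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporallyFactoredConv3d(nn.Module):
    def __init__(self, in_ch, out_ch, t=3, k=3):
        super().__init__()
        self.spatial = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.1)
        self.temporal = nn.Parameter(torch.randn(out_ch, t) * 0.1)

    def forward(self, x):                      # x: (B, C, T, H, W)
        # Outer product rebuilds the (out, in, t, k, k) kernel on the fly.
        w = torch.einsum('ot,oihw->oithw', self.temporal, self.spatial)
        return F.conv3d(x, w, padding='same')

x = torch.randn(2, 8, 16, 28, 28)
print(TemporallyFactoredConv3d(8, 16)(x).shape)  # torch.Size([2, 16, 16, 28, 28])
```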

Scene Text Magnifier

Title Scene Text Magnifier
Authors Toshiki Nakamura, Anna Zhu, Seiichi Uchida
Abstract The scene text magnifier aims to magnify text in natural scene images without recognition. It could help people with myopia or dyslexia to better understand the scene. In this paper, we design the scene text magnifier through four interacting CNN-based networks: character erasing, character extraction, character magnification, and image synthesis. The architecture of the networks is extended based on hourglass encoder-decoders. The system takes the original scene text image as input and outputs the text-magnified image while keeping the background unchanged. As intermediate results, we can obtain the side outputs of text erasing and text extraction. The four sub-networks are first trained independently and then fine-tuned end-to-end. The training samples for each stage are processed through a flow with the original image and text annotation in the ICDAR2013 and Flickr datasets as input, and the corresponding text-erased image, magnified text annotation, and text-magnified scene image as output. To evaluate the performance of the text magnifier, the Structural Similarity is used to measure the regional changes in each character region. The experimental results demonstrate that our method can magnify scene text effectively without affecting the background.
Tasks Image Generation
Published 2019-06-17
URL https://arxiv.org/abs/1907.00693v2
PDF https://arxiv.org/pdf/1907.00693v2.pdf
PWC https://paperswithcode.com/paper/scene-text-magnifier
Repo
Framework
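
A minimal sketch of the evaluation described above: Structural Similarity (SSIM) computed inside each character's bounding box to measure regional change between the input and the magnified output. The images and boxes below are placeholders.

```python
# Per-character-region SSIM between an input image and a magnified output.
import numpy as np
from skimage.metrics import structural_similarity

original = np.random.rand(64, 128)          # grayscale scene-text crop
magnified = np.clip(original + 0.05 * np.random.randn(64, 128), 0, 1)
char_boxes = [(10, 20, 30, 50), (10, 60, 30, 90)]   # (top, left, bottom, right)

for top, left, bottom, right in char_boxes:
    a = original[top:bottom, left:right]
    b = magnified[top:bottom, left:right]
    print(structural_similarity(a, b, data_range=1.0))
```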

A Fast Dictionary Learning Method for Coupled Feature Space Learning

Title A Fast Dictionary Learning Method for Coupled Feature Space Learning
Authors F. G. Veshki, S. A. Vorobyov
Abstract In this letter, we propose a novel computationally efficient coupled dictionary learning method that enforces pairwise correlation between the atoms of dictionaries learned to represent the underlying feature spaces of two different representations of the same signals, e.g., representations in different modalities or representations of the same signals measured with different qualities. The jointly learned correlated feature spaces represented by coupled dictionaries are used in sparse representation based classification, recognition and reconstruction tasks. The presented experimental results show that the proposed coupled dictionary learning method has a significantly lower computational cost. Moreover, the visual presentation of jointly learned dictionaries shows that the pairwise correlations between the corresponding atoms are ensured.
Tasks Dictionary Learning, Sparse Representation-based Classification
Published 2019-04-15
URL http://arxiv.org/abs/1904.06968v1
PDF http://arxiv.org/pdf/1904.06968v1.pdf
PWC https://paperswithcode.com/paper/a-fast-dictionary-learning-method-for-coupled
Repo
Framework
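
A hedged sketch of the coupling idea: two dictionaries D1 and D2 for two representations X1 and X2 of the same signals are learned so that corresponding atoms pair up, enforced here in the simplest possible way by making both spaces share one sparse code matrix. The generic ISTA-plus-least-squares updates below are not the paper's fast algorithm.

```python
# Coupled dictionary learning via a shared sparse code matrix A.
import numpy as np

rng = np.random.default_rng(0)
n, d1, d2, atoms = 200, 20, 24, 32
X1 = rng.standard_normal((d1, n))            # feature space 1 (e.g., modality A)
X2 = rng.standard_normal((d2, n))            # feature space 2, same signals

def normalise(D):
    return D / (np.linalg.norm(D, axis=0, keepdims=True) + 1e-8)

D1 = normalise(rng.standard_normal((d1, atoms)))
D2 = normalise(rng.standard_normal((d2, atoms)))
A = np.zeros((atoms, n))                     # one shared sparse code matrix
lam = 0.1

for it in range(50):
    # ISTA step on the shared codes (this is what couples the dictionaries).
    step = 1.0 / (np.linalg.norm(D1, 2) ** 2 + np.linalg.norm(D2, 2) ** 2)
    grad = D1.T @ (D1 @ A - X1) + D2.T @ (D2 @ A - X2)
    A = A - step * grad
    A = np.sign(A) * np.maximum(np.abs(A) - step * lam, 0)  # soft threshold
    # Dictionary update per space, then renormalise the atoms.
    D1 = normalise(X1 @ np.linalg.pinv(A))
    D2 = normalise(X2 @ np.linalg.pinv(A))
```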

Predicting the Future: A Jointly Learnt Model for Action Anticipation

Title Predicting the Future: A Jointly Learnt Model for Action Anticipation
Authors Harshala Gammulle, Simon Denman, Sridha Sridharan, Clinton Fookes
Abstract Inspired by human neurological structures for action anticipation, we present an action anticipation model that enables the prediction of plausible future actions by forecasting both the visual and temporal future. In contrast to current state-of-the-art methods which first learn a model to predict future video features and then perform action anticipation using these features, the proposed framework jointly learns to perform the two tasks, future visual and temporal representation synthesis, and early action anticipation. The joint learning framework ensures that the predicted future embeddings are informative to the action anticipation task. Furthermore, through extensive experimental evaluations we demonstrate the utility of using both visual and temporal semantics of the scene, and illustrate how this representation synthesis could be achieved through a recurrent Generative Adversarial Network (GAN) framework. Our model outperforms the current state-of-the-art methods on multiple datasets: UCF101, UCF101-24, UT-Interaction and TV Human Interaction.
Tasks
Published 2019-12-16
URL https://arxiv.org/abs/1912.07148v1
PDF https://arxiv.org/pdf/1912.07148v1.pdf
PWC https://paperswithcode.com/paper/predicting-the-future-a-jointly-learnt-model-1
Repo
Framework
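
A hedged sketch of the joint-learning idea with the GAN machinery omitted: instead of first training a future-feature predictor and then a separate classifier on its outputs, both objectives are optimized in one update, so the predicted future embedding stays informative for anticipation. All modules and data below are stand-ins.

```python
# Jointly trained future-embedding prediction and action anticipation.
import torch
import torch.nn as nn
import torch.nn.functional as F

d, n_classes = 64, 10
predictor = nn.GRU(d, d, batch_first=True)     # observed frames -> future embedding
classifier = nn.Linear(d, n_classes)
opt = torch.optim.Adam([*predictor.parameters(), *classifier.parameters()])

obs = torch.randn(8, 12, d)                    # partially observed clip features
future = torch.randn(8, d)                     # features of the unseen future
y = torch.randint(0, n_classes, (8,))

_, h = predictor(obs)
pred_future = h[-1]                            # (8, d) predicted future embedding
loss = (F.mse_loss(pred_future, future)        # representation synthesis
        + F.cross_entropy(classifier(pred_future), y))  # anticipation
opt.zero_grad(); loss.backward(); opt.step()   # one joint update for both tasks
```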