January 26, 2020

2856 words 14 mins read

Paper Group ANR 1390

Constructing a provably adversarially-robust classifier from a high accuracy one

Title Constructing a provably adversarially-robust classifier from a high accuracy one
Authors Grzegorz Głuch, Rüdiger Urbanke
Abstract Modern machine learning models with very high accuracy have been shown to be vulnerable to small, adversarially chosen perturbations of the input. Given black-box access to a high-accuracy classifier $f$, we show how to construct a new classifier $g$ that has high accuracy and is also robust to adversarial $\ell_2$-bounded perturbations. Our algorithm builds upon the framework of randomized smoothing that has been recently shown to outperform all previous defenses against $\ell_2$-bounded adversaries. Using techniques like random partitions and doubling dimension, we are able to bound the adversarial error of $g$ in terms of the optimum error. In this paper we focus on our conceptual contribution, but we do present two examples to illustrate our framework. We will argue that, under some assumptions, our bounds are optimal for these cases.
Tasks
Published 2019-12-16
URL https://arxiv.org/abs/1912.07561v1
PDF https://arxiv.org/pdf/1912.07561v1.pdf
PWC https://paperswithcode.com/paper/constructing-a-provably-adversarially-robust
Repo
Framework
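
For context, a minimal sketch of the randomized-smoothing prediction rule the paper builds on: the smoothed classifier $g$ predicts the class that the black-box base classifier $f$ outputs most often under Gaussian perturbations of the input. The base classifier, noise level, and sample count below are illustrative stand-ins, not the paper's construction.

```python
# Randomized smoothing: majority vote of a black-box classifier f over
# Gaussian perturbations of the input. Toy values for sigma and n_samples.
import numpy as np

def smoothed_predict(f, x, sigma=0.25, n_samples=1000, rng=None):
    """Return the class f outputs most often under N(0, sigma^2) noise."""
    rng = rng or np.random.default_rng(0)
    noise = rng.normal(0.0, sigma, size=(n_samples,) + x.shape)
    votes = np.array([f(x + eps) for eps in noise])
    classes, counts = np.unique(votes, return_counts=True)
    return classes[np.argmax(counts)]

# Toy base classifier: sign of the first coordinate.
f = lambda z: int(z[0] > 0)
print(smoothed_predict(f, np.array([0.1, -2.0])))  # stable under small l2 noise
```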

Source Coding Based mmWave Channel Estimation with Deep Learning Based Decoding

Title Source Coding Based mmWave Channel Estimation with Deep Learning Based Decoding
Authors Yahia Shabara, Eylem Ekici, C. Emre Koksal
Abstract mmWave technology is set to become a main feature of next generation wireless networks, e.g., 5G mobile and WiFi 802.11ad/ay. Among the basic and most fundamental challenges facing mmWave is the ability to overcome its unfavorable propagation characteristics using energy efficient solutions. This has been addressed using innovative transceiver architectures. However, these architectures have their own limitations when it comes to channel estimation. This paper focuses on channel estimation and poses it as a source compression problem, where channel measurements are designed to mimic an encoded (compressed) version of the channel. We show that linear source codes can significantly reduce the number of channel measurements required to discover all channel paths. We also propose a deep-learning-based approach for decoding the obtained measurements, which enables high-speed and efficient channel discovery.
Tasks
Published 2019-04-30
URL http://arxiv.org/abs/1905.00124v1
PDF http://arxiv.org/pdf/1905.00124v1.pdf
PWC https://paperswithcode.com/paper/source-coding-based-mmwave-channel-estimation
Repo
Framework
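
A hedged sketch of the paper's framing: channel measurements act like a linear (compressed) encoding y = Ah of a sparse path vector h, and a learned decoder recovers which taps carry a path. The random measurement matrix, dimensions, and decoder architecture below are illustrative stand-ins, not the paper's linear source code or network design.

```python
# Sparse channel discovery posed as compressed encoding + learned decoding.
import torch
import torch.nn as nn

N, M, K = 64, 16, 2            # channel taps, measurements, active paths
A = torch.randn(M, N)          # stand-in for a linear source-code matrix

def sample_channel(batch):
    """Draw K-sparse channel vectors (K random taps carry a path)."""
    h = torch.zeros(batch, N)
    for b in range(batch):
        idx = torch.randperm(N)[:K]
        h[b, idx] = torch.randn(K)
    return h

decoder = nn.Sequential(nn.Linear(M, 128), nn.ReLU(), nn.Linear(128, N))
opt = torch.optim.Adam(decoder.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()    # predict which taps carry a path

for step in range(200):
    h = sample_channel(256)
    y = h @ A.T                     # "encoded" (compressed) measurements
    support = (h != 0).float()
    loss = loss_fn(decoder(y), support)
    opt.zero_grad(); loss.backward(); opt.step()
```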

Fighting Quantization Bias With Bias

Title Fighting Quantization Bias With Bias
Authors Alexander Finkelstein, Uri Almog, Mark Grobman
Abstract Low-precision representation of deep neural networks (DNNs) is critical for efficient deployment of deep learning applications on embedded platforms; however, converting the network to low precision degrades its performance. Crucially, networks that are designed for embedded applications usually suffer from increased degradation since they have less redundancy. This is most evident for the ubiquitous MobileNet architecture, which requires a costly quantization-aware training cycle to achieve acceptable performance when quantized to 8 bits. In this paper, we trace the source of the degradation in MobileNets to a shift in the mean activation value. This shift is caused by an inherent bias in the quantization process which builds up across layers, shifting all network statistics away from the learned distribution. We show that this phenomenon happens in other architectures as well. We propose a simple remedy: compensating for the quantization-induced shift by adding a constant to the additive bias term of each channel. We develop two simple methods for estimating the correction constants: one using iterative evaluation of the quantized network, and one where the constants are set using a short training phase. Both methods are fast and require only a small amount of unlabeled data, making them appealing for rapid deployment of neural networks. Using the above methods we are able to match the performance of training-based quantization of MobileNets at a fraction of the cost.
Tasks Quantization
Published 2019-06-07
URL https://arxiv.org/abs/1906.03193v1
PDF https://arxiv.org/pdf/1906.03193v1.pdf
PWC https://paperswithcode.com/paper/fighting-quantization-bias-with-bias
Repo
Framework
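
A minimal sketch of the first remedy described above: estimate the per-channel mean shift that quantization induces at a layer's output using a small batch of unlabeled data, then subtract it from the layer's bias term. The toy quantizer, layer, and calibration batch are placeholders.

```python
# Per-channel bias correction for quantization-induced mean shift.
import torch
import torch.nn as nn

def fake_quant(w, bits=8):
    # Simple symmetric uniform quantizer as a stand-in for a real pipeline.
    scale = w.abs().max() / (2 ** (bits - 1) - 1)
    return torch.round(w / scale) * scale

float_layer = nn.Conv2d(16, 32, 3, padding=1, bias=True)
quant_layer = nn.Conv2d(16, 32, 3, padding=1, bias=True)
quant_layer.load_state_dict(float_layer.state_dict())
with torch.no_grad():
    quant_layer.weight.copy_(fake_quant(quant_layer.weight))

x = torch.randn(64, 16, 8, 8)          # small unlabeled calibration batch
with torch.no_grad():
    # Mean output shift per channel between quantized and float layers.
    shift = (quant_layer(x) - float_layer(x)).mean(dim=(0, 2, 3))
    quant_layer.bias -= shift           # fold the correction into the bias
```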

Target-Specific Action Classification for Automated Assessment of Human Motor Behavior from Video

Title Target-Specific Action Classification for Automated Assessment of Human Motor Behavior from Video
Authors Behnaz Rezaei, Yiorgos Christakis, Bryan Ho, Kevin Thomas, Kelley Erb, Sarah Ostadabbas, Shyamal Patel
Abstract Objective monitoring and assessment of human motor behavior can improve the diagnosis and management of several medical conditions. Over the past decade, significant advances have been made in the use of wearable technology for continuously monitoring human motor behavior in free-living conditions. However, wearable technology remains ill-suited for applications which require monitoring and interpretation of complex motor behaviors (e.g., involving interactions with the environment). Recent advances in computer vision and deep learning have opened up new possibilities for extracting information from video recordings. In this paper, we present a hierarchical vision-based behavior phenotyping method for classification of basic human actions in video recordings made with a single RGB camera. Our method addresses challenges associated with tracking multiple human actors and classification of actions in videos recorded in changing environments with different fields of view. We implement a cascaded pose tracker that uses temporal relationships between detections for short-term tracking and appearance-based tracklet fusion for long-term tracking. Furthermore, for action classification, we use pose evolution maps derived from the cascaded pose tracker as low-dimensional and interpretable representations of the movement sequences for training a convolutional neural network. The cascaded pose tracker achieves an average accuracy of 88% in tracking the target human actor in our video recordings, and the overall system achieves an average test accuracy of 84% for target-specific action classification in untrimmed video recordings.
Tasks Action Classification, Action Recognition In Videos
Published 2019-09-20
URL https://arxiv.org/abs/1909.09566v1
PDF https://arxiv.org/pdf/1909.09566v1.pdf
PWC https://paperswithcode.com/paper/target-specific-action-classification-for
Repo
Framework
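
A hedged sketch of the classification stage only: a tracked pose sequence (T frames × J joints × 2 coordinates) is rearranged into a compact image-like map and classified with a small CNN. The exact construction of the paper's pose evolution maps differs; the layout below (channels = x/y, rows = joints, columns = time) is an assumption for illustration.

```python
# Pose sequence -> compact 2D map -> CNN action classifier (illustrative).
import torch
import torch.nn as nn

T, J, n_actions = 32, 17, 5
poses = torch.rand(8, T, J, 2)                    # batch of tracked poses
maps = poses.permute(0, 3, 2, 1)                  # (B, 2, J, T), image-like

cnn = nn.Sequential(
    nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, n_actions),
)
logits = cnn(maps)                                # (8, n_actions)
```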

Neural Response Generation with Meta-Words

Title Neural Response Generation with Meta-Words
Authors Can Xu, Wei Wu, Chongyang Tao, Huang Hu, Matt Schuerman, Ying Wang
Abstract We present open domain response generation with meta-words. A meta-word is a structured record that describes various attributes of a response, and thus allows us to explicitly model the one-to-many relationship within open domain dialogues and perform response generation in an explainable and controllable manner. To incorporate meta-words into generation, we enhance the sequence-to-sequence architecture with a goal tracking memory network that formalizes meta-word expression as a goal and manages the generation process to achieve the goal with a state memory panel and a state controller. Experimental results on two large-scale datasets indicate that our model can significantly outperform several state-of-the-art generation models in terms of response relevance, response diversity, accuracy of one-to-many modeling, accuracy of meta-word expression, and human evaluation.
Tasks
Published 2019-06-14
URL https://arxiv.org/abs/1906.06050v1
PDF https://arxiv.org/pdf/1906.06050v1.pdf
PWC https://paperswithcode.com/paper/neural-response-generation-with-meta-words
Repo
Framework
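
A minimal sketch of conditioning generation on a meta-word: each attribute of the structured record is embedded, and the concatenated vector is fed to the decoder at every step. The paper's goal-tracking memory network, state memory panel, and state controller are far richer; the attribute names and sizes below are invented for illustration.

```python
# Meta-word (structured attribute record) conditioning a seq2seq decoder.
import torch
import torch.nn as nn

vocab, d = 1000, 64
attr_sizes = {"length": 4, "speech_act": 6, "specificity": 3}  # hypothetical

attr_emb = nn.ModuleDict({k: nn.Embedding(n, d) for k, n in attr_sizes.items()})
tok_emb = nn.Embedding(vocab, d)
decoder = nn.GRU(d * (1 + len(attr_sizes)), 128, batch_first=True)
out = nn.Linear(128, vocab)

meta = {"length": torch.tensor([2]), "speech_act": torch.tensor([1]),
        "specificity": torch.tensor([0])}
meta_vec = torch.cat([attr_emb[k](v) for k, v in meta.items()], dim=-1)

prev_tokens = torch.randint(0, vocab, (1, 10))
steps = tok_emb(prev_tokens)                                  # (1, 10, d)
cond = meta_vec.unsqueeze(1).expand(-1, 10, -1)               # repeat per step
logits = out(decoder(torch.cat([steps, cond], dim=-1))[0])    # (1, 10, vocab)
```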

HexagDLy - Processing hexagonally sampled data with CNNs in PyTorch

Title HexagDLy - Processing hexagonally sampled data with CNNs in PyTorch
Authors Constantin Steppa, Tim Lukas Holch
Abstract HexagDLy is a Python library extending the PyTorch deep learning framework with convolution and pooling operations on hexagonal grids. It aims to ease access to convolutional neural networks for applications that rely on hexagonally sampled data, as commonly found in, for example, ground-based astroparticle physics experiments.
Tasks
Published 2019-03-05
URL http://arxiv.org/abs/1903.01814v1
PDF http://arxiv.org/pdf/1903.01814v1.pdf
PWC https://paperswithcode.com/paper/hexagdly-processing-hexagonally-sampled-data
Repo
Framework
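
A hedged sketch of one hexagonal operation on an offset-coordinate square grid: a cell's six hexagonal neighbours depend on column parity, so the kernel is applied as masked shifts rather than a single square window. This illustrates the underlying idea only; HexagDLy's actual layers and API are not reproduced here.

```python
# Hexagonal neighbour aggregation on an offset (column-shifted) grid.
import torch
import torch.nn.functional as F

def hex_neighbor_sum(x):
    """Sum each cell's six hexagonal neighbours; x has shape (H, W)."""
    H, W = x.shape
    p = F.pad(x, (1, 1, 1, 1))                   # pad width, then height
    up, down = p[0:H, 1:W+1], p[2:H+2, 1:W+1]
    left, right = p[1:H+1, 0:W], p[1:H+1, 2:W+2]
    # Diagonal neighbours shift up for even columns, down for odd ones.
    even = torch.arange(W) % 2 == 0
    diag_l = torch.where(even, p[0:H, 0:W], p[2:H+2, 0:W])
    diag_r = torch.where(even, p[0:H, 2:W+2], p[2:H+2, 2:W+2])
    return up + down + left + right + diag_l + diag_r

print(hex_neighbor_sum(torch.ones(5, 5)))        # interior cells sum to 6
```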

Fine-grained Action Segmentation using the Semi-Supervised Action GAN

Title Fine-grained Action Segmentation using the Semi-Supervised Action GAN
Authors Harshala Gammulle, Simon Denman, Sridha Sridharan, Clinton Fookes
Abstract In this paper we address the problem of continuous fine-grained action segmentation, in which multiple actions are present in an unsegmented video stream. The challenge for this task lies in the need to represent the hierarchical nature of the actions and to detect the transitions between actions, allowing us to localise the actions within the video effectively. We propose a novel recurrent semi-supervised Generative Adversarial Network (GAN) model for continuous fine-grained human action segmentation. Temporal context information is captured via a novel Gated Context Extractor (GCE) module, composed of gated attention units, that directs the queued context information through the generator model for enhanced action segmentation. The GAN learns features in a semi-supervised manner, enabling the model to perform action classification jointly with the standard, unsupervised GAN learning procedure. We perform extensive evaluations on different architectural variants to demonstrate the importance of the proposed network architecture, and show that it is capable of outperforming the current state-of-the-art on three challenging datasets: 50 Salads, MERL Shopping and Georgia Tech Egocentric Activities.
Tasks Action Classification, action segmentation
Published 2019-09-20
URL https://arxiv.org/abs/1909.09269v1
PDF https://arxiv.org/pdf/1909.09269v1.pdf
PWC https://paperswithcode.com/paper/fine-grained-action-segmentation-using-the
Repo
Framework
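
A hedged sketch of the semi-supervised GAN objective such a model can build on (in the style of Salimans et al.): the discriminator has K action classes plus one extra "fake" class, so labelled samples train the classifier while real/fake discrimination is learned from unlabelled and generated samples. The paper's recurrent generator and Gated Context Extractor are omitted.

```python
# Semi-supervised GAN discriminator loss with a (K+1)-class head.
import torch
import torch.nn as nn
import torch.nn.functional as F

K, d = 5, 32                            # action classes, feature dim
D = nn.Linear(d, K + 1)                 # stand-in discriminator; index K = "fake"

feats_lab = torch.randn(16, d)          # labelled real features
y = torch.randint(0, K, (16,))
feats_unlab = torch.randn(16, d)        # unlabelled real features
feats_fake = torch.randn(16, d)         # generator-output stand-in

sup = F.cross_entropy(D(feats_lab), y)  # supervised action classification
p_fake_real = F.softmax(D(feats_unlab), dim=1)[:, K]
p_fake_gen = F.softmax(D(feats_fake), dim=1)[:, K]
# Push real samples away from the fake class, generated samples toward it.
unsup = -(torch.log1p(-p_fake_real) + torch.log(p_fake_gen)).mean()
loss_D = sup + unsup                    # joint semi-supervised objective
```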

Deep Concept-wise Temporal Convolutional Networks for Action Localization

Title Deep Concept-wise Temporal Convolutional Networks for Action Localization
Authors Xin Li, Tianwei Lin, Xiao Liu, Chuang Gan, Wangmeng Zuo, Chao Li, Xiang Long, Dongliang He, Fu Li, Shilei Wen
Abstract Existing action localization approaches adopt shallow temporal convolutional networks (i.e., TCNs) on 1D feature maps extracted from video frames. In this paper, we empirically find that stacking more conventional temporal convolution layers actually deteriorates action classification performance, possibly because all channels of the 1D feature map, which generally are highly abstract and can be regarded as latent concepts, are excessively recombined in temporal convolution. To address this issue, we introduce a novel concept-wise temporal convolution (CTC) layer as an alternative to the conventional temporal convolution layer for training deeper action localization networks. Instead of recombining latent concepts, the CTC layer deploys a number of temporal filters to each concept separately, with filter parameters shared across concepts. It can thus capture common temporal patterns of different concepts and significantly enrich representation ability. By stacking CTC layers, we propose a deep concept-wise temporal convolutional network (C-TCN), which boosts the state-of-the-art action localization performance on THUMOS’14 from 42.8 to 52.1 in terms of mAP (%), achieving a relative improvement of 21.7%. Favorable results are also obtained on ActivityNet.
Tasks Action Classification, Action Localization
Published 2019-08-26
URL https://arxiv.org/abs/1908.09442v1
PDF https://arxiv.org/pdf/1908.09442v1.pdf
PWC https://paperswithcode.com/paper/deep-concept-wise-temporal-convolutional
Repo
Framework
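
A minimal sketch of a concept-wise temporal convolution layer as the abstract describes it: one shared bank of temporal filters is applied to each channel ("concept") separately, so no cross-concept recombination occurs. Shapes and filter counts are illustrative.

```python
# Concept-wise temporal convolution: shared temporal filters, no channel mixing.
import torch
import torch.nn as nn

class ConceptWiseTemporalConv(nn.Module):
    def __init__(self, n_filters=4, kernel_size=3):
        super().__init__()
        # in_channels=1: each concept is filtered alone with shared weights.
        self.conv = nn.Conv1d(1, n_filters, kernel_size, padding=kernel_size // 2)

    def forward(self, x):                       # x: (B, C, T)
        B, C, T = x.shape
        y = self.conv(x.reshape(B * C, 1, T))   # fold concepts into the batch
        return y.reshape(B, C * self.conv.out_channels, T)

x = torch.randn(2, 64, 100)                 # 64 concepts over 100 time steps
print(ConceptWiseTemporalConv()(x).shape)   # torch.Size([2, 256, 100])
```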

Delving into 3D Action Anticipation from Streaming Videos

Title Delving into 3D Action Anticipation from Streaming Videos
Authors Hongsong Wang, Jiashi Feng
Abstract Action anticipation, which aims to recognize the action with a partial observation, becomes increasingly popular due to a wide range of applications. In this paper, we investigate the problem of 3D action anticipation from streaming videos with the target of understanding best practices for solving this problem. We first introduce several complementary evaluation metrics and present a basic model based on frame-wise action classification. To achieve better performance, we then investigate two important factors, i.e., the length of the training clip and clip sampling method. We also explore multi-task learning strategies by incorporating auxiliary information from two aspects: the full action representation and the class-agnostic action label. Our comprehensive experiments uncover the best practices for 3D action anticipation, and accordingly we propose a novel method with a multi-task loss. The proposed method considerably outperforms the recent methods and exhibits the state-of-the-art performance on standard benchmarks.
Tasks Action Classification, Multi-Task Learning
Published 2019-06-15
URL https://arxiv.org/abs/1906.06521v1
PDF https://arxiv.org/pdf/1906.06521v1.pdf
PWC https://paperswithcode.com/paper/delving-into-3d-action-anticipation-from
Repo
Framework
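
A hedged sketch of the multi-task loss idea: a frame-wise model is trained jointly with the two auxiliary targets the abstract mentions, a full-action representation and a class-agnostic action label. The heads, loss weights, and targets below are illustrative stand-ins.

```python
# Frame-wise anticipation model with two auxiliary heads and a joint loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

d, n_classes = 128, 10
backbone = nn.GRU(48, d, batch_first=True)     # frame features -> hidden states
cls_head = nn.Linear(d, n_classes)             # anticipated action class
repr_head = nn.Linear(d, d)                    # regress full-action embedding
agn_head = nn.Linear(d, 2)                     # class-agnostic action label

frames = torch.randn(4, 20, 48)                # partially observed clips
h, _ = backbone(frames)
y = torch.randint(0, n_classes, (4,))
full_repr = torch.randn(4, d)                  # embedding of the full action
agnostic = torch.randint(0, 2, (4,))

last = h[:, -1]                                # anticipate from the partial clip
loss = (F.cross_entropy(cls_head(last), y)
        + 0.5 * F.mse_loss(repr_head(last), full_repr)
        + 0.5 * F.cross_entropy(agn_head(last), agnostic))
```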

Deep Learning via Dynamical Systems: An Approximation Perspective

Title Deep Learning via Dynamical Systems: An Approximation Perspective
Authors Qianxiao Li, Ting Lin, Zuowei Shen
Abstract We build on the dynamical systems approach to deep learning, where deep residual networks are idealized as continuous-time dynamical systems. Although theoretical foundations have been developed on the optimization side through mean-field optimal control theory, the function approximation properties of such models remain largely unexplored, especially when the dynamical systems are controlled by functions of low complexity. In this paper, we establish some basic results on the approximation capabilities of deep learning models in the form of dynamical systems. In particular, we derive general sufficient conditions for universal approximation of functions in $L^p$ using flow maps of dynamical systems, and we also deduce some results on their approximation rates for specific cases. Overall, these results reveal that composition function approximation through flow maps presents a new paradigm in approximation theory and contributes to building a useful mathematical framework to investigate deep learning.
Tasks
Published 2019-12-22
URL https://arxiv.org/abs/1912.10382v1
PDF https://arxiv.org/pdf/1912.10382v1.pdf
PWC https://paperswithcode.com/paper/deep-learning-via-dynamical-systems-an
Repo
Framework
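
A minimal sketch of the dynamical-systems view: a deep residual network is an Euler discretization $x_{k+1} = x_k + h\, f(x_k, \theta_k)$ of a continuous-time system, and the approximant is the flow map after all steps. The vector field below is an arbitrary small network, chosen only for illustration.

```python
# Residual network as the Euler-discretized flow map of a controlled ODE.
import torch
import torch.nn as nn

class FlowMap(nn.Module):
    def __init__(self, dim=2, steps=20, h=0.05):
        super().__init__()
        self.h = h
        # One vector field per time step (a piecewise-constant control).
        self.fields = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 16), nn.Tanh(), nn.Linear(16, dim))
            for _ in range(steps))

    def forward(self, x):
        for f in self.fields:      # Euler steps: x <- x + h * f(x)
            x = x + self.h * f(x)
        return x

print(FlowMap()(torch.randn(5, 2)).shape)   # torch.Size([5, 2])
```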

How Machine (Deep) Learning Helps Us Understand Human Learning: the Value of Big Ideas

Title How Machine (Deep) Learning Helps Us Understand Human Learning: the Value of Big Ideas
Authors Marc Maliar
Abstract I use simulations of two multilayer neural networks to gain intuition into the determinants of human learning. The first network, the teacher, is trained to achieve high accuracy in handwritten digit recognition. The second network, the student, learns to reproduce the output of the first network. I show that learning from the teacher is more effective than learning from the data under the appropriate degree of regularization. Regularization allows the teacher to distinguish the trends and to deliver “big ideas” to the student. I also model other learning situations such as expert and novice teachers, high- and low-ability students, and biased learning experience due to, e.g., poverty and trauma. The results from computer simulation accord remarkably well with findings of the modern psychological literature. The code is written in MATLAB and will be publicly available from the author’s web page.
Tasks Handwritten Digit Recognition
Published 2019-02-16
URL http://arxiv.org/abs/1903.03408v2
PDF http://arxiv.org/pdf/1903.03408v2.pdf
PWC https://paperswithcode.com/paper/how-machine-deep-learning-helps-us-understand
Repo
Framework
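
The paper's own code is MATLAB; a hedged Python sketch of the teacher-student setup it simulates is below. The student is trained to reproduce the teacher's soft outputs rather than hard labels; architectures and data are placeholders.

```python
# Teacher-student learning: the student matches the teacher's soft outputs.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
student = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 10))
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

x = torch.randn(128, 784)                 # stand-in for digit images
with torch.no_grad():
    soft_targets = F.softmax(teacher(x), dim=1)

# Learn from the teacher's output distribution, not from hard data labels.
loss = F.kl_div(F.log_softmax(student(x), dim=1), soft_targets,
                reduction="batchmean")
opt.zero_grad(); loss.backward(); opt.step()
```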

Temporal Factorization of 3D Convolutional Kernels

Title Temporal Factorization of 3D Convolutional Kernels
Authors Gabriëlle Ras, Luca Ambrogioni, Umut Güçlü, Marcel A. J. van Gerven
Abstract 3D convolutional neural networks are difficult to train because they are parameter-expensive and data-hungry. To solve these problems we propose a simple technique for learning 3D convolutional kernels efficiently that requires less training data. We achieve this by factorizing the 3D kernel along the temporal dimension, reducing the number of parameters and making training from data more efficient. Additionally, we introduce a novel dataset called Video-MNIST to demonstrate the performance of our method. Our method significantly outperforms the conventional 3D convolution in the low data regime (1 to 5 videos per class). Finally, our model achieves competitive results in the high data regime (>10 videos per class) using up to 45% fewer parameters.
Tasks
Published 2019-12-09
URL https://arxiv.org/abs/1912.04075v1
PDF https://arxiv.org/pdf/1912.04075v1.pdf
PWC https://paperswithcode.com/paper/temporal-factorization-of-3d-convolutional
Repo
Framework
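
A hedged sketch of factorizing a 3D kernel along the temporal dimension: learn a 2D spatial kernel and a 1D temporal weight vector per filter and take their outer product. Whether this rank-1 form matches the paper's exact factorization is an assumption; it does show the parameter saving (T + k·k versus T·k·k values per filter).

```python
# 3D convolution with a temporally factorized (rank-1 in time) kernel.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporallyFactoredConv3d(nn.Module):
    def __init__(self, in_ch, out_ch, t=3, k=3):
        super().__init__()
        self.spatial = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.1)
        self.temporal = nn.Parameter(torch.randn(out_ch, t) * 0.1)

    def forward(self, x):                      # x: (B, C, T, H, W)
        # Outer product rebuilds the (out, in, t, k, k) kernel on the fly.
        w = torch.einsum('ot,oihw->oithw', self.temporal, self.spatial)
        return F.conv3d(x, w, padding='same')

x = torch.randn(2, 8, 16, 28, 28)
print(TemporallyFactoredConv3d(8, 16)(x).shape)  # torch.Size([2, 16, 16, 28, 28])
```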

Scene Text Magnifier

Title Scene Text Magnifier
Authors Toshiki Nakamura, Anna Zhu, Seiichi Uchida
Abstract The scene text magnifier aims to magnify text in natural scene images without recognition. It could help people with myopia or dyslexia to better understand the scene. In this paper, we design the scene text magnifier through four interacting CNN-based networks: character erasing, character extraction, character magnification, and image synthesis. The architecture of the networks is extended based on hourglass encoder-decoders. The system takes the original scene text image as input and outputs the text-magnified image while keeping the background unchanged. As intermediate results, we can obtain the side outputs of text erasing and text extraction. The four sub-networks are first trained independently and then fine-tuned end-to-end. The training samples for each stage are processed through a flow with the original image and text annotation in the ICDAR2013 and Flickr datasets as input, and the corresponding text-erased image, magnified text annotation, and text-magnified scene image as output. To evaluate the performance of the text magnifier, the Structural Similarity is used to measure the regional changes in each character region. The experimental results demonstrate that our method can magnify scene text effectively without affecting the background.
Tasks Image Generation
Published 2019-06-17
URL https://arxiv.org/abs/1907.00693v2
PDF https://arxiv.org/pdf/1907.00693v2.pdf
PWC https://paperswithcode.com/paper/scene-text-magnifier
Repo
Framework
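
A minimal sketch of the evaluation described above: Structural Similarity (SSIM) computed inside each character's bounding box to measure regional change between the input and the magnified output. The images and boxes below are placeholders.

```python
# Per-character-region SSIM between an input image and a magnified output.
import numpy as np
from skimage.metrics import structural_similarity

original = np.random.rand(64, 128)          # grayscale scene-text crop
magnified = np.clip(original + 0.05 * np.random.randn(64, 128), 0, 1)
char_boxes = [(10, 20, 30, 50), (10, 60, 30, 90)]   # (top, left, bottom, right)

for top, left, bottom, right in char_boxes:
    a = original[top:bottom, left:right]
    b = magnified[top:bottom, left:right]
    print(structural_similarity(a, b, data_range=1.0))
```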

A Fast Dictionary Learning Method for Coupled Feature Space Learning

Title A Fast Dictionary Learning Method for Coupled Feature Space Learning
Authors F. G. Veshki, S. A. Vorobyov
Abstract In this letter, we propose a novel computationally efficient coupled dictionary learning method that enforces pairwise correlation between the atoms of dictionaries learned to represent the underlying feature spaces of two different representations of the same signals, e.g., representations in different modalities or representations of the same signals measured with different qualities. The jointly learned correlated feature spaces represented by coupled dictionaries are used in sparse representation based classification, recognition and reconstruction tasks. The presented experimental results show that the proposed coupled dictionary learning method has a significantly lower computational cost. Moreover, the visual presentation of jointly learned dictionaries shows that the pairwise correlations between the corresponding atoms are ensured.
Tasks Dictionary Learning, Sparse Representation-based Classification
Published 2019-04-15
URL http://arxiv.org/abs/1904.06968v1
PDF http://arxiv.org/pdf/1904.06968v1.pdf
PWC https://paperswithcode.com/paper/a-fast-dictionary-learning-method-for-coupled
Repo
Framework
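
A hedged sketch of the coupling idea: two dictionaries D1 and D2 for two representations X1 and X2 of the same signals are learned so that corresponding atoms pair up, enforced here in the simplest possible way by making both spaces share one sparse code matrix. The generic ISTA-plus-least-squares updates below are not the paper's fast algorithm.

```python
# Coupled dictionary learning via a shared sparse code matrix A.
import numpy as np

rng = np.random.default_rng(0)
n, d1, d2, atoms = 200, 20, 24, 32
X1 = rng.standard_normal((d1, n))            # feature space 1 (e.g., modality A)
X2 = rng.standard_normal((d2, n))            # feature space 2, same signals

def normalise(D):
    return D / (np.linalg.norm(D, axis=0, keepdims=True) + 1e-8)

D1 = normalise(rng.standard_normal((d1, atoms)))
D2 = normalise(rng.standard_normal((d2, atoms)))
A = np.zeros((atoms, n))                     # one shared sparse code matrix
lam = 0.1

for it in range(50):
    # ISTA step on the shared codes (this is what couples the dictionaries).
    step = 1.0 / (np.linalg.norm(D1, 2) ** 2 + np.linalg.norm(D2, 2) ** 2)
    grad = D1.T @ (D1 @ A - X1) + D2.T @ (D2 @ A - X2)
    A = A - step * grad
    A = np.sign(A) * np.maximum(np.abs(A) - step * lam, 0)  # soft threshold
    # Dictionary update per space, then renormalise the atoms.
    D1 = normalise(X1 @ np.linalg.pinv(A))
    D2 = normalise(X2 @ np.linalg.pinv(A))
```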

Predicting the Future: A Jointly Learnt Model for Action Anticipation

Title Predicting the Future: A Jointly Learnt Model for Action Anticipation
Authors Harshala Gammulle, Simon Denman, Sridha Sridharan, Clinton Fookes
Abstract Inspired by human neurological structures for action anticipation, we present an action anticipation model that enables the prediction of plausible future actions by forecasting both the visual and temporal future. In contrast to current state-of-the-art methods which first learn a model to predict future video features and then perform action anticipation using these features, the proposed framework jointly learns to perform the two tasks, future visual and temporal representation synthesis, and early action anticipation. The joint learning framework ensures that the predicted future embeddings are informative to the action anticipation task. Furthermore, through extensive experimental evaluations we demonstrate the utility of using both visual and temporal semantics of the scene, and illustrate how this representation synthesis could be achieved through a recurrent Generative Adversarial Network (GAN) framework. Our model outperforms the current state-of-the-art methods on multiple datasets: UCF101, UCF101-24, UT-Interaction and TV Human Interaction.
Tasks
Published 2019-12-16
URL https://arxiv.org/abs/1912.07148v1
PDF https://arxiv.org/pdf/1912.07148v1.pdf
PWC https://paperswithcode.com/paper/predicting-the-future-a-jointly-learnt-model-1
Repo
Framework
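
A hedged sketch of the joint-learning idea with the GAN machinery omitted: instead of first training a future-feature predictor and then a separate classifier on its outputs, both objectives are optimized in one update, so the predicted future embedding stays informative for anticipation. All modules and data below are stand-ins.

```python
# Jointly trained future-embedding prediction and action anticipation.
import torch
import torch.nn as nn
import torch.nn.functional as F

d, n_classes = 64, 10
predictor = nn.GRU(d, d, batch_first=True)     # observed frames -> future embedding
classifier = nn.Linear(d, n_classes)
opt = torch.optim.Adam([*predictor.parameters(), *classifier.parameters()])

obs = torch.randn(8, 12, d)                    # partially observed clip features
future = torch.randn(8, d)                     # features of the unseen future
y = torch.randint(0, n_classes, (8,))

_, h = predictor(obs)
pred_future = h[-1]                            # (8, d) predicted future embedding
loss = (F.mse_loss(pred_future, future)        # representation synthesis
        + F.cross_entropy(classifier(pred_future), y))  # anticipation
opt.zero_grad(); loss.backward(); opt.step()   # one joint update for both tasks
```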