Paper Group AWR 129
An Effective Single-Image Super-Resolution Model Using Squeeze-and-Excitation Networks. When Vehicles See Pedestrians with Phones: A Multi-Cue Framework for Recognizing Phone-based Activities of Pedestrians. Consistent Robust Adversarial Prediction for General Multiclass Classification. Dynamic Vision Sensors for Human Activity Recognition. Inhibited Softmax for Uncertainty Estimation in Neural Networks. Road Segmentation Using CNN and Distributed LSTM. Bayesian Neural Network Ensembles. DeepScores – A Dataset for Segmentation, Detection and Classification of Tiny Objects. RecurJac: An Efficient Recursive Algorithm for Bounding Jacobian Matrix of Neural Networks and Its Applications. Latency and Throughput Characterization of Convolutional Neural Networks for Mobile Computer Vision. Constraint-based Sequential Pattern Mining with Decision Diagrams. CapsGAN: Using Dynamic Routing for Generative Adversarial Networks. InstaGAN: Instance-aware Image-to-Image Translation. Video Colorization using CNNs and Keyframes extraction: An application in saving bandwidth. A Tree Search Algorithm for Sequence Labeling.
An Effective Single-Image Super-Resolution Model Using Squeeze-and-Excitation Networks
Title | An Effective Single-Image Super-Resolution Model Using Squeeze-and-Excitation Networks |
Authors | Kangfu Mei, Aiwen Jiang, Juncheng Li, Jihua Ye, Mingwen Wang |
Abstract | Recent works on single-image super-resolution concentrate on improving performance by enhancing spatial encoding between convolutional layers. In this paper, we focus on modeling the correlations between channels of convolutional features. We present an effective deep residual network based on squeeze-and-excitation blocks (SEBlocks) to reconstruct a high-resolution (HR) image from a low-resolution (LR) image. The SEBlock is used to adaptively recalibrate channel-wise feature mappings. Further, short connections between SEBlocks are used to remedy information loss. Extensive experiments show that our model achieves state-of-the-art performance and recovers finer texture details. |
Tasks | Image Super-Resolution, Super-Resolution |
Published | 2018-10-03 |
URL | http://arxiv.org/abs/1810.01831v1 |
PDF | http://arxiv.org/pdf/1810.01831v1.pdf |
PWC | https://paperswithcode.com/paper/an-effective-single-image-super-resolution |
Repo | https://github.com/MKFMIKU/SrSENet |
Framework | pytorch |
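The channel recalibration described in the abstract follows the standard squeeze-and-excitation pattern: globally pool each channel, pass the result through a small bottleneck MLP, and rescale the channels by the resulting gates. A minimal PyTorch sketch, with an assumed reduction ratio of 16 (the actual SrSENet layer sizes may differ):

```python
# A minimal squeeze-and-excitation block, assuming the usual reduction-ratio design.
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # squeeze: global spatial average
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                            # per-channel gates in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                  # excite: recalibrate channels

x = torch.randn(2, 64, 32, 32)
print(SEBlock(64)(x).shape)  # torch.Size([2, 64, 32, 32])
```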
When Vehicles See Pedestrians with Phones: A Multi-Cue Framework for Recognizing Phone-based Activities of Pedestrians
Title | When Vehicles See Pedestrians with Phones: A Multi-Cue Framework for Recognizing Phone-based Activities of Pedestrians |
Authors | Akshay Rangesh, Mohan M. Trivedi |
Abstract | The intelligent vehicle community has devoted considerable efforts to model driver behavior, and in particular to detect and overcome driver distraction in an effort to reduce accidents caused by driver negligence. However, as the domain increasingly shifts towards autonomous and semi-autonomous solutions, the driver is no longer integral to the decision making process, indicating a need to refocus efforts elsewhere. To this end, we propose to study pedestrian distraction instead. In particular, we focus on detecting pedestrians who are engaged in secondary activities involving their cellphones and similar handheld multimedia devices from a purely vision-based standpoint. To achieve this objective, we propose a pipeline incorporating articulated human pose estimation, followed by a soft object label transfer from an ensemble of exemplar SVMs trained on the nearest neighbors in pose feature space. We additionally incorporate head gaze features and prior pose information to carry out cellphone related pedestrian activity recognition. Finally, we offer a method to reliably track the articulated pose of a pedestrian through a sequence of images using a particle filter with a Gaussian Process Dynamical Model (GPDM), which can then be used to estimate sequentially varying activity scores at a very low computational cost. The entire framework is fast (especially for sequential data) and accurate, and easily extensible to include other secondary activities and sources of distraction. |
Tasks | Activity Recognition, Decision Making, Pose Estimation |
Published | 2018-01-24 |
URL | http://arxiv.org/abs/1801.08234v1 |
PDF | http://arxiv.org/pdf/1801.08234v1.pdf |
PWC | https://paperswithcode.com/paper/when-vehicles-see-pedestrians-with-phonesa |
Repo | https://github.com/ginn24/ICE3050-41 |
Framework | tf |
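The soft label-transfer step lends itself to a compact illustration. The sketch below substitutes plain distance-weighted neighbor voting in pose-feature space for the paper's ensemble of exemplar SVMs, so it shows the shape of the computation rather than the actual method; the 26-dimensional pose feature (13 joints × 2 coordinates) is a hypothetical choice:

```python
# A hypothetical sketch of soft label transfer from exemplars nearest in
# pose-feature space; plain neighbor voting stands in for the exemplar SVMs.
import numpy as np

def soft_label_transfer(query_pose, exemplar_poses, exemplar_scores, k=5, tau=1.0):
    """query_pose: (d,), exemplar_poses: (n, d), exemplar_scores: (n, c)."""
    d = np.linalg.norm(exemplar_poses - query_pose, axis=1)  # distances in pose space
    idx = np.argsort(d)[:k]                                  # k nearest exemplars
    w = np.exp(-d[idx] / tau)                                # closer exemplars weigh more
    w /= w.sum()
    return w @ exemplar_scores[idx]                          # soft per-class activity scores

rng = np.random.default_rng(0)
poses, scores = rng.normal(size=(100, 26)), rng.random((100, 3))
print(soft_label_transfer(rng.normal(size=26), poses, scores))
```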
Consistent Robust Adversarial Prediction for General Multiclass Classification
Title | Consistent Robust Adversarial Prediction for General Multiclass Classification |
Authors | Rizal Fathony, Kaiser Asif, Anqi Liu, Mohammad Ali Bashiri, Wei Xing, Sima Behpour, Xinhua Zhang, Brian D. Ziebart |
Abstract | We propose a robust adversarial prediction framework for general multiclass classification. Our method seeks predictive distributions that robustly optimize non-convex and non-continuous multiclass loss metrics against the worst-case conditional label distributions (the adversarial distributions) that (approximately) match the statistics of the training data. Although the optimized loss metrics are non-convex and non-continuous, the dual formulation of the framework is a convex optimization problem that can be recast as a risk minimization model with a prescribed convex surrogate loss we call the adversarial surrogate loss. We show that the adversarial surrogate losses fill an existing gap in surrogate loss construction for general multiclass classification problems, by simultaneously aligning better with the original multiclass loss, guaranteeing Fisher consistency, enabling a way to incorporate rich feature spaces via the kernel trick, and providing competitive performance in practice. |
Tasks | |
Published | 2018-12-18 |
URL | https://arxiv.org/abs/1812.07526v2 |
PDF | https://arxiv.org/pdf/1812.07526v2.pdf |
PWC | https://paperswithcode.com/paper/consistent-robust-adversarial-prediction-for |
Repo | https://github.com/rizalzaf/AdversarialPrediction.jl |
Framework | pytorch |
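To make the adversarial surrogate concrete, the sketch below assumes the subset-maximization form reported in the authors' earlier work on the zero-one loss: AL(f, y) = max over non-empty label subsets S of (Σ_{j∈S}(f_j − f_y) + |S| − 1)/|S|. Because the best subset of a fixed size m consists of the m largest potentials, a sort over classes suffices:

```python
# A sketch of the adversarial surrogate for the zero-one loss (subset form
# assumed from the authors' earlier work on adversarial multiclass prediction).
import numpy as np

def adversarial_zero_one_loss(f: np.ndarray, y: int) -> float:
    """f: (k,) class potentials for one example; y: true label index."""
    psi = np.sort(f - f[y])[::-1]          # potentials relative to the true class
    m = np.arange(1, len(psi) + 1)
    return float(np.max((np.cumsum(psi) + m - 1) / m))  # best subset per size m

f = np.array([1.0, 2.5, 0.3, 1.8])
print(adversarial_zero_one_loss(f, y=1))  # 0.15: class 1 wins, but by < unit margin
```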
Dynamic Vision Sensors for Human Activity Recognition
Title | Dynamic Vision Sensors for Human Activity Recognition |
Authors | Stefanie Anna Baby, Bimal Vinod, Chaitanya Chinni, Kaushik Mitra |
Abstract | Unlike conventional cameras, which capture video at a fixed frame rate, Dynamic Vision Sensors (DVS) record only changes in pixel intensity values. The output of a DVS is simply a stream of discrete ON/OFF events based on the polarity of change in its pixel values. DVS has many attractive features such as low power consumption, high temporal resolution, high dynamic range and lower storage requirements. All these make DVS a very promising camera for potential applications in wearable platforms where power consumption is a major concern. In this paper, we explore the feasibility of using DVS for Human Activity Recognition (HAR). We propose to use the various slices (such as $x-y$, $x-t$, and $y-t$) of the DVS video as feature maps for HAR and denote them as Motion Maps. We show that fusing motion maps with Motion Boundary Histograms (MBH) gives good performance on the benchmark DVS dataset as well as on a real DVS gesture dataset collected by us. Interestingly, the performance of DVS is comparable to that of conventional videos, although DVS captures only sparse motion information. |
Tasks | Activity Recognition, Human Activity Recognition |
Published | 2018-03-13 |
URL | http://arxiv.org/abs/1803.04667v1 |
PDF | http://arxiv.org/pdf/1803.04667v1.pdf |
PWC | https://paperswithcode.com/paper/dynamic-vision-sensors-for-human-activity |
Repo | https://github.com/Computational-Imaging-Lab-IITM/HAR-DVS |
Framework | none |
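The motion maps are simple to build once the DVS output is viewed as a list of (x, y, t, polarity) events: each map is a 2-D histogram of event counts over one pair of axes. A sketch with assumed sensor resolution and time binning:

```python
# Motion maps as 2-D event-count histograms over (x-y), (x-t), (y-t) slices.
import numpy as np

def motion_maps(events: np.ndarray, w=128, h=128, t_bins=64):
    """events: (n, 4) array of (x, y, t, polarity); polarity is ignored here."""
    x, y, t = events[:, 0], events[:, 1], events[:, 2]
    span = t.max() - t.min() + 1e-9
    tb = ((t - t.min()) / span * (t_bins - 1)).astype(int)   # quantize timestamps
    xy, _, _ = np.histogram2d(x, y, bins=(w, h), range=((0, w), (0, h)))
    xt, _, _ = np.histogram2d(x, tb, bins=(w, t_bins))
    yt, _, _ = np.histogram2d(y, tb, bins=(h, t_bins))
    return xy, xt, yt

ev = np.column_stack([np.random.randint(0, 128, (1000, 2)),
                      np.sort(np.random.rand(1000)), np.random.choice([-1, 1], 1000)])
print([m.shape for m in motion_maps(ev)])  # [(128, 128), (128, 64), (128, 64)]
```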
Inhibited Softmax for Uncertainty Estimation in Neural Networks
Title | Inhibited Softmax for Uncertainty Estimation in Neural Networks |
Authors | Marcin Możejko, Mateusz Susik, Rafał Karczewski |
Abstract | We present a new method for uncertainty estimation and out-of-distribution detection in neural networks with softmax output. We extend the softmax layer with an additional constant input; the corresponding additional output is able to represent the uncertainty of the network. The proposed method requires neither additional parameters nor multiple forward passes, nor input preprocessing, nor out-of-distribution datasets. We show that our method performs comparably to more computationally expensive methods and outperforms baselines in our experiments in the image recognition and sentiment analysis domains. |
Tasks | Out-of-Distribution Detection, Sentiment Analysis |
Published | 2018-10-03 |
URL | http://arxiv.org/abs/1810.01861v2 |
PDF | http://arxiv.org/pdf/1810.01861v2.pdf |
PWC | https://paperswithcode.com/paper/inhibited-softmax-for-uncertainty-estimation |
Repo | https://github.com/MSusik/Inhibited-softmax |
Framework | pytorch |
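The mechanism is small enough to sketch directly: append one constant logit before the softmax and read the probability mass it absorbs as uncertainty. The constant's value below is an assumption, and the paper's training details are omitted:

```python
# A minimal sketch of the inhibited-softmax idea: one extra constant logit whose
# probability mass is read as uncertainty.
import torch

def inhibited_softmax(logits: torch.Tensor, c: float = 1.0):
    """logits: (batch, k). Returns class probabilities and an uncertainty score."""
    const = torch.full((logits.shape[0], 1), c)        # extra constant input
    p = torch.softmax(torch.cat([logits, const], dim=1), dim=1)
    return p[:, :-1], p[:, -1]                         # class probs, uncertainty

probs, unc = inhibited_softmax(torch.randn(4, 10))
print(probs.sum(dim=1) + unc)  # each row sums to 1
```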
Road Segmentation Using CNN and Distributed LSTM
Title | Road Segmentation Using CNN and Distributed LSTM |
Authors | Yecheng Lyu, Lin Bai, Xinming Huang |
Abstract | In automated driving systems (ADS) and advanced driver-assistance systems (ADAS), efficient road segmentation is necessary to perceive the drivable region and build an occupancy map for path planning. Existing algorithms implement gigantic convolutional neural networks (CNNs) that are computationally expensive and time-consuming. In this paper, we introduce the distributed LSTM, a neural network widely used in audio and video processing, to process rows and columns of images and feature maps. We then propose a new network combining convolutional and distributed LSTM layers to solve the road segmentation problem. Finally, the network is trained and tested on the KITTI road benchmark. The results show that the combined structure enhances feature extraction and processing while taking less processing time than a pure CNN structure. |
Tasks | |
Published | 2018-08-10 |
URL | http://arxiv.org/abs/1808.04450v2 |
PDF | http://arxiv.org/pdf/1808.04450v2.pdf |
PWC | https://paperswithcode.com/paper/road-segmentation-using-cnn-and-distributed |
Repo | https://github.com/Evvvvvvvva/AutonomousDriving |
Framework | tf |
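The distributed-LSTM idea - treating each row and each column of a feature map as a sequence - can be sketched as follows; channel counts, bidirectionality, and the concatenation-based fusion are assumptions rather than the paper's exact design:

```python
# A sketch of LSTMs run across the rows and columns of a CNN feature map.
import torch
import torch.nn as nn

class RowColumnLSTM(nn.Module):
    def __init__(self, channels: int, hidden: int):
        super().__init__()
        self.row_lstm = nn.LSTM(channels, hidden, batch_first=True, bidirectional=True)
        self.col_lstm = nn.LSTM(channels, hidden, batch_first=True, bidirectional=True)

    def forward(self, fmap: torch.Tensor) -> torch.Tensor:
        b, c, h, w = fmap.shape
        rows = fmap.permute(0, 2, 3, 1).reshape(b * h, w, c)  # each row is a sequence
        cols = fmap.permute(0, 3, 2, 1).reshape(b * w, h, c)  # each column is a sequence
        r, _ = self.row_lstm(rows)                            # (b*h, w, 2*hidden)
        v, _ = self.col_lstm(cols)                            # (b*w, h, 2*hidden)
        r = r.reshape(b, h, w, -1).permute(0, 3, 1, 2)
        v = v.reshape(b, w, h, -1).permute(0, 3, 2, 1)
        return torch.cat([r, v], dim=1)                       # fuse row and column context

out = RowColumnLSTM(channels=32, hidden=16)(torch.randn(1, 32, 24, 24))
print(out.shape)  # torch.Size([1, 64, 24, 24])
```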
Bayesian Neural Network Ensembles
Title | Bayesian Neural Network Ensembles |
Authors | Tim Pearce, Mohamed Zaki, Andy Neely |
Abstract | Ensembles of neural networks (NNs) have long been used to estimate predictive uncertainty; a small number of NNs are trained from different initialisations and sometimes on differing versions of the dataset. The variance of the ensemble’s predictions is interpreted as its epistemic uncertainty. The appeal of ensembling stems from being a collection of regular NNs - this makes them both scalable and easily implementable. They have achieved strong empirical results in recent years, often presented as a practical alternative to more costly Bayesian NNs (BNNs). The departure from Bayesian methodology is of concern since the Bayesian framework provides a principled, widely-accepted approach to handling uncertainty. In this extended abstract we derive and implement a modified NN ensembling scheme, which provides a consistent estimator of the Bayesian posterior in wide NNs - regularising parameters about values drawn from a prior distribution. |
Tasks | |
Published | 2018-11-27 |
URL | http://arxiv.org/abs/1811.12188v1 |
PDF | http://arxiv.org/pdf/1811.12188v1.pdf |
PWC | https://paperswithcode.com/paper/bayesian-neural-network-ensembles |
Repo | https://github.com/petteriTeikari/pyML_regression_skeleton |
Framework | none |
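The anchoring scheme is easy to state in code: each ensemble member draws per-parameter anchor values from the prior at initialisation and is regularised toward those anchors instead of toward zero. A minimal sketch with an illustrative regularisation strength:

```python
# A minimal sketch of anchored ensembling: regularise each member's weights
# about member-specific values drawn from the prior.
import torch
import torch.nn as nn

def make_member(prior_std=1.0):
    net = nn.Sequential(nn.Linear(1, 50), nn.ReLU(), nn.Linear(50, 1))
    anchors = [torch.randn_like(p) * prior_std for p in net.parameters()]  # prior draw
    return net, anchors

def anchored_loss(net, anchors, x, y, lam=1e-3):
    mse = nn.functional.mse_loss(net(x), y)
    reg = sum(((p - a) ** 2).sum() for p, a in zip(net.parameters(), anchors))
    return mse + lam * reg    # pull weights toward their member-specific anchor

x, y = torch.linspace(-1, 1, 64).unsqueeze(1), torch.rand(64, 1)
net, anchors = make_member()
print(anchored_loss(net, anchors, x, y).item())
```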
DeepScores – A Dataset for Segmentation, Detection and Classification of Tiny Objects
Title | DeepScores – A Dataset for Segmentation, Detection and Classification of Tiny Objects |
Authors | Lukas Tuggener, Ismail Elezi, Jürgen Schmidhuber, Marcello Pelillo, Thilo Stadelmann |
Abstract | We present the DeepScores dataset with the goal of advancing the state-of-the-art in small-object recognition, and of placing the question of object recognition in the context of scene understanding. DeepScores contains high-quality images of musical scores, partitioned into 300,000 sheets of written music that contain symbols of different shapes and sizes. With close to a hundred million small objects, this makes our dataset not only unique, but also the largest public dataset. DeepScores comes with ground truth for object classification, detection and semantic segmentation. DeepScores thus poses a relevant challenge for computer vision in general, beyond the scope of optical music recognition (OMR) research. We present a detailed statistical analysis of the dataset, comparing it with other computer vision datasets such as Caltech101/256, PASCAL VOC, SUN, SVHN, ImageNet, MS-COCO, smaller computer vision datasets, as well as with other OMR datasets. Finally, we provide baseline performances for object classification and give pointers to future research based on this dataset. |
Tasks | Object Classification, Object Recognition, Scene Understanding, Semantic Segmentation |
Published | 2018-03-27 |
URL | http://arxiv.org/abs/1804.00525v2 |
PDF | http://arxiv.org/pdf/1804.00525v2.pdf |
PWC | https://paperswithcode.com/paper/deepscores-a-dataset-for-segmentation |
Repo | https://github.com/ErenO/segmentation-dataset |
Framework | none |
RecurJac: An Efficient Recursive Algorithm for Bounding Jacobian Matrix of Neural Networks and Its Applications
Title | RecurJac: An Efficient Recursive Algorithm for Bounding Jacobian Matrix of Neural Networks and Its Applications |
Authors | Huan Zhang, Pengchuan Zhang, Cho-Jui Hsieh |
Abstract | The Jacobian matrix (or the gradient for single-output networks) is directly related to many important properties of neural networks, such as the function landscape, stationary points, (local) Lipschitz constants and robustness to adversarial attacks. In this paper, we propose a recursive algorithm, RecurJac, to compute both upper and lower bounds for each element in the Jacobian matrix of a neural network with respect to the network's input, where the network can contain a wide range of activation functions. As a byproduct, we can efficiently obtain a (local) Lipschitz constant, which plays a crucial role in neural network robustness verification, as well as in the training stability of GANs. Experiments show that the (local) Lipschitz constants produced by our method are of better quality than those of previous approaches, thus providing better robustness verification results. Our algorithm has polynomial time complexity, and its computation time is reasonable even for relatively large networks. Additionally, we use our bounds on the Jacobian matrix to characterize the landscape of the neural network, for example, to determine whether there exist stationary points in a local neighborhood. Source code available at \url{http://github.com/huanzhang12/RecurJac-Jacobian-bounds}. |
Tasks | |
Published | 2018-10-28 |
URL | http://arxiv.org/abs/1810.11783v2 |
PDF | http://arxiv.org/pdf/1810.11783v2.pdf |
PWC | https://paperswithcode.com/paper/recurjac-an-efficient-recursive-algorithm-for |
Repo | https://github.com/huanzhang12/RecurJac-and-CROWN |
Framework | tf |
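RecurJac's recursion is too involved to reproduce here, but the object it bounds is easy to show. The sketch below computes much looser interval-arithmetic bounds on the same Jacobian for a two-layer ReLU network, where J = W2 · D · W1 and the unknown ReLU gates D lie in [0, 1] - a naive baseline, not RecurJac itself:

```python
# A loose interval-arithmetic baseline for entrywise Jacobian bounds.
import numpy as np

def interval_matmul(al, au, bl, bu):
    """Entrywise bounds on A @ B when A lies in [al, au] and B in [bl, bu]."""
    prods = np.stack([al[:, :, None] * bl[None], al[:, :, None] * bu[None],
                      au[:, :, None] * bl[None], au[:, :, None] * bu[None]])
    return prods.min(axis=0).sum(axis=1), prods.max(axis=0).sum(axis=1)

def jacobian_bounds(w1, w2):
    h = w1.shape[0]
    dl, du = np.zeros((h, h)), np.eye(h)          # unknown ReLU gates in [0, 1]
    ml, mu = interval_matmul(dl, du, w1, w1)      # bounds on D @ W1
    return interval_matmul(w2, w2, ml, mu)        # bounds on W2 @ (D @ W1)

rng = np.random.default_rng(0)
w1, w2 = rng.normal(size=(8, 4)), rng.normal(size=(3, 8))
jl, ju = jacobian_bounds(w1, w2)
print(jl.shape, bool((jl <= ju).all()))  # (3, 4) True
```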
Latency and Throughput Characterization of Convolutional Neural Networks for Mobile Computer Vision
Title | Latency and Throughput Characterization of Convolutional Neural Networks for Mobile Computer Vision |
Authors | Jussi Hanhirova, Teemu Kämäräinen, Sipi Seppälä, Matti Siekkinen, Vesa Hirvisalo, Antti Ylä-Jääski |
Abstract | We study performance characteristics of convolutional neural networks (CNN) for mobile computer vision systems. CNNs have proven to be a powerful and efficient approach to implement such systems. However, the system performance depends largely on the utilization of hardware accelerators, which are able to speed up the execution of the underlying mathematical operations tremendously through massive parallelism. Our contribution is performance characterization of multiple CNN-based models for object recognition and detection with several different hardware platforms and software frameworks, using both local (on-device) and remote (network-side server) computation. The measurements are conducted using real workloads and real processing platforms. On the platform side, we concentrate especially on TensorFlow and TensorRT. Our measurements include embedded processors found on mobile devices and high-performance processors that can be used on the network side of mobile systems. We show that there exist significant latency–throughput trade-offs, but the behavior is very complex. We demonstrate and discuss several factors that affect the performance and yield this complex behavior. |
Tasks | Object Recognition |
Published | 2018-03-26 |
URL | http://arxiv.org/abs/1803.09492v1 |
PDF | http://arxiv.org/pdf/1803.09492v1.pdf |
PWC | https://paperswithcode.com/paper/latency-and-throughput-characterization-of |
Repo | https://github.com/Dhananjayadmd/DNN_MP |
Framework | none |
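The measurement itself is conceptually simple, even though the paper performs it across many models, platforms, and frameworks. A toy sketch of the latency/throughput trade-off over batch size, with a stand-in for the real inference call:

```python
# A toy latency/throughput characterization over batch size.
import time

def fake_model(batch):                 # stand-in for a real CNN inference call
    time.sleep(0.002 + 0.0005 * len(batch))

def characterize(batch_sizes=(1, 4, 16, 64), reps=20):
    for bs in batch_sizes:
        batch = [0] * bs
        start = time.perf_counter()
        for _ in range(reps):
            fake_model(batch)
        latency = (time.perf_counter() - start) / reps        # seconds per batch
        print(f"bs={bs:3d}  latency={latency*1e3:6.2f} ms  "
              f"throughput={bs/latency:8.1f} img/s")

characterize()
```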
Constraint-based Sequential Pattern Mining with Decision Diagrams
Title | Constraint-based Sequential Pattern Mining with Decision Diagrams |
Authors | Amin Hosseininasab, Willem-Jan van Hoeve, Andre A. Cire |
Abstract | Constrained sequential pattern mining aims at identifying frequent patterns on a sequential database of items while observing constraints defined over the item attributes. We introduce novel techniques for constraint-based sequential pattern mining that rely on a multi-valued decision diagram representation of the database. Specifically, our representation can accommodate multiple item attributes and various constraint types, including a number of non-monotone constraints. To evaluate the applicability of our approach, we develop an MDD-based prefix-projection algorithm and compare its performance against a typical generate-and-check variant, as well as a state-of-the-art constraint-based sequential pattern mining algorithm. Results show that our approach is competitive with or superior to these other methods in terms of scalability and efficiency. |
Tasks | Sequential Pattern Mining |
Published | 2018-11-14 |
URL | http://arxiv.org/abs/1811.06086v1 |
PDF | http://arxiv.org/pdf/1811.06086v1.pdf |
PWC | https://paperswithcode.com/paper/constraint-based-sequential-pattern-mining |
Repo | https://github.com/aminhn/MPP |
Framework | none |
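For context, the sketch below implements a tiny prefix-projection miner of the generate-and-check kind the paper compares against - not the MDD-based method itself. Attribute constraints would be checked wherever a pattern is extended:

```python
# A tiny PrefixSpan-style frequent-sequence miner (baseline, not the MDD method).
def prefix_span(db, min_support, prefix=()):
    """db: list of item sequences (tuples). Yields (pattern, support) pairs."""
    counts = {}
    for seq in db:
        for item in set(seq):                 # each item counted once per sequence
            counts[item] = counts.get(item, 0) + 1
    for item, sup in sorted(counts.items()):
        if sup < min_support:
            continue
        pattern = prefix + (item,)
        yield pattern, sup
        projected = [seq[seq.index(item) + 1:]          # suffixes after first match
                     for seq in db if item in seq]
        yield from prefix_span(projected, min_support, pattern)

db = [("a", "b", "c"), ("a", "c", "b"), ("a", "b", "b")]
for pat, sup in prefix_span(db, min_support=2):
    print(pat, sup)   # ('a',) 3 / ('a','b') 3 / ('a','c') 2 / ('b',) 3 / ('c',) 2
```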
CapsGAN: Using Dynamic Routing for Generative Adversarial Networks
Title | CapsGAN: Using Dynamic Routing for Generative Adversarial Networks |
Authors | Raeid Saqur, Sal Vivona |
Abstract | In this paper, we propose a novel technique for generating images in the 3D domain from images with a high degree of geometric transformation. By coalescing two popular concurrent methods that have seen rapid ascension to the machine learning zeitgeist in recent years - GANs (Goodfellow et al.) and capsule networks (Sabour, Hinton et al.) - we present: \textbf{CapsGAN}. We show that CapsGAN performs better than or on par with traditional CNN-based GANs in generating images with high geometric transformations using rotated MNIST. In the process, we also show the efficacy of using a capsule architecture in the GANs domain. Furthermore, we tackle the Gordian knot of training GANs - performance control and training stability - by experimenting with the Wasserstein distance (gradient clipping, penalty) and spectral normalization. The experimental findings of this paper should propel the application of capsules and GANs in the still exciting and nascent domain of 3D image generation, and plausibly video (frame) generation. |
Tasks | Image Generation |
Published | 2018-06-07 |
URL | http://arxiv.org/abs/1806.03968v1 |
PDF | http://arxiv.org/pdf/1806.03968v1.pdf |
PWC | https://paperswithcode.com/paper/capsgan-using-dynamic-routing-for-generative |
Repo | https://github.com/raeidsaqur/CapsGAN |
Framework | pytorch |
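Of the training-stability tools the abstract mentions, the gradient penalty is the most self-contained. The sketch below is the standard WGAN-GP term, not CapsGAN-specific code:

```python
# The standard WGAN gradient-penalty term: push the critic's gradient norm on
# random real/fake interpolates toward 1.
import torch

def gradient_penalty(critic, real, fake):
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    mix = (eps * real + (1 - eps) * fake).requires_grad_(True)  # interpolates
    grads, = torch.autograd.grad(critic(mix).sum(), mix, create_graph=True)
    return ((grads.flatten(1).norm(2, dim=1) - 1) ** 2).mean()

critic = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(784, 1))
real, fake = torch.randn(8, 1, 28, 28), torch.randn(8, 1, 28, 28)
print(gradient_penalty(critic, real, fake).item())
```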
InstaGAN: Instance-aware Image-to-Image Translation
Title | InstaGAN: Instance-aware Image-to-Image Translation |
Authors | Sangwoo Mo, Minsu Cho, Jinwoo Shin |
Abstract | Unsupervised image-to-image translation has gained considerable attention due to the recent impressive progress based on generative adversarial networks (GANs). However, previous methods often fail in challenging cases, in particular, when an image has multiple target instances and a translation task involves significant changes in shape, e.g., translating pants to skirts in fashion images. To tackle the issues, we propose a novel method, coined instance-aware GAN (InstaGAN), that incorporates the instance information (e.g., object segmentation masks) and improves multi-instance transfiguration. The proposed method translates both an image and the corresponding set of instance attributes while maintaining the permutation invariance property of the instances. To this end, we introduce a context preserving loss that encourages the network to learn the identity function outside of target instances. We also propose a sequential mini-batch inference/training technique that handles multiple instances with limited GPU memory and enhances the network to generalize better for multiple instances. Our comparative evaluation demonstrates the effectiveness of the proposed method on different image datasets, in particular, in the aforementioned challenging cases. Code and results are available at https://github.com/sangwoomo/instagan |
Tasks | Image-to-Image Translation, Semantic Segmentation, Unsupervised Image-To-Image Translation |
Published | 2018-12-28 |
URL | http://arxiv.org/abs/1812.10889v2 |
PDF | http://arxiv.org/pdf/1812.10889v2.pdf |
PWC | https://paperswithcode.com/paper/instagan-instance-aware-image-to-image |
Repo | https://github.com/sangwoomo/instagan |
Framework | pytorch |
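The context-preserving loss is the most compact of the listed ingredients: outside the union of instance masks, the translation should act as the identity. A sketch with assumed mask semantics and weighting (the authors' repo has the definitive version):

```python
# A sketch of a context-preserving loss: L1 between input and translation,
# restricted to the background (outside all instance masks).
import torch

def context_preserving_loss(x, y, masks):
    """x, y: (b, 3, h, w) input/translated images; masks: (b, n, h, w) in {0, 1}."""
    background = 1 - masks.amax(dim=1, keepdim=True)   # 1 outside all instances
    return (background * (x - y)).abs().mean()         # identity penalty on background

x, y = torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64)
masks = (torch.rand(2, 4, 64, 64) > 0.8).float()
print(context_preserving_loss(x, y, masks).item())
```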
Video Colorization using CNNs and Keyframes extraction: An application in saving bandwidth
Title | Video Colorization using CNNs and Keyframes extraction: An application in saving bandwidth |
Authors | Ankur Singh, Anurag Chanani, Harish Karnick |
Abstract | In this paper, we tackle the problem of colorizing grayscale videos to reduce bandwidth usage. For this task, we use some colored keyframes as reference images from the colored version of the grayscale video. We propose a model that extracts keyframes from a colored video and trains a Convolutional Neural Network from scratch on these colored frames. The extracted keyframes give us good knowledge of the colors used in the video, which helps us colorize its grayscale version efficiently. One application of the technique we propose is saving bandwidth when sending raw colored videos that haven't gone through any compression. A raw colored video takes up around three times more memory than its grayscale version. We can exploit this fact and send a grayscale video along with our trained model instead of a colored video. Later in this paper, we show how this technique can reduce bandwidth usage by up to a factor of three when transmitting raw colored videos. |
Tasks | Colorization |
Published | 2018-12-07 |
URL | http://arxiv.org/abs/1812.03858v3 |
PDF | http://arxiv.org/pdf/1812.03858v3.pdf |
PWC | https://paperswithcode.com/paper/video-colorization-using-cnns-and-keyframes |
Repo | https://github.com/achanani98/resume_shit |
Framework | none |
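The keyframe-extraction step can be illustrated with a common histogram-difference criterion; the paper's exact criterion may differ. A frame becomes a keyframe when its intensity histogram drifts far enough from the last keyframe's:

```python
# A simple histogram-difference keyframe extractor (one common realization).
import numpy as np

def extract_keyframes(frames, threshold=0.3, bins=32):
    """frames: iterable of (h, w, 3) uint8 arrays; returns keyframe indices."""
    def hist(f):
        h = np.histogram(f, bins=bins, range=(0, 255))[0].astype(float)
        return h / h.sum()
    keys, ref = [0], None
    for i, frame in enumerate(frames):
        h = hist(frame)
        if ref is None:
            ref = h
            continue
        if np.abs(h - ref).sum() / 2 > threshold:   # total-variation distance
            keys.append(i)
            ref = h
    return keys

frames = [np.full((8, 8, 3), v, np.uint8) for v in (0, 0, 120, 125, 250)]
print(extract_keyframes(frames))  # [0, 2, 4]
```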
A Tree Search Algorithm for Sequence Labeling
Title | A Tree Search Algorithm for Sequence Labeling |
Authors | Yadi Lao, Jun Xu, Yanyan Lan, Jiafeng Guo, Sheng Gao, Xueqi Cheng |
Abstract | In this paper we propose a novel reinforcement-learning-based model for sequence tagging, referred to as MM-Tag. Inspired by the success and methodology of AlphaGo Zero, MM-Tag formalizes sequence tagging as a Monte Carlo tree search (MCTS) enhanced Markov decision process (MDP), in which the time steps correspond to the positions of words in a sentence from left to right, and each action corresponds to assigning a tag to a word. Two long short-term memory (LSTM) networks are used to summarize the past tag assignments and the words in the sentence. Based on the outputs of the LSTMs, a policy for guiding the tag assignment and a value predicting the tagging accuracy of the whole sentence are produced. The policy and value are then strengthened with MCTS, which takes the raw policy and value as inputs, simulates and evaluates possible tag assignments at subsequent positions, and outputs a better search policy for assigning tags. A reinforcement learning algorithm is proposed to train the model parameters. Our work is the first to apply the MCTS-enhanced MDP model to the sequence tagging task. We show that MM-Tag can accurately predict the tags thanks to the exploratory decision-making mechanism introduced by MCTS. Experimental results on a chunking benchmark show that MM-Tag outperforms state-of-the-art sequence tagging baselines, including CRF and CRF with LSTM. |
Tasks | Chunking, Decision Making |
Published | 2018-04-29 |
URL | http://arxiv.org/abs/1804.10911v2 |
PDF | http://arxiv.org/pdf/1804.10911v2.pdf |
PWC | https://paperswithcode.com/paper/a-tree-search-algorithm-for-sequence-labeling |
Repo | https://github.com/YadiLao/MM-Tag |
Framework | tf |
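A toy sketch of MCTS over left-to-right tag assignments conveys the control flow MM-Tag describes, though a placeholder scorer stands in here for the paper's LSTM policy and value networks, and the UCT details are the standard ones rather than MM-Tag's:

```python
# A toy left-to-right MCTS tagger: at each position, run UCT with random
# rollouts and commit to the most-visited tag. The scorer is a placeholder.
import math, random

TAGS = ["B", "I", "O"]

def score(tags, sentence):          # placeholder value for a complete tagging
    return sum(random.Random(hash((t, w))).random() for t, w in zip(tags, sentence))

def mcts_tag(sentence, sims=200, c=1.4):
    tags = []
    for pos in range(len(sentence)):
        stats = {t: [0, 0.0] for t in TAGS}            # tag -> [visits, total value]
        for _ in range(sims):
            total = sum(v[0] for v in stats.values()) + 1
            t = max(TAGS, key=lambda a: stats[a][1] / (stats[a][0] + 1e-9)
                    + c * math.sqrt(math.log(total) / (stats[a][0] + 1e-9)))
            rollout = tags + [t] + [random.choice(TAGS)
                                    for _ in range(len(sentence) - pos - 1)]
            v = score(rollout, sentence) / len(sentence)   # simulate to the end
            stats[t][0] += 1
            stats[t][1] += v
        tags.append(max(TAGS, key=lambda a: stats[a][0]))  # most-visited tag wins
    return tags

print(mcts_tag("the cat sat on the mat".split()))
```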