Paper Group AWR 129
An Effective Single-Image Super-Resolution Model Using Squeeze-and-Excitation Networks. When Vehicles See Pedestrians with Phones: A Multi-Cue Framework for Recognizing Phone-based Activities of Pedestrians. Consistent Robust Adversarial Prediction for General Multiclass Classification. Dynamic Vision Sensors for Human Activity Recognition. Inhibited Softmax for Uncertainty Estimation in Neural Networks. Road Segmentation Using CNN and Distributed LSTM. Bayesian Neural Network Ensembles. DeepScores – A Dataset for Segmentation, Detection and Classification of Tiny Objects. RecurJac: An Efficient Recursive Algorithm for Bounding Jacobian Matrix of Neural Networks and Its Applications. Latency and Throughput Characterization of Convolutional Neural Networks for Mobile Computer Vision. Constraint-based Sequential Pattern Mining with Decision Diagrams. CapsGAN: Using Dynamic Routing for Generative Adversarial Networks. InstaGAN: Instance-aware Image-to-Image Translation. Video Colorization using CNNs and Keyframes extraction: An application in saving bandwidth. A Tree Search Algorithm for Sequence Labeling.
An Effective Single-Image Super-Resolution Model Using Squeeze-and-Excitation Networks
Title | An Effective Single-Image Super-Resolution Model Using Squeeze-and-Excitation Networks |
Authors | Kangfu Mei, Aiwen Jiang, Juncheng Li, Jihua Ye, Mingwen Wang |
Abstract | Recent works on single-image super-resolution concentrate on improving performance by enhancing spatial encoding between convolutional layers. In this paper, we focus on modeling the correlations between channels of convolutional features. We present an effective deep residual network based on squeeze-and-excitation blocks (SEBlocks) to reconstruct a high-resolution (HR) image from a low-resolution (LR) image. The SEBlock is used to adaptively recalibrate channel-wise feature mappings. Further, short connections between SEBlocks are used to remedy information loss. Extensive experiments show that our model achieves state-of-the-art performance and recovers finer texture details. |
Tasks | Image Super-Resolution, Super-Resolution |
Published | 2018-10-03 |
URL | http://arxiv.org/abs/1810.01831v1 |
PDF | http://arxiv.org/pdf/1810.01831v1.pdf |
PWC | https://paperswithcode.com/paper/an-effective-single-image-super-resolution |
Repo | https://github.com/MKFMIKU/SrSENet |
Framework | pytorch |
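The channel recalibration described in the abstract follows the standard squeeze-and-excitation pattern: globally pool each channel, pass the result through a small bottleneck MLP, and rescale the channels by the resulting gates. A minimal PyTorch sketch, with an assumed reduction ratio of 16 (the actual SrSENet layer sizes may differ):

```python
# A minimal squeeze-and-excitation block, assuming the usual reduction-ratio design.
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # squeeze: global spatial average
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                            # per-channel gates in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                  # excite: recalibrate channels

x = torch.randn(2, 64, 32, 32)
print(SEBlock(64)(x).shape)  # torch.Size([2, 64, 32, 32])
```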
When Vehicles See Pedestrians with Phones: A Multi-Cue Framework for Recognizing Phone-based Activities of Pedestrians
Title | When Vehicles See Pedestrians with Phones: A Multi-Cue Framework for Recognizing Phone-based Activities of Pedestrians |
Authors | Akshay Rangesh, Mohan M. Trivedi |
Abstract | The intelligent vehicle community has devoted considerable efforts to model driver behavior, and in particular to detect and overcome driver distraction in an effort to reduce accidents caused by driver negligence. However, as the domain increasingly shifts towards autonomous and semi-autonomous solutions, the driver is no longer integral to the decision making process, indicating a need to refocus efforts elsewhere. To this end, we propose to study pedestrian distraction instead. In particular, we focus on detecting pedestrians who are engaged in secondary activities involving their cellphones and similar handheld multimedia devices from a purely vision-based standpoint. To achieve this objective, we propose a pipeline incorporating articulated human pose estimation, followed by a soft object label transfer from an ensemble of exemplar SVMs trained on the nearest neighbors in pose feature space. We additionally incorporate head gaze features and prior pose information to carry out cellphone related pedestrian activity recognition. Finally, we offer a method to reliably track the articulated pose of a pedestrian through a sequence of images using a particle filter with a Gaussian Process Dynamical Model (GPDM), which can then be used to estimate sequentially varying activity scores at a very low computational cost. The entire framework is fast (especially for sequential data) and accurate, and easily extensible to include other secondary activities and sources of distraction. |
Tasks | Activity Recognition, Decision Making, Pose Estimation |
Published | 2018-01-24 |
URL | http://arxiv.org/abs/1801.08234v1 |
PDF | http://arxiv.org/pdf/1801.08234v1.pdf |
PWC | https://paperswithcode.com/paper/when-vehicles-see-pedestrians-with-phonesa |
Repo | https://github.com/ginn24/ICE3050-41 |
Framework | tf |
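The soft label-transfer step lends itself to a compact illustration. The sketch below substitutes plain distance-weighted neighbor voting in pose-feature space for the paper's ensemble of exemplar SVMs, so it shows the shape of the computation rather than the actual method; the 26-dimensional pose feature (13 joints × 2 coordinates) is a hypothetical choice:

```python
# A hypothetical sketch of soft label transfer from exemplars nearest in
# pose-feature space; plain neighbor voting stands in for the exemplar SVMs.
import numpy as np

def soft_label_transfer(query_pose, exemplar_poses, exemplar_scores, k=5, tau=1.0):
    """query_pose: (d,), exemplar_poses: (n, d), exemplar_scores: (n, c)."""
    d = np.linalg.norm(exemplar_poses - query_pose, axis=1)  # distances in pose space
    idx = np.argsort(d)[:k]                                  # k nearest exemplars
    w = np.exp(-d[idx] / tau)                                # closer exemplars weigh more
    w /= w.sum()
    return w @ exemplar_scores[idx]                          # soft per-class activity scores

rng = np.random.default_rng(0)
poses, scores = rng.normal(size=(100, 26)), rng.random((100, 3))
print(soft_label_transfer(rng.normal(size=26), poses, scores))
```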
Consistent Robust Adversarial Prediction for General Multiclass Classification
Title | Consistent Robust Adversarial Prediction for General Multiclass Classification |
Authors | Rizal Fathony, Kaiser Asif, Anqi Liu, Mohammad Ali Bashiri, Wei Xing, Sima Behpour, Xinhua Zhang, Brian D. Ziebart |
Abstract | We propose a robust adversarial prediction framework for general multiclass classification. Our method seeks predictive distributions that robustly optimize non-convex and non-continuous multiclass loss metrics against the worst-case conditional label distributions (the adversarial distributions) that (approximately) match the statistics of the training data. Although the optimized loss metrics are non-convex and non-continuous, the dual formulation of the framework is a convex optimization problem that can be recast as a risk minimization model with a prescribed convex surrogate loss we call the adversarial surrogate loss. We show that the adversarial surrogate losses fill an existing gap in surrogate loss construction for general multiclass classification problems, by simultaneously aligning better with the original multiclass loss, guaranteeing Fisher consistency, enabling a way to incorporate rich feature spaces via the kernel trick, and providing competitive performance in practice. |
Tasks | |
Published | 2018-12-18 |
URL | https://arxiv.org/abs/1812.07526v2 |
PDF | https://arxiv.org/pdf/1812.07526v2.pdf |
PWC | https://paperswithcode.com/paper/consistent-robust-adversarial-prediction-for |
Repo | https://github.com/rizalzaf/AdversarialPrediction.jl |
Framework | pytorch |
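To make the adversarial surrogate concrete, the sketch below assumes the subset-maximization form reported in the authors' earlier work on the zero-one loss: AL(f, y) = max over non-empty label subsets S of (Σ_{j∈S}(f_j − f_y) + |S| − 1)/|S|. Because the best subset of a fixed size m consists of the m largest potentials, a sort over classes suffices:

```python
# A sketch of the adversarial surrogate for the zero-one loss (subset form
# assumed from the authors' earlier work on adversarial multiclass prediction).
import numpy as np

def adversarial_zero_one_loss(f: np.ndarray, y: int) -> float:
    """f: (k,) class potentials for one example; y: true label index."""
    psi = np.sort(f - f[y])[::-1]          # potentials relative to the true class
    m = np.arange(1, len(psi) + 1)
    return float(np.max((np.cumsum(psi) + m - 1) / m))  # best subset per size m

f = np.array([1.0, 2.5, 0.3, 1.8])
print(adversarial_zero_one_loss(f, y=1))  # 0.15: class 1 wins, but by < unit margin
```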
Dynamic Vision Sensors for Human Activity Recognition
Title | Dynamic Vision Sensors for Human Activity Recognition |
Authors | Stefanie Anna Baby, Bimal Vinod, Chaitanya Chinni, Kaushik Mitra |
Abstract | Unlike conventional cameras, which capture video at a fixed frame rate, Dynamic Vision Sensors (DVS) record only changes in pixel intensity values. The output of a DVS is simply a stream of discrete ON/OFF events based on the polarity of change in its pixel values. DVS has many attractive features such as low power consumption, high temporal resolution, high dynamic range and lower storage requirements. All these make DVS a very promising camera for potential applications in wearable platforms where power consumption is a major concern. In this paper, we explore the feasibility of using DVS for Human Activity Recognition (HAR). We propose to use the various slices (such as $x-y$, $x-t$, and $y-t$) of the DVS video as feature maps for HAR and denote them as Motion Maps. We show that fusing motion maps with Motion Boundary Histograms (MBH) gives good performance on the benchmark DVS dataset as well as on a real DVS gesture dataset collected by us. Interestingly, the performance of DVS is comparable to that of conventional videos, although DVS captures only sparse motion information. |
Tasks | Activity Recognition, Human Activity Recognition |
Published | 2018-03-13 |
URL | http://arxiv.org/abs/1803.04667v1 |
PDF | http://arxiv.org/pdf/1803.04667v1.pdf |
PWC | https://paperswithcode.com/paper/dynamic-vision-sensors-for-human-activity |
Repo | https://github.com/Computational-Imaging-Lab-IITM/HAR-DVS |
Framework | none |
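The motion maps are simple to build once the DVS output is viewed as a list of (x, y, t, polarity) events: each map is a 2-D histogram of event counts over one pair of axes. A sketch with assumed sensor resolution and time binning:

```python
# Motion maps as 2-D event-count histograms over (x-y), (x-t), (y-t) slices.
import numpy as np

def motion_maps(events: np.ndarray, w=128, h=128, t_bins=64):
    """events: (n, 4) array of (x, y, t, polarity); polarity is ignored here."""
    x, y, t = events[:, 0], events[:, 1], events[:, 2]
    span = t.max() - t.min() + 1e-9
    tb = ((t - t.min()) / span * (t_bins - 1)).astype(int)   # quantize timestamps
    xy, _, _ = np.histogram2d(x, y, bins=(w, h), range=((0, w), (0, h)))
    xt, _, _ = np.histogram2d(x, tb, bins=(w, t_bins))
    yt, _, _ = np.histogram2d(y, tb, bins=(h, t_bins))
    return xy, xt, yt

ev = np.column_stack([np.random.randint(0, 128, (1000, 2)),
                      np.sort(np.random.rand(1000)), np.random.choice([-1, 1], 1000)])
print([m.shape for m in motion_maps(ev)])  # [(128, 128), (128, 64), (128, 64)]
```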
Inhibited Softmax for Uncertainty Estimation in Neural Networks
Title | Inhibited Softmax for Uncertainty Estimation in Neural Networks |
Authors | Marcin Możejko, Mateusz Susik, Rafał Karczewski |
Abstract | We present a new method for uncertainty estimation and out-of-distribution detection in neural networks with softmax output. We extend the softmax layer with an additional constant input; the corresponding additional output is able to represent the uncertainty of the network. The proposed method requires neither additional parameters nor multiple forward passes, nor input preprocessing, nor out-of-distribution datasets. We show that our method performs comparably to more computationally expensive methods and outperforms baselines in our experiments in the image recognition and sentiment analysis domains. |
Tasks | Out-of-Distribution Detection, Sentiment Analysis |
Published | 2018-10-03 |
URL | http://arxiv.org/abs/1810.01861v2 |
PDF | http://arxiv.org/pdf/1810.01861v2.pdf |
PWC | https://paperswithcode.com/paper/inhibited-softmax-for-uncertainty-estimation |
Repo | https://github.com/MSusik/Inhibited-softmax |
Framework | pytorch |
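The mechanism is small enough to sketch directly: append one constant logit before the softmax and read the probability mass it absorbs as uncertainty. The constant's value below is an assumption, and the paper's training details are omitted:

```python
# A minimal sketch of the inhibited-softmax idea: one extra constant logit whose
# probability mass is read as uncertainty.
import torch

def inhibited_softmax(logits: torch.Tensor, c: float = 1.0):
    """logits: (batch, k). Returns class probabilities and an uncertainty score."""
    const = torch.full((logits.shape[0], 1), c)        # extra constant input
    p = torch.softmax(torch.cat([logits, const], dim=1), dim=1)
    return p[:, :-1], p[:, -1]                         # class probs, uncertainty

probs, unc = inhibited_softmax(torch.randn(4, 10))
print(probs.sum(dim=1) + unc)  # each row sums to 1
```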
Road Segmentation Using CNN and Distributed LSTM
Title | Road Segmentation Using CNN and Distributed LSTM |
Authors | Yecheng Lyu, Lin Bai, Xinming Huang |
Abstract | In automated driving systems (ADS) and advanced driver-assistance systems (ADAS), efficient road segmentation is necessary to perceive the drivable region and build an occupancy map for path planning. Existing algorithms implement gigantic convolutional neural networks (CNNs) that are computationally expensive and time-consuming. In this paper, we introduce the distributed LSTM, a neural network widely used in audio and video processing, to process rows and columns of images and feature maps. We then propose a new network combining convolutional and distributed LSTM layers to solve the road segmentation problem. Finally, the network is trained and tested on the KITTI road benchmark. The results show that the combined structure enhances feature extraction and processing while taking less processing time than a pure CNN structure. |
Tasks | |
Published | 2018-08-10 |
URL | http://arxiv.org/abs/1808.04450v2 |
PDF | http://arxiv.org/pdf/1808.04450v2.pdf |
PWC | https://paperswithcode.com/paper/road-segmentation-using-cnn-and-distributed |
Repo | https://github.com/Evvvvvvvva/AutonomousDriving |
Framework | tf |
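The distributed-LSTM idea - treating each row and each column of a feature map as a sequence - can be sketched as follows; channel counts, bidirectionality, and the concatenation-based fusion are assumptions rather than the paper's exact design:

```python
# A sketch of LSTMs run across the rows and columns of a CNN feature map.
import torch
import torch.nn as nn

class RowColumnLSTM(nn.Module):
    def __init__(self, channels: int, hidden: int):
        super().__init__()
        self.row_lstm = nn.LSTM(channels, hidden, batch_first=True, bidirectional=True)
        self.col_lstm = nn.LSTM(channels, hidden, batch_first=True, bidirectional=True)

    def forward(self, fmap: torch.Tensor) -> torch.Tensor:
        b, c, h, w = fmap.shape
        rows = fmap.permute(0, 2, 3, 1).reshape(b * h, w, c)  # each row is a sequence
        cols = fmap.permute(0, 3, 2, 1).reshape(b * w, h, c)  # each column is a sequence
        r, _ = self.row_lstm(rows)                            # (b*h, w, 2*hidden)
        v, _ = self.col_lstm(cols)                            # (b*w, h, 2*hidden)
        r = r.reshape(b, h, w, -1).permute(0, 3, 1, 2)
        v = v.reshape(b, w, h, -1).permute(0, 3, 2, 1)
        return torch.cat([r, v], dim=1)                       # fuse row and column context

out = RowColumnLSTM(channels=32, hidden=16)(torch.randn(1, 32, 24, 24))
print(out.shape)  # torch.Size([1, 64, 24, 24])
```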
Bayesian Neural Network Ensembles
Title | Bayesian Neural Network Ensembles |
Authors | Tim Pearce, Mohamed Zaki, Andy Neely |
Abstract | Ensembles of neural networks (NNs) have long been used to estimate predictive uncertainty; a small number of NNs are trained from different initialisations and sometimes on differing versions of the dataset. The variance of the ensemble’s predictions is interpreted as its epistemic uncertainty. The appeal of ensembling stems from being a collection of regular NNs - this makes them both scalable and easily implementable. They have achieved strong empirical results in recent years, often presented as a practical alternative to more costly Bayesian NNs (BNNs). The departure from Bayesian methodology is of concern since the Bayesian framework provides a principled, widely-accepted approach to handling uncertainty. In this extended abstract we derive and implement a modified NN ensembling scheme, which provides a consistent estimator of the Bayesian posterior in wide NNs - regularising parameters about values drawn from a prior distribution. |
Tasks | |
Published | 2018-11-27 |
URL | http://arxiv.org/abs/1811.12188v1 |
PDF | http://arxiv.org/pdf/1811.12188v1.pdf |
PWC | https://paperswithcode.com/paper/bayesian-neural-network-ensembles |
Repo | https://github.com/petteriTeikari/pyML_regression_skeleton |
Framework | none |
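The anchoring scheme is easy to state in code: each ensemble member draws per-parameter anchor values from the prior at initialisation and is regularised toward those anchors instead of toward zero. A minimal sketch with an illustrative regularisation strength:

```python
# A minimal sketch of anchored ensembling: regularise each member's weights
# about member-specific values drawn from the prior.
import torch
import torch.nn as nn

def make_member(prior_std=1.0):
    net = nn.Sequential(nn.Linear(1, 50), nn.ReLU(), nn.Linear(50, 1))
    anchors = [torch.randn_like(p) * prior_std for p in net.parameters()]  # prior draw
    return net, anchors

def anchored_loss(net, anchors, x, y, lam=1e-3):
    mse = nn.functional.mse_loss(net(x), y)
    reg = sum(((p - a) ** 2).sum() for p, a in zip(net.parameters(), anchors))
    return mse + lam * reg    # pull weights toward their member-specific anchor

x, y = torch.linspace(-1, 1, 64).unsqueeze(1), torch.rand(64, 1)
net, anchors = make_member()
print(anchored_loss(net, anchors, x, y).item())
```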
DeepScores – A Dataset for Segmentation, Detection and Classification of Tiny Objects
Title | DeepScores – A Dataset for Segmentation, Detection and Classification of Tiny Objects |
Authors | Lukas Tuggener, Ismail Elezi, Jürgen Schmidhuber, Marcello Pelillo, Thilo Stadelmann |
Abstract | We present the DeepScores dataset with the goal of advancing the state-of-the-art in small-object recognition, and of placing the question of object recognition in the context of scene understanding. DeepScores contains high-quality images of musical scores, partitioned into 300,000 sheets of written music that contain symbols of different shapes and sizes. With close to a hundred million small objects, this makes our dataset not only unique, but also the largest public dataset. DeepScores comes with ground truth for object classification, detection and semantic segmentation. DeepScores thus poses a relevant challenge for computer vision in general, beyond the scope of optical music recognition (OMR) research. We present a detailed statistical analysis of the dataset, comparing it with other computer vision datasets such as Caltech101/256, PASCAL VOC, SUN, SVHN, ImageNet, MS-COCO, smaller computer vision datasets, as well as with other OMR datasets. Finally, we provide baseline performances for object classification and give pointers to future research based on this dataset. |
Tasks | Object Classification, Object Recognition, Scene Understanding, Semantic Segmentation |
Published | 2018-03-27 |
URL | http://arxiv.org/abs/1804.00525v2 |
PDF | http://arxiv.org/pdf/1804.00525v2.pdf |
PWC | https://paperswithcode.com/paper/deepscores-a-dataset-for-segmentation |
Repo | https://github.com/ErenO/segmentation-dataset |
Framework | none |
RecurJac: An Efficient Recursive Algorithm for Bounding Jacobian Matrix of Neural Networks and Its Applications
Title | RecurJac: An Efficient Recursive Algorithm for Bounding Jacobian Matrix of Neural Networks and Its Applications |
Authors | Huan Zhang, Pengchuan Zhang, Cho-Jui Hsieh |
Abstract | The Jacobian matrix (or the gradient for single-output networks) is directly related to many important properties of neural networks, such as the function landscape, stationary points, (local) Lipschitz constants and robustness to adversarial attacks. In this paper, we propose a recursive algorithm, RecurJac, to compute both upper and lower bounds for each element in the Jacobian matrix of a neural network with respect to the network's input, where the network can contain a wide range of activation functions. As a byproduct, we can efficiently obtain a (local) Lipschitz constant, which plays a crucial role in neural network robustness verification, as well as in the training stability of GANs. Experiments show that the (local) Lipschitz constants produced by our method are of better quality than those of previous approaches, thus providing better robustness verification results. Our algorithm has polynomial time complexity, and its computation time is reasonable even for relatively large networks. Additionally, we use our bounds on the Jacobian matrix to characterize the landscape of the neural network, for example, to determine whether there exist stationary points in a local neighborhood. Source code available at \url{http://github.com/huanzhang12/RecurJac-Jacobian-bounds}. |
Tasks | |
Published | 2018-10-28 |
URL | http://arxiv.org/abs/1810.11783v2 |
PDF | http://arxiv.org/pdf/1810.11783v2.pdf |
PWC | https://paperswithcode.com/paper/recurjac-an-efficient-recursive-algorithm-for |
Repo | https://github.com/huanzhang12/RecurJac-and-CROWN |
Framework | tf |
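RecurJac's recursion is too involved to reproduce here, but the object it bounds is easy to show. The sketch below computes much looser interval-arithmetic bounds on the same Jacobian for a two-layer ReLU network, where J = W2 · D · W1 and the unknown ReLU gates D lie in [0, 1] - a naive baseline, not RecurJac itself:

```python
# A loose interval-arithmetic baseline for entrywise Jacobian bounds.
import numpy as np

def interval_matmul(al, au, bl, bu):
    """Entrywise bounds on A @ B when A lies in [al, au] and B in [bl, bu]."""
    prods = np.stack([al[:, :, None] * bl[None], al[:, :, None] * bu[None],
                      au[:, :, None] * bl[None], au[:, :, None] * bu[None]])
    return prods.min(axis=0).sum(axis=1), prods.max(axis=0).sum(axis=1)

def jacobian_bounds(w1, w2):
    h = w1.shape[0]
    dl, du = np.zeros((h, h)), np.eye(h)          # unknown ReLU gates in [0, 1]
    ml, mu = interval_matmul(dl, du, w1, w1)      # bounds on D @ W1
    return interval_matmul(w2, w2, ml, mu)        # bounds on W2 @ (D @ W1)

rng = np.random.default_rng(0)
w1, w2 = rng.normal(size=(8, 4)), rng.normal(size=(3, 8))
jl, ju = jacobian_bounds(w1, w2)
print(jl.shape, bool((jl <= ju).all()))  # (3, 4) True
```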
Latency and Throughput Characterization of Convolutional Neural Networks for Mobile Computer Vision
Title | Latency and Throughput Characterization of Convolutional Neural Networks for Mobile Computer Vision |
Authors | Jussi Hanhirova, Teemu Kämäräinen, Sipi Seppälä, Matti Siekkinen, Vesa Hirvisalo, Antti Ylä-Jääski |
Abstract | We study performance characteristics of convolutional neural networks (CNN) for mobile computer vision systems. CNNs have proven to be a powerful and efficient approach to implement such systems. However, the system performance depends largely on the utilization of hardware accelerators, which are able to speed up the execution of the underlying mathematical operations tremendously through massive parallelism. Our contribution is performance characterization of multiple CNN-based models for object recognition and detection with several different hardware platforms and software frameworks, using both local (on-device) and remote (network-side server) computation. The measurements are conducted using real workloads and real processing platforms. On the platform side, we concentrate especially on TensorFlow and TensorRT. Our measurements include embedded processors found on mobile devices and high-performance processors that can be used on the network side of mobile systems. We show that there exist significant latency–throughput trade-offs, but the behavior is very complex. We demonstrate and discuss several factors that affect the performance and yield this complex behavior. |
Tasks | Object Recognition |
Published | 2018-03-26 |
URL | http://arxiv.org/abs/1803.09492v1 |
PDF | http://arxiv.org/pdf/1803.09492v1.pdf |
PWC | https://paperswithcode.com/paper/latency-and-throughput-characterization-of |
Repo | https://github.com/Dhananjayadmd/DNN_MP |
Framework | none |
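The measurement itself is conceptually simple, even though the paper performs it across many models, platforms, and frameworks. A toy sketch of the latency/throughput trade-off over batch size, with a stand-in for the real inference call:

```python
# A toy latency/throughput characterization over batch size.
import time

def fake_model(batch):                 # stand-in for a real CNN inference call
    time.sleep(0.002 + 0.0005 * len(batch))

def characterize(batch_sizes=(1, 4, 16, 64), reps=20):
    for bs in batch_sizes:
        batch = [0] * bs
        start = time.perf_counter()
        for _ in range(reps):
            fake_model(batch)
        latency = (time.perf_counter() - start) / reps        # seconds per batch
        print(f"bs={bs:3d}  latency={latency*1e3:6.2f} ms  "
              f"throughput={bs/latency:8.1f} img/s")

characterize()
```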
Constraint-based Sequential Pattern Mining with Decision Diagrams
Title | Constraint-based Sequential Pattern Mining with Decision Diagrams |
Authors | Amin Hosseininasab, Willem-Jan van Hoeve, Andre A. Cire |
Abstract | Constrained sequential pattern mining aims at identifying frequent patterns on a sequential database of items while observing constraints defined over the item attributes. We introduce novel techniques for constraint-based sequential pattern mining that rely on a multi-valued decision diagram representation of the database. Specifically, our representation can accommodate multiple item attributes and various constraint types, including a number of non-monotone constraints. To evaluate the applicability of our approach, we develop an MDD-based prefix-projection algorithm and compare its performance against a typical generate-and-check variant, as well as a state-of-the-art constraint-based sequential pattern mining algorithm. Results show that our approach is competitive with or superior to these other methods in terms of scalability and efficiency. |
Tasks | Sequential Pattern Mining |
Published | 2018-11-14 |
URL | http://arxiv.org/abs/1811.06086v1 |
PDF | http://arxiv.org/pdf/1811.06086v1.pdf |
PWC | https://paperswithcode.com/paper/constraint-based-sequential-pattern-mining |
Repo | https://github.com/aminhn/MPP |
Framework | none |
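For context, the sketch below implements a tiny prefix-projection miner of the generate-and-check kind the paper compares against - not the MDD-based method itself. Attribute constraints would be checked wherever a pattern is extended:

```python
# A tiny PrefixSpan-style frequent-sequence miner (baseline, not the MDD method).
def prefix_span(db, min_support, prefix=()):
    """db: list of item sequences (tuples). Yields (pattern, support) pairs."""
    counts = {}
    for seq in db:
        for item in set(seq):                 # each item counted once per sequence
            counts[item] = counts.get(item, 0) + 1
    for item, sup in sorted(counts.items()):
        if sup < min_support:
            continue
        pattern = prefix + (item,)
        yield pattern, sup
        projected = [seq[seq.index(item) + 1:]          # suffixes after first match
                     for seq in db if item in seq]
        yield from prefix_span(projected, min_support, pattern)

db = [("a", "b", "c"), ("a", "c", "b"), ("a", "b", "b")]
for pat, sup in prefix_span(db, min_support=2):
    print(pat, sup)   # ('a',) 3 / ('a','b') 3 / ('a','c') 2 / ('b',) 3 / ('c',) 2
```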
CapsGAN: Using Dynamic Routing for Generative Adversarial Networks
Title | CapsGAN: Using Dynamic Routing for Generative Adversarial Networks |
Authors | Raeid Saqur, Sal Vivona |
Abstract | In this paper, we propose a novel technique for generating images in the 3D domain from images with a high degree of geometric transformation. By coalescing two popular concurrent methods that have seen rapid ascension to the machine learning zeitgeist in recent years - GANs (Goodfellow et al.) and capsule networks (Sabour, Hinton et al.) - we present: \textbf{CapsGAN}. We show that CapsGAN performs better than or on par with traditional CNN-based GANs in generating images with high geometric transformations using rotated MNIST. In the process, we also show the efficacy of using a capsule architecture in the GANs domain. Furthermore, we tackle the Gordian knot of training GANs - performance control and training stability - by experimenting with the Wasserstein distance (gradient clipping, penalty) and spectral normalization. The experimental findings of this paper should propel the application of capsules and GANs in the still exciting and nascent domain of 3D image generation, and plausibly video (frame) generation. |
Tasks | Image Generation |
Published | 2018-06-07 |
URL | http://arxiv.org/abs/1806.03968v1 |
PDF | http://arxiv.org/pdf/1806.03968v1.pdf |
PWC | https://paperswithcode.com/paper/capsgan-using-dynamic-routing-for-generative |
Repo | https://github.com/raeidsaqur/CapsGAN |
Framework | pytorch |
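Of the training-stability tools the abstract mentions, the gradient penalty is the most self-contained. The sketch below is the standard WGAN-GP term, not CapsGAN-specific code:

```python
# The standard WGAN gradient-penalty term: push the critic's gradient norm on
# random real/fake interpolates toward 1.
import torch

def gradient_penalty(critic, real, fake):
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    mix = (eps * real + (1 - eps) * fake).requires_grad_(True)  # interpolates
    grads, = torch.autograd.grad(critic(mix).sum(), mix, create_graph=True)
    return ((grads.flatten(1).norm(2, dim=1) - 1) ** 2).mean()

critic = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(784, 1))
real, fake = torch.randn(8, 1, 28, 28), torch.randn(8, 1, 28, 28)
print(gradient_penalty(critic, real, fake).item())
```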
InstaGAN: Instance-aware Image-to-Image Translation
Title | InstaGAN: Instance-aware Image-to-Image Translation |
Authors | Sangwoo Mo, Minsu Cho, Jinwoo Shin |
Abstract | Unsupervised image-to-image translation has gained considerable attention due to the recent impressive progress based on generative adversarial networks (GANs). However, previous methods often fail in challenging cases, in particular, when an image has multiple target instances and a translation task involves significant changes in shape, e.g., translating pants to skirts in fashion images. To tackle the issues, we propose a novel method, coined instance-aware GAN (InstaGAN), that incorporates the instance information (e.g., object segmentation masks) and improves multi-instance transfiguration. The proposed method translates both an image and the corresponding set of instance attributes while maintaining the permutation invariance property of the instances. To this end, we introduce a context preserving loss that encourages the network to learn the identity function outside of target instances. We also propose a sequential mini-batch inference/training technique that handles multiple instances with limited GPU memory and enhances the network to generalize better for multiple instances. Our comparative evaluation demonstrates the effectiveness of the proposed method on different image datasets, in particular, in the aforementioned challenging cases. Code and results are available at https://github.com/sangwoomo/instagan |
Tasks | Image-to-Image Translation, Semantic Segmentation, Unsupervised Image-To-Image Translation |
Published | 2018-12-28 |
URL | http://arxiv.org/abs/1812.10889v2 |
PDF | http://arxiv.org/pdf/1812.10889v2.pdf |
PWC | https://paperswithcode.com/paper/instagan-instance-aware-image-to-image |
Repo | https://github.com/sangwoomo/instagan |
Framework | pytorch |
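The context-preserving loss is the most compact of the listed ingredients: outside the union of instance masks, the translation should act as the identity. A sketch with assumed mask semantics and weighting (the authors' repo has the definitive version):

```python
# A sketch of a context-preserving loss: L1 between input and translation,
# restricted to the background (outside all instance masks).
import torch

def context_preserving_loss(x, y, masks):
    """x, y: (b, 3, h, w) input/translated images; masks: (b, n, h, w) in {0, 1}."""
    background = 1 - masks.amax(dim=1, keepdim=True)   # 1 outside all instances
    return (background * (x - y)).abs().mean()         # identity penalty on background

x, y = torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64)
masks = (torch.rand(2, 4, 64, 64) > 0.8).float()
print(context_preserving_loss(x, y, masks).item())
```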
Video Colorization using CNNs and Keyframes extraction: An application in saving bandwidth
Title | Video Colorization using CNNs and Keyframes extraction: An application in saving bandwidth |
Authors | Ankur Singh, Anurag Chanani, Harish Karnick |
Abstract | In this paper, we tackle the problem of colorizing grayscale videos to reduce bandwidth usage. For this task, we use some colored keyframes as reference images from the colored version of the grayscale video. We propose a model that extracts keyframes from a colored video and trains a Convolutional Neural Network from scratch on these colored frames. The extracted keyframes give us good knowledge of the colors used in the video, which helps us colorize its grayscale version efficiently. One application of the technique we propose is saving bandwidth when sending raw colored videos that haven't gone through any compression. A raw colored video takes up around three times more memory than its grayscale version. We can exploit this fact and send a grayscale video along with our trained model instead of a colored video. Later in this paper, we show how this technique can reduce bandwidth usage by up to a factor of three when transmitting raw colored videos. |
Tasks | Colorization |
Published | 2018-12-07 |
URL | http://arxiv.org/abs/1812.03858v3 |
PDF | http://arxiv.org/pdf/1812.03858v3.pdf |
PWC | https://paperswithcode.com/paper/video-colorization-using-cnns-and-keyframes |
Repo | https://github.com/achanani98/resume_shit |
Framework | none |
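The keyframe-extraction step can be illustrated with a common histogram-difference criterion; the paper's exact criterion may differ. A frame becomes a keyframe when its intensity histogram drifts far enough from the last keyframe's:

```python
# A simple histogram-difference keyframe extractor (one common realization).
import numpy as np

def extract_keyframes(frames, threshold=0.3, bins=32):
    """frames: iterable of (h, w, 3) uint8 arrays; returns keyframe indices."""
    def hist(f):
        h = np.histogram(f, bins=bins, range=(0, 255))[0].astype(float)
        return h / h.sum()
    keys, ref = [0], None
    for i, frame in enumerate(frames):
        h = hist(frame)
        if ref is None:
            ref = h
            continue
        if np.abs(h - ref).sum() / 2 > threshold:   # total-variation distance
            keys.append(i)
            ref = h
    return keys

frames = [np.full((8, 8, 3), v, np.uint8) for v in (0, 0, 120, 125, 250)]
print(extract_keyframes(frames))  # [0, 2, 4]
```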
A Tree Search Algorithm for Sequence Labeling
Title | A Tree Search Algorithm for Sequence Labeling |
Authors | Yadi Lao, Jun Xu, Yanyan Lan, Jiafeng Guo, Sheng Gao, Xueqi Cheng |
Abstract | In this paper we propose a novel reinforcement-learning-based model for sequence tagging, referred to as MM-Tag. Inspired by the success and methodology of AlphaGo Zero, MM-Tag formalizes sequence tagging as a Monte Carlo tree search (MCTS) enhanced Markov decision process (MDP), in which the time steps correspond to the positions of words in a sentence from left to right, and each action corresponds to assigning a tag to a word. Two long short-term memory (LSTM) networks are used to summarize the past tag assignments and the words in the sentence. Based on the outputs of the LSTMs, a policy for guiding the tag assignment and a value predicting the tagging accuracy of the whole sentence are produced. The policy and value are then strengthened with MCTS, which takes the raw policy and value as inputs, simulates and evaluates possible tag assignments at subsequent positions, and outputs a better search policy for assigning tags. A reinforcement learning algorithm is proposed to train the model parameters. Our work is the first to apply the MCTS-enhanced MDP model to the sequence tagging task. We show that MM-Tag can accurately predict the tags thanks to the exploratory decision-making mechanism introduced by MCTS. Experimental results on a chunking benchmark show that MM-Tag outperforms state-of-the-art sequence tagging baselines, including CRF and CRF with LSTM. |
Tasks | Chunking, Decision Making |
Published | 2018-04-29 |
URL | http://arxiv.org/abs/1804.10911v2 |
PDF | http://arxiv.org/pdf/1804.10911v2.pdf |
PWC | https://paperswithcode.com/paper/a-tree-search-algorithm-for-sequence-labeling |
Repo | https://github.com/YadiLao/MM-Tag |
Framework | tf |
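A toy sketch of MCTS over left-to-right tag assignments conveys the control flow MM-Tag describes, though a placeholder scorer stands in here for the paper's LSTM policy and value networks, and the UCT details are the standard ones rather than MM-Tag's:

```python
# A toy left-to-right MCTS tagger: at each position, run UCT with random
# rollouts and commit to the most-visited tag. The scorer is a placeholder.
import math, random

TAGS = ["B", "I", "O"]

def score(tags, sentence):          # placeholder value for a complete tagging
    return sum(random.Random(hash((t, w))).random() for t, w in zip(tags, sentence))

def mcts_tag(sentence, sims=200, c=1.4):
    tags = []
    for pos in range(len(sentence)):
        stats = {t: [0, 0.0] for t in TAGS}            # tag -> [visits, total value]
        for _ in range(sims):
            total = sum(v[0] for v in stats.values()) + 1
            t = max(TAGS, key=lambda a: stats[a][1] / (stats[a][0] + 1e-9)
                    + c * math.sqrt(math.log(total) / (stats[a][0] + 1e-9)))
            rollout = tags + [t] + [random.choice(TAGS)
                                    for _ in range(len(sentence) - pos - 1)]
            v = score(rollout, sentence) / len(sentence)   # simulate to the end
            stats[t][0] += 1
            stats[t][1] += v
        tags.append(max(TAGS, key=lambda a: stats[a][0]))  # most-visited tag wins
    return tags

print(mcts_tag("the cat sat on the mat".split()))
```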