July 28, 2019

Paper Group ANR 335

Transfer Learning with Binary Neural Networks

Title Transfer Learning with Binary Neural Networks
Authors Sam Leroux, Steven Bohez, Tim Verbelen, Bert Vankeirsbilck, Pieter Simoens, Bart Dhoedt
Abstract Previous work has shown that it is possible to train deep neural networks with low-precision weights and activations. In the extreme case it is even possible to constrain the network to binary values. The costly floating-point multiplications are then reduced to fast logical operations. High-end smartphones such as Google’s Pixel 2 and Apple’s iPhone X are already equipped with specialised hardware for image processing, and it is very likely that other future consumer hardware will also have dedicated accelerators for deep neural networks. Binary neural networks are attractive in this case because the logical operations are very fast and efficient when implemented in hardware. We propose a transfer-learning-based architecture where we first train a binary network on ImageNet and then retrain part of the network for different tasks while keeping most of the network fixed. The fixed binary part could be implemented in a hardware accelerator while the last layers of the network are evaluated in software. We show that a single binary neural network trained on the ImageNet dataset can indeed be used as a feature extractor for other datasets.
Tasks Transfer Learning
Published 2017-11-29
URL http://arxiv.org/abs/1711.10761v1
PDF http://arxiv.org/pdf/1711.10761v1.pdf
PWC https://paperswithcode.com/paper/transfer-learning-with-binary-neural-networks
Repo
Framework
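
The abstract's remark that floating-point multiplications reduce to logical operations can be illustrated with a small sketch. It assumes the common encoding in which +1 is stored as bit 1 and -1 as bit 0, so a dot product becomes an XOR followed by a bit count. The helper names `pack_bits` and `binary_dot` are hypothetical and are not taken from the paper.

```python
def pack_bits(values):
    """Pack a list of +1/-1 values into an int (assumed encoding: +1 -> bit 1, -1 -> bit 0)."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two length-n {-1, +1} vectors given their packed bits.

    Positions where the bits differ contribute -1, matching positions
    contribute +1, so the result is n - 2 * (number of mismatches).
    """
    mismatches = bin(a_bits ^ b_bits).count("1")  # XOR, then popcount
    return n - 2 * mismatches


a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
print(binary_dot(pack_bits(a), pack_bits(b), len(a)))
```

On hardware, the XOR and bit count each map to a single instruction per machine word, which is the source of the efficiency the abstract refers to.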

Budget-Aware Activity Detection with A Recurrent Policy Network

Title Budget-Aware Activity Detection with A Recurrent Policy Network
Authors Behrooz Mahasseni, Xiaodong Yang, Pavlo Molchanov, Jan Kautz
Abstract In this paper, we address the challenging problem of efficient temporal activity detection in untrimmed long videos. While most recent work has focused on improving detection accuracy, inference can take seconds to minutes per video, which is too slow to be useful in real-world settings. This motivates the proposed budget-aware framework, which learns to perform activity detection by intelligently selecting a small subset of frames according to a specified time budget. We formulate this problem as a Markov decision process, and adopt a recurrent network to model the frame selection policy. We derive a recurrent policy-gradient-based approach to approximate the gradient of the non-decomposable and non-differentiable objective defined in our problem. In extensive experiments, we achieve competitive detection accuracy and, more importantly, our approach substantially reduces computation time, detecting multiple activities in only 0.35s for each untrimmed long video.
Tasks Action Detection, Activity Detection
Published 2017-11-30
URL http://arxiv.org/abs/1712.00097v2
PDF http://arxiv.org/pdf/1712.00097v2.pdf
PWC https://paperswithcode.com/paper/budget-aware-activity-detection-with-a
Repo
Framework

Deformable Modeling for Human Body Acquired from Depth Sensors

Title Deformable Modeling for Human Body Acquired from Depth Sensors
Authors Vamshhi Pavan Kumar Varma Vegeshna
Abstract This paper presents a novel approach to reconstructing complete 3D deformable models over time with a single depth camera. The method proceeds as follows. Partial surfaces reconstructed at different capture times are assembled to form a complete 3D surface. A mesh warping algorithm based on linear mesh deformation is used to align the partial surfaces. A volumetric method is then applied to combine the partial surfaces, fill holes, and smooth alignment errors.
Tasks
Published 2017-08-17
URL http://arxiv.org/abs/1708.05401v2
PDF http://arxiv.org/pdf/1708.05401v2.pdf
PWC https://paperswithcode.com/paper/deformable-modeling-for-human-body-acquired
Repo
Framework

Reading Scene Text with Attention Convolutional Sequence Modeling

Title Reading Scene Text with Attention Convolutional Sequence Modeling
Authors Yunze Gao, Yingying Chen, Jinqiao Wang, Hanqing Lu
Abstract Reading text in the wild is a challenging task in the field of computer vision. Existing approaches mainly adopt Connectionist Temporal Classification (CTC) or attention models based on Recurrent Neural Networks (RNNs), which are computationally expensive and hard to train. In this paper, we present an end-to-end Attention Convolutional Network for scene text recognition. Firstly, instead of an RNN, we adopt stacked convolutional layers to effectively capture the contextual dependencies of the input sequence, which is characterized by lower computational complexity and easier parallel computation. Compared to the chain structure of recurrent networks, the Convolutional Neural Network (CNN) provides a natural way to capture long-term dependencies between elements, and is 9 times faster than Bidirectional Long Short-Term Memory (BLSTM). Furthermore, in order to enhance the representation of foreground text and suppress background noise, we incorporate residual attention modules into a small densely connected network to improve the discriminability of CNN features. We validate the performance of our approach on standard benchmarks, including the Street View Text, IIIT5K and ICDAR datasets. State-of-the-art or highly competitive performance and efficiency show the superiority of the proposed approach.
Tasks Scene Text Recognition
Published 2017-09-13
URL http://arxiv.org/abs/1709.04303v1
PDF http://arxiv.org/pdf/1709.04303v1.pdf
PWC https://paperswithcode.com/paper/reading-scene-text-with-attention
Repo
Framework

Predicting Target Language CCG Supertags Improves Neural Machine Translation

Title Predicting Target Language CCG Supertags Improves Neural Machine Translation
Authors Maria Nadejde, Siva Reddy, Rico Sennrich, Tomasz Dwojak, Marcin Junczys-Dowmunt, Philipp Koehn, Alexandra Birch
Abstract Neural machine translation (NMT) models are able to partially learn syntactic information from sequential lexical information. Still, some complex syntactic phenomena such as prepositional phrase attachment are poorly modeled. This work aims to answer two questions: 1) Does explicitly modeling target language syntax help NMT? 2) Is tight integration of words and syntax better than multitask training? We introduce syntactic information in the form of CCG supertags in the decoder, by interleaving the target supertags with the word sequence. Our results on WMT data show that explicitly modeling target syntax improves machine translation quality for German->English, a high-resource pair, and for Romanian->English, a low-resource pair, and also improves the handling of several syntactic phenomena, including prepositional phrase attachment. Furthermore, a tight coupling of words and syntax improves translation quality more than multitask training. By combining target syntax with source-side dependency labels added in the embedding layer, we obtain a total improvement of 0.9 BLEU for German->English and 1.2 BLEU for Romanian->English.
Tasks Machine Translation, Prepositional Phrase Attachment
Published 2017-02-03
URL http://arxiv.org/abs/1702.01147v2
PDF http://arxiv.org/pdf/1702.01147v2.pdf
PWC https://paperswithcode.com/paper/predicting-target-language-ccg-supertags
Repo
Framework
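
The interleaving of target supertags with the word sequence described in the abstract can be sketched as below. This is a minimal illustration and not the authors' preprocessing code: placing each tag before its word is an assumption, the function name `interleave_supertags` is hypothetical, and the example tags are illustrative rather than output from a real CCG tagger.

```python
def interleave_supertags(words, supertags):
    """Build one decoder target sequence that alternates supertag and word.

    Assumption: each tag is emitted immediately before the word it labels.
    """
    if len(words) != len(supertags):
        raise ValueError("need exactly one supertag per word")
    target = []
    for tag, word in zip(supertags, words):
        target.append(tag)
        target.append(word)
    return target


# Illustrative tags only.
words = ["Obama", "receives", "Netanyahu"]
tags = ["NP", "(S[dcl]\\NP)/NP", "NP"]
print(interleave_supertags(words, tags))
```

With this layout a single decoder predicts syntax and words in one stream, which is the tight coupling the abstract contrasts with multitask training.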

A convergence framework for inexact nonconvex and nonsmooth algorithms and its applications to several iterations

Title A convergence framework for inexact nonconvex and nonsmooth algorithms and its applications to several iterations
Authors Tao Sun, Hao Jiang, Lizhi Cheng, Wei Zhu
Abstract In this paper, we consider the convergence of an abstract inexact nonconvex and nonsmooth algorithm. We impose a pseudo sufficient descent condition and a pseudo relative error condition on the algorithm, both of which are related to an auxiliary sequence, and we assume that a continuity condition holds. In fact, many classical inexact nonconvex and nonsmooth algorithms satisfy these three conditions. Under a special kind of summability assumption on the auxiliary sequence, we prove that the sequence generated by the general algorithm converges to a critical point of the objective function, provided the objective has the Kurdyka-Łojasiewicz property. The core of the proofs lies in building a new Lyapunov function, whose successive difference provides a bound for the successive difference of the points generated by the algorithm. We then apply our findings to several classical nonconvex iterative algorithms and derive the corresponding convergence results.
Tasks
Published 2017-09-12
URL http://arxiv.org/abs/1709.04072v6
PDF http://arxiv.org/pdf/1709.04072v6.pdf
PWC https://paperswithcode.com/paper/a-convergence-framework-for-inexact-nonconvex
Repo
Framework
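
For orientation, the three conditions named in the abstract have well-known exact counterparts in the classical Kurdyka-Łojasiewicz convergence framework. The block below states those classical forms for a sequence $(x^k)$ and objective $f$ with constants $a, b > 0$. It is a reference sketch only: the paper's "pseudo" versions additionally involve an auxiliary sequence whose exact form is not given in the abstract and is therefore not reproduced here.

```latex
\begin{align*}
&\text{(sufficient descent)} &&
  f(x^{k+1}) + a\,\|x^{k+1} - x^{k}\|^{2} \le f(x^{k}), \\
&\text{(relative error)} &&
  \exists\, w^{k+1} \in \partial f(x^{k+1}) :\quad
  \|w^{k+1}\| \le b\,\|x^{k+1} - x^{k}\|, \\
&\text{(continuity)} &&
  \exists\, (x^{k_j})_{j},\ \tilde{x} :\quad
  x^{k_j} \to \tilde{x}
  \ \text{ and } \
  f(x^{k_j}) \to f(\tilde{x}) \ \text{ as } j \to \infty .
\end{align*}
```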

Scene Text Recognition with Sliding Convolutional Character Models

Title Scene Text Recognition with Sliding Convolutional Character Models
Authors Fei Yin, Yi-Chao Wu, Xu-Yao Zhang, Cheng-Lin Liu
Abstract Scene text recognition has attracted great interest from the computer vision and pattern recognition community in recent years. State-of-the-art methods use convolutional neural networks (CNNs), recurrent neural networks with long short-term memory (RNN-LSTM), or a combination of the two. In this paper, we investigate the intrinsic characteristics of text recognition and, inspired by human cognition mechanisms in reading texts, propose a scene text recognition method with character models on a convolutional feature map. The method simultaneously detects and recognizes characters by sliding character models over the text line image; the models are learned end-to-end on text line images labeled with text transcripts. The character classifier outputs on the sliding windows are normalized and decoded with a Connectionist Temporal Classification (CTC) based algorithm. Compared to previous methods, our method has a number of appealing properties: (1) it avoids the difficulty of character segmentation, which hinders the performance of segmentation-based recognition methods; (2) the model can be trained simply and efficiently because it avoids the vanishing/exploding gradients encountered when training RNN-LSTM based models; (3) it is based on character models trained free of lexicon, and can recognize unknown words; (4) the recognition process is highly parallel and enables fast recognition. Our experiments on several challenging English and Chinese benchmarks, including the IIIT-5K, SVT, ICDAR03/13 and TRW15 datasets, demonstrate that the proposed method yields superior or comparable performance to state-of-the-art methods while the model size is relatively small.
Tasks Scene Text Recognition
Published 2017-09-06
URL http://arxiv.org/abs/1709.01727v1
PDF http://arxiv.org/pdf/1709.01727v1.pdf
PWC https://paperswithcode.com/paper/scene-text-recognition-with-sliding
Repo
Framework
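
The CTC-based decoding of per-window classifier outputs mentioned in the abstract can be sketched with the simplest variant, greedy best-path decoding: take the top label per window, merge consecutive repeats, and drop blanks. The abstract does not say which CTC decoder the authors use, so the greedy choice, the function name `ctc_greedy_decode`, and the blank index of 0 are all assumptions.

```python
def ctc_greedy_decode(frame_scores, alphabet, blank=0):
    """Greedy (best-path) CTC decoding.

    frame_scores: one list of per-label scores for each sliding window.
    alphabet: label index -> character; alphabet[blank] is a placeholder.
    """
    # Pick the highest-scoring label in every window.
    best_path = [max(range(len(s)), key=s.__getitem__) for s in frame_scores]
    decoded = []
    previous = None
    for label in best_path:
        # Emit a character only when the label changes and is not the blank.
        if label != previous and label != blank:
            decoded.append(alphabet[label])
        previous = label
    return "".join(decoded)


alphabet = ["-", "a", "b"]  # index 0 is the blank
scores = [
    [0.1, 0.8, 0.1],  # a
    [0.1, 0.7, 0.2],  # a (repeat, merged)
    [0.9, 0.05, 0.05],  # blank separates the two a's
    [0.2, 0.6, 0.2],  # a
    [0.1, 0.2, 0.7],  # b
]
print(ctc_greedy_decode(scores, alphabet))
```

The blank label is what lets the decoder emit a doubled character: without the blank window between them, the two runs of `a` would collapse into one.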

Large Scale Variable Fidelity Surrogate Modeling

Title Large Scale Variable Fidelity Surrogate Modeling
Authors Evgeny Burnaev, Alexey Zaytsev
Abstract Engineers widely use the Gaussian process regression framework to construct surrogate models that replace computationally expensive physical models while exploring a design space. Thanks to the properties of Gaussian processes, we can use both samples generated by a high fidelity function (an expensive and accurate representation of a physical phenomenon) and samples generated by a low fidelity function (a cheap and coarse approximation of the same physical phenomenon) while constructing a surrogate model. However, if sample sizes exceed a few thousand points, the computational cost of Gaussian process regression becomes prohibitive both for learning and for prediction. We propose two approaches to circumvent this computational burden: one is based on the Nyström approximation of sample covariance matrices, and the other is based on an intelligent use of a black box that can evaluate a low fidelity function on the fly at any point of the design space. We examine the performance of the proposed approaches on a number of artificial and real problems, including engineering optimization of a rotating disk shape.
Tasks
Published 2017-07-12
URL http://arxiv.org/abs/1707.03916v1
PDF http://arxiv.org/pdf/1707.03916v1.pdf
PWC https://paperswithcode.com/paper/large-scale-variable-fidelity-surrogate
Repo
Framework
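
The Nyström approximation mentioned in the abstract replaces the full $n \times n$ covariance matrix with a low-rank factorization built from $m \ll n$ selected points. The block below gives the standard textbook form; how the paper selects the subset and combines it with the two fidelity levels is not stated in the abstract and is not shown here.

```latex
\[
  K_{nn} \;\approx\; K_{nm}\, K_{mm}^{-1}\, K_{mn},
  \qquad K_{mn} = K_{nm}^{\top},
\]
where $K_{mm}$ is the covariance matrix of the $m$ selected points and
$K_{nm}$ holds the covariances between all $n$ sample points and the
selected ones. Combined with the Woodbury identity, this lowers the cost of
solving with $K_{nn} + \sigma^{2} I$ from $O(n^{3})$ to $O(n m^{2})$.
```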

A Shapelet Transform for Multivariate Time Series Classification

Title A Shapelet Transform for Multivariate Time Series Classification
Authors Aaron Bostrom, Anthony Bagnall
Abstract Shapelets are phase independent subsequences designed for time series classification. We propose three adaptations to the Shapelet Transform (ST) to capture multivariate features in multivariate time series classification. We create a unified set of data to benchmark our work on, and compare with three other algorithms. We demonstrate that multivariate shapelets are not significantly worse than other state-of-the-art algorithms.
Tasks Time Series, Time Series Classification
Published 2017-12-18
URL http://arxiv.org/abs/1712.06428v1
PDF http://arxiv.org/pdf/1712.06428v1.pdf
PWC https://paperswithcode.com/paper/a-shapelet-transform-for-multivariate-time
Repo
Framework
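
The phase independence of shapelets comes from how a shapelet is compared with a series: it is slid along the series and the best match is kept, wherever it occurs. The sketch below shows that distance for a single univariate series. It is a simplified illustration: the function name `shapelet_distance` is hypothetical, the per-window normalization commonly used in shapelet methods is omitted, and none of the paper's three multivariate adaptations is shown.

```python
import math


def shapelet_distance(series, shapelet):
    """Minimum Euclidean distance between a shapelet and any window of the series."""
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best = math.inf
    # Slide the shapelet over every window of matching length.
    for start in range(len(series) - m + 1):
        squared = sum((series[start + i] - shapelet[i]) ** 2 for i in range(m))
        best = min(best, squared)
    return math.sqrt(best)


series = [0, 1, 2, 3, 2, 1]
print(shapelet_distance(series, [2, 3, 2]))  # exact match at offset 2
```

In a shapelet transform, each series is then represented by its vector of distances to a set of chosen shapelets, and any standard classifier can be trained on those vectors.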

Optimal Weighting for Exam Composition

Title Optimal Weighting for Exam Composition
Authors Sam Ganzfried, Farzana Yusuf
Abstract A problem faced by many instructors is that of designing exams that accurately assess the abilities of the students. Typically these exams are prepared several days in advance, and generic question scores are used based on rough approximation of the question difficulty and length. For example, for a recent class taught by the author, there were 30 multiple choice questions worth 3 points, 15 true/false with explanation questions worth 4 points, and 5 analytical exercises worth 10 points. We describe a novel framework where algorithms from machine learning are used to modify the exam question weights in order to optimize the exam scores, using the overall class grade as a proxy for a student’s true ability. We show that significant error reduction can be obtained by our approach over standard weighting schemes, and we make several new observations regarding the properties of the “good” and “bad” exam questions that can have impact on the design of improved future evaluation methods.
Tasks
Published 2017-12-24
URL http://arxiv.org/abs/1801.06043v1
PDF http://arxiv.org/pdf/1801.06043v1.pdf
PWC https://paperswithcode.com/paper/optimal-weighting-for-exam-composition
Repo
Framework

OSTSC: Over Sampling for Time Series Classification in R

Title OSTSC: Over Sampling for Time Series Classification in R
Authors Matthew Dixon, Diego Klabjan, Lan Wei
Abstract The OSTSC package is a powerful oversampling approach for classifying univariate, but multinomial time series data in R. This article provides a brief overview of the oversampling methodology implemented by the package. A tutorial of the OSTSC package is provided. We begin by providing three test cases for the user to quickly validate the functionality in the package. To demonstrate the performance impact of OSTSC, we then provide two medium-size imbalanced time series datasets. Each example applies a TensorFlow implementation of a Long Short-Term Memory (LSTM) classifier - a type of Recurrent Neural Network (RNN) classifier - to imbalanced time series. The classifier performance is compared with and without oversampling. Finally, larger versions of these two datasets are evaluated to demonstrate the scalability of the package. The examples demonstrate that the OSTSC package improves the performance of RNN classifiers applied to highly imbalanced time series data. In particular, OSTSC is observed to increase the AUC of LSTM from 0.543 to 0.784 on a high frequency trading dataset consisting of 30,000 time series observations.
Tasks Time Series, Time Series Classification
Published 2017-11-27
URL http://arxiv.org/abs/1711.09545v1
PDF http://arxiv.org/pdf/1711.09545v1.pdf
PWC https://paperswithcode.com/paper/ostsc-over-sampling-for-time-series
Repo
Framework

Classification of Time-Series Images Using Deep Convolutional Neural Networks

Title Classification of Time-Series Images Using Deep Convolutional Neural Networks
Authors Nima Hatami, Yann Gavet, Johan Debayle
Abstract Convolutional Neural Networks (CNNs) have achieved great success in image recognition tasks by automatically learning a hierarchical feature representation from raw data. While the majority of the Time-Series Classification (TSC) literature is focused on 1D signals, this paper uses Recurrence Plots (RP) to transform time-series into 2D texture images and then takes advantage of a deep CNN classifier. The image representation of time-series introduces different feature types that are not available for 1D signals, and therefore TSC can be treated as a texture image recognition task. The CNN model also allows learning different levels of representations together with a classifier, jointly and automatically. Therefore, using RP and CNN in a unified framework is expected to boost the recognition rate of TSC. Experimental results on the UCR time-series classification archive demonstrate competitive accuracy of the proposed approach, compared not only to the existing deep architectures, but also to the state-of-the-art TSC algorithms.
Tasks Time Series, Time Series Classification
Published 2017-10-02
URL http://arxiv.org/abs/1710.00886v2
PDF http://arxiv.org/pdf/1710.00886v2.pdf
PWC https://paperswithcode.com/paper/classification-of-time-series-images-using
Repo
Framework
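
The transformation of a time series into a 2D image via a recurrence plot can be sketched as follows. This is the simplest thresholded form computed on raw scalar values. The threshold `eps`, the function name `recurrence_plot`, and the absence of any time-delay embedding are assumptions made for illustration, and the paper's exact construction may differ.

```python
def recurrence_plot(series, eps):
    """Binary recurrence matrix: entry (i, j) is 1 when the values at
    times i and j are within eps of each other, else 0."""
    n = len(series)
    return [
        [1 if abs(series[i] - series[j]) <= eps else 0 for j in range(n)]
        for i in range(n)
    ]


for row in recurrence_plot([0.0, 1.0, 0.0, 1.0], eps=0.5):
    print(row)
```

The resulting matrix is symmetric with ones on the diagonal, and periodic signals show up as regular diagonal or checkerboard textures, which is the kind of structure an image classifier can pick up.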

Protest Activity Detection and Perceived Violence Estimation from Social Media Images

Title Protest Activity Detection and Perceived Violence Estimation from Social Media Images
Authors Donghyeon Won, Zachary C. Steinert-Threlkeld, Jungseock Joo
Abstract We develop a novel visual model which can recognize protesters, describe their activities by visual attributes and estimate the level of perceived violence in an image. Studies of social media and protests use natural language processing to track how individuals use hashtags and links, often with a focus on those items’ diffusion. These approaches, however, may not be effective in fully characterizing actual real-world protests (e.g., violent or peaceful) or estimating the demographics of participants (e.g., age, gender, and race) and their emotions. Our system characterizes protests along these dimensions. We have collected geotagged tweets and their images from 2013-2017 and analyzed multiple major protest events in that period. A multi-task convolutional neural network is employed to automatically classify the presence of protesters in an image and predict its visual attributes, perceived violence and exhibited emotions. We also release the UCLA Protest Image Dataset, our novel dataset of 40,764 images (11,659 protest images and hard negatives) with various annotations of visual attributes and sentiments. Using this dataset, we train our model and demonstrate its effectiveness. We also present experimental results from various analyses of geotagged image data in several prevalent protest events. Our dataset will be made accessible at https://www.sscnet.ucla.edu/comm/jjoo/mm-protest/.
Tasks Action Detection, Activity Detection
Published 2017-09-18
URL http://arxiv.org/abs/1709.06204v1
PDF http://arxiv.org/pdf/1709.06204v1.pdf
PWC https://paperswithcode.com/paper/protest-activity-detection-and-perceived
Repo
Framework

Zero-Shot Learning via Class-Conditioned Deep Generative Models

Title Zero-Shot Learning via Class-Conditioned Deep Generative Models
Authors Wenlin Wang, Yunchen Pu, Vinay Kumar Verma, Kai Fan, Yizhe Zhang, Changyou Chen, Piyush Rai, Lawrence Carin
Abstract We present a deep generative model for learning to predict classes not seen at training time. Unlike most existing methods for this problem, that represent each class as a point (via a semantic embedding), we represent each seen/unseen class using a class-specific latent-space distribution, conditioned on class attributes. We use these latent-space distributions as a prior for a supervised variational autoencoder (VAE), which also facilitates learning highly discriminative feature representations for the inputs. The entire framework is learned end-to-end using only the seen-class training data. The model infers corresponding attributes of a test image by maximizing the VAE lower bound; the inferred attributes may be linked to labels not seen when training. We further extend our model to a (1) semi-supervised/transductive setting by leveraging unlabeled unseen-class data via an unsupervised learning module, and (2) few-shot learning where we also have a small number of labeled inputs from the unseen classes. We compare our model with several state-of-the-art methods through a comprehensive set of experiments on a variety of benchmark data sets.
Tasks Few-Shot Learning, Zero-Shot Learning
Published 2017-11-15
URL http://arxiv.org/abs/1711.05820v2
PDF http://arxiv.org/pdf/1711.05820v2.pdf
PWC https://paperswithcode.com/paper/zero-shot-learning-via-class-conditioned-deep
Repo
Framework

A Nonparametric Model for Multimodal Collaborative Activities Summarization

Title A Nonparametric Model for Multimodal Collaborative Activities Summarization
Authors Guy Rosman, John W. Fisher III, Daniela Rus
Abstract Ego-centric data streams provide a unique opportunity to reason about joint behavior by pooling data across individuals. This is especially evident in urban environments teeming with human activities, but which suffer from incomplete and noisy data. Collaborative human activities exhibit common spatial, temporal, and visual characteristics facilitating inference across individuals from multiple sensory modalities as we explore in this paper from the perspective of meetings. We propose a new Bayesian nonparametric model that enables us to efficiently pool video and GPS data towards collaborative activities analysis from multiple individuals. We demonstrate the utility of this model for inference tasks such as activity detection, classification, and summarization. We further demonstrate how spatio-temporal structure embedded in our model enables better understanding of partial and noisy observations such as localization and face detections based on social interactions. We show results on both synthetic experiments and a new dataset of egocentric video and noisy GPS data from multiple individuals.
Tasks Action Detection, Activity Detection
Published 2017-09-04
URL http://arxiv.org/abs/1709.01077v1
PDF http://arxiv.org/pdf/1709.01077v1.pdf
PWC https://paperswithcode.com/paper/a-nonparametric-model-for-multimodal
Repo
Framework