July 29, 2019


Paper Group ANR 121

End-to-end Face Detection and Cast Grouping in Movies Using Erdős-Rényi Clustering. A textual transform of multivariate time-series for prognostics. Two-dimensional Anti-jamming Mobile Communication Based on Reinforcement Learning. Robust Computer Algebra, Theorem Proving, and Oracle AI. Modeling of the Latent Embedding of Music using Deep Neural N …

End-to-end Face Detection and Cast Grouping in Movies Using Erdős-Rényi Clustering


Title	End-to-end Face Detection and Cast Grouping in Movies Using Erdős-Rényi Clustering
Authors	SouYoung Jin, Hang Su, Chris Stauffer, Erik Learned-Miller
Abstract	We present an end-to-end system for detecting and clustering faces by identity in full-length movies. Unlike works that start with a predefined set of detected faces, we consider the end-to-end problem of detection and clustering together. We make three separate contributions. First, we combine a state-of-the-art face detector with a generic tracker to extract high quality face tracklets. We then introduce a novel clustering method, motivated by the classic graph theory results of Erdős and Rényi. It is based on the observations that large clusters can be fully connected by joining just a small fraction of their point pairs, while just a single connection between two different people can lead to poor clustering results. This suggests clustering using a verification system with very few false positives but perhaps moderate recall. We introduce a novel verification method, rank-1 counts verification, that has this property, and use it in a link-based clustering scheme. Finally, we define a novel end-to-end detection and clustering evaluation metric allowing us to assess the accuracy of the entire end-to-end system. We present state-of-the-art results on multiple video data sets and also on standard face databases.
Tasks	Face Detection
Published	2017-09-07
URL	http://arxiv.org/abs/1709.02458v1
PDF	http://arxiv.org/pdf/1709.02458v1.pdf
PWC	https://paperswithcode.com/paper/end-to-end-face-detection-and-cast-grouping
Repo
Framework
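
The link-based clustering idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' rank-1 counts verifier: `strict_verifier`, the one-dimensional "face features", and the 0.15 threshold are all hypothetical stand-ins for a high-precision, moderate-recall verification system. Every accepted pair is joined, and the clusters are the resulting connected components (tracked here with a union-find structure).

```python
from itertools import combinations


def link_cluster(items, same_identity):
    """Join every pair the verifier accepts; return the connected components."""
    parent = list(range(len(items)))

    def find(i):
        # Follow parent pointers to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(items)), 2):
        if same_identity(items[i], items[j]):
            parent[find(i)] = find(j)  # a single accepted link merges two groups

    clusters = {}
    for i, item in enumerate(items):
        clusters.setdefault(find(i), []).append(item)
    return list(clusters.values())


def strict_verifier(a, b, threshold=0.15):
    """Hypothetical verifier: accepts only very close pairs (few false positives)."""
    return abs(a - b) < threshold


# Toy 1-D "features" for two identities (assumed data, for illustration only).
faces = [0.0, 0.1, 0.2, 0.3, 1.0, 1.1, 1.2]
clusters = link_cluster(faces, strict_verifier)
print(clusters)
```

In this toy run the verifier accepts only 5 of the 21 possible pairs, yet each identity still ends up fully connected. A single false accept between the two groups would merge them, which is why the abstract stresses very few false positives over high recall.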

A textual transform of multivariate time-series for prognostics


Title	A textual transform of multivariate time-series for prognostics
Authors	Abhay Harpale, Abhishek Srivastav
Abstract	Prognostics or early detection of incipient faults is an important industrial challenge for condition-based and preventive maintenance. Physics-based approaches to modeling fault progression are infeasible due to multiple interacting components, uncontrolled environmental factors and observability constraints. Moreover, such approaches to prognostics do not generalize to new domains. Consequently, domain-agnostic data-driven machine learning approaches to prognostics are desirable. Damage progression is a path-dependent process and explicitly modeling the temporal patterns is critical for accurate estimation of both the current damage state and its progression leading to total failure. In this paper, we present a novel data-driven approach to prognostics that employs a novel textual representation of multivariate temporal sensor observations for predicting the future health state of the monitored equipment early in its life. This representation enables us to utilize well-understood concepts from text-mining for modeling, prediction and understanding distress patterns in a domain agnostic way. The approach has been deployed and successfully tested on large scale multivariate time-series data from commercial aircraft engines. We report experiments on well-known publicly available benchmark datasets and simulation datasets. The proposed approach is shown to be superior in terms of prediction accuracy, lead time to prediction and interpretability.
Tasks	Time Series
Published	2017-09-19
URL	http://arxiv.org/abs/1709.06669v1
PDF	http://arxiv.org/pdf/1709.06669v1.pdf
PWC	https://paperswithcode.com/paper/a-textual-transform-of-multivariate-time
Repo
Framework
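
The abstract above does not spell out the textual representation, so the following is only a sketch of one way multivariate sensor readings could be turned into text-like tokens. The equal-width binning, the three-letter alphabet, the `sensor_letter` token format, and the sensor names are all assumptions made for illustration, not the authors' transform.

```python
from collections import Counter


def to_letters(series, alphabet="abc"):
    """Map each reading to a letter by equal-width binning (assumed scheme)."""
    lo, hi = min(series), max(series)
    width = (hi - lo) / len(alphabet) or 1.0  # avoid zero width for constant series
    last = len(alphabet) - 1
    return "".join(alphabet[min(int((v - lo) / width), last)] for v in series)


def textual_transform(sensors, alphabet="abc"):
    """Emit one token per sensor per time step, e.g. 'temp_a' (hypothetical format)."""
    letters = {name: to_letters(values, alphabet) for name, values in sensors.items()}
    length = len(next(iter(letters.values())))
    tokens = []
    for t in range(length):
        for name in sorted(letters):
            tokens.append(f"{name}_{letters[name][t]}")
    return tokens


# Hypothetical readings: temperature drifts upward, vibration spikes at the end.
sensors = {"temp": [0, 1, 2, 3, 4, 5], "vib": [5, 5, 5, 5, 5, 9]}
tokens = textual_transform(sensors)
bag_of_words = Counter(tokens)
print(tokens)
print(bag_of_words)
```

Once the readings are tokens, standard text-mining tools such as bag-of-words counts apply directly, which is the kind of reuse the abstract describes.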

Two-dimensional Anti-jamming Mobile Communication Based on Reinforcement Learning


Title	Two-dimensional Anti-jamming Mobile Communication Based on Reinforcement Learning
Authors	Liang Xiao, Guoan Han, Donghua Jiang, Hongzi Zhu, Yanyong Zhang, H. Vincent Poor
Abstract	By using smart radio devices, a jammer can dynamically change its jamming policy based on opposing security mechanisms; it can even induce the mobile device to enter a specific communication mode and then launch the jamming policy accordingly. On the other hand, mobile devices can exploit spread spectrum and user mobility to address both jamming and interference. In this paper, a two-dimensional anti-jamming mobile communication scheme is proposed in which a mobile device leaves a heavily jammed/interfered-with frequency or area. It is shown that, by applying reinforcement learning techniques, a mobile device can achieve an optimal communication policy without the need to know the jamming and interference model and the radio channel model in a dynamic game framework. More specifically, a hotbooting deep Q-network based two-dimensional mobile communication scheme is proposed that exploits experiences in similar scenarios to reduce the exploration time at the beginning of the game, and applies deep convolutional neural network and macro-action techniques to accelerate the learning speed in dynamic situations. Several real-world scenarios are simulated to evaluate the proposed method. These simulation results show that our proposed scheme can improve both the signal-to-interference-plus-noise ratio of the signals and the utility of the mobile devices against cooperative jamming compared with benchmark schemes.
Tasks
Published	2017-12-19
URL	http://arxiv.org/abs/1712.06793v1
PDF	http://arxiv.org/pdf/1712.06793v1.pdf
PWC	https://paperswithcode.com/paper/two-dimensional-anti-jamming-mobile
Repo
Framework

Robust Computer Algebra, Theorem Proving, and Oracle AI


Title	Robust Computer Algebra, Theorem Proving, and Oracle AI
Authors	Gopal P. Sarma, Nick J. Hay
Abstract	In the context of superintelligent AI systems, the term “oracle” has two meanings. One refers to modular systems queried for domain-specific tasks. Another usage, referring to a class of systems which may be useful for addressing the value alignment and AI control problems, is a superintelligent AI system that only answers questions. The aim of this manuscript is to survey contemporary research problems related to oracles which align with long-term research goals of AI safety. We examine existing question answering systems and argue that their high degree of architectural heterogeneity makes them poor candidates for rigorous analysis as oracles. On the other hand, we identify computer algebra systems (CASs) as being primitive examples of domain-specific oracles for mathematics and argue that efforts to integrate computer algebra systems with theorem provers, systems which have largely been developed independent of one another, provide a concrete set of problems related to the notion of provable safety that has emerged in the AI safety community. We review approaches to interfacing CASs with theorem provers, describe well-defined architectural deficiencies that have been identified with CASs, and suggest possible lines of research and practical software projects for scientists interested in AI safety.
Tasks	Automated Theorem Proving, Question Answering
Published	2017-08-08
URL	http://arxiv.org/abs/1708.02553v2
PDF	http://arxiv.org/pdf/1708.02553v2.pdf
PWC	https://paperswithcode.com/paper/robust-computer-algebra-theorem-proving-and
Repo
Framework

Modeling of the Latent Embedding of Music using Deep Neural Network


Title	Modeling of the Latent Embedding of Music using Deep Neural Network
Authors	Zhou Xing, Eddy Baik, Yan Jiao, Nilesh Kulkarni, Chris Li, Gautam Muralidhar, Marzieh Parandehgheibi, Erik Reed, Abhishek Singhal, Fei Xiao, Chris Pouliot
Abstract	While both the volume and the heterogeneity of digital music content are huge, it has become increasingly important and convenient to build a recommendation or search system that surfaces this content to the user or consumer community. Most recommendation models fall into two primary categories: collaborative filtering based and content based approaches. Variants of the collaborative filtering approach suffer from the common “cold start” and “long tail” problems, where there is not much user interaction data to reveal user opinions or affinities for the content, and from a distortion towards popular content. Content-based approaches are sometimes limited by the richness of the available content data, resulting in heavily biased and coarse recommendations. In recent years, deep neural networks have enjoyed great success in large-scale image and video recognition. In this paper, we propose and experiment with using a deep convolutional neural network to imitate how the human brain processes hierarchical structures in auditory signals, such as music and speech, at various timescales. This approach can be used to discover the latent factor models of the music based upon acoustic hyper-images that are extracted from the raw audio waves of music. These latent embeddings can be used as features to feed to subsequent models, such as collaborative filtering, to build similarity metrics between songs, or to classify music based on training labels such as genre, mood, and sentiment.
Tasks
Published	2017-05-12
URL	http://arxiv.org/abs/1705.05229v1
PDF	http://arxiv.org/pdf/1705.05229v1.pdf
PWC	https://paperswithcode.com/paper/modeling-of-the-latent-embedding-of-music
Repo
Framework

The Error Probability of Random Fourier Features is Dimensionality Independent


Title	The Error Probability of Random Fourier Features is Dimensionality Independent
Authors	Jean Honorio, Yu-Jun Li
Abstract	We show that the error probability of reconstructing kernel matrices from Random Fourier Features for the Gaussian kernel function is at most $\mathcal{O}(R^{2/3} \exp(-D))$, where $D$ is the number of random features and $R$ is the diameter of the data domain. We also provide an information-theoretic method-independent lower bound of $\Omega((1-\exp(-R^2)) \exp(-D))$. Compared to prior work, we are the first to show that the error probability for random Fourier features is independent of the dimensionality of data points. As applications of our theory, we obtain dimension-independent bounds for kernel ridge regression and support vector machines.
Tasks
Published	2017-10-27
URL	http://arxiv.org/abs/1710.09953v4
PDF	http://arxiv.org/pdf/1710.09953v4.pdf
PWC	https://paperswithcode.com/paper/the-error-probability-of-random-fourier
Repo
Framework
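
For readers unfamiliar with the construction the bounds above refer to, here is a minimal sketch of standard random Fourier features for the Gaussian kernel $k(x, y) = \exp(-\|x - y\|^2 / 2)$. It illustrates the approximation only, not the paper's analysis. The function names, the unit bandwidth, the seed, and the test points are assumptions; `num_features` plays the role of $D$ in the abstract.

```python
import math
import random


def make_rff(dim, num_features, rng):
    """Return a feature map z(x) with z(x).z(y) approximating exp(-||x-y||^2 / 2)."""
    # Frequencies drawn from N(0, I) and phases from U[0, 2*pi] (unit bandwidth).
    ws = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_features)]
    bs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def feature_map(x):
        return [
            scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + b)
            for w, b in zip(ws, bs)
        ]

    return feature_map


def gaussian_kernel(x, y):
    """Exact Gaussian kernel with unit bandwidth."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / 2.0)


rng = random.Random(0)
phi = make_rff(dim=3, num_features=4000, rng=rng)
x, y = [0.2, -0.1, 0.4], [0.0, 0.3, 0.1]
approx = sum(a * b for a, b in zip(phi(x), phi(y)))
exact = gaussian_kernel(x, y)
print(f"approx={approx:.3f} exact={exact:.3f}")
```

Increasing `num_features` tightens the approximation; the abstract's claim is that the probability of a large error decays with $D$ without depending on the dimensionality of the data points.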

A General Framework for the Recognition of Online Handwritten Graphics


Title	A General Framework for the Recognition of Online Handwritten Graphics
Authors	Frank Julca-Aguilar, Harold Mouchère, Christian Viard-Gaudin, Nina S. T. Hirata
Abstract	We propose a new framework for the recognition of online handwritten graphics. Three main features of the framework are its ability to treat symbol and structural level information in an integrated way, its flexibility with respect to different families of graphics, and means to control the tradeoff between recognition effectiveness and computational cost. We model a graphic as a labeled graph generated from a graph grammar. Non-terminal vertices represent subcomponents, terminal vertices represent symbols, and edges represent relations between subcomponents or symbols. We then model the recognition problem as a graph parsing problem: given an input stroke set, we search for a parse tree that represents the best interpretation of the input. Our graph parsing algorithm generates multiple interpretations (consistent with the grammar) and then we extract an optimal interpretation according to a cost function that takes into consideration the likelihood scores of symbols and structures. The parsing algorithm consists in recursively partitioning the stroke set according to structures defined in the grammar and it does not impose constraints present in some previous works (e.g. stroke ordering). By avoiding such constraints and thanks to the powerful representativeness of graphs, our approach can be adapted to the recognition of different graphic notations. We show applications to the recognition of mathematical expressions and flowcharts. Experimentation shows that our method obtains state-of-the-art accuracy in both applications.
Tasks
Published	2017-09-19
URL	http://arxiv.org/abs/1709.06389v1
PDF	http://arxiv.org/pdf/1709.06389v1.pdf
PWC	https://paperswithcode.com/paper/a-general-framework-for-the-recognition-of
Repo
Framework

Discriminative Learning of Open-Vocabulary Object Retrieval and Localization by Negative Phrase Augmentation


Title	Discriminative Learning of Open-Vocabulary Object Retrieval and Localization by Negative Phrase Augmentation
Authors	Ryota Hinami, Shin’ichi Satoh
Abstract	Thanks to the success of object detection technology, we can retrieve objects of the specified classes even from huge image collections. However, the current state-of-the-art object detectors (such as Faster R-CNN) can only handle pre-specified classes. In addition, large amounts of positive and negative visual samples are required for training. In this paper, we address the problem of open-vocabulary object retrieval and localization, where the target object is specified by a textual query (e.g., a word or phrase). We first propose Query-Adaptive R-CNN, a simple extension of Faster R-CNN adapted to open-vocabulary queries, by transforming the text embedding vector into an object classifier and localization regressor. Then, for discriminative training, we propose negative phrase augmentation (NPA) to mine hard negative samples which are visually similar to the query and at the same time semantically mutually exclusive of the query. The proposed method can retrieve and localize objects specified by a textual query from one million images in only 0.5 seconds with high precision.
Tasks	Object Detection
Published	2017-11-27
URL	http://arxiv.org/abs/1711.09509v2
PDF	http://arxiv.org/pdf/1711.09509v2.pdf
PWC	https://paperswithcode.com/paper/discriminative-learning-of-open-vocabulary
Repo
Framework

Studying Positive Speech on Twitter


Title	Studying Positive Speech on Twitter
Authors	Marina Sokolova, Vera Sazonova, Kanyi Huang, Rudraneel Chakraboty, Stan Matwin
Abstract	We present results of empirical studies on positive speech on Twitter. By positive speech we understand speech that works for the betterment of a given situation, in this case relations between different communities in a conflict-prone country. We worked with four Twitter data sets. Through semi-manual opinion mining, we found that positive speech accounted for less than 1% of the data. In fully automated studies, we tested two approaches: unsupervised statistical analysis, and supervised text classification based on distributed word representation. We discuss benefits and challenges of those approaches and report empirical evidence obtained in the study.
Tasks	Opinion Mining, Text Classification
Published	2017-02-24
URL	http://arxiv.org/abs/1702.08866v1
PDF	http://arxiv.org/pdf/1702.08866v1.pdf
PWC	https://paperswithcode.com/paper/studying-positive-speech-on-twitter
Repo
Framework

Visual attention models for scene text recognition


Title	Visual attention models for scene text recognition
Authors	Suman K. Ghosh, Ernest Valveny, Andrew D. Bagdanov
Abstract	In this paper we propose an approach to lexicon-free recognition of text in scene images. Our approach relies on an LSTM-based soft visual attention model learned from convolutional features. A set of feature vectors is derived from an intermediate convolutional layer corresponding to different areas of the image. This permits encoding of spatial information into the image representation. In this way, the framework is able to learn how to selectively focus on different parts of the image. At every time step the recognizer emits one character using a weighted combination of the convolutional feature vectors according to the learned attention model. Training can be done end-to-end using only word level annotations. In addition, we show that modifying the beam search algorithm by integrating an explicit language model leads to significantly better recognition results. We validate the performance of our approach on standard SVT and ICDAR’03 scene text datasets, showing state-of-the-art performance in unconstrained text recognition.
Tasks	Language Modelling, Scene Text Recognition
Published	2017-06-05
URL	http://arxiv.org/abs/1706.01487v1
PDF	http://arxiv.org/pdf/1706.01487v1.pdf
PWC	https://paperswithcode.com/paper/visual-attention-models-for-scene-text
Repo
Framework
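
The "weighted combination of the convolutional feature vectors" in the abstract above is the core step of soft attention. The sketch below shows only that step. In the paper the attention scores come from the learned LSTM-based model; here they are passed in directly, and the function names and toy feature vectors are assumptions.

```python
import math


def softmax(scores):
    """Turn raw attention scores into weights that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features, scores):
    """Return the attention weights and the weighted sum of the feature vectors."""
    weights = softmax(scores)
    dim = len(features[0])
    context = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return weights, context


# Three hypothetical feature vectors, one per image region.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Equal scores give uniform attention, so the context is the mean vector.
weights, context = attend(features, [0.0, 0.0, 0.0])

# A dominant score focuses the context on a single region.
focused_weights, focused_context = attend(features, [10.0, 0.0, 0.0])
print(weights, context)
print(focused_weights, focused_context)
```

At each time step a recognizer of this kind would recompute the scores, form a new context vector, and emit one character from it.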

Hash Embeddings for Efficient Word Representations


Title	Hash Embeddings for Efficient Word Representations
Authors	Dan Svenstrup, Jonas Meinertz Hansen, Ole Winther
Abstract	We present hash embeddings, an efficient method for representing words in a continuous vector form. A hash embedding may be seen as an interpolation between a standard word embedding and a word embedding created using a random hash function (the hashing trick). In hash embeddings each token is represented by $k$ $d$-dimensional embedding vectors and one $k$-dimensional weight vector. The final $d$-dimensional representation of the token is the product of the two. Rather than fitting the embedding vectors for each token, these are selected by the hashing trick from a shared pool of $B$ embedding vectors. Our experiments show that hash embeddings can easily deal with huge vocabularies consisting of millions of tokens. When using a hash embedding there is no need to create a dictionary before training nor to perform any kind of vocabulary pruning after training. We show that models trained using hash embeddings exhibit at least the same level of performance as models trained using regular embeddings across a wide range of tasks. Furthermore, the number of parameters needed by such an embedding is only a fraction of what is required by a regular embedding. Since standard embeddings and embeddings constructed using the hashing trick are actually just special cases of a hash embedding, hash embeddings can be considered an extension and improvement over the existing regular embedding types.
Tasks
Published	2017-09-12
URL	http://arxiv.org/abs/1709.03933v1
PDF	http://arxiv.org/pdf/1709.03933v1.pdf
PWC	https://paperswithcode.com/paper/hash-embeddings-for-efficient-word
Repo
Framework
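
A minimal sketch of the lookup described in the abstract above: $k$ hash functions pick $k$ component vectors from a shared pool of $B$ vectors, and a $k$-dimensional weight vector combines them into one $d$-dimensional representation. The class name, the MD5-based hash, the random initialisation, and the choice to also fetch the weight vector by hashing into a fixed table of `weight_rows` rows (so that no dictionary is needed) are assumptions of this sketch. In a real model both tables would be trained.

```python
import hashlib
import random


def bucket(token, salt, num_buckets):
    """Deterministically hash (salt, token) into range(num_buckets)."""
    digest = hashlib.md5(f"{salt}:{token}".encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


class HashEmbedding:
    def __init__(self, pool_size, dim, k, weight_rows, seed=0):
        rng = random.Random(seed)
        self.k, self.dim = k, dim
        # Shared pool of B = pool_size component vectors, each of dimension d = dim.
        self.pool = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(pool_size)]
        # Table of k-dimensional weight vectors, indexed by hashing (assumption).
        self.weights = [[rng.gauss(0.0, 0.5) for _ in range(k)] for _ in range(weight_rows)]

    def lookup(self, token):
        """Weighted sum of the k component vectors selected for this token."""
        p = self.weights[bucket(token, "w", len(self.weights))]
        out = [0.0] * self.dim
        for i in range(self.k):
            component = self.pool[bucket(token, i, len(self.pool))]
            for d in range(self.dim):
                out[d] += p[i] * component[d]
        return out

    def num_parameters(self):
        return len(self.pool) * self.dim + len(self.weights) * self.k


emb = HashEmbedding(pool_size=1000, dim=8, k=2, weight_rows=10000)
vec = emb.lookup("prognostics")
print(len(vec), emb.num_parameters())
```

With these hypothetical sizes the table holds 28,000 parameters regardless of vocabulary size, whereas a regular embedding for one million tokens at the same dimension would need 8,000,000.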

Automatic Tool Landmark Detection for Stereo Vision in Robot-Assisted Retinal Surgery


Title	Automatic Tool Landmark Detection for Stereo Vision in Robot-Assisted Retinal Surgery
Authors	Thomas Probst, Kevis-Kokitsi Maninis, Ajad Chhatkuli, Mouloud Ourak, Emmanuel Vander Poorten, Luc Van Gool
Abstract	Computer vision and robotics are being increasingly applied in medical interventions. Especially in interventions where extreme precision is required they could make a difference. One such application is robot-assisted retinal microsurgery. In recent works, such interventions are conducted under a stereo-microscope, and with a robot-controlled surgical tool. The complementarity of computer vision and robotics has however not yet been fully exploited. In order to improve the robot control we are interested in 3D reconstruction of the anatomy and in automatic tool localization using a stereo microscope. In this paper, we solve this problem for the first time using a single pipeline, starting from uncalibrated cameras to reach metric 3D reconstruction and registration, in retinal microsurgery. The key ingredients of our method are: (a) surgical tool landmark detection, and (b) 3D reconstruction with the stereo microscope, using the detected landmarks. To address the former, we propose a novel deep learning method that detects and recognizes keypoints in high definition images at higher than real-time speed. We use the detected 2D keypoints along with their corresponding 3D coordinates obtained from the robot sensors to calibrate the stereo microscope using an affine projection model. We design an online 3D reconstruction pipeline that makes use of smoothness constraints and performs robot-to-camera registration. The entire pipeline is extensively validated on open-sky porcine eye sequences. Quantitative and qualitative results are presented for all steps.
Tasks	3D Reconstruction
Published	2017-09-17
URL	http://arxiv.org/abs/1709.05665v2
PDF	http://arxiv.org/pdf/1709.05665v2.pdf
PWC	https://paperswithcode.com/paper/automatic-tool-landmark-detection-for-stereo
Repo
Framework

Deep Learning based Isolated Arabic Scene Character Recognition


Title	Deep Learning based Isolated Arabic Scene Character Recognition
Authors	Saad Bin Ahmed, Saeeda Naz, Muhammad Imran Razzak, Rubiyah Yousaf
Abstract	Technological advancement and sophistication in cameras and gadgets have prompted researchers to focus on image analysis and text understanding. Deep learning techniques have shown strong potential for classifying text in natural scene images, as reported in recent years. A variety of deep learning approaches address the detection and recognition of text in images. In this work, we present Arabic scene text recognition using Convolutional Neural Networks (ConvNets) as a deep learning classifier. Because scene text is often slanted and skewed, we employ five orientations for each occurrence of a character to cover the maximum variation. Training uses filter sizes of 3 x 3 and 5 x 5 with stride values of 1 and 2. During the text classification phase, we trained the network with distinct learning rates. Our approach reports encouraging results on the recognition of Arabic characters from segmented Arabic scene images.
Tasks	Scene Text Recognition, Text Classification
Published	2017-04-22
URL	http://arxiv.org/abs/1704.06821v1
PDF	http://arxiv.org/pdf/1704.06821v1.pdf
PWC	https://paperswithcode.com/paper/deep-learning-based-isolated-arabic-scene
Repo
Framework

Evolutionary learning of fire fighting strategies


Title	Evolutionary learning of fire fighting strategies
Authors	Martin Kretschmer, Elmar Langetepe
Abstract	The dynamic problem of enclosing an expanding fire can be modelled by a discrete variant in a grid graph. While the fire expands to all neighbouring cells in any time step, the fire fighter is allowed to block $c$ cells on average outside the fire in the same time interval. It was shown that the success of the fire fighter is guaranteed for $c>1.5$ but no strategy can enclose the fire for $c\leq 1.5$. For achieving such a critical threshold the correctness (sometimes even optimality) of strategies and lower bounds have been shown by integer programming or by direct but often very sophisticated arguments. We investigate the problem whether it is possible to find or to approach such a threshold and/or optimal strategies by means of evolutionary algorithms, i.e., we just try to learn successful strategies for different constants $c$ and have a look at the outcome. The main general idea is that this approach might give some insight into the power of evolutionary strategies for similar geometrically motivated threshold questions. We investigate the variant of protecting a highway with still unknown threshold and find interesting strategic paradigms. Keywords: Dynamic environments, fire fighting, evolutionary strategies, threshold approximation
Tasks
Published	2017-05-04
URL	http://arxiv.org/abs/1705.01721v1
PDF	http://arxiv.org/pdf/1705.01721v1.pdf
PWC	https://paperswithcode.com/paper/evolutionary-learning-of-fire-fighting
Repo
Framework
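
The discrete model in the abstract above can be sketched as a small simulator that an evolutionary search could use to score candidate strategies. Several details are assumptions of this sketch and are not taken from the paper: the fire fighter moves before the fire in each step, the "on average" budget is interpreted as at most `floor(c * t)` blocked cells after `t` steps, the grid is unbounded with 4-neighbour spread, and a strategy is simply an ordered list of cells to block.

```python
import math


def simulate(block_order, c, steps, origin=(0, 0)):
    """Return (contained, burned_cells) after at most `steps` time steps."""
    burning = {origin}
    frontier = {origin}
    blocked = set()
    queue = list(block_order)
    for t in range(1, steps + 1):
        # Fire fighter: the cumulative budget after t steps is floor(c * t).
        budget = math.floor(c * t) - len(blocked)
        while budget > 0 and queue:
            cell = queue.pop(0)
            if cell not in burning and cell not in blocked:
                blocked.add(cell)
                budget -= 1
        # Fire: spread from the newest burning cells to all free neighbours.
        new_cells = set()
        for x, y in frontier:
            for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if nb not in burning and nb not in blocked:
                    new_cells.add(nb)
        if not new_cells:
            return True, len(burning)  # the fire is enclosed
        burning |= new_cells
        frontier = new_cells
    return False, len(burning)


ring = [(1, 0), (-1, 0), (0, 1), (0, -1)]
print(simulate(ring, c=4, steps=5))  # enough budget to wall in the origin at once
print(simulate(ring, c=1, steps=5))  # too slow: the fire escapes this strategy
```

An evolutionary algorithm would mutate and recombine `block_order` and use the simulator's outcome, for example the number of burned cells, as the fitness value for a given $c$.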

A Self-Training Method for Semi-Supervised GANs


Title	A Self-Training Method for Semi-Supervised GANs
Authors	Alan Do-Omri, Dalei Wu, Xiaohua Liu
Abstract	Since the creation of Generative Adversarial Networks (GANs), much work has been done to improve their training stability, their generated image quality, and their range of application, but almost none of it has explored their self-training potential. Self-training has been used before the advent of deep learning in order to allow training on limited labelled training data and has shown impressive results in semi-supervised learning. In this work, we combine these two ideas and make GANs self-trainable for semi-supervised learning tasks by exploiting their infinite data generation potential. Results show that using even the simplest form of self-training yields an improvement. We also show results for a more complex self-training scheme that performs at least as well as the basic self-training scheme but with significantly less data augmentation.
Tasks	Data Augmentation
Published	2017-10-27
URL	http://arxiv.org/abs/1710.10313v1
PDF	http://arxiv.org/pdf/1710.10313v1.pdf
PWC	https://paperswithcode.com/paper/a-self-training-method-for-semi-supervised
Repo
Framework