Paper Group ANR 121
End-to-end Face Detection and Cast Grouping in Movies Using Erdős-Rényi Clustering
Title | End-to-end Face Detection and Cast Grouping in Movies Using Erdős-Rényi Clustering |
Authors | SouYoung Jin, Hang Su, Chris Stauffer, Erik Learned-Miller |
Abstract | We present an end-to-end system for detecting and clustering faces by identity in full-length movies. Unlike works that start with a predefined set of detected faces, we consider the end-to-end problem of detection and clustering together. We make three separate contributions. First, we combine a state-of-the-art face detector with a generic tracker to extract high quality face tracklets. We then introduce a novel clustering method, motivated by the classic graph theory results of Erdős and Rényi. It is based on the observations that large clusters can be fully connected by joining just a small fraction of their point pairs, while just a single connection between two different people can lead to poor clustering results. This suggests clustering using a verification system with very few false positives but perhaps moderate recall. We introduce a novel verification method, rank-1 counts verification, that has this property, and use it in a link-based clustering scheme. Finally, we define a novel end-to-end detection and clustering evaluation metric allowing us to assess the accuracy of the entire end-to-end system. We present state-of-the-art results on multiple video data sets and also on standard face databases. |
Tasks | Face Detection |
Published | 2017-09-07 |
URL | http://arxiv.org/abs/1709.02458v1 |
http://arxiv.org/pdf/1709.02458v1.pdf | |
PWC | https://paperswithcode.com/paper/end-to-end-face-detection-and-cast-grouping |
Repo | |
Framework | |
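The link-based clustering idea in the abstract above (link a pair only when a high-precision verifier accepts it, then read clusters off as connected components) can be sketched with a union-find structure. This is only an illustration: `link_clusters` and `toy_verify` are hypothetical names, and the distance threshold stands in for the paper's rank-1 counts verification, which the abstract does not specify.

```python
from itertools import combinations


def link_clusters(items, verify):
    """Group items into connected components of the 'verified same' graph."""
    parent = list(range(len(items)))

    def find(i):
        # Walk up to the root, halving the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(items)), 2):
        if verify(items[i], items[j]):
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_j] = root_i

    clusters = {}
    for i in range(len(items)):
        clusters.setdefault(find(i), []).append(items[i])
    return sorted(clusters.values(), key=lambda cluster: cluster[0])


def toy_verify(a, b):
    # Hypothetical stand-in verifier: only very close 1-D points are linked,
    # so it has few false positives but moderate recall.
    return abs(a - b) <= 0.15


points = [0.0, 0.1, 0.2, 0.3, 5.0, 5.1]
clusters = link_clusters(points, toy_verify)
```

In this toy run the first four points end up in one cluster even though only three of their six pairs pass the verifier, which mirrors the observation that a small fraction of links is enough to connect a large cluster.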
A textual transform of multivariate time-series for prognostics
Title | A textual transform of multivariate time-series for prognostics |
Authors | Abhay Harpale, Abhishek Srivastav |
Abstract | Prognostics or early detection of incipient faults is an important industrial challenge for condition-based and preventive maintenance. Physics-based approaches to modeling fault progression are infeasible due to multiple interacting components, uncontrolled environmental factors and observability constraints. Moreover, such approaches to prognostics do not generalize to new domains. Consequently, domain-agnostic data-driven machine learning approaches to prognostics are desirable. Damage progression is a path-dependent process and explicitly modeling the temporal patterns is critical for accurate estimation of both the current damage state and its progression leading to total failure. In this paper, we present a novel data-driven approach to prognostics that employs a novel textual representation of multivariate temporal sensor observations for predicting the future health state of the monitored equipment early in its life. This representation enables us to utilize well-understood concepts from text-mining for modeling, prediction and understanding distress patterns in a domain agnostic way. The approach has been deployed and successfully tested on large scale multivariate time-series data from commercial aircraft engines. We report experiments on well-known publicly available benchmark datasets and simulation datasets. The proposed approach is shown to be superior in terms of prediction accuracy, lead time to prediction and interpretability. |
Tasks | Time Series |
Published | 2017-09-19 |
URL | http://arxiv.org/abs/1709.06669v1 |
http://arxiv.org/pdf/1709.06669v1.pdf | |
PWC | https://paperswithcode.com/paper/a-textual-transform-of-multivariate-time |
Repo | |
Framework | |
Two-dimensional Anti-jamming Mobile Communication Based on Reinforcement Learning
Title | Two-dimensional Anti-jamming Mobile Communication Based on Reinforcement Learning |
Authors | Liang Xiao, Guoan Han, Donghua Jiang, Hongzi Zhu, Yanyong Zhang, H. Vincent Poor |
Abstract | By using smart radio devices, a jammer can dynamically change its jamming policy based on opposing security mechanisms; it can even induce the mobile device to enter a specific communication mode and then launch the jamming policy accordingly. On the other hand, mobile devices can exploit spread spectrum and user mobility to address both jamming and interference. In this paper, a two-dimensional anti-jamming mobile communication scheme is proposed in which a mobile device leaves a heavily jammed/interfered-with frequency or area. It is shown that, by applying reinforcement learning techniques, a mobile device can achieve an optimal communication policy without the need to know the jamming and interference model and the radio channel model in a dynamic game framework. More specifically, a hotbooting deep Q-network based two-dimensional mobile communication scheme is proposed that exploits experiences in similar scenarios to reduce the exploration time at the beginning of the game, and applies deep convolutional neural network and macro-action techniques to accelerate the learning speed in dynamic situations. Several real-world scenarios are simulated to evaluate the proposed method. These simulation results show that our proposed scheme can improve both the signal-to-interference-plus-noise ratio of the signals and the utility of the mobile devices against cooperative jamming compared with benchmark schemes. |
Tasks | |
Published | 2017-12-19 |
URL | http://arxiv.org/abs/1712.06793v1 |
http://arxiv.org/pdf/1712.06793v1.pdf | |
PWC | https://paperswithcode.com/paper/two-dimensional-anti-jamming-mobile |
Repo | |
Framework | |
Robust Computer Algebra, Theorem Proving, and Oracle AI
Title | Robust Computer Algebra, Theorem Proving, and Oracle AI |
Authors | Gopal P. Sarma, Nick J. Hay |
Abstract | In the context of superintelligent AI systems, the term “oracle” has two meanings. One refers to modular systems queried for domain-specific tasks. Another usage, referring to a class of systems which may be useful for addressing the value alignment and AI control problems, is a superintelligent AI system that only answers questions. The aim of this manuscript is to survey contemporary research problems related to oracles which align with long-term research goals of AI safety. We examine existing question answering systems and argue that their high degree of architectural heterogeneity makes them poor candidates for rigorous analysis as oracles. On the other hand, we identify computer algebra systems (CASs) as being primitive examples of domain-specific oracles for mathematics and argue that efforts to integrate computer algebra systems with theorem provers, systems which have largely been developed independent of one another, provide a concrete set of problems related to the notion of provable safety that has emerged in the AI safety community. We review approaches to interfacing CASs with theorem provers, describe well-defined architectural deficiencies that have been identified with CASs, and suggest possible lines of research and practical software projects for scientists interested in AI safety. |
Tasks | Automated Theorem Proving, Question Answering |
Published | 2017-08-08 |
URL | http://arxiv.org/abs/1708.02553v2 |
http://arxiv.org/pdf/1708.02553v2.pdf | |
PWC | https://paperswithcode.com/paper/robust-computer-algebra-theorem-proving-and |
Repo | |
Framework | |
Modeling of the Latent Embedding of Music using Deep Neural Network
Title | Modeling of the Latent Embedding of Music using Deep Neural Network |
Authors | Zhou Xing, Eddy Baik, Yan Jiao, Nilesh Kulkarni, Chris Li, Gautam Muralidhar, Marzieh Parandehgheibi, Erik Reed, Abhishek Singhal, Fei Xiao, Chris Pouliot |
Abstract | While both the data volume and heterogeneity of digital music content are huge, it has become increasingly important and convenient to build a recommendation or search system to help surface this content to the user or consumer community. Most recommendation models fall into two primary species, collaborative filtering based and content based approaches. Variants of the collaborative filtering approach suffer from the common “cold start” and “long tail” problems, where there is little user interaction data to reveal user opinions or affinities on the content, as well as from distortion towards popular content. Content-based approaches are sometimes limited by the richness of the available content data, resulting in heavily biased and coarse recommendation results. In recent years, deep neural networks have enjoyed great success in large-scale image and video recognition. In this paper, we propose and experiment with a deep convolutional neural network to imitate how the human brain processes hierarchical structures in auditory signals, such as music and speech, at various timescales. This approach can be used to discover the latent factor models of the music based upon acoustic hyper-images that are extracted from the raw audio waves of music. These latent embeddings can be used either as features to feed to subsequent models, such as collaborative filtering, or to build similarity metrics between songs, or to classify music based on the labels for training such as genre, mood, sentiment, etc. |
Tasks | |
Published | 2017-05-12 |
URL | http://arxiv.org/abs/1705.05229v1 |
http://arxiv.org/pdf/1705.05229v1.pdf | |
PWC | https://paperswithcode.com/paper/modeling-of-the-latent-embedding-of-music |
Repo | |
Framework | |
The Error Probability of Random Fourier Features is Dimensionality Independent
Title | The Error Probability of Random Fourier Features is Dimensionality Independent |
Authors | Jean Honorio, Yu-Jun Li |
Abstract | We show that the error probability of reconstructing kernel matrices from Random Fourier Features for the Gaussian kernel function is at most $\mathcal{O}(R^{2/3} \exp(-D))$, where $D$ is the number of random features and $R$ is the diameter of the data domain. We also provide an information-theoretic method-independent lower bound of $\Omega((1-\exp(-R^2)) \exp(-D))$. Compared to prior work, we are the first to show that the error probability for random Fourier features is independent of the dimensionality of data points. As applications of our theory, we obtain dimension-independent bounds for kernel ridge regression and support vector machines. |
Tasks | |
Published | 2017-10-27 |
URL | http://arxiv.org/abs/1710.09953v4 |
http://arxiv.org/pdf/1710.09953v4.pdf | |
PWC | https://paperswithcode.com/paper/the-error-probability-of-random-fourier |
Repo | |
Framework | |
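For readers unfamiliar with the object being bounded in the abstract above, here is a minimal sketch of random Fourier features for the Gaussian kernel, using the standard cosine construction with random offsets. The abstract does not say which variant the authors analyse, so treat this as an assumption; `make_rff`, `num_features` (the abstract's $D$), and `sigma` are illustrative names.

```python
import math
import random


def make_rff(dim, num_features, sigma=1.0, seed=0):
    """Return a feature map z with z(x) . z(y) ~ exp(-||x - y||^2 / (2 sigma^2))."""
    rng = random.Random(seed)
    # Frequencies are drawn from N(0, 1 / sigma^2) in each coordinate.
    weights = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
               for _ in range(num_features)]
    # Phase offsets are drawn uniformly from [0, 2 pi].
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def features(x):
        return [scale * math.cos(sum(w_k * x_k for w_k, x_k in zip(w, x)) + b)
                for w, b in zip(weights, offsets)]

    return features


def gaussian_kernel(x, y, sigma=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


features = make_rff(dim=3, num_features=4000)
x, y = [0.2, -0.1, 0.4], [0.0, 0.3, 0.1]
approx = sum(a * b for a, b in zip(features(x), features(y)))
exact = gaussian_kernel(x, y)
```

Increasing `num_features` tightens the gap between `approx` and `exact`; the paper's contribution concerns how the probability of a large gap depends on $D$ and $R$ rather than on the data dimension.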
A General Framework for the Recognition of Online Handwritten Graphics
Title | A General Framework for the Recognition of Online Handwritten Graphics |
Authors | Frank Julca-Aguilar, Harold Mouchère, Christian Viard-Gaudin, Nina S. T. Hirata |
Abstract | We propose a new framework for the recognition of online handwritten graphics. Three main features of the framework are its ability to treat symbol and structural level information in an integrated way, its flexibility with respect to different families of graphics, and means to control the tradeoff between recognition effectiveness and computational cost. We model a graphic as a labeled graph generated from a graph grammar. Non-terminal vertices represent subcomponents, terminal vertices represent symbols, and edges represent relations between subcomponents or symbols. We then model the recognition problem as a graph parsing problem: given an input stroke set, we search for a parse tree that represents the best interpretation of the input. Our graph parsing algorithm generates multiple interpretations (consistent with the grammar) and then we extract an optimal interpretation according to a cost function that takes into consideration the likelihood scores of symbols and structures. The parsing algorithm consists in recursively partitioning the stroke set according to structures defined in the grammar and it does not impose constraints present in some previous works (e.g. stroke ordering). By avoiding such constraints and thanks to the powerful representativeness of graphs, our approach can be adapted to the recognition of different graphic notations. We show applications to the recognition of mathematical expressions and flowcharts. Experimentation shows that our method obtains state-of-the-art accuracy in both applications. |
Tasks | |
Published | 2017-09-19 |
URL | http://arxiv.org/abs/1709.06389v1 |
http://arxiv.org/pdf/1709.06389v1.pdf | |
PWC | https://paperswithcode.com/paper/a-general-framework-for-the-recognition-of |
Repo | |
Framework | |
Discriminative Learning of Open-Vocabulary Object Retrieval and Localization by Negative Phrase Augmentation
Title | Discriminative Learning of Open-Vocabulary Object Retrieval and Localization by Negative Phrase Augmentation |
Authors | Ryota Hinami, Shin’ichi Satoh |
Abstract | Thanks to the success of object detection technology, we can retrieve objects of the specified classes even from huge image collections. However, the current state-of-the-art object detectors (such as Faster R-CNN) can only handle pre-specified classes. In addition, large amounts of positive and negative visual samples are required for training. In this paper, we address the problem of open-vocabulary object retrieval and localization, where the target object is specified by a textual query (e.g., a word or phrase). We first propose Query-Adaptive R-CNN, a simple extension of Faster R-CNN adapted to open-vocabulary queries, by transforming the text embedding vector into an object classifier and localization regressor. Then, for discriminative training, we propose negative phrase augmentation (NPA) to mine hard negative samples which are visually similar to the query and at the same time semantically mutually exclusive of the query. The proposed method can retrieve and localize objects specified by a textual query from one million images in only 0.5 seconds with high precision. |
Tasks | Object Detection |
Published | 2017-11-27 |
URL | http://arxiv.org/abs/1711.09509v2 |
http://arxiv.org/pdf/1711.09509v2.pdf | |
PWC | https://paperswithcode.com/paper/discriminative-learning-of-open-vocabulary |
Repo | |
Framework | |
Studying Positive Speech on Twitter
Title | Studying Positive Speech on Twitter |
Authors | Marina Sokolova, Vera Sazonova, Kanyi Huang, Rudraneel Chakraboty, Stan Matwin |
Abstract | We present results of empirical studies on positive speech on Twitter. By positive speech we understand speech that works for the betterment of a given situation, in this case relations between different communities in a conflict-prone country. We worked with four Twitter data sets. Through semi-manual opinion mining, we found that positive speech accounted for < 1% of the data. In fully automated studies, we tested two approaches: unsupervised statistical analysis, and supervised text classification based on distributed word representation. We discuss benefits and challenges of those approaches and report empirical evidence obtained in the study. |
Tasks | Opinion Mining, Text Classification |
Published | 2017-02-24 |
URL | http://arxiv.org/abs/1702.08866v1 |
http://arxiv.org/pdf/1702.08866v1.pdf | |
PWC | https://paperswithcode.com/paper/studying-positive-speech-on-twitter |
Repo | |
Framework | |
Visual attention models for scene text recognition
Title | Visual attention models for scene text recognition |
Authors | Suman K. Ghosh, Ernest Valveny, Andrew D. Bagdanov |
Abstract | In this paper we propose an approach to lexicon-free recognition of text in scene images. Our approach relies on an LSTM-based soft visual attention model learned from convolutional features. A set of feature vectors is derived from an intermediate convolutional layer corresponding to different areas of the image. This permits encoding of spatial information into the image representation. In this way, the framework is able to learn how to selectively focus on different parts of the image. At every time step the recognizer emits one character using a weighted combination of the convolutional feature vectors according to the learned attention model. Training can be done end-to-end using only word level annotations. In addition, we show that modifying the beam search algorithm by integrating an explicit language model leads to significantly better recognition results. We validate the performance of our approach on standard SVT and ICDAR’03 scene text datasets, showing state-of-the-art performance in unconstrained text recognition. |
Tasks | Language Modelling, Scene Text Recognition |
Published | 2017-06-05 |
URL | http://arxiv.org/abs/1706.01487v1 |
http://arxiv.org/pdf/1706.01487v1.pdf | |
PWC | https://paperswithcode.com/paper/visual-attention-models-for-scene-text |
Repo | |
Framework | |
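The "weighted combination of the convolutional feature vectors" described in the abstract above is the core of soft attention, and a single decoding step can be sketched as follows. This is a simplified illustration: the dot-product scoring against a `query` vector is an assumption standing in for the learned LSTM-based attention model, and `attend` is a hypothetical name.

```python
import math


def softmax(scores):
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(region_features, query):
    """Return attention weights over regions and the weighted feature sum."""
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in region_features]
    weights = softmax(scores)
    dim = len(region_features[0])
    context = [sum(w * feat[d] for w, feat in zip(weights, region_features))
               for d in range(dim)]
    return weights, context


# Three image regions with 2-D features; the query favours the first dimension.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attend(regions, query=[2.0, 0.0])
```

In a recognizer of the kind the abstract describes, `context` would be fed to the decoder at each time step to emit one character, with the query derived from the decoder state.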
Hash Embeddings for Efficient Word Representations
Title | Hash Embeddings for Efficient Word Representations |
Authors | Dan Svenstrup, Jonas Meinertz Hansen, Ole Winther |
Abstract | We present hash embeddings, an efficient method for representing words in a continuous vector form. A hash embedding may be seen as an interpolation between a standard word embedding and a word embedding created using a random hash function (the hashing trick). In hash embeddings each token is represented by $k$ $d$-dimensional embedding vectors and one $k$-dimensional weight vector. The final $d$-dimensional representation of the token is the product of the two. Rather than fitting the embedding vectors for each token, these are selected by the hashing trick from a shared pool of $B$ embedding vectors. Our experiments show that hash embeddings can easily deal with huge vocabularies consisting of millions of tokens. When using a hash embedding there is no need to create a dictionary before training nor to perform any kind of vocabulary pruning after training. We show that models trained using hash embeddings exhibit at least the same level of performance as models trained using regular embeddings across a wide range of tasks. Furthermore, the number of parameters needed by such an embedding is only a fraction of what is required by a regular embedding. Since standard embeddings and embeddings constructed using the hashing trick are actually just special cases of a hash embedding, hash embeddings can be considered an extension and improvement over the existing regular embedding types. |
Tasks | |
Published | 2017-09-12 |
URL | http://arxiv.org/abs/1709.03933v1 |
http://arxiv.org/pdf/1709.03933v1.pdf | |
PWC | https://paperswithcode.com/paper/hash-embeddings-for-efficient-word |
Repo | |
Framework | |
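The lookup described in the abstract above ($k$ hash functions pick $k$ vectors from a shared pool of $B$, and a $k$-dimensional weight vector combines them) can be sketched in a few lines. Several details here are assumptions made for illustration: the class and parameter names are hypothetical, SHA-256 stands in for whatever hash family the authors use, the weight rows are themselves selected by hashing, and all parameters are random rather than trained.

```python
import hashlib
import random


def bucket(token, seed, num_buckets):
    """Deterministically map (seed, token) to an index in [0, num_buckets)."""
    digest = hashlib.sha256(f"{seed}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


class HashEmbedding:
    def __init__(self, pool_size, dim, num_hashes, num_weight_rows, seed=0):
        rng = random.Random(seed)
        self.num_hashes = num_hashes
        # Shared pool of B component vectors, each d-dimensional.
        self.pool = [[rng.gauss(0.0, 0.1) for _ in range(dim)]
                     for _ in range(pool_size)]
        # k importance weights per row (assumption: rows are chosen by hashing).
        self.weights = [[rng.gauss(0.0, 0.5) for _ in range(num_hashes)]
                        for _ in range(num_weight_rows)]

    def lookup(self, token):
        importance = self.weights[bucket(token, "w", len(self.weights))]
        dim = len(self.pool[0])
        vector = [0.0] * dim
        for i in range(self.num_hashes):
            # The i-th hash function selects one component vector from the pool.
            component = self.pool[bucket(token, i, len(self.pool))]
            for d in range(dim):
                vector[d] += importance[i] * component[d]
        return vector


emb = HashEmbedding(pool_size=1000, dim=8, num_hashes=2, num_weight_rows=10000)
vec = emb.lookup("unseen-token")
num_params = 1000 * 8 + 10000 * 2
```

No vocabulary dictionary is built: any string can be looked up directly, and the parameter count depends on the pool and weight-table sizes rather than on the number of distinct tokens.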
Automatic Tool Landmark Detection for Stereo Vision in Robot-Assisted Retinal Surgery
Title | Automatic Tool Landmark Detection for Stereo Vision in Robot-Assisted Retinal Surgery |
Authors | Thomas Probst, Kevis-Kokitsi Maninis, Ajad Chhatkuli, Mouloud Ourak, Emmanuel Vander Poorten, Luc Van Gool |
Abstract | Computer vision and robotics are being increasingly applied in medical interventions. Especially in interventions where extreme precision is required they could make a difference. One such application is robot-assisted retinal microsurgery. In recent works, such interventions are conducted under a stereo-microscope, and with a robot-controlled surgical tool. The complementarity of computer vision and robotics has however not yet been fully exploited. In order to improve the robot control we are interested in 3D reconstruction of the anatomy and in automatic tool localization using a stereo microscope. In this paper, we solve this problem for the first time using a single pipeline, starting from uncalibrated cameras to reach metric 3D reconstruction and registration, in retinal microsurgery. The key ingredients of our method are: (a) surgical tool landmark detection, and (b) 3D reconstruction with the stereo microscope, using the detected landmarks. To address the former, we propose a novel deep learning method that detects and recognizes keypoints in high definition images at higher than real-time speed. We use the detected 2D keypoints along with their corresponding 3D coordinates obtained from the robot sensors to calibrate the stereo microscope using an affine projection model. We design an online 3D reconstruction pipeline that makes use of smoothness constraints and performs robot-to-camera registration. The entire pipeline is extensively validated on open-sky porcine eye sequences. Quantitative and qualitative results are presented for all steps. |
Tasks | 3D Reconstruction |
Published | 2017-09-17 |
URL | http://arxiv.org/abs/1709.05665v2 |
http://arxiv.org/pdf/1709.05665v2.pdf | |
PWC | https://paperswithcode.com/paper/automatic-tool-landmark-detection-for-stereo |
Repo | |
Framework | |
Deep Learning based Isolated Arabic Scene Character Recognition
Title | Deep Learning based Isolated Arabic Scene Character Recognition |
Authors | Saad Bin Ahmed, Saeeda Naz, Muhammad Imran Razzak, Rubiyah Yousaf |
Abstract | Technological advances and growing sophistication in cameras and gadgets have prompted researchers to focus on image analysis and text understanding. Deep learning techniques have shown strong potential for classifying text in natural scene images, as reported in recent years. A variety of deep learning approaches address the detection and recognition of text in images. In this work, we present Arabic scene text recognition using Convolutional Neural Networks (ConvNets) as a deep learning classifier. Because scene text data is slanted and skewed, we employ five orientations for each occurrence of a character to cover as much variation as possible. Training uses filter sizes of 3 x 3 and 5 x 5 with stride values of 1 and 2. During the text classification phase, we trained the network with distinct learning rates. Our approach reported encouraging results on recognition of Arabic characters from segmented Arabic scene images. |
Tasks | Scene Text Recognition, Text Classification |
Published | 2017-04-22 |
URL | http://arxiv.org/abs/1704.06821v1 |
http://arxiv.org/pdf/1704.06821v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-based-isolated-arabic-scene |
Repo | |
Framework | |
Evolutionary learning of fire fighting strategies
Title | Evolutionary learning of fire fighting strategies |
Authors | Martin Kretschmer, Elmar Langetepe |
Abstract | The dynamic problem of enclosing an expanding fire can be modelled by a discrete variant in a grid graph. While the fire expands to all neighbouring cells in any time step, the fire fighter is allowed to block $c$ cells on average outside the fire in the same time interval. It was shown that the success of the fire fighter is guaranteed for $c>1.5$ but no strategy can enclose the fire for $c\leq 1.5$. To establish such a critical threshold, the correctness (sometimes even optimality) of strategies and lower bounds has been shown by integer programming or by direct but often very sophisticated arguments. We investigate whether it is possible to find or to approach such a threshold and/or optimal strategies by means of evolutionary algorithms, i.e., we just try to learn successful strategies for different constants $c$ and have a look at the outcome. The main general idea is that this approach might give some insight into the power of evolutionary strategies for similar geometrically motivated threshold questions. We investigate the variant of protecting a highway, whose threshold is still unknown, and find interesting strategic paradigms. Keywords: Dynamic environments, fire fighting, evolutionary strategies, threshold approximation |
Tasks | |
Published | 2017-05-04 |
URL | http://arxiv.org/abs/1705.01721v1 |
http://arxiv.org/pdf/1705.01721v1.pdf | |
PWC | https://paperswithcode.com/paper/evolutionary-learning-of-fire-fighting |
Repo | |
Framework | |
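The discrete model in the abstract above (fire spreads to all neighbouring grid cells each step, while the fire fighter blocks $c$ cells per step on average) can be simulated with a few set operations. This sketch covers only the game itself, not the evolutionary search. It assumes 4-neighbour spread, that the fire fighter moves before the fire in each step, and that an unspent fractional budget carries over; `greedy_frontier` is a hypothetical baseline strategy, not one from the paper.

```python
NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def spread(burning, blocked):
    """One time step: fire enters every 4-neighbour that is not blocked."""
    new_cells = set()
    for (x, y) in burning:
        for dx, dy in NEIGHBOURS:
            cell = (x + dx, y + dy)
            if cell not in burning and cell not in blocked:
                new_cells.add(cell)
    return burning | new_cells


def greedy_frontier(step, burning, blocked):
    """Hypothetical baseline: propose unblocked cells adjacent to the fire."""
    frontier = {(x + dx, y + dy) for (x, y) in burning for dx, dy in NEIGHBOURS}
    return sorted(frontier - burning - blocked)


def simulate(strategy, c, max_steps):
    """Return (enclosed, steps_used, burning_cells) for a fire started at (0, 0)."""
    burning, blocked, budget = {(0, 0)}, set(), 0.0
    for step in range(max_steps):
        budget += c  # c cells per step on average; fractions accumulate
        for cell in strategy(step, burning, blocked):
            if budget < 1.0:
                break
            if cell not in burning and cell not in blocked:
                blocked.add(cell)
                budget -= 1.0
        grown = spread(burning, blocked)
        if grown == burning:
            return True, step, burning  # the fire can no longer expand
        burning = grown
    return False, max_steps, burning


enclosed_fast, steps_fast, _ = simulate(greedy_frontier, c=4.0, max_steps=10)
enclosed_slow, _, burned_slow = simulate(greedy_frontier, c=1.0, max_steps=10)
```

An evolutionary approach of the kind the abstract describes would replace `greedy_frontier` with a parameterised strategy and use the outcome of `simulate` as the fitness signal for different values of `c`.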
A Self-Training Method for Semi-Supervised GANs
Title | A Self-Training Method for Semi-Supervised GANs |
Authors | Alan Do-Omri, Dalei Wu, Xiaohua Liu |
Abstract | Since the creation of Generative Adversarial Networks (GANs), much work has been done to improve their training stability, generated image quality, and range of application, but almost none of it has explored their self-training potential. Self-training was used before the advent of deep learning to allow training on limited labelled training data and has shown impressive results in semi-supervised learning. In this work, we combine these two ideas and make GANs self-trainable for semi-supervised learning tasks by exploiting their infinite data generation potential. Results show that using even the simplest form of self-training yields an improvement. We also show results for a more complex self-training scheme that performs at least as well as the basic self-training scheme but with significantly less data augmentation. |
Tasks | Data Augmentation |
Published | 2017-10-27 |
URL | http://arxiv.org/abs/1710.10313v1 |
http://arxiv.org/pdf/1710.10313v1.pdf | |
PWC | https://paperswithcode.com/paper/a-self-training-method-for-semi-supervised |
Repo | |
Framework | |