July 30, 2019

3061 words 15 mins read

Paper Group AWR 12


Time Series Using Exponential Smoothing Cells

Title Time Series Using Exponential Smoothing Cells
Authors Avner Abrami, Aleksandr Y. Aravkin, Younghun Kim
Abstract Time series analysis is used to understand and predict dynamic processes, including evolving demands in business, weather, markets, and biological rhythms. Exponential smoothing is used in all these domains to obtain simple interpretable models of time series and to forecast future values. Despite its popularity, exponential smoothing fails dramatically in the presence of outliers, large amounts of noise, or when the underlying time series changes. We propose a flexible model for time series analysis, using exponential smoothing cells for overlapping time windows. The approach can detect and remove outliers, denoise data, fill in missing observations, and provide meaningful forecasts in challenging situations. In contrast to classic exponential smoothing, which solves a nonconvex optimization problem over the smoothing parameters and initial state, the proposed approach requires solving a single structured convex optimization problem. Recent developments in efficient convex optimization of large-scale dynamic models make the approach tractable. We illustrate new capabilities using synthetic examples, and then use the approach to analyze and forecast noisy real-world time series. Code for the approach and experiments is publicly available.
Tasks Time Series, Time Series Analysis
Published 2017-06-09
URL http://arxiv.org/abs/1706.02829v4
PDF http://arxiv.org/pdf/1706.02829v4.pdf
PWC https://paperswithcode.com/paper/time-series-using-exponential-smoothing-cells
Repo https://github.com/UW-AMO/TimeSeriesES-Cell
Framework none
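
Classic simple exponential smoothing, which this paper generalizes, is a one-line recursion. A minimal NumPy sketch for reference; the paper's actual contribution, replacing the nonconvex fitting of smoothing parameters with a single structured convex program over overlapping cells, is not shown here:

```python
import numpy as np

def simple_exponential_smoothing(y, alpha):
    """Classic recursion: s_t = alpha * y_t + (1 - alpha) * s_{t-1}."""
    s = np.empty(len(y), dtype=float)
    s[0] = y[0]
    for t in range(1, len(y)):
        s[t] = alpha * y[t] + (1 - alpha) * s[t - 1]
    return s

y = np.array([3.0, 3.2, 10.0, 3.1, 2.9])  # note the outlier at t = 2
print(simple_exponential_smoothing(y, alpha=0.3))
```

The outlier at t = 2 drags the smoothed series upward for several steps, which is exactly the failure mode the cell-based convex formulation is designed to avoid.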

Transfer Learning for OCRopus Model Training on Early Printed Books

Title Transfer Learning for OCRopus Model Training on Early Printed Books
Authors Christian Reul, Christoph Wick, Uwe Springmann, Frank Puppe
Abstract A method is presented that significantly reduces the character error rates for OCR text obtained from OCRopus models trained on early printed books when only small amounts of diplomatic transcriptions are available. This is achieved by building on already existing models during training instead of starting from scratch. To overcome the discrepancies between the character set of the pretrained model and the additional ground truth, the OCRopus code is adapted to allow for alphabet expansion or reduction: characters can now be flexibly added to or deleted from the pretrained alphabet when an existing model is loaded. For our experiments we use a self-trained mixed model on early Latin prints and the two standard OCRopus models on modern English and German Fraktur texts. The evaluation on seven early printed books showed that training from the Latin mixed model reduces the average number of errors by 43% and 26% compared to training from scratch with 60 and 150 lines of ground truth, respectively. Furthermore, it is shown that even building on mixed models trained on data unrelated to the newly added training and test data can lead to significantly improved recognition results.
Tasks Optical Character Recognition, Transfer Learning
Published 2017-12-15
URL http://arxiv.org/abs/1712.05586v2
PDF http://arxiv.org/pdf/1712.05586v2.pdf
PWC https://paperswithcode.com/paper/transfer-learning-for-ocropus-model-training
Repo https://github.com/chreul/OCR_Testdata_EarlyPrintedBooks
Framework none
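
The alphabet expansion/reduction idea can be pictured as remapping a pretrained output layer onto a new character set. A hypothetical sketch, not the authors' actual OCRopus patch; `adapt_output_layer` and its row-per-character weight layout are illustrative assumptions:

```python
import numpy as np

def adapt_output_layer(W_old, old_alphabet, new_alphabet, rng=None):
    """Map pretrained output rows (one per character) onto a new alphabet,
    reusing rows for shared characters and randomly initializing rows for
    newly added ones. Characters absent from new_alphabet are dropped."""
    if rng is None:
        rng = np.random.default_rng(0)
    W_new = rng.normal(0.0, 0.01, size=(len(new_alphabet), W_old.shape[1]))
    old_index = {c: i for i, c in enumerate(old_alphabet)}
    for j, c in enumerate(new_alphabet):
        if c in old_index:  # character kept from the pretrained alphabet
            W_new[j] = W_old[old_index[c]]
    return W_new
```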

Automatic Query Image Disambiguation for Content-Based Image Retrieval

Title Automatic Query Image Disambiguation for Content-Based Image Retrieval
Authors Björn Barz, Joachim Denzler
Abstract Query images presented to content-based image retrieval systems often admit several different interpretations, making it difficult to identify the search objective pursued by the user. We propose a technique for overcoming this ambiguity while keeping the amount of required user interaction at a minimum. To achieve this, the neighborhood of the query image is divided into coherent clusters from which the user may choose the relevant ones. A novel feedback integration technique is then employed to re-rank the entire database with regard to both the user feedback and the original query. We evaluate our approach on the publicly available MIRFLICKR-25K dataset, where it leads to a relative improvement of average precision by 23% over the baseline retrieval, which does not distinguish between different image senses.
Tasks Content-Based Image Retrieval, Image Retrieval
Published 2017-11-02
URL http://arxiv.org/abs/1711.00953v1
PDF http://arxiv.org/pdf/1711.00953v1.pdf
PWC https://paperswithcode.com/paper/automatic-query-image-disambiguation-for
Repo https://github.com/cvjena/aid
Framework none
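
A toy sketch of the cluster-then-rerank idea, assuming Euclidean image features, k-means for the neighborhood clustering, and a linear mixing weight `beta`; the paper's exact feedback-integration scheme differs, and `chosen` stands in for the user's interactive cluster selection:

```python
import numpy as np
from sklearn.cluster import KMeans

def disambiguate_and_rerank(query, db, k=50, n_clusters=4, chosen=(0,), beta=0.5):
    """Cluster the query's k nearest neighbours, let the user pick the
    relevant clusters, then re-rank the whole database by a mix of
    query distance and distance to the chosen cluster centroids."""
    dists = np.linalg.norm(db - query, axis=1)
    nn = np.argsort(dists)[:k]
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(db[nn])
    centroids = np.stack([db[nn][labels == c].mean(axis=0) for c in chosen])
    cent_dist = np.min(np.linalg.norm(db[:, None] - centroids[None], axis=2), axis=1)
    score = beta * dists + (1 - beta) * cent_dist  # lower is better
    return np.argsort(score)                       # re-ranked database indices
```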

Self-Normalizing Neural Networks

Title Self-Normalizing Neural Networks
Authors Günter Klambauer, Thomas Unterthiner, Andreas Mayr, Sepp Hochreiter
Abstract Deep Learning has revolutionized vision via convolutional neural networks (CNNs) and natural language processing via recurrent neural networks (RNNs). However, success stories of Deep Learning with standard feed-forward neural networks (FNNs) are rare. FNNs that perform well are typically shallow and therefore cannot exploit many levels of abstract representations. We introduce self-normalizing neural networks (SNNs) to enable high-level abstract representations. While batch normalization requires explicit normalization, neuron activations of SNNs automatically converge towards zero mean and unit variance. The activation functions of SNNs are “scaled exponential linear units” (SELUs), which induce self-normalizing properties. Using the Banach fixed-point theorem, we prove that activations close to zero mean and unit variance that are propagated through many network layers will converge towards zero mean and unit variance, even in the presence of noise and perturbations. This convergence property of SNNs makes it possible to (1) train deep networks with many layers, (2) employ strong regularization, and (3) make learning highly robust. Furthermore, for activations not close to unit variance, we prove upper and lower bounds on the variance; thus, vanishing and exploding gradients are impossible. We compared SNNs with standard FNNs and other machine learning methods such as random forests and support vector machines on (a) 121 tasks from the UCI machine learning repository, (b) drug discovery benchmarks, and (c) astronomy tasks. SNNs significantly outperformed all competing FNN methods on the 121 UCI tasks, outperformed all competing methods on the Tox21 dataset, and set a new record on an astronomy data set. The winning SNN architectures are often very deep. Implementations are available at: github.com/bioinf-jku/SNNs.
Tasks Drug Discovery, Pulsar Prediction
Published 2017-06-08
URL http://arxiv.org/abs/1706.02515v5
PDF http://arxiv.org/pdf/1706.02515v5.pdf
PWC https://paperswithcode.com/paper/self-normalizing-neural-networks
Repo https://github.com/BenjiKCF/SNN-with-embeddings-for-Malware-Prediction
Framework pytorch
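
The SELU activation itself is easy to state; the constants below are the fixed-point values given in the paper. A minimal NumPy sketch that also checks the zero-mean, unit-variance fixed point empirically:

```python
import numpy as np

ALPHA = 1.6732632423543772  # fixed-point constants from the paper
SCALE = 1.0507009873554805

def selu(x):
    """Scaled exponential linear unit: scale * (x if x > 0 else alpha*(e^x - 1))."""
    return SCALE * np.where(x > 0, x, ALPHA * np.expm1(x))

# For standard-normal inputs, the output moments sit at the fixed point:
z = np.random.default_rng(0).standard_normal(1_000_000)
print(selu(z).mean(), selu(z).var())  # approximately 0 and 1
```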

Speaker Diarization with LSTM

Title Speaker Diarization with LSTM
Authors Quan Wang, Carlton Downey, Li Wan, Philip Andrew Mansfield, Ignacio Lopez Moreno
Abstract For many years, i-vector based audio embedding techniques were the dominant approach for speaker verification and speaker diarization applications. However, mirroring the rise of deep learning in various domains, neural network based audio embeddings, also known as d-vectors, have consistently demonstrated superior speaker verification performance. In this paper, we build on the success of d-vector based speaker verification systems to develop a new d-vector based approach to speaker diarization. Specifically, we combine LSTM-based d-vector audio embeddings with recent work in non-parametric clustering to obtain a state-of-the-art speaker diarization system. Evaluations on three standard public datasets suggest that d-vector based diarization systems offer significant advantages over traditional i-vector based systems. We achieved a 12.0% diarization error rate on NIST SRE 2000 CALLHOME, even though our model was trained with out-of-domain data from voice search logs.
Tasks Speaker Diarization, Speaker Verification
Published 2017-10-28
URL http://arxiv.org/abs/1710.10468v6
PDF http://arxiv.org/pdf/1710.10468v6.pdf
PWC https://paperswithcode.com/paper/speaker-diarization-with-lstm
Repo https://github.com/wq2012/SpectralCluster
Framework none
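
A minimal sketch of the clustering stage, assuming one d-vector per speech segment and using scikit-learn's spectral clustering on a cosine affinity matrix; the paper's pipeline adds several affinity-refinement steps (blurring, thresholding, symmetrization) that are omitted here:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def diarize(d_vectors, n_speakers=2):
    """Cluster per-segment d-vector embeddings with spectral clustering
    on a cosine-similarity affinity matrix; returns one speaker label
    per segment."""
    norm = d_vectors / np.linalg.norm(d_vectors, axis=1, keepdims=True)
    affinity = norm @ norm.T           # cosine similarities in [-1, 1]
    affinity = (affinity + 1.0) / 2.0  # shift to [0, 1] for clustering
    labels = SpectralClustering(
        n_clusters=n_speakers, affinity="precomputed", random_state=0
    ).fit_predict(affinity)
    return labels
```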

In2I : Unsupervised Multi-Image-to-Image Translation Using Generative Adversarial Networks

Title In2I : Unsupervised Multi-Image-to-Image Translation Using Generative Adversarial Networks
Authors Pramuditha Perera, Mahdi Abavisani, Vishal M. Patel
Abstract In unsupervised image-to-image translation, the goal is to learn the mapping between an input image and an output image using a set of unpaired training images. In this paper, we propose an extension of the unsupervised image-to-image translation problem to a multiple-input setting. Given a set of paired images from multiple modalities, a transformation is learned to translate the input into a specified domain. For this purpose, we introduce a Generative Adversarial Network (GAN) based framework along with a multi-modal generator structure and a new loss term, the latent consistency loss. Through various experiments we show that leveraging multiple inputs generally improves the visual quality of the translated images. Moreover, we show that the proposed method outperforms current state-of-the-art unsupervised image-to-image translation methods.
Tasks Image-to-Image Translation, Multimodal Unsupervised Image-To-Image Translation, Unsupervised Image-To-Image Translation
Published 2017-11-26
URL http://arxiv.org/abs/1711.09334v1
PDF http://arxiv.org/pdf/1711.09334v1.pdf
PWC https://paperswithcode.com/paper/in2i-unsupervised-multi-image-to-image
Repo https://github.com/PramuPerera/In2I
Framework pytorch
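
One way to picture the multi-modal generator is one encoder per input modality feeding a shared decoder. A hypothetical PyTorch sketch with placeholder layer sizes, not the authors' architecture:

```python
import torch
import torch.nn as nn

class MultiModalGenerator(nn.Module):
    """Two-input generator sketch: each modality gets its own encoder,
    the latent codes are fused, and one decoder produces the output."""
    def __init__(self, ch=3, z=64):
        super().__init__()
        def enc():
            return nn.Sequential(nn.Conv2d(ch, z, 4, 2, 1), nn.ReLU())
        self.enc_a, self.enc_b = enc(), enc()
        self.dec = nn.Sequential(nn.ConvTranspose2d(2 * z, ch, 4, 2, 1), nn.Tanh())

    def forward(self, xa, xb):
        za, zb = self.enc_a(xa), self.enc_b(xb)
        return self.dec(torch.cat([za, zb], dim=1))

# e.g. MultiModalGenerator()(torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64))
```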

Efficient Parallel Methods for Deep Reinforcement Learning

Title Efficient Parallel Methods for Deep Reinforcement Learning
Authors Alfredo V. Clemente, Humberto N. Castejón, Arjun Chandra
Abstract We propose a novel framework for efficient parallelization of deep reinforcement learning algorithms, enabling these algorithms to learn from multiple actors on a single machine. The framework is algorithm-agnostic and can be applied to on-policy, off-policy, value-based, and policy-gradient-based algorithms. Given its inherent parallelism, the framework can be efficiently implemented on a GPU, allowing the use of powerful models while significantly reducing training time. We demonstrate the effectiveness of our framework by implementing an advantage actor-critic algorithm on a GPU, using on-policy experiences and employing synchronous updates. Our algorithm achieves state-of-the-art performance on the Atari domain after only a few hours of training. Our framework thus opens the door to much faster experimentation on demanding problem domains. Our implementation is open-source and is made public at https://github.com/alfredvc/paac
Tasks
Published 2017-05-13
URL http://arxiv.org/abs/1705.04862v2
PDF http://arxiv.org/pdf/1705.04862v2.pdf
PWC https://paperswithcode.com/paper/efficient-parallel-methods-for-deep
Repo https://github.com/alfredvc/paac
Framework tf
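
The core of the framework is a synchronous, batched actor loop: every environment steps in lockstep so one forward pass serves all actors. A minimal sketch assuming Gym-style environments with the classic reset()/step() API and a batched `policy` callable, both stand-ins rather than the authors' API:

```python
import numpy as np

def synchronous_rollout(envs, policy, n_steps):
    """Step all actors in lockstep and stack their transitions so a single
    (GPU-friendly) forward pass serves every actor."""
    obs = np.stack([env.reset() for env in envs])
    trajectory = []
    for _ in range(n_steps):
        actions = policy(obs)  # one batched forward pass for all actors
        results = [env.step(a) for env, a in zip(envs, actions)]
        next_obs = np.stack([r[0] for r in results])
        rewards = np.array([r[1] for r in results])
        trajectory.append((obs, actions, rewards))
        obs = next_obs
    return trajectory  # stacked transitions for a single A2C-style update
```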

CAN: Creative Adversarial Networks, Generating “Art” by Learning About Styles and Deviating from Style Norms

Title CAN: Creative Adversarial Networks, Generating “Art” by Learning About Styles and Deviating from Style Norms
Authors Ahmed Elgammal, Bingchen Liu, Mohamed Elhoseiny, Marian Mazzone
Abstract We propose a new system for generating art. The system generates art by looking at art and learning about style; it becomes creative by increasing the arousal potential of the generated art, deviating from the learned styles. We build on Generative Adversarial Networks (GANs), which have shown the ability to learn to generate novel images simulating a given distribution. We argue that such networks are limited in their ability to generate creative products in their original design. We propose modifications to the GAN objective to make it capable of generating creative art by maximizing deviation from established styles while minimizing deviation from the art distribution. We conducted experiments to compare the response of human subjects to the generated art with their response to art created by artists. The results show that human subjects could not distinguish art generated by the proposed system from art generated by contemporary artists and shown in top art fairs. Human subjects even rated the generated images higher on various scales.
Tasks
Published 2017-06-21
URL http://arxiv.org/abs/1706.07068v1
PDF http://arxiv.org/pdf/1706.07068v1.pdf
PWC https://paperswithcode.com/paper/can-creative-adversarial-networks-generating
Repo https://github.com/AndreasWieg/Creative-GAN
Framework tf
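
The "deviating from style norms" term can be written as a cross-entropy between the discriminator's style posterior and the uniform distribution over style classes, which the generator minimizes to maximize style ambiguity. A PyTorch sketch of that term:

```python
import torch
import torch.nn.functional as F

def style_ambiguity_loss(style_logits):
    """Cross-entropy between the style classifier's posterior and the
    uniform distribution. Minimizing it pushes generated images toward
    maximal style ambiguity, while a separate real/fake term keeps the
    samples inside the art distribution."""
    log_probs = F.log_softmax(style_logits, dim=1)
    k = style_logits.size(1)
    uniform = torch.full_like(log_probs, 1.0 / k)
    return -(uniform * log_probs).sum(dim=1).mean()

# e.g. style_ambiguity_loss(torch.randn(8, 25)) for 25 style classes
```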

Converting Your Thoughts to Texts: Enabling Brain Typing via Deep Feature Learning of EEG Signals

Title Converting Your Thoughts to Texts: Enabling Brain Typing via Deep Feature Learning of EEG Signals
Authors Xiang Zhang, Lina Yao, Quan Z. Sheng, Salil S. Kanhere, Tao Gu, Dalin Zhang
Abstract An electroencephalography (EEG) based Brain Computer Interface (BCI) enables people to communicate with the outside world by interpreting the EEG signals of their brains to interact with devices such as wheelchairs and intelligent robots. More specifically, motor imagery EEG (MI-EEG), which reflects a subject's active intent, is attracting increasing attention for a variety of BCI applications. Accurate classification of MI-EEG signals, while essential for effective operation of BCI systems, is challenging due to the significant noise inherent in the signals and the lack of informative correlation between the signals and brain activities. In this paper, we propose a novel deep neural network based learning framework that affords perceptive insights into the relationship between the MI-EEG data and brain activities. We design a joint convolutional recurrent neural network that simultaneously learns robust high-level feature representations through low-dimensional dense embeddings from raw MI-EEG signals. We also employ an autoencoder layer to eliminate various artifacts such as background activities. The proposed approach has been evaluated extensively on a large-scale public MI-EEG dataset and a limited but easy-to-deploy dataset collected in our lab. The results show that our approach outperforms a series of baselines and the competitive state-of-the-art methods, yielding a classification accuracy of 95.53%. The applicability of our proposed approach is further demonstrated with a practical BCI system for typing.
Tasks EEG
Published 2017-09-26
URL http://arxiv.org/abs/1709.08820v1
PDF http://arxiv.org/pdf/1709.08820v1.pdf
PWC https://paperswithcode.com/paper/converting-your-thoughts-to-texts-enabling
Repo https://github.com/xiangzhang1015/Brain_typing
Framework tf
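
A hypothetical PyTorch sketch of a joint convolutional-recurrent classifier in that spirit; the single-conv/single-LSTM layout and all layer sizes are placeholders, not the paper's architecture:

```python
import torch
import torch.nn as nn

class ConvRecurrentEEG(nn.Module):
    """Convolve across EEG channels per time step, then summarize the
    sequence with an LSTM and classify from the final hidden state."""
    def __init__(self, n_channels=64, n_classes=5, hidden=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=5, padding=2), nn.ReLU())
        self.rnn = nn.LSTM(input_size=32, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                        # x: (batch, channels, time)
        h = self.conv(x)                         # (batch, 32, time)
        out, _ = self.rnn(h.transpose(1, 2))     # (batch, time, hidden)
        return self.head(out[:, -1])             # logits per MI class
```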

R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection

Title R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection
Authors Yingying Jiang, Xiangyu Zhu, Xiaobing Wang, Shuli Yang, Wei Li, Hua Wang, Pei Fu, Zhenbo Luo
Abstract In this paper, we propose a novel method called Rotational Region CNN (R2CNN) for detecting arbitrarily oriented text in natural scene images. The framework is based on the Faster R-CNN [1] architecture. First, we use the Region Proposal Network (RPN) to generate axis-aligned bounding boxes that enclose texts with different orientations. Second, for each axis-aligned text box proposed by the RPN, we extract pooled features at several pooled sizes, and the concatenated features are used to simultaneously predict the text/non-text score, the axis-aligned box, and the inclined minimum-area box. Finally, we apply inclined non-maximum suppression to obtain the detection results. Our approach achieves competitive results on the ICDAR 2015 and ICDAR 2013 text detection benchmarks.
Tasks Scene Text Detection
Published 2017-06-29
URL http://arxiv.org/abs/1706.09579v2
PDF http://arxiv.org/pdf/1706.09579v2.pdf
PWC https://paperswithcode.com/paper/r2cnn-rotational-region-cnn-for-orientation
Repo https://github.com/dafanghe/Tensorflow_SceneText_Oriented_Box_Predictor
Framework tf
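
Inclined NMS needs an IoU for rotated quadrilaterals, which polygon intersection provides. A simplified sketch assuming Shapely is available; the paper's implementation is more elaborate:

```python
import numpy as np
from shapely.geometry import Polygon

def inclined_iou(quad_a, quad_b):
    """IoU of two inclined boxes given as four (x, y) corner points each."""
    pa, pb = Polygon(quad_a), Polygon(quad_b)
    inter = pa.intersection(pb).area
    return inter / (pa.area + pb.area - inter + 1e-9)

def inclined_nms(quads, scores, iou_thresh=0.3):
    """Greedily keep the highest-scoring inclined box, drop overlaps."""
    order = np.argsort(scores)[::-1]
    keep = []
    while len(order) > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        order = np.array(
            [j for j in rest if inclined_iou(quads[i], quads[j]) < iou_thresh],
            dtype=int)
    return keep
```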

Word forms - not just their lengths - are optimized for efficient communication

Title Word forms - not just their lengths - are optimized for efficient communication
Authors Stephan C. Meylan, Thomas L. Griffiths
Abstract The inverse relationship between the length of a word and the frequency of its use, first identified by G.K. Zipf in 1935, is a classic empirical law that holds across a wide range of human languages. We demonstrate that length is one aspect of a much more general property of words: how distinctive they are with respect to other words in a language. Distinctiveness plays a critical role in recognizing words in fluent speech, in that it reflects the strength of potential competitors when selecting the best candidate for an ambiguous signal. Phonological information content, a measure of a word’s string probability under a statistical model of a language’s sound or character sequences, concisely captures distinctiveness. Examining large-scale corpora from 13 languages, we find that distinctiveness significantly outperforms word length as a predictor of frequency. This finding provides evidence that listeners’ processing constraints shape fine-grained aspects of word forms across languages.
Tasks
Published 2017-03-06
URL http://arxiv.org/abs/1703.01694v2
PDF http://arxiv.org/pdf/1703.01694v2.pdf
PWC https://paperswithcode.com/paper/word-forms-not-just-their-lengths-are
Repo https://github.com/smeylan/pic-analysis
Framework none
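
Phonological information content can be approximated with any string model; a toy character-bigram version with add-one smoothing (the paper uses stronger phonological and orthographic models) is sketched below. The information content is the negative of the returned log probability:

```python
import math
from collections import Counter

def char_bigram_model(corpus_words):
    """Fit a character bigram model with add-one smoothing; '#' marks
    word boundaries. Returns a log-probability scorer for strings."""
    pairs, singles = Counter(), Counter()
    for w in corpus_words:
        s = "#" + w + "#"
        for a, b in zip(s, s[1:]):
            pairs[(a, b)] += 1
            singles[a] += 1
    vocab = len(set(singles) | {b for _, b in pairs})
    def log_prob(word):
        s = "#" + word + "#"
        return sum(math.log((pairs[(a, b)] + 1) / (singles[a] + vocab))
                   for a, b in zip(s, s[1:]))
    return log_prob

lp = char_bigram_model(["the", "cat", "that", "hat"])
print(lp("the"), lp("xylophone"))  # familiar-looking strings score higher
```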

Pose Guided Person Image Generation

Title Pose Guided Person Image Generation
Authors Liqian Ma, Xu Jia, Qianru Sun, Bernt Schiele, Tinne Tuytelaars, Luc Van Gool
Abstract This paper proposes the novel Pose Guided Person Generation Network (PG$^2$), which synthesizes person images in arbitrary poses based on an image of that person and a novel pose. Our generation framework PG$^2$ uses the pose information explicitly and consists of two key stages: pose integration and image refinement. In the first stage, the condition image and the target pose are fed into a U-Net-like network to generate an initial but coarse image of the person in the target pose. The second stage then refines the initial, blurry result by training a U-Net-like generator in an adversarial way. Extensive experimental results on both 128$\times$64 re-identification images and 256$\times$256 fashion photos show that our model generates high-quality person images with convincing details.
Tasks Gesture-to-Gesture Translation, Image Generation, Pose Transfer
Published 2017-05-25
URL http://arxiv.org/abs/1705.09368v6
PDF http://arxiv.org/pdf/1705.09368v6.pdf
PWC https://paperswithcode.com/paper/pose-guided-person-image-generation
Repo https://github.com/chuanqichen/deepcoaching
Framework pytorch

Deformable GANs for Pose-based Human Image Generation

Title Deformable GANs for Pose-based Human Image Generation
Authors Aliaksandr Siarohin, Enver Sangineto, Stephane Lathuiliere, Nicu Sebe
Abstract In this paper we address the problem of generating person images conditioned on a given pose. Specifically, given an image of a person and a target pose, we synthesize a new image of that person in the novel pose. In order to deal with pixel-to-pixel misalignments caused by the pose differences, we introduce deformable skip connections in the generator of our Generative Adversarial Network. Moreover, a nearest-neighbour loss is proposed instead of the common L1 and L2 losses in order to match the details of the generated image with the target image. We test our approach using photos of persons in different poses and we compare our method with previous work in this area, showing state-of-the-art results on two benchmarks. Our method can be applied to the wider field of deformable object generation, provided that the pose of the articulated object can be extracted using a keypoint detector.
Tasks Gesture-to-Gesture Translation, Image Generation, Image-to-Image Translation, Pose Transfer
Published 2017-12-29
URL http://arxiv.org/abs/1801.00055v2
PDF http://arxiv.org/pdf/1801.00055v2.pdf
PWC https://paperswithcode.com/paper/deformable-gans-for-pose-based-human-image
Repo https://github.com/AliaksandrSiarohin/pose-gan
Framework tf
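
The nearest-neighbour loss can be sketched by comparing each generated pixel against the best match in a small window of the target, making the loss tolerant to small spatial misalignments. A simplified PyTorch version using `unfold`; the paper's formulation differs in detail:

```python
import torch
import torch.nn.functional as F

def nearest_neighbour_loss(generated, target, window=3):
    """For each spatial location, take the minimum L1 distance between the
    generated pixel and any target pixel in a window x window neighbourhood.
    Inputs are (B, C, H, W) float tensors."""
    pad = window // 2
    patches = F.unfold(target, kernel_size=window, padding=pad)
    b, c, h, w = generated.shape
    patches = patches.view(b, c, window * window, h * w)
    gen = generated.view(b, c, 1, h * w)
    dist = (gen - patches).abs().sum(dim=1)   # L1 over channels
    return dist.min(dim=1).values.mean()      # best match per location
```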

Probability Series Expansion Classifier that is Interpretable by Design

Title Probability Series Expansion Classifier that is Interpretable by Design
Authors Sapan Agarwal, Corey M. Hudson
Abstract This work presents a new classifier that is specifically designed to be fully interpretable. The technique determines the probability of a class outcome based directly on probability assignments measured from the training data. The accuracy of the predicted probability can be improved by measuring more probability estimates from the training data to create a series expansion that refines the predicted probability. We use this work to classify four standard datasets and achieve accuracies comparable to those of Random Forests. Because this technique is interpretable by design, it can determine, for individual cases, the combinations of features that contribute to a particular classification probability, as well as the weighting of each combination of features.
Tasks
Published 2017-10-27
URL http://arxiv.org/abs/1710.10301v1
PDF http://arxiv.org/pdf/1710.10301v1.pdf
PWC https://paperswithcode.com/paper/probability-series-expansion-classifier-that
Repo https://github.com/sandialabs/aweml
Framework none
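
The building block is a class probability measured directly from training counts. A toy sketch of one such measurement; the series expansion that refines it over combinations of features is not shown:

```python
import numpy as np

def conditional_class_probability(X, y, feature_idx, value, target=1):
    """Estimate P(y = target | X[:, feature_idx] == value) directly from
    training-data counts -- the kind of directly measured probability
    assignment the series expansion is built from."""
    mask = X[:, feature_idx] == value
    if not mask.any():
        return float(np.mean(y == target))  # fall back to the class prior
    return float(np.mean(y[mask] == target))
```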

Information-gain computation

Title Information-gain computation
Authors Anthony Di Franco
Abstract Despite large incentives, correctness in software remains an elusive goal. Declarative programming techniques, where algorithms are derived from a specification of the desired behavior, offer hope to address this problem, since there is a combinatorial reduction in complexity in programming in terms of specifications instead of algorithms, and arbitrary desired properties can be expressed and enforced in specifications directly. However, limitations on performance have prevented programming with declarative specifications from becoming a mainstream technique for general-purpose programming. To address the performance bottleneck in deriving an algorithm from a specification, I propose information-gain computation, a framework in which an adaptive evaluation strategy is used to efficiently perform a search that derives algorithms providing information about a query most directly. Within this framework, opportunities to compress the search space present themselves, which suggests that information-theoretic bounds on the performance of such a system might be articulated and a system designed to achieve them. In a preliminary empirical study of adaptive evaluation for a simple test program, the evaluation strategy adapts successfully to evaluate a query efficiently.
Tasks
Published 2017-07-05
URL http://arxiv.org/abs/1707.01550v3
PDF http://arxiv.org/pdf/1707.01550v3.pdf
PWC https://paperswithcode.com/paper/information-gain-computation
Repo https://github.com/difranco/fifth
Framework none