July 30, 2019

Paper Group AWR 70

Improved training for online end-to-end speech recognition systems

Title Improved training for online end-to-end speech recognition systems
Authors Suyoun Kim, Michael L. Seltzer, Jinyu Li, Rui Zhao
Abstract Achieving high accuracy with end-to-end speech recognizers requires careful parameter initialization prior to training. Otherwise, the networks may fail to find a good local optimum. This is particularly true for online networks, such as unidirectional LSTMs. Currently, the best strategy to train such systems is to bootstrap the training from a tied-triphone system. However, this is time consuming, and more importantly, is impossible for languages without a high-quality pronunciation lexicon. In this work, we propose an initialization strategy that uses teacher-student learning to transfer knowledge from a large, well-trained, offline end-to-end speech recognition model to an online end-to-end model, eliminating the need for a lexicon or any other linguistic resources. We also explore curriculum learning and label smoothing and show how they can be combined with the proposed teacher-student learning for further improvements. We evaluate our methods on a Microsoft Cortana personal assistant task and show that the proposed method results in a 19% relative improvement in word error rate compared to a randomly-initialized baseline system.
Tasks End-To-End Speech Recognition, Speech Recognition
Published 2017-11-06
URL http://arxiv.org/abs/1711.02212v2
PDF http://arxiv.org/pdf/1711.02212v2.pdf
PWC https://paperswithcode.com/paper/improved-training-for-online-end-to-end
Repo https://github.com/vadimkantorov/ctc
Framework pytorch
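
The teacher-student learning and label smoothing mentioned in the abstract above can be illustrated with a small per-frame sketch. This is not the authors' training code: the function names, the interpolation `weight`, and the smoothing value `epsilon` are all assumptions chosen for illustration, and the real system operates on sequences with an end-to-end objective rather than on a single frame.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def smooth_labels(target_index, num_classes, epsilon=0.1):
    # Label smoothing: move epsilon of the probability mass from the
    # true class to a uniform distribution over all classes.
    dist = [epsilon / num_classes] * num_classes
    dist[target_index] += 1.0 - epsilon
    return dist

def cross_entropy(target_dist, predicted_dist):
    return -sum(t * math.log(p)
                for t, p in zip(target_dist, predicted_dist) if t > 0)

def student_loss(student_logits, teacher_logits, target_index,
                 weight=0.5, epsilon=0.1):
    # Hypothetical combination: the student is pulled towards the teacher's
    # soft posteriors and towards the smoothed reference label.
    student = softmax(student_logits)
    teacher = softmax(teacher_logits)
    soft = cross_entropy(teacher, student)
    hard = cross_entropy(
        smooth_labels(target_index, len(student), epsilon), student)
    return weight * soft + (1.0 - weight) * hard
```

A student whose logits agree with the teacher receives a lower loss than one that prefers a different class, which is the signal the online model would be trained on in this simplified setting.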

Consensus measure of rankings

Title Consensus measure of rankings
Authors Zhiwei Lin, Yi Li, Xiaolian Guo
Abstract A ranking is an ordered sequence of items, in which an item with higher ranking score is more preferred than the items with lower ranking scores. In many information systems, rankings are widely used to represent the preferences over a set of items or candidates. The consensus measure of rankings is the problem of how to evaluate the degree to which the rankings agree. The consensus measure can be used to evaluate rankings in many information systems, as quite often there is no ground truth available for evaluation. This paper introduces a novel approach for consensus measure of rankings by using graph representation, in which the vertices or nodes are the items and the edges are the relationship of items in the rankings. Such representation leads to various algorithms for consensus measure in terms of different aspects of rankings, including the number of common patterns, the number of common patterns with fixed length and the length of the longest common patterns. The proposed measure can be adopted for various types of rankings, such as full rankings, partial rankings and rankings with ties. This paper demonstrates how the proposed approaches can be used to evaluate the quality of rank aggregation and the quality of top-$k$ rankings from Google and Bing search engines.
Tasks
Published 2017-04-27
URL http://arxiv.org/abs/1704.08464v2
PDF http://arxiv.org/pdf/1704.08464v2.pdf
PWC https://paperswithcode.com/paper/consensus-measure-of-rankings
Repo https://github.com/zhiweiuu/secs
Framework none
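
One of the aspects named in the abstract above, the length of the longest common pattern, can be sketched for two rankings without ties. The sketch assumes that a "common pattern" is a sequence of items appearing in the same relative order in both rankings, and it uses the textbook longest-common-subsequence dynamic programme instead of the paper's graph representation. The function name is hypothetical.

```python
def longest_common_pattern_length(ranking_a, ranking_b):
    # Classic dynamic programme for the longest common subsequence:
    # table[i][j] is the answer for the first i items of ranking_a
    # and the first j items of ranking_b.
    rows, cols = len(ranking_a), len(ranking_b)
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            if ranking_a[i - 1] == ranking_b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[rows][cols]
```

Under this assumption, two identical rankings of n items score n, and a ranking compared with its own reverse scores 1, so larger values indicate stronger agreement.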

ShaResNet: reducing residual network parameter number by sharing weights

Title ShaResNet: reducing residual network parameter number by sharing weights
Authors Alexandre Boulch
Abstract Deep Residual Networks have reached the state of the art in many image processing tasks such as image classification. However, the cost for a gain in accuracy in terms of depth and memory is prohibitive as it requires a higher number of residual blocks, up to double the initial value. To tackle this problem, we propose in this paper a way to reduce the redundant information of the networks. We share the weights of convolutional layers between residual blocks operating at the same spatial scale. The signal flows multiple times in the same convolutional layer. The resulting architecture, called ShaResNet, contains block specific layers and shared layers. These ShaResNet are trained exactly in the same fashion as the commonly used residual networks. We show, on the one hand, that they are almost as efficient as their sequential counterparts while involving fewer parameters, and on the other hand that they are more efficient than a residual network with the same number of parameters. For example, a 152-layer-deep residual network can be reduced to 106 convolutional layers, i.e. a parameter gain of 39%, while losing less than 0.2% accuracy on ImageNet.
Tasks Image Classification
Published 2017-02-28
URL http://arxiv.org/abs/1702.08782v2
PDF http://arxiv.org/pdf/1702.08782v2.pdf
PWC https://paperswithcode.com/paper/sharesnet-reducing-residual-network-parameter
Repo https://github.com/osmr/imgclsmob
Framework mxnet
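
To make the effect of sharing weights between residual blocks concrete, the following sketch counts convolution parameters for one stage of bottleneck blocks. It assumes, purely for illustration, that each block has two block-specific 1x1 convolutions and one 3x3 convolution, and that the 3x3 convolution is the shared layer. The abstract does not say which layers are shared, and biases and normalization parameters are ignored.

```python
def conv_params(in_channels, out_channels, kernel_size):
    # Weight count of a dense 2D convolution without bias.
    return in_channels * out_channels * kernel_size * kernel_size

def stage_params(num_blocks, channels, share_spatial_conv):
    # Hypothetical bottleneck layout: 1x1 reduce, 3x3 spatial, 1x1 expand.
    mid = channels // 4
    pointwise = conv_params(channels, mid, 1) + conv_params(mid, channels, 1)
    spatial = conv_params(mid, mid, 3)
    # With sharing, a single 3x3 kernel is reused by every block in the stage.
    spatial_copies = 1 if share_spatial_conv else num_blocks
    return num_blocks * pointwise + spatial_copies * spatial

unshared = stage_params(num_blocks=3, channels=256, share_spatial_conv=False)
shared = stage_params(num_blocks=3, channels=256, share_spatial_conv=True)
```

For this toy stage the count drops from 208896 to 135168 weights. The saving grows with the number of blocks per stage, since the shared layer is stored only once.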

Learning the Enigma with Recurrent Neural Networks

Title Learning the Enigma with Recurrent Neural Networks
Authors Sam Greydanus
Abstract Recurrent neural networks (RNNs) represent the state of the art in translation, image captioning, and speech recognition. They are also capable of learning algorithmic tasks such as long addition, copying, and sorting from a set of training examples. We demonstrate that RNNs can learn decryption algorithms – the mappings from plaintext to ciphertext – for three polyalphabetic ciphers (Vigenère, Autokey, and Enigma). Most notably, we demonstrate that an RNN with a 3000-unit Long Short-Term Memory (LSTM) cell can learn the decryption function of the Enigma machine. We argue that our model learns efficient internal representations of these ciphers 1) by exploring activations of individual memory neurons and 2) by comparing memory usage across the three ciphers. To be clear, our work is not aimed at ‘cracking’ the Enigma cipher. However, we do show that our model can perform elementary cryptanalysis by running known-plaintext attacks on the Vigenère and Autokey ciphers. Our results indicate that RNNs can learn algorithmic representations of black box polyalphabetic ciphers and that these representations are useful for cryptanalysis.
Tasks Cryptanalysis
Published 2017-08-24
URL http://arxiv.org/abs/1708.07576v2
PDF http://arxiv.org/pdf/1708.07576v2.pdf
PWC https://paperswithcode.com/paper/learning-the-enigma-with-recurrent-neural
Repo https://github.com/greydanus/crypto-rnn
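
For readers unfamiliar with the simplest of the three ciphers named above, here is a minimal sketch of the Vigenère cipher, the kind of mapping such a network would have to learn from examples. It is a plain reference implementation, not code from the paper, and it assumes the text and key contain only the uppercase letters A to Z.

```python
import string

ALPHABET = string.ascii_uppercase

def vigenere(text, key, decrypt=False):
    # Each letter is shifted by the alphabet position of the key letter
    # at the same offset; the key repeats cyclically.
    sign = -1 if decrypt else 1
    out = []
    for position, char in enumerate(text):
        shift = ALPHABET.index(key[position % len(key)])
        out.append(ALPHABET[(ALPHABET.index(char) + sign * shift) % 26])
    return "".join(out)

ciphertext = vigenere("ATTACKATDAWN", "LEMON")
plaintext = vigenere(ciphertext, "LEMON", decrypt=True)
```

Because the shift depends on the position in the key, the same plaintext letter maps to different ciphertext letters, which is what makes the cipher polyalphabetic.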
Framework tf

STARDATA: A StarCraft AI Research Dataset

Title STARDATA: A StarCraft AI Research Dataset
Authors Zeming Lin, Jonas Gehring, Vasil Khalidov, Gabriel Synnaeve
Abstract We release a dataset of 65646 StarCraft replays that contains 1535 million frames and 496 million player actions. We provide full game state data along with the original replays that can be viewed in StarCraft. The game state data was recorded every 3 frames which ensures suitability for a wide variety of machine learning tasks such as strategy classification, inverse reinforcement learning, imitation learning, forward modeling, partial information extraction, and others. We use TorchCraft to extract and store the data, which standardizes the data format for both reading from replays and reading directly from the game. Furthermore, the data can be used on different operating systems and platforms. The dataset contains valid, non-corrupted replays only and its quality and diversity were ensured by a number of heuristics. We illustrate the diversity of the data with various statistics and provide examples of tasks that benefit from the dataset. We make the dataset available at https://github.com/TorchCraft/StarData . En Taro Adun!
Tasks Imitation Learning, Real-Time Strategy Games, Starcraft
Published 2017-08-07
URL http://arxiv.org/abs/1708.02139v1
PDF http://arxiv.org/pdf/1708.02139v1.pdf
PWC https://paperswithcode.com/paper/stardata-a-starcraft-ai-research-dataset
Repo https://github.com/TorchCraft/StarData
Framework none

A Tutorial on Thompson Sampling

Title A Tutorial on Thompson Sampling
Authors Daniel Russo, Benjamin Van Roy, Abbas Kazerouni, Ian Osband, Zheng Wen
Abstract Thompson sampling is an algorithm for online decision problems where actions are taken sequentially in a manner that must balance between exploiting what is known to maximize immediate performance and investing to accumulate new information that may improve future performance. The algorithm addresses a broad range of problems in a computationally efficient manner and is therefore enjoying wide use. This tutorial covers the algorithm and its application, illustrating concepts through a range of examples, including Bernoulli bandit problems, shortest path problems, dynamic pricing, recommendation, active learning with neural networks, and reinforcement learning in Markov decision processes. Most of these problems involve complex information structures, where information revealed by taking an action informs beliefs about other actions. We will also discuss when and why Thompson sampling is or is not effective and relations to alternative algorithms.
Tasks Active Learning
Published 2017-07-07
URL http://arxiv.org/abs/1707.02038v2
PDF http://arxiv.org/pdf/1707.02038v2.pdf
PWC https://paperswithcode.com/paper/a-tutorial-on-thompson-sampling
Repo https://github.com/BBloggsbott/k-armed-bandits
Framework none
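
The Bernoulli bandit case mentioned in the abstract above admits a short sketch of Thompson sampling with Beta posteriors. The uniform Beta(1, 1) prior, the function name, and the simulated reward probabilities are assumptions made for illustration and are not taken from the tutorial.

```python
import random

def thompson_sampling(true_means, num_rounds, seed=0):
    rng = random.Random(seed)
    num_arms = len(true_means)
    # Beta(1, 1) prior for each arm, stored as (alpha, beta) counts.
    alphas = [1] * num_arms
    betas = [1] * num_arms
    pulls = [0] * num_arms
    for _ in range(num_rounds):
        # Sample a plausible mean for each arm from its posterior,
        # then act greedily with respect to the samples.
        samples = [rng.betavariate(alphas[a], betas[a])
                   for a in range(num_arms)]
        arm = max(range(num_arms), key=lambda a: samples[a])
        # Simulated Bernoulli reward from the (hidden) true mean.
        reward = 1 if rng.random() < true_means[arm] else 0
        alphas[arm] += reward
        betas[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

pulls = thompson_sampling([0.1, 0.5, 0.9], num_rounds=2000)
```

Arms with uncertain posteriors still get sampled occasionally, which is how the algorithm trades off exploration against exploitation. As evidence accumulates, most pulls should concentrate on the arm with the highest true mean.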

Towards Metamerism via Foveated Style Transfer

Title Towards Metamerism via Foveated Style Transfer
Authors Arturo Deza, Aditya Jonnalagadda, Miguel Eckstein
Abstract The problem of $\textit{visual metamerism}$ is defined as finding a family of perceptually indistinguishable, yet physically different images. In this paper, we propose our NeuroFovea metamer model, a foveated generative model that is based on a mixture of peripheral representations and style transfer forward-pass algorithms. Our gradient-descent free model is parametrized by a foveated VGG19 encoder-decoder which allows us to encode images in high dimensional space and interpolate between the content and texture information with adaptive instance normalization anywhere in the visual field. Our contributions include: 1) A framework for computing metamers that resembles a noisy communication system via a foveated feed-forward encoder-decoder network – We observe that metamerism arises as a byproduct of noisy perturbations that partially lie in the perceptual null space; 2) A perceptual optimization scheme as a solution to the hyperparametric nature of our metamer model that requires tuning of the image-texture tradeoff coefficients everywhere in the visual field which are a consequence of internal noise; 3) An ABX psychophysical evaluation of our metamers where we also find that the rate of growth of the receptive fields in our model matches V1 for reference metamers and V2 between synthesized samples. Our model also renders metamers at roughly a second, presenting a $\times1000$ speed-up compared to the previous work, which allows for tractable data-driven metamer experiments.
Tasks Metamerism, Style Transfer, Texture Synthesis
Published 2017-05-29
URL http://arxiv.org/abs/1705.10041v3
PDF http://arxiv.org/pdf/1705.10041v3.pdf
PWC https://paperswithcode.com/paper/towards-metamerism-via-foveated-style
Repo https://github.com/ArturoDeza/NeuroFovea
Framework pytorch
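
The interpolation between content and texture with adaptive instance normalization, as described in the abstract above, can be sketched on a single feature channel. This is a simplified illustration, not the NeuroFovea model: the real system works on VGG19 feature maps with coefficients that vary across the visual field, whereas here `alpha` is one hypothetical global coefficient and the inputs are flat lists of activations.

```python
import math

def mean_std(values, eps=1e-5):
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    # eps keeps the division in adain() well defined for flat channels.
    return mean, math.sqrt(var + eps)

def adain(content, texture, alpha=1.0):
    # Re-normalize the content channel so that it takes on the mean and
    # standard deviation of the texture channel.
    c_mean, c_std = mean_std(content)
    t_mean, t_std = mean_std(texture)
    stylized = [t_std * (c - c_mean) / c_std + t_mean for c in content]
    # alpha = 0 keeps the content features, alpha = 1 uses full texture stats.
    return [alpha * s + (1.0 - alpha) * c for s, c in zip(stylized, content)]
```

With `alpha` set to 0 the content features pass through unchanged, and with `alpha` set to 1 the output channel has the texture channel's mean, so intermediate values trade one off against the other.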

CondenseNet: An Efficient DenseNet using Learned Group Convolutions

Title CondenseNet: An Efficient DenseNet using Learned Group Convolutions
Authors Gao Huang, Shichen Liu, Laurens van der Maaten, Kilian Q. Weinberger
Abstract Deep neural networks are increasingly used on mobile devices, where computational resources are limited. In this paper we develop CondenseNet, a novel network architecture with unprecedented efficiency. It combines dense connectivity with a novel module called learned group convolution. The dense connectivity facilitates feature re-use in the network, whereas learned group convolutions remove connections between layers for which this feature re-use is superfluous. At test time, our model can be implemented using standard group convolutions, allowing for efficient computation in practice. Our experiments show that CondenseNets are far more efficient than state-of-the-art compact convolutional networks such as MobileNets and ShuffleNets.
Tasks
Published 2017-11-25
URL http://arxiv.org/abs/1711.09224v2
PDF http://arxiv.org/pdf/1711.09224v2.pdf
PWC https://paperswithcode.com/paper/condensenet-an-efficient-densenet-using
Repo https://github.com/vponcelo/CondenseNet
Framework pytorch
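
The efficiency argument above rests on group convolutions needing fewer weights than dense ones. The following sketch counts the weights of a standard group convolution. It does not model the learned grouping procedure itself, which the abstract says happens during training, and the function name is hypothetical.

```python
def group_conv_params(in_channels, out_channels, kernel_size, groups):
    # Each group connects in_channels / groups inputs to
    # out_channels / groups outputs, so the total weight count
    # shrinks by a factor of `groups` compared with a dense convolution.
    if in_channels % groups or out_channels % groups:
        raise ValueError("channel counts must be divisible by groups")
    per_group = (in_channels // groups) * (out_channels // groups)
    return per_group * kernel_size * kernel_size * groups

dense = group_conv_params(64, 128, kernel_size=1, groups=1)
grouped = group_conv_params(64, 128, kernel_size=1, groups=4)
```

In this example the dense 1x1 layer has 8192 weights and the 4-group version has 2048, a fourfold reduction.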

Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning

Title Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning
Authors Wei Ping, Kainan Peng, Andrew Gibiansky, Sercan O. Arik, Ajay Kannan, Sharan Narang, Jonathan Raiman, John Miller
Abstract We present Deep Voice 3, a fully-convolutional attention-based neural text-to-speech (TTS) system. Deep Voice 3 matches state-of-the-art neural speech synthesis systems in naturalness while training ten times faster. We scale Deep Voice 3 to data set sizes unprecedented for TTS, training on more than eight hundred hours of audio from over two thousand speakers. In addition, we identify common error modes of attention-based speech synthesis networks, demonstrate how to mitigate them, and compare several different waveform synthesis methods. We also describe how to scale inference to ten million queries per day on one single-GPU server.
Tasks Speech Synthesis
Published 2017-10-20
URL http://arxiv.org/abs/1710.07654v3
PDF http://arxiv.org/pdf/1710.07654v3.pdf
PWC https://paperswithcode.com/paper/deep-voice-3-scaling-text-to-speech-with
Repo https://github.com/r9y9/deepvoice3_pytorch
Framework pytorch

Fixing a Broken ELBO

Title Fixing a Broken ELBO
Authors Alexander A. Alemi, Ben Poole, Ian Fischer, Joshua V. Dillon, Rif A. Saurous, Kevin Murphy
Abstract Recent work in unsupervised representation learning has focused on learning deep directed latent-variable models. Fitting these models by maximizing the marginal likelihood or evidence is typically intractable, thus a common approximation is to maximize the evidence lower bound (ELBO) instead. However, maximum likelihood training (whether exact or approximate) does not necessarily result in a good latent representation, as we demonstrate both theoretically and empirically. In particular, we derive variational lower and upper bounds on the mutual information between the input and the latent variable, and use these bounds to derive a rate-distortion curve that characterizes the tradeoff between compression and reconstruction accuracy. Using this framework, we demonstrate that there is a family of models with identical ELBO, but different quantitative and qualitative characteristics. Our framework also suggests a simple new method to ensure that latent variable models with powerful stochastic decoders do not ignore their latent code.
Tasks Latent Variable Models, Representation Learning, Unsupervised Representation Learning
Published 2017-11-01
URL http://arxiv.org/abs/1711.00464v3
PDF http://arxiv.org/pdf/1711.00464v3.pdf
PWC https://paperswithcode.com/paper/fixing-a-broken-elbo
Repo https://github.com/suvalaki/Deeper
Framework tf
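
The rate-distortion view in the abstract above can be summarized with a few formulas. The notation is an assumption introduced here for illustration: $e(z|x)$ is the encoder, $d(x|z)$ the decoder, $m(z)$ the latent marginal (prior), $p^*(x)$ the data distribution, and $H$ the data entropy.

```latex
\begin{align}
D &= -\,\mathbb{E}_{p^*(x)}\,\mathbb{E}_{e(z|x)}\big[\log d(x|z)\big]
   && \text{(distortion)} \\
R &= \mathbb{E}_{p^*(x)}\Big[\mathrm{KL}\big(e(z|x)\,\|\,m(z)\big)\Big]
   && \text{(rate)} \\
\mathrm{ELBO} &= -(D + R) \\
H - D &\le I(X;Z) \le R
\end{align}
```

Read this way, any two models with the same sum $D + R$ share the same ELBO even if one has a high rate and low distortion and the other the reverse, which is one way to see how models with identical ELBO can differ in character.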

Think Globally, Embed Locally — Locally Linear Meta-embedding of Words

Title Think Globally, Embed Locally — Locally Linear Meta-embedding of Words
Authors Danushka Bollegala, Kohei Hayashi, Ken-ichi Kawarabayashi
Abstract Distributed word embeddings have shown superior performances in numerous Natural Language Processing (NLP) tasks. However, their performances vary significantly across different tasks, implying that the word embeddings learnt by those methods capture complementary aspects of lexical semantics. Therefore, we believe that it is important to combine the existing word embeddings to produce more accurate and complete meta-embeddings of words. For this purpose, we propose an unsupervised locally linear meta-embedding learning method that takes pre-trained word embeddings as the input, and produces more accurate meta embeddings. Unlike previously proposed meta-embedding learning methods that learn a global projection over all words in a vocabulary, our proposed method is sensitive to the differences in local neighbourhoods of the individual source word embeddings. Moreover, we show that vector concatenation, a previously proposed highly competitive baseline approach for integrating word embeddings, can be derived as a special case of the proposed method. Experimental results on semantic similarity, word analogy, relation classification, and short-text classification tasks show that our meta-embeddings significantly outperform prior methods in several benchmark datasets, establishing a new state of the art for meta-embeddings.
Tasks Relation Classification, Semantic Similarity, Semantic Textual Similarity, Text Classification, Word Embeddings
Published 2017-09-19
URL http://arxiv.org/abs/1709.06671v1
PDF http://arxiv.org/pdf/1709.06671v1.pdf
PWC https://paperswithcode.com/paper/think-globally-embed-locally-locally-linear
Repo https://github.com/Shujian2015/meta-embedding-paper-list
Framework none

Systematic study of color spaces and components for the segmentation of sky/cloud images

Title Systematic study of color spaces and components for the segmentation of sky/cloud images
Authors Soumyabrata Dev, Yee Hui Lee, Stefan Winkler
Abstract Sky/cloud imaging using ground-based Whole Sky Imagers (WSI) is a cost-effective means to understanding cloud cover and weather patterns. The accurate segmentation of clouds in these images is a challenging task, as clouds do not possess any clear structure. Several algorithms using different color models have been proposed in the literature. This paper presents a systematic approach for the selection of color spaces and components for optimal segmentation of sky/cloud images. Using mainly principal component analysis (PCA) and fuzzy clustering for evaluation, we identify the most suitable color components for this task.
Tasks
Published 2017-01-17
URL http://arxiv.org/abs/1701.04520v1
PDF http://arxiv.org/pdf/1701.04520v1.pdf
PWC https://paperswithcode.com/paper/systematic-study-of-color-spaces-and
Repo https://github.com/Soumyabrata/color-channels
Framework none

A Comprehensive Implementation of Conceptual Spaces

Title A Comprehensive Implementation of Conceptual Spaces
Authors Lucas Bechberger, Kai-Uwe Kühnberger
Abstract The highly influential framework of conceptual spaces provides a geometric way of representing knowledge. Instances are represented by points and concepts are represented by regions in a (potentially) high-dimensional space. Based on our recent formalization, we present a comprehensive implementation of the conceptual spaces framework that is not only capable of representing concepts with inter-domain correlations, but that also offers a variety of operations on these concepts.
Tasks
Published 2017-07-14
URL http://arxiv.org/abs/1707.05165v3
PDF http://arxiv.org/pdf/1707.05165v3.pdf
PWC https://paperswithcode.com/paper/a-comprehensive-implementation-of-conceptual
Repo https://github.com/lbechberger/ConceptualSpaces
Framework none

A-Fast-RCNN: Hard Positive Generation via Adversary for Object Detection

Title A-Fast-RCNN: Hard Positive Generation via Adversary for Object Detection
Authors Xiaolong Wang, Abhinav Shrivastava, Abhinav Gupta
Abstract How do we learn an object detector that is invariant to occlusions and deformations? Our current solution is to use a data-driven strategy – collect large-scale datasets which have object instances under different conditions. The hope is that the final classifier can use these examples to learn invariances. But is it really possible to see all the occlusions in a dataset? We argue that like categories, occlusions and object deformations also follow a long-tail. Some occlusions and deformations are so rare that they hardly happen; yet we want to learn a model invariant to such occurrences. In this paper, we propose an alternative solution. We propose to learn an adversarial network that generates examples with occlusions and deformations. The goal of the adversary is to generate examples that are difficult for the object detector to classify. In our framework both the original detector and adversary are learned in a joint manner. Our experimental results indicate a 2.3% mAP boost on VOC07 and a 2.6% mAP boost on VOC2012 object detection challenge compared to the Fast-RCNN pipeline. We also release the code for this paper.
Tasks Object Detection
Published 2017-04-11
URL http://arxiv.org/abs/1704.03414v1
PDF http://arxiv.org/pdf/1704.03414v1.pdf
PWC https://paperswithcode.com/paper/a-fast-rcnn-hard-positive-generation-via
Repo https://github.com/xzabg/fast-adversarial
Framework torch

Vietnamese Semantic Role Labelling

Title Vietnamese Semantic Role Labelling
Authors Phuong Le-Hong, Thai Hoang Pham, Xuan Khoai Pham, Thi Minh Huyen Nguyen, Thi Luong Nguyen, Minh Hiep Nguyen
Abstract In this paper, we study semantic role labelling (SRL), a subtask of semantic parsing of natural language sentences and its application for the Vietnamese language. We present our effort in building Vietnamese PropBank, the first Vietnamese SRL corpus and a software system for labelling semantic roles of Vietnamese texts. In particular, we present a novel constituent extraction algorithm in the argument candidate identification step which is more suitable and more accurate than the common node-mapping method. In the machine learning part, our system integrates distributed word features produced by two recent unsupervised learning models in two learned statistical classifiers and makes use of integer linear programming inference procedure to improve the accuracy. The system is evaluated in a series of experiments and achieves a good result, an $F_1$ score of 74.77%. Our system, including corpus and software, is available as an open source project for free research and we believe that it is a good baseline for the development of future Vietnamese SRL systems.
Tasks Semantic Parsing
Published 2017-11-28
URL http://arxiv.org/abs/1711.10124v1
PDF http://arxiv.org/pdf/1711.10124v1.pdf
PWC https://paperswithcode.com/paper/vietnamese-semantic-role-labelling
Repo https://github.com/pth1993/vnSRL
Framework none
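
The $F_1$ score reported in the abstract above is the harmonic mean of precision and recall. The sketch below computes it from raw counts. The counts and the function name are hypothetical, and how a predicted argument is matched to a gold one is outside its scope.

```python
def f1_score(true_positives, false_positives, false_negatives):
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    # Guard against empty predictions or empty gold annotations.
    if true_positives == 0 or predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    return 2 * precision * recall / (precision + recall)
```

For example, 8 correct arguments with 2 spurious and 2 missed ones give a precision and recall of 0.8 each, and therefore an $F_1$ of 0.8.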