Paper Group ANR 441
One-Shot Generalization in Deep Generative Models
Title | One-Shot Generalization in Deep Generative Models |
Authors | Danilo Jimenez Rezende, Shakir Mohamed, Ivo Danihelka, Karol Gregor, Daan Wierstra |
Abstract | Humans have an impressive ability to reason about new concepts and experiences from just a single example. In particular, humans have an ability for one-shot generalization: an ability to encounter a new concept, understand its structure, and then be able to generate compelling alternative variations of the concept. We develop machine learning systems with this important capacity by developing new deep generative models, models that combine the representational power of deep learning with the inferential power of Bayesian reasoning. We develop a class of sequential generative models that are built on the principles of feedback and attention. These two characteristics lead to generative models that are among the state of the art in density estimation and image generation. We demonstrate the one-shot generalization ability of our models using three tasks: unconditional sampling, generating new exemplars of a given concept, and generating new exemplars of a family of concepts. In all cases our models are able to generate compelling and diverse samples, having seen new examples just once, providing an important class of general-purpose models for one-shot machine learning. |
Tasks | Density Estimation, Image Generation |
Published | 2016-03-16 |
URL | http://arxiv.org/abs/1603.05106v2, http://arxiv.org/pdf/1603.05106v2.pdf |
PWC | https://paperswithcode.com/paper/one-shot-generalization-in-deep-generative |
Repo | |
Framework | |
GMM-Free Flat Start Sequence-Discriminative DNN Training
Title | GMM-Free Flat Start Sequence-Discriminative DNN Training |
Authors | Gábor Gosztolya, Tamás Grósz, László Tóth |
Abstract | Recently, attempts have been made to remove Gaussian mixture models (GMM) from the training process of deep neural network-based hidden Markov models (HMM/DNN). For the GMM-free training of an HMM/DNN hybrid we have to solve two problems, namely the initial alignment of the frame-level state labels and the creation of context-dependent states. Although flat-start training via iteratively realigning and retraining the DNN using a frame-level error function is viable, it is quite cumbersome. Here, we propose to use a sequence-discriminative training criterion for flat start. While sequence-discriminative training is routinely applied only in the final phase of model training, we show that with proper caution it is also suitable for getting an alignment of context-independent DNN models. For the construction of tied states we apply a recently proposed KL-divergence-based state clustering method; hence, our whole training process is GMM-free. In the experimental evaluation we found that the sequence-discriminative flat start training method is not only significantly faster than the straightforward approach of iterative retraining and realignment, but the word error rates attained are slightly better as well. |
Tasks | |
Published | 2016-10-11 |
URL | http://arxiv.org/abs/1610.03256v1, http://arxiv.org/pdf/1610.03256v1.pdf |
PWC | https://paperswithcode.com/paper/gmm-free-flat-start-sequence-discriminative |
Repo | |
Framework | |
Active exploration in parameterized reinforcement learning
Title | Active exploration in parameterized reinforcement learning |
Authors | Mehdi Khamassi, Costas Tzafestas |
Abstract | Online model-free reinforcement learning (RL) methods with continuous actions are playing a prominent role when dealing with real-world applications such as robotics. However, when confronted with non-stationary environments, these methods crucially rely on an exploration-exploitation trade-off which is rarely dynamically and automatically adjusted to changes in the environment. Here we propose an active exploration algorithm for RL in structured (parameterized) continuous action space. This framework deals with a set of discrete actions, each of which is parameterized with continuous variables. Discrete exploration is controlled through a Boltzmann softmax function with an inverse temperature $\beta$ parameter. In parallel, a Gaussian exploration is applied to the continuous action parameters. We apply a meta-learning algorithm based on the comparison between variations of short-term and long-term reward running averages to simultaneously tune $\beta$ and the width of the Gaussian distribution from which continuous action parameters are drawn. When applied to a simple virtual human-robot interaction task, we show that this algorithm outperforms continuous parameterized RL both without active exploration and with active exploration based on uncertainty variations measured by a Kalman-Q-learning algorithm. |
Tasks | Meta-Learning, Q-Learning |
Published | 2016-10-06 |
URL | http://arxiv.org/abs/1610.01986v1, http://arxiv.org/pdf/1610.01986v1.pdf |
PWC | https://paperswithcode.com/paper/active-exploration-in-parameterized |
Repo | |
Framework | |
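The two exploration mechanisms named in the abstract above (a Boltzmann softmax over discrete actions with inverse temperature $\beta$, and Gaussian noise on each action's continuous parameter) can be illustrated with a minimal sketch. The function names, the single scalar parameter per action, and the `sigma` argument are assumptions made for illustration; the paper's meta-learning rule for tuning $\beta$ and the Gaussian width is not reproduced here.

```python
import math
import random


def softmax_probs(q_values, beta):
    """Boltzmann softmax over action values with inverse temperature beta."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp(beta * (q - m)) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(q_values, param_means, beta, sigma, rng=random):
    """Pick a discrete action by softmax, then sample its continuous parameter.

    param_means[i] is the current mean parameter of action i (hypothetical
    layout); sigma is the width of the Gaussian exploration noise.
    """
    probs = softmax_probs(q_values, beta)
    action = rng.choices(range(len(q_values)), weights=probs, k=1)[0]
    param = rng.gauss(param_means[action], sigma)
    return action, param
```

A small `beta` makes the discrete choice close to uniform (more exploration), while a large `beta` concentrates probability on the highest-valued action; `sigma` plays the same role for the continuous parameter.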
AutoMOS: Learning a non-intrusive assessor of naturalness-of-speech
Title | AutoMOS: Learning a non-intrusive assessor of naturalness-of-speech |
Authors | Brian Patton, Yannis Agiomyrgiannakis, Michael Terry, Kevin Wilson, Rif A. Saurous, D. Sculley |
Abstract | Developers of text-to-speech synthesizers (TTS) often make use of human raters to assess the quality of synthesized speech. We demonstrate that we can model human raters’ mean opinion scores (MOS) of synthesized speech using a deep recurrent neural network whose inputs consist solely of a raw waveform. Our best models provide utterance-level estimates of MOS only moderately inferior to sampled human ratings, as shown by Pearson and Spearman correlations. When multiple utterances are scored and averaged, a scenario common in synthesizer quality assessment, AutoMOS achieves correlations approaching those of human raters. The AutoMOS model has a number of applications, such as the ability to explore the parameter space of a speech synthesizer without requiring a human-in-the-loop. |
Tasks | |
Published | 2016-11-28 |
URL | http://arxiv.org/abs/1611.09207v1, http://arxiv.org/pdf/1611.09207v1.pdf |
PWC | https://paperswithcode.com/paper/automos-learning-a-non-intrusive-assessor-of |
Repo | |
Framework | |
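The AutoMOS abstract above reports agreement with human raters through Pearson and Spearman correlations. As a rough sketch of how those two statistics are computed from paired scores, assuming plain Python lists of predicted and human MOS values (the helper names are illustrative, not taken from the paper):

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for t in range(i, j + 1):
            result[order[t]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

Pearson measures linear agreement between the raw scores, while Spearman only checks whether the two sets of scores order the utterances the same way.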
Distributed Multi-Task Relationship Learning
Title | Distributed Multi-Task Relationship Learning |
Authors | Sulin Liu, Sinno Jialin Pan, Qirong Ho |
Abstract | Multi-task learning aims to learn multiple tasks jointly by exploiting their relatedness to improve the generalization performance for each task. Traditionally, to perform multi-task learning, one needs to centralize data from all the tasks on a single machine. However, in many real-world applications, data of different tasks may be geo-distributed over different local machines. Due to the heavy communication caused by transmitting the data and the issue of data privacy and security, it is impossible to send data of different tasks to a master machine to perform multi-task learning. Therefore, in this paper, we propose a distributed multi-task learning framework that alternately learns predictive models for each task and the relationships between tasks in the parameter server paradigm. In our framework, we first offer a general dual form for a family of regularized multi-task relationship learning methods. Subsequently, we propose a communication-efficient primal-dual distributed optimization algorithm to solve the dual problem by carefully designing local subproblems to make the dual problem decomposable. Moreover, we provide a theoretical convergence analysis for the proposed algorithm, which is specific to distributed multi-task relationship learning. We conduct extensive experiments on both synthetic and real-world datasets to evaluate our proposed framework in terms of effectiveness and convergence. |
Tasks | Distributed Optimization, Multi-Task Learning |
Published | 2016-12-13 |
URL | http://arxiv.org/abs/1612.04022v3, http://arxiv.org/pdf/1612.04022v3.pdf |
PWC | https://paperswithcode.com/paper/distributed-multi-task-relationship-learning |
Repo | |
Framework | |
Video Propagation Networks
Title | Video Propagation Networks |
Authors | Varun Jampani, Raghudeep Gadde, Peter V. Gehler |
Abstract | We propose a technique that propagates information forward through video data. The method is conceptually simple and can be applied to tasks that require the propagation of structured information, such as semantic labels, based on video content. We propose a ‘Video Propagation Network’ that processes video frames in an adaptive manner. The model is applied online: it propagates information forward without the need to access future frames. In particular we combine two components, a temporal bilateral network for dense and video adaptive filtering, followed by a spatial network to refine features and increase flexibility. We present experiments on video object segmentation and semantic video segmentation and show increased performance compared to the best previous task-specific methods, while having favorable runtime. Additionally we demonstrate our approach on an example regression task of color propagation in a grayscale video. |
Tasks | Semantic Segmentation, Video Object Segmentation, Video Semantic Segmentation, Visual Object Tracking |
Published | 2016-12-16 |
URL | http://arxiv.org/abs/1612.05478v3, http://arxiv.org/pdf/1612.05478v3.pdf |
PWC | https://paperswithcode.com/paper/video-propagation-networks |
Repo | |
Framework | |
Tracking Amendments to Legislation and Other Political Texts with a Novel Minimum-Edit-Distance Algorithm: DocuToads
Title | Tracking Amendments to Legislation and Other Political Texts with a Novel Minimum-Edit-Distance Algorithm: DocuToads |
Authors | Henrik Hermansson, James P. Cross |
Abstract | Political scientists often find themselves tracking amendments to political texts. As different actors weigh in, texts change as they are drafted and redrafted, reflecting political preferences and power. This study provides a novel solution to the problem of detecting amendments to political text based upon minimum edit distances. We demonstrate the usefulness of two language-insensitive, transparent, and efficient minimum-edit-distance algorithms suited for the task. These algorithms are capable of providing an account of the types (insertions, deletions, substitutions, and transpositions) and substantive amount of amendments made between versions of texts. To illustrate the usefulness and efficiency of the approach we replicate two existing studies from the field of legislative studies. Our results demonstrate that minimum edit distance methods can produce superior measures of text amendments to hand-coded efforts in a fraction of the time and resource costs. |
Tasks | |
Published | 2016-08-23 |
URL | http://arxiv.org/abs/1608.06459v1, http://arxiv.org/pdf/1608.06459v1.pdf |
PWC | https://paperswithcode.com/paper/tracking-amendments-to-legislation-and-other |
Repo | |
Framework | |
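To make the four edit types in the abstract above concrete (insertions, deletions, substitutions, transpositions), here is a minimal sketch of a textbook minimum-edit-distance computation over word tokens. It is the standard optimal string alignment variant, which only handles swaps of adjacent tokens; it is not the DocuToads algorithm itself, and the function name is an assumption.

```python
def osa_distance(a, b):
    """Optimal string alignment distance between two sequences.

    Counts insertions, deletions, substitutions, and transpositions of
    adjacent elements, each at unit cost. Works on strings or token lists.
    """
    n, m = len(a), len(b)
    # d[i][j] = distance between the first i items of a and first j of b
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i  # delete all i items
    for j in range(m + 1):
        d[0][j] = j  # insert all j items
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[n][m]
```

Comparing two versions of a clause as token lists, for example `osa_distance("the council shall adopt".split(), "the council may adopt".split())`, counts the single substituted word as one edit.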
Knowledge Enhanced Hybrid Neural Network for Text Matching
Title | Knowledge Enhanced Hybrid Neural Network for Text Matching |
Authors | Yu Wu, Wei Wu, Zhoujun Li, Ming Zhou |
Abstract | Long text poses a major challenge for semantic matching because of its complicated semantic and syntactic structures. To tackle the challenge, we consider using prior knowledge to help identify useful information and filter out noise when matching long text. To this end, we propose a knowledge enhanced hybrid neural network (KEHNN). The model fuses prior knowledge into word representations by knowledge gates and establishes three matching channels with words, sequential structures of sentences given by Gated Recurrent Units (GRU), and knowledge enhanced representations. The three channels are processed by a convolutional neural network to generate high level features for matching, and the features are synthesized as a matching score by a multilayer perceptron. The model extends the existing methods by conducting matching on words, local structures of sentences, and global context of sentences. Evaluation results from extensive experiments on public data sets for question answering and conversation show that KEHNN can significantly outperform state-of-the-art matching models and particularly improve the performance on pairs with long text. |
Tasks | Question Answering, Text Matching |
Published | 2016-11-15 |
URL | http://arxiv.org/abs/1611.04684v1, http://arxiv.org/pdf/1611.04684v1.pdf |
PWC | https://paperswithcode.com/paper/knowledge-enhanced-hybrid-neural-network-for |
Repo | |
Framework | |
Dual Attention Networks for Multimodal Reasoning and Matching
Title | Dual Attention Networks for Multimodal Reasoning and Matching |
Authors | Hyeonseob Nam, Jung-Woo Ha, Jeonghee Kim |
Abstract | We propose Dual Attention Networks (DANs) which jointly leverage visual and textual attention mechanisms to capture fine-grained interplay between vision and language. DANs attend to specific regions in images and words in text through multiple steps and gather essential information from both modalities. Based on this framework, we introduce two types of DANs for multimodal reasoning and matching, respectively. The reasoning model allows visual and textual attentions to steer each other during collaborative inference, which is useful for tasks such as Visual Question Answering (VQA). In addition, the matching model exploits the two attention mechanisms to estimate the similarity between images and sentences by focusing on their shared semantics. Our extensive experiments validate the effectiveness of DANs in combining vision and language, achieving state-of-the-art performance on public benchmarks for VQA and image-text matching. |
Tasks | Question Answering, Text Matching, Visual Question Answering |
Published | 2016-11-02 |
URL | http://arxiv.org/abs/1611.00471v2, http://arxiv.org/pdf/1611.00471v2.pdf |
PWC | https://paperswithcode.com/paper/dual-attention-networks-for-multimodal |
Repo | |
Framework | |
Separating Sets of Strings by Finding Matching Patterns is Almost Always Hard
Title | Separating Sets of Strings by Finding Matching Patterns is Almost Always Hard |
Authors | Giuseppe Lancia, Luke Mathieson, Pablo Moscato |
Abstract | We study the complexity of the problem of searching for a set of patterns that separate two given sets of strings. This problem has applications in a wide variety of areas, most notably in data mining, computational biology, and in understanding the complexity of genetic algorithms. We show that the basic problem of finding a small set of patterns that match one set of strings but do not match any string in a second set is difficult (NP-complete, W[2]-hard when parameterized by the size of the pattern set, and APX-hard). We then perform a detailed parameterized analysis of the problem, separating tractable and intractable variants. In particular we show that parameterizing by the size of the pattern set and the number of strings, and by the size of the alphabet and the number of strings, gives FPT results, amongst others. |
Tasks | |
Published | 2016-04-12 |
URL | http://arxiv.org/abs/1604.03243v3, http://arxiv.org/pdf/1604.03243v3.pdf |
PWC | https://paperswithcode.com/paper/separating-sets-of-strings-by-finding |
Repo | |
Framework | |
Bagged Boosted Trees for Classification of Ecological Momentary Assessment Data
Title | Bagged Boosted Trees for Classification of Ecological Momentary Assessment Data |
Authors | Gerasimos Spanakis, Gerhard Weiss, Anne Roefs |
Abstract | Ecological Momentary Assessment (EMA) data is organized in multiple levels (per-subject, per-day, etc.) and this particular structure should be taken into account in machine learning algorithms used in EMA, such as decision trees and their variants. We propose a new algorithm called BBT (standing for Bagged Boosted Trees) that is enhanced by an over/under-sampling method and can provide better estimates for the conditional class probability function. Experimental results on a real-world dataset show that BBT can benefit EMA data classification and performance. |
Tasks | |
Published | 2016-07-06 |
URL | http://arxiv.org/abs/1607.01582v1, http://arxiv.org/pdf/1607.01582v1.pdf |
PWC | https://paperswithcode.com/paper/bagged-boosted-trees-for-classification-of |
Repo | |
Framework | |
Random Projection Estimation of Discrete-Choice Models with Large Choice Sets
Title | Random Projection Estimation of Discrete-Choice Models with Large Choice Sets |
Authors | Khai X. Chiong, Matthew Shum |
Abstract | We introduce sparse random projection, an important dimension-reduction tool from machine learning, for the estimation of discrete-choice models with high-dimensional choice sets. Initially, high-dimensional data are compressed into a lower-dimensional Euclidean space using random projections. Subsequently, estimation proceeds using cyclic monotonicity moment inequalities implied by the multinomial choice model; the estimation procedure is semi-parametric and does not require explicit distributional assumptions to be made regarding the random utility errors. The random projection procedure is justified via the Johnson-Lindenstrauss Lemma – the pairwise distances between data points are preserved during data compression, which we exploit to show convergence of our estimator. The estimator works well in simulations and in an application to a supermarket scanner dataset. |
Tasks | Dimensionality Reduction |
Published | 2016-04-20 |
URL | http://arxiv.org/abs/1604.06036v1, http://arxiv.org/pdf/1604.06036v1.pdf |
PWC | https://paperswithcode.com/paper/random-projection-estimation-of-discrete |
Repo | |
Framework | |
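The compression step described in the abstract above can be sketched in a few lines. The sketch assumes one common sparse scheme (entries drawn from sqrt(3) times {+1, 0, -1} with probabilities 1/6, 2/3, 1/6, scaled by 1/sqrt(k)); the abstract does not say which sparse scheme the authors use, and all function names here are illustrative. It covers only the projection, not the cyclic-monotonicity estimator.

```python
import math
import random


def sparse_projection_matrix(k, d, rng):
    """Build a k x d sparse random projection matrix.

    Each entry is sqrt(3/k) times +1, 0, or -1 with probabilities
    1/6, 2/3, 1/6, so squared distances are preserved in expectation.
    """
    scale = math.sqrt(3.0 / k)
    return [
        [scale * rng.choices((1.0, 0.0, -1.0), weights=(1, 4, 1))[0] for _ in range(d)]
        for _ in range(k)
    ]


def project(matrix, x):
    """Map a d-dimensional vector to k dimensions."""
    return [sum(r * v for r, v in zip(row, x)) for row in matrix]


def dist(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

Projecting two vectors with the same matrix and comparing `dist` before and after gives a quick check of the Johnson-Lindenstrauss property: the ratio should be close to 1 when `k` is reasonably large.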
Fast deterministic tourist walk for texture analysis
Title | Fast deterministic tourist walk for texture analysis |
Authors | Lucas Correia Ribas, Odemir Martinez Bruno |
Abstract | Deterministic tourist walk (DTW) has attracted increasing interest in computer vision. In recent years, different methods for the analysis of dynamic and static textures have been proposed. So far, all works based on the DTW for texture analysis use every image pixel as the initial point of a walk. However, this requires considerable runtime. In this paper, we conducted a study to verify the performance of the DTW method according to the number of initial points used to start a walk. The proposed method assigns a unique code to each image pixel; the pixels whose code is not divisible by a given $k$ value are then ignored as initial points of walks. Feature vectors were extracted and a classification process was performed for different percentages of initial points. Experimental results on the Brodatz and Vistex datasets indicate that using fewer pixels as initial points significantly improves the runtime compared to using all image pixels. In addition, the correct classification rate decreases very little. |
Tasks | Texture Classification |
Published | 2016-11-25 |
URL | http://arxiv.org/abs/1611.08624v1, http://arxiv.org/pdf/1611.08624v1.pdf |
PWC | https://paperswithcode.com/paper/fast-deterministic-tourist-walk-for-texture |
Repo | |
Framework | |
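The subsampling rule in the abstract above (keep a pixel as a walk's starting point only if its code is divisible by $k$) can be sketched as follows. The abstract does not say how codes are assigned, so this sketch assumes the row-major pixel index as the code; the function name is likewise hypothetical.

```python
def initial_points(height, width, k):
    """Return (row, col) of pixels kept as initial points of walks.

    Assumes each pixel's unique code is its row-major index; pixels whose
    code is not divisible by k are skipped.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    return [
        divmod(code, width)  # (row, col) for this code
        for code in range(height * width)
        if code % k == 0
    ]
```

With `k = 1` every pixel starts a walk, which corresponds to the baseline described in the abstract; larger `k` keeps roughly a fraction `1/k` of the pixels.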
Learning Detailed Face Reconstruction from a Single Image
Title | Learning Detailed Face Reconstruction from a Single Image |
Authors | Elad Richardson, Matan Sela, Roy Or-El, Ron Kimmel |
Abstract | Reconstructing the detailed geometric structure of a face from a given image is a key to many computer vision and graphics applications, such as motion capture and reenactment. The reconstruction task is challenging as human faces vary extensively when considering expressions, poses, textures, and intrinsic geometries. While many approaches tackle this complexity by using additional data to reconstruct the face of a single subject, extracting the facial surface from a single image remains a difficult problem. As a result, single-image based methods can usually provide only a rough estimate of the facial geometry. In contrast, we propose to leverage the power of convolutional neural networks to produce a highly detailed face reconstruction from a single image. For this purpose, we introduce an end-to-end CNN framework which derives the shape in a coarse-to-fine fashion. The proposed architecture is composed of two main blocks, a network that recovers the coarse facial geometry (CoarseNet), followed by a CNN that refines the facial features of that geometry (FineNet). The proposed networks are connected by a novel layer which renders a depth image given a mesh in 3D. Unlike object recognition and detection problems, there are no suitable datasets for training CNNs to perform face geometry reconstruction. Therefore, our training regime begins with a supervised phase, based on synthetic images, followed by an unsupervised phase that uses only unconstrained facial images. The accuracy and robustness of the proposed model are demonstrated by both qualitative and quantitative evaluation tests. |
Tasks | Face Reconstruction, Motion Capture, Object Recognition |
Published | 2016-11-15 |
URL | http://arxiv.org/abs/1611.05053v2, http://arxiv.org/pdf/1611.05053v2.pdf |
PWC | https://paperswithcode.com/paper/learning-detailed-face-reconstruction-from-a |
Repo | |
Framework | |
Design and development a children’s speech database
Title | Design and development a children’s speech database |
Authors | Radoslava Kraleva |
Abstract | The report presents the process of planning, designing, and developing a database of speech from children whose native language is Bulgarian. The proposed model is designed for children between the ages of 4 and 6 without speech disorders, and reflects their specific capabilities. At this age most children cannot read, they lack sustained concentration, they are emotional, etc. The aim is to unite all the media information accompanying the recording and processing of speech, and thereby to facilitate the work of researchers in the field of speech recognition. This database will be used for the development of systems for children’s speech recognition, children’s speech synthesis systems, games which allow voice control, etc. As a result of the proposed model, a prototype system for speech recognition is presented. |
Tasks | Speech Recognition, Speech Synthesis |
Published | 2016-05-25 |
URL | http://arxiv.org/abs/1605.07735v1, http://arxiv.org/pdf/1605.07735v1.pdf |
PWC | https://paperswithcode.com/paper/design-and-development-a-childrens-speech |
Repo | |
Framework | |