July 29, 2019

Paper Group AWR 164

Extracting Automata from Recurrent Neural Networks Using Queries and Counterexamples

Title Extracting Automata from Recurrent Neural Networks Using Queries and Counterexamples
Authors Gail Weiss, Yoav Goldberg, Eran Yahav
Abstract We present a novel algorithm that uses exact learning and abstraction to extract a deterministic finite automaton describing the state dynamics of a given trained RNN. We do this using Angluin’s L* algorithm as a learner and the trained RNN as an oracle. Our technique efficiently extracts accurate automata from trained RNNs, even when the state vectors are large and require fine differentiation.
Tasks
Published 2017-11-27
URL https://arxiv.org/abs/1711.09576v4
PDF https://arxiv.org/pdf/1711.09576v4.pdf
PWC https://paperswithcode.com/paper/extracting-automata-from-recurrent-neural
Repo https://github.com/tech-srl/lstar_extraction
Framework none
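The abstract above describes a learner that queries the trained RNN as an oracle. A minimal sketch of that interaction follows, assuming a hypothetical `oracle` callable that stands in for the RNN's accept/reject decision. The equivalence check here is a brute-force search over short words, swapped in for illustration only; according to the abstract, the paper relies on abstraction instead. `Dfa`, `find_counterexample`, and the toy language are all assumptions, not names from the paper or its repository.

```python
from itertools import product
from typing import Callable, Dict, Optional, Set, Tuple


class Dfa:
    """A deterministic finite automaton over single-character symbols."""

    def __init__(self, start: int, accepting: Set[int],
                 delta: Dict[Tuple[int, str], int]) -> None:
        self.start = start
        self.accepting = accepting
        self.delta = delta

    def accepts(self, word: str) -> bool:
        state = self.start
        for symbol in word:
            state = self.delta[(state, symbol)]
        return state in self.accepting


def find_counterexample(dfa: Dfa, oracle: Callable[[str], bool],
                        alphabet: str, max_len: int) -> Optional[str]:
    """Return the shortest word (up to max_len) on which the hypothesis DFA
    and the oracle disagree, or None if they agree on all such words."""
    for length in range(max_len + 1):
        for symbols in product(alphabet, repeat=length):
            word = "".join(symbols)
            if dfa.accepts(word) != oracle(word):
                return word
    return None


# Hypothetical stand-in for a trained RNN: accepts words with an even number of 'a'.
def oracle(word: str) -> bool:
    return word.count("a") % 2 == 0


# A wrong one-state hypothesis that accepts every word.
hypothesis = Dfa(start=0, accepting={0}, delta={(0, "a"): 0, (0, "b"): 0})

# The correct two-state automaton for the same language.
target = Dfa(start=0, accepting={0},
             delta={(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1})
```

Calling `find_counterexample(hypothesis, oracle, "ab", 3)` yields a word the learner would use to refine its hypothesis, while the same call on `target` returns `None`.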

Enhanced Neural Machine Translation by Learning from Draft

Title Enhanced Neural Machine Translation by Learning from Draft
Authors Aodong Li, Shiyue Zhang, Dong Wang, Thomas Fang Zheng
Abstract Neural machine translation (NMT) has recently achieved impressive results. A potential problem of the existing NMT algorithm, however, is that the decoding is conducted from left to right, without considering the right context. This paper proposes a two-stage approach to solve the problem. In the first stage, a conventional attention-based NMT system is used to produce a draft translation, and in the second stage, a novel double-attention NMT system is used to refine the translation, by looking at the original input as well as the draft translation. This drafting-and-refinement can obtain the right-context information from the draft, hence producing more consistent translations. We evaluated this approach using two Chinese-English translation tasks, one with 44k pairs and the other with 1M pairs. The experiments showed that our approach achieved positive improvements over the conventional NMT system: the improvements are 2.4 and 0.9 BLEU points on the small-scale and large-scale tasks, respectively.
Tasks Machine Translation
Published 2017-10-04
URL http://arxiv.org/abs/1710.01789v1
PDF http://arxiv.org/pdf/1710.01789v1.pdf
PWC https://paperswithcode.com/paper/enhanced-neural-machine-translation-by
Repo https://github.com/aodongli/Learning_from_draft
Framework tf

Honk: A PyTorch Reimplementation of Convolutional Neural Networks for Keyword Spotting

Title Honk: A PyTorch Reimplementation of Convolutional Neural Networks for Keyword Spotting
Authors Raphael Tang, Jimmy Lin
Abstract We describe Honk, an open-source PyTorch reimplementation of convolutional neural networks for keyword spotting that are included as examples in TensorFlow. These models are useful for recognizing “command triggers” in speech-based interfaces (e.g., “Hey Siri”), which serve as explicit cues for audio recordings of utterances that are sent to the cloud for full speech recognition. Evaluation on Google’s recently released Speech Commands Dataset shows that our reimplementation is comparable in accuracy and provides a starting point for future work on the keyword spotting task.
Tasks Keyword Spotting, Speech Recognition
Published 2017-10-18
URL http://arxiv.org/abs/1710.06554v2
PDF http://arxiv.org/pdf/1710.06554v2.pdf
PWC https://paperswithcode.com/paper/honk-a-pytorch-reimplementation-of
Repo https://github.com/etosworld/etos-keywordspotting
Framework tf

A Generative Model of People in Clothing

Title A Generative Model of People in Clothing
Authors Christoph Lassner, Gerard Pons-Moll, Peter V. Gehler
Abstract We present the first image-based generative model of people in clothing for the full body. We sidestep the commonly used complex graphics rendering pipeline and the need for high-quality 3D scans of dressed people. Instead, we learn generative models from a large image database. The main challenge is to cope with the high variance in human pose, shape and appearance. For this reason, pure image-based approaches have not been considered so far. We show that this challenge can be overcome by splitting the generating process into two parts. First, we learn to generate a semantic segmentation of the body and clothing. Second, we learn a conditional model on the resulting segments that creates realistic images. The full model is differentiable and can be conditioned on pose, shape or color. The results are samples of people in different clothing items and styles. The proposed model can generate entirely new people with realistic clothing. In several experiments we present encouraging results that suggest an entirely data-driven approach to people generation is possible.
Tasks Semantic Segmentation
Published 2017-05-11
URL http://arxiv.org/abs/1705.04098v3
PDF http://arxiv.org/pdf/1705.04098v3.pdf
PWC https://paperswithcode.com/paper/a-generative-model-of-people-in-clothing
Repo https://github.com/classner/generating_people
Framework tf

LR-GAN: Layered Recursive Generative Adversarial Networks for Image Generation

Title LR-GAN: Layered Recursive Generative Adversarial Networks for Image Generation
Authors Jianwei Yang, Anitha Kannan, Dhruv Batra, Devi Parikh
Abstract We present LR-GAN: an adversarial image generation model which takes scene structure and context into account. Unlike previous generative adversarial networks (GANs), the proposed GAN learns to generate image background and foregrounds separately and recursively, and stitch the foregrounds on the background in a contextually relevant manner to produce a complete natural image. For each foreground, the model learns to generate its appearance, shape and pose. The whole model is unsupervised, and is trained in an end-to-end manner with gradient descent methods. The experiments demonstrate that LR-GAN can generate more natural images with objects that are more human recognizable than DCGAN.
Tasks Image Generation
Published 2017-03-05
URL http://arxiv.org/abs/1703.01560v3
PDF http://arxiv.org/pdf/1703.01560v3.pdf
PWC https://paperswithcode.com/paper/lr-gan-layered-recursive-generative
Repo https://github.com/jwyang/lr-gan.pytorch
Framework pytorch

ParlAI: A Dialog Research Software Platform

Title ParlAI: A Dialog Research Software Platform
Authors Alexander H. Miller, Will Feng, Adam Fisch, Jiasen Lu, Dhruv Batra, Antoine Bordes, Devi Parikh, Jason Weston
Abstract We introduce ParlAI (pronounced “par-lay”), an open-source software platform for dialog research implemented in Python, available at http://parl.ai. Its goal is to provide a unified framework for sharing, training and testing of dialog models, integration of Amazon Mechanical Turk for data collection, human evaluation, and online/reinforcement learning; and a repository of machine learning models for comparing with others’ models, and improving upon existing architectures. Over 20 tasks are supported in the first release, including popular datasets such as SQuAD, bAbI tasks, MCTest, WikiQA, QACNN, QADailyMail, CBT, bAbI Dialog, Ubuntu, OpenSubtitles and VQA. Several models are integrated, including neural models such as memory networks, seq2seq and attentive LSTMs.
Tasks Visual Question Answering
Published 2017-05-18
URL http://arxiv.org/abs/1705.06476v4
PDF http://arxiv.org/pdf/1705.06476v4.pdf
PWC https://paperswithcode.com/paper/parlai-a-dialog-research-software-platform
Repo https://github.com/kdexd/lang-emerge-parlai
Framework pytorch

Sketching Word Vectors Through Hashing

Title Sketching Word Vectors Through Hashing
Authors Behrang QasemiZadeh, Laura Kallmeyer
Abstract We propose a new fast word embedding technique using hash functions. The method is a derandomization of a new type of random projections: By disregarding the classic constraint used in designing random projections (i.e., preserving pairwise distances in a particular normed space), our solution exploits extremely sparse non-negative random projections. Our experiments show that the proposed method can achieve competitive results, comparable to neural embedding learning techniques, but with only a fraction of the computational complexity of these methods. While the proposed derandomization enhances the computational and space complexity of our method, the possibility of applying weighting methods such as positive pointwise mutual information (PPMI) to our models after their construction (and at a reduced dimensionality) imparts a high discriminatory power to the resulting embeddings. Obviously, this method comes with other known benefits of random projection-based techniques such as ease of update.
Tasks
Published 2017-05-11
URL http://arxiv.org/abs/1705.04253v2
PDF http://arxiv.org/pdf/1705.04253v2.pdf
PWC https://paperswithcode.com/paper/sketching-word-vectors-through-hashing
Repo https://github.com/languagerecipes/LPCFG_Unsupervised_Frame_Induction
Framework none
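One way to picture the hashing idea in the abstract above is the sketch below. It assumes each context word is hashed to a single coordinate of a small vector, so the implied projection is sparse and non-negative. The hash choice (MD5 via `hashlib`), the one-coordinate-per-context rule, the window size, and the names `bucket` and `sketch_vectors` are all assumptions made for illustration; the paper's construction may differ, and the PPMI weighting step it mentions is not shown.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def bucket(context: str, dim: int) -> int:
    """Map a context word to one of `dim` coordinates with a stable hash."""
    digest = hashlib.md5(context.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % dim


def sketch_vectors(sentences: List[List[str]], dim: int,
                   window: int = 2) -> Dict[str, List[int]]:
    """Accumulate co-occurrence counts directly into `dim`-sized vectors."""
    vectors: Dict[str, List[int]] = defaultdict(lambda: [0] * dim)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    # Each context adds +1 to exactly one coordinate, so the
                    # implied projection is sparse and non-negative.
                    vectors[word][bucket(tokens[j], dim)] += 1
    return dict(vectors)


corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
vectors = sketch_vectors(corpus, dim=8)
```

Because "cat" and "dog" appear with the same contexts in this toy corpus, their sketches are identical. Updating a vector with new text only requires further increments, which matches the ease-of-update benefit the abstract mentions.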

Crowdsourcing Ground Truth for Medical Relation Extraction

Title Crowdsourcing Ground Truth for Medical Relation Extraction
Authors Anca Dumitrache, Lora Aroyo, Chris Welty
Abstract Cognitive computing systems require human labeled data for evaluation, and often for training. The standard practice used in gathering this data minimizes disagreement between annotators, and we have found this results in data that fails to account for the ambiguity inherent in language. We have proposed the CrowdTruth method for collecting ground truth through crowdsourcing, that reconsiders the role of people in machine learning based on the observation that disagreement between annotators provides a useful signal for phenomena such as ambiguity in the text. We report on using this method to build an annotated data set for medical relation extraction for the $cause$ and $treat$ relations, and how this data performed in a supervised training experiment. We demonstrate that by modeling ambiguity, labeled data gathered from crowd workers can (1) reach the level of quality of domain experts for this task while reducing the cost, and (2) provide better training data at scale than distant supervision. We further propose and validate new weighted measures for precision, recall, and F-measure, that account for ambiguity in both human and machine performance on this task.
Tasks Medical Relation Extraction, Relation Extraction
Published 2017-01-09
URL http://arxiv.org/abs/1701.02185v2
PDF http://arxiv.org/pdf/1701.02185v2.pdf
PWC https://paperswithcode.com/paper/crowdsourcing-ground-truth-for-medical
Repo https://github.com/CrowdTruth/Medical-Relation-Extraction
Framework none

Stochastic Subsampling for Factorizing Huge Matrices

Title Stochastic Subsampling for Factorizing Huge Matrices
Authors Arthur Mensch, Julien Mairal, Bertrand Thirion, Gael Varoquaux
Abstract We present a matrix-factorization algorithm that scales to input matrices with a huge number of both rows and columns. Learned factors may be sparse or dense and/or non-negative, which makes our algorithm suitable for dictionary learning, sparse component analysis, and non-negative matrix factorization. Our algorithm streams matrix columns while subsampling them to iteratively learn the matrix factors. At each iteration, the row dimension of a new sample is reduced by subsampling, resulting in lower time complexity compared to a simple streaming algorithm. Our method comes with convergence guarantees to reach a stationary point of the matrix-factorization problem. We demonstrate its efficiency on massive functional Magnetic Resonance Imaging data (2 TB), and on patches extracted from hyperspectral images (103 GB). For both problems, which involve different penalties on rows and columns, we obtain significant speed-ups compared to state-of-the-art algorithms.
Tasks Dictionary Learning
Published 2017-01-19
URL http://arxiv.org/abs/1701.05363v3
PDF http://arxiv.org/pdf/1701.05363v3.pdf
PWC https://paperswithcode.com/paper/stochastic-subsampling-for-factorizing-huge
Repo https://github.com/arthurmensch/modl
Framework none

3D Object Reconstruction from a Single Depth View with Adversarial Learning

Title 3D Object Reconstruction from a Single Depth View with Adversarial Learning
Authors Bo Yang, Hongkai Wen, Sen Wang, Ronald Clark, Andrew Markham, Niki Trigoni
Abstract In this paper, we propose a novel 3D-RecGAN approach, which reconstructs the complete 3D structure of a given object from a single arbitrary depth view using generative adversarial networks. Unlike the existing work which typically requires multiple views of the same object or class labels to recover the full 3D geometry, the proposed 3D-RecGAN only takes the voxel grid representation of a depth view of the object as input, and is able to generate the complete 3D occupancy grid by filling in the occluded/missing regions. The key idea is to combine the generative capabilities of autoencoders and the conditional Generative Adversarial Networks (GAN) framework, to infer accurate and fine-grained 3D structures of objects in high-dimensional voxel space. Extensive experiments on large synthetic datasets show that the proposed 3D-RecGAN significantly outperforms the state of the art in single view 3D object reconstruction, and is able to reconstruct unseen types of objects. Our code and data are available at: https://github.com/Yang7879/3D-RecGAN.
Tasks 3D Object Reconstruction, Object Reconstruction
Published 2017-08-26
URL http://arxiv.org/abs/1708.07969v1
PDF http://arxiv.org/pdf/1708.07969v1.pdf
PWC https://paperswithcode.com/paper/3d-object-reconstruction-from-a-single-depth
Repo https://github.com/Yang7879/3D-RecGAN
Framework tf

Graph-Based Semi-Supervised Conditional Random Fields For Spoken Language Understanding Using Unaligned Data

Title Graph-Based Semi-Supervised Conditional Random Fields For Spoken Language Understanding Using Unaligned Data
Authors Mohammad Aliannejadi, Masoud Kiaeeha, Shahram Khadivi, Saeed Shiry Ghidary
Abstract We experiment with graph-based Semi-Supervised Learning (SSL) of Conditional Random Fields (CRF) for the application of Spoken Language Understanding (SLU) on unaligned data. The aligned labels for examples are obtained using an IBM Model. We adapt a baseline semi-supervised CRF by defining a new feature set and altering the label propagation algorithm. Our results demonstrate that our proposed approach significantly improves the performance of the supervised model by utilizing the knowledge gained from the graph.
Tasks Spoken Language Understanding
Published 2017-01-30
URL http://arxiv.org/abs/1701.08533v1
PDF http://arxiv.org/pdf/1701.08533v1.pdf
PWC https://paperswithcode.com/paper/graph-based-semi-supervised-conditional
Repo https://github.com/maxxkia/g-ssl-crf
Framework none

Advances in Joint CTC-Attention based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM

Title Advances in Joint CTC-Attention based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM
Authors Takaaki Hori, Shinji Watanabe, Yu Zhang, William Chan
Abstract We present a state-of-the-art end-to-end Automatic Speech Recognition (ASR) model. We learn to listen and write characters with a joint Connectionist Temporal Classification (CTC) and attention-based encoder-decoder network. The encoder is a deep Convolutional Neural Network (CNN) based on the VGG network. The CTC network sits on top of the encoder and is jointly trained with the attention-based decoder. During the beam search process, we combine the CTC predictions, the attention-based decoder predictions and a separately trained LSTM language model. We achieve a 5-10% error reduction compared to prior systems on spontaneous Japanese and Chinese speech, and our end-to-end model beats out traditional hybrid ASR systems.
Tasks End-To-End Speech Recognition, Language Modelling, Speech Recognition
Published 2017-06-08
URL http://arxiv.org/abs/1706.02737v1
PDF http://arxiv.org/pdf/1706.02737v1.pdf
PWC https://paperswithcode.com/paper/advances-in-joint-ctc-attention-based-end-to
Repo https://github.com/Alexander-H-Liu/End-to-end-ASR-Pytorch
Framework pytorch
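The beam-search step in the abstract above combines three scores per hypothesis. A minimal sketch of such a combination is given below, assuming a log-linear interpolation with hypothetical weights `ctc_weight` and `lm_weight`. The weight values, the function names, and the candidate probabilities are made up for illustration and are not taken from the paper.

```python
import math
from typing import Dict, List, Tuple


def joint_score(log_p_ctc: float, log_p_att: float, log_p_lm: float,
                ctc_weight: float = 0.3, lm_weight: float = 0.1) -> float:
    """Log-linear combination of the three model scores for one hypothesis."""
    return (ctc_weight * log_p_ctc
            + (1.0 - ctc_weight) * log_p_att
            + lm_weight * log_p_lm)


def rank_hypotheses(scores: Dict[str, Tuple[float, float, float]],
                    beam_size: int) -> List[str]:
    """Keep the `beam_size` hypotheses with the highest combined score."""
    ranked = sorted(scores, key=lambda hyp: joint_score(*scores[hyp]),
                    reverse=True)
    return ranked[:beam_size]


# Made-up (CTC, attention, LM) probabilities for three candidate transcripts.
candidates = {
    "hello world": (math.log(0.5), math.log(0.4), math.log(0.30)),
    "hollow world": (math.log(0.1), math.log(0.5), math.log(0.01)),
    "hello word": (math.log(0.3), math.log(0.1), math.log(0.05)),
}
best = rank_hypotheses(candidates, beam_size=2)
```

In this toy example the attention score alone would prefer "hollow world", but the CTC and language-model terms pull "hello world" to the top of the beam.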

A Genetic Programming Approach to Designing Convolutional Neural Network Architectures

Title A Genetic Programming Approach to Designing Convolutional Neural Network Architectures
Authors Masanori Suganuma, Shinichi Shirakawa, Tomoharu Nagao
Abstract The convolutional neural network (CNN), which is one of the deep learning models, has seen much success in a variety of computer vision tasks. However, designing CNN architectures still requires expert knowledge and a lot of trial and error. In this paper, we attempt to automatically construct CNN architectures for an image classification task based on Cartesian genetic programming (CGP). In our method, we adopt highly functional modules, such as convolutional blocks and tensor concatenation, as the node functions in CGP. The CNN structure and connectivity represented by the CGP encoding method are optimized to maximize the validation accuracy. To evaluate the proposed method, we constructed a CNN architecture for the image classification task with the CIFAR-10 dataset. The experimental result shows that the proposed method can automatically find a CNN architecture that is competitive with state-of-the-art models.
Tasks Image Classification, Neural Architecture Search
Published 2017-04-03
URL http://arxiv.org/abs/1704.00764v2
PDF http://arxiv.org/pdf/1704.00764v2.pdf
PWC https://paperswithcode.com/paper/a-genetic-programming-approach-to-designing
Repo https://github.com/sg-nm/cgp-cnn
Framework pytorch
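The CGP encoding mentioned in the abstract above can be illustrated with a small decoding sketch. It assumes a genotype stored as a list of genes, each naming a node function and the earlier nodes it reads from, and it recovers the nodes that actually feed the output. The gene layout, the `pool` function name, and `active_nodes` are hypothetical choices for this example rather than the paper's exact encoding.

```python
from typing import List, Set, Tuple

# Each gene: (function name, indices of the nodes it reads from).
# Index 0 is the network input; gene k defines node k + 1.
Gene = Tuple[str, Tuple[int, ...]]


def active_nodes(genotype: List[Gene], output: int) -> List[int]:
    """Return, in feed-forward order, the nodes that the output depends on."""
    seen: Set[int] = set()
    stack = [output]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if node > 0:  # node 0 is the input and has no predecessors
            _, inputs = genotype[node - 1]
            stack.extend(inputs)
    # Nodes only read from lower indices, so sorting gives a valid order.
    return sorted(seen)


genotype: List[Gene] = [
    ("conv", (0,)),      # node 1
    ("conv", (0,)),      # node 2
    ("pool", (2,)),      # node 3, never reaches the output
    ("concat", (1, 2)),  # node 4
    ("conv", (4,)),      # node 5
]
layers = active_nodes(genotype, output=5)
```

Node 3 is left out of `layers` because nothing on the path to the output reads from it. Inactive genes like this one stay in the genotype and can become active again after mutation, which is a general property of CGP.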

Deep Burst Denoising

Title Deep Burst Denoising
Authors Clément Godard, Kevin Matzen, Matt Uyttendaele
Abstract Noise is an inherent issue of low-light image capture, one which is exacerbated on mobile devices due to their narrow apertures and small sensors. One strategy for mitigating noise in a low-light situation is to increase the shutter time of the camera, thus allowing each photosite to integrate more light and decrease noise variance. However, there are two downsides of long exposures: (a) bright regions can exceed the sensor range, and (b) camera and scene motion will result in blurred images. Another way of gathering more light is to capture multiple short (thus noisy) frames in a “burst” and intelligently integrate the content, thus avoiding the above downsides. In this paper, we use the burst-capture strategy and implement the intelligent integration via a recurrent fully convolutional deep neural net (CNN). We build our novel, multiframe architecture to be a simple addition to any single frame denoising model, and design it to handle an arbitrary number of noisy input frames. We show that it achieves state-of-the-art denoising results on our burst dataset, improving on the best published multi-frame techniques, such as VBM4D and FlexISP. Finally, we explore other applications of image enhancement by integrating content from multiple frames and demonstrate that our DNN architecture generalizes well to image super-resolution.
Tasks Denoising, Image Denoising, Image Enhancement, Image Super-Resolution, Super-Resolution
Published 2017-12-15
URL http://arxiv.org/abs/1712.05790v1
PDF http://arxiv.org/pdf/1712.05790v1.pdf
PWC https://paperswithcode.com/paper/deep-burst-denoising
Repo https://github.com/Ourshanabi/Burst-denoising
Framework pytorch

Deep Unsupervised Similarity Learning using Partially Ordered Sets

Title Deep Unsupervised Similarity Learning using Partially Ordered Sets
Authors Miguel A Bautista, Artsiom Sanakoyeu, Björn Ommer
Abstract Unsupervised learning of visual similarities is of paramount importance to computer vision, particularly due to lacking training data for fine-grained similarities. Deep learning of similarities is often based on relationships between pairs or triplets of samples. Many of these relations are unreliable and mutually contradicting, implying inconsistencies when trained without supervision information that relates different tuples or triplets to each other. To overcome this problem, we use local estimates of reliable (dis-)similarities to initially group samples into compact surrogate classes and use local partial orders of samples to classes to link classes to each other. Similarity learning is then formulated as a partial ordering task with soft correspondences of all samples to classes. Adopting a strategy of self-supervision, a CNN is trained to optimally represent samples in a mutually consistent manner while updating the classes. The similarity learning and grouping procedure are integrated in a single model and optimized jointly. The proposed unsupervised approach shows competitive performance on detailed pose estimation and object classification.
Tasks Object Classification, Pose Estimation
Published 2017-04-07
URL http://arxiv.org/abs/1704.02268v3
PDF http://arxiv.org/pdf/1704.02268v3.pdf
PWC https://paperswithcode.com/paper/deep-unsupervised-similarity-learning-using
Repo https://github.com/asanakoy/deeppose_tf
Framework tf