July 29, 2019

Paper Group AWR 164

Extracting Automata from Recurrent Neural Networks Using Queries and Counterexamples


Title	Extracting Automata from Recurrent Neural Networks Using Queries and Counterexamples
Authors	Gail Weiss, Yoav Goldberg, Eran Yahav
Abstract	We present a novel algorithm that uses exact learning and abstraction to extract a deterministic finite automaton describing the state dynamics of a given trained RNN. We do this using Angluin’s L* algorithm as a learner and the trained RNN as an oracle. Our technique efficiently extracts accurate automata from trained RNNs, even when the state vectors are large and require fine differentiation.
Tasks
Published	2017-11-27
URL	https://arxiv.org/abs/1711.09576v4
PDF	https://arxiv.org/pdf/1711.09576v4.pdf
PWC	https://paperswithcode.com/paper/extracting-automata-from-recurrent-neural
Repo	https://github.com/tech-srl/lstar_extraction
Framework	none
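
The query loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it shows only a membership-query interface and a brute-force stand-in for the equivalence query, whereas the paper relies on an abstraction of the RNN state space. The names `DFA`, `toy_oracle`, and `find_counterexample` are hypothetical, and the oracle is a hand-written function standing in for a trained RNN classifier.

```python
from itertools import product


class DFA:
    """A deterministic finite automaton over a fixed alphabet (hypothetical helper)."""

    def __init__(self, alphabet, transitions, start, accepting):
        self.alphabet = alphabet
        self.transitions = transitions  # dict: (state, symbol) -> state
        self.start = start
        self.accepting = accepting      # set of accepting states

    def accepts(self, word):
        state = self.start
        for symbol in word:
            state = self.transitions[(state, symbol)]
        return state in self.accepting


def toy_oracle(word):
    """Stand-in for the trained RNN: accepts words with an even number of 'a'."""
    return word.count("a") % 2 == 0


def find_counterexample(hypothesis, oracle, max_len):
    """Brute-force equivalence check (an assumption, not the paper's method).

    Returns a shortest word of length <= max_len on which the hypothesis
    and the oracle disagree, or None if they agree on all such words.
    """
    for length in range(max_len + 1):
        for letters in product(hypothesis.alphabet, repeat=length):
            word = "".join(letters)
            if hypothesis.accepts(word) != oracle(word):
                return word
    return None


# A one-state hypothesis that accepts every word.
trivial = DFA("ab", {(0, "a"): 0, (0, "b"): 0}, 0, {0})

# A two-state hypothesis that tracks the parity of 'a'.
parity = DFA(
    "ab",
    {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1},
    0,
    {0},
)
```

In an L*-style loop, a returned counterexample would be handed back to the learner to refine its hypothesis; a result of `None` only means no disagreement was found up to `max_len`.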

Enhanced Neural Machine Translation by Learning from Draft


Title	Enhanced Neural Machine Translation by Learning from Draft
Authors	Aodong Li, Shiyue Zhang, Dong Wang, Thomas Fang Zheng
Abstract	Neural machine translation (NMT) has recently achieved impressive results. A potential problem of the existing NMT algorithm, however, is that the decoding is conducted from left to right, without considering the right context. This paper proposes a two-stage approach to solve the problem. In the first stage, a conventional attention-based NMT system is used to produce a draft translation, and in the second stage, a novel double-attention NMT system is used to refine the translation, by looking at the original input as well as the draft translation. This drafting-and-refinement can obtain the right-context information from the draft, hence producing more consistent translations. We evaluated this approach using two Chinese-English translation tasks, one with 44k pairs and one with 1M pairs. The experiments showed that our approach achieved positive improvements over the conventional NMT system: the improvements are 2.4 and 0.9 BLEU points on the small-scale and large-scale tasks, respectively.
Tasks	Machine Translation
Published	2017-10-04
URL	http://arxiv.org/abs/1710.01789v1
PDF	http://arxiv.org/pdf/1710.01789v1.pdf
PWC	https://paperswithcode.com/paper/enhanced-neural-machine-translation-by
Repo	https://github.com/aodongli/Learning_from_draft
Framework	tf

Honk: A PyTorch Reimplementation of Convolutional Neural Networks for Keyword Spotting


Title	Honk: A PyTorch Reimplementation of Convolutional Neural Networks for Keyword Spotting
Authors	Raphael Tang, Jimmy Lin
Abstract	We describe Honk, an open-source PyTorch reimplementation of convolutional neural networks for keyword spotting that are included as examples in TensorFlow. These models are useful for recognizing “command triggers” in speech-based interfaces (e.g., “Hey Siri”), which serve as explicit cues for audio recordings of utterances that are sent to the cloud for full speech recognition. Evaluation on Google’s recently released Speech Commands Dataset shows that our reimplementation is comparable in accuracy and provides a starting point for future work on the keyword spotting task.
Tasks	Keyword Spotting, Speech Recognition
Published	2017-10-18
URL	http://arxiv.org/abs/1710.06554v2
PDF	http://arxiv.org/pdf/1710.06554v2.pdf
PWC	https://paperswithcode.com/paper/honk-a-pytorch-reimplementation-of
Repo	https://github.com/etosworld/etos-keywordspotting
Framework	tf

A Generative Model of People in Clothing


Title	A Generative Model of People in Clothing
Authors	Christoph Lassner, Gerard Pons-Moll, Peter V. Gehler
Abstract	We present the first image-based generative model of people in clothing for the full body. We sidestep the commonly used complex graphics rendering pipeline and the need for high-quality 3D scans of dressed people. Instead, we learn generative models from a large image database. The main challenge is to cope with the high variance in human pose, shape and appearance. For this reason, pure image-based approaches have not been considered so far. We show that this challenge can be overcome by splitting the generating process in two parts. First, we learn to generate a semantic segmentation of the body and clothing. Second, we learn a conditional model on the resulting segments that creates realistic images. The full model is differentiable and can be conditioned on pose, shape or color. The results are samples of people in different clothing items and styles. The proposed model can generate entirely new people with realistic clothing. In several experiments we present encouraging results that suggest an entirely data-driven approach to people generation is possible.
Tasks	Semantic Segmentation
Published	2017-05-11
URL	http://arxiv.org/abs/1705.04098v3
PDF	http://arxiv.org/pdf/1705.04098v3.pdf
PWC	https://paperswithcode.com/paper/a-generative-model-of-people-in-clothing
Repo	https://github.com/classner/generating_people
Framework	tf

LR-GAN: Layered Recursive Generative Adversarial Networks for Image Generation


Title	LR-GAN: Layered Recursive Generative Adversarial Networks for Image Generation
Authors	Jianwei Yang, Anitha Kannan, Dhruv Batra, Devi Parikh
Abstract	We present LR-GAN: an adversarial image generation model which takes scene structure and context into account. Unlike previous generative adversarial networks (GANs), the proposed GAN learns to generate image background and foregrounds separately and recursively, and stitches the foregrounds onto the background in a contextually relevant manner to produce a complete natural image. For each foreground, the model learns to generate its appearance, shape and pose. The whole model is unsupervised, and is trained in an end-to-end manner with gradient descent methods. The experiments demonstrate that LR-GAN can generate more natural images, with objects that are more recognizable to humans, than DCGAN.
Tasks	Image Generation
Published	2017-03-05
URL	http://arxiv.org/abs/1703.01560v3
PDF	http://arxiv.org/pdf/1703.01560v3.pdf
PWC	https://paperswithcode.com/paper/lr-gan-layered-recursive-generative
Repo	https://github.com/jwyang/lr-gan.pytorch
Framework	pytorch
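
The layered "stitching" described in the abstract can be pictured with a small sketch. It assumes that each foreground comes with a mask in [0, 1] and is alpha-blended onto the current canvas; the abstract does not spell out the blending rule, and the learned pose transformation is omitted entirely. `composite` and `generate_layers` are hypothetical names, and images are plain nested lists of grayscale values.

```python
def composite(background, foreground, mask):
    """Blend one foreground onto a background pixel-wise: m * f + (1 - m) * b."""
    return [
        [m * f + (1.0 - m) * b for b, f, m in zip(b_row, f_row, m_row)]
        for b_row, f_row, m_row in zip(background, foreground, mask)
    ]


def generate_layers(background, layers):
    """Recursively paste (foreground, mask) pairs onto the canvas, in order."""
    canvas = background
    for foreground, mask in layers:
        canvas = composite(canvas, foreground, mask)
    return canvas


# Toy 2x2 example: a black background and two single-pixel foregrounds.
background = [[0.0, 0.0], [0.0, 0.0]]
layers = [
    ([[1.0, 1.0], [1.0, 1.0]], [[1.0, 0.0], [0.0, 0.0]]),  # top-left object
    ([[0.5, 0.5], [0.5, 0.5]], [[0.0, 0.0], [0.0, 1.0]]),  # bottom-right object
]
image = generate_layers(background, layers)
```

Each step uses the previous canvas as its background, which is what makes the generation recursive: later foregrounds can occlude earlier ones.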

ParlAI: A Dialog Research Software Platform


Title	ParlAI: A Dialog Research Software Platform
Authors	Alexander H. Miller, Will Feng, Adam Fisch, Jiasen Lu, Dhruv Batra, Antoine Bordes, Devi Parikh, Jason Weston
Abstract	We introduce ParlAI (pronounced “par-lay”), an open-source software platform for dialog research implemented in Python, available at http://parl.ai. Its goal is to provide a unified framework for sharing, training and testing of dialog models; integration of Amazon Mechanical Turk for data collection, human evaluation, and online/reinforcement learning; and a repository of machine learning models for comparing with others’ models, and improving upon existing architectures. Over 20 tasks are supported in the first release, including popular datasets such as SQuAD, bAbI tasks, MCTest, WikiQA, QACNN, QADailyMail, CBT, bAbI Dialog, Ubuntu, OpenSubtitles and VQA. Several models are integrated, including neural models such as memory networks, seq2seq and attentive LSTMs.
Tasks	Visual Question Answering
Published	2017-05-18
URL	http://arxiv.org/abs/1705.06476v4
PDF	http://arxiv.org/pdf/1705.06476v4.pdf
PWC	https://paperswithcode.com/paper/parlai-a-dialog-research-software-platform
Repo	https://github.com/kdexd/lang-emerge-parlai
Framework	pytorch

Sketching Word Vectors Through Hashing


Title	Sketching Word Vectors Through Hashing
Authors	Behrang QasemiZadeh, Laura Kallmeyer
Abstract	We propose a new fast word embedding technique using hash functions. The method is a derandomization of a new type of random projections: By disregarding the classic constraint used in designing random projections (i.e., preserving pairwise distances in a particular normed space), our solution exploits extremely sparse non-negative random projections. Our experiments show that the proposed method can achieve competitive results, comparable to neural embedding learning techniques, but with only a fraction of the computational complexity of these methods. While the proposed derandomization enhances the computational and space complexity of our method, the possibility of applying weighting methods such as positive pointwise mutual information (PPMI) to our models after their construction (and at a reduced dimensionality) imparts a high discriminatory power to the resulting embeddings. Obviously, this method comes with other known benefits of random projection-based techniques such as ease of update.
Tasks
Published	2017-05-11
URL	http://arxiv.org/abs/1705.04253v2
PDF	http://arxiv.org/pdf/1705.04253v2.pdf
PWC	https://paperswithcode.com/paper/sketching-word-vectors-through-hashing
Repo	https://github.com/languagerecipes/LPCFG_Unsupervised_Frame_Induction
Framework	none
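
As a rough illustration of the idea in the abstract, the sketch below builds low-dimensional count vectors by hashing each context word to a single dimension, which amounts to a very sparse, non-negative projection of the co-occurrence matrix. It is a simplified assumption-laden sketch, not the authors' method: the choice of MD5, the window size, the one-bucket-per-context rule, and the names `bucket` and `sketch_vectors` are all hypothetical, and PPMI weighting is left out.

```python
import hashlib
from collections import defaultdict


def bucket(context, dim):
    """Deterministically map a context word to one of `dim` dimensions."""
    digest = hashlib.md5(context.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % dim


def sketch_vectors(tokens, dim=8, window=2):
    """Accumulate hashed co-occurrence counts for every word in `tokens`."""
    vectors = defaultdict(lambda: [0] * dim)
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                # Each context occurrence adds 1 to exactly one dimension.
                vectors[word][bucket(tokens[j], dim)] += 1
    return dict(vectors)


tokens = "the cat sat on the mat".split()
vectors = sketch_vectors(tokens, dim=8, window=2)
```

Because every update is a single increment, new text can be folded into existing vectors without retraining, which reflects the "ease of update" the abstract mentions.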

Crowdsourcing Ground Truth for Medical Relation Extraction


Title	Crowdsourcing Ground Truth for Medical Relation Extraction
Authors	Anca Dumitrache, Lora Aroyo, Chris Welty
Abstract	Cognitive computing systems require human labeled data for evaluation, and often for training. The standard practice used in gathering this data minimizes disagreement between annotators, and we have found this results in data that fails to account for the ambiguity inherent in language. We have proposed the CrowdTruth method for collecting ground truth through crowdsourcing, that reconsiders the role of people in machine learning based on the observation that disagreement between annotators provides a useful signal for phenomena such as ambiguity in the text. We report on using this method to build an annotated data set for medical relation extraction for the $cause$ and $treat$ relations, and how this data performed in a supervised training experiment. We demonstrate that by modeling ambiguity, labeled data gathered from crowd workers can (1) reach the level of quality of domain experts for this task while reducing the cost, and (2) provide better training data at scale than distant supervision. We further propose and validate new weighted measures for precision, recall, and F-measure, that account for ambiguity in both human and machine performance on this task.
Tasks	Medical Relation Extraction, Relation Extraction
Published	2017-01-09
URL	http://arxiv.org/abs/1701.02185v2
PDF	http://arxiv.org/pdf/1701.02185v2.pdf
PWC	https://paperswithcode.com/paper/crowdsourcing-ground-truth-for-medical
Repo	https://github.com/CrowdTruth/Medical-Relation-Extraction
Framework	none

Stochastic Subsampling for Factorizing Huge Matrices


Title	Stochastic Subsampling for Factorizing Huge Matrices
Authors	Arthur Mensch, Julien Mairal, Bertrand Thirion, Gael Varoquaux
Abstract	We present a matrix-factorization algorithm that scales to input matrices with a huge number of both rows and columns. Learned factors may be sparse or dense and/or non-negative, which makes our algorithm suitable for dictionary learning, sparse component analysis, and non-negative matrix factorization. Our algorithm streams matrix columns while subsampling them to iteratively learn the matrix factors. At each iteration, the row dimension of a new sample is reduced by subsampling, resulting in lower time complexity compared to a simple streaming algorithm. Our method comes with convergence guarantees to reach a stationary point of the matrix-factorization problem. We demonstrate its efficiency on massive functional Magnetic Resonance Imaging data (2 TB), and on patches extracted from hyperspectral images (103 GB). For both problems, which involve different penalties on rows and columns, we obtain significant speed-ups compared to state-of-the-art algorithms.
Tasks	Dictionary Learning
Published	2017-01-19
URL	http://arxiv.org/abs/1701.05363v3
PDF	http://arxiv.org/pdf/1701.05363v3.pdf
PWC	https://paperswithcode.com/paper/stochastic-subsampling-for-factorizing-huge
Repo	https://github.com/arthurmensch/modl
Framework	none

3D Object Reconstruction from a Single Depth View with Adversarial Learning


Title	3D Object Reconstruction from a Single Depth View with Adversarial Learning
Authors	Bo Yang, Hongkai Wen, Sen Wang, Ronald Clark, Andrew Markham, Niki Trigoni
Abstract	In this paper, we propose a novel 3D-RecGAN approach, which reconstructs the complete 3D structure of a given object from a single arbitrary depth view using generative adversarial networks. Unlike the existing work which typically requires multiple views of the same object or class labels to recover the full 3D geometry, the proposed 3D-RecGAN only takes the voxel grid representation of a depth view of the object as input, and is able to generate the complete 3D occupancy grid by filling in the occluded/missing regions. The key idea is to combine the generative capabilities of autoencoders and the conditional Generative Adversarial Networks (GAN) framework, to infer accurate and fine-grained 3D structures of objects in high-dimensional voxel space. Extensive experiments on large synthetic datasets show that the proposed 3D-RecGAN significantly outperforms the state of the art in single view 3D object reconstruction, and is able to reconstruct unseen types of objects. Our code and data are available at: https://github.com/Yang7879/3D-RecGAN.
Tasks	3D Object Reconstruction, Object Reconstruction
Published	2017-08-26
URL	http://arxiv.org/abs/1708.07969v1
PDF	http://arxiv.org/pdf/1708.07969v1.pdf
PWC	https://paperswithcode.com/paper/3d-object-reconstruction-from-a-single-depth
Repo	https://github.com/Yang7879/3D-RecGAN
Framework	tf

Graph-Based Semi-Supervised Conditional Random Fields For Spoken Language Understanding Using Unaligned Data


Title	Graph-Based Semi-Supervised Conditional Random Fields For Spoken Language Understanding Using Unaligned Data
Authors	Mohammad Aliannejadi, Masoud Kiaeeha, Shahram Khadivi, Saeed Shiry Ghidary
Abstract	We experiment with graph-based Semi-Supervised Learning (SSL) of Conditional Random Fields (CRF) for the application of Spoken Language Understanding (SLU) on unaligned data. The aligned labels for examples are obtained using IBM Model. We adapt a baseline semi-supervised CRF by defining a new feature set and altering the label propagation algorithm. Our results demonstrate that our proposed approach significantly improves the performance of the supervised model by utilizing the knowledge gained from the graph.
Tasks	Spoken Language Understanding
Published	2017-01-30
URL	http://arxiv.org/abs/1701.08533v1
PDF	http://arxiv.org/pdf/1701.08533v1.pdf
PWC	https://paperswithcode.com/paper/graph-based-semi-supervised-conditional
Repo	https://github.com/maxxkia/g-ssl-crf
Framework	none

Advances in Joint CTC-Attention based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM


Title	Advances in Joint CTC-Attention based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM
Authors	Takaaki Hori, Shinji Watanabe, Yu Zhang, William Chan
Abstract	We present a state-of-the-art end-to-end Automatic Speech Recognition (ASR) model. We learn to listen and write characters with a joint Connectionist Temporal Classification (CTC) and attention-based encoder-decoder network. The encoder is a deep Convolutional Neural Network (CNN) based on the VGG network. The CTC network sits on top of the encoder and is jointly trained with the attention-based decoder. During the beam search process, we combine the CTC predictions, the attention-based decoder predictions and a separately trained LSTM language model. We achieve a 5-10% error reduction compared to prior systems on spontaneous Japanese and Chinese speech, and our end-to-end model beats out traditional hybrid ASR systems.
Tasks	End-To-End Speech Recognition, Language Modelling, Speech Recognition
Published	2017-06-08
URL	http://arxiv.org/abs/1706.02737v1
PDF	http://arxiv.org/pdf/1706.02737v1.pdf
PWC	https://paperswithcode.com/paper/advances-in-joint-ctc-attention-based-end-to
Repo	https://github.com/Alexander-H-Liu/End-to-end-ASR-Pytorch
Framework	pytorch
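
The abstract says that beam search combines CTC predictions, attention-decoder predictions, and a separately trained language model. One common way to do this is a weighted sum of log-probabilities, sketched below. The interpolation form, the weight values, and the names `joint_score` and `best_hypothesis` are assumptions for illustration; the abstract does not give the exact rule or the weights.

```python
import math


def joint_score(log_p_ctc, log_p_att, log_p_lm, ctc_weight=0.3, lm_weight=0.1):
    """Weighted combination of the three log-probabilities (hypothetical weights)."""
    return (
        ctc_weight * log_p_ctc
        + (1.0 - ctc_weight) * log_p_att
        + lm_weight * log_p_lm
    )


def best_hypothesis(hypotheses, ctc_weight=0.3, lm_weight=0.1):
    """Pick the hypothesis with the highest joint score.

    `hypotheses` maps a transcript to a tuple of
    (CTC log-prob, attention log-prob, language-model log-prob).
    """
    return max(
        hypotheses,
        key=lambda text: joint_score(*hypotheses[text], ctc_weight, lm_weight),
    )


# Made-up probabilities for two competing transcripts.
hypotheses = {
    "a": (math.log(0.5), math.log(0.2), math.log(0.1)),
    "b": (math.log(0.2), math.log(0.4), math.log(0.3)),
}
winner = best_hypothesis(hypotheses)
ctc_only = best_hypothesis(hypotheses, ctc_weight=1.0, lm_weight=0.0)
```

With the default weights the attention and language-model scores dominate, so the ranking can differ from what CTC alone would choose.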

A Genetic Programming Approach to Designing Convolutional Neural Network Architectures


Title	A Genetic Programming Approach to Designing Convolutional Neural Network Architectures
Authors	Masanori Suganuma, Shinichi Shirakawa, Tomoharu Nagao
Abstract	The convolutional neural network (CNN), which is one of the deep learning models, has seen much success in a variety of computer vision tasks. However, designing CNN architectures still requires expert knowledge and a lot of trial and error. In this paper, we attempt to automatically construct CNN architectures for an image classification task based on Cartesian genetic programming (CGP). In our method, we adopt highly functional modules, such as convolutional blocks and tensor concatenation, as the node functions in CGP. The CNN structure and connectivity represented by the CGP encoding method are optimized to maximize the validation accuracy. To evaluate the proposed method, we constructed a CNN architecture for the image classification task with the CIFAR-10 dataset. The experimental results show that the proposed method can automatically find a CNN architecture that is competitive with state-of-the-art models.
Tasks	Image Classification, Neural Architecture Search
Published	2017-04-03
URL	http://arxiv.org/abs/1704.00764v2
PDF	http://arxiv.org/pdf/1704.00764v2.pdf
PWC	https://paperswithcode.com/paper/a-genetic-programming-approach-to-designing
Repo	https://github.com/sg-nm/cgp-cnn
Framework	pytorch
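
To make the CGP encoding more concrete, the sketch below decodes a toy genotype into the list of nodes that actually contribute to the output. It assumes a feed-forward genotype in which each node may only read from lower-numbered nodes and node 0 is the network input. The genotype layout, the `pool` node type, and the name `active_nodes` are hypothetical; only convolutional blocks and concatenation are named in the abstract.

```python
def active_nodes(genotype, output_index):
    """Return the indices of nodes that feed the output, in evaluation order."""
    active = set()
    stack = [output_index]
    while stack:
        idx = stack.pop()
        if idx == 0 or idx in active:
            continue  # node 0 is the raw input; skip nodes already visited
        active.add(idx)
        _, inputs = genotype[idx]
        stack.extend(inputs)
    # Nodes only read from lower indices, so sorted order is a valid schedule.
    return sorted(active)


# Each entry is (function name, indices of input nodes).
genotype = [
    ("input", ()),
    ("conv_block", (0,)),
    ("conv_block", (0,)),
    ("concat", (1, 2)),
    ("conv_block", (1,)),  # not connected to the output, so inactive
    ("pool", (3,)),
]
schedule = active_nodes(genotype, output_index=5)
layers = [genotype[i][0] for i in schedule]
```

Inactive nodes such as node 4 stay in the genotype but are not built into the network, so a mutation can later reconnect them without changing the genotype length.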

Deep Burst Denoising


Title	Deep Burst Denoising
Authors	Clément Godard, Kevin Matzen, Matt Uyttendaele
Abstract	Noise is an inherent issue of low-light image capture, one which is exacerbated on mobile devices due to their narrow apertures and small sensors. One strategy for mitigating noise in a low-light situation is to increase the shutter time of the camera, thus allowing each photosite to integrate more light and decrease noise variance. However, there are two downsides of long exposures: (a) bright regions can exceed the sensor range, and (b) camera and scene motion will result in blurred images. Another way of gathering more light is to capture multiple short (thus noisy) frames in a “burst” and intelligently integrate the content, thus avoiding the above downsides. In this paper, we use the burst-capture strategy and implement the intelligent integration via a recurrent fully convolutional deep neural net (CNN). We build our novel, multiframe architecture to be a simple addition to any single frame denoising model, and design it to handle an arbitrary number of noisy input frames. We show that it achieves state of the art denoising results on our burst dataset, improving on the best published multi-frame techniques, such as VBM4D and FlexISP. Finally, we explore other applications of image enhancement by integrating content from multiple frames and demonstrate that our DNN architecture generalizes well to image super-resolution.
Tasks	Denoising, Image Denoising, Image Enhancement, Image Super-Resolution, Super-Resolution
Published	2017-12-15
URL	http://arxiv.org/abs/1712.05790v1
PDF	http://arxiv.org/pdf/1712.05790v1.pdf
PWC	https://paperswithcode.com/paper/deep-burst-denoising
Repo	https://github.com/Ourshanabi/Burst-denoising
Framework	pytorch
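
For intuition about why a burst helps, the sketch below integrates frames with a plain running mean, updated one frame at a time so that any number of frames can be handled. This is a non-learned baseline chosen for illustration, not the recurrent network from the paper, and it assumes the frames are already aligned. `running_mean` is a hypothetical name, and frames are flat lists of pixel values.

```python
def running_mean(frames):
    """Average a sequence of equally sized frames incrementally."""
    mean = None
    for count, frame in enumerate(frames, start=1):
        if mean is None:
            mean = [0.0] * len(frame)
        # Incremental update: mean_k = mean_{k-1} + (x_k - mean_{k-1}) / k
        mean = [m + (x - m) / count for m, x in zip(mean, frame)]
    return mean


# Three noisy observations of a two-pixel scene whose true values are 2 and 4.
burst = [[1.0, 3.0], [3.0, 5.0], [2.0, 4.0]]
denoised = running_mean(burst)
```

For independent zero-mean noise, averaging N frames divides the noise variance by N, which is the effect a learned integrator aims to improve upon when motion or misalignment makes naive averaging blur the result.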

Deep Unsupervised Similarity Learning using Partially Ordered Sets


Title	Deep Unsupervised Similarity Learning using Partially Ordered Sets
Authors	Miguel A Bautista, Artsiom Sanakoyeu, Björn Ommer
Abstract	Unsupervised learning of visual similarities is of paramount importance to computer vision, particularly due to the lack of training data for fine-grained similarities. Deep learning of similarities is often based on relationships between pairs or triplets of samples. Many of these relations are unreliable and mutually contradicting, implying inconsistencies when trained without supervision information that relates different tuples or triplets to each other. To overcome this problem, we use local estimates of reliable (dis-)similarities to initially group samples into compact surrogate classes and use local partial orders of samples to classes to link classes to each other. Similarity learning is then formulated as a partial ordering task with soft correspondences of all samples to classes. Adopting a strategy of self-supervision, a CNN is trained to optimally represent samples in a mutually consistent manner while updating the classes. The similarity learning and grouping procedure are integrated in a single model and optimized jointly. The proposed unsupervised approach shows competitive performance on detailed pose estimation and object classification.
Tasks	Object Classification, Pose Estimation
Published	2017-04-07
URL	http://arxiv.org/abs/1704.02268v3
PDF	http://arxiv.org/pdf/1704.02268v3.pdf
PWC	https://paperswithcode.com/paper/deep-unsupervised-similarity-learning-using
Repo	https://github.com/asanakoy/deeppose_tf
Framework	tf