Paper Group AWR 49
Bidirectional Attention for SQL Generation
Title | Bidirectional Attention for SQL Generation |
Authors | Tong Guo, Huilin Gao |
Abstract | Generating structured query language (SQL) queries from natural language is a long-standing open problem. Answering a natural language question about a database table requires modeling complex interactions between the columns of the table and the question. In this paper, we apply the synthesizing approach to solve this problem. Based on the structure of SQL queries, we break down the model into three sub-modules and design specific deep neural networks for each of them. Taking inspiration from the similar task of machine reading, we employ bidirectional attention mechanisms and character-level embeddings with convolutional neural networks (CNNs) to improve the result. Experimental evaluations show that our model achieves state-of-the-art results on the WikiSQL dataset. |
Tasks | Reading Comprehension |
Published | 2017-12-30 |
URL | http://arxiv.org/abs/1801.00076v6 |
PDF | http://arxiv.org/pdf/1801.00076v6.pdf |
PWC | https://paperswithcode.com/paper/bidirectional-attention-for-sql-generation |
Repo | https://github.com/guotong1988/NL2SQL |
Framework | pytorch |
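As a rough illustration of the bidirectional attention the abstract refers to, here is a minimal numpy sketch of BiDAF-style attention between question tokens and table-column encodings; the dot-product similarity and the matrix names are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def bidirectional_attention(Q, C):
    """Toy BiDAF-style attention between a question and table columns.

    Q: (n_q, d) question token encodings; C: (n_c, d) column-name encodings.
    Returns question-side and column-side attended representations.
    """
    S = Q @ C.T                           # similarity matrix, shape (n_q, n_c)
    q2c = softmax(S, axis=1) @ C          # each question token attends over columns
    c2q = softmax(S.T, axis=1) @ Q        # each column attends over question tokens
    return q2c, c2q

# Example: 5 question tokens, 3 columns, 8-dimensional encodings.
q2c, c2q = bidirectional_attention(np.random.randn(5, 8), np.random.randn(3, 8))
```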
Analytic solution and stationary phase approximation for the Bayesian lasso and elastic net
Title | Analytic solution and stationary phase approximation for the Bayesian lasso and elastic net |
Authors | Tom Michoel |
Abstract | The lasso and elastic net linear regression models impose a double-exponential prior distribution on the model parameters to achieve regression shrinkage and variable selection, allowing the inference of robust models from large data sets. However, there has been limited success in deriving estimates for the full posterior distribution of regression coefficients in these models, due to a need to evaluate analytically intractable partition function integrals. Here, the Fourier transform is used to express these integrals as complex-valued oscillatory integrals over “regression frequencies”. This results in an analytic expansion and stationary phase approximation for the partition functions of the Bayesian lasso and elastic net, where the non-differentiability of the double-exponential prior has so far eluded such an approach. Use of this approximation leads to highly accurate numerical estimates for the expectation values and marginal posterior distributions of the regression coefficients, and allows for Bayesian inference of much higher dimensional models than previously possible. |
Tasks | Bayesian Inference |
Published | 2017-09-25 |
URL | http://arxiv.org/abs/1709.08535v3 |
PDF | http://arxiv.org/pdf/1709.08535v3.pdf |
PWC | https://paperswithcode.com/paper/analytic-solution-and-stationary-phase |
Repo | https://github.com/tmichoel/bayonet |
Framework | none |
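For orientation, the analytically intractable partition function the abstract mentions can be written (up to prior normalization constants, and with one common parameterization of the penalties) as the integral of the unnormalized elastic-net posterior; the Bayesian lasso is the special case $\lambda_2 = 0$:

$$
Z(\lambda_1, \lambda_2) = \int_{\mathbb{R}^p} \exp\!\left( -\frac{1}{2\sigma^2}\,\lVert y - X\beta \rVert_2^2 \;-\; \lambda_1 \lVert \beta \rVert_1 \;-\; \frac{\lambda_2}{2} \lVert \beta \rVert_2^2 \right) d\beta .
$$

The $\ell_1$ term is what makes the integrand non-differentiable at $\beta_j = 0$; the paper's Fourier-transform trick rewrites $Z$ as an oscillatory integral over "regression frequencies" so that a stationary phase approximation applies.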
A Generic Deep Architecture for Single Image Reflection Removal and Image Smoothing
Title | A Generic Deep Architecture for Single Image Reflection Removal and Image Smoothing |
Authors | Qingnan Fan, Jiaolong Yang, Gang Hua, Baoquan Chen, David Wipf |
Abstract | This paper proposes a deep neural network structure that exploits edge information in addressing representative low-level vision tasks such as layer separation and image filtering. Unlike most other deep learning strategies applied in this context, our approach tackles these challenging problems by estimating edges and reconstructing images using only cascaded convolutional layers arranged such that no handcrafted or application-specific image-processing components are required. We apply the resulting transferrable pipeline to two different problem domains that are both sensitive to edges, namely, single image reflection removal and image smoothing. For the former, using a mild reflection smoothness assumption and a novel synthetic data generation method that acts as a type of weak supervision, our network is able to solve much more difficult reflection cases that cannot be handled by previous methods. For the latter, we also exceed the state-of-the-art quantitative and qualitative results by wide margins. In all cases, the proposed framework is simple, fast, and easy to transfer across disparate domains. |
Tasks | Synthetic Data Generation |
Published | 2017-08-11 |
URL | http://arxiv.org/abs/1708.03474v2 |
PDF | http://arxiv.org/pdf/1708.03474v2.pdf |
PWC | https://paperswithcode.com/paper/a-generic-deep-architecture-for-single-image |
Repo | https://github.com/fqnchina/CEILNet |
Framework | torch |
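A minimal PyTorch sketch of the kind of edge-guided cascade the abstract describes: one plain convolutional stack predicts an edge map, and a second stack reconstructs the target layer from the image concatenated with that edge map. Channel widths, depths, and the way the edge estimate is fed forward are assumptions for illustration; CEILNet's published architecture differs in detail.

```python
import torch
import torch.nn as nn

def conv_stack(in_ch, out_ch, depth=3):
    # Plain cascade of 3x3 convolutions with ReLU; no handcrafted filtering components.
    layers, ch = [], in_ch
    for _ in range(depth - 1):
        layers += [nn.Conv2d(ch, 64, 3, padding=1), nn.ReLU(inplace=True)]
        ch = 64
    layers += [nn.Conv2d(ch, out_ch, 3, padding=1)]
    return nn.Sequential(*layers)

class EdgeGuidedSeparation(nn.Module):
    """Two-stage cascade: predict an edge map, then reconstruct the target layer
    from the input image concatenated with that edge map."""
    def __init__(self):
        super().__init__()
        self.edge_net = conv_stack(3, 1)    # image -> edge map of the target layer
        self.image_net = conv_stack(4, 3)   # image + edge map -> reconstructed layer

    def forward(self, x):
        edges = self.edge_net(x)
        out = self.image_net(torch.cat([x, edges], dim=1))
        return out, edges

# model = EdgeGuidedSeparation(); y, e = model(torch.randn(1, 3, 64, 64))
```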
Multi-Task Learning by Deep Collaboration and Application in Facial Landmark Detection
Title | Multi-Task Learning by Deep Collaboration and Application in Facial Landmark Detection |
Authors | Ludovic Trottier, Philippe Giguère, Brahim Chaib-draa |
Abstract | Convolutional neural networks (CNNs) have become the most successful approach in many vision-related domains. However, they are limited to domains where data is abundant. Recent works have looked at multi-task learning (MTL) to mitigate data scarcity by leveraging domain-specific information from related tasks. In this paper, we present a novel soft parameter sharing mechanism for CNNs in an MTL setting, which we refer to as Deep Collaboration. We propose taking into account the notion that task relevance depends on depth by using lateral transformation blocks with skip connections. This allows extracting task-specific features at various depths without sacrificing features relevant to all tasks. We show that CNNs connected through our Deep Collaboration mechanism obtain better accuracy on facial landmark detection with related tasks. We finally verify that our approach effectively allows knowledge sharing by showing the depth-specific influence of tasks that we know are related. |
Tasks | Facial Landmark Detection, Multi-Task Learning |
Published | 2017-10-28 |
URL | http://arxiv.org/abs/1711.00111v2 |
PDF | http://arxiv.org/pdf/1711.00111v2.pdf |
PWC | https://paperswithcode.com/paper/multi-task-learning-by-deep-collaboration-and |
Repo | https://github.com/ltrottier/deep-collaboration-network |
Framework | pytorch |
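A hedged sketch of a lateral transformation block with a skip connection for soft parameter sharing between two task-specific CNNs; the 1x1-convolution parameterization here is an assumption, not the paper's exact block design.

```python
import torch
import torch.nn as nn

class LateralBlock(nn.Module):
    """Features from the other task are transformed and added to this task's
    features through a skip connection, so the amount of sharing can differ
    from one depth to the next."""
    def __init__(self, channels):
        super().__init__()
        self.transform = nn.Sequential(
            nn.Conv2d(channels, channels, 1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 1),
        )

    def forward(self, own_feat, other_feat):
        # The skip connection leaves the task-specific features untouched.
        return own_feat + self.transform(other_feat)

# At each depth, task A receives LateralBlock(c)(feat_a, feat_b) and vice versa.
```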
Contextualized Word Representations for Reading Comprehension
Title | Contextualized Word Representations for Reading Comprehension |
Authors | Shimi Salant, Jonathan Berant |
Abstract | Reading a document and extracting an answer to a question about its content has attracted substantial attention recently. While most work has focused on the interaction between the question and the document, in this work we evaluate the importance of context when the question and document are processed independently. We take a standard neural architecture for this task, and show that by providing rich contextualized word representations from a large pre-trained language model as well as allowing the model to choose between context-dependent and context-independent word representations, we can obtain dramatic improvements and reach performance comparable to state-of-the-art on the competitive SQuAD dataset. |
Tasks | Language Modelling, Question Answering, Reading Comprehension |
Published | 2017-12-10 |
URL | http://arxiv.org/abs/1712.03609v4 |
PDF | http://arxiv.org/pdf/1712.03609v4.pdf |
PWC | https://paperswithcode.com/paper/contextualized-word-representations-for |
Repo | https://github.com/shimisalant/CWR |
Framework | tf |
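One way to let a model "choose between context-dependent and context-independent word representations", as the abstract puts it, is an element-wise learned gate. This PyTorch sketch shows a generic gating scheme; it is not necessarily the parameterization used in the paper.

```python
import torch
import torch.nn as nn

class GatedWordRepresentation(nn.Module):
    """Mixes a context-independent embedding (e.g. GloVe) with a contextualized
    one from a pre-trained language model through a learned element-wise gate."""
    def __init__(self, dim_static, dim_ctx, dim_out):
        super().__init__()
        self.proj_static = nn.Linear(dim_static, dim_out)
        self.proj_ctx = nn.Linear(dim_ctx, dim_out)
        self.gate = nn.Linear(dim_static + dim_ctx, dim_out)

    def forward(self, e_static, e_ctx):
        g = torch.sigmoid(self.gate(torch.cat([e_static, e_ctx], dim=-1)))
        # g close to 1 trusts the contextualized vector, g close to 0 the static one.
        return g * self.proj_ctx(e_ctx) + (1 - g) * self.proj_static(e_static)
```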
A Mixture of Matrix Variate Bilinear Factor Analyzers
Title | A Mixture of Matrix Variate Bilinear Factor Analyzers |
Authors | Michael P. B. Gallaugher, Paul D. McNicholas |
Abstract | Over the years data has become increasingly higher dimensional, which has prompted an increased need for dimension reduction techniques. This is perhaps especially true for clustering (unsupervised classification) as well as semi-supervised and supervised classification. Although dimension reduction in the area of clustering for multivariate data has been quite thoroughly discussed within the literature, there is relatively little work in the area of three-way, or matrix variate, data. Herein, we develop a mixture of matrix variate bilinear factor analyzers (MMVBFA) model for use in clustering high-dimensional matrix variate data. This work can be considered both the first matrix variate bilinear factor analysis model as well as the first MMVBFA model. Parameter estimation is discussed, and the MMVBFA model is illustrated using simulated and real data. |
Tasks | Dimensionality Reduction |
Published | 2017-12-22 |
URL | http://arxiv.org/abs/1712.08664v3 |
PDF | http://arxiv.org/pdf/1712.08664v3.pdf |
PWC | https://paperswithcode.com/paper/a-mixture-of-matrix-variate-bilinear-factor |
Repo | https://github.com/nikpocuca/MatrixVariate.jl |
Framework | none |
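As a rough schematic of what "bilinear factor analyzer" means (the paper's full specification includes additional row and column error terms and component-specific constraints that are omitted here), each mixture component $g$ models an $n \times p$ matrix observation as

$$
\mathbf{X} \mid g \;=\; \mathbf{M}_g + \boldsymbol{\Lambda}_g\, \mathbf{U}\, \boldsymbol{\Delta}_g^{\top} + \text{error terms},
$$

where $\mathbf{M}_g$ is a mean matrix, $\boldsymbol{\Lambda}_g$ ($n \times q$) and $\boldsymbol{\Delta}_g$ ($p \times r$) are column and row loading matrices, and $\mathbf{U}$ is a $q \times r$ matrix of latent factors, so dimension reduction acts on rows and columns simultaneously.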
Simple and Effective Multi-Paragraph Reading Comprehension
Title | Simple and Effective Multi-Paragraph Reading Comprehension |
Authors | Christopher Clark, Matt Gardner |
Abstract | We consider the problem of adapting neural paragraph-level question answering models to the case where entire documents are given as input. Our proposed solution trains models to produce well calibrated confidence scores for their results on individual paragraphs. We sample multiple paragraphs from the documents during training, and use a shared-normalization training objective that encourages the model to produce globally correct output. We combine this method with a state-of-the-art pipeline for training models on document QA data. Experiments demonstrate strong performance on several document QA datasets. Overall, we are able to achieve a score of 71.3 F1 on the web portion of TriviaQA, a large improvement from the 56.7 F1 of the previous best system. |
Tasks | Question Answering, Reading Comprehension |
Published | 2017-10-29 |
URL | http://arxiv.org/abs/1710.10723v2 |
PDF | http://arxiv.org/pdf/1710.10723v2.pdf |
PWC | https://paperswithcode.com/paper/simple-and-effective-multi-paragraph-reading |
Repo | https://github.com/allenai/document-qa |
Framework | tf |
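A minimal PyTorch sketch of the shared-normalization idea, written for answer-start scores only (the real model also scores span ends, and the tensor layout here is an assumption): paragraphs sampled from the same document share a single softmax, so their confidence scores are directly comparable.

```python
import torch

def shared_norm_loss(start_logits, answer_mask):
    """start_logits: (n_paragraphs, max_len) unnormalized scores for one document.
    answer_mask:  same shape, True at token positions that start a correct answer.
    Assumes at least one correct position exists in the document."""
    flat_logits = start_logits.reshape(-1)                       # one softmax over all paragraphs
    flat_mask = answer_mask.reshape(-1)
    log_z = torch.logsumexp(flat_logits, dim=0)                  # shared partition function
    correct = flat_logits.masked_fill(~flat_mask, float("-inf"))
    log_correct = torch.logsumexp(correct, dim=0)                # union of correct start positions
    return log_z - log_correct                                   # negative log-likelihood
```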
Learned in Translation: Contextualized Word Vectors
Title | Learned in Translation: Contextualized Word Vectors |
Authors | Bryan McCann, James Bradbury, Caiming Xiong, Richard Socher |
Abstract | Computer vision has benefited from initializing multiple deep layers with weights pretrained on large supervised training sets like ImageNet. Natural language processing (NLP) typically sees initialization of only the lowest layer of deep models with pretrained word vectors. In this paper, we use a deep LSTM encoder from an attentional sequence-to-sequence model trained for machine translation (MT) to contextualize word vectors. We show that adding these context vectors (CoVe) improves performance over using only unsupervised word and character vectors on a wide variety of common NLP tasks: sentiment analysis (SST, IMDb), question classification (TREC), entailment (SNLI), and question answering (SQuAD). For fine-grained sentiment analysis and entailment, CoVe improves performance of our baseline models to the state of the art. |
Tasks | Machine Translation, Question Answering, Sentiment Analysis, Text Classification |
Published | 2017-08-01 |
URL | http://arxiv.org/abs/1708.00107v2 |
PDF | http://arxiv.org/pdf/1708.00107v2.pdf |
PWC | https://paperswithcode.com/paper/learned-in-translation-contextualized-word |
Repo | https://github.com/menajosep/AleatoricSent |
Framework | tf |
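A short PyTorch sketch of how CoVe-style context vectors are consumed downstream: a two-layer bidirectional LSTM (in the paper, the encoder of a trained English-to-German translation model) is run over GloVe embeddings and its outputs are concatenated with them. The LSTM here is randomly initialized, so it only illustrates the shapes and wiring.

```python
import torch
import torch.nn as nn

class CoVeEncoder(nn.Module):
    """GloVe in, [GloVe; CoVe] out."""
    def __init__(self, glove_dim=300, hidden=300):
        super().__init__()
        # In the paper these weights come from a pretrained MT encoder.
        self.mt_lstm = nn.LSTM(glove_dim, hidden, num_layers=2,
                               bidirectional=True, batch_first=True)

    def forward(self, glove_embeddings):             # (batch, seq, 300)
        cove, _ = self.mt_lstm(glove_embeddings)     # (batch, seq, 600)
        return torch.cat([glove_embeddings, cove], dim=-1)   # (batch, seq, 900)
```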
Watset: Automatic Induction of Synsets from a Graph of Synonyms
Title | Watset: Automatic Induction of Synsets from a Graph of Synonyms |
Authors | Dmitry Ustalov, Alexander Panchenko, Chris Biemann |
Abstract | This paper presents a new graph-based approach that induces synsets using synonymy dictionaries and word embeddings. First, we build a weighted graph of synonyms extracted from commonly available resources, such as Wiktionary. Second, we apply word sense induction to deal with ambiguous words. Finally, we cluster the disambiguated version of the ambiguous input graph into synsets. Our meta-clustering approach lets us use an efficient hard clustering algorithm to perform a fuzzy clustering of the graph. Despite its simplicity, our approach shows excellent results, outperforming five competitive state-of-the-art methods in terms of F-score on three gold standard datasets for English and Russian derived from large-scale manually constructed lexical resources. |
Tasks | Word Embeddings, Word Sense Induction |
Published | 2017-04-24 |
URL | http://arxiv.org/abs/1704.07157v1 |
PDF | http://arxiv.org/pdf/1704.07157v1.pdf |
PWC | https://paperswithcode.com/paper/watset-automatic-induction-of-synsets-from-a |
Repo | https://github.com/dustalov/watset |
Framework | none |
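A compact Python sketch of the meta-clustering pipeline described in the abstract, with `hard_cluster(nodes, edges)` standing in for any hard graph clustering algorithm (the paper experiments with algorithms such as Chinese Whispers and Markov Clustering). The maximum-overlap disambiguation rule used here is a simplification of the paper's procedure.

```python
def watset_synsets(graph, hard_cluster):
    """graph: word -> {neighbour: weight} synonym graph.
    hard_cluster(nodes, edges) -> iterable of clusters (sets of nodes)."""
    # 1. Word sense induction: cluster each word's ego network into senses.
    senses = {}                                       # (word, sense_id) -> context words
    for word, nbrs in graph.items():
        ego_edges = {(u, v): w for u in nbrs for v, w in graph.get(u, {}).items()
                     if v in nbrs and u < v}
        for i, cluster in enumerate(hard_cluster(set(nbrs), ego_edges)):
            senses[(word, i)] = set(cluster)

    # 2. Disambiguation: link each word sense to the neighbour sense whose
    #    context overlaps most with its own context.
    sense_edges = {}
    for (word, i), context in senses.items():
        for nbr in context:
            candidates = [s for s in senses if s[0] == nbr]
            if candidates:
                best = max(candidates, key=lambda s: len(senses[s] & (context | {word})))
                sense_edges[((word, i), best)] = graph[word].get(nbr, 1.0)

    # 3. Global hard clustering of the sense graph; dropping sense ids yields synsets.
    clusters = hard_cluster(set(senses), sense_edges)
    return [{w for (w, _) in cluster} for cluster in clusters]
```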
PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications
Title | PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications |
Authors | Tim Salimans, Andrej Karpathy, Xi Chen, Diederik P. Kingma |
Abstract | PixelCNNs are a recently proposed class of powerful generative models with tractable likelihood. Here we discuss our implementation of PixelCNNs which we make available at https://github.com/openai/pixel-cnn. Our implementation contains a number of modifications to the original model that both simplify its structure and improve its performance. 1) We use a discretized logistic mixture likelihood on the pixels, rather than a 256-way softmax, which we find to speed up training. 2) We condition on whole pixels, rather than R/G/B sub-pixels, simplifying the model structure. 3) We use downsampling to efficiently capture structure at multiple resolutions. 4) We introduce additional short-cut connections to further speed up optimization. 5) We regularize the model using dropout. Finally, we present state-of-the-art log likelihood results on CIFAR-10 to demonstrate the usefulness of these modifications. |
Tasks | Image Generation |
Published | 2017-01-19 |
URL | http://arxiv.org/abs/1701.05517v1 |
PDF | http://arxiv.org/pdf/1701.05517v1.pdf |
PWC | https://paperswithcode.com/paper/pixelcnn-improving-the-pixelcnn-with |
Repo | https://github.com/openai/pixel-cnn |
Framework | tf |
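A PyTorch sketch of the discretized logistic mixture likelihood for a single channel (the full model also conditions the G and B means on previously generated sub-pixels, which is omitted here). Pixel values are assumed to be scaled to [-1, 1] with 256 levels, so the half-bin width is 1/255.

```python
import torch
import torch.nn.functional as F

def discretized_logistic_mixture_nll(x, logit_probs, means, log_scales, half_bin=1.0 / 255):
    """Negative log-likelihood of x under a K-component mixture of discretized logistics.

    x:           (...,) pixel values in [-1, 1]
    logit_probs: (..., K) unnormalized mixture weights
    means:       (..., K) component means
    log_scales:  (..., K) log of component scales
    """
    x = x.unsqueeze(-1)
    inv_s = torch.exp(-log_scales)
    cdf_plus = torch.sigmoid(inv_s * (x + half_bin - means))    # CDF at upper bin edge
    cdf_minus = torch.sigmoid(inv_s * (x - half_bin - means))   # CDF at lower bin edge
    log_p = torch.log(torch.clamp(cdf_plus - cdf_minus, min=1e-12))
    # Edge bins at -1 and +1 integrate the full tails of the logistic.
    log_p = torch.where(x < -0.999, torch.log(torch.clamp(cdf_plus, min=1e-12)), log_p)
    log_p = torch.where(x > 0.999, torch.log(torch.clamp(1.0 - cdf_minus, min=1e-12)), log_p)
    return -torch.logsumexp(F.log_softmax(logit_probs, dim=-1) + log_p, dim=-1)
```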
Learning Deep CNN Denoiser Prior for Image Restoration
Title | Learning Deep CNN Denoiser Prior for Image Restoration |
Authors | Kai Zhang, Wangmeng Zuo, Shuhang Gu, Lei Zhang |
Abstract | Model-based optimization methods and discriminative learning methods have been the two dominant strategies for solving various inverse problems in low-level vision. Typically, those two kinds of methods have their respective merits and drawbacks, e.g., model-based optimization methods are flexible for handling different inverse problems but are usually time-consuming when sophisticated priors are needed for good performance; meanwhile, discriminative learning methods have fast testing speed but their application range is greatly restricted by the specialized task. Recent works have revealed that, with the aid of variable splitting techniques, a denoiser prior can be plugged in as a modular part of model-based optimization methods to solve other inverse problems (e.g., deblurring). Such an integration offers a considerable advantage when the denoiser is obtained via discriminative learning. However, the study of integrating a fast discriminative denoiser prior is still lacking. To this end, this paper aims to train a set of fast and effective CNN (convolutional neural network) denoisers and integrate them into model-based optimization methods to solve other inverse problems. Experimental results demonstrate that the learned set of denoisers not only achieves promising Gaussian denoising results but can also be used as a prior to deliver good performance for various low-level vision applications. |
Tasks | Deblurring, Denoising, Image Denoising, Image Restoration |
Published | 2017-04-11 |
URL | http://arxiv.org/abs/1704.03264v1 |
PDF | http://arxiv.org/pdf/1704.03264v1.pdf |
PWC | https://paperswithcode.com/paper/learning-deep-cnn-denoiser-prior-for-image |
Repo | https://github.com/cszn/ircnn |
Framework | none |
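The "plug the denoiser into a model-based method" idea can be summarized with a generic half-quadratic splitting iteration (the paper's exact parameter and noise-level schedule is not reproduced here): the quadratic data subproblem keeps the observation model $y = Hx + n$ explicit, while the learned CNN denoiser plays the role of the prior.

$$
\begin{aligned}
x_{k+1} &= \arg\min_x \; \tfrac{1}{2}\lVert y - H x \rVert_2^2 + \tfrac{\mu_k}{2}\lVert x - z_k \rVert_2^2,\\
z_{k+1} &= \mathrm{Denoiser}_{\sigma_k}(x_{k+1}).
\end{aligned}
$$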
Dual Path Networks
Title | Dual Path Networks |
Authors | Yunpeng Chen, Jianan Li, Huaxin Xiao, Xiaojie Jin, Shuicheng Yan, Jiashi Feng |
Abstract | In this work, we present a simple, highly efficient and modularized Dual Path Network (DPN) for image classification which introduces a new topology of internal connection paths. By revealing the equivalence of the state-of-the-art Residual Network (ResNet) and Densely Connected Convolutional Network (DenseNet) within the HORNN framework, we find that ResNet enables feature re-usage while DenseNet enables new feature exploration, both of which are important for learning good representations. To enjoy the benefits from both path topologies, our proposed Dual Path Network shares common features while maintaining the flexibility to explore new features through dual path architectures. Extensive experiments on three benchmark datasets, ImageNet-1k, Places365 and PASCAL VOC, clearly demonstrate superior performance of the proposed DPN over the state of the art. In particular, on the ImageNet-1k dataset, a shallow DPN surpasses the best ResNeXt-101(64x4d) with 26% smaller model size, 25% less computational cost and 8% lower memory consumption, and a deeper DPN (DPN-131) further pushes the state-of-the-art single model performance with about 2 times faster training speed. Experiments on the Places365 large-scale scene dataset, PASCAL VOC detection dataset, and PASCAL VOC segmentation dataset also demonstrate its consistently better performance than DenseNet, ResNet and the latest ResNeXt model across various applications. |
Tasks | Image Classification |
Published | 2017-07-06 |
URL | http://arxiv.org/abs/1707.01629v2 |
PDF | http://arxiv.org/pdf/1707.01629v2.pdf |
PWC | https://paperswithcode.com/paper/dual-path-networks |
Repo | https://github.com/crowdAI/crowdai-musical-genre-recognition-starter-kit |
Framework | none |
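A hedged PyTorch sketch of one dual-path unit: part of the block's output is added to a residual path (ResNet-style feature re-use) and the rest is concatenated onto a dense path (DenseNet-style new-feature exploration). The bottleneck widths and the absence of grouped convolutions are simplifications relative to the published DPN blocks.

```python
import torch
import torch.nn as nn

class DualPathBlock(nn.Module):
    def __init__(self, in_channels, res_channels, dense_increment):
        # in_channels must equal res_channels + current width of the dense path.
        super().__init__()
        self.res_channels = res_channels
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, 64, 1, bias=False), nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1, bias=False), nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.Conv2d(64, res_channels + dense_increment, 1, bias=False),
        )

    def forward(self, res_path, dense_path):
        out = self.body(torch.cat([res_path, dense_path], dim=1))
        res_out = out[:, :self.res_channels]
        dense_out = out[:, self.res_channels:]
        # Addition keeps re-using features; concatenation keeps exploring new ones.
        return res_path + res_out, torch.cat([dense_path, dense_out], dim=1)

# block = DualPathBlock(96, 64, 32)
# r, d = block(torch.randn(1, 64, 8, 8), torch.randn(1, 32, 8, 8))  # dense path grows to 64 channels
```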
To prune, or not to prune: exploring the efficacy of pruning for model compression
Title | To prune, or not to prune: exploring the efficacy of pruning for model compression |
Authors | Michael Zhu, Suyog Gupta |
Abstract | Model pruning seeks to induce sparsity in a deep neural network’s various connection matrices, thereby reducing the number of nonzero-valued parameters in the model. Recent reports (Han et al., 2015; Narang et al., 2017) prune deep networks at the cost of only a marginal loss in accuracy and achieve a sizable reduction in model size. This hints at the possibility that the baseline models in these experiments are perhaps severely over-parameterized at the outset and a viable alternative for model compression might be to simply reduce the number of hidden units while maintaining the model’s dense connection structure, exposing a similar trade-off in model size and accuracy. We investigate these two distinct paths for model compression within the context of energy-efficient inference in resource-constrained environments and propose a new gradual pruning technique that is simple and straightforward to apply across a variety of models/datasets with minimal tuning and can be seamlessly incorporated within the training process. We compare the accuracy of large, but pruned models (large-sparse) and their smaller, but dense (small-dense) counterparts with identical memory footprint. Across a broad range of neural network architectures (deep CNNs, stacked LSTM, and seq2seq LSTM models), we find large-sparse models to consistently outperform small-dense models and achieve up to 10x reduction in number of non-zero parameters with minimal loss in accuracy. |
Tasks | Model Compression |
Published | 2017-10-05 |
URL | http://arxiv.org/abs/1710.01878v2 |
PDF | http://arxiv.org/pdf/1710.01878v2.pdf |
PWC | https://paperswithcode.com/paper/to-prune-or-not-to-prune-exploring-the |
Repo | https://github.com/dorlivne/simple_net_pruning |
Framework | tf |
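The gradual pruning technique is driven by a cubic sparsity ramp; the schedule below follows the formula given in the paper, while the magnitude-pruning helper is a simplified numpy illustration of how the schedule might be applied per layer.

```python
import numpy as np

def target_sparsity(step, s_init=0.0, s_final=0.9, begin_step=0, end_step=10000):
    """Cubic sparsity ramp from Zhu & Gupta: prune quickly at first, then slow down
    as the remaining weights become harder to remove without hurting accuracy."""
    if step < begin_step:
        return s_init
    if step >= end_step:
        return s_final
    progress = (step - begin_step) / (end_step - begin_step)
    return s_final + (s_init - s_final) * (1.0 - progress) ** 3

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude entries until `sparsity` of them are zero."""
    k = int(round(sparsity * weights.size))
    if k == 0:
        return weights
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

# At each pruning interval during training:
# w = magnitude_prune(w, target_sparsity(global_step))
```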
DeepGO: Predicting protein functions from sequence and interactions using a deep ontology-aware classifier
Title | DeepGO: Predicting protein functions from sequence and interactions using a deep ontology-aware classifier |
Authors | Maxat Kulmanov, Mohammed Asif Khan, Robert Hoehndorf |
Abstract | A large number of protein sequences are becoming available through the application of novel high-throughput sequencing technologies. Experimental functional characterization of these proteins is time-consuming and expensive, and is often only done rigorously for a few selected model organisms. Computational function prediction approaches have been suggested to fill this gap. The functions of proteins are classified using the Gene Ontology (GO), which contains over 40,000 classes. Additionally, proteins have multiple functions, making function prediction a large-scale, multi-class, multi-label problem. We have developed a novel method to predict protein function from sequence. We use deep learning to learn features from protein sequences as well as a cross-species protein-protein interaction network. Our approach specifically outputs information in the structure of the GO and utilizes the dependencies between GO classes as background information to construct a deep learning model. We evaluate our method using the standards established by the Computational Assessment of Function Annotation (CAFA) and demonstrate a significant improvement over baseline methods such as BLAST, in particular for predicting cellular locations. |
Tasks | |
Published | 2017-05-15 |
URL | http://arxiv.org/abs/1705.05919v1 |
PDF | http://arxiv.org/pdf/1705.05919v1.pdf |
PWC | https://paperswithcode.com/paper/deepgo-predicting-protein-functions-from |
Repo | https://github.com/bio-ontology-research-group/deepgo |
Framework | none |
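The abstract's point about outputting "information in the structure of the GO" amounts to keeping predictions consistent with the ontology: a parent class should score at least as high as any of its descendants. The paper builds this consistency into the network itself; the snippet below is only a post-hoc illustration of the same rule applied to raw per-class scores.

```python
def propagate_go_scores(scores, children):
    """scores: GO id -> raw per-class probability; children: GO id -> list of child GO ids.
    Returns scores where every class is at least the maximum of its descendants."""
    memo = {}

    def best(term):
        if term not in memo:
            memo[term] = max([scores.get(term, 0.0)] +
                             [best(child) for child in children.get(term, [])])
        return memo[term]

    return {term: best(term) for term in scores}

# Example: if GO:A has child GO:B and scores are {"GO:A": 0.2, "GO:B": 0.7},
# the propagated score of GO:A becomes 0.7.
```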
Massively Multilingual Neural Grapheme-to-Phoneme Conversion
Title | Massively Multilingual Neural Grapheme-to-Phoneme Conversion |
Authors | Ben Peters, Jon Dehdari, Josef van Genabith |
Abstract | Grapheme-to-phoneme conversion (g2p) is necessary for text-to-speech and automatic speech recognition systems. Most g2p systems are monolingual: they require language-specific data or handcrafting of rules. Such systems are difficult to extend to low resource languages, for which data and handcrafted rules are not available. As an alternative, we present a neural sequence-to-sequence approach to g2p which is trained on spelling–pronunciation pairs in hundreds of languages. The system shares a single encoder and decoder across all languages, allowing it to utilize the intrinsic similarities between different writing systems. We show an 11% improvement in phoneme error rate over an approach based on adapting high-resource monolingual g2p models to low-resource languages. Our model is also much more compact relative to previous approaches. |
Tasks | Speech Recognition |
Published | 2017-08-04 |
URL | http://arxiv.org/abs/1708.01464v1 |
PDF | http://arxiv.org/pdf/1708.01464v1.pdf |
PWC | https://paperswithcode.com/paper/massively-multilingual-neural-grapheme-to |
Repo | https://github.com/bpopeters/mg2p |
Framework | none |
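Because a single encoder and decoder are shared across all languages, the model needs to be told which language it is reading; one common way to do this (the paper's exact input format may differ) is to include language-identifying tokens in the grapheme sequence, as in this small Python sketch.

```python
def encode_g2p_source(word, lang_code):
    """Wrap the grapheme sequence in language tokens for a shared multilingual model."""
    return ["<" + lang_code + ">"] + list(word) + ["</" + lang_code + ">"]

print(encode_g2p_source("hello", "eng"))
# ['<eng>', 'h', 'e', 'l', 'l', 'o', '</eng>']
```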