Paper Group AWR 49
Bidirectional Attention for SQL Generation
Title | Bidirectional Attention for SQL Generation |
Authors | Tong Guo, Huilin Gao |
Abstract | Generating structured query language (SQL) queries from natural language is a long-standing open problem. Answering a natural language question about a database table requires modeling complex interactions between the columns of the table and the question. In this paper, we apply the synthesizing approach to solve this problem. Based on the structure of SQL queries, we break down the model into three sub-modules and design specific deep neural networks for each of them. Taking inspiration from the similar task of machine reading, we employ bidirectional attention mechanisms and character-level embeddings with convolutional neural networks (CNNs) to improve the result. Experimental evaluations show that our model achieves state-of-the-art results on the WikiSQL dataset. |
Tasks | Reading Comprehension |
Published | 2017-12-30 |
URL | http://arxiv.org/abs/1801.00076v6 |
PDF | http://arxiv.org/pdf/1801.00076v6.pdf |
PWC | https://paperswithcode.com/paper/bidirectional-attention-for-sql-generation |
Repo | https://github.com/guotong1988/NL2SQL |
Framework | pytorch |
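As a rough illustration of the bidirectional attention the abstract refers to, here is a minimal numpy sketch of BiDAF-style attention between question tokens and table-column encodings; the dot-product similarity and the matrix names are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def bidirectional_attention(Q, C):
    """Toy BiDAF-style attention between a question and table columns.

    Q: (n_q, d) question token encodings; C: (n_c, d) column-name encodings.
    Returns question-side and column-side attended representations.
    """
    S = Q @ C.T                           # similarity matrix, shape (n_q, n_c)
    q2c = softmax(S, axis=1) @ C          # each question token attends over columns
    c2q = softmax(S.T, axis=1) @ Q        # each column attends over question tokens
    return q2c, c2q

# Example: 5 question tokens, 3 columns, 8-dimensional encodings.
q2c, c2q = bidirectional_attention(np.random.randn(5, 8), np.random.randn(3, 8))
```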
Analytic solution and stationary phase approximation for the Bayesian lasso and elastic net
Title | Analytic solution and stationary phase approximation for the Bayesian lasso and elastic net |
Authors | Tom Michoel |
Abstract | The lasso and elastic net linear regression models impose a double-exponential prior distribution on the model parameters to achieve regression shrinkage and variable selection, allowing the inference of robust models from large data sets. However, there has been limited success in deriving estimates for the full posterior distribution of regression coefficients in these models, due to a need to evaluate analytically intractable partition function integrals. Here, the Fourier transform is used to express these integrals as complex-valued oscillatory integrals over “regression frequencies”. This results in an analytic expansion and stationary phase approximation for the partition functions of the Bayesian lasso and elastic net, where the non-differentiability of the double-exponential prior has so far eluded such an approach. Use of this approximation leads to highly accurate numerical estimates for the expectation values and marginal posterior distributions of the regression coefficients, and allows for Bayesian inference of much higher dimensional models than previously possible. |
Tasks | Bayesian Inference |
Published | 2017-09-25 |
URL | http://arxiv.org/abs/1709.08535v3 |
PDF | http://arxiv.org/pdf/1709.08535v3.pdf |
PWC | https://paperswithcode.com/paper/analytic-solution-and-stationary-phase |
Repo | https://github.com/tmichoel/bayonet |
Framework | none |
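For orientation, the analytically intractable partition function the abstract mentions can be written (up to prior normalization constants, and with one common parameterization of the penalties) as the integral of the unnormalized elastic-net posterior; the Bayesian lasso is the special case $\lambda_2 = 0$:

$$
Z(\lambda_1, \lambda_2) = \int_{\mathbb{R}^p} \exp\!\left( -\frac{1}{2\sigma^2}\,\lVert y - X\beta \rVert_2^2 \;-\; \lambda_1 \lVert \beta \rVert_1 \;-\; \frac{\lambda_2}{2} \lVert \beta \rVert_2^2 \right) d\beta .
$$

The $\ell_1$ term is what makes the integrand non-differentiable at $\beta_j = 0$; the paper's Fourier-transform trick rewrites $Z$ as an oscillatory integral over "regression frequencies" so that a stationary phase approximation applies.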
A Generic Deep Architecture for Single Image Reflection Removal and Image Smoothing
Title | A Generic Deep Architecture for Single Image Reflection Removal and Image Smoothing |
Authors | Qingnan Fan, Jiaolong Yang, Gang Hua, Baoquan Chen, David Wipf |
Abstract | This paper proposes a deep neural network structure that exploits edge information in addressing representative low-level vision tasks such as layer separation and image filtering. Unlike most other deep learning strategies applied in this context, our approach tackles these challenging problems by estimating edges and reconstructing images using only cascaded convolutional layers arranged such that no handcrafted or application-specific image-processing components are required. We apply the resulting transferrable pipeline to two different problem domains that are both sensitive to edges, namely, single image reflection removal and image smoothing. For the former, using a mild reflection smoothness assumption and a novel synthetic data generation method that acts as a type of weak supervision, our network is able to solve much more difficult reflection cases that cannot be handled by previous methods. For the latter, we also exceed the state-of-the-art quantitative and qualitative results by wide margins. In all cases, the proposed framework is simple, fast, and easy to transfer across disparate domains. |
Tasks | Synthetic Data Generation |
Published | 2017-08-11 |
URL | http://arxiv.org/abs/1708.03474v2 |
PDF | http://arxiv.org/pdf/1708.03474v2.pdf |
PWC | https://paperswithcode.com/paper/a-generic-deep-architecture-for-single-image |
Repo | https://github.com/fqnchina/CEILNet |
Framework | torch |
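A minimal PyTorch sketch of the kind of edge-guided cascade the abstract describes: one plain convolutional stack predicts an edge map, and a second stack reconstructs the target layer from the image concatenated with that edge map. Channel widths, depths, and the way the edge estimate is fed forward are assumptions for illustration; CEILNet's published architecture differs in detail.

```python
import torch
import torch.nn as nn

def conv_stack(in_ch, out_ch, depth=3):
    # Plain cascade of 3x3 convolutions with ReLU; no handcrafted filtering components.
    layers, ch = [], in_ch
    for _ in range(depth - 1):
        layers += [nn.Conv2d(ch, 64, 3, padding=1), nn.ReLU(inplace=True)]
        ch = 64
    layers += [nn.Conv2d(ch, out_ch, 3, padding=1)]
    return nn.Sequential(*layers)

class EdgeGuidedSeparation(nn.Module):
    """Two-stage cascade: predict an edge map, then reconstruct the target layer
    from the input image concatenated with that edge map."""
    def __init__(self):
        super().__init__()
        self.edge_net = conv_stack(3, 1)    # image -> edge map of the target layer
        self.image_net = conv_stack(4, 3)   # image + edge map -> reconstructed layer

    def forward(self, x):
        edges = self.edge_net(x)
        out = self.image_net(torch.cat([x, edges], dim=1))
        return out, edges

# model = EdgeGuidedSeparation(); y, e = model(torch.randn(1, 3, 64, 64))
```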
Multi-Task Learning by Deep Collaboration and Application in Facial Landmark Detection
Title | Multi-Task Learning by Deep Collaboration and Application in Facial Landmark Detection |
Authors | Ludovic Trottier, Philippe Giguère, Brahim Chaib-draa |
Abstract | Convolutional neural networks (CNNs) have become the most successful approach in many vision-related domains. However, they are limited to domains where data is abundant. Recent works have looked at multi-task learning (MTL) to mitigate data scarcity by leveraging domain-specific information from related tasks. In this paper, we present a novel soft parameter sharing mechanism for CNNs in an MTL setting, which we refer to as Deep Collaboration. We propose taking into account the notion that task relevance depends on depth by using lateral transformation blocks with skip connections. This allows extracting task-specific features at various depths without sacrificing features relevant to all tasks. We show that CNNs connected through our Deep Collaboration mechanism obtain better accuracy on facial landmark detection with related tasks. We finally verify that our approach effectively allows knowledge sharing by showing the depth-specific influence of tasks that we know are related. |
Tasks | Facial Landmark Detection, Multi-Task Learning |
Published | 2017-10-28 |
URL | http://arxiv.org/abs/1711.00111v2 |
PDF | http://arxiv.org/pdf/1711.00111v2.pdf |
PWC | https://paperswithcode.com/paper/multi-task-learning-by-deep-collaboration-and |
Repo | https://github.com/ltrottier/deep-collaboration-network |
Framework | pytorch |
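A hedged sketch of a lateral transformation block with a skip connection for soft parameter sharing between two task-specific CNNs; the 1x1-convolution parameterization here is an assumption, not the paper's exact block design.

```python
import torch
import torch.nn as nn

class LateralBlock(nn.Module):
    """Features from the other task are transformed and added to this task's
    features through a skip connection, so the amount of sharing can differ
    from one depth to the next."""
    def __init__(self, channels):
        super().__init__()
        self.transform = nn.Sequential(
            nn.Conv2d(channels, channels, 1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 1),
        )

    def forward(self, own_feat, other_feat):
        # The skip connection leaves the task-specific features untouched.
        return own_feat + self.transform(other_feat)

# At each depth, task A receives LateralBlock(c)(feat_a, feat_b) and vice versa.
```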
Contextualized Word Representations for Reading Comprehension
Title | Contextualized Word Representations for Reading Comprehension |
Authors | Shimi Salant, Jonathan Berant |
Abstract | Reading a document and extracting an answer to a question about its content has attracted substantial attention recently. While most work has focused on the interaction between the question and the document, in this work we evaluate the importance of context when the question and document are processed independently. We take a standard neural architecture for this task, and show that by providing rich contextualized word representations from a large pre-trained language model as well as allowing the model to choose between context-dependent and context-independent word representations, we can obtain dramatic improvements and reach performance comparable to state-of-the-art on the competitive SQuAD dataset. |
Tasks | Language Modelling, Question Answering, Reading Comprehension |
Published | 2017-12-10 |
URL | http://arxiv.org/abs/1712.03609v4 |
PDF | http://arxiv.org/pdf/1712.03609v4.pdf |
PWC | https://paperswithcode.com/paper/contextualized-word-representations-for |
Repo | https://github.com/shimisalant/CWR |
Framework | tf |
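One way to let a model "choose between context-dependent and context-independent word representations", as the abstract puts it, is an element-wise learned gate. This PyTorch sketch shows a generic gating scheme; it is not necessarily the parameterization used in the paper.

```python
import torch
import torch.nn as nn

class GatedWordRepresentation(nn.Module):
    """Mixes a context-independent embedding (e.g. GloVe) with a contextualized
    one from a pre-trained language model through a learned element-wise gate."""
    def __init__(self, dim_static, dim_ctx, dim_out):
        super().__init__()
        self.proj_static = nn.Linear(dim_static, dim_out)
        self.proj_ctx = nn.Linear(dim_ctx, dim_out)
        self.gate = nn.Linear(dim_static + dim_ctx, dim_out)

    def forward(self, e_static, e_ctx):
        g = torch.sigmoid(self.gate(torch.cat([e_static, e_ctx], dim=-1)))
        # g close to 1 trusts the contextualized vector, g close to 0 the static one.
        return g * self.proj_ctx(e_ctx) + (1 - g) * self.proj_static(e_static)
```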
A Mixture of Matrix Variate Bilinear Factor Analyzers
Title | A Mixture of Matrix Variate Bilinear Factor Analyzers |
Authors | Michael P. B. Gallaugher, Paul D. McNicholas |
Abstract | Over the years data has become increasingly higher dimensional, which has prompted an increased need for dimension reduction techniques. This is perhaps especially true for clustering (unsupervised classification) as well as semi-supervised and supervised classification. Although dimension reduction in the area of clustering for multivariate data has been quite thoroughly discussed within the literature, there is relatively little work in the area of three-way, or matrix variate, data. Herein, we develop a mixture of matrix variate bilinear factor analyzers (MMVBFA) model for use in clustering high-dimensional matrix variate data. This work can be considered both the first matrix variate bilinear factor analysis model as well as the first MMVBFA model. Parameter estimation is discussed, and the MMVBFA model is illustrated using simulated and real data. |
Tasks | Dimensionality Reduction |
Published | 2017-12-22 |
URL | http://arxiv.org/abs/1712.08664v3 |
PDF | http://arxiv.org/pdf/1712.08664v3.pdf |
PWC | https://paperswithcode.com/paper/a-mixture-of-matrix-variate-bilinear-factor |
Repo | https://github.com/nikpocuca/MatrixVariate.jl |
Framework | none |
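As a rough schematic of what "bilinear factor analyzer" means (the paper's full specification includes additional row and column error terms and component-specific constraints that are omitted here), each mixture component $g$ models an $n \times p$ matrix observation as

$$
\mathbf{X} \mid g \;=\; \mathbf{M}_g + \boldsymbol{\Lambda}_g\, \mathbf{U}\, \boldsymbol{\Delta}_g^{\top} + \text{error terms},
$$

where $\mathbf{M}_g$ is a mean matrix, $\boldsymbol{\Lambda}_g$ ($n \times q$) and $\boldsymbol{\Delta}_g$ ($p \times r$) are column and row loading matrices, and $\mathbf{U}$ is a $q \times r$ matrix of latent factors, so dimension reduction acts on rows and columns simultaneously.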
Simple and Effective Multi-Paragraph Reading Comprehension
Title | Simple and Effective Multi-Paragraph Reading Comprehension |
Authors | Christopher Clark, Matt Gardner |
Abstract | We consider the problem of adapting neural paragraph-level question answering models to the case where entire documents are given as input. Our proposed solution trains models to produce well calibrated confidence scores for their results on individual paragraphs. We sample multiple paragraphs from the documents during training, and use a shared-normalization training objective that encourages the model to produce globally correct output. We combine this method with a state-of-the-art pipeline for training models on document QA data. Experiments demonstrate strong performance on several document QA datasets. Overall, we are able to achieve a score of 71.3 F1 on the web portion of TriviaQA, a large improvement from the 56.7 F1 of the previous best system. |
Tasks | Question Answering, Reading Comprehension |
Published | 2017-10-29 |
URL | http://arxiv.org/abs/1710.10723v2 |
PDF | http://arxiv.org/pdf/1710.10723v2.pdf |
PWC | https://paperswithcode.com/paper/simple-and-effective-multi-paragraph-reading |
Repo | https://github.com/allenai/document-qa |
Framework | tf |
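A minimal PyTorch sketch of the shared-normalization idea, written for answer-start scores only (the real model also scores span ends, and the tensor layout here is an assumption): paragraphs sampled from the same document share a single softmax, so their confidence scores are directly comparable.

```python
import torch

def shared_norm_loss(start_logits, answer_mask):
    """start_logits: (n_paragraphs, max_len) unnormalized scores for one document.
    answer_mask:  same shape, True at token positions that start a correct answer.
    Assumes at least one correct position exists in the document."""
    flat_logits = start_logits.reshape(-1)                       # one softmax over all paragraphs
    flat_mask = answer_mask.reshape(-1)
    log_z = torch.logsumexp(flat_logits, dim=0)                  # shared partition function
    correct = flat_logits.masked_fill(~flat_mask, float("-inf"))
    log_correct = torch.logsumexp(correct, dim=0)                # union of correct start positions
    return log_z - log_correct                                   # negative log-likelihood
```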
Learned in Translation: Contextualized Word Vectors
Title | Learned in Translation: Contextualized Word Vectors |
Authors | Bryan McCann, James Bradbury, Caiming Xiong, Richard Socher |
Abstract | Computer vision has benefited from initializing multiple deep layers with weights pretrained on large supervised training sets like ImageNet. Natural language processing (NLP) typically sees initialization of only the lowest layer of deep models with pretrained word vectors. In this paper, we use a deep LSTM encoder from an attentional sequence-to-sequence model trained for machine translation (MT) to contextualize word vectors. We show that adding these context vectors (CoVe) improves performance over using only unsupervised word and character vectors on a wide variety of common NLP tasks: sentiment analysis (SST, IMDb), question classification (TREC), entailment (SNLI), and question answering (SQuAD). For fine-grained sentiment analysis and entailment, CoVe improves performance of our baseline models to the state of the art. |
Tasks | Machine Translation, Question Answering, Sentiment Analysis, Text Classification |
Published | 2017-08-01 |
URL | http://arxiv.org/abs/1708.00107v2 |
PDF | http://arxiv.org/pdf/1708.00107v2.pdf |
PWC | https://paperswithcode.com/paper/learned-in-translation-contextualized-word |
Repo | https://github.com/menajosep/AleatoricSent |
Framework | tf |
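A short PyTorch sketch of how CoVe-style context vectors are consumed downstream: a two-layer bidirectional LSTM (in the paper, the encoder of a trained English-to-German translation model) is run over GloVe embeddings and its outputs are concatenated with them. The LSTM here is randomly initialized, so it only illustrates the shapes and wiring.

```python
import torch
import torch.nn as nn

class CoVeEncoder(nn.Module):
    """GloVe in, [GloVe; CoVe] out."""
    def __init__(self, glove_dim=300, hidden=300):
        super().__init__()
        # In the paper these weights come from a pretrained MT encoder.
        self.mt_lstm = nn.LSTM(glove_dim, hidden, num_layers=2,
                               bidirectional=True, batch_first=True)

    def forward(self, glove_embeddings):             # (batch, seq, 300)
        cove, _ = self.mt_lstm(glove_embeddings)     # (batch, seq, 600)
        return torch.cat([glove_embeddings, cove], dim=-1)   # (batch, seq, 900)
```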
Watset: Automatic Induction of Synsets from a Graph of Synonyms
Title | Watset: Automatic Induction of Synsets from a Graph of Synonyms |
Authors | Dmitry Ustalov, Alexander Panchenko, Chris Biemann |
Abstract | This paper presents a new graph-based approach that induces synsets using synonymy dictionaries and word embeddings. First, we build a weighted graph of synonyms extracted from commonly available resources, such as Wiktionary. Second, we apply word sense induction to deal with ambiguous words. Finally, we cluster the disambiguated version of the ambiguous input graph into synsets. Our meta-clustering approach lets us use an efficient hard clustering algorithm to perform a fuzzy clustering of the graph. Despite its simplicity, our approach shows excellent results, outperforming five competitive state-of-the-art methods in terms of F-score on three gold standard datasets for English and Russian derived from large-scale manually constructed lexical resources. |
Tasks | Word Embeddings, Word Sense Induction |
Published | 2017-04-24 |
URL | http://arxiv.org/abs/1704.07157v1 |
PDF | http://arxiv.org/pdf/1704.07157v1.pdf |
PWC | https://paperswithcode.com/paper/watset-automatic-induction-of-synsets-from-a |
Repo | https://github.com/dustalov/watset |
Framework | none |
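A compact Python sketch of the meta-clustering pipeline described in the abstract, with `hard_cluster(nodes, edges)` standing in for any hard graph clustering algorithm (the paper experiments with algorithms such as Chinese Whispers and Markov Clustering). The maximum-overlap disambiguation rule used here is a simplification of the paper's procedure.

```python
def watset_synsets(graph, hard_cluster):
    """graph: word -> {neighbour: weight} synonym graph.
    hard_cluster(nodes, edges) -> iterable of clusters (sets of nodes)."""
    # 1. Word sense induction: cluster each word's ego network into senses.
    senses = {}                                       # (word, sense_id) -> context words
    for word, nbrs in graph.items():
        ego_edges = {(u, v): w for u in nbrs for v, w in graph.get(u, {}).items()
                     if v in nbrs and u < v}
        for i, cluster in enumerate(hard_cluster(set(nbrs), ego_edges)):
            senses[(word, i)] = set(cluster)

    # 2. Disambiguation: link each word sense to the neighbour sense whose
    #    context overlaps most with its own context.
    sense_edges = {}
    for (word, i), context in senses.items():
        for nbr in context:
            candidates = [s for s in senses if s[0] == nbr]
            if candidates:
                best = max(candidates, key=lambda s: len(senses[s] & (context | {word})))
                sense_edges[((word, i), best)] = graph[word].get(nbr, 1.0)

    # 3. Global hard clustering of the sense graph; dropping sense ids yields synsets.
    clusters = hard_cluster(set(senses), sense_edges)
    return [{w for (w, _) in cluster} for cluster in clusters]
```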
PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications
Title | PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications |
Authors | Tim Salimans, Andrej Karpathy, Xi Chen, Diederik P. Kingma |
Abstract | PixelCNNs are a recently proposed class of powerful generative models with tractable likelihood. Here we discuss our implementation of PixelCNNs which we make available at https://github.com/openai/pixel-cnn. Our implementation contains a number of modifications to the original model that both simplify its structure and improve its performance. 1) We use a discretized logistic mixture likelihood on the pixels, rather than a 256-way softmax, which we find to speed up training. 2) We condition on whole pixels, rather than R/G/B sub-pixels, simplifying the model structure. 3) We use downsampling to efficiently capture structure at multiple resolutions. 4) We introduce additional short-cut connections to further speed up optimization. 5) We regularize the model using dropout. Finally, we present state-of-the-art log likelihood results on CIFAR-10 to demonstrate the usefulness of these modifications. |
Tasks | Image Generation |
Published | 2017-01-19 |
URL | http://arxiv.org/abs/1701.05517v1 |
PDF | http://arxiv.org/pdf/1701.05517v1.pdf |
PWC | https://paperswithcode.com/paper/pixelcnn-improving-the-pixelcnn-with |
Repo | https://github.com/openai/pixel-cnn |
Framework | tf |
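A PyTorch sketch of the discretized logistic mixture likelihood for a single channel (the full model also conditions the G and B means on previously generated sub-pixels, which is omitted here). Pixel values are assumed to be scaled to [-1, 1] with 256 levels, so the half-bin width is 1/255.

```python
import torch
import torch.nn.functional as F

def discretized_logistic_mixture_nll(x, logit_probs, means, log_scales, half_bin=1.0 / 255):
    """Negative log-likelihood of x under a K-component mixture of discretized logistics.

    x:           (...,) pixel values in [-1, 1]
    logit_probs: (..., K) unnormalized mixture weights
    means:       (..., K) component means
    log_scales:  (..., K) log of component scales
    """
    x = x.unsqueeze(-1)
    inv_s = torch.exp(-log_scales)
    cdf_plus = torch.sigmoid(inv_s * (x + half_bin - means))    # CDF at upper bin edge
    cdf_minus = torch.sigmoid(inv_s * (x - half_bin - means))   # CDF at lower bin edge
    log_p = torch.log(torch.clamp(cdf_plus - cdf_minus, min=1e-12))
    # Edge bins at -1 and +1 integrate the full tails of the logistic.
    log_p = torch.where(x < -0.999, torch.log(torch.clamp(cdf_plus, min=1e-12)), log_p)
    log_p = torch.where(x > 0.999, torch.log(torch.clamp(1.0 - cdf_minus, min=1e-12)), log_p)
    return -torch.logsumexp(F.log_softmax(logit_probs, dim=-1) + log_p, dim=-1)
```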
Learning Deep CNN Denoiser Prior for Image Restoration
Title | Learning Deep CNN Denoiser Prior for Image Restoration |
Authors | Kai Zhang, Wangmeng Zuo, Shuhang Gu, Lei Zhang |
Abstract | Model-based optimization methods and discriminative learning methods have been the two dominant strategies for solving various inverse problems in low-level vision. Typically, those two kinds of methods have their respective merits and drawbacks, e.g., model-based optimization methods are flexible for handling different inverse problems but are usually time-consuming when sophisticated priors are needed for good performance; meanwhile, discriminative learning methods have fast testing speed but their application range is greatly restricted by the specialized task. Recent works have revealed that, with the aid of variable splitting techniques, a denoiser prior can be plugged in as a modular part of model-based optimization methods to solve other inverse problems (e.g., deblurring). Such an integration offers a considerable advantage when the denoiser is obtained via discriminative learning. However, the study of integrating a fast discriminative denoiser prior is still lacking. To this end, this paper aims to train a set of fast and effective CNN (convolutional neural network) denoisers and integrate them into model-based optimization methods to solve other inverse problems. Experimental results demonstrate that the learned set of denoisers not only achieves promising Gaussian denoising results but can also be used as a prior to deliver good performance for various low-level vision applications. |
Tasks | Deblurring, Denoising, Image Denoising, Image Restoration |
Published | 2017-04-11 |
URL | http://arxiv.org/abs/1704.03264v1 |
PDF | http://arxiv.org/pdf/1704.03264v1.pdf |
PWC | https://paperswithcode.com/paper/learning-deep-cnn-denoiser-prior-for-image |
Repo | https://github.com/cszn/ircnn |
Framework | none |
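The "plug the denoiser into a model-based method" idea can be summarized with a generic half-quadratic splitting iteration (the paper's exact parameter and noise-level schedule is not reproduced here): the quadratic data subproblem keeps the observation model $y = Hx + n$ explicit, while the learned CNN denoiser plays the role of the prior.

$$
\begin{aligned}
x_{k+1} &= \arg\min_x \; \tfrac{1}{2}\lVert y - H x \rVert_2^2 + \tfrac{\mu_k}{2}\lVert x - z_k \rVert_2^2,\\
z_{k+1} &= \mathrm{Denoiser}_{\sigma_k}(x_{k+1}).
\end{aligned}
$$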
Dual Path Networks
Title | Dual Path Networks |
Authors | Yunpeng Chen, Jianan Li, Huaxin Xiao, Xiaojie Jin, Shuicheng Yan, Jiashi Feng |
Abstract | In this work, we present a simple, highly efficient and modularized Dual Path Network (DPN) for image classification which introduces a new topology of internal connection paths. By revealing the equivalence of the state-of-the-art Residual Network (ResNet) and Densely Connected Convolutional Network (DenseNet) within the HORNN framework, we find that ResNet enables feature re-usage while DenseNet enables new feature exploration, both of which are important for learning good representations. To enjoy the benefits from both path topologies, our proposed Dual Path Network shares common features while maintaining the flexibility to explore new features through dual path architectures. Extensive experiments on three benchmark datasets, ImageNet-1k, Places365 and PASCAL VOC, clearly demonstrate superior performance of the proposed DPN over the state of the art. In particular, on the ImageNet-1k dataset, a shallow DPN surpasses the best ResNeXt-101(64x4d) with 26% smaller model size, 25% less computational cost and 8% lower memory consumption, and a deeper DPN (DPN-131) further pushes the state-of-the-art single model performance with about 2 times faster training speed. Experiments on the Places365 large-scale scene dataset, PASCAL VOC detection dataset, and PASCAL VOC segmentation dataset also demonstrate its consistently better performance than DenseNet, ResNet and the latest ResNeXt model across various applications. |
Tasks | Image Classification |
Published | 2017-07-06 |
URL | http://arxiv.org/abs/1707.01629v2 |
PDF | http://arxiv.org/pdf/1707.01629v2.pdf |
PWC | https://paperswithcode.com/paper/dual-path-networks |
Repo | https://github.com/crowdAI/crowdai-musical-genre-recognition-starter-kit |
Framework | none |
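A hedged PyTorch sketch of one dual-path unit: part of the block's output is added to a residual path (ResNet-style feature re-use) and the rest is concatenated onto a dense path (DenseNet-style new-feature exploration). The bottleneck widths and the absence of grouped convolutions are simplifications relative to the published DPN blocks.

```python
import torch
import torch.nn as nn

class DualPathBlock(nn.Module):
    def __init__(self, in_channels, res_channels, dense_increment):
        # in_channels must equal res_channels + current width of the dense path.
        super().__init__()
        self.res_channels = res_channels
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, 64, 1, bias=False), nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1, bias=False), nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.Conv2d(64, res_channels + dense_increment, 1, bias=False),
        )

    def forward(self, res_path, dense_path):
        out = self.body(torch.cat([res_path, dense_path], dim=1))
        res_out = out[:, :self.res_channels]
        dense_out = out[:, self.res_channels:]
        # Addition keeps re-using features; concatenation keeps exploring new ones.
        return res_path + res_out, torch.cat([dense_path, dense_out], dim=1)

# block = DualPathBlock(96, 64, 32)
# r, d = block(torch.randn(1, 64, 8, 8), torch.randn(1, 32, 8, 8))  # dense path grows to 64 channels
```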
To prune, or not to prune: exploring the efficacy of pruning for model compression
Title | To prune, or not to prune: exploring the efficacy of pruning for model compression |
Authors | Michael Zhu, Suyog Gupta |
Abstract | Model pruning seeks to induce sparsity in a deep neural network’s various connection matrices, thereby reducing the number of nonzero-valued parameters in the model. Recent reports (Han et al., 2015; Narang et al., 2017) prune deep networks at the cost of only a marginal loss in accuracy and achieve a sizable reduction in model size. This hints at the possibility that the baseline models in these experiments are perhaps severely over-parameterized at the outset and a viable alternative for model compression might be to simply reduce the number of hidden units while maintaining the model’s dense connection structure, exposing a similar trade-off in model size and accuracy. We investigate these two distinct paths for model compression within the context of energy-efficient inference in resource-constrained environments and propose a new gradual pruning technique that is simple and straightforward to apply across a variety of models/datasets with minimal tuning and can be seamlessly incorporated within the training process. We compare the accuracy of large, but pruned models (large-sparse) and their smaller, but dense (small-dense) counterparts with identical memory footprint. Across a broad range of neural network architectures (deep CNNs, stacked LSTM, and seq2seq LSTM models), we find large-sparse models to consistently outperform small-dense models and achieve up to 10x reduction in number of non-zero parameters with minimal loss in accuracy. |
Tasks | Model Compression |
Published | 2017-10-05 |
URL | http://arxiv.org/abs/1710.01878v2 |
PDF | http://arxiv.org/pdf/1710.01878v2.pdf |
PWC | https://paperswithcode.com/paper/to-prune-or-not-to-prune-exploring-the |
Repo | https://github.com/dorlivne/simple_net_pruning |
Framework | tf |
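The gradual pruning technique is driven by a cubic sparsity ramp; the schedule below follows the formula given in the paper, while the magnitude-pruning helper is a simplified numpy illustration of how the schedule might be applied per layer.

```python
import numpy as np

def target_sparsity(step, s_init=0.0, s_final=0.9, begin_step=0, end_step=10000):
    """Cubic sparsity ramp from Zhu & Gupta: prune quickly at first, then slow down
    as the remaining weights become harder to remove without hurting accuracy."""
    if step < begin_step:
        return s_init
    if step >= end_step:
        return s_final
    progress = (step - begin_step) / (end_step - begin_step)
    return s_final + (s_init - s_final) * (1.0 - progress) ** 3

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude entries until `sparsity` of them are zero."""
    k = int(round(sparsity * weights.size))
    if k == 0:
        return weights
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

# At each pruning interval during training:
# w = magnitude_prune(w, target_sparsity(global_step))
```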
DeepGO: Predicting protein functions from sequence and interactions using a deep ontology-aware classifier
Title | DeepGO: Predicting protein functions from sequence and interactions using a deep ontology-aware classifier |
Authors | Maxat Kulmanov, Mohammed Asif Khan, Robert Hoehndorf |
Abstract | A large number of protein sequences are becoming available through the application of novel high-throughput sequencing technologies. Experimental functional characterization of these proteins is time-consuming and expensive, and is often only done rigorously for a few selected model organisms. Computational function prediction approaches have been suggested to fill this gap. The functions of proteins are classified using the Gene Ontology (GO), which contains over 40,000 classes. Additionally, proteins have multiple functions, making function prediction a large-scale, multi-class, multi-label problem. We have developed a novel method to predict protein function from sequence. We use deep learning to learn features from protein sequences as well as a cross-species protein-protein interaction network. Our approach specifically outputs information in the structure of the GO and utilizes the dependencies between GO classes as background information to construct a deep learning model. We evaluate our method using the standards established by the Computational Assessment of Function Annotation (CAFA) and demonstrate a significant improvement over baseline methods such as BLAST, in particular for predicting cellular locations. |
Tasks | |
Published | 2017-05-15 |
URL | http://arxiv.org/abs/1705.05919v1 |
PDF | http://arxiv.org/pdf/1705.05919v1.pdf |
PWC | https://paperswithcode.com/paper/deepgo-predicting-protein-functions-from |
Repo | https://github.com/bio-ontology-research-group/deepgo |
Framework | none |
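The abstract's point about outputting "information in the structure of the GO" amounts to keeping predictions consistent with the ontology: a parent class should score at least as high as any of its descendants. The paper builds this consistency into the network itself; the snippet below is only a post-hoc illustration of the same rule applied to raw per-class scores.

```python
def propagate_go_scores(scores, children):
    """scores: GO id -> raw per-class probability; children: GO id -> list of child GO ids.
    Returns scores where every class is at least the maximum of its descendants."""
    memo = {}

    def best(term):
        if term not in memo:
            memo[term] = max([scores.get(term, 0.0)] +
                             [best(child) for child in children.get(term, [])])
        return memo[term]

    return {term: best(term) for term in scores}

# Example: if GO:A has child GO:B and scores are {"GO:A": 0.2, "GO:B": 0.7},
# the propagated score of GO:A becomes 0.7.
```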
Massively Multilingual Neural Grapheme-to-Phoneme Conversion
Title | Massively Multilingual Neural Grapheme-to-Phoneme Conversion |
Authors | Ben Peters, Jon Dehdari, Josef van Genabith |
Abstract | Grapheme-to-phoneme conversion (g2p) is necessary for text-to-speech and automatic speech recognition systems. Most g2p systems are monolingual: they require language-specific data or handcrafting of rules. Such systems are difficult to extend to low resource languages, for which data and handcrafted rules are not available. As an alternative, we present a neural sequence-to-sequence approach to g2p which is trained on spelling–pronunciation pairs in hundreds of languages. The system shares a single encoder and decoder across all languages, allowing it to utilize the intrinsic similarities between different writing systems. We show an 11% improvement in phoneme error rate over an approach based on adapting high-resource monolingual g2p models to low-resource languages. Our model is also much more compact relative to previous approaches. |
Tasks | Speech Recognition |
Published | 2017-08-04 |
URL | http://arxiv.org/abs/1708.01464v1 |
PDF | http://arxiv.org/pdf/1708.01464v1.pdf |
PWC | https://paperswithcode.com/paper/massively-multilingual-neural-grapheme-to |
Repo | https://github.com/bpopeters/mg2p |
Framework | none |
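Because a single encoder and decoder are shared across all languages, the model needs to be told which language it is reading; one common way to do this (the paper's exact input format may differ) is to include language-identifying tokens in the grapheme sequence, as in this small Python sketch.

```python
def encode_g2p_source(word, lang_code):
    """Wrap the grapheme sequence in language tokens for a shared multilingual model."""
    return ["<" + lang_code + ">"] + list(word) + ["</" + lang_code + ">"]

print(encode_g2p_source("hello", "eng"))
# ['<eng>', 'h', 'e', 'l', 'l', 'o', '</eng>']
```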