July 29, 2019

Paper Group AWR 132

SearchQA: A New Q&A Dataset Augmented with Context from a Search Engine. DeepChess: End-to-End Deep Neural Network for Automatic Learning in Chess. Fast Information-theoretic Bayesian Optimisation. Massive Exploration of Neural Machine Translation Architectures. Jointly Attentive Spatial-Temporal Pooling Networks for Video-based Person Re-Identific …

SearchQA: A New Q&A Dataset Augmented with Context from a Search Engine

Title SearchQA: A New Q&A Dataset Augmented with Context from a Search Engine
Authors Matthew Dunn, Levent Sagun, Mike Higgins, V. Ugur Guney, Volkan Cirik, Kyunghyun Cho
Abstract We publicly release a new large-scale dataset, called SearchQA, for machine comprehension, or question-answering. Unlike recently released datasets, such as DeepMind CNN/DailyMail and SQuAD, the proposed SearchQA was constructed to reflect a full pipeline of general question-answering. That is, we start not from an existing article and generate a question-answer pair, but start from an existing question-answer pair, crawled from J! Archive, and augment it with text snippets retrieved by Google. Following this approach, we built SearchQA, which consists of more than 140k question-answer pairs with each pair having 49.6 snippets on average. Each question-answer-context tuple of the SearchQA comes with additional meta-data such as the snippet’s URL, which we believe will be valuable resources for future research. We conduct human evaluation as well as test two baseline methods, one simple word selection and the other deep learning based, on the SearchQA. We show that there is a meaningful gap between the human and machine performances. This suggests that the proposed dataset could well serve as a benchmark for question-answering.
Tasks Open-Domain Question Answering, Question Answering, Reading Comprehension
Published 2017-04-18
URL http://arxiv.org/abs/1704.05179v3
PDF http://arxiv.org/pdf/1704.05179v3.pdf
PWC https://paperswithcode.com/paper/searchqa-a-new-qa-dataset-augmented-with
Repo https://github.com/google/active-qa
Framework tf
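
The question-answer-context tuple described in the abstract can be pictured as a small record type. The following is a minimal sketch only: the class and field names (`Snippet`, `SearchQAExample`, `answer_in_context`) are assumptions made for illustration and do not reflect the released file format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Snippet:
    """One retrieved search-result snippet (field names are illustrative)."""
    text: str
    url: str  # the abstract mentions the snippet's URL as meta-data


@dataclass
class SearchQAExample:
    """A question-answer pair augmented with retrieved snippets."""
    question: str
    answer: str
    snippets: List[Snippet] = field(default_factory=list)


def answer_in_context(example: SearchQAExample) -> bool:
    """Return True if the answer string occurs in at least one snippet."""
    target = example.answer.lower()
    return any(target in s.text.lower() for s in example.snippets)
```

A check like `answer_in_context` is one simple way to see whether the retrieved context can support the answer at all; it is not one of the baselines evaluated in the paper.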

DeepChess: End-to-End Deep Neural Network for Automatic Learning in Chess

Title DeepChess: End-to-End Deep Neural Network for Automatic Learning in Chess
Authors Eli David, Nathan S. Netanyahu, Lior Wolf
Abstract We present an end-to-end learning method for chess, relying on deep neural networks. Without any a priori knowledge, in particular without any knowledge regarding the rules of chess, a deep neural network is trained using a combination of unsupervised pretraining and supervised training. The unsupervised training extracts high level features from a given position, and the supervised training learns to compare two chess positions and select the more favorable one. The training relies entirely on datasets of several million chess games, and no further domain specific knowledge is incorporated. The experiments show that the resulting neural network (referred to as DeepChess) is on a par with state-of-the-art chess playing programs, which have been developed through many years of manual feature selection and tuning. DeepChess is the first end-to-end machine learning-based method that results in a grandmaster-level chess playing performance.
Tasks Feature Selection, Game of Chess
Published 2017-11-27
URL http://arxiv.org/abs/1711.09667v1
PDF http://arxiv.org/pdf/1711.09667v1.pdf
PWC https://paperswithcode.com/paper/deepchess-end-to-end-deep-neural-network-for
Repo https://github.com/dangeng/DeepChess
Framework pytorch
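
The abstract describes a model that compares two positions and selects the more favorable one. Given any such pairwise comparator, the best of several candidate positions can be found with a linear scan. The sketch below assumes a hypothetical `prefers_first` callable standing in for the trained network; it does not reproduce the DeepChess search procedure.

```python
from typing import Callable, Sequence, TypeVar

Position = TypeVar("Position")


def pick_best(
    positions: Sequence[Position],
    prefers_first: Callable[[Position, Position], bool],
) -> Position:
    """Return the position that wins every pairwise comparison in a linear scan.

    `prefers_first(a, b)` is a hypothetical comparator that returns True
    when position `a` is judged more favorable than position `b`.
    """
    if not positions:
        raise ValueError("positions must not be empty")
    best = positions[0]
    for candidate in positions[1:]:
        if prefers_first(candidate, best):
            best = candidate
    return best
```

With a toy comparator over numeric scores, `pick_best([1, 5, 3], lambda a, b: a > b)` selects `5`. The scan assumes the comparator behaves consistently (transitively); a learned comparator gives no such guarantee.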

Fast Information-theoretic Bayesian Optimisation

Title Fast Information-theoretic Bayesian Optimisation
Authors Binxin Ru, Mark McLeod, Diego Granziol, Michael A. Osborne
Abstract Information-theoretic Bayesian optimisation techniques have demonstrated state-of-the-art performance in tackling important global optimisation problems. However, current information-theoretic approaches require many approximations in implementation, introduce often-prohibitive computational overhead and limit the choice of kernels available to model the objective. We develop a fast information-theoretic Bayesian Optimisation method, FITBO, that avoids the need for sampling the global minimiser, thus significantly reducing computational overhead. Moreover, in comparison with existing approaches, our method faces fewer constraints on kernel choice and enjoys the merits of dealing with the output space. We demonstrate empirically that FITBO inherits the performance associated with information-theoretic Bayesian optimisation, while being even faster than simpler Bayesian optimisation approaches, such as Expected Improvement.
Tasks Bayesian Optimisation
Published 2017-11-02
URL http://arxiv.org/abs/1711.00673v5
PDF http://arxiv.org/pdf/1711.00673v5.pdf
PWC https://paperswithcode.com/paper/fast-information-theoretic-bayesian
Repo https://github.com/rubinxin/FITBO
Framework none
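
The abstract names Expected Improvement (EI) as a simpler baseline. For reference, the standard closed-form EI for minimisation under a Gaussian posterior can be written in a few lines. This is a sketch of the baseline only, not of FITBO; the function and parameter names are chosen here for illustration.

```python
import math


def expected_improvement(mu: float, sigma: float, best: float) -> float:
    """Closed-form EI for minimisation at a point with posterior N(mu, sigma^2).

    `best` is the lowest objective value observed so far.
    """
    if sigma <= 0.0:
        # No posterior uncertainty: improvement is deterministic.
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    return (best - mu) * cdf + sigma * pdf
```

The first term rewards points whose predicted mean is below the incumbent, and the second rewards points with high posterior uncertainty.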

Massive Exploration of Neural Machine Translation Architectures

Title Massive Exploration of Neural Machine Translation Architectures
Authors Denny Britz, Anna Goldie, Minh-Thang Luong, Quoc Le
Abstract Neural Machine Translation (NMT) has shown remarkable progress over the past few years with production systems now being deployed to end-users. One major drawback of current architectures is that they are expensive to train, typically requiring days to weeks of GPU time to converge. This makes exhaustive hyperparameter search, as is commonly done with other neural network architectures, prohibitively expensive. In this work, we present the first large-scale analysis of NMT architecture hyperparameters. We report empirical results and variance numbers for several hundred experimental runs, corresponding to over 250,000 GPU hours on the standard WMT English to German translation task. Our experiments lead to novel insights and practical advice for building and extending NMT architectures. As part of this contribution, we release an open-source NMT framework that enables researchers to easily experiment with novel techniques and reproduce state of the art results.
Tasks Machine Translation
Published 2017-03-11
URL http://arxiv.org/abs/1703.03906v2
PDF http://arxiv.org/pdf/1703.03906v2.pdf
PWC https://paperswithcode.com/paper/massive-exploration-of-neural-machine
Repo https://github.com/simonjisu/NMT
Framework pytorch

Jointly Attentive Spatial-Temporal Pooling Networks for Video-based Person Re-Identification

Title Jointly Attentive Spatial-Temporal Pooling Networks for Video-based Person Re-Identification
Authors Shuangjie Xu, Yu Cheng, Kang Gu, Yang Yang, Shiyu Chang, Pan Zhou
Abstract Person Re-Identification (person re-id) is a crucial task because of its applications in visual surveillance and human-computer interaction. In this work, we present a novel joint Spatial and Temporal Attention Pooling Network (ASTPN) for video-based person re-identification, which enables the feature extractor to be aware of the current input video sequences, in a way that interdependency from the matching items can directly influence the computation of each other’s representation. Specifically, the spatial pooling layer is able to select regions from each frame, while the attention temporal pooling can select informative frames over the sequence, with both pooling steps guided by the information from distance matching. Experiments are conducted on the iLIDS-VID, PRID-2011 and MARS datasets, and the results demonstrate that this approach outperforms existing state-of-the-art methods. We also analyze how the joint pooling in both dimensions can boost the person re-id performance more effectively than using either of them separately.
Tasks Person Re-Identification, Video-Based Person Re-Identification
Published 2017-08-03
URL http://arxiv.org/abs/1708.02286v2
PDF http://arxiv.org/pdf/1708.02286v2.pdf
PWC https://paperswithcode.com/paper/jointly-attentive-spatial-temporal-pooling
Repo https://github.com/shuangjiexu/Spatial-Temporal-Pooling-Networks-ReID
Framework torch
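
Selecting informative frames over a sequence, as the abstract describes, is commonly realised as a softmax-weighted average of per-frame features. The sketch below shows that generic pooling step only. It takes the per-frame scores as given, whereas in the paper they are derived from matching the two sequences; the function names are illustrative.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    frames: Sequence[Sequence[float]], scores: Sequence[float]
) -> List[float]:
    """Pool per-frame feature vectors into one vector using attention weights."""
    if not frames or len(frames) != len(scores):
        raise ValueError("need one score per frame and at least one frame")
    weights = softmax(scores)
    dim = len(frames[0])
    return [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
```

With equal scores the pooling reduces to a plain mean over frames; a much larger score for one frame makes the pooled vector approach that frame's features.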

struc2vec: Learning Node Representations from Structural Identity

Title struc2vec: Learning Node Representations from Structural Identity
Authors Leonardo F. R. Ribeiro, Pedro H. P. Saverese, Daniel R. Figueiredo
Abstract Implementation and experiments of graph embedding algorithms: DeepWalk, LINE (Large-scale Information Network Embedding), node2vec, SDNE (Structural Deep Network Embedding), struc2vec.
Tasks Graph Embedding, Network Embedding, Node Classification
Published 2017-04-11
URL http://arxiv.org/abs/1704.03165v3
PDF http://arxiv.org/pdf/1704.03165v3.pdf
PWC https://paperswithcode.com/paper/struc2vec-learning-node-representations-from
Repo https://github.com/leoribeiro/struc2vec
Framework none

The Role of Conversation Context for Sarcasm Detection in Online Interactions

Title The Role of Conversation Context for Sarcasm Detection in Online Interactions
Authors Debanjan Ghosh, Alexander Richard Fabbri, Smaranda Muresan
Abstract Computational models for sarcasm detection have often relied on the content of utterances in isolation. However, a speaker’s sarcastic intent is not always obvious without additional context. Focusing on social media discussions, we investigate two issues: (1) does modeling of conversation context help in sarcasm detection and (2) can we understand what part of conversation context triggered the sarcastic reply. To address the first issue, we investigate several types of Long Short-Term Memory (LSTM) networks that can model both the conversation context and the sarcastic response. We show that the conditional LSTM network (Rocktaschel et al., 2015) and LSTM networks with sentence level attention on context and response outperform the LSTM model that reads only the response. To address the second issue, we present a qualitative analysis of attention weights produced by the LSTM models with attention and discuss the results compared with human performance on the task.
Tasks Sarcasm Detection
Published 2017-07-19
URL http://arxiv.org/abs/1707.06226v1
PDF http://arxiv.org/pdf/1707.06226v1.pdf
PWC https://paperswithcode.com/paper/the-role-of-conversation-context-for-sarcasm
Repo https://github.com/Alex-Fabbri/deep_learning_nlp_sarcasm
Framework none

Graph Classification with 2D Convolutional Neural Networks

Title Graph Classification with 2D Convolutional Neural Networks
Authors Antoine Jean-Pierre Tixier, Giannis Nikolentzos, Polykarpos Meladianos, Michalis Vazirgiannis
Abstract Graph learning is currently dominated by graph kernels, which, while powerful, suffer some significant limitations. Convolutional Neural Networks (CNNs) offer a very appealing alternative, but processing graphs with CNNs is not trivial. To address this challenge, many sophisticated extensions of CNNs have recently been introduced. In this paper, we reverse the problem: rather than proposing yet another graph CNN model, we introduce a novel way to represent graphs as multi-channel image-like structures that allows them to be handled by vanilla 2D CNNs. Experiments reveal that our method is more accurate than state-of-the-art graph kernels and graph CNNs on 4 out of 6 real-world datasets (with and without continuous node attributes), and close elsewhere. Our approach is also preferable to graph kernels in terms of time complexity. Code and data are publicly available.
Tasks Graph Classification
Published 2017-07-29
URL https://arxiv.org/abs/1708.02218v4
PDF https://arxiv.org/pdf/1708.02218v4.pdf
PWC https://paperswithcode.com/paper/graph-classification-with-2d-convolutional
Repo https://github.com/Tixierae/graph_2D_CNN
Framework tf
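
One way to turn a graph into an image-like structure is to embed its nodes in a low-dimensional space and then bin the coordinates into a 2D histogram, giving one channel per pair of embedding dimensions. The sketch below builds a single channel and assumes the node coordinates are already available; the embedding step, the function name and the parameters are illustrative assumptions, not the authors' code.

```python
from typing import List, Sequence, Tuple


def histogram_channel(
    points: Sequence[Tuple[float, float]], bins: int, lo: float, hi: float
) -> List[List[int]]:
    """Count 2D node coordinates into a bins x bins grid over [lo, hi]^2.

    Points outside the range are clamped into the border cells.
    """
    if bins <= 0 or hi <= lo:
        raise ValueError("need bins > 0 and hi > lo")
    grid = [[0] * bins for _ in range(bins)]
    width = (hi - lo) / bins
    for x, y in points:
        col = max(0, min(int((x - lo) / width), bins - 1))
        row = max(0, min(int((y - lo) / width), bins - 1))
        grid[row][col] += 1
    return grid
```

Stacking several such grids, one per pair of embedding dimensions, yields a multi-channel array that a standard 2D CNN can consume.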

MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks on Corrupted Labels

Title MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks on Corrupted Labels
Authors Lu Jiang, Zhengyuan Zhou, Thomas Leung, Li-Jia Li, Li Fei-Fei
Abstract Recent deep networks are capable of memorizing the entire data even when the labels are completely random. To overcome the overfitting on corrupted labels, we propose a novel technique of learning another neural network, called MentorNet, to supervise the training of the base deep networks, namely, StudentNet. During training, MentorNet provides a curriculum (sample weighting scheme) for StudentNet to focus on the samples whose labels are probably correct. Unlike the existing curriculum that is usually predefined by human experts, MentorNet learns a data-driven curriculum dynamically with StudentNet. Experimental results demonstrate that our approach can significantly improve the generalization performance of deep networks trained on corrupted training data. Notably, to the best of our knowledge, we achieve the best-published result on WebVision, a large benchmark containing 2.2 million images of real-world noisy labels. The code is at https://github.com/google/mentornet
Tasks
Published 2017-12-14
URL http://arxiv.org/abs/1712.05055v2
PDF http://arxiv.org/pdf/1712.05055v2.pdf
PWC https://paperswithcode.com/paper/mentornet-learning-data-driven-curriculum-for
Repo https://github.com/google/mentornet
Framework tf
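
A curriculum in the sense of the abstract is a per-sample weighting of the training loss. The sketch below shows a predefined, hand-written rule of the kind the paper contrasts with: samples whose loss falls below a threshold get weight 1, all others weight 0. MentorNet learns the weights with a network instead; the function names and the normalisation by the weight sum are assumptions made here for illustration.

```python
from typing import List, Sequence


def self_paced_weights(losses: Sequence[float], threshold: float) -> List[float]:
    """Predefined curriculum: keep only samples with loss below the threshold."""
    return [1.0 if loss < threshold else 0.0 for loss in losses]


def weighted_loss(losses: Sequence[float], weights: Sequence[float]) -> float:
    """Average the per-sample losses using the curriculum weights."""
    if len(losses) != len(weights):
        raise ValueError("losses and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0.0:
        return 0.0  # every sample was filtered out
    return sum(l * w for l, w in zip(losses, weights)) / total_weight
```

With losses `[0.1, 0.3, 5.0]` and a threshold of `1.0`, the high-loss sample, which is more likely to carry a corrupted label, is excluded from the averaged loss.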

Deep Neural Networks for Physics Analysis on low-level whole-detector data at the LHC

Title Deep Neural Networks for Physics Analysis on low-level whole-detector data at the LHC
Authors Wahid Bhimji, Steven Andrew Farrell, Thorsten Kurth, Michela Paganini, Prabhat, Evan Racah
Abstract There has been considerable recent activity applying deep convolutional neural nets (CNNs) to data from particle physics experiments. Current approaches on ATLAS/CMS have largely focussed on a subset of the calorimeter, and for identifying objects or particular particle types. We explore approaches that use the entire calorimeter, combined with track information, for directly conducting physics analyses: i.e. classifying events as known-physics background or new-physics signals. We use an existing RPV-Supersymmetry analysis as a case study and explore CNNs on multi-channel, high-resolution sparse images: applied on GPU and multi-node CPU architectures (including Knights Landing (KNL) Xeon Phi nodes) on the Cori supercomputer at NERSC.
Tasks
Published 2017-11-09
URL http://arxiv.org/abs/1711.03573v2
PDF http://arxiv.org/pdf/1711.03573v2.pdf
PWC https://paperswithcode.com/paper/deep-neural-networks-for-physics-analysis-on
Repo https://github.com/vmos1/atlas_cnn
Framework tf

Enhanced Deep Residual Networks for Single Image Super-Resolution

Title Enhanced Deep Residual Networks for Single Image Super-Resolution
Authors Bee Lim, Sanghyun Son, Heewon Kim, Seungjun Nah, Kyoung Mu Lee
Abstract Recent research on super-resolution has progressed with the development of deep convolutional neural networks (DCNN). In particular, residual learning techniques exhibit improved performance. In this paper, we develop an enhanced deep super-resolution network (EDSR) with performance exceeding that of current state-of-the-art SR methods. The significant performance improvement of our model is due to optimization by removing unnecessary modules in conventional residual networks. The performance is further improved by expanding the model size while we stabilize the training procedure. We also propose a new multi-scale deep super-resolution system (MDSR) and training method, which can reconstruct high-resolution images of different upscaling factors in a single model. The proposed methods show superior performance over the state-of-the-art methods on benchmark datasets and prove their excellence by winning the NTIRE2017 Super-Resolution Challenge.
Tasks Image Super-Resolution, Super-Resolution
Published 2017-07-10
URL http://arxiv.org/abs/1707.02921v1
PDF http://arxiv.org/pdf/1707.02921v1.pdf
PWC https://paperswithcode.com/paper/enhanced-deep-residual-networks-for-single
Repo https://github.com/SimoneDutto/EDSR
Framework pytorch

Natural Language Inference over Interaction Space

Title Natural Language Inference over Interaction Space
Authors Yichen Gong, Heng Luo, Jian Zhang
Abstract The Natural Language Inference (NLI) task requires an agent to determine the logical relationship between a natural language premise and a natural language hypothesis. We introduce the Interactive Inference Network (IIN), a novel class of neural network architectures that is able to achieve high-level understanding of the sentence pair by hierarchically extracting semantic features from interaction space. We show that an interaction tensor (attention weight) contains semantic information to solve natural language inference, and a denser interaction tensor contains richer semantic information. One instance of such an architecture, the Densely Interactive Inference Network (DIIN), demonstrates state-of-the-art performance on large-scale NLI corpora and a large-scale NLI-like corpus. It is noteworthy that DIIN achieves a greater than 20% error reduction on the challenging Multi-Genre NLI (MultiNLI) dataset with respect to the strongest published system.
Tasks Natural Language Inference, Paraphrase Identification
Published 2017-09-13
URL http://arxiv.org/abs/1709.04348v2
PDF http://arxiv.org/pdf/1709.04348v2.pdf
PWC https://paperswithcode.com/paper/natural-language-inference-over-interaction-1
Repo https://github.com/YerevaNN/DIIN-in-Keras
Framework none
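
An interaction tensor pairs every premise word with every hypothesis word. A minimal sketch, assuming each word is already encoded as a vector and that the interaction is the element-wise product of the two vectors (one possible choice, taken here as an assumption), looks as follows. The function name is illustrative.

```python
from typing import List, Sequence


def interaction_tensor(
    premise: Sequence[Sequence[float]], hypothesis: Sequence[Sequence[float]]
) -> List[List[List[float]]]:
    """Build a tensor of shape (len(premise), len(hypothesis), dim).

    Entry [i][j] is the element-wise product of premise word i and
    hypothesis word j, so each word pair keeps a full feature vector
    rather than a single scalar attention score.
    """
    return [
        [[p * h for p, h in zip(p_vec, h_vec)] for h_vec in hypothesis]
        for p_vec in premise
    ]
```

Keeping a vector per word pair, instead of collapsing it to one number, is what makes the tensor "denser"; a convolutional feature extractor can then treat the last axis as channels.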

Learning Credible Models

Title Learning Credible Models
Authors Jiaxuan Wang, Jeeheh Oh, Haozhu Wang, Jenna Wiens
Abstract In many settings, it is important that a model be capable of providing reasons for its predictions (i.e., the model must be interpretable). However, the model’s reasoning may not conform with well-established knowledge. In such cases, while interpretable, the model lacks credibility. In this work, we formally define credibility in the linear setting and focus on techniques for learning models that are both accurate and credible. In particular, we propose a regularization penalty, expert yielded estimates (EYE), that incorporates expert knowledge about well-known relationships among covariates and the outcome of interest. We give both theoretical and empirical results comparing our proposed method to several other regularization techniques. Across a range of settings, experiments on both synthetic and real data show that models learned using the EYE penalty are significantly more credible than those learned using other penalties. Applied to a large-scale patient risk stratification task, our proposed technique results in a model whose top features overlap significantly with known clinical risk factors, while still achieving good predictive performance.
Tasks
Published 2017-11-08
URL http://arxiv.org/abs/1711.03190v3
PDF http://arxiv.org/pdf/1711.03190v3.pdf
PWC https://paperswithcode.com/paper/learning-credible-models
Repo https://github.com/nathanwang000/credible_learning
Framework pytorch

A Tidy Data Model for Natural Language Processing using cleanNLP

Title A Tidy Data Model for Natural Language Processing using cleanNLP
Authors Taylor Arnold
Abstract The package cleanNLP provides a set of fast tools for converting a textual corpus into a set of normalized tables. The underlying natural language processing pipeline utilizes Stanford’s CoreNLP library, exposing a number of annotation tasks for text written in English, French, German, and Spanish. Annotators include tokenization, part of speech tagging, named entity recognition, entity linking, sentiment analysis, dependency parsing, coreference resolution, and information extraction.
Tasks Coreference Resolution, Dependency Parsing, Entity Linking, Named Entity Recognition, Part-Of-Speech Tagging, Sentiment Analysis, Tokenization
Published 2017-03-27
URL http://arxiv.org/abs/1703.09570v2
PDF http://arxiv.org/pdf/1703.09570v2.pdf
PWC https://paperswithcode.com/paper/a-tidy-data-model-for-natural-language
Repo https://github.com/statsmaths/cleanNLP
Framework none

MojiTalk: Generating Emotional Responses at Scale

Title MojiTalk: Generating Emotional Responses at Scale
Authors Xianda Zhou, William Yang Wang
Abstract Generating emotional language is a key step towards building empathetic natural language processing agents. However, a major challenge for this line of research is the lack of large-scale labeled training data, and previous studies are limited to only small sets of human annotated sentiment labels. Additionally, explicitly controlling the emotion and sentiment of generated text is also difficult. In this paper, we take a more radical approach: we exploit the idea of leveraging Twitter data that are naturally labeled with emojis. More specifically, we collect a large corpus of Twitter conversations that include emojis in the response, and assume the emojis convey the underlying emotions of the sentence. We then introduce a reinforced conditional variational encoder approach to train a deep generative model on these conversations, which allows us to use emojis to control the emotion of the generated text. Experimentally, we show in our quantitative and qualitative analyses that the proposed models can successfully generate high-quality abstractive conversation responses in accordance with designated emotions.
Tasks
Published 2017-11-11
URL http://arxiv.org/abs/1711.04090v2
PDF http://arxiv.org/pdf/1711.04090v2.pdf
PWC https://paperswithcode.com/paper/mojitalk-generating-emotional-responses-at
Repo https://github.com/Claude-Zhou/MojiTalk
Framework tf