Paper Group AWR 132
SearchQA: A New Q&A Dataset Augmented with Context from a Search Engine. DeepChess: End-to-End Deep Neural Network for Automatic Learning in Chess. Fast Information-theoretic Bayesian Optimisation. Massive Exploration of Neural Machine Translation Architectures. Jointly Attentive Spatial-Temporal Pooling Networks for Video-based Person Re-Identific …
SearchQA: A New Q&A Dataset Augmented with Context from a Search Engine
Title | SearchQA: A New Q&A Dataset Augmented with Context from a Search Engine |
Authors | Matthew Dunn, Levent Sagun, Mike Higgins, V. Ugur Guney, Volkan Cirik, Kyunghyun Cho |
Abstract | We publicly release a new large-scale dataset, called SearchQA, for machine comprehension, or question-answering. Unlike recently released datasets, such as DeepMind CNN/DailyMail and SQuAD, the proposed SearchQA was constructed to reflect a full pipeline of general question-answering. That is, we do not start from an existing article and generate a question-answer pair; instead, we start from an existing question-answer pair, crawled from J! Archive, and augment it with text snippets retrieved by Google. Following this approach, we built SearchQA, which consists of more than 140k question-answer pairs, each with 49.6 snippets on average. Each question-answer-context tuple of SearchQA comes with additional meta-data, such as the snippet’s URL, which we believe will be a valuable resource for future research. We conduct human evaluation as well as test two baseline methods, one based on simple word selection and the other on deep learning, on SearchQA. We show that there is a meaningful gap between human and machine performance. This suggests that the proposed dataset could well serve as a benchmark for question-answering. |
Tasks | Open-Domain Question Answering, Question Answering, Reading Comprehension |
Published | 2017-04-18 |
URL | http://arxiv.org/abs/1704.05179v3 |
PDF | http://arxiv.org/pdf/1704.05179v3.pdf |
PWC | https://paperswithcode.com/paper/searchqa-a-new-qa-dataset-augmented-with |
Repo | https://github.com/google/active-qa |
Framework | tf |
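The abstract mentions a simple word-selection baseline. The sketch below is a naive stand-in for that idea, not the paper's exact method: it predicts the snippet unigram that occurs most often across the retrieved context but never in the question.

```python
# Hypothetical word-selection baseline for SearchQA (illustration only, not
# the paper's exact baseline): answer with the snippet unigram that is most
# frequent across retrieved snippets but absent from the question.
from collections import Counter

def word_selection_answer(question: str, snippets: list[str]) -> str:
    question_words = set(question.lower().split())
    counts = Counter(
        w for s in snippets for w in s.lower().split()
        if w not in question_words
    )
    return counts.most_common(1)[0][0] if counts else ""

# The prediction is the context word most often repeated outside the question.
print(word_selection_answer(
    "This physicist proposed special relativity in 1905",
    ["albert einstein proposed special relativity", "einstein 1905 paper"],
))  # -> "einstein"
```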
DeepChess: End-to-End Deep Neural Network for Automatic Learning in Chess
Title | DeepChess: End-to-End Deep Neural Network for Automatic Learning in Chess |
Authors | Eli David, Nathan S. Netanyahu, Lior Wolf |
Abstract | We present an end-to-end learning method for chess, relying on deep neural networks. Without any a priori knowledge, in particular without any knowledge regarding the rules of chess, a deep neural network is trained using a combination of unsupervised pretraining and supervised training. The unsupervised training extracts high level features from a given position, and the supervised training learns to compare two chess positions and select the more favorable one. The training relies entirely on datasets of several million chess games, and no further domain specific knowledge is incorporated. The experiments show that the resulting neural network (referred to as DeepChess) is on a par with state-of-the-art chess playing programs, which have been developed through many years of manual feature selection and tuning. DeepChess is the first end-to-end machine learning-based method that results in a grandmaster-level chess playing performance. |
Tasks | Feature Selection, Game of Chess |
Published | 2017-11-27 |
URL | http://arxiv.org/abs/1711.09667v1 |
PDF | http://arxiv.org/pdf/1711.09667v1.pdf |
PWC | https://paperswithcode.com/paper/deepchess-end-to-end-deep-neural-network-for |
Repo | https://github.com/dangeng/DeepChess |
Framework | pytorch |
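As a rough illustration of the two-stage architecture the abstract describes, here is a minimal PyTorch sketch: a Pos2Vec-style feature extractor (pretrained in the paper as a stacked autoencoder, which is omitted here) feeding a siamese head that compares two positions. Layer sizes follow the dimensions reported in the paper; the bitboard input encoding and both training loops are assumptions left out.

```python
# A minimal sketch of the DeepChess-style comparison network.
import torch
import torch.nn as nn

class Pos2Vec(nn.Module):
    """Feature extractor; the paper pretrains it as a stacked autoencoder."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(773, 600), nn.ReLU(),
            nn.Linear(600, 400), nn.ReLU(),
            nn.Linear(400, 200), nn.ReLU(),
            nn.Linear(200, 100), nn.ReLU(),
        )
    def forward(self, x):
        return self.encoder(x)

class DeepChess(nn.Module):
    """Siamese head: compares two positions and predicts which is better."""
    def __init__(self):
        super().__init__()
        self.pos2vec = Pos2Vec()
        self.head = nn.Sequential(
            nn.Linear(200, 400), nn.ReLU(),
            nn.Linear(400, 200), nn.ReLU(),
            nn.Linear(200, 100), nn.ReLU(),
            nn.Linear(100, 2),  # logits: (left better, right better)
        )
    def forward(self, left, right):
        feats = torch.cat([self.pos2vec(left), self.pos2vec(right)], dim=1)
        return self.head(feats)

model = DeepChess()
logits = model(torch.rand(8, 773), torch.rand(8, 773))  # batch of position pairs
```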
Fast Information-theoretic Bayesian Optimisation
Title | Fast Information-theoretic Bayesian Optimisation |
Authors | Binxin Ru, Mark McLeod, Diego Granziol, Michael A. Osborne |
Abstract | Information-theoretic Bayesian optimisation techniques have demonstrated state-of-the-art performance in tackling important global optimisation problems. However, current information-theoretic approaches require many approximations in implementation, introduce often-prohibitive computational overhead and limit the choice of kernels available to model the objective. We develop a fast information-theoretic Bayesian Optimisation method, FITBO, that avoids the need for sampling the global minimiser, thus significantly reducing computational overhead. Moreover, in comparison with existing approaches, our method faces fewer constraints on kernel choice and enjoys the merits of dealing with the output space. We demonstrate empirically that FITBO inherits the performance associated with information-theoretic Bayesian optimisation, while being even faster than simpler Bayesian optimisation approaches, such as Expected Improvement. |
Tasks | Bayesian Optimisation |
Published | 2017-11-02 |
URL | http://arxiv.org/abs/1711.00673v5 |
PDF | http://arxiv.org/pdf/1711.00673v5.pdf |
PWC | https://paperswithcode.com/paper/fast-information-theoretic-bayesian |
Repo | https://github.com/rubinxin/FITBO |
Framework | none |
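FITBO's derivation does not reduce to a short snippet, but the abstract benchmarks it against Expected Improvement. For reference, here is a minimal EI acquisition for a Gaussian-process posterior, using the minimisation convention and an illustrative exploration parameter `xi` (an assumption, not part of the paper).

```python
# Expected Improvement (EI), the simpler baseline named in the abstract,
# evaluated from a GP posterior mean/std at candidate points.
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, best_y, xi=0.01):
    """EI at points with posterior mean `mu` and std `sigma`,
    given the incumbent (best observed) value `best_y`."""
    sigma = np.maximum(sigma, 1e-12)  # guard against division by zero
    z = (best_y - mu - xi) / sigma
    return (best_y - mu - xi) * norm.cdf(z) + sigma * norm.pdf(z)

mu = np.array([0.2, -0.1, 0.05])
sigma = np.array([0.3, 0.2, 0.5])
print(expected_improvement(mu, sigma, best_y=0.0).argmax())  # next query index
```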
Massive Exploration of Neural Machine Translation Architectures
Title | Massive Exploration of Neural Machine Translation Architectures |
Authors | Denny Britz, Anna Goldie, Minh-Thang Luong, Quoc Le |
Abstract | Neural Machine Translation (NMT) has shown remarkable progress over the past few years with production systems now being deployed to end-users. One major drawback of current architectures is that they are expensive to train, typically requiring days to weeks of GPU time to converge. This makes exhaustive hyperparameter search, as is commonly done with other neural network architectures, prohibitively expensive. In this work, we present the first large-scale analysis of NMT architecture hyperparameters. We report empirical results and variance numbers for several hundred experimental runs, corresponding to over 250,000 GPU hours on the standard WMT English to German translation task. Our experiments lead to novel insights and practical advice for building and extending NMT architectures. As part of this contribution, we release an open-source NMT framework that enables researchers to easily experiment with novel techniques and reproduce state of the art results. |
Tasks | Machine Translation |
Published | 2017-03-11 |
URL | http://arxiv.org/abs/1703.03906v2 |
PDF | http://arxiv.org/pdf/1703.03906v2.pdf |
PWC | https://paperswithcode.com/paper/massive-exploration-of-neural-machine |
Repo | https://github.com/simonjisu/NMT |
Framework | pytorch |
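To make the scale of the study concrete, the toy loop below enumerates the kind of hyperparameter grid the paper sweeps (embedding size, cell type, depth, attention variant, beam width). The configuration keys are illustrative, not the released framework's actual options.

```python
# A toy version of the sweep the paper runs at scale: one (costly) training
# run per hyperparameter configuration.
from itertools import product

grid = {
    "embedding_dim": [128, 512],
    "rnn_cell":      ["lstm", "gru"],
    "encoder_depth": [2, 4],
    "attention":     ["additive", "multiplicative"],
    "beam_width":    [1, 5, 10],
}

for values in product(*grid.values()):
    config = dict(zip(grid.keys(), values))
    # train_and_evaluate(config)  # hypothetical per-run entry point
    print(config)
```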
Jointly Attentive Spatial-Temporal Pooling Networks for Video-based Person Re-Identification
Title | Jointly Attentive Spatial-Temporal Pooling Networks for Video-based Person Re-Identification |
Authors | Shuangjie Xu, Yu Cheng, Kang Gu, Yang Yang, Shiyu Chang, Pan Zhou |
Abstract | Person Re-Identification (person re-id) is a crucial task owing to its applications in visual surveillance and human-computer interaction. In this work, we present a novel joint Spatial and Temporal Attention Pooling Network (ASTPN) for video-based person re-identification, which makes the feature extractor aware of the current pair of input video sequences, so that the matching items can directly influence the computation of each other’s representation. Specifically, the spatial pooling layer selects regions from each frame, while the attentive temporal pooling selects informative frames over the sequence, with both pooling steps guided by information from distance matching. Experiments are conducted on the iLIDS-VID, PRID-2011 and MARS datasets, and the results demonstrate that this approach outperforms existing state-of-the-art methods. We also analyze how joint pooling across both dimensions boosts person re-id performance more effectively than using either of them separately. |
Tasks | Person Re-Identification, Video-Based Person Re-Identification |
Published | 2017-08-03 |
URL | http://arxiv.org/abs/1708.02286v2 |
PDF | http://arxiv.org/pdf/1708.02286v2.pdf |
PWC | https://paperswithcode.com/paper/jointly-attentive-spatial-temporal-pooling |
Repo | https://github.com/shuangjiexu/Spatial-Temporal-Pooling-Networks-ReID |
Framework | torch |
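A hedged sketch of the attentive temporal pooling idea: frame features of the probe and gallery sequences attend to each other through a learned similarity matrix, so each sequence's pooled descriptor depends on the other. This follows the attentive-pooling formulation in the spirit of the paper; dimensions and initialisation are illustrative.

```python
# Attentive temporal pooling over two frame-feature sequences.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentiveTemporalPooling(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.U = nn.Parameter(torch.randn(dim, dim) * 0.01)

    def forward(self, p, g):
        # p: (Tp, dim) probe frame features; g: (Tg, dim) gallery frame features
        a = torch.tanh(p @ self.U @ g.t())          # (Tp, Tg) affinity matrix
        wp = F.softmax(a.max(dim=1).values, dim=0)  # per-frame weights for probe
        wg = F.softmax(a.max(dim=0).values, dim=0)  # per-frame weights for gallery
        return wp @ p, wg @ g                       # pooled (dim,) descriptors

pool = AttentiveTemporalPooling(dim=128)
vp, vg = pool(torch.rand(16, 128), torch.rand(20, 128))
```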
struc2vec: Learning Node Representations from Structural Identity
Title | struc2vec: Learning Node Representations from Structural Identity |
Authors | Leonardo F. R. Ribeiro, Pedro H. P. Saverese, Daniel R. Figueiredo |
Abstract | Implementation and experiments of graph embedding algorithms: DeepWalk, LINE (Large-scale Information Network Embedding), node2vec, SDNE (Structural Deep Network Embedding), and struc2vec. |
Tasks | Graph Embedding, Network Embedding, Node Classification |
Published | 2017-04-11 |
URL | http://arxiv.org/abs/1704.03165v3 |
PDF | http://arxiv.org/pdf/1704.03165v3.pdf |
PWC | https://paperswithcode.com/paper/struc2vec-learning-node-representations-from |
Repo | https://github.com/leoribeiro/struc2vec |
Framework | none |
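struc2vec's core contribution is how it generates node sequences: random walks on a multilayer graph built from structural similarity. The final embedding step is skip-gram over those sequences, as in DeepWalk and node2vec. A minimal sketch with the walk generation stubbed out, assuming gensim is available:

```python
# Skip-gram over node walks; struc2vec's structural-similarity walk sampling
# is the paper's contribution and is only stubbed out here.
from gensim.models import Word2Vec

walks = [
    ["a", "b", "c", "b"],  # placeholder walks; struc2vec samples these from
    ["c", "a", "a", "b"],  # a multilayer graph built on structural similarity
]
model = Word2Vec(walks, vector_size=64, window=5, min_count=0, sg=1, epochs=5)
print(model.wv["a"][:5])   # learned node representation
```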
The Role of Conversation Context for Sarcasm Detection in Online Interactions
Title | The Role of Conversation Context for Sarcasm Detection in Online Interactions |
Authors | Debanjan Ghosh, Alexander Richard Fabbri, Smaranda Muresan |
Abstract | Computational models for sarcasm detection have often relied on the content of utterances in isolation. However, a speaker’s sarcastic intent is not always obvious without additional context. Focusing on social media discussions, we investigate two issues: (1) does modeling of conversation context help in sarcasm detection, and (2) can we understand what part of the conversation context triggered the sarcastic reply. To address the first issue, we investigate several types of Long Short-Term Memory (LSTM) networks that can model both the conversation context and the sarcastic response. We show that the conditional LSTM network (Rocktaschel et al., 2015) and LSTM networks with sentence-level attention on context and response outperform the LSTM model that reads only the response. To address the second issue, we present a qualitative analysis of the attention weights produced by the LSTM models with attention and discuss the results compared with human performance on the task. |
Tasks | Sarcasm Detection |
Published | 2017-07-19 |
URL | http://arxiv.org/abs/1707.06226v1 |
PDF | http://arxiv.org/pdf/1707.06226v1.pdf |
PWC | https://paperswithcode.com/paper/the-role-of-conversation-context-for-sarcasm |
Repo | https://github.com/Alex-Fabbri/deep_learning_nlp_sarcasm |
Framework | none |
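A minimal PyTorch sketch of the conditional-LSTM idea named in the abstract: one LSTM reads the conversation context, and its final state initialises a second LSTM that reads the (possibly sarcastic) response. Sizes and the classifier head are illustrative, not the paper's exact configuration.

```python
# Conditional LSTM: the response encoder is conditioned on the context
# encoder's final state.
import torch
import torch.nn as nn

class ConditionalLSTM(nn.Module):
    def __init__(self, vocab, dim=100):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.context_lstm = nn.LSTM(dim, dim, batch_first=True)
        self.response_lstm = nn.LSTM(dim, dim, batch_first=True)
        self.clf = nn.Linear(dim, 2)  # sarcastic vs. not

    def forward(self, context_ids, response_ids):
        _, state = self.context_lstm(self.emb(context_ids))
        _, (h, _) = self.response_lstm(self.emb(response_ids), state)
        return self.clf(h[-1])

model = ConditionalLSTM(vocab=10000)
logits = model(torch.randint(0, 10000, (4, 30)), torch.randint(0, 10000, (4, 15)))
```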
Graph Classification with 2D Convolutional Neural Networks
Title | Graph Classification with 2D Convolutional Neural Networks |
Authors | Antoine Jean-Pierre Tixier, Giannis Nikolentzos, Polykarpos Meladianos, Michalis Vazirgiannis |
Abstract | Graph learning is currently dominated by graph kernels, which, while powerful, suffer some significant limitations. Convolutional Neural Networks (CNNs) offer a very appealing alternative, but processing graphs with CNNs is not trivial. To address this challenge, many sophisticated extensions of CNNs have recently been introduced. In this paper, we reverse the problem: rather than proposing yet another graph CNN model, we introduce a novel way to represent graphs as multi-channel image-like structures that allows them to be handled by vanilla 2D CNNs. Experiments reveal that our method is more accurate than state-of-the-art graph kernels and graph CNNs on 4 out of 6 real-world datasets (with and without continuous node attributes), and close elsewhere. Our approach is also preferable to graph kernels in terms of time complexity. Code and data are publicly available. |
Tasks | Graph Classification |
Published | 2017-07-29 |
URL | https://arxiv.org/abs/1708.02218v4 |
PDF | https://arxiv.org/pdf/1708.02218v4.pdf |
PWC | https://paperswithcode.com/paper/graph-classification-with-2d-convolutional |
Repo | https://github.com/Tixierae/graph_2D_CNN |
Framework | tf |
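A hedged sketch of the image-like representation described above: given d-dimensional node embeddings (e.g. from node2vec), stack 2D histograms over consecutive coordinate pairs as image channels, then feed the result to any vanilla 2D CNN. The resolution, value range, and channel pairing below are illustrative assumptions.

```python
# Turn a graph's node embeddings into a multi-channel "image".
import numpy as np

def graph_to_image(node_embeddings: np.ndarray, bins: int = 14) -> np.ndarray:
    """node_embeddings: (n_nodes, d) -> image of shape (d // 2, bins, bins)."""
    channels = []
    for i in range(0, node_embeddings.shape[1] - 1, 2):
        hist, _, _ = np.histogram2d(
            node_embeddings[:, i], node_embeddings[:, i + 1],
            bins=bins, range=[[-1, 1], [-1, 1]],  # assumes normalised embeddings
        )
        channels.append(hist)
    return np.stack(channels)

image = graph_to_image(np.random.uniform(-1, 1, size=(50, 8)))
print(image.shape)  # (4, 14, 14): ready for a vanilla 2D CNN
```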
MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks on Corrupted Labels
Title | MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks on Corrupted Labels |
Authors | Lu Jiang, Zhengyuan Zhou, Thomas Leung, Li-Jia Li, Li Fei-Fei |
Abstract | Recent deep networks are capable of memorizing the entire data even when the labels are completely random. To overcome overfitting on corrupted labels, we propose a novel technique of learning another neural network, called MentorNet, to supervise the training of the base deep network, namely, StudentNet. During training, MentorNet provides a curriculum (sample weighting scheme) for StudentNet to focus on samples whose labels are probably correct. Unlike existing curricula, which are usually predefined by human experts, MentorNet learns a data-driven curriculum dynamically with StudentNet. Experimental results demonstrate that our approach can significantly improve the generalization performance of deep networks trained on corrupted training data. Notably, to the best of our knowledge, we achieve the best-published result on WebVision, a large benchmark containing 2.2 million images with real-world noisy labels. The code is at https://github.com/google/mentornet |
Tasks | |
Published | 2017-12-14 |
URL | http://arxiv.org/abs/1712.05055v2 |
PDF | http://arxiv.org/pdf/1712.05055v2.pdf |
PWC | https://paperswithcode.com/paper/mentornet-learning-data-driven-curriculum-for |
Repo | https://github.com/google/mentornet |
Framework | tf |
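A hedged sketch of the sample-weighting mechanism: a mentor assigns each example a weight from training signals, and the student minimises the weighted loss. The rule below is a simple self-paced-style threshold; the actual MentorNet instead learns this mapping from features such as the loss, its moving average, the epoch, and the label.

```python
# Weighted training loss under a mentor-provided curriculum (simplified).
import torch
import torch.nn.functional as F

def mentor_weights(losses: torch.Tensor, threshold: float) -> torch.Tensor:
    # Self-paced-style stand-in: trust samples whose loss is below a threshold;
    # MentorNet *learns* this mapping jointly with the student.
    return (losses < threshold).float()

logits = torch.randn(32, 10, requires_grad=True)
labels = torch.randint(0, 10, (32,))
losses = F.cross_entropy(logits, labels, reduction="none")  # per-sample loss
weights = mentor_weights(losses.detach(), threshold=losses.median().item())
weighted_loss = (weights * losses).sum() / weights.sum().clamp(min=1.0)
weighted_loss.backward()
```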
Deep Neural Networks for Physics Analysis on low-level whole-detector data at the LHC
Title | Deep Neural Networks for Physics Analysis on low-level whole-detector data at the LHC |
Authors | Wahid Bhimji, Steven Andrew Farrell, Thorsten Kurth, Michela Paganini, Prabhat, Evan Racah |
Abstract | There has been considerable recent activity applying deep convolutional neural nets (CNNs) to data from particle physics experiments. Current approaches on ATLAS/CMS have largely focussed on a subset of the calorimeter, and for identifying objects or particular particle types. We explore approaches that use the entire calorimeter, combined with track information, for directly conducting physics analyses: i.e. classifying events as known-physics background or new-physics signals. We use an existing RPV-Supersymmetry analysis as a case study and explore CNNs on multi-channel, high-resolution sparse images: applied on GPU and multi-node CPU architectures (including Knights Landing (KNL) Xeon Phi nodes) on the Cori supercomputer at NERSC. |
Tasks | |
Published | 2017-11-09 |
URL | http://arxiv.org/abs/1711.03573v2 |
PDF | http://arxiv.org/pdf/1711.03573v2.pdf |
PWC | https://paperswithcode.com/paper/deep-neural-networks-for-physics-analysis-on |
Repo | https://github.com/vmos1/atlas_cnn |
Framework | tf |
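As a concrete but purely illustrative picture of the setup: a small 2D CNN over multi-channel calorimeter "images" producing a binary signal-vs-background score. The channel count and resolution are assumptions, not the paper's architecture.

```python
# Toy event classifier: background vs. new-physics signal from detector images.
import torch
import torch.nn as nn

classifier = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(), nn.Linear(32, 1),  # logit: signal vs. background
)
events = torch.rand(8, 3, 64, 64)  # batch of calorimeter/track channels
print(torch.sigmoid(classifier(events)).shape)  # per-event signal probability
```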
Enhanced Deep Residual Networks for Single Image Super-Resolution
Title | Enhanced Deep Residual Networks for Single Image Super-Resolution |
Authors | Bee Lim, Sanghyun Son, Heewon Kim, Seungjun Nah, Kyoung Mu Lee |
Abstract | Recent research on super-resolution has progressed with the development of deep convolutional neural networks (DCNN). In particular, residual learning techniques exhibit improved performance. In this paper, we develop an enhanced deep super-resolution network (EDSR) with performance exceeding that of current state-of-the-art SR methods. The significant performance improvement of our model is due to optimization by removing unnecessary modules from conventional residual networks. The performance is further improved by expanding the model size while stabilizing the training procedure. We also propose a new multi-scale deep super-resolution system (MDSR) and training method, which can reconstruct high-resolution images at different upscaling factors in a single model. The proposed methods show superior performance over state-of-the-art methods on benchmark datasets and prove their excellence by winning the NTIRE 2017 Super-Resolution Challenge. |
Tasks | Image Super-Resolution, Super-Resolution |
Published | 2017-07-10 |
URL | http://arxiv.org/abs/1707.02921v1 |
PDF | http://arxiv.org/pdf/1707.02921v1.pdf |
PWC | https://paperswithcode.com/paper/enhanced-deep-residual-networks-for-single |
Repo | https://github.com/SimoneDutto/EDSR |
Framework | pytorch |
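A minimal PyTorch sketch of the EDSR building blocks: a residual block with batch normalisation removed (the "unnecessary modules" the abstract refers to), residual scaling to stabilise the enlarged model, and a sub-pixel upsampler. Channel counts are illustrative.

```python
# EDSR-style residual block (no batch norm) plus a x2 sub-pixel upsampler.
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, channels=64, res_scale=0.1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.res_scale = res_scale  # scales the residual branch before adding

    def forward(self, x):
        return x + self.res_scale * self.body(x)

upscale = nn.Sequential(  # x2 upsampling via sub-pixel convolution
    nn.Conv2d(64, 64 * 4, 3, padding=1), nn.PixelShuffle(2),
)
x = torch.rand(1, 64, 48, 48)
print(upscale(ResBlock()(x)).shape)  # torch.Size([1, 64, 96, 96])
```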
Natural Language Inference over Interaction Space
Title | Natural Language Inference over Interaction Space |
Authors | Yichen Gong, Heng Luo, Jian Zhang |
Abstract | The Natural Language Inference (NLI) task requires an agent to determine the logical relationship between a natural language premise and a natural language hypothesis. We introduce the Interactive Inference Network (IIN), a novel class of neural network architectures that is able to achieve high-level understanding of a sentence pair by hierarchically extracting semantic features from interaction space. We show that an interaction tensor (attention weight) contains the semantic information needed to solve natural language inference, and that a denser interaction tensor contains richer semantic information. One instance of such an architecture, the Densely Interactive Inference Network (DIIN), demonstrates state-of-the-art performance on large-scale NLI corpora and a large-scale NLI-like corpus. Notably, DIIN achieves a greater than 20% error reduction on the challenging Multi-Genre NLI (MultiNLI) dataset with respect to the strongest published system. |
Tasks | Natural Language Inference, Paraphrase Identification |
Published | 2017-09-13 |
URL | http://arxiv.org/abs/1709.04348v2 |
PDF | http://arxiv.org/pdf/1709.04348v2.pdf |
PWC | https://paperswithcode.com/paper/natural-language-inference-over-interaction-1 |
Repo | https://github.com/YerevaNN/DIIN-in-Keras |
Framework | none |
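A hedged sketch of the interaction tensor at the core of DIIN: each premise position is compared with each hypothesis position via an element-wise product, yielding a (Tp, Th, d) tensor that the paper's dense feature extractor (a DenseNet) then processes. Encoding layers are omitted here.

```python
# Interaction tensor between encoded premise and hypothesis representations.
import torch

def interaction_tensor(premise: torch.Tensor, hypothesis: torch.Tensor) -> torch.Tensor:
    # premise: (Tp, d), hypothesis: (Th, d) encoded token representations
    return premise.unsqueeze(1) * hypothesis.unsqueeze(0)  # (Tp, Th, d)

p, h = torch.rand(12, 64), torch.rand(9, 64)
print(interaction_tensor(p, h).shape)  # torch.Size([12, 9, 64])
```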
Learning Credible Models
Title | Learning Credible Models |
Authors | Jiaxuan Wang, Jeeheh Oh, Haozhu Wang, Jenna Wiens |
Abstract | In many settings, it is important that a model be capable of providing reasons for its predictions (i.e., the model must be interpretable). However, the model’s reasoning may not conform with well-established knowledge. In such cases, while interpretable, the model lacks *credibility*. In this work, we formally define credibility in the linear setting and focus on techniques for learning models that are both accurate and credible. In particular, we propose a regularization penalty, expert yielded estimates (EYE), that incorporates expert knowledge about well-known relationships among covariates and the outcome of interest. We give both theoretical and empirical results comparing our proposed method to several other regularization techniques. Across a range of settings, experiments on both synthetic and real data show that models learned using the EYE penalty are significantly more credible than those learned using other penalties. Applied to a large-scale patient risk stratification task, our proposed technique results in a model whose top features overlap significantly with known clinical risk factors, while still achieving good predictive performance. |
Tasks | |
Published | 2017-11-08 |
URL | http://arxiv.org/abs/1711.03190v3 |
PDF | http://arxiv.org/pdf/1711.03190v3.pdf |
PWC | https://paperswithcode.com/paper/learning-credible-models |
Repo | https://github.com/nathanwang000/credible_learning |
Framework | pytorch |
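The exact EYE formula is given in the paper; the sketch below only illustrates the underlying idea under stated assumptions: apply a sparsity-inducing L1 term to features without expert support and a gentler L2 term to features experts already consider relevant.

```python
# Expert-guided regularisation in the spirit of EYE (not the paper's exact form).
import torch

def expert_guided_penalty(theta: torch.Tensor, known: torch.Tensor, lam: float = 0.1):
    # theta: model coefficients; known: 1.0 where experts expect relevance, else 0.0
    unknown = 1.0 - known
    return lam * ((theta * unknown).abs().sum() + (theta * known).pow(2).sum())

theta = torch.randn(5, requires_grad=True)
known = torch.tensor([1.0, 0.0, 0.0, 1.0, 0.0])
loss = theta.pow(2).mean() + expert_guided_penalty(theta, known)  # data loss + penalty
loss.backward()
```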
A Tidy Data Model for Natural Language Processing using cleanNLP
Title | A Tidy Data Model for Natural Language Processing using cleanNLP |
Authors | Taylor Arnold |
Abstract | The package cleanNLP provides a set of fast tools for converting a textual corpus into a set of normalized tables. The underlying natural language processing pipeline utilizes Stanford’s CoreNLP library, exposing a number of annotation tasks for text written in English, French, German, and Spanish. Annotators include tokenization, part of speech tagging, named entity recognition, entity linking, sentiment analysis, dependency parsing, coreference resolution, and information extraction. |
Tasks | Coreference Resolution, Dependency Parsing, Entity Linking, Named Entity Recognition, Part-Of-Speech Tagging, Sentiment Analysis, Tokenization |
Published | 2017-03-27 |
URL | http://arxiv.org/abs/1703.09570v2 |
PDF | http://arxiv.org/pdf/1703.09570v2.pdf |
PWC | https://paperswithcode.com/paper/a-tidy-data-model-for-natural-language |
Repo | https://github.com/statsmaths/cleanNLP |
Framework | none |
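cleanNLP itself is an R package; as a language-neutral illustration of the tidy data model (one normalised table per annotation layer, one row per token), here is the same shape built with pandas. The values are hardcoded for illustration and this is not cleanNLP's API.

```python
# A "tidy" token table: one row per token, one column per annotation field.
import pandas as pd

tokens = pd.DataFrame(
    [  # (doc id, sentence id, token id, word, lemma, part of speech)
        (1, 1, 1, "The",   "the",   "DET"),
        (1, 1, 2, "cats",  "cat",   "NOUN"),
        (1, 1, 3, "slept", "sleep", "VERB"),
    ],
    columns=["doc_id", "sid", "tid", "word", "lemma", "upos"],
)
print(tokens[tokens.upos == "NOUN"])  # tidy tables compose with ordinary data tools
```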
MojiTalk: Generating Emotional Responses at Scale
Title | MojiTalk: Generating Emotional Responses at Scale |
Authors | Xianda Zhou, William Yang Wang |
Abstract | Generating emotional language is a key step towards building empathetic natural language processing agents. However, a major challenge for this line of research is the lack of large-scale labeled training data, and previous studies are limited to only small sets of human annotated sentiment labels. Additionally, explicitly controlling the emotion and sentiment of generated text is also difficult. In this paper, we take a more radical approach: we exploit the idea of leveraging Twitter data that are naturally labeled with emojis. More specifically, we collect a large corpus of Twitter conversations that include emojis in the response, and assume the emojis convey the underlying emotions of the sentence. We then introduce a reinforced conditional variational encoder approach to train a deep generative model on these conversations, which allows us to use emojis to control the emotion of the generated text. Experimentally, we show in our quantitative and qualitative analyses that the proposed models can successfully generate high-quality abstractive conversation responses in accordance with designated emotions. |
Tasks | |
Published | 2017-11-11 |
URL | http://arxiv.org/abs/1711.04090v2 |
PDF | http://arxiv.org/pdf/1711.04090v2.pdf |
PWC | https://paperswithcode.com/paper/mojitalk-generating-emotional-responses-at |
Repo | https://github.com/Claude-Zhou/MojiTalk |
Framework | tf |
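A hedged sketch of the conditional-VAE core described above: the latent code is sampled with the reparameterisation trick, and training balances reconstruction against a KL term. The paper's emoji conditioning and additional reinforcement-learning stage are omitted here.

```python
# Reparameterisation trick and KL term, the two VAE pieces the model relies on.
import torch

def reparameterise(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

def kl_to_standard_normal(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    return -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())

mu, logvar = torch.zeros(4, 16), torch.zeros(4, 16)  # from a recognition network
z = reparameterise(mu, logvar)           # latent code fed to the decoder,
kl = kl_to_standard_normal(mu, logvar)   # traded off against reconstruction loss
```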