Paper Group AWR 132
SearchQA: A New Q&A Dataset Augmented with Context from a Search Engine. DeepChess: End-to-End Deep Neural Network for Automatic Learning in Chess. Fast Information-theoretic Bayesian Optimisation. Massive Exploration of Neural Machine Translation Architectures. Jointly Attentive Spatial-Temporal Pooling Networks for Video-based Person Re-Identific …
SearchQA: A New Q&A Dataset Augmented with Context from a Search Engine
Title | SearchQA: A New Q&A Dataset Augmented with Context from a Search Engine |
Authors | Matthew Dunn, Levent Sagun, Mike Higgins, V. Ugur Guney, Volkan Cirik, Kyunghyun Cho |
Abstract | We publicly release a new large-scale dataset, called SearchQA, for machine comprehension, or question-answering. Unlike recently released datasets, such as DeepMind CNN/DailyMail and SQuAD, the proposed SearchQA was constructed to reflect a full pipeline of general question-answering. That is, we do not start from an existing article and generate a question-answer pair; instead, we start from an existing question-answer pair, crawled from J! Archive, and augment it with text snippets retrieved by Google. Following this approach, we built SearchQA, which consists of more than 140k question-answer pairs, each with 49.6 snippets on average. Each question-answer-context tuple of SearchQA comes with additional meta-data, such as the snippet’s URL, which we believe will be a valuable resource for future research. We conduct human evaluation as well as test two baseline methods, one based on simple word selection and the other on deep learning, on SearchQA. We show that there is a meaningful gap between human and machine performance. This suggests that the proposed dataset could well serve as a benchmark for question-answering. |
Tasks | Open-Domain Question Answering, Question Answering, Reading Comprehension |
Published | 2017-04-18 |
URL | http://arxiv.org/abs/1704.05179v3 |
PDF | http://arxiv.org/pdf/1704.05179v3.pdf |
PWC | https://paperswithcode.com/paper/searchqa-a-new-qa-dataset-augmented-with |
Repo | https://github.com/google/active-qa |
Framework | tf |
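The abstract mentions a simple word-selection baseline. The sketch below is a naive stand-in for that idea, not the paper's exact method: it predicts the snippet unigram that occurs most often across the retrieved context but never in the question.

```python
# Hypothetical word-selection baseline for SearchQA (illustration only, not
# the paper's exact baseline): answer with the snippet unigram that is most
# frequent across retrieved snippets but absent from the question.
from collections import Counter

def word_selection_answer(question: str, snippets: list[str]) -> str:
    question_words = set(question.lower().split())
    counts = Counter(
        w for s in snippets for w in s.lower().split()
        if w not in question_words
    )
    return counts.most_common(1)[0][0] if counts else ""

# The prediction is the context word most often repeated outside the question.
print(word_selection_answer(
    "This physicist proposed special relativity in 1905",
    ["albert einstein proposed special relativity", "einstein 1905 paper"],
))  # -> "einstein"
```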
DeepChess: End-to-End Deep Neural Network for Automatic Learning in Chess
Title | DeepChess: End-to-End Deep Neural Network for Automatic Learning in Chess |
Authors | Eli David, Nathan S. Netanyahu, Lior Wolf |
Abstract | We present an end-to-end learning method for chess, relying on deep neural networks. Without any a priori knowledge, in particular without any knowledge regarding the rules of chess, a deep neural network is trained using a combination of unsupervised pretraining and supervised training. The unsupervised training extracts high level features from a given position, and the supervised training learns to compare two chess positions and select the more favorable one. The training relies entirely on datasets of several million chess games, and no further domain specific knowledge is incorporated. The experiments show that the resulting neural network (referred to as DeepChess) is on a par with state-of-the-art chess playing programs, which have been developed through many years of manual feature selection and tuning. DeepChess is the first end-to-end machine learning-based method that results in a grandmaster-level chess playing performance. |
Tasks | Feature Selection, Game of Chess |
Published | 2017-11-27 |
URL | http://arxiv.org/abs/1711.09667v1 |
PDF | http://arxiv.org/pdf/1711.09667v1.pdf |
PWC | https://paperswithcode.com/paper/deepchess-end-to-end-deep-neural-network-for |
Repo | https://github.com/dangeng/DeepChess |
Framework | pytorch |
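As a rough illustration of the two-stage architecture the abstract describes, here is a minimal PyTorch sketch: a Pos2Vec-style feature extractor (pretrained in the paper as a stacked autoencoder, which is omitted here) feeding a siamese head that compares two positions. Layer sizes follow the dimensions reported in the paper; the bitboard input encoding and both training loops are assumptions left out.

```python
# A minimal sketch of the DeepChess-style comparison network.
import torch
import torch.nn as nn

class Pos2Vec(nn.Module):
    """Feature extractor; the paper pretrains it as a stacked autoencoder."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(773, 600), nn.ReLU(),
            nn.Linear(600, 400), nn.ReLU(),
            nn.Linear(400, 200), nn.ReLU(),
            nn.Linear(200, 100), nn.ReLU(),
        )
    def forward(self, x):
        return self.encoder(x)

class DeepChess(nn.Module):
    """Siamese head: compares two positions and predicts which is better."""
    def __init__(self):
        super().__init__()
        self.pos2vec = Pos2Vec()
        self.head = nn.Sequential(
            nn.Linear(200, 400), nn.ReLU(),
            nn.Linear(400, 200), nn.ReLU(),
            nn.Linear(200, 100), nn.ReLU(),
            nn.Linear(100, 2),  # logits: (left better, right better)
        )
    def forward(self, left, right):
        feats = torch.cat([self.pos2vec(left), self.pos2vec(right)], dim=1)
        return self.head(feats)

model = DeepChess()
logits = model(torch.rand(8, 773), torch.rand(8, 773))  # batch of position pairs
```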
Fast Information-theoretic Bayesian Optimisation
Title | Fast Information-theoretic Bayesian Optimisation |
Authors | Binxin Ru, Mark McLeod, Diego Granziol, Michael A. Osborne |
Abstract | Information-theoretic Bayesian optimisation techniques have demonstrated state-of-the-art performance in tackling important global optimisation problems. However, current information-theoretic approaches require many approximations in implementation, introduce often-prohibitive computational overhead and limit the choice of kernels available to model the objective. We develop a fast information-theoretic Bayesian Optimisation method, FITBO, that avoids the need for sampling the global minimiser, thus significantly reducing computational overhead. Moreover, in comparison with existing approaches, our method faces fewer constraints on kernel choice and enjoys the merits of dealing with the output space. We demonstrate empirically that FITBO inherits the performance associated with information-theoretic Bayesian optimisation, while being even faster than simpler Bayesian optimisation approaches, such as Expected Improvement. |
Tasks | Bayesian Optimisation |
Published | 2017-11-02 |
URL | http://arxiv.org/abs/1711.00673v5 |
PDF | http://arxiv.org/pdf/1711.00673v5.pdf |
PWC | https://paperswithcode.com/paper/fast-information-theoretic-bayesian |
Repo | https://github.com/rubinxin/FITBO |
Framework | none |
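FITBO's derivation does not reduce to a short snippet, but the abstract benchmarks it against Expected Improvement. For reference, here is a minimal EI acquisition for a Gaussian-process posterior, using the minimisation convention and an illustrative exploration parameter `xi` (an assumption, not part of the paper).

```python
# Expected Improvement (EI), the simpler baseline named in the abstract,
# evaluated from a GP posterior mean/std at candidate points.
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, best_y, xi=0.01):
    """EI at points with posterior mean `mu` and std `sigma`,
    given the incumbent (best observed) value `best_y`."""
    sigma = np.maximum(sigma, 1e-12)  # guard against division by zero
    z = (best_y - mu - xi) / sigma
    return (best_y - mu - xi) * norm.cdf(z) + sigma * norm.pdf(z)

mu = np.array([0.2, -0.1, 0.05])
sigma = np.array([0.3, 0.2, 0.5])
print(expected_improvement(mu, sigma, best_y=0.0).argmax())  # next query index
```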
Massive Exploration of Neural Machine Translation Architectures
Title | Massive Exploration of Neural Machine Translation Architectures |
Authors | Denny Britz, Anna Goldie, Minh-Thang Luong, Quoc Le |
Abstract | Neural Machine Translation (NMT) has shown remarkable progress over the past few years with production systems now being deployed to end-users. One major drawback of current architectures is that they are expensive to train, typically requiring days to weeks of GPU time to converge. This makes exhaustive hyperparameter search, as is commonly done with other neural network architectures, prohibitively expensive. In this work, we present the first large-scale analysis of NMT architecture hyperparameters. We report empirical results and variance numbers for several hundred experimental runs, corresponding to over 250,000 GPU hours on the standard WMT English to German translation task. Our experiments lead to novel insights and practical advice for building and extending NMT architectures. As part of this contribution, we release an open-source NMT framework that enables researchers to easily experiment with novel techniques and reproduce state of the art results. |
Tasks | Machine Translation |
Published | 2017-03-11 |
URL | http://arxiv.org/abs/1703.03906v2 |
PDF | http://arxiv.org/pdf/1703.03906v2.pdf |
PWC | https://paperswithcode.com/paper/massive-exploration-of-neural-machine |
Repo | https://github.com/simonjisu/NMT |
Framework | pytorch |
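To make the scale of the study concrete, the toy loop below enumerates the kind of hyperparameter grid the paper sweeps (embedding size, cell type, depth, attention variant, beam width). The configuration keys are illustrative, not the released framework's actual options.

```python
# A toy version of the sweep the paper runs at scale: one (costly) training
# run per hyperparameter configuration.
from itertools import product

grid = {
    "embedding_dim": [128, 512],
    "rnn_cell":      ["lstm", "gru"],
    "encoder_depth": [2, 4],
    "attention":     ["additive", "multiplicative"],
    "beam_width":    [1, 5, 10],
}

for values in product(*grid.values()):
    config = dict(zip(grid.keys(), values))
    # train_and_evaluate(config)  # hypothetical per-run entry point
    print(config)
```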
Jointly Attentive Spatial-Temporal Pooling Networks for Video-based Person Re-Identification
Title | Jointly Attentive Spatial-Temporal Pooling Networks for Video-based Person Re-Identification |
Authors | Shuangjie Xu, Yu Cheng, Kang Gu, Yang Yang, Shiyu Chang, Pan Zhou |
Abstract | Person Re-Identification (person re-id) is a crucial task owing to its applications in visual surveillance and human-computer interaction. In this work, we present a novel joint Spatial and Temporal Attention Pooling Network (ASTPN) for video-based person re-identification, which makes the feature extractor aware of the current pair of input video sequences, so that the matching items can directly influence the computation of each other’s representation. Specifically, the spatial pooling layer selects regions from each frame, while the attentive temporal pooling selects informative frames over the sequence, with both pooling steps guided by information from distance matching. Experiments are conducted on the iLIDS-VID, PRID-2011 and MARS datasets, and the results demonstrate that this approach outperforms existing state-of-the-art methods. We also analyze how joint pooling across both dimensions boosts person re-id performance more effectively than using either of them separately. |
Tasks | Person Re-Identification, Video-Based Person Re-Identification |
Published | 2017-08-03 |
URL | http://arxiv.org/abs/1708.02286v2 |
PDF | http://arxiv.org/pdf/1708.02286v2.pdf |
PWC | https://paperswithcode.com/paper/jointly-attentive-spatial-temporal-pooling |
Repo | https://github.com/shuangjiexu/Spatial-Temporal-Pooling-Networks-ReID |
Framework | torch |
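A hedged sketch of the attentive temporal pooling idea: frame features of the probe and gallery sequences attend to each other through a learned similarity matrix, so each sequence's pooled descriptor depends on the other. This follows the attentive-pooling formulation in the spirit of the paper; dimensions and initialisation are illustrative.

```python
# Attentive temporal pooling over two frame-feature sequences.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentiveTemporalPooling(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.U = nn.Parameter(torch.randn(dim, dim) * 0.01)

    def forward(self, p, g):
        # p: (Tp, dim) probe frame features; g: (Tg, dim) gallery frame features
        a = torch.tanh(p @ self.U @ g.t())          # (Tp, Tg) affinity matrix
        wp = F.softmax(a.max(dim=1).values, dim=0)  # per-frame weights for probe
        wg = F.softmax(a.max(dim=0).values, dim=0)  # per-frame weights for gallery
        return wp @ p, wg @ g                       # pooled (dim,) descriptors

pool = AttentiveTemporalPooling(dim=128)
vp, vg = pool(torch.rand(16, 128), torch.rand(20, 128))
```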
struc2vec: Learning Node Representations from Structural Identity
Title | struc2vec: Learning Node Representations from Structural Identity |
Authors | Leonardo F. R. Ribeiro, Pedro H. P. Saverese, Daniel R. Figueiredo |
Abstract | Implementation and experiments of graph embedding algorithms: DeepWalk, LINE (Large-scale Information Network Embedding), node2vec, SDNE (Structural Deep Network Embedding), and struc2vec. |
Tasks | Graph Embedding, Network Embedding, Node Classification |
Published | 2017-04-11 |
URL | http://arxiv.org/abs/1704.03165v3 |
PDF | http://arxiv.org/pdf/1704.03165v3.pdf |
PWC | https://paperswithcode.com/paper/struc2vec-learning-node-representations-from |
Repo | https://github.com/leoribeiro/struc2vec |
Framework | none |
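struc2vec's core contribution is how it generates node sequences: random walks on a multilayer graph built from structural similarity. The final embedding step is skip-gram over those sequences, as in DeepWalk and node2vec. A minimal sketch with the walk generation stubbed out, assuming gensim is available:

```python
# Skip-gram over node walks; struc2vec's structural-similarity walk sampling
# is the paper's contribution and is only stubbed out here.
from gensim.models import Word2Vec

walks = [
    ["a", "b", "c", "b"],  # placeholder walks; struc2vec samples these from
    ["c", "a", "a", "b"],  # a multilayer graph built on structural similarity
]
model = Word2Vec(walks, vector_size=64, window=5, min_count=0, sg=1, epochs=5)
print(model.wv["a"][:5])   # learned node representation
```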
The Role of Conversation Context for Sarcasm Detection in Online Interactions
Title | The Role of Conversation Context for Sarcasm Detection in Online Interactions |
Authors | Debanjan Ghosh, Alexander Richard Fabbri, Smaranda Muresan |
Abstract | Computational models for sarcasm detection have often relied on the content of utterances in isolation. However, a speaker’s sarcastic intent is not always obvious without additional context. Focusing on social media discussions, we investigate two issues: (1) does modeling of conversation context help in sarcasm detection, and (2) can we understand what part of the conversation context triggered the sarcastic reply. To address the first issue, we investigate several types of Long Short-Term Memory (LSTM) networks that can model both the conversation context and the sarcastic response. We show that the conditional LSTM network (Rocktaschel et al., 2015) and LSTM networks with sentence-level attention on context and response outperform the LSTM model that reads only the response. To address the second issue, we present a qualitative analysis of the attention weights produced by the LSTM models with attention and discuss the results compared with human performance on the task. |
Tasks | Sarcasm Detection |
Published | 2017-07-19 |
URL | http://arxiv.org/abs/1707.06226v1 |
PDF | http://arxiv.org/pdf/1707.06226v1.pdf |
PWC | https://paperswithcode.com/paper/the-role-of-conversation-context-for-sarcasm |
Repo | https://github.com/Alex-Fabbri/deep_learning_nlp_sarcasm |
Framework | none |
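A minimal PyTorch sketch of the conditional-LSTM idea named in the abstract: one LSTM reads the conversation context, and its final state initialises a second LSTM that reads the (possibly sarcastic) response. Sizes and the classifier head are illustrative, not the paper's exact configuration.

```python
# Conditional LSTM: the response encoder is conditioned on the context
# encoder's final state.
import torch
import torch.nn as nn

class ConditionalLSTM(nn.Module):
    def __init__(self, vocab, dim=100):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.context_lstm = nn.LSTM(dim, dim, batch_first=True)
        self.response_lstm = nn.LSTM(dim, dim, batch_first=True)
        self.clf = nn.Linear(dim, 2)  # sarcastic vs. not

    def forward(self, context_ids, response_ids):
        _, state = self.context_lstm(self.emb(context_ids))
        _, (h, _) = self.response_lstm(self.emb(response_ids), state)
        return self.clf(h[-1])

model = ConditionalLSTM(vocab=10000)
logits = model(torch.randint(0, 10000, (4, 30)), torch.randint(0, 10000, (4, 15)))
```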
Graph Classification with 2D Convolutional Neural Networks
Title | Graph Classification with 2D Convolutional Neural Networks |
Authors | Antoine Jean-Pierre Tixier, Giannis Nikolentzos, Polykarpos Meladianos, Michalis Vazirgiannis |
Abstract | Graph learning is currently dominated by graph kernels, which, while powerful, suffer some significant limitations. Convolutional Neural Networks (CNNs) offer a very appealing alternative, but processing graphs with CNNs is not trivial. To address this challenge, many sophisticated extensions of CNNs have recently been introduced. In this paper, we reverse the problem: rather than proposing yet another graph CNN model, we introduce a novel way to represent graphs as multi-channel image-like structures that allows them to be handled by vanilla 2D CNNs. Experiments reveal that our method is more accurate than state-of-the-art graph kernels and graph CNNs on 4 out of 6 real-world datasets (with and without continuous node attributes), and close elsewhere. Our approach is also preferable to graph kernels in terms of time complexity. Code and data are publicly available. |
Tasks | Graph Classification |
Published | 2017-07-29 |
URL | https://arxiv.org/abs/1708.02218v4 |
PDF | https://arxiv.org/pdf/1708.02218v4.pdf |
PWC | https://paperswithcode.com/paper/graph-classification-with-2d-convolutional |
Repo | https://github.com/Tixierae/graph_2D_CNN |
Framework | tf |
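A hedged sketch of the image-like representation described above: given d-dimensional node embeddings (e.g. from node2vec), stack 2D histograms over consecutive coordinate pairs as image channels, then feed the result to any vanilla 2D CNN. The resolution, value range, and channel pairing below are illustrative assumptions.

```python
# Turn a graph's node embeddings into a multi-channel "image".
import numpy as np

def graph_to_image(node_embeddings: np.ndarray, bins: int = 14) -> np.ndarray:
    """node_embeddings: (n_nodes, d) -> image of shape (d // 2, bins, bins)."""
    channels = []
    for i in range(0, node_embeddings.shape[1] - 1, 2):
        hist, _, _ = np.histogram2d(
            node_embeddings[:, i], node_embeddings[:, i + 1],
            bins=bins, range=[[-1, 1], [-1, 1]],  # assumes normalised embeddings
        )
        channels.append(hist)
    return np.stack(channels)

image = graph_to_image(np.random.uniform(-1, 1, size=(50, 8)))
print(image.shape)  # (4, 14, 14): ready for a vanilla 2D CNN
```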
MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks on Corrupted Labels
Title | MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks on Corrupted Labels |
Authors | Lu Jiang, Zhengyuan Zhou, Thomas Leung, Li-Jia Li, Li Fei-Fei |
Abstract | Recent deep networks are capable of memorizing the entire data even when the labels are completely random. To overcome overfitting on corrupted labels, we propose a novel technique of learning another neural network, called MentorNet, to supervise the training of the base deep network, namely, StudentNet. During training, MentorNet provides a curriculum (sample weighting scheme) for StudentNet to focus on samples whose labels are probably correct. Unlike existing curricula, which are usually predefined by human experts, MentorNet learns a data-driven curriculum dynamically with StudentNet. Experimental results demonstrate that our approach can significantly improve the generalization performance of deep networks trained on corrupted training data. Notably, to the best of our knowledge, we achieve the best-published result on WebVision, a large benchmark containing 2.2 million images with real-world noisy labels. The code is at https://github.com/google/mentornet |
Tasks | |
Published | 2017-12-14 |
URL | http://arxiv.org/abs/1712.05055v2 |
PDF | http://arxiv.org/pdf/1712.05055v2.pdf |
PWC | https://paperswithcode.com/paper/mentornet-learning-data-driven-curriculum-for |
Repo | https://github.com/google/mentornet |
Framework | tf |
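A hedged sketch of the sample-weighting mechanism: a mentor assigns each example a weight from training signals, and the student minimises the weighted loss. The rule below is a simple self-paced-style threshold; the actual MentorNet instead learns this mapping from features such as the loss, its moving average, the epoch, and the label.

```python
# Weighted training loss under a mentor-provided curriculum (simplified).
import torch
import torch.nn.functional as F

def mentor_weights(losses: torch.Tensor, threshold: float) -> torch.Tensor:
    # Self-paced-style stand-in: trust samples whose loss is below a threshold;
    # MentorNet *learns* this mapping jointly with the student.
    return (losses < threshold).float()

logits = torch.randn(32, 10, requires_grad=True)
labels = torch.randint(0, 10, (32,))
losses = F.cross_entropy(logits, labels, reduction="none")  # per-sample loss
weights = mentor_weights(losses.detach(), threshold=losses.median().item())
weighted_loss = (weights * losses).sum() / weights.sum().clamp(min=1.0)
weighted_loss.backward()
```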
Deep Neural Networks for Physics Analysis on low-level whole-detector data at the LHC
Title | Deep Neural Networks for Physics Analysis on low-level whole-detector data at the LHC |
Authors | Wahid Bhimji, Steven Andrew Farrell, Thorsten Kurth, Michela Paganini, Prabhat, Evan Racah |
Abstract | There has been considerable recent activity applying deep convolutional neural nets (CNNs) to data from particle physics experiments. Current approaches on ATLAS/CMS have largely focussed on a subset of the calorimeter, and for identifying objects or particular particle types. We explore approaches that use the entire calorimeter, combined with track information, for directly conducting physics analyses: i.e. classifying events as known-physics background or new-physics signals. We use an existing RPV-Supersymmetry analysis as a case study and explore CNNs on multi-channel, high-resolution sparse images: applied on GPU and multi-node CPU architectures (including Knights Landing (KNL) Xeon Phi nodes) on the Cori supercomputer at NERSC. |
Tasks | |
Published | 2017-11-09 |
URL | http://arxiv.org/abs/1711.03573v2 |
PDF | http://arxiv.org/pdf/1711.03573v2.pdf |
PWC | https://paperswithcode.com/paper/deep-neural-networks-for-physics-analysis-on |
Repo | https://github.com/vmos1/atlas_cnn |
Framework | tf |
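As a concrete but purely illustrative picture of the setup: a small 2D CNN over multi-channel calorimeter "images" producing a binary signal-vs-background score. The channel count and resolution are assumptions, not the paper's architecture.

```python
# Toy event classifier: background vs. new-physics signal from detector images.
import torch
import torch.nn as nn

classifier = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(), nn.Linear(32, 1),  # logit: signal vs. background
)
events = torch.rand(8, 3, 64, 64)  # batch of calorimeter/track channels
print(torch.sigmoid(classifier(events)).shape)  # per-event signal probability
```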
Enhanced Deep Residual Networks for Single Image Super-Resolution
Title | Enhanced Deep Residual Networks for Single Image Super-Resolution |
Authors | Bee Lim, Sanghyun Son, Heewon Kim, Seungjun Nah, Kyoung Mu Lee |
Abstract | Recent research on super-resolution has progressed with the development of deep convolutional neural networks (DCNN). In particular, residual learning techniques exhibit improved performance. In this paper, we develop an enhanced deep super-resolution network (EDSR) with performance exceeding that of current state-of-the-art SR methods. The significant performance improvement of our model is due to optimization by removing unnecessary modules from conventional residual networks. The performance is further improved by expanding the model size while stabilizing the training procedure. We also propose a new multi-scale deep super-resolution system (MDSR) and training method, which can reconstruct high-resolution images at different upscaling factors in a single model. The proposed methods show superior performance over state-of-the-art methods on benchmark datasets and prove their excellence by winning the NTIRE 2017 Super-Resolution Challenge. |
Tasks | Image Super-Resolution, Super-Resolution |
Published | 2017-07-10 |
URL | http://arxiv.org/abs/1707.02921v1 |
PDF | http://arxiv.org/pdf/1707.02921v1.pdf |
PWC | https://paperswithcode.com/paper/enhanced-deep-residual-networks-for-single |
Repo | https://github.com/SimoneDutto/EDSR |
Framework | pytorch |
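A minimal PyTorch sketch of the EDSR building blocks: a residual block with batch normalisation removed (the "unnecessary modules" the abstract refers to), residual scaling to stabilise the enlarged model, and a sub-pixel upsampler. Channel counts are illustrative.

```python
# EDSR-style residual block (no batch norm) plus a x2 sub-pixel upsampler.
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, channels=64, res_scale=0.1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.res_scale = res_scale  # scales the residual branch before adding

    def forward(self, x):
        return x + self.res_scale * self.body(x)

upscale = nn.Sequential(  # x2 upsampling via sub-pixel convolution
    nn.Conv2d(64, 64 * 4, 3, padding=1), nn.PixelShuffle(2),
)
x = torch.rand(1, 64, 48, 48)
print(upscale(ResBlock()(x)).shape)  # torch.Size([1, 64, 96, 96])
```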
Natural Language Inference over Interaction Space
Title | Natural Language Inference over Interaction Space |
Authors | Yichen Gong, Heng Luo, Jian Zhang |
Abstract | The Natural Language Inference (NLI) task requires an agent to determine the logical relationship between a natural language premise and a natural language hypothesis. We introduce the Interactive Inference Network (IIN), a novel class of neural network architectures that is able to achieve high-level understanding of a sentence pair by hierarchically extracting semantic features from interaction space. We show that an interaction tensor (attention weight) contains the semantic information needed to solve natural language inference, and that a denser interaction tensor contains richer semantic information. One instance of such an architecture, the Densely Interactive Inference Network (DIIN), demonstrates state-of-the-art performance on large-scale NLI corpora and a large-scale NLI-like corpus. Notably, DIIN achieves a greater than 20% error reduction on the challenging Multi-Genre NLI (MultiNLI) dataset with respect to the strongest published system. |
Tasks | Natural Language Inference, Paraphrase Identification |
Published | 2017-09-13 |
URL | http://arxiv.org/abs/1709.04348v2 |
PDF | http://arxiv.org/pdf/1709.04348v2.pdf |
PWC | https://paperswithcode.com/paper/natural-language-inference-over-interaction-1 |
Repo | https://github.com/YerevaNN/DIIN-in-Keras |
Framework | none |
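A hedged sketch of the interaction tensor at the core of DIIN: each premise position is compared with each hypothesis position via an element-wise product, yielding a (Tp, Th, d) tensor that the paper's dense feature extractor (a DenseNet) then processes. Encoding layers are omitted here.

```python
# Interaction tensor between encoded premise and hypothesis representations.
import torch

def interaction_tensor(premise: torch.Tensor, hypothesis: torch.Tensor) -> torch.Tensor:
    # premise: (Tp, d), hypothesis: (Th, d) encoded token representations
    return premise.unsqueeze(1) * hypothesis.unsqueeze(0)  # (Tp, Th, d)

p, h = torch.rand(12, 64), torch.rand(9, 64)
print(interaction_tensor(p, h).shape)  # torch.Size([12, 9, 64])
```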
Learning Credible Models
Title | Learning Credible Models |
Authors | Jiaxuan Wang, Jeeheh Oh, Haozhu Wang, Jenna Wiens |
Abstract | In many settings, it is important that a model be capable of providing reasons for its predictions (i.e., the model must be interpretable). However, the model’s reasoning may not conform with well-established knowledge. In such cases, while interpretable, the model lacks *credibility*. In this work, we formally define credibility in the linear setting and focus on techniques for learning models that are both accurate and credible. In particular, we propose a regularization penalty, expert yielded estimates (EYE), that incorporates expert knowledge about well-known relationships among covariates and the outcome of interest. We give both theoretical and empirical results comparing our proposed method to several other regularization techniques. Across a range of settings, experiments on both synthetic and real data show that models learned using the EYE penalty are significantly more credible than those learned using other penalties. Applied to a large-scale patient risk stratification task, our proposed technique results in a model whose top features overlap significantly with known clinical risk factors, while still achieving good predictive performance. |
Tasks | |
Published | 2017-11-08 |
URL | http://arxiv.org/abs/1711.03190v3 |
PDF | http://arxiv.org/pdf/1711.03190v3.pdf |
PWC | https://paperswithcode.com/paper/learning-credible-models |
Repo | https://github.com/nathanwang000/credible_learning |
Framework | pytorch |
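The exact EYE formula is given in the paper; the sketch below only illustrates the underlying idea under stated assumptions: apply a sparsity-inducing L1 term to features without expert support and a gentler L2 term to features experts already consider relevant.

```python
# Expert-guided regularisation in the spirit of EYE (not the paper's exact form).
import torch

def expert_guided_penalty(theta: torch.Tensor, known: torch.Tensor, lam: float = 0.1):
    # theta: model coefficients; known: 1.0 where experts expect relevance, else 0.0
    unknown = 1.0 - known
    return lam * ((theta * unknown).abs().sum() + (theta * known).pow(2).sum())

theta = torch.randn(5, requires_grad=True)
known = torch.tensor([1.0, 0.0, 0.0, 1.0, 0.0])
loss = theta.pow(2).mean() + expert_guided_penalty(theta, known)  # data loss + penalty
loss.backward()
```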
A Tidy Data Model for Natural Language Processing using cleanNLP
Title | A Tidy Data Model for Natural Language Processing using cleanNLP |
Authors | Taylor Arnold |
Abstract | The package cleanNLP provides a set of fast tools for converting a textual corpus into a set of normalized tables. The underlying natural language processing pipeline utilizes Stanford’s CoreNLP library, exposing a number of annotation tasks for text written in English, French, German, and Spanish. Annotators include tokenization, part of speech tagging, named entity recognition, entity linking, sentiment analysis, dependency parsing, coreference resolution, and information extraction. |
Tasks | Coreference Resolution, Dependency Parsing, Entity Linking, Named Entity Recognition, Part-Of-Speech Tagging, Sentiment Analysis, Tokenization |
Published | 2017-03-27 |
URL | http://arxiv.org/abs/1703.09570v2 |
PDF | http://arxiv.org/pdf/1703.09570v2.pdf |
PWC | https://paperswithcode.com/paper/a-tidy-data-model-for-natural-language |
Repo | https://github.com/statsmaths/cleanNLP |
Framework | none |
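cleanNLP itself is an R package; as a language-neutral illustration of the tidy data model (one normalised table per annotation layer, one row per token), here is the same shape built with pandas. The values are hardcoded for illustration and this is not cleanNLP's API.

```python
# A "tidy" token table: one row per token, one column per annotation field.
import pandas as pd

tokens = pd.DataFrame(
    [  # (doc id, sentence id, token id, word, lemma, part of speech)
        (1, 1, 1, "The",   "the",   "DET"),
        (1, 1, 2, "cats",  "cat",   "NOUN"),
        (1, 1, 3, "slept", "sleep", "VERB"),
    ],
    columns=["doc_id", "sid", "tid", "word", "lemma", "upos"],
)
print(tokens[tokens.upos == "NOUN"])  # tidy tables compose with ordinary data tools
```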
MojiTalk: Generating Emotional Responses at Scale
Title | MojiTalk: Generating Emotional Responses at Scale |
Authors | Xianda Zhou, William Yang Wang |
Abstract | Generating emotional language is a key step towards building empathetic natural language processing agents. However, a major challenge for this line of research is the lack of large-scale labeled training data, and previous studies are limited to only small sets of human annotated sentiment labels. Additionally, explicitly controlling the emotion and sentiment of generated text is also difficult. In this paper, we take a more radical approach: we exploit the idea of leveraging Twitter data that are naturally labeled with emojis. More specifically, we collect a large corpus of Twitter conversations that include emojis in the response, and assume the emojis convey the underlying emotions of the sentence. We then introduce a reinforced conditional variational encoder approach to train a deep generative model on these conversations, which allows us to use emojis to control the emotion of the generated text. Experimentally, we show in our quantitative and qualitative analyses that the proposed models can successfully generate high-quality abstractive conversation responses in accordance with designated emotions. |
Tasks | |
Published | 2017-11-11 |
URL | http://arxiv.org/abs/1711.04090v2 |
PDF | http://arxiv.org/pdf/1711.04090v2.pdf |
PWC | https://paperswithcode.com/paper/mojitalk-generating-emotional-responses-at |
Repo | https://github.com/Claude-Zhou/MojiTalk |
Framework | tf |
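A hedged sketch of the conditional-VAE core described above: the latent code is sampled with the reparameterisation trick, and training balances reconstruction against a KL term. The paper's emoji conditioning and additional reinforcement-learning stage are omitted here.

```python
# Reparameterisation trick and KL term, the two VAE pieces the model relies on.
import torch

def reparameterise(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

def kl_to_standard_normal(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    return -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())

mu, logvar = torch.zeros(4, 16), torch.zeros(4, 16)  # from a recognition network
z = reparameterise(mu, logvar)           # latent code fed to the decoder,
kl = kl_to_standard_normal(mu, logvar)   # traded off against reconstruction loss
```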