Paper Group NAWR 10
Exploring Neural Text Simplification Models
Title | Exploring Neural Text Simplification Models |
Authors | Sergiu Nisioi, Sanja Štajner, Simone Paolo Ponzetto, Liviu P. Dinu |
Abstract | We present the first attempt at using sequence to sequence neural networks to model text simplification (TS). Unlike the previously proposed automated TS systems, our neural text simplification (NTS) systems are able to simultaneously perform lexical simplification and content reduction. An extensive human evaluation of the output has shown that NTS systems achieve almost perfect grammaticality and meaning preservation of output sentences and a higher level of simplification than the state-of-the-art automated TS systems. |
Tasks | Lexical Simplification, Machine Translation, Text Simplification, Word Embeddings |
Published | 2017-07-01 |
URL | https://www.aclweb.org/anthology/P17-2014/ |
PWC | https://paperswithcode.com/paper/exploring-neural-text-simplification-models |
Repo | https://github.com/senisioi/NeuralTextSimplification |
Framework | torch |
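The paper trains attention-based encoder-decoder models (via OpenNMT) on aligned complex-simple sentence pairs. Below is a minimal PyTorch sketch of that encoder-decoder shape; the dimensions and the shared source/target embedding are illustrative assumptions, not the authors' configuration.

```python
# Minimal sketch of a seq2seq simplifier (sizes and shared embedding assumed).
import torch
import torch.nn as nn

class Seq2SeqSimplifier(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hid_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, src, tgt):
        # Encode the complex sentence; condition the decoder on the final
        # encoder state to generate the simplified sentence token by token.
        _, state = self.encoder(self.embed(src))
        dec_out, _ = self.decoder(self.embed(tgt), state)
        return self.out(dec_out)   # logits over the output vocabulary

model = Seq2SeqSimplifier(vocab_size=50000)
src = torch.randint(0, 50000, (8, 30))   # batch of complex sentences
tgt = torch.randint(0, 50000, (8, 25))   # shifted simplified references
print(model(src, tgt).shape)             # torch.Size([8, 25, 50000])
```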
End-to-End Deep Learning of Optimization Heuristics
Title | End-to-End Deep Learning of Optimization Heuristics |
Authors | Chris Cummins, Pavlos Petoumenos, Zheng Wang, Hugh Leather |
Abstract | Accurate automatic optimization heuristics are necessary for dealing with the complexity and diversity of modern hardware and software. Machine learning is a proven technique for learning such heuristics, but its success is bound by the quality of the features used. These features must be hand-crafted by developers through a combination of expert domain knowledge and trial and error. This makes the quality of the final model directly dependent on the skill and available time of the system architect. Our work introduces a better way of building heuristics. We develop a deep neural network that learns heuristics over raw code, entirely without using code features. The neural network simultaneously constructs appropriate representations of the code and learns how best to optimize, removing the need for manual feature creation. Further, we show that our neural nets can transfer learning from one optimization problem to another, improving the accuracy of new models, without the help of human experts. We compare the effectiveness of our automatically generated heuristics against ones with features hand-picked by experts. We examine two challenging tasks: predicting optimal mapping for heterogeneous parallelism and GPU thread coarsening factors. In 89% of the cases, the quality of our fully automatic heuristics matches or surpasses that of state-of-the-art predictive models using hand-crafted features, providing on average 14% and 12% more performance with no human effort expended on designing features. |
Tasks | Transfer Learning |
Published | 2017-11-02 |
URL | https://ieeexplore.ieee.org/document/8091247 |
http://homepages.inf.ed.ac.uk/hleather/publications/2017-deepopt-pact.pdf | |
PWC | https://paperswithcode.com/paper/end-to-end-deep-learning-of-optimization |
Repo | https://github.com/ChrisCummins/paper-end2end-dl |
Framework | none |
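The core idea is to replace hand-crafted program features with a network that reads the raw source token stream. Here is a hedged sketch of that shape; all layer sizes and the binary output (e.g. CPU-vs-GPU mapping) are chosen for illustration, not taken from the paper's exact architecture.

```python
# Sketch: predict an optimization decision directly from raw code tokens.
import torch
import torch.nn as nn

class HeuristicNet(nn.Module):
    def __init__(self, vocab_size=128, emb_dim=64, hid_dim=64, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hid_dim, num_layers=2, batch_first=True)
        self.head = nn.Sequential(nn.Linear(hid_dim, 32), nn.ReLU(),
                                  nn.Linear(32, n_classes))

    def forward(self, token_ids):
        # The LSTM builds its own representation of the program text ...
        _, (h, _) = self.lstm(self.embed(token_ids))
        # ... and a small head maps it to the optimization decision.
        return self.head(h[-1])

net = HeuristicNet()
kernels = torch.randint(0, 128, (4, 200))  # 4 tokenized OpenCL kernels
print(net(kernels).shape)                  # torch.Size([4, 2])
```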
LSDSem 2017: Exploring Data Generation Methods for the Story Cloze Test
Title | LSDSem 2017: Exploring Data Generation Methods for the Story Cloze Test |
Authors | Michael Bugert, Yevgeniy Puzikov, Andreas Rücklé, Judith Eckle-Kohler, Teresa Martin, Eugenio Martínez-Cámara, Daniil Sorokin, Maxime Peyrard, Iryna Gurevych |
Abstract | The Story Cloze test is a recent effort in providing a common test scenario for text understanding systems. As part of the LSDSem 2017 shared task, we present a system based on a deep learning architecture combined with a rich set of manually-crafted linguistic features. The system outperforms all known baselines for the task, suggesting that the chosen approach is promising. We additionally present two methods for generating further training data based on stories from the ROCStories corpus. |
Tasks | |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/W17-0908/ |
PWC | https://paperswithcode.com/paper/lsdsem-2017-exploring-data-generation-methods |
Repo | https://github.com/UKPLab/lsdsem2017-story-cloze |
Framework | tf |
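One data-generation strategy described in the abstract is to build extra training pairs from ROCStories itself. A toy version of that idea, assuming each story record carries a context and an ending (field names and example stories are illustrative):

```python
# Pair each context with its real ending (positive) and an ending sampled
# from a different story (negative).
import random

def generate_training_pairs(stories, seed=13):
    rng = random.Random(seed)
    pairs = []
    for i, story in enumerate(stories):
        pairs.append((story["context"], story["ending"], 1))   # right ending
        other = rng.choice([s for j, s in enumerate(stories) if j != i])
        pairs.append((story["context"], other["ending"], 0))   # wrong ending
    return pairs

stories = [
    {"context": "Tom got a new bike. He rode it every day.", "ending": "He loved it."},
    {"context": "Ann baked a cake. It burned badly.", "ending": "She ordered pizza instead."},
]
for pair in generate_training_pairs(stories):
    print(pair)
```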
ATM: A distributed, collaborative, scalable system for automated machine learning
Title | ATM: A distributed, collaborative, scalable system for automated machine learning |
Authors | Thomas Swearingen, Will Drevo, Bennett Cyphers, Alfredo Cuesta-Infante, Arun Ross, Kalyan Veeramachaneni |
Abstract | In this paper, we present Auto-Tuned Models, or ATM, a distributed, collaborative, scalable system for automated machine learning. Users of ATM can simply upload a dataset, choose a subset of modeling methods, and choose to use ATM’s hybrid Bayesian and multi-armed bandit optimization system. The distributed system works in a load balanced fashion to quickly deliver results in the form of ready-to-predict models, confusion matrices, cross-validation results, and training timings. By automating hyperparameter tuning and model selection, ATM returns the emphasis of the machine learning workflow to its most irreducible part: feature engineering. We demonstrate the usefulness of ATM on 420 datasets from OpenML and train over 3 million classifiers. Our initial results show ATM can beat human-generated solutions for 30% of the datasets, and can do so in 1/100th of the time. |
Tasks | AutoML, Feature Engineering, Hyperparameter Optimization, Model Selection |
Published | 2017-12-11 |
URL | https://ieeexplore.ieee.org/document/8257923 |
https://dai.lids.mit.edu/wp-content/uploads/2018/02/atm_IEEE_BIgData-9-1.pdf | |
PWC | https://paperswithcode.com/paper/atm-a-distributed-collaborative-scalable |
Repo | https://github.com/HDI-Project/ATM |
Framework | none |
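ATM's search combines a multi-armed bandit over modeling methods with Bayesian optimization within each method. The following toy loop illustrates only the hybrid-search idea, not ATM's actual API: a UCB1 bandit picks which method to try next, and random sampling stands in for the Bayesian hyperparameter tuner.

```python
# Toy hybrid search: UCB1 bandit over methods + stand-in tuner per method.
import math
import random

def ucb1_index(mean, count, t):
    if count == 0:
        return float("inf")            # try every arm at least once
    return mean + math.sqrt(2 * math.log(t) / count)

methods = ["svm", "random_forest", "knn"]
scores = {m: [] for m in methods}

def evaluate(method):
    # Stand-in for: sample hyperparameters, cross-validate, return a score.
    base = {"svm": 0.80, "random_forest": 0.85, "knn": 0.75}[method]
    return base + random.uniform(-0.05, 0.05)

for t in range(1, 61):
    pick = max(methods, key=lambda m: ucb1_index(
        sum(scores[m]) / len(scores[m]) if scores[m] else 0.0,
        len(scores[m]), t))
    scores[pick].append(evaluate(pick))

best = max(methods, key=lambda m: max(scores[m], default=0.0))
print(best, round(max(scores[best]), 3))
```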
Semi-supervised Structured Prediction with Neural CRF Autoencoder
Title | Semi-supervised Structured Prediction with Neural CRF Autoencoder |
Authors | Xiao Zhang, Yong Jiang, Hao Peng, Kewei Tu, Dan Goldwasser |
Abstract | In this paper we propose an end-to-end neural CRF autoencoder (NCRF-AE) model for semi-supervised learning of sequential structured prediction problems. Our NCRF-AE consists of two parts: an encoder, which is a CRF model enhanced by deep neural networks, and a decoder, which is a generative model trying to reconstruct the input. Our model has a unified structure with different loss functions for labeled and unlabeled data with shared parameters. We developed a variation of the EM algorithm for optimizing both the encoder and the decoder simultaneously by decoupling their parameters. Our experimental results on the part-of-speech (POS) tagging task over eight different languages show that our model can outperform competitive systems in both supervised and semi-supervised scenarios. |
Tasks | Part-Of-Speech Tagging, Structured Prediction |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/D17-1179/ |
PWC | https://paperswithcode.com/paper/semi-supervised-structured-prediction-with |
Repo | https://github.com/cosmozhang/NCRF-AE |
Framework | none |
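A minimal sketch of the NCRF-AE objective's labeled case, assuming illustrative sizes: a neural emission scorer feeds a linear-chain CRF encoder p(y|x), and a categorical decoder p(x|y) reconstructs the words. The unlabeled case, which marginalizes over y with the paper's EM-style procedure, is omitted here.

```python
import torch
import torch.nn as nn

L, V = 5, 100   # label-set size, vocabulary size
emit = nn.Sequential(nn.Embedding(V, 32), nn.Linear(32, L))   # encoder scores
trans = nn.Parameter(torch.zeros(L, L))                       # CRF transitions
decode = nn.Parameter(torch.zeros(L, V))                      # p(x|y) logits

def log_partition(scores):
    # Forward algorithm over one sentence; scores has shape (T, L).
    alpha = scores[0]
    for t in range(1, scores.size(0)):
        alpha = scores[t] + torch.logsumexp(alpha.unsqueeze(1) + trans, dim=0)
    return torch.logsumexp(alpha, dim=0)

def labeled_loss(x, y):
    scores = emit(x)                                        # (T, L)
    gold = scores[torch.arange(len(x)), y].sum() + trans[y[:-1], y[1:]].sum()
    recon = decode.log_softmax(dim=1)[y, x].sum()           # decoder term
    return -(gold - log_partition(scores)) - recon          # -log p(y|x)p(x|y)

x = torch.randint(0, V, (7,))   # a 7-word sentence
y = torch.randint(0, L, (7,))   # its gold tags
print(labeled_loss(x, y))
```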
Aggregating and Predicting Sequence Labels from Crowd Annotations
Title | Aggregating and Predicting Sequence Labels from Crowd Annotations |
Authors | An Thanh Nguyen, Byron Wallace, Junyi Jessy Li, Ani Nenkova, Matthew Lease |
Abstract | Despite sequences being core to NLP, scant work has considered how to handle noisy sequence labels from multiple annotators for the same text. Given such annotations, we consider two complementary tasks: (1) aggregating sequential crowd labels to infer a best single set of consensus annotations; and (2) using crowd annotations as training data for a model that can predict sequences in unannotated text. For aggregation, we propose a novel Hidden Markov Model variant. To predict sequences in unannotated text, we propose a neural approach using Long Short-Term Memory (LSTM). We evaluate a suite of methods across two different applications and text genres: Named-Entity Recognition in news articles and Information Extraction from biomedical abstracts. Results show improvement over strong baselines. Our source code and data are available online. |
Tasks | Named Entity Recognition, Part-Of-Speech Tagging |
Published | 2017-07-01 |
URL | https://www.aclweb.org/anthology/P17-1028/ |
PWC | https://paperswithcode.com/paper/aggregating-and-predicting-sequence-labels |
Repo | https://github.com/thanhan/seqcrowd-acl17 |
Framework | none |
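The aggregation model treats the true tags as hidden states of an HMM and each worker's noisy labels as emissions. The toy below decodes a consensus sequence with Viterbi; a single shared confusion matrix and a uniform transition prior stand in for the per-worker reliabilities and transition probabilities the paper estimates from data.

```python
# Toy HMM-style consensus decoding over crowd labels.
import numpy as np

L = 3                                   # tag set, e.g. {O, B, I}
trans = np.full((L, L), 1.0 / L)        # uniform transition prior (assumed)
confusion = np.full((L, L), 0.1) + 0.7 * np.eye(L)  # shared worker reliability

def viterbi_consensus(worker_labels):
    """worker_labels: (n_workers, T) array of noisy tag ids."""
    n_workers, T = worker_labels.shape
    log_emit = np.zeros((T, L))
    for t in range(T):
        for true in range(L):           # workers independent given true tag
            log_emit[t, true] = np.log(confusion[true, worker_labels[:, t]]).sum()
    dp = log_emit[0].copy()
    back = np.zeros((T, L), dtype=int)
    for t in range(1, T):
        cand = dp[:, None] + np.log(trans)
        back[t] = cand.argmax(axis=0)
        dp = cand.max(axis=0) + log_emit[t]
    path = [int(dp.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

labels = np.array([[0, 1, 1, 0], [0, 1, 2, 0], [0, 1, 1, 0]])  # 3 workers
print(viterbi_consensus(labels))   # consensus tags, e.g. [0, 1, 1, 0]
```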
Deep Pyramid Convolutional Neural Networks for Text Categorization
Title | Deep Pyramid Convolutional Neural Networks for Text Categorization |
Authors | Rie Johnson, Tong Zhang |
Abstract | This paper proposes a low-complexity word-level deep convolutional neural network (CNN) architecture for text categorization that can efficiently represent long-range associations in text. In the literature, several deep and complex neural networks have been proposed for this task, assuming availability of relatively large amounts of training data. However, the associated computational complexity increases as the networks go deeper, which poses serious challenges in practical applications. Moreover, it was shown recently that shallow word-level CNNs are more accurate and much faster than the state-of-the-art very deep nets such as character-level CNNs even in the setting of large training data. Motivated by these findings, we carefully studied deepening of word-level CNNs to capture global representations of text, and found a simple network architecture with which the best accuracy can be obtained by increasing the network depth without increasing computational cost by much. We call it deep pyramid CNN. The proposed model with 15 weight layers outperforms the previous best models on six benchmark datasets for sentiment classification and topic categorization. |
Tasks | Sentiment Analysis, Text Classification |
Published | 2017-07-01 |
URL | https://www.aclweb.org/anthology/P17-1052/ |
PWC | https://paperswithcode.com/paper/deep-pyramid-convolutional-neural-networks |
Repo | https://github.com/Cheneng/DPCNN |
Framework | pytorch |
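The "pyramid" comes from repeatedly halving the sequence length with stride-2 pooling while keeping a fixed number of feature maps, each stage wrapped in a pre-activation residual block. A PyTorch sketch of that building block (the 250 channels follow the paper; the plain embedding standing in for the region embedding is a simplification):

```python
import torch
import torch.nn as nn

class DPCNNBlock(nn.Module):
    """Downsample by 2, then two convolutions with a residual shortcut."""
    def __init__(self, ch=250):
        super().__init__()
        self.pool = nn.MaxPool1d(3, stride=2, padding=1)
        self.conv = nn.Sequential(
            nn.ReLU(), nn.Conv1d(ch, ch, 3, padding=1),
            nn.ReLU(), nn.Conv1d(ch, ch, 3, padding=1))

    def forward(self, x):
        x = self.pool(x)          # halve the sequence length ("pyramid")
        return x + self.conv(x)   # pre-activation residual connection

class DPCNN(nn.Module):
    def __init__(self, vocab, n_classes, ch=250, n_blocks=6):
        super().__init__()
        self.embed = nn.Embedding(vocab, ch)      # stand-in region embedding
        self.blocks = nn.Sequential(*[DPCNNBlock(ch) for _ in range(n_blocks)])
        self.fc = nn.Linear(ch, n_classes)

    def forward(self, tokens):
        x = self.embed(tokens).transpose(1, 2)    # (B, ch, T)
        x = self.blocks(x)
        return self.fc(x.max(dim=2).values)       # global max pooling

logits = DPCNN(30000, 5)(torch.randint(0, 30000, (2, 256)))
print(logits.shape)  # torch.Size([2, 5])
```

Because each block halves the length, total computation stays roughly constant as depth grows, which is the paper's key efficiency argument.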
Machine Learning of Accurate Energy-conserving Molecular Force Fields
Title | Machine Learning of Accurate Energy-conserving Molecular Force Fields |
Authors | Stefan Chmiela, Alexandre Tkatchenko, Huziel E. Sauceda, Igor Poltavsky, Kristof T. Schütt, Klaus-Robert Müller |
Abstract | Using conservation of energy—a fundamental property of closed classical and quantum mechanical systems—we develop an efficient gradient-domain machine learning (GDML) approach to construct accurate molecular force fields using a restricted number of samples from ab initio molecular dynamics (AIMD) trajectories. The GDML implementation is able to reproduce global potential energy surfaces of intermediate-sized molecules with an accuracy of 0.3 kcal mol⁻¹ for energies and 1 kcal mol⁻¹ Å⁻¹ for atomic forces using only 1000 conformational geometries for training. We demonstrate this accuracy for AIMD trajectories of molecules, including benzene, toluene, naphthalene, ethanol, uracil, and aspirin. The challenge of constructing conservative force fields is accomplished in our work by learning in a Hilbert space of vector-valued functions that obey the law of energy conservation. The GDML approach enables quantitative molecular dynamics simulations for molecules at a fraction of the cost of explicit AIMD calculations, thereby allowing the construction of efficient force fields with the accuracy and transferability of high-level ab initio methods. |
Tasks | MD17 dataset |
Published | 2017-05-05 |
URL | https://advances.sciencemag.org/content/3/5/e1603015 |
https://advances.sciencemag.org/content/3/5/e1603015/tab-pdf | |
PWC | https://paperswithcode.com/paper/machine-learning-of-accurate-energy |
Repo | https://github.com/stefanch/sGDML |
Framework | pytorch |
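The "gradient-domain" trick is to fit a kernel model directly to forces so that the predicted force field is exactly the negative gradient of a single energy surface, hence energy-conserving by construction. The 1-D toy below illustrates only that idea; the real GDML uses a matrix-valued Hessian kernel over molecular geometries and a GP-style solve, not this least-squares toy.

```python
import numpy as np

sigma, lam = 0.5, 1e-8
x_train = np.linspace(-2, 2, 25)
f_train = -(4 * x_train**3 - 4 * x_train)   # forces of E(x) = x^4 - 2x^2

def k(a, b):        # RBF kernel matrix
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * sigma**2))

def dk_da(a, b):    # derivative of k w.r.t. its first argument
    return -(a[:, None] - b[None, :]) / sigma**2 * k(a, b)

# Model E_hat(x) = sum_j alpha_j k(x, x_j); fit alpha to the observed forces,
# so the predicted force -dE_hat/dx integrates to one consistent energy.
Phi = -dk_da(x_train, x_train)
alpha = np.linalg.solve(Phi.T @ Phi + lam * np.eye(len(x_train)),
                        Phi.T @ f_train)

x_test = np.array([0.3, 1.0, 1.5])
print(np.round(-dk_da(x_test, x_train) @ alpha, 3))   # predicted forces
print(np.round(-(4 * x_test**3 - 4 * x_test), 3))     # true forces
```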
RPAN: An End-to-End Recurrent Pose-Attention Network for Action Recognition in Videos
Title | RPAN: An End-to-End Recurrent Pose-Attention Network for Action Recognition in Videos |
Authors | Wenbin Du, Yali Wang, Yu Qiao |
Abstract | Recent studies demonstrate the effectiveness of Recurrent Neural Networks (RNNs) for action recognition in videos. However, previous works mainly utilize video-level category as supervision to train RNNs, which may prevent RNNs from learning complex motion structures along time. In this paper, we propose a recurrent pose-attention network (RPAN) to address this challenge, where we introduce a novel pose-attention mechanism to adaptively learn pose-related features at every time-step action prediction of RNNs. More specifically, we make three main contributions in this paper. Firstly, unlike previous works on pose-related action recognition, our RPAN is an end-to-end recurrent network which can exploit important spatial-temporal evolutions of human pose to assist action recognition in a unified framework. Secondly, instead of learning individual human-joint features separately, our pose-attention mechanism learns robust human-part features by sharing attention parameters partially on the semantically-related human joints. These human-part features are then fed into the human-part pooling layer to construct a highly-discriminative pose-related representation for temporal action modeling. Thirdly, one important byproduct of our RPAN is pose estimation in videos, which can be used for coarse pose annotation in action videos. We evaluate the proposed RPAN quantitatively and qualitatively on two popular benchmarks, i.e., Sub-JHMDB and PennAction. Experimental results show that RPAN outperforms the recent state-of-the-art methods on these challenging datasets. |
Tasks | Action Recognition In Videos, Pose Estimation, Skeleton Based Action Recognition |
Published | 2017-10-22 |
URL | https://doi.org/10.1109/ICCV.2017.402 |
PWC | https://paperswithcode.com/paper/rpan-an-end-to-end-recurrent-pose-attention-1 |
Repo | https://github.com/agethen/RPAN |
Framework | tf |
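A rough PyTorch sketch of the pose-attention step: the previous LSTM state attends over a frame's conv features, attended features are pooled per human part, and the concatenation drives the LSTM's next action prediction. All sizes are illustrative, and one attention head per body part is a simplification of the paper's parameter sharing across semantically related joints.

```python
import torch
import torch.nn as nn

class PoseAttention(nn.Module):
    def __init__(self, feat_ch=512, hid=256, n_parts=5):
        super().__init__()
        self.attn = nn.ModuleList(
            [nn.Linear(hid + feat_ch, 1) for _ in range(n_parts)])
        self.lstm = nn.LSTMCell(feat_ch * n_parts, hid)

    def forward(self, feat_map, state):
        # feat_map: (B, H*W, C) conv features of one video frame
        h, c = state
        parts = []
        for head in self.attn:                 # one attention head per part
            q = h.unsqueeze(1).expand(-1, feat_map.size(1), -1)
            a = torch.softmax(head(torch.cat([q, feat_map], -1)), dim=1)
            parts.append((a * feat_map).sum(dim=1))   # part-pooled feature
        return self.lstm(torch.cat(parts, dim=-1), (h, c))

B, HW, C, hid = 2, 49, 512, 256
cell = PoseAttention(C, hid)
state = (torch.zeros(B, hid), torch.zeros(B, hid))
state = cell(torch.randn(B, HW, C), state)   # one recurrent time step
print(state[0].shape)  # torch.Size([2, 256])
```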
On-demand Injection of Lexical Knowledge for Recognising Textual Entailment
Title | On-demand Injection of Lexical Knowledge for Recognising Textual Entailment |
Authors | Pascual Martínez-Gómez, Koji Mineshima, Yusuke Miyao, Daisuke Bekki |
Abstract | We approach the recognition of textual entailment using logical semantic representations and a theorem prover. In this setup, lexical divergences that preserve semantic entailment between the source and target texts need to be explicitly stated. However, recognising subsentential semantic relations is not trivial. We address this problem by monitoring the proof of the theorem and detecting unprovable sub-goals that share predicate arguments with logical premises. If a linguistic relation exists, then an appropriate axiom is constructed on-demand and the theorem proving continues. Experiments show that this approach is effective and precise, producing a system that outperforms other logic-based systems and is competitive with state-of-the-art statistical methods. |
Tasks | Automated Theorem Proving, Information Retrieval, Natural Language Inference, Question Answering |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/E17-1067/ |
PWC | https://paperswithcode.com/paper/on-demand-injection-of-lexical-knowledge-for |
Repo | https://github.com/mynlp/ccg2lambda |
Framework | none |
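The control flow is: attempt the proof, inspect unprovable sub-goals that share predicate arguments with the premises, construct lexical axioms on demand, and retry. The self-contained toy below mimics only that loop; the string "predicates", the mini lexicon, and the trivial prover are illustrative stand-ins for ccg2lambda's logical forms, WordNet lookups, and the Coq prover.

```python
# Toy of on-demand axiom injection for entailment.
LEXICON = {("puppy", "dog"): "hypernym", ("run", "move"): "hypernym"}

def try_prove(premise_preds, goal_preds, axioms):
    # A goal predicate is "proved" if stated or reachable via an axiom.
    entailed = set(premise_preds) | {b for (a, b) in axioms if a in premise_preds}
    return [g for g in goal_preds if g not in entailed]   # unprovable subgoals

def prove_with_axiom_injection(premise_preds, goal_preds, max_rounds=5):
    axioms = set()
    for _ in range(max_rounds):
        open_goals = try_prove(premise_preds, goal_preds, axioms)
        if not open_goals:
            return True, axioms
        new = {(p, g) for g in open_goals for p in premise_preds
               if (p, g) in LEXICON}          # construct axioms on demand
        if not new - axioms:
            return False, axioms              # no usable lexical knowledge
        axioms |= new
    return False, axioms

# "A puppy runs" entails "A dog moves" once hypernym axioms are injected.
print(prove_with_axiom_injection(["puppy", "run"], ["dog", "move"]))
```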
Which is the Effective Way for Gaokao: Information Retrieval or Neural Networks?
Title | Which is the Effective Way for Gaokao: Information Retrieval or Neural Networks? |
Authors | Shangmin Guo, Xiangrong Zeng, Shizhu He, Kang Liu, Jun Zhao |
Abstract | As one of the most important tests in China, the Gaokao is designed to be difficult enough to distinguish excellent high school students. In this work, we detail the Gaokao History Multiple Choice Questions (GKHMC) and propose two different approaches to address them using various resources. One approach is based on an entity search technique (IR approach); the other is based on text entailment, where we specifically employ deep neural networks (NN approach). Experiments on our collected real Gaokao questions show that the two approaches are good at different categories of questions: the IR approach performs much better on entity questions (EQs), while the NN approach shows its advantage on sentence questions (SQs). We achieve state-of-the-art performance and show that it is indispensable to apply a hybrid method when participating in real-world tests. |
Tasks | Information Retrieval, Question Answering, Reading Comprehension |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/E17-1011/ |
PWC | https://paperswithcode.com/paper/which-is-the-effective-way-for-gaokao |
Repo | https://github.com/IACASNLPIR/GKHMC |
Framework | none |
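A toy router reflecting the paper's finding: send entity questions (short, noun-phrase options) to the IR system and sentence questions to the neural entailment model. The length threshold and both scoring backends are stand-ins, not the paper's components.

```python
def is_entity_question(options, max_entity_len=4):
    # Heuristic stand-in for the paper's question-type distinction.
    return all(len(opt.split()) <= max_entity_len for opt in options)

def answer(question, options, ir_system, nn_system):
    backend = ir_system if is_entity_question(options) else nn_system
    scores = [backend(question, opt) for opt in options]
    return max(range(len(options)), key=scores.__getitem__)

# Stand-in scorers: word overlap for IR, a dummy entailment score for NN.
ir = lambda q, o: len(set(q.lower().split()) & set(o.lower().split()))
nn = lambda q, o: float(len(o))  # placeholder for an entailment network

q = "Which dynasty built the Grand Canal?"
opts = ["Sui dynasty", "Tang dynasty", "Song dynasty", "Ming dynasty"]
print(opts[answer(q, opts, ir, nn)])
```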
Beyond Filters: Compact Feature Map for Portable Deep Model
Title | Beyond Filters: Compact Feature Map for Portable Deep Model |
Authors | Yunhe Wang, Chang Xu, Chao Xu, Dacheng Tao |
Abstract | Convolutional neural networks (CNNs) have shown extraordinary performance in a number of applications, but they are usually heavy in design for the sake of accuracy. Beyond compressing the filters in CNNs, this paper focuses on the redundancy in the feature maps derived from the large number of filters in a layer. We propose to extract an intrinsic representation of the feature maps while preserving the discriminability of the features. A circulant matrix is employed to formulate the feature map transformation, which only requires O(d log d) computational complexity to embed a d-dimensional feature map. The filter is then re-configured to establish the mapping from the original input to the new compact feature map, and the resulting network can preserve the intrinsic information of the original network with significantly fewer parameters, which not only decreases the online memory required for deploying the CNN but also accelerates computation. Experiments on benchmark image datasets demonstrate the superiority of the proposed algorithm over state-of-the-art methods. |
Tasks | |
Published | 2017-08-01 |
URL | https://icml.cc/Conferences/2017/Schedule?showEvent=466 |
http://proceedings.mlr.press/v70/wang17m/wang17m.pdf | |
PWC | https://paperswithcode.com/paper/beyond-filters-compact-feature-map-for |
Repo | https://github.com/YunheWang/RedCNN |
Framework | none |
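The O(d log d) cost comes from a standard identity: multiplying by a circulant matrix equals circular convolution, which can be done element-wise in the Fourier domain. A minimal NumPy/SciPy check of that identity (the vectors here are random stand-ins for feature maps):

```python
import numpy as np
from scipy.linalg import circulant

def circulant_project(x, c):
    """C @ x for C = circulant(c) (first column c), in O(d log d) via FFT."""
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

d = 8
rng = np.random.default_rng(0)
c, x = rng.standard_normal(d), rng.standard_normal(d)

# Compare against the explicit O(d^2) matrix-vector product.
print(np.allclose(circulant(c) @ x, circulant_project(x, c)))  # True
```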
End-to-End System for Bacteria Habitat Extraction
Title | End-to-End System for Bacteria Habitat Extraction |
Authors | Farrokh Mehryary, Kai Hakala, Suwisa Kaewphan, Jari Björne, Tapio Salakoski, Filip Ginter |
Abstract | We introduce an end-to-end system capable of named-entity detection, normalization and relation extraction for extracting information about bacteria and their habitats from biomedical literature. Our system is based on deep learning, CRF classifiers and vector space models. We train and evaluate the system on the BioNLP 2016 Shared Task Bacteria Biotope data. The official evaluation shows that the joint performance of our entity detection and relation extraction models outperforms the winning team of the Shared Task by 19 percentage points on F1-score, establishing a new top score for the task. We also achieve state-of-the-art results in the normalization task. Our system is open source and freely available at https://github.com/TurkuNLP/BHE. |
Tasks | Named Entity Recognition, Relation Extraction |
Published | 2017-08-01 |
URL | https://www.aclweb.org/anthology/W17-2310/ |
PWC | https://paperswithcode.com/paper/end-to-end-system-for-bacteria-habitat |
Repo | https://github.com/TurkuNLP/BHE |
Framework | none |
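A toy of the normalization step alone: map a detected habitat mention to the nearest ontology concept by cosine similarity in a vector space. Both the concept list and the encoder below are illustrative stand-ins for the paper's ontology and learned vector space models.

```python
import numpy as np

def embed(text, dim=50):
    seed = sum(text.encode())                    # toy deterministic encoder
    return np.random.default_rng(seed).standard_normal(dim)

concepts = ["gut", "soil", "dairy product"]      # illustrative habitat concepts
concept_vecs = {c: embed(c) for c in concepts}

def normalize(mention):
    # Pick the concept whose embedding is most similar to the mention's.
    v = embed(mention)
    cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(concepts, key=lambda c: cos(v, concept_vecs[c]))

print(normalize("human intestinal tract"))
```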
UD Annotatrix: An annotation tool for Universal Dependencies
Title | UD Annotatrix: An annotation tool for Universal Dependencies |
Authors | Francis M. Tyers, Mariya Sheyanova, Jonathan North Washington |
Abstract | |
Tasks | |
Published | 2017-01-01 |
URL | https://www.aclweb.org/anthology/W17-7604/ |
PWC | https://paperswithcode.com/paper/ud-annotatrix-an-annotation-tool-for |
Repo | https://github.com/jonorthwash/ud-annotatrix |
Framework | none |
Deep IV: A Flexible Approach for Counterfactual Prediction
Title | Deep IV: A Flexible Approach for Counterfactual Prediction |
Authors | Jason Hartford, Greg Lewis, Kevin Leyton-Brown, Matt Taddy |
Abstract | Counterfactual prediction requires understanding causal relationships between so-called treatment and outcome variables. This paper provides a recipe for augmenting deep learning methods to accurately characterize such relationships in the presence of instrumental variables (IVs) – sources of treatment randomization that are conditionally independent of the outcomes. Our IV specification resolves into two prediction tasks that can be solved with deep neural nets: a first-stage network for treatment prediction and a second-stage network whose loss function involves integration over the conditional treatment distribution. This Deep IV framework allows us to take advantage of off-the-shelf supervised learning techniques to estimate causal effects by adapting the loss function. Experiments show that it outperforms existing machine learning approaches. |
Tasks | |
Published | 2017-08-01 |
URL | https://icml.cc/Conferences/2017/Schedule?showEvent=883 |
http://proceedings.mlr.press/v70/hartford17a/hartford17a.pdf | |
PWC | https://paperswithcode.com/paper/deep-iv-a-flexible-approach-for |
Repo | https://github.com/jhartford/DeepIV |
Framework | none |
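A compact sketch of the two-stage recipe, with toy sizes: a Gaussian first stage stands in for the paper's mixture density network (assumed already fitted by maximum likelihood), and the second stage regresses the outcome on a Monte Carlo estimate of the integral of h over the conditional treatment distribution.

```python
import torch
import torch.nn as nn

treat_net = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 2))    # -> (mu, log_sigma)
outcome_net = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1))  # h(t, x)

def second_stage_loss(x, z, y, n_samples=8):
    # Sample treatments from the fitted first stage p(t | x, z) ...
    mu, log_sigma = treat_net(torch.cat([x, z], -1)).chunk(2, dim=-1)
    t = mu + log_sigma.exp() * torch.randn(n_samples, *mu.shape)
    # ... and regress y on the Monte Carlo estimate of E[h(t, x)].
    h = outcome_net(torch.cat([t, x.expand(n_samples, *x.shape)], -1))
    return ((y - h.mean(dim=0)) ** 2).mean()

x = torch.randn(32, 1)   # observed covariates
z = torch.randn(32, 1)   # instrument
y = torch.randn(32, 1)   # outcome
print(second_stage_loss(x, z, y))
```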