Paper Group NANR 224
Clustering Convolutional Kernels to Compress Deep Neural Networks. EmojiIt at SemEval-2018 Task 2: An Effective Attention-Based Recurrent Neural Network Model for Emoji Prediction with Characters Gated Words. TDNN: A Two-stage Deep Neural Network for Prompt-independent Automated Essay Scoring. Translating a Math Word Problem to a Expression Tree. N …
Clustering Convolutional Kernels to Compress Deep Neural Networks
Title | Clustering Convolutional Kernels to Compress Deep Neural Networks |
Authors | Sanghyun Son, Seungjun Nah, Kyoung Mu Lee |
Abstract | In this paper, we propose a novel method to compress CNNs by reconstructing the network from a small set of spatial convolution kernels. Starting from a pre-trained model, we extract representative 2D kernel centroids using k-means clustering. Each centroid replaces the corresponding kernels of the same cluster, and we use indexed representations instead of saving whole kernels. Kernels in the same cluster share their weights, and we fine-tune the model while keeping the compressed state. Furthermore, we also suggest an efficient way of removing redundant calculations in the compressed convolutional layers. We experimentally show that our technique works well without harming the accuracy of widely-used CNNs. Also, our ResNet-18 even outperforms its uncompressed counterpart at ILSVRC2012 classification task with over 10x compression ratio. |
Tasks | |
Published | 2018-09-01 |
URL | http://openaccess.thecvf.com/content_ECCV_2018/html/Sanghyun_Son_Clustering_Kernels_for_ECCV_2018_paper.html |
http://openaccess.thecvf.com/content_ECCV_2018/papers/Sanghyun_Son_Clustering_Kernels_for_ECCV_2018_paper.pdf | |
PWC | https://paperswithcode.com/paper/clustering-convolutional-kernels-to-compress |
Repo | |
Framework | |
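The abstract above describes replacing each 2D kernel with a k-means centroid and storing an index instead of the weights. A minimal sketch of that idea follows, in pure Python with no fine-tuning step. The function names (`kmeans`, `compress`, `reconstruct`), the seeded initialisation, and the fixed iteration count are illustrative assumptions, not the authors' implementation.

```python
import random


def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd's k-means on lists of floats (hypothetical helper)."""
    rng = random.Random(seed)
    # Assumption: initialise centroids from k randomly chosen input vectors.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assignment = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, v in enumerate(vectors):
            assignment[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
            )
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [vectors[i] for i, a in enumerate(assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment


def compress(kernels, k):
    """Return (centroids, index): one shared centroid id per flattened kernel."""
    return kmeans(kernels, k)


def reconstruct(centroids, index):
    """Rebuild the layer by looking each kernel up in the centroid table."""
    return [centroids[i] for i in index]


# Toy layer: 64 flattened 3x3 kernels with random weights.
rng = random.Random(1)
kernels = [[rng.uniform(-1, 1) for _ in range(9)] for _ in range(64)]
centroids, index = compress(kernels, k=4)
rebuilt = reconstruct(centroids, index)
```

In this toy setting the layer is stored as 4 centroids of 9 weights plus 64 small integer indices, rather than 64 full kernels.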
EmojiIt at SemEval-2018 Task 2: An Effective Attention-Based Recurrent Neural Network Model for Emoji Prediction with Characters Gated Words
Title | EmojiIt at SemEval-2018 Task 2: An Effective Attention-Based Recurrent Neural Network Model for Emoji Prediction with Characters Gated Words |
Authors | Shiyun Chen, Maoquan Wang, Liang He |
Abstract | This paper presents our single model for Subtask 1 of SemEval 2018 Task 2: Emoji Prediction in English. To predict the emoji that may be contained in a tweet, the basic model we use is an attention-based recurrent neural network, which has achieved satisfactory performance in natural language processing. Because the text comes from social media and contains many discrepant abbreviations and online terms, we also combine word-level and character-level word vector embeddings to better handle words that do not appear in the vocabulary. Our single model achieved a 29.50% Macro F-score on the test data and ranks 9th among 48 teams. |
Tasks | |
Published | 2018-06-01 |
URL | https://www.aclweb.org/anthology/S18-1066/ |
https://www.aclweb.org/anthology/S18-1066 | |
PWC | https://paperswithcode.com/paper/emojiit-at-semeval-2018-task-2-an-effective |
Repo | |
Framework | |
TDNN: A Two-stage Deep Neural Network for Prompt-independent Automated Essay Scoring
Title | TDNN: A Two-stage Deep Neural Network for Prompt-independent Automated Essay Scoring |
Authors | Cancan Jin, Ben He, Kai Hui, Le Sun |
Abstract | Existing automated essay scoring (AES) models rely on rated essays for the target prompt as training data. Despite their successes in prompt-dependent AES, how to effectively predict essay ratings under a prompt-independent setting remains a challenge, where the rated essays for the target prompt are not available. To close this gap, a two-stage deep neural network (TDNN) is proposed. In particular, in the first stage, using the rated essays for non-target prompts as the training data, a shallow model is learned to select essays with an extreme quality for the target prompt, serving as pseudo training data; in the second stage, an end-to-end hybrid deep model is proposed to learn a prompt-dependent rating model consuming the pseudo training data from the first step. Evaluation of the proposed TDNN on the standard ASAP dataset demonstrates a promising improvement for the prompt-independent AES task. |
Tasks | |
Published | 2018-07-01 |
URL | https://www.aclweb.org/anthology/P18-1100/ |
https://www.aclweb.org/anthology/P18-1100 | |
PWC | https://paperswithcode.com/paper/tdnn-a-two-stage-deep-neural-network-for |
Repo | |
Framework | |
Translating a Math Word Problem to a Expression Tree
Title | Translating a Math Word Problem to a Expression Tree |
Authors | Lei Wang, Yan Wang, Deng Cai, Dongxiang Zhang, Xiaojiang Liu |
Abstract | Sequence-to-sequence (SEQ2SEQ) models have been successfully applied to automatic math word problem solving. Despite their simplicity, a drawback remains: a math word problem can be correctly solved by more than one equation. This non-deterministic transduction harms the performance of maximum likelihood estimation. In this paper, by considering the uniqueness of the expression tree, we propose an equation normalization method to normalize the duplicated equations. Moreover, we analyze the performance of three popular SEQ2SEQ models on math word problem solving. We find that each model has its own specialty in solving problems; consequently, an ensemble model is proposed to combine their advantages. Experiments on the Math23K dataset show that the ensemble model with equation normalization significantly outperforms the previous state-of-the-art methods. |
Tasks | Machine Translation, Math Word Problem Solving, Semantic Parsing |
Published | 2018-10-01 |
URL | https://www.aclweb.org/anthology/D18-1132/ |
https://www.aclweb.org/anthology/D18-1132 | |
PWC | https://paperswithcode.com/paper/translating-a-math-word-problem-to-a |
Repo | |
Framework | |
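The abstract above notes that one problem can be solved by several equivalent equations, and that mapping them to a unique expression tree removes the duplication. The sketch below illustrates one simplified form of that idea: operands of commutative operators are put in a fixed order so that `n1 + n2` and `n2 + n1` yield the same tree. The sorting rule and the names `to_tree` and `normalize` are assumptions made for illustration; the paper's actual normalization rules are not given in the abstract, and this sketch does not handle associativity.

```python
import ast

# Operators whose operand order does not affect the result.
COMMUTATIVE = {ast.Add: "+", ast.Mult: "*"}
# Operators whose operand order must be preserved.
NON_COMMUTATIVE = {ast.Sub: "-", ast.Div: "/"}


def to_tree(node):
    """Convert a parsed arithmetic expression into a canonical nested tuple."""
    if isinstance(node, ast.Expression):
        return to_tree(node.body)
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant):
        return repr(node.value)
    if isinstance(node, ast.BinOp):
        left, right = to_tree(node.left), to_tree(node.right)
        op_type = type(node.op)
        if op_type in COMMUTATIVE:
            # Assumed rule: order commutative operands by their textual form.
            left, right = sorted([left, right], key=repr)
            return (COMMUTATIVE[op_type], left, right)
        if op_type in NON_COMMUTATIVE:
            return (NON_COMMUTATIVE[op_type], left, right)
    raise ValueError("unsupported expression element")


def normalize(equation):
    """Parse an equation's right-hand side and return its canonical tree."""
    return to_tree(ast.parse(equation, mode="eval"))


tree_a = normalize("n1 + n2 * n3")
tree_b = normalize("n3 * n2 + n1")
```

Here `tree_a` and `tree_b` are identical, so a model trained on normalized targets would see a single output for both surface forms.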
Neural Math Word Problem Solver with Reinforcement Learning
Title | Neural Math Word Problem Solver with Reinforcement Learning |
Authors | Danqing Huang, Jing Liu, Chin-Yew Lin, Jian Yin |
Abstract | Sequence-to-sequence models have been applied to solve math word problems. The model takes math problem descriptions as input and generates equations as output. The advantage of the sequence-to-sequence model is that it requires no feature engineering and can generate equations that do not exist in the training data. However, our experimental analysis reveals that this model suffers from two shortcomings: (1) it generates spurious numbers; (2) it generates numbers at wrong positions. In this paper, we propose incorporating a copy and alignment mechanism into the sequence-to-sequence model (namely CASS) to address these shortcomings. To train our model, we apply reinforcement learning to directly optimize the solution accuracy. It overcomes the "train-test discrepancy" issue of maximum likelihood estimation, which uses the surrogate objective of maximizing equation likelihood during training while the evaluation metric is solution accuracy (non-differentiable) at test time. Furthermore, to explore the effectiveness of our neural model, we use our model output as a feature and incorporate it into the feature-based model. Experimental results show that (1) the copy and alignment mechanism is effective in addressing the two issues; (2) reinforcement learning leads to better performance than maximum likelihood on this task; (3) our neural model is complementary to the feature-based model, and their combination significantly outperforms the state-of-the-art results. |
Tasks | Feature Engineering, Math Word Problem Solving |
Published | 2018-08-01 |
URL | https://www.aclweb.org/anthology/C18-1018/ |
https://www.aclweb.org/anthology/C18-1018 | |
PWC | https://paperswithcode.com/paper/neural-math-word-problem-solver-with |
Repo | |
Framework | |
Using Intermediate Representations to Solve Math Word Problems
Title | Using Intermediate Representations to Solve Math Word Problems |
Authors | Danqing Huang, Jin-Ge Yao, Chin-Yew Lin, Qingyu Zhou, Jian Yin |
Abstract | To solve math word problems, previous statistical approaches attempt at learning a direct mapping from a problem description to its corresponding equation system. However, such mappings do not include the information of a few higher-order operations that cannot be explicitly represented in equations but are required to solve the problem. The gap between natural language and equations makes it difficult for a learned model to generalize from limited data. In this work we present an intermediate meaning representation scheme that tries to reduce this gap. We use a sequence-to-sequence model with a novel attention regularization term to generate the intermediate forms, then execute them to obtain the final answers. Since the intermediate forms are latent, we propose an iterative labeling framework for learning by leveraging supervision signals from both equations and answers. Our experiments show using intermediate forms outperforms directly predicting equations. |
Tasks | Math Word Problem Solving |
Published | 2018-07-01 |
URL | https://www.aclweb.org/anthology/P18-1039/ |
https://www.aclweb.org/anthology/P18-1039 | |
PWC | https://paperswithcode.com/paper/using-intermediate-representations-to-solve |
Repo | |
Framework | |
DenseASPP for Semantic Segmentation in Street Scenes
Title | DenseASPP for Semantic Segmentation in Street Scenes |
Authors | Maoke Yang, Kun Yu, Chi Zhang, Zhiwei Li, Kuiyuan Yang |
Abstract | Semantic image segmentation is a basic street scene understanding task in autonomous driving, where each pixel in a high resolution image is categorized into a set of semantic labels. Unlike other scenarios, objects in autonomous driving scene exhibit very large scale changes, which poses great challenges for high-level feature representation in a sense that multi-scale information must be correctly encoded. To remedy this problem, atrous convolution [Deeplabv1] was introduced to generate features with larger receptive fields without sacrificing spatial resolution. Built upon atrous convolution, Atrous Spatial Pyramid Pooling (ASPP) [Deeplabv2] was proposed to concatenate multiple atrous-convolved features using different dilation rates into a final feature representation. Although ASPP is able to generate multi-scale features, we argue the feature resolution in the scale-axis is not dense enough for the autonomous driving scenario. To this end, we propose Densely connected Atrous Spatial Pyramid Pooling (DenseASPP), which connects a set of atrous convolutional layers in a dense way, such that it generates multi-scale features that not only cover a larger scale range, but also cover that scale range densely, without significantly increasing the model size. We evaluate DenseASPP on the street scene benchmark Cityscapes [Cityscapes] and achieve state-of-the-art performance. |
Tasks | Autonomous Driving, Scene Understanding, Semantic Segmentation |
Published | 2018-06-01 |
URL | http://openaccess.thecvf.com/content_cvpr_2018/html/Yang_DenseASPP_for_Semantic_CVPR_2018_paper.html |
http://openaccess.thecvf.com/content_cvpr_2018/papers/Yang_DenseASPP_for_Semantic_CVPR_2018_paper.pdf | |
PWC | https://paperswithcode.com/paper/denseaspp-for-semantic-segmentation-in-street |
Repo | |
Framework | |
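The abstract above relies on atrous (dilated) convolution enlarging the receptive field without reducing resolution, and on stacking such layers to cover a larger scale range. The short sketch below works through that arithmetic using the standard receptive-field formula for a dilated kernel. The function names and the example dilation rates are illustrative assumptions and are not taken from the abstract.

```python
def receptive_field(kernel_size, dilation):
    """Receptive field of one dilated convolution: (d - 1) * (K - 1) + K."""
    return (dilation - 1) * (kernel_size - 1) + kernel_size


def stacked_receptive_field(kernel_size, dilations):
    """Receptive field of stride-1 dilated layers applied one after another.

    Stacking two layers with fields R1 and R2 gives R1 + R2 - 1, so a stack
    of n layers gives the sum of the fields minus (n - 1).
    """
    fields = [receptive_field(kernel_size, d) for d in dilations]
    return sum(fields) - (len(fields) - 1)


# Example rates chosen for illustration only.
single = receptive_field(3, 6)
stacked = stacked_receptive_field(3, [3, 6])
```

With a 3x3 kernel, a single layer at dilation 6 covers 13 pixels per side, while stacking dilations 3 and 6 covers 19, which shows how chaining layers widens the range of scales seen by later features.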
Annotating Claims in the Vaccination Debate
Title | Annotating Claims in the Vaccination Debate |
Authors | Benedetta Torsi, Roser Morante |
Abstract | In this paper we present annotation experiments with three different annotation schemes for the identification of argument components in texts related to the vaccination debate. Identifying claims about vaccinations made by participants in the debate is of great societal interest, as the decision to vaccinate or not has impact in public health and safety. Since most corpora that have been annotated with argumentation information contain texts that belong to a specific genre and have a well defined argumentation structure, we needed to adjust the annotation schemes to our corpus, which contains heterogeneous texts from the Web. We started with a complex annotation scheme that had to be simplified due to low IAA. In our final experiment, which focused on annotating claims, annotators reached 57.3% IAA. |
Tasks | Argument Mining |
Published | 2018-11-01 |
URL | https://www.aclweb.org/anthology/W18-5207/ |
https://www.aclweb.org/anthology/W18-5207 | |
PWC | https://paperswithcode.com/paper/annotating-claims-in-the-vaccination-debate |
Repo | |
Framework | |
Collection of Multimodal Dialog Data and Analysis of the Result of Annotation of Users’ Interest Level
Title | Collection of Multimodal Dialog Data and Analysis of the Result of Annotation of Users’ Interest Level |
Authors | Masahiro Araki, Sayaka Tomimasu, Mikio Nakano, Kazunori Komatani, Shogo Okada, Shinya Fujie, Hiroaki Sugiyama |
Abstract | |
Tasks | |
Published | 2018-05-01 |
URL | https://www.aclweb.org/anthology/L18-1250/ |
https://www.aclweb.org/anthology/L18-1250 | |
PWC | https://paperswithcode.com/paper/collection-of-multimodal-dialog-data-and |
Repo | |
Framework | |
Using Semantics for Granularities of Tokenization
Title | Using Semantics for Granularities of Tokenization |
Authors | Martin Riedl, Chris Biemann |
Abstract | Depending on downstream applications, it is advisable to extend the notion of tokenization from low-level character-based token boundary detection to identification of meaningful and useful language units. This entails both identifying units composed of several single words that form a multiword expression (MWE), as well as splitting single-word compounds into their meaningful parts. In this article, we introduce unsupervised and knowledge-free methods for these two tasks. The main novelty of our research is based on the fact that methods are primarily based on distributional similarity, of which we use two flavors: a sparse count-based and a dense neural-based distributional semantic model. First, we introduce DRUID, which is a method for detecting MWEs. The evaluation on MWE-annotated data sets in two languages and newly extracted evaluation data sets for 32 languages shows that DRUID compares favorably over previous methods not utilizing distributional information. Second, we present SECOS, an algorithm for decompounding close compounds. In an evaluation of four dedicated decompounding data sets across four languages and on data sets extracted from Wiktionary for 14 languages, we demonstrate the superiority of our approach over unsupervised baselines, sometimes even matching the performance of previous language-specific and supervised methods. In a final experiment, we show how both decompounding and MWE information can be used in information retrieval. Here, we obtain the best results when combining word information with MWEs and the compound parts in a bag-of-words retrieval set-up. Overall, our methodology paves the way to automatic detection of lexical units beyond standard tokenization techniques without language-specific preprocessing steps such as POS tagging. |
Tasks | Boundary Detection, Information Retrieval, Tokenization |
Published | 2018-09-01 |
URL | https://www.aclweb.org/anthology/J18-3005/ |
https://www.aclweb.org/anthology/J18-3005 | |
PWC | https://paperswithcode.com/paper/using-semantics-for-granularities-of |
Repo | |
Framework | |
Feature-Based Decipherment for Machine Translation
Title | Feature-Based Decipherment for Machine Translation |
Authors | Iftekhar Naim, Parker Riley, Daniel Gildea |
Abstract | Orthographic similarities across languages provide a strong signal for unsupervised probabilistic transduction (decipherment) for closely related language pairs. The existing decipherment models, however, are not well suited for exploiting these orthographic similarities. We propose a log-linear model with latent variables that incorporates orthographic similarity features. Maximum likelihood training is computationally expensive for the proposed log-linear model. To address this challenge, we perform approximate inference via Markov chain Monte Carlo sampling and contrastive divergence. Our results show that the proposed log-linear model with contrastive divergence outperforms the existing generative decipherment models by exploiting the orthographic features. The model both scales to large vocabularies and preserves accuracy in low- and no-resource contexts. |
Tasks | Machine Translation, Word Alignment |
Published | 2018-09-01 |
URL | https://www.aclweb.org/anthology/J18-3006/ |
https://www.aclweb.org/anthology/J18-3006 | |
PWC | https://paperswithcode.com/paper/feature-based-decipherment-for-machine |
Repo | |
Framework | |
Practical Parsing for Downstream Applications
Title | Practical Parsing for Downstream Applications |
Authors | Daniel Dakota, Sandra Kübler |
Abstract | |
Tasks | Domain Adaptation, Question Answering |
Published | 2018-08-01 |
URL | https://www.aclweb.org/anthology/C18-3002/ |
https://www.aclweb.org/anthology/C18-3002 | |
PWC | https://paperswithcode.com/paper/practical-parsing-for-downstream-applications |
Repo | |
Framework | |
Handling Rare Word Problem using Synthetic Training Data for Sinhala and Tamil Neural Machine Translation
Title | Handling Rare Word Problem using Synthetic Training Data for Sinhala and Tamil Neural Machine Translation |
Authors | Pasindu Tennage, Prabath Sandaruwan, Malith Thilakarathne, Achini Herath, Surangika Ranathunga |
Abstract | |
Tasks | Data Augmentation, Machine Translation, Morphological Analysis |
Published | 2018-05-01 |
URL | https://www.aclweb.org/anthology/L18-1261/ |
https://www.aclweb.org/anthology/L18-1261 | |
PWC | https://paperswithcode.com/paper/handling-rare-word-problem-using-synthetic |
Repo | |
Framework | |
When do random forests fail?
Title | When do random forests fail? |
Authors | Cheng Tang, Damien Garreau, Ulrike von Luxburg |
Abstract | Random forests are learning algorithms that build large collections of random trees and make predictions by averaging the individual tree predictions. In this paper, we consider various tree constructions and examine how the choice of parameters affects the generalization error of the resulting random forests as the sample size goes to infinity. We show that subsampling of data points during the tree construction phase is important: Forests can become inconsistent with either no subsampling or too severe subsampling. As a consequence, even highly randomized trees can lead to inconsistent forests if no subsampling is used, which implies that some of the commonly used setups for random forests can be inconsistent. As a second consequence we can show that trees that have good performance in nearest-neighbor search can be a poor choice for random forests. |
Tasks | |
Published | 2018-12-01 |
URL | http://papers.nips.cc/paper/7562-when-do-random-forests-fail |
http://papers.nips.cc/paper/7562-when-do-random-forests-fail.pdf | |
PWC | https://paperswithcode.com/paper/when-do-random-forests-fail |
Repo | |
Framework | |
Squib: Reproducibility in Computational Linguistics: Are We Willing to Share?
Title | Squib: Reproducibility in Computational Linguistics: Are We Willing to Share? |
Authors | Martijn Wieling, Josine Rawee, Gertjan van Noord |
Abstract | This study focuses on an essential precondition for reproducibility in computational linguistics: the willingness of authors to share relevant source code and data. Ten years after Ted Pedersen's influential "Last Words" contribution in Computational Linguistics, we investigate to what extent researchers in computational linguistics are willing and able to share their data and code. We surveyed all 395 full papers presented at the 2011 and 2016 ACL Annual Meetings, and identified whether links to data and code were provided. If working links were not provided, authors were requested to provide this information. Although data were often available, code was shared less often. When working links to code or data were not provided in the paper, authors provided the code in about one third of cases. For a selection of ten papers, we attempted to reproduce the results using the provided data and code. We were able to reproduce the results approximately for six papers. For only a single paper did we obtain the exact same results. Our findings show that even though the situation appears to have improved comparing 2016 to 2011, empiricism in computational linguistics still largely remains a matter of faith. Nevertheless, we are somewhat optimistic about the future. Ensuring reproducibility is not only important for the field as a whole, but also seems worthwhile for individual researchers: The median citation count for studies with working links to the source code is higher. |
Tasks | |
Published | 2018-12-01 |
URL | https://www.aclweb.org/anthology/J18-4003/ |
https://www.aclweb.org/anthology/J18-4003 | |
PWC | https://paperswithcode.com/paper/squib-reproducibility-in-computational |
Repo | |
Framework | |