October 15, 2019

2589 words 13 mins read

Paper Group NANR 224



Clustering Convolutional Kernels to Compress Deep Neural Networks

Title Clustering Convolutional Kernels to Compress Deep Neural Networks
Authors Sanghyun Son, Seungjun Nah, Kyoung Mu Lee
Abstract In this paper, we propose a novel method to compress CNNs by reconstructing the network from a small set of spatial convolution kernels. Starting from a pre-trained model, we extract representative 2D kernel centroids using k-means clustering. Each centroid replaces the corresponding kernels of the same cluster, and we use indexed representations instead of saving whole kernels. Kernels in the same cluster share their weights, and we fine-tune the model while keeping the compressed state. Furthermore, we also suggest an efficient way of removing redundant calculations in the compressed convolutional layers. We experimentally show that our technique works well without harming the accuracy of widely-used CNNs. Also, our ResNet-18 even outperforms its uncompressed counterpart on the ILSVRC2012 classification task at an over 10x compression ratio.
Tasks
Published 2018-09-01
URL http://openaccess.thecvf.com/content_ECCV_2018/html/Sanghyun_Son_Clustering_Kernels_for_ECCV_2018_paper.html
PDF http://openaccess.thecvf.com/content_ECCV_2018/papers/Sanghyun_Son_Clustering_Kernels_for_ECCV_2018_paper.pdf
PWC https://paperswithcode.com/paper/clustering-convolutional-kernels-to-compress
Repo
Framework
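
As a rough illustration of the clustering step described in the abstract above, the sketch below clusters the 3x3 kernels of one convolutional layer with k-means and stores only a centroid codebook plus per-kernel indices. The layer shape and cluster count are illustrative assumptions, not values from the paper.

```python
# A minimal sketch of kernel clustering for compression: cluster all 3x3
# kernels of a conv layer with k-means, replace each kernel by its cluster
# centroid, and keep only a per-kernel index into the codebook.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
weights = rng.standard_normal((64, 32, 3, 3))   # (out_ch, in_ch, kH, kW)

kernels = weights.reshape(-1, 9)                # one row per 3x3 kernel
kmeans = KMeans(n_clusters=128, n_init=10, random_state=0).fit(kernels)

# Compressed representation: a small codebook plus one index per kernel.
codebook = kmeans.cluster_centers_              # (128, 9)
indices = kmeans.labels_                        # (64 * 32,)

# Reconstruct the layer from the codebook for fine-tuning / inference.
compressed = codebook[indices].reshape(weights.shape)
print("codebook:", codebook.shape, "indices:", indices.shape)
```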

EmojiIt at SemEval-2018 Task 2: An Effective Attention-Based Recurrent Neural Network Model for Emoji Prediction with Characters Gated Words

Title EmojiIt at SemEval-2018 Task 2: An Effective Attention-Based Recurrent Neural Network Model for Emoji Prediction with Characters Gated Words
Authors Shiyun Chen, Maoquan Wang, Liang He
Abstract This paper presents our single model for Subtask 1 of SemEval-2018 Task 2: Emoji Prediction in English. To predict the emoji that a tweet may contain, our basic model is an attention-based recurrent neural network, which has achieved satisfactory performance in natural language processing. Because the text comes from social media and contains many non-standard abbreviations and online terms, we also combine word-level and character-level embeddings to better handle words that do not appear in the vocabulary. Our single model achieved a 29.50% Macro F-score on the test data and ranks 9th among 48 teams.
Tasks
Published 2018-06-01
URL https://www.aclweb.org/anthology/S18-1066/
PDF https://www.aclweb.org/anthology/S18-1066
PWC https://paperswithcode.com/paper/emojiit-at-semeval-2018-task-2-an-effective
Repo
Framework
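
One way to combine word-level and character-level vectors, in the spirit of the "characters gated words" idea above, is a learned gate that mixes a word embedding with a character-RNN summary of the same word. All sizes, names, and the exact gating form below are assumptions; the paper's architecture may differ.

```python
# A hedged sketch of gating word-level and character-level representations.
import torch
import torch.nn as nn

class CharGatedWordEmbedding(nn.Module):
    def __init__(self, vocab_size, char_vocab_size, dim=100, char_dim=30):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, dim)
        self.char_emb = nn.Embedding(char_vocab_size, char_dim)
        self.char_rnn = nn.LSTM(char_dim, dim // 2, bidirectional=True,
                                batch_first=True)
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, word_ids, char_ids):
        # word_ids: (batch,), char_ids: (batch, max_word_len)
        w = self.word_emb(word_ids)                     # (batch, dim)
        _, (h, _) = self.char_rnn(self.char_emb(char_ids))
        c = torch.cat([h[0], h[1]], dim=-1)             # (batch, dim)
        g = torch.sigmoid(self.gate(torch.cat([w, c], dim=-1)))
        return g * w + (1 - g) * c                      # gated mix

emb = CharGatedWordEmbedding(vocab_size=5000, char_vocab_size=100)
out = emb(torch.tensor([1, 2]), torch.randint(0, 100, (2, 8)))
print(out.shape)  # torch.Size([2, 100])
```

For out-of-vocabulary words the gate can lean toward the character summary, which is what makes the combination useful on noisy social-media text.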

TDNN: A Two-stage Deep Neural Network for Prompt-independent Automated Essay Scoring

Title TDNN: A Two-stage Deep Neural Network for Prompt-independent Automated Essay Scoring
Authors Cancan Jin, Ben He, Kai Hui, Le Sun
Abstract Existing automated essay scoring (AES) models rely on rated essays for the target prompt as training data. Despite their successes in prompt-dependent AES, how to effectively predict essay ratings under a prompt-independent setting remains a challenge, where the rated essays for the target prompt are not available. To close this gap, a two-stage deep neural network (TDNN) is proposed. In particular, in the first stage, using the rated essays for non-target prompts as the training data, a shallow model is learned to select essays with an extreme quality for the target prompt, serving as pseudo training data; in the second stage, an end-to-end hybrid deep model is proposed to learn a prompt-dependent rating model consuming the pseudo training data from the first step. Evaluation of the proposed TDNN on the standard ASAP dataset demonstrates a promising improvement for the prompt-independent AES task.
Tasks
Published 2018-07-01
URL https://www.aclweb.org/anthology/P18-1100/
PDF https://www.aclweb.org/anthology/P18-1100
PWC https://paperswithcode.com/paper/tdnn-a-two-stage-deep-neural-network-for
Repo
Framework
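
The two-stage pseudo-labeling pipeline described above can be summarized as: a shallow model trained on other prompts scores the unrated target-prompt essays, only the extreme-quality ones become pseudo training data, and a second prompt-dependent model is trained on them. The sketch below uses scikit-learn stand-ins for both stages; the features, thresholds, and models are assumptions for illustration only.

```python
# A rough sketch of the TDNN-style two-stage pipeline with placeholder models.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X_src, y_src = rng.random((500, 20)), rng.random(500) * 10  # non-target prompts
X_tgt = rng.random((200, 20))                               # unrated target prompt

# Stage 1: score target essays with a model trained on other prompts; keep
# only confidently very good / very bad essays as pseudo training data.
stage1 = Ridge().fit(X_src, y_src)
scores = stage1.predict(X_tgt)
lo, hi = np.quantile(scores, [0.1, 0.9])
mask = (scores <= lo) | (scores >= hi)

# Stage 2: a prompt-dependent model (a deep hybrid network in the paper,
# Ridge here purely as a placeholder) is trained on the pseudo labels.
stage2 = Ridge().fit(X_tgt[mask], scores[mask])
print("pseudo-labeled essays:", mask.sum())
```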

Translating a Math Word Problem to an Expression Tree

Title Translating a Math Word Problem to an Expression Tree
Authors Lei Wang, Yan Wang, Deng Cai, Dongxiang Zhang, Xiaojiang Liu
Abstract Sequence-to-sequence (SEQ2SEQ) models have been successfully applied to automatic math word problem solving. Despite their simplicity, a drawback remains: a math word problem can be correctly solved by more than one equation. This non-deterministic transduction harms the performance of maximum likelihood estimation. In this paper, by considering the uniqueness of the expression tree, we propose an equation normalization method to normalize duplicated equations. Moreover, we analyze the performance of three popular SEQ2SEQ models on math word problem solving. We find that each model has its own specialty in solving problems, so an ensemble model is proposed to combine their advantages. Experiments on the Math23K dataset show that the ensemble model with equation normalization significantly outperforms the previous state-of-the-art methods.
Tasks Machine Translation, Math Word Problem Solving, Semantic Parsing
Published 2018-10-01
URL https://www.aclweb.org/anthology/D18-1132/
PDF https://www.aclweb.org/anthology/D18-1132
PWC https://paperswithcode.com/paper/translating-a-math-word-problem-to-a
Repo
Framework
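
The normalization idea above hinges on expression trees having a canonical form where equivalent equations collapse to one target. A minimal sketch of one such canonicalization is to sort the operands of commutative operators; the tuple-based tree and the ordering rule below are illustrative assumptions, not the paper's exact procedure.

```python
# Equation normalization via expression trees: operands of commutative
# operators are sorted so that "n1 + n2" and "n2 + n1" share one form.
def normalize(tree):
    """tree is either a leaf (str) or a triple (op, left, right)."""
    if isinstance(tree, str):
        return tree
    op, left, right = tree
    left, right = normalize(left), normalize(right)
    if op in ("+", "*") and repr(left) > repr(right):
        left, right = right, left        # canonical operand order
    return (op, left, right)

a = ("+", "n2", ("*", "n3", "n1"))
b = ("+", ("*", "n1", "n3"), "n2")
print(normalize(a) == normalize(b))      # True: duplicates collapse
```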

Neural Math Word Problem Solver with Reinforcement Learning

Title Neural Math Word Problem Solver with Reinforcement Learning
Authors Danqing Huang, Jing Liu, Chin-Yew Lin, Jian Yin
Abstract Sequence-to-sequence models have been applied to solve math word problems. The model takes a math problem description as input and generates an equation as output. The advantage of the sequence-to-sequence model is that it requires no feature engineering and can generate equations that do not exist in the training data. However, our experimental analysis reveals that this model suffers from two shortcomings: (1) it generates spurious numbers; (2) it generates numbers at wrong positions. In this paper, we propose incorporating a copy and alignment mechanism into the sequence-to-sequence model (namely CASS) to address these shortcomings. To train our model, we apply reinforcement learning to directly optimize the solution accuracy. This overcomes the "train-test discrepancy" issue of maximum likelihood estimation, which uses the surrogate objective of maximizing equation likelihood during training while the evaluation metric is solution accuracy (non-differentiable) at test time. Furthermore, to explore the effectiveness of our neural model, we use its output as a feature and incorporate it into the feature-based model. Experimental results show that (1) the copy and alignment mechanism effectively addresses the two issues; (2) reinforcement learning leads to better performance than maximum likelihood on this task; (3) our neural model is complementary to the feature-based model, and their combination significantly outperforms the state-of-the-art results.
Tasks Feature Engineering, Math Word Problem Solving
Published 2018-08-01
URL https://www.aclweb.org/anthology/C18-1018/
PDF https://www.aclweb.org/anthology/C18-1018
PWC https://paperswithcode.com/paper/neural-math-word-problem-solver-with
Repo
Framework
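
The core of the training idea above is a policy-gradient loss: sample an equation from the model and reinforce it with the non-differentiable solution accuracy as reward. The sketch below shows vanilla REINFORCE with a 0/1 reward; the model, sampler, and executor are hypothetical placeholders, not the paper's CASS model.

```python
# A hedged sketch of REINFORCE over solution accuracy for a seq2seq solver.
import torch

def reinforce_loss(log_probs, sampled_equation, gold_answer, execute):
    """log_probs: (seq_len,) log-probabilities of the sampled tokens."""
    reward = 1.0 if execute(sampled_equation) == gold_answer else 0.0
    # REINFORCE: scale the sequence log-likelihood by the scalar reward.
    return -reward * log_probs.sum()

# Toy usage with a trivial "executor" that evaluates the equation string.
log_probs = torch.log(torch.tensor([0.9, 0.8, 0.7], requires_grad=True))
loss = reinforce_loss(log_probs, "3*4", 12, execute=lambda eq: eval(eq))
loss.backward()
print(float(loss))
```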

Using Intermediate Representations to Solve Math Word Problems

Title Using Intermediate Representations to Solve Math Word Problems
Authors Danqing Huang, Jin-Ge Yao, Chin-Yew Lin, Qingyu Zhou, Jian Yin
Abstract To solve math word problems, previous statistical approaches attempt to learn a direct mapping from a problem description to its corresponding equation system. However, such mappings do not include the information of a few higher-order operations that cannot be explicitly represented in equations but are required to solve the problem. The gap between natural language and equations makes it difficult for a learned model to generalize from limited data. In this work we present an intermediate meaning representation scheme that tries to reduce this gap. We use a sequence-to-sequence model with a novel attention regularization term to generate the intermediate forms, then execute them to obtain the final answers. Since the intermediate forms are latent, we propose an iterative labeling framework for learning that leverages supervision signals from both equations and answers. Our experiments show that using intermediate forms outperforms directly predicting equations.
Tasks Math Word Problem Solving
Published 2018-07-01
URL https://www.aclweb.org/anthology/P18-1039/
PDF https://www.aclweb.org/anthology/P18-1039
PWC https://paperswithcode.com/paper/using-intermediate-representations-to-solve
Repo
Framework
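
Because the intermediate forms are latent, the iterative labeling loop described above keeps candidate forms whose execution matches the gold answer and uses them as labels for the next training round. The candidate generator and executor below are hypothetical stand-ins for the paper's seq2seq model and interpreter.

```python
# A rough sketch of answer-supervised iterative labeling for latent forms.
def iterative_labeling(problems, generate_candidates, execute, rounds=3):
    labels = {}                                   # problem id -> chosen form
    for _ in range(rounds):
        for pid, (text, answer) in problems.items():
            for form in generate_candidates(text):
                if execute(form) == answer:       # answer-level supervision
                    labels[pid] = form            # keep a consistent form
                    break
        # In the full method, the seq2seq model would be retrained on
        # `labels` here, sharpening generate_candidates for the next round.
    return labels

problems = {0: ("two times three", 6)}
forms = lambda text: ["(add 2 3)", "(mul 2 3)"]
run = lambda f: 6 if f == "(mul 2 3)" else 5
print(iterative_labeling(problems, forms, run))   # {0: '(mul 2 3)'}
```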

DenseASPP for Semantic Segmentation in Street Scenes

Title DenseASPP for Semantic Segmentation in Street Scenes
Authors Maoke Yang, Kun Yu, Chi Zhang, Zhiwei Li, Kuiyuan Yang
Abstract Semantic image segmentation is a basic street scene understanding task in autonomous driving, where each pixel in a high resolution image is categorized into a set of semantic labels. Unlike other scenarios, objects in autonomous driving scene exhibit very large scale changes, which poses great challenges for high-level feature representation in a sense that multi-scale information must be correctly encoded. To remedy this problem, atrous convolution was introduced to generate features with larger receptive fields without sacrificing spatial resolution. Built upon atrous convolution, Atrous Spatial Pyramid Pooling (ASPP) was proposed to concatenate multiple atrous-convolved features using different dilation rates into a final feature representation. Although ASPP is able to generate multi-scale features, we argue the feature resolution in the scale-axis is not dense enough for the autonomous driving scenario. To this end, we propose Densely connected Atrous Spatial Pyramid Pooling (DenseASPP), which connects a set of atrous convolutional layers in a dense way, such that it generates multi-scale features that not only cover a larger scale range, but also cover that scale range densely, without significantly increasing the model size. We evaluate DenseASPP on the street scene benchmark Cityscapes and achieve state-of-the-art performance.
Tasks Autonomous Driving, Scene Understanding, Semantic Segmentation
Published 2018-06-01
URL http://openaccess.thecvf.com/content_cvpr_2018/html/Yang_DenseASPP_for_Semantic_CVPR_2018_paper.html
PDF http://openaccess.thecvf.com/content_cvpr_2018/papers/Yang_DenseASPP_for_Semantic_CVPR_2018_paper.pdf
PWC https://paperswithcode.com/paper/denseaspp-for-semantic-segmentation-in-street
Repo
Framework
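
The dense connectivity described above means each atrous layer consumes the concatenation of the block input and all earlier layers' outputs, so later layers see an increasingly dense set of effective dilation rates. The sketch below is a simplified version (the paper also uses 1x1 reduction layers); channel sizes and rates are assumptions.

```python
# A compact sketch of densely connected atrous convolutions (DenseASPP-style).
import torch
import torch.nn as nn

class DenseASPP(nn.Module):
    def __init__(self, in_ch=256, growth=64, rates=(3, 6, 12, 18, 24)):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for r in rates:
            self.layers.append(nn.Sequential(
                nn.Conv2d(ch, growth, 3, padding=r, dilation=r),
                nn.BatchNorm2d(growth), nn.ReLU(inplace=True)))
            ch += growth                      # dense concatenation grows input

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))
        return torch.cat(feats, dim=1)        # multi-scale feature map

out = DenseASPP()(torch.randn(1, 256, 32, 32))
print(out.shape)  # torch.Size([1, 576, 32, 32])
```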

Annotating Claims in the Vaccination Debate

Title Annotating Claims in the Vaccination Debate
Authors Benedetta Torsi, Roser Morante
Abstract In this paper we present annotation experiments with three different annotation schemes for the identification of argument components in texts related to the vaccination debate. Identifying claims about vaccinations made by participants in the debate is of great societal interest, as the decision whether or not to vaccinate has an impact on public health and safety. Since most corpora that have been annotated with argumentation information contain texts that belong to a specific genre and have a well-defined argumentation structure, we needed to adjust the annotation schemes to our corpus, which contains heterogeneous texts from the Web. We started with a complex annotation scheme that had to be simplified due to low IAA. In our final experiment, which focused on annotating claims, annotators reached 57.3% IAA.
Tasks Argument Mining
Published 2018-11-01
URL https://www.aclweb.org/anthology/W18-5207/
PDF https://www.aclweb.org/anthology/W18-5207
PWC https://paperswithcode.com/paper/annotating-claims-in-the-vaccination-debate
Repo
Framework
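
The 57.3% figure above is inter-annotator agreement; the paper's exact IAA measure is not stated here, but a common way to quantify such agreement (an assumption for illustration) is Cohen's kappa over per-segment labels:

```python
# Chance-corrected agreement between two annotators on toy claim labels.
from sklearn.metrics import cohen_kappa_score

annotator_a = ["claim", "other", "claim", "claim", "other", "other"]
annotator_b = ["claim", "other", "other", "claim", "other", "claim"]
print(round(cohen_kappa_score(annotator_a, annotator_b), 3))  # 0.333
```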

Collection of Multimodal Dialog Data and Analysis of the Result of Annotation of Users’ Interest Level

Title Collection of Multimodal Dialog Data and Analysis of the Result of Annotation of Users’ Interest Level
Authors Masahiro Araki, Sayaka Tomimasu, Mikio Nakano, Kazunori Komatani, Shogo Okada, Shinya Fujie, Hiroaki Sugiyama
Abstract
Tasks
Published 2018-05-01
URL https://www.aclweb.org/anthology/L18-1250/
PDF https://www.aclweb.org/anthology/L18-1250
PWC https://paperswithcode.com/paper/collection-of-multimodal-dialog-data-and
Repo
Framework

Using Semantics for Granularities of Tokenization

Title Using Semantics for Granularities of Tokenization
Authors Martin Riedl, Chris Biemann
Abstract Depending on downstream applications, it is advisable to extend the notion of tokenization from low-level character-based token boundary detection to the identification of meaningful and useful language units. This entails both identifying units composed of several single words that form a multiword expression (MWE), as well as splitting single-word compounds into their meaningful parts. In this article, we introduce unsupervised and knowledge-free methods for these two tasks. The main novelty of our research is that the methods are primarily based on distributional similarity, of which we use two flavors: a sparse count-based and a dense neural-based distributional semantic model. First, we introduce DRUID, a method for detecting MWEs. The evaluation on MWE-annotated data sets in two languages and newly extracted evaluation data sets for 32 languages shows that DRUID compares favorably to previous methods that do not utilize distributional information. Second, we present SECOS, an algorithm for decompounding closed compounds. In an evaluation of four dedicated decompounding data sets across four languages and on data sets extracted from Wiktionary for 14 languages, we demonstrate the superiority of our approach over unsupervised baselines, sometimes even matching the performance of previous language-specific and supervised methods. In a final experiment, we show how both decompounding and MWE information can be used in information retrieval. Here, we obtain the best results when combining word information with MWEs and the compound parts in a bag-of-words retrieval set-up. Overall, our methodology paves the way to automatic detection of lexical units beyond standard tokenization techniques without language-specific preprocessing steps such as POS tagging.
Tasks Boundary Detection, Information Retrieval, Tokenization
Published 2018-09-01
URL https://www.aclweb.org/anthology/J18-3005/
PDF https://www.aclweb.org/anthology/J18-3005
PWC https://paperswithcode.com/paper/using-semantics-for-granularities-of
Repo
Framework
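
To make the decompounding task above concrete: a splitter tries every split point in a compound and prefers splits whose parts are known vocabulary units. SECOS itself scores candidates with distributional similarity; the frequency heuristic below is a deliberately simplified stand-in, with a toy vocabulary.

```python
# A very reduced sketch of compound splitting with a frequency heuristic.
def split_compound(word, freq, min_len=3):
    best, best_score = (word,), 0.0
    for i in range(min_len, len(word) - min_len + 1):
        left, right = word[:i], word[i:]
        # Geometric mean of part frequencies; 0 if a part is unknown.
        score = (freq.get(left, 0) * freq.get(right, 0)) ** 0.5
        if score > best_score:
            best, best_score = (left, right), score
    return best

freq = {"finanz": 120, "ministerium": 90, "bundes": 80, "bund": 60}
print(split_compound("finanzministerium", freq))  # ('finanz', 'ministerium')
```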

Feature-Based Decipherment for Machine Translation

Title Feature-Based Decipherment for Machine Translation
Authors Iftekhar Naim, Parker Riley, Daniel Gildea
Abstract Orthographic similarities across languages provide a strong signal for unsupervised probabilistic transduction (decipherment) for closely related language pairs. The existing decipherment models, however, are not well suited for exploiting these orthographic similarities. We propose a log-linear model with latent variables that incorporates orthographic similarity features. Maximum likelihood training is computationally expensive for the proposed log-linear model. To address this challenge, we perform approximate inference via Markov chain Monte Carlo sampling and contrastive divergence. Our results show that the proposed log-linear model with contrastive divergence outperforms the existing generative decipherment models by exploiting the orthographic features. The model both scales to large vocabularies and preserves accuracy in low- and no-resource contexts.
Tasks Machine Translation, Word Alignment
Published 2018-09-01
URL https://www.aclweb.org/anthology/J18-3006/
PDF https://www.aclweb.org/anthology/J18-3006
PWC https://paperswithcode.com/paper/feature-based-decipherment-for-machine
Repo
Framework
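
The abstract above hinges on orthographic-similarity features inside a log-linear model. As a hedged sketch of the kind of feature involved, the snippet below scores candidate translations with a character-overlap ratio plus a length feature under hand-set weights; the features, weights, and candidate table are illustrative assumptions, not the paper's model.

```python
# A small sketch of orthographic features in a log-linear scoring function.
from difflib import SequenceMatcher

def orthographic_score(src, tgt):
    # Character-overlap ratio as a cheap orthographic-similarity feature.
    return SequenceMatcher(None, src, tgt).ratio()

def loglinear_score(src, tgt, weights):
    feats = {"orth": orthographic_score(src, tgt),
             "len": -abs(len(src) - len(tgt))}
    return sum(weights[k] * v for k, v in feats.items())

weights = {"orth": 2.0, "len": 0.5}
candidates = ["nacht", "tag", "morgen"]
best = max(candidates, key=lambda t: loglinear_score("night", t, weights))
print(best)  # 'nacht': the orthographically closest candidate
```

In the paper these weights are learned by approximate maximum likelihood (MCMC sampling and contrastive divergence) rather than set by hand.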

Practical Parsing for Downstream Applications

Title Practical Parsing for Downstream Applications
Authors Daniel Dakota, Sandra Kübler
Abstract
Tasks Domain Adaptation, Question Answering
Published 2018-08-01
URL https://www.aclweb.org/anthology/C18-3002/
PDF https://www.aclweb.org/anthology/C18-3002
PWC https://paperswithcode.com/paper/practical-parsing-for-downstream-applications
Repo
Framework

Handling Rare Word Problem using Synthetic Training Data for Sinhala and Tamil Neural Machine Translation

Title Handling Rare Word Problem using Synthetic Training Data for Sinhala and Tamil Neural Machine Translation
Authors Pasindu Tennage, Prabath Sandaruwan, Malith Thilakarathne, Achini Herath, Surangika Ranathunga
Abstract
Tasks Data Augmentation, Machine Translation, Morphological Analysis
Published 2018-05-01
URL https://www.aclweb.org/anthology/L18-1261/
PDF https://www.aclweb.org/anthology/L18-1261
PWC https://paperswithcode.com/paper/handling-rare-word-problem-using-synthetic
Repo
Framework

When do random forests fail?

Title When do random forests fail?
Authors Cheng Tang, Damien Garreau, Ulrike Von Luxburg
Abstract Random forests are learning algorithms that build large collections of random trees and make predictions by averaging the individual tree predictions. In this paper, we consider various tree constructions and examine how the choice of parameters affects the generalization error of the resulting random forests as the sample size goes to infinity. We show that subsampling of data points during the tree construction phase is important: Forests can become inconsistent with either no subsampling or too severe subsampling. As a consequence, even highly randomized trees can lead to inconsistent forests if no subsampling is used, which implies that some of the commonly used setups for random forests can be inconsistent. As a second consequence we can show that trees that have good performance in nearest-neighbor search can be a poor choice for random forests.
Tasks
Published 2018-12-01
URL http://papers.nips.cc/paper/7562-when-do-random-forests-fail
PDF http://papers.nips.cc/paper/7562-when-do-random-forests-fail.pdf
PWC https://paperswithcode.com/paper/when-do-random-forests-fail
Repo
Framework
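
The subsampling effect described above can be probed empirically with scikit-learn's bootstrap/max_samples controls as a proxy for the paper's subsampling regimes (the paper's analysis is asymptotic and theoretical, so treating these settings as "no / moderate / severe subsampling" is an assumption):

```python
# Comparing random forests with no, moderate, and severe subsampling.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(2000, 5))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(2000)

for bootstrap, max_samples, tag in [
    (False, None, "no subsampling"),
    (True, 0.5, "moderate subsampling"),
    (True, 0.01, "severe subsampling"),
]:
    rf = RandomForestRegressor(n_estimators=100, bootstrap=bootstrap,
                               max_samples=max_samples, max_features=0.5,
                               random_state=0)
    score = cross_val_score(rf, X, y, cv=3).mean()
    print(f"{tag:22s} R^2 = {score:.3f}")
```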

Squib: Reproducibility in Computational Linguistics: Are We Willing to Share?

Title Squib: Reproducibility in Computational Linguistics: Are We Willing to Share?
Authors Martijn Wieling, Josine Rawee, Gertjan van Noord
Abstract This study focuses on an essential precondition for reproducibility in computational linguistics: the willingness of authors to share relevant source code and data. Ten years after Ted Pedersen's influential "Last Words" contribution in Computational Linguistics, we investigate to what extent researchers in computational linguistics are willing and able to share their data and code. We surveyed all 395 full papers presented at the 2011 and 2016 ACL Annual Meetings, and identified whether links to data and code were provided. If working links were not provided, authors were requested to provide this information. Although data were often available, code was shared less often. When working links to code or data were not provided in the paper, authors provided the code in about one third of cases. For a selection of ten papers, we attempted to reproduce the results using the provided data and code. We were able to reproduce the results approximately for six papers. For only a single paper did we obtain the exact same results. Our findings show that even though the situation appears to have improved comparing 2016 to 2011, empiricism in computational linguistics still largely remains a matter of faith. Nevertheless, we are somewhat optimistic about the future. Ensuring reproducibility is not only important for the field as a whole, but also seems worthwhile for individual researchers: the median citation count for studies with working links to the source code is higher.
Tasks
Published 2018-12-01
URL https://www.aclweb.org/anthology/J18-4003/
PDF https://www.aclweb.org/anthology/J18-4003
PWC https://paperswithcode.com/paper/squib-reproducibility-in-computational
Repo
Framework