July 26, 2019


Paper Group NANR 87

Human and Automated CEFR-based Grading of Short Answers. Subset Selection under Noise. Visual Vibrometry: Estimating Material Properties from Small Motions in Video. Multilingual Ontologies for the Representation and Processing of Folktales. Survey: Multiword Expression Processing: A Survey. QUINT: Interpretable Question Answering over Knowledge Bas …

Human and Automated CEFR-based Grading of Short Answers

Title Human and Automated CEFR-based Grading of Short Answers
Authors Anaïs Tack, Thomas François, Sophie Roekhaut, Cédrick Fairon
Abstract This paper is concerned with the task of automatically assessing the written proficiency level of non-native (L2) learners of English. Drawing on previous research on automated L2 writing assessment following the Common European Framework of Reference for Languages (CEFR), we investigate the possibilities and difficulties of deriving the CEFR level from short answers to open-ended questions, a task that has received little study to date. The aim of our study is twofold: to examine the intricacies involved in both human and automated CEFR-based grading of short answers. On the one hand, we describe the compilation of a learner corpus of short answers graded with CEFR levels by three certified Cambridge examiners. We mainly observe that, although the shortness of the answers is reported as undermining a clear-cut evaluation, the length of the answer does not necessarily correlate with inter-examiner disagreement. On the other hand, we explore the development of a soft-voting system for the automated CEFR-based grading of short answers and draw tentative conclusions about its use in a computer-assisted testing (CAT) setting.
Tasks
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-5018/
PDF https://www.aclweb.org/anthology/W17-5018
PWC https://paperswithcode.com/paper/human-and-automated-cefr-based-grading-of
Repo
Framework
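
The abstract above mentions a soft-voting system. As a rough illustration of soft voting in general (not the authors' system), the sketch below averages per-class probabilities from several classifiers and picks the class with the highest mean. The function name `soft_vote` and the three classifier outputs are hypothetical; only the six CEFR level labels are standard.

```python
from typing import Dict, List

# The six CEFR proficiency levels, lowest to highest.
CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def soft_vote(probabilities: List[Dict[str, float]]) -> str:
    """Average class probabilities across classifiers and return the top class."""
    if not probabilities:
        raise ValueError("need at least one classifier output")
    averaged = {
        level: sum(p.get(level, 0.0) for p in probabilities) / len(probabilities)
        for level in CEFR_LEVELS
    }
    return max(CEFR_LEVELS, key=lambda level: averaged[level])


# Hypothetical probability outputs of three classifiers for one short answer.
outputs = [
    {"A2": 0.90, "B1": 0.10},
    {"A2": 0.40, "B1": 0.60},
    {"A2": 0.45, "B1": 0.55},
]
predicted = soft_vote(outputs)
print(predicted)
```

In this made-up case two of the three classifiers individually prefer B1, so a majority (hard) vote would return B1, while the averaged probabilities favour A2 because the first classifier is far more confident.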

Subset Selection under Noise

Title Subset Selection under Noise
Authors Chao Qian, Jing-Cheng Shi, Yang Yu, Ke Tang, Zhi-Hua Zhou
Abstract The problem of selecting the best $k$-element subset from a universe is involved in many applications. While previous studies assumed a noise-free environment or a noisy monotone submodular objective function, this paper considers a more realistic and general situation where the evaluation of a subset is a noisy monotone function (not necessarily submodular), with both multiplicative and additive noise. To understand the impact of the noise, we first show the approximation ratio of the greedy algorithm and POSS, two powerful algorithms for noise-free subset selection, in noisy environments. We then propose to incorporate a noise-aware strategy into POSS, resulting in the new PONSS algorithm. We prove that PONSS can achieve a better approximation ratio under some assumptions, such as an i.i.d. noise distribution. The empirical results on influence maximization and sparse regression problems show the superior performance of PONSS.
Tasks
Published 2017-12-01
URL http://papers.nips.cc/paper/6947-subset-selection-under-noise
PDF http://papers.nips.cc/paper/6947-subset-selection-under-noise.pdf
PWC https://paperswithcode.com/paper/subset-selection-under-noise
Repo
Framework
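
For readers unfamiliar with the greedy baseline named in the abstract above, here is a minimal sketch of greedy $k$-element subset selection. It is not POSS or PONSS, and it evaluates the objective exactly; in the noisy setting the paper studies, `f` would return a perturbed value instead. The coverage objective and the item sets are invented for illustration.

```python
from typing import Callable, FrozenSet, List, Set


def greedy_subset(universe: List[str],
                  f: Callable[[FrozenSet[str]], float],
                  k: int) -> Set[str]:
    """Pick k items, each time adding the item that maximizes f on the grown set."""
    selected: Set[str] = set()
    for _ in range(k):
        candidates = [v for v in universe if v not in selected]
        if not candidates:
            break
        best = max(candidates, key=lambda v: f(frozenset(selected | {v})))
        selected.add(best)
    return selected


# Hypothetical monotone objective: number of elements covered by the chosen sets.
SETS = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1}}


def coverage(chosen: FrozenSet[str]) -> float:
    covered = set()
    for name in chosen:
        covered |= SETS[name]
    return float(len(covered))


chosen = greedy_subset(list(SETS), coverage, k=2)
print(sorted(chosen), coverage(frozenset(chosen)))
```

The first round picks `c` (four elements covered) and the second picks `a` (three new elements), so the selection covers all seven elements in this toy instance.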

Visual Vibrometry: Estimating Material Properties from Small Motions in Video

Title Visual Vibrometry: Estimating Material Properties from Small Motions in Video
Authors Abe Davis, Katherine L. Bouman, Justin G. Chen, Michael Rubinstein, Oral Buyukozturk, Fredo Durand, William T. Freeman
Abstract The estimation of material properties is important for scene understanding, with many applications in vision, robotics, and structural engineering. This paper connects fundamentals of vibration mechanics with computer vision techniques in order to infer material properties from small, often imperceptible motions in video. Objects tend to vibrate in a set of preferred modes. The frequencies of these modes depend on the structure and material properties of an object. We show that by extracting these frequencies from video of a vibrating object, we can often make inferences about that object’s material properties. We demonstrate our approach by estimating material properties for a variety of objects by observing their motion in high-speed and regular frame rate video.
Tasks Scene Understanding
Published 2017-04-15
URL http://visualvibrometry.com/
PDF http://www.visualvibrometry.com/publications/visvib_pami.pdf
PWC https://paperswithcode.com/paper/visual-vibrometry-estimating
Repo
Framework
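
The abstract above relies on extracting vibration frequencies from a motion signal. The sketch below shows only that generic step, under strong simplifying assumptions: a one-dimensional displacement signal is already available (the paper obtains motion from video, which is not reproduced here), and the dominant frequency is found with a naive discrete Fourier transform. The function name, sample rate, and synthetic signal are all hypothetical.

```python
import cmath
import math
from typing import List


def dominant_frequency(signal: List[float], sample_rate: float) -> float:
    """Return the frequency (Hz) of the largest-magnitude non-DC DFT bin."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the constant offset
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate / n


# Synthetic displacement: a strong 5 Hz mode plus a weaker 12 Hz mode,
# sampled at 100 frames per second for two seconds.
RATE = 100.0
samples = [math.sin(2 * math.pi * 5 * t / RATE)
           + 0.3 * math.sin(2 * math.pi * 12 * t / RATE)
           for t in range(200)]
freq = dominant_frequency(samples, RATE)
print(freq)
```

A naive DFT costs quadratic time in the signal length; an FFT routine would be the usual choice for real video-length signals.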

Multilingual Ontologies for the Representation and Processing of Folktales

Title Multilingual Ontologies for the Representation and Processing of Folktales
Authors Thierry Declerck, Anastasija Aman, Martin Banzer, Dominik Macháček, Lisa Schäfer, Natalia Skachkova
Abstract We describe work done in the field of folkloristics and consisting in creating ontologies based on well-established studies proposed by "classical" folklorists. This work is supporting the availability of a huge amount of digital and structured knowledge on folktales to digital humanists. The ontological encoding of past and current motif-indexation and classification systems for folktales was in the first step limited to English language data. This led us to focus on making those newly generated formal knowledge sources available in a few more languages, like German, Russian and Bulgarian. We stress the importance of achieving this multilingual extension of our ontologies at a larger scale, in order for example to support the automated analysis and classification of such narratives in a large variety of languages, as those are getting more and more accessible on the Web.
Tasks Text Generation
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-8103/
PDF http://doi.org/10.26615/978-954-452-046-5_003
PWC https://paperswithcode.com/paper/multilingual-ontologies-for-the
Repo
Framework

Survey: Multiword Expression Processing: A Survey

Title Survey: Multiword Expression Processing: A Survey
Authors Mathieu Constant, Gülşen Eryiğit, Johanna Monti, Lonneke van der Plas, Carlos Ramisch, Michael Rosner, Amalia Todirascu
Abstract Multiword expressions (MWEs) are a class of linguistic forms spanning conventional word boundaries that are both idiosyncratic and pervasive across different languages. The structure of linguistic processing that depends on the clear distinction between words and phrases has to be re-thought to accommodate MWEs. The issue of MWE handling is crucial for NLP applications, where it raises a number of challenges. The emergence of solutions in the absence of guiding principles motivates this survey, whose aim is not only to provide a focused review of MWE processing, but also to clarify the nature of interactions between MWE processing and downstream applications. We propose a conceptual framework within which challenges and research contributions can be positioned. It offers a shared understanding of what is meant by "MWE processing," distinguishing the subtasks of MWE discovery and identification. It also elucidates the interactions between MWE processing and two use cases: Parsing and machine translation. Many of the approaches in the literature can be differentiated according to how MWE processing is timed with respect to underlying use cases. We discuss how such orchestration choices affect the scope of MWE-aware systems. For each of the two MWE processing subtasks and for each of the two use cases, we conclude on open issues and research perspectives.
Tasks Machine Translation
Published 2017-12-01
URL https://www.aclweb.org/anthology/J17-4005/
PDF https://www.aclweb.org/anthology/J17-4005
PWC https://paperswithcode.com/paper/survey-multiword-expression-processing-a
Repo
Framework

QUINT: Interpretable Question Answering over Knowledge Bases

Title QUINT: Interpretable Question Answering over Knowledge Bases
Authors Abdalghani Abujabal, Rishiraj Saha Roy, Mohamed Yahya, Gerhard Weikum
Abstract We present QUINT, a live system for question answering over knowledge bases. QUINT automatically learns role-aligned utterance-query templates from user questions paired with their answers. When QUINT answers a question, it visualizes the complete derivation sequence from the natural language utterance to the final answer. The derivation provides an explanation of how the syntactic structure of the question was used to derive the structure of a SPARQL query, and how the phrases in the question were used to instantiate different parts of the query. When an answer seems unsatisfactory, the derivation provides valuable insights towards reformulating the question.
Tasks Named Entity Recognition, Question Answering, Semantic Parsing
Published 2017-09-01
URL https://www.aclweb.org/anthology/D17-2011/
PDF https://www.aclweb.org/anthology/D17-2011
PWC https://paperswithcode.com/paper/quint-interpretable-question-answering-over
Repo
Framework

Mahtab at SemEval-2017 Task 2: Combination of Corpus-based and Knowledge-based Methods to Measure Semantic Word Similarity

Title Mahtab at SemEval-2017 Task 2: Combination of Corpus-based and Knowledge-based Methods to Measure Semantic Word Similarity
Authors Niloofar Ranjbar, Fatemeh Mashhadirajab, Mehrnoush Shamsfard, Rayeheh Hosseini pour, Aryan Vahid pour
Abstract In this paper, we describe our proposed method for measuring semantic similarity for a given pair of words at the SemEval-2017 monolingual semantic word similarity task. We use a combination of knowledge-based and corpus-based techniques. We use FarsNet, the Persian WordNet, alongside deep learning techniques to extract the similarity of words. We evaluated our proposed approach on Persian (Farsi) test data at SemEval-2017. It outperformed the other participants and ranked first in the challenge.
Tasks Semantic Similarity, Semantic Textual Similarity
Published 2017-08-01
URL https://www.aclweb.org/anthology/S17-2040/
PDF https://www.aclweb.org/anthology/S17-2040
PWC https://paperswithcode.com/paper/mahtab-at-semeval-2017-task-2-combination-of
Repo
Framework

Sinhala Word Joiner

Title Sinhala Word Joiner
Authors Rajith Priyanga, Surangika Ranatunga, Gihan Dias
Abstract
Tasks
Published 2017-12-01
URL https://www.aclweb.org/anthology/W17-7528/
PDF https://www.aclweb.org/anthology/W17-7528
PWC https://paperswithcode.com/paper/sinhala-word-joiner
Repo
Framework

A Dataset for Sanskrit Word Segmentation

Title A Dataset for Sanskrit Word Segmentation
Authors Amrith Krishna, Pavan Kumar Satuluri, Pawan Goyal
Abstract The last decade saw a surge in digitisation efforts for ancient manuscripts in Sanskrit. Due to various linguistic peculiarities inherent to the language, even the preliminary tasks such as word segmentation are non-trivial in Sanskrit. Elegant models for Word Segmentation in Sanskrit are indispensable for further syntactic and semantic processing of the manuscripts. Current works in word segmentation for Sanskrit, though commendable in their novelty, often have variations in their objective and evaluation criteria. In this work, we set the record straight. We formally define the objectives and the requirements for the word segmentation task. In order to encourage research in the field and to alleviate the time and effort required in pre-processing, we release a dataset of 115,000 sentences for word segmentation. For each sentence in the dataset we include the input character sequence, ground truth segmentation, and additionally lexical and morphological information about all the phonetically possible segments for the given sentence. In this work, we also discuss the linguistic considerations made while generating the candidate space of the possible segments.
Tasks Transfer Learning
Published 2017-08-01
URL https://www.aclweb.org/anthology/W17-2214/
PDF https://www.aclweb.org/anthology/W17-2214
PWC https://paperswithcode.com/paper/a-dataset-for-sanskrit-word-segmentation
Repo
Framework

Word Embeddings based on Fixed-Size Ordinally Forgetting Encoding

Title Word Embeddings based on Fixed-Size Ordinally Forgetting Encoding
Authors Joseph Sanu, Mingbin Xu, Hui Jiang, Quan Liu
Abstract In this paper, we propose to learn word embeddings based on the recent fixed-size ordinally forgetting encoding (FOFE) method, which can almost uniquely encode any variable-length sequence into a fixed-size representation. We use FOFE to fully encode the left and right context of each word in a corpus to construct a novel word-context matrix, which is further weighted and factorized using truncated SVD to generate low-dimension word embedding vectors. We evaluate this alternate method in encoding word-context statistics and show the new FOFE method has a notable effect on the resulting word embeddings. Experimental results on several popular word similarity tasks have demonstrated that the proposed method outperforms other SVD models that use canonical count based techniques to generate word context matrices.
Tasks Language Modelling, Semantic Textual Similarity, Word Embeddings
Published 2017-09-01
URL https://www.aclweb.org/anthology/D17-1031/
PDF https://www.aclweb.org/anthology/D17-1031
PWC https://paperswithcode.com/paper/word-embeddings-based-on-fixed-size-ordinally
Repo
Framework
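
The FOFE encoding referenced in the abstract above is commonly defined by the recursion $z_t = \alpha z_{t-1} + e_t$, where $e_t$ is the one-hot vector of the $t$-th token and $0 < \alpha < 1$ is a forgetting factor. The sketch below implements only that recursion for a single sequence; the word-context matrix, weighting, and truncated SVD steps from the paper are not shown, and the vocabulary and $\alpha$ value are chosen purely for illustration.

```python
from typing import List


def fofe_encode(token_ids: List[int], vocab_size: int, alpha: float = 0.7) -> List[float]:
    """Encode a variable-length token sequence into a fixed-size vector.

    Applies z_t = alpha * z_{t-1} + one_hot(token_t), so earlier tokens
    are discounted by higher powers of alpha.
    """
    z = [0.0] * vocab_size
    for idx in token_ids:
        z = [alpha * v for v in z]  # decay everything seen so far
        z[idx] += 1.0               # add the current token's one-hot vector
    return z


# Toy vocabulary A=0, B=1, C=2 and the sequence "A B C B C".
code = fofe_encode([0, 1, 2, 1, 2], vocab_size=3, alpha=0.5)
print(code)
```

For this sequence the result is $[\alpha^4,\ \alpha + \alpha^3,\ 1 + \alpha^2]$, which with $\alpha = 0.5$ gives `[0.0625, 0.625, 1.25]`. To encode a right context in the same way, one option is to feed the following words in reverse order so that the nearest word is discounted least; that detail is an assumption here, not taken from the abstract.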

Classifying Illegal Activities on Tor Network Based on Web Textual Contents

Title Classifying Illegal Activities on Tor Network Based on Web Textual Contents
Authors Mhd Wesam Al Nabki, Eduardo Fidalgo, Enrique Alegre, Ivan de Paz
Abstract The freedom of the Deep Web offers a safe place where people can express themselves anonymously, but they can also conduct illegal activities. In this paper, we present and make publicly available a new dataset for Darknet active domains, which we call "Darknet Usage Text Addresses" (DUTA). We built DUTA by sampling the Tor network over two months and manually labeled each address into 26 classes. Using DUTA, we conducted a comparison between two well-known text representation techniques crossed by three different supervised classifiers to categorize the Tor hidden services. We also fixed the pipeline elements and identified the aspects that have a critical influence on the classification results. We found that the combination of a TF-IDF word representation with a Logistic Regression classifier achieves 96.6% 10-fold cross-validation accuracy and a macro F1 score of 93.7% when classifying a subset of illegal activities from DUTA. The good performance of the classifier might support potential tools to help the authorities in the detection of these activities.
Tasks
Published 2017-04-01
URL https://www.aclweb.org/anthology/E17-1004/
PDF https://www.aclweb.org/anthology/E17-1004
PWC https://paperswithcode.com/paper/classifying-illegal-activities-on-tor-network
Repo
Framework

Proceedings of ACL 2017, System Demonstrations

Title Proceedings of ACL 2017, System Demonstrations
Authors
Abstract
Tasks
Published 2017-07-01
URL https://www.aclweb.org/anthology/P17-4000/
PDF https://www.aclweb.org/anthology/P17-4000
PWC https://paperswithcode.com/paper/proceedings-of-acl-2017-system-demonstrations
Repo
Framework

Identifying Where to Focus in Reading Comprehension for Neural Question Generation

Title Identifying Where to Focus in Reading Comprehension for Neural Question Generation
Authors Xinya Du, Claire Cardie
Abstract A first step in the task of automatically generating questions for testing reading comprehension is to identify question-worthy sentences, i.e. sentences in a text passage that humans find it worthwhile to ask questions about. We propose a hierarchical neural sentence-level sequence tagging model for this task, which existing approaches to question generation have ignored. The approach is fully data-driven, with no sophisticated NLP pipelines or any hand-crafted rules/features, and compares favorably to a number of baselines when evaluated on the SQuAD data set. When incorporated into an existing neural question generation system, the resulting end-to-end system achieves state-of-the-art performance for paragraph-level question generation for reading comprehension.
Tasks Dependency Parsing, Machine Translation, Named Entity Recognition, Question Generation, Reading Comprehension, Sentence Classification, Sentiment Analysis, Text Summarization
Published 2017-09-01
URL https://www.aclweb.org/anthology/D17-1219/
PDF https://www.aclweb.org/anthology/D17-1219
PWC https://paperswithcode.com/paper/identifying-where-to-focus-in-reading
Repo
Framework

On Frank-Wolfe and Equilibrium Computation

Title On Frank-Wolfe and Equilibrium Computation
Authors Jacob D. Abernethy, Jun-Kun Wang
Abstract We consider the Frank-Wolfe (FW) method for constrained convex optimization, and we show that this classical technique can be interpreted from a different perspective: FW emerges as the computation of an equilibrium (saddle point) of a special convex-concave zero sum game. This saddle-point trick relies on the existence of no-regret online learning both to generate a sequence of iterates and to provide a proof of convergence through vanishing regret. We show that our stated equivalence has several nice properties, as it exhibits a modularity that gives rise to various old and new algorithms. We explore a few such resulting methods, and provide experimental results to demonstrate correctness and efficiency.
Tasks
Published 2017-12-01
URL http://papers.nips.cc/paper/7236-on-frank-wolfe-and-equilibrium-computation
PDF http://papers.nips.cc/paper/7236-on-frank-wolfe-and-equilibrium-computation.pdf
PWC https://paperswithcode.com/paper/on-frank-wolfe-and-equilibrium-computation
Repo
Framework
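
As background for the abstract above, the sketch below shows the classical Frank-Wolfe iteration on a small problem: minimizing a quadratic over the probability simplex with the standard step size $\gamma_t = 2/(t+2)$. It illustrates the textbook method only, not the game-theoretic reinterpretation or the new variants the paper derives. The objective, the target vector, and the function names are hypothetical.

```python
from typing import Callable, List


def frank_wolfe_simplex(grad: Callable[[List[float]], List[float]],
                        dim: int,
                        steps: int = 2000) -> List[float]:
    """Minimize a smooth convex function over the probability simplex.

    Each step calls a linear minimization oracle, which on the simplex is
    the vertex with the smallest gradient coordinate, then moves toward
    that vertex with step size 2 / (t + 2).
    """
    x = [1.0 / dim] * dim  # start at the simplex centre
    for t in range(steps):
        g = grad(x)
        i = min(range(dim), key=lambda j: g[j])  # oracle: best vertex e_i
        gamma = 2.0 / (t + 2.0)
        x = [(1.0 - gamma) * xj for xj in x]
        x[i] += gamma
    return x


# Hypothetical objective f(x) = ||x - TARGET||^2, whose minimizer lies in the simplex.
TARGET = [0.2, 0.5, 0.3]


def grad_f(x: List[float]) -> List[float]:
    return [2.0 * (xj - bj) for xj, bj in zip(x, TARGET)]


solution = frank_wolfe_simplex(grad_f, dim=3)
print([round(v, 3) for v in solution])
```

Because every iterate is a convex combination of simplex vertices, the method stays feasible without any projection step, which is the usual reason for choosing it over projected gradient descent.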

Large-scale evaluation of dependency-based DSMs: Are they worth the effort?

Title Large-scale evaluation of dependency-based DSMs: Are they worth the effort?
Authors Gabriella Lapesa, Stefan Evert
Abstract This paper presents a large-scale evaluation study of dependency-based distributional semantic models. We evaluate dependency-filtered and dependency-structured DSMs in a number of standard semantic similarity tasks, systematically exploring their parameter space in order to give them a "fair shot" against window-based models. Our results show that properly tuned window-based DSMs still outperform the dependency-based models in most tasks. There appears to be little need for the language-dependent resources and computational cost associated with syntactic analysis.
Tasks Semantic Similarity, Semantic Textual Similarity
Published 2017-04-01
URL https://www.aclweb.org/anthology/E17-2063/
PDF https://www.aclweb.org/anthology/E17-2063
PWC https://paperswithcode.com/paper/large-scale-evaluation-of-dependency-based
Repo
Framework