July 26, 2019

Paper Group NANR 87

Human and Automated CEFR-based Grading of Short Answers. Subset Selection under Noise. Visual Vibrometry: Estimating Material Properties from Small Motions in Video. Multilingual Ontologies for the Representation and Processing of Folktales. Survey: Multiword Expression Processing: A Survey. QUINT: Interpretable Question Answering over Knowledge Bas …

Human and Automated CEFR-based Grading of Short Answers


Title	Human and Automated CEFR-based Grading of Short Answers
Authors	Anaïs Tack, Thomas François, Sophie Roekhaut, Cédrick Fairon
Abstract	This paper is concerned with the task of automatically assessing the written proficiency level of non-native (L2) learners of English. Drawing on previous research on automated L2 writing assessment following the Common European Framework of Reference for Languages (CEFR), we investigate the possibilities and difficulties of deriving the CEFR level from short answers to open-ended questions, which has not yet been subjected to numerous studies up to date. The object of our study is twofold: to examine the intricacy involved with both human and automated CEFR-based grading of short answers. On the one hand, we describe the compilation of a learner corpus of short answers graded with CEFR levels by three certified Cambridge examiners. We mainly observe that, although the shortness of the answers is reported as undermining a clear-cut evaluation, the length of the answer does not necessarily correlate with inter-examiner disagreement. On the other hand, we explore the development of a soft-voting system for the automated CEFR-based grading of short answers and draw tentative conclusions about its use in a computer-assisted testing (CAT) setting.
Tasks
Published	2017-09-01
URL	https://www.aclweb.org/anthology/W17-5018/
PDF	https://www.aclweb.org/anthology/W17-5018
PWC	https://paperswithcode.com/paper/human-and-automated-cefr-based-grading-of
Repo
Framework
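
The abstract mentions a soft-voting system but does not say which classifiers it combines. As a minimal sketch of soft voting in general, the example below averages per-level probabilities from several classifiers and picks the level with the highest mean. The function name `soft_vote`, the level list, and the probability values are hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# The six CEFR levels, ordered from lowest to highest proficiency.
CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def soft_vote(probabilities: List[Dict[str, float]]) -> str:
    """Average per-level probabilities across classifiers and return the top level."""
    if not probabilities:
        raise ValueError("need at least one classifier output")
    averaged = {
        level: sum(p.get(level, 0.0) for p in probabilities) / len(probabilities)
        for level in CEFR_LEVELS
    }
    # max() returns the first maximal item, so ties resolve to the lower level.
    return max(CEFR_LEVELS, key=lambda level: averaged[level])


# Hypothetical outputs of three classifiers for one short answer.
outputs = [
    {"A2": 0.50, "B1": 0.40, "B2": 0.10},
    {"A2": 0.20, "B1": 0.70, "B2": 0.10},
    {"A2": 0.45, "B1": 0.35, "B2": 0.20},
]
predicted = soft_vote(outputs)
print(predicted)
```

In this made-up case two of the three classifiers rank A2 first, so a majority (hard) vote would return A2, while averaging the probabilities favours B1. That difference is the usual reason for preferring soft voting when classifiers report calibrated confidence.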

Subset Selection under Noise


Title	Subset Selection under Noise
Authors	Chao Qian, Jing-Cheng Shi, Yang Yu, Ke Tang, Zhi-Hua Zhou
Abstract	The problem of selecting the best $k$-element subset from a universe is involved in many applications. While previous studies assumed a noise-free environment or a noisy monotone submodular objective function, this paper considers a more realistic and general situation where the evaluation of a subset is a noisy monotone function (not necessarily submodular), with both multiplicative and additive noises. To understand the impact of the noise, we firstly show the approximation ratio of the greedy algorithm and POSS, two powerful algorithms for noise-free subset selection, in the noisy environments. We then propose to incorporate a noise-aware strategy into POSS, resulting in the new PONSS algorithm. We prove that PONSS can achieve a better approximation ratio under some assumption such as i.i.d. noise distribution. The empirical results on influence maximization and sparse regression problems show the superior performance of PONSS.
Tasks
Published	2017-12-01
URL	http://papers.nips.cc/paper/6947-subset-selection-under-noise
PDF	http://papers.nips.cc/paper/6947-subset-selection-under-noise.pdf
PWC	https://paperswithcode.com/paper/subset-selection-under-noise
Repo
Framework
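
For orientation, the noise-free greedy algorithm that the abstract uses as a baseline can be sketched as follows. This is not POSS or PONSS, and it does not model noise; the coverage objective, the item names, and the function `greedy_select` are illustrative assumptions.

```python
from typing import Callable, FrozenSet, List, Set


def greedy_select(
    universe: List[str], k: int, objective: Callable[[FrozenSet[str]], float]
) -> Set[str]:
    """Pick up to k items, each time adding the item that maximizes the objective."""
    selected: Set[str] = set()
    for _ in range(k):
        candidates = [item for item in universe if item not in selected]
        if not candidates:
            break
        best = max(candidates, key=lambda item: objective(frozenset(selected | {item})))
        selected.add(best)
    return selected


# Hypothetical monotone objective: number of elements covered by the chosen sets.
SETS = {
    "a": {1, 2, 3},
    "b": {3, 4},
    "c": {4, 5, 6, 7},
    "d": {1, 7},
}


def coverage(subset: FrozenSet[str]) -> float:
    covered: Set[int] = set()
    for name in subset:
        covered |= SETS[name]
    return float(len(covered))


chosen = greedy_select(sorted(SETS), k=2, objective=coverage)
print(sorted(chosen), coverage(frozenset(chosen)))
```

In the noisy setting described in the abstract, each call to the objective would return a perturbed value, which is why a single greedy comparison can pick the wrong item.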

Visual Vibrometry: Estimating Material Properties from Small Motions in Video


Title	Visual Vibrometry: Estimating Material Properties from Small Motions in Video
Authors	Abe Davis, Katherine L. Bouman, Justin G. Chen, Michael Rubinstein, Oral Buyukozturk, Fredo Durand, William T. Freeman
Abstract	The estimation of material properties is important for scene understanding, with many applications in vision, robotics, and structural engineering. This paper connects fundamentals of vibration mechanics with computer vision techniques in order to infer material properties from small, often imperceptible motions in video. Objects tend to vibrate in a set of preferred modes. The frequencies of these modes depend on the structure and material properties of an object. We show that by extracting these frequencies from video of a vibrating object, we can often make inferences about that object’s material properties. We demonstrate our approach by estimating material properties for a variety of objects by observing their motion in high-speed and regular frame rate video.
Tasks	Scene Understanding
Published	2017-04-15
URL	http://visualvibrometry.com/
PDF	http://www.visualvibrometry.com/publications/visvib_pami.pdf
PWC	https://paperswithcode.com/paper/visual-vibrometry-estimating
Repo
Framework
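
The abstract does not describe how motion is measured from video, so the sketch below covers only the last step under a strong simplifying assumption: a one-dimensional motion signal is already available, and its dominant vibration frequency is read off a discrete Fourier transform. The function name, the synthetic signal, and the sample rate are all hypothetical.

```python
import cmath
import math
from typing import List


def dominant_frequency(samples: List[float], sample_rate: float) -> float:
    """Return the frequency (Hz) of the strongest non-DC bin of a naive DFT."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove the constant offset
    best_bin, best_power = 0, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(
            c * cmath.exp(-2j * math.pi * k * i / n) for i, c in enumerate(centered)
        )
        power = abs(coeff) ** 2
        if power > best_power:
            best_bin, best_power = k, power
    # Bin k corresponds to k * sample_rate / n Hz.
    return best_bin * sample_rate / n


# Synthetic signal: a strong 5 Hz mode plus a weaker 12 Hz mode, sampled at 64 Hz.
rate = 64.0
signal = [
    math.sin(2 * math.pi * 5 * i / rate) + 0.3 * math.sin(2 * math.pi * 12 * i / rate)
    for i in range(64)
]
peak = dominant_frequency(signal, rate)
print(peak)
```

A practical pipeline would use a fast FFT routine and locate several spectral peaks, one per vibration mode, rather than a single maximum.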

Multilingual Ontologies for the Representation and Processing of Folktales


Title	Multilingual Ontologies for the Representation and Processing of Folktales
Authors	Thierry Declerck, Anastasija Aman, Martin Banzer, Dominik Macháček, Lisa Schäfer, Natalia Skachkova
Abstract	We describe work done in the field of folkloristics and consisting in creating ontologies based on well-established studies proposed by "classical" folklorists. This work is supporting the availability of a huge amount of digital and structured knowledge on folktales to digital humanists. The ontological encoding of past and current motif-indexation and classification systems for folktales was in the first step limited to English language data. This led us to focus on making those newly generated formal knowledge sources available in a few more languages, like German, Russian and Bulgarian. We stress the importance of achieving this multilingual extension of our ontologies at a larger scale, in order for example to support the automated analysis and classification of such narratives in a large variety of languages, as those are getting more and more accessible on the Web.
Tasks	Text Generation
Published	2017-09-01
URL	https://www.aclweb.org/anthology/W17-8103/
PDF	http://doi.org/10.26615/978-954-452-046-5_003
PWC	https://paperswithcode.com/paper/multilingual-ontologies-for-the
Repo
Framework

Survey: Multiword Expression Processing: A Survey


Title	Survey: Multiword Expression Processing: A Survey
Authors	Mathieu Constant, Gülşen Eryiğit, Johanna Monti, Lonneke van der Plas, Carlos Ramisch, Michael Rosner, Amalia Todirascu
Abstract	Multiword expressions (MWEs) are a class of linguistic forms spanning conventional word boundaries that are both idiosyncratic and pervasive across different languages. The structure of linguistic processing that depends on the clear distinction between words and phrases has to be re-thought to accommodate MWEs. The issue of MWE handling is crucial for NLP applications, where it raises a number of challenges. The emergence of solutions in the absence of guiding principles motivates this survey, whose aim is not only to provide a focused review of MWE processing, but also to clarify the nature of interactions between MWE processing and downstream applications. We propose a conceptual framework within which challenges and research contributions can be positioned. It offers a shared understanding of what is meant by "MWE processing," distinguishing the subtasks of MWE discovery and identification. It also elucidates the interactions between MWE processing and two use cases: Parsing and machine translation. Many of the approaches in the literature can be differentiated according to how MWE processing is timed with respect to underlying use cases. We discuss how such orchestration choices affect the scope of MWE-aware systems. For each of the two MWE processing subtasks and for each of the two use cases, we conclude on open issues and research perspectives.
Tasks	Machine Translation
Published	2017-12-01
URL	https://www.aclweb.org/anthology/J17-4005/
PDF	https://www.aclweb.org/anthology/J17-4005
PWC	https://paperswithcode.com/paper/survey-multiword-expression-processing-a
Repo
Framework

QUINT: Interpretable Question Answering over Knowledge Bases


Title	QUINT: Interpretable Question Answering over Knowledge Bases
Authors	Abdalghani Abujabal, Rishiraj Saha Roy, Mohamed Yahya, Gerhard Weikum
Abstract	We present QUINT, a live system for question answering over knowledge bases. QUINT automatically learns role-aligned utterance-query templates from user questions paired with their answers. When QUINT answers a question, it visualizes the complete derivation sequence from the natural language utterance to the final answer. The derivation provides an explanation of how the syntactic structure of the question was used to derive the structure of a SPARQL query, and how the phrases in the question were used to instantiate different parts of the query. When an answer seems unsatisfactory, the derivation provides valuable insights towards reformulating the question.
Tasks	Named Entity Recognition, Question Answering, Semantic Parsing
Published	2017-09-01
URL	https://www.aclweb.org/anthology/D17-2011/
PDF	https://www.aclweb.org/anthology/D17-2011
PWC	https://paperswithcode.com/paper/quint-interpretable-question-answering-over
Repo
Framework

Mahtab at SemEval-2017 Task 2: Combination of Corpus-based and Knowledge-based Methods to Measure Semantic Word Similarity


Title	Mahtab at SemEval-2017 Task 2: Combination of Corpus-based and Knowledge-based Methods to Measure Semantic Word Similarity
Authors	Niloofar Ranjbar, Fatemeh Mashhadirajab, Mehrnoush Shamsfard, Rayeheh Hosseini pour, Aryan Vahid pour
Abstract	In this paper, we describe our proposed method for measuring semantic similarity for a given pair of words at SemEval-2017 monolingual semantic word similarity task. We use a combination of knowledge-based and corpus-based techniques. We use FarsNet, the Persian Word Net, besides deep learning techniques to extract the similarity of words. We evaluated our proposed approach on Persian (Farsi) test data at SemEval-2017. It outperformed the other participants and ranked the first in the challenge.
Tasks	Semantic Similarity, Semantic Textual Similarity
Published	2017-08-01
URL	https://www.aclweb.org/anthology/S17-2040/
PDF	https://www.aclweb.org/anthology/S17-2040
PWC	https://paperswithcode.com/paper/mahtab-at-semeval-2017-task-2-combination-of
Repo
Framework

Sinhala Word Joiner


Title	Sinhala Word Joiner
Authors	Rajith Priyanga, Surangika Ranatunga, Gihan Dias
Abstract
Tasks
Published	2017-12-01
URL	https://www.aclweb.org/anthology/W17-7528/
PDF	https://www.aclweb.org/anthology/W17-7528
PWC	https://paperswithcode.com/paper/sinhala-word-joiner
Repo
Framework

A Dataset for Sanskrit Word Segmentation


Title	A Dataset for Sanskrit Word Segmentation
Authors	Amrith Krishna, Pavan Kumar Satuluri, Pawan Goyal
Abstract	The last decade saw a surge in digitisation efforts for ancient manuscripts in Sanskrit. Due to various linguistic peculiarities inherent to the language, even the preliminary tasks such as word segmentation are non-trivial in Sanskrit. Elegant models for Word Segmentation in Sanskrit are indispensable for further syntactic and semantic processing of the manuscripts. Current works in word segmentation for Sanskrit, though commendable in their novelty, often have variations in their objective and evaluation criteria. In this work, we set the record straight. We formally define the objectives and the requirements for the word segmentation task. In order to encourage research in the field and to alleviate the time and effort required in pre-processing, we release a dataset of 115,000 sentences for word segmentation. For each sentence in the dataset we include the input character sequence, ground truth segmentation, and additionally lexical and morphological information about all the phonetically possible segments for the given sentence. In this work, we also discuss the linguistic considerations made while generating the candidate space of the possible segments.
Tasks	Transfer Learning
Published	2017-08-01
URL	https://www.aclweb.org/anthology/W17-2214/
PDF	https://www.aclweb.org/anthology/W17-2214
PWC	https://paperswithcode.com/paper/a-dataset-for-sanskrit-word-segmentation
Repo
Framework

Word Embeddings based on Fixed-Size Ordinally Forgetting Encoding


Title	Word Embeddings based on Fixed-Size Ordinally Forgetting Encoding
Authors	Joseph Sanu, Mingbin Xu, Hui Jiang, Quan Liu
Abstract	In this paper, we propose to learn word embeddings based on the recent fixed-size ordinally forgetting encoding (FOFE) method, which can almost uniquely encode any variable-length sequence into a fixed-size representation. We use FOFE to fully encode the left and right context of each word in a corpus to construct a novel word-context matrix, which is further weighted and factorized using truncated SVD to generate low-dimension word embedding vectors. We evaluate this alternate method in encoding word-context statistics and show the new FOFE method has a notable effect on the resulting word embeddings. Experimental results on several popular word similarity tasks have demonstrated that the proposed method outperforms other SVD models that use canonical count based techniques to generate word context matrices.
Tasks	Language Modelling, Semantic Textual Similarity, Word Embeddings
Published	2017-09-01
URL	https://www.aclweb.org/anthology/D17-1031/
PDF	https://www.aclweb.org/anthology/D17-1031
PWC	https://paperswithcode.com/paper/word-embeddings-based-on-fixed-size-ordinally
Repo
Framework
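
The FOFE encoding itself follows a simple recursion, z_t = α · z_{t-1} + e_t, where e_t is the one-hot vector of the t-th token and 0 < α < 1 is the forgetting factor. The sketch below shows only that recursion on a toy vocabulary; the word-context matrix, its weighting, and the truncated SVD step from the abstract are not reproduced, and the function name and example values are illustrative.

```python
from typing import List, Sequence


def fofe_encode(tokens: Sequence[str], vocab: Sequence[str], alpha: float = 0.5) -> List[float]:
    """Encode a token sequence into one fixed-size vector via z_t = alpha * z_{t-1} + e_t."""
    index = {word: i for i, word in enumerate(vocab)}
    z = [0.0] * len(vocab)
    for token in tokens:
        z = [alpha * value for value in z]  # decay everything seen so far
        z[index[token]] += 1.0              # add the one-hot vector of the current token
    return z


vocab = ["A", "B", "C"]
short_code = fofe_encode("ABC", vocab, alpha=0.5)
long_code = fofe_encode("ABCBC", vocab, alpha=0.5)
print(short_code)
print(long_code)
```

Tokens nearer the end of the sequence keep larger weights, so the vector records order as well as counts. One way to encode a right context with the same function, assumed here rather than taken from the paper, is to feed its tokens in reverse order so that the words closest to the focus word receive the largest weights.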

Classifying Illegal Activities on Tor Network Based on Web Textual Contents


Title	Classifying Illegal Activities on Tor Network Based on Web Textual Contents
Authors	Mhd Wesam Al Nabki, Eduardo Fidalgo, Enrique Alegre, Ivan de Paz
Abstract	The freedom of the Deep Web offers a safe place where people can express themselves anonymously but they also can conduct illegal activities. In this paper, we present and make publicly available a new dataset for Darknet active domains, which we call "Darknet Usage Text Addresses" (DUTA). We built DUTA by sampling the Tor network during two months and manually labeled each address into 26 classes. Using DUTA, we conducted a comparison between two well-known text representation techniques crossed by three different supervised classifiers to categorize the Tor hidden services. We also fixed the pipeline elements and identified the aspects that have a critical influence on the classification results. We found that the combination of TFIDF words representation with Logistic Regression classifier achieves 96.6% of 10 folds cross-validation accuracy and a macro F1 score of 93.7% when classifying a subset of illegal activities from DUTA. The good performance of the classifier might support potential tools to help the authorities in the detection of these activities.
Tasks
Published	2017-04-01
URL	https://www.aclweb.org/anthology/E17-1004/
PDF	https://www.aclweb.org/anthology/E17-1004
PWC	https://paperswithcode.com/paper/classifying-illegal-activities-on-tor-network
Repo
Framework

Proceedings of ACL 2017, System Demonstrations


Title	Proceedings of ACL 2017, System Demonstrations
Authors
Abstract
Tasks
Published	2017-07-01
URL	https://www.aclweb.org/anthology/P17-4000/
PDF	https://www.aclweb.org/anthology/P17-4000
PWC	https://paperswithcode.com/paper/proceedings-of-acl-2017-system-demonstrations
Repo
Framework

Identifying Where to Focus in Reading Comprehension for Neural Question Generation


Title	Identifying Where to Focus in Reading Comprehension for Neural Question Generation
Authors	Xinya Du, Claire Cardie
Abstract	A first step in the task of automatically generating questions for testing reading comprehension is to identify question-worthy sentences, i.e. sentences in a text passage that humans find it worthwhile to ask questions about. We propose a hierarchical neural sentence-level sequence tagging model for this task, which existing approaches to question generation have ignored. The approach is fully data-driven, with no sophisticated NLP pipelines or any hand-crafted rules/features, and compares favorably to a number of baselines when evaluated on the SQuAD data set. When incorporated into an existing neural question generation system, the resulting end-to-end system achieves state-of-the-art performance for paragraph-level question generation for reading comprehension.
Tasks	Dependency Parsing, Machine Translation, Named Entity Recognition, Question Generation, Reading Comprehension, Sentence Classification, Sentiment Analysis, Text Summarization
Published	2017-09-01
URL	https://www.aclweb.org/anthology/D17-1219/
PDF	https://www.aclweb.org/anthology/D17-1219
PWC	https://paperswithcode.com/paper/identifying-where-to-focus-in-reading
Repo
Framework

On Frank-Wolfe and Equilibrium Computation


Title	On Frank-Wolfe and Equilibrium Computation
Authors	Jacob D. Abernethy, Jun-Kun Wang
Abstract	We consider the Frank-Wolfe (FW) method for constrained convex optimization, and we show that this classical technique can be interpreted from a different perspective: FW emerges as the computation of an equilibrium (saddle point) of a special convex-concave zero sum game. This saddle-point trick relies on the existence of no-regret online learning to both generate a sequence of iterates but also to provide a proof of convergence through vanishing regret. We show that our stated equivalence has several nice properties, as it exhibits a modularity that gives rise to various old and new algorithms. We explore a few such resulting methods, and provide experimental results to demonstrate correctness and efficiency.
Tasks
Published	2017-12-01
URL	http://papers.nips.cc/paper/7236-on-frank-wolfe-and-equilibrium-computation
PDF	http://papers.nips.cc/paper/7236-on-frank-wolfe-and-equilibrium-computation.pdf
PWC	https://paperswithcode.com/paper/on-frank-wolfe-and-equilibrium-computation
Repo
Framework
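
As background for the abstract, the classical Frank-Wolfe iteration can be sketched on a small problem. The example below minimizes a squared distance over the probability simplex with the standard step size 2 / (t + 2). It illustrates the textbook method only, not the game-theoretic interpretation or the new algorithms from the paper; the objective, target point, and function names are assumptions chosen for illustration.

```python
from typing import Callable, List


def frank_wolfe_simplex(
    grad: Callable[[List[float]], List[float]], dim: int, iterations: int = 2000
) -> List[float]:
    """Minimize a smooth convex function over the probability simplex."""
    x = [1.0 / dim] * dim  # start at the centre of the simplex
    for t in range(iterations):
        g = grad(x)
        # Linear minimization oracle: over the simplex, the minimizer of <g, s>
        # is the vertex whose gradient coordinate is smallest.
        i = min(range(dim), key=lambda j: g[j])
        gamma = 2.0 / (t + 2.0)
        # Move toward that vertex; the iterate stays a convex combination.
        x = [(1.0 - gamma) * value for value in x]
        x[i] += gamma
    return x


# Hypothetical objective f(x) = ||x - target||^2 with the target inside the simplex.
target = [0.2, 0.5, 0.3]


def gradient(x: List[float]) -> List[float]:
    return [2.0 * (a - b) for a, b in zip(x, target)]


solution = frank_wolfe_simplex(gradient, dim=3)
print([round(value, 3) for value in solution])
```

Each step needs only a linear minimization over the feasible set rather than a projection, which is what makes the method attractive for constrained problems.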

Large-scale evaluation of dependency-based DSMs: Are they worth the effort?


Title	Large-scale evaluation of dependency-based DSMs: Are they worth the effort?
Authors	Gabriella Lapesa, Stefan Evert
Abstract	This paper presents a large-scale evaluation study of dependency-based distributional semantic models. We evaluate dependency-filtered and dependency-structured DSMs in a number of standard semantic similarity tasks, systematically exploring their parameter space in order to give them a "fair shot" against window-based models. Our results show that properly tuned window-based DSMs still outperform the dependency-based models in most tasks. There appears to be little need for the language-dependent resources and computational cost associated with syntactic analysis.
Tasks	Semantic Similarity, Semantic Textual Similarity
Published	2017-04-01
URL	https://www.aclweb.org/anthology/E17-2063/
PDF	https://www.aclweb.org/anthology/E17-2063
PWC	https://paperswithcode.com/paper/large-scale-evaluation-of-dependency-based
Repo
Framework