Paper Group NANR 87
Human and Automated CEFR-based Grading of Short Answers. Subset Selection under Noise. Visual Vibrometry: Estimating Material Properties from Small Motions in Video. Multilingual Ontologies for the Representation and Processing of Folktales. Survey: Multiword Expression Processing: A Survey. QUINT: Interpretable Question Answering over Knowledge Bas …
Human and Automated CEFR-based Grading of Short Answers
Title | Human and Automated CEFR-based Grading of Short Answers |
Authors | Anaïs Tack, Thomas François, Sophie Roekhaut, Cédrick Fairon |
Abstract | This paper is concerned with the task of automatically assessing the written proficiency level of non-native (L2) learners of English. Drawing on previous research on automated L2 writing assessment following the Common European Framework of Reference for Languages (CEFR), we investigate the possibilities and difficulties of deriving the CEFR level from short answers to open-ended questions, which has not yet been subjected to numerous studies up to date. The object of our study is twofold: to examine the intricacy involved with both human and automated CEFR-based grading of short answers. On the one hand, we describe the compilation of a learner corpus of short answers graded with CEFR levels by three certified Cambridge examiners. We mainly observe that, although the shortness of the answers is reported as undermining a clear-cut evaluation, the length of the answer does not necessarily correlate with inter-examiner disagreement. On the other hand, we explore the development of a soft-voting system for the automated CEFR-based grading of short answers and draw tentative conclusions about its use in a computer-assisted testing (CAT) setting. |
Tasks | |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-5018/ |
https://www.aclweb.org/anthology/W17-5018 | |
PWC | https://paperswithcode.com/paper/human-and-automated-cefr-based-grading-of |
Repo | |
Framework | |
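The abstract above mentions a soft-voting system without giving its details. As a rough illustration of the general technique only, the sketch below averages the class-probability outputs of several classifiers and returns the most probable CEFR level. The function name, the three probability tables, and the tie-breaking rule are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def soft_vote(distributions: List[Dict[str, float]]) -> str:
    """Average per-class probabilities across classifiers and return the top class.

    Each element of `distributions` is one (hypothetical) classifier's
    probability estimate over CEFR levels for the same short answer.
    """
    if not distributions:
        raise ValueError("need at least one classifier output")
    totals: Dict[str, float] = {}
    for dist in distributions:
        for label, prob in dist.items():
            totals[label] = totals.get(label, 0.0) + prob
    n = len(distributions)
    averaged = {label: total / n for label, total in totals.items()}
    # Ties are broken alphabetically; this is an arbitrary illustrative choice.
    return max(sorted(averaged), key=averaged.get)


# Made-up outputs of three classifiers for one answer.
example_outputs = [
    {"A2": 0.6, "B1": 0.3, "B2": 0.1},
    {"A2": 0.2, "B1": 0.5, "B2": 0.3},
    {"A2": 0.3, "B1": 0.4, "B2": 0.3},
]
predicted_level = soft_vote(example_outputs)
```

Unlike majority (hard) voting, this scheme lets a confident classifier outweigh two uncertain ones, which is the usual motivation for averaging probabilities rather than counting labels.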
Subset Selection under Noise
Title | Subset Selection under Noise |
Authors | Chao Qian, Jing-Cheng Shi, Yang Yu, Ke Tang, Zhi-Hua Zhou |
Abstract | The problem of selecting the best $k$-element subset from a universe is involved in many applications. While previous studies assumed a noise-free environment or a noisy monotone submodular objective function, this paper considers a more realistic and general situation where the evaluation of a subset is a noisy monotone function (not necessarily submodular), with both multiplicative and additive noises. To understand the impact of the noise, we firstly show the approximation ratio of the greedy algorithm and POSS, two powerful algorithms for noise-free subset selection, in the noisy environments. We then propose to incorporate a noise-aware strategy into POSS, resulting in the new PONSS algorithm. We prove that PONSS can achieve a better approximation ratio under some assumption such as i.i.d. noise distribution. The empirical results on influence maximization and sparse regression problems show the superior performance of PONSS. |
Tasks | |
Published | 2017-12-01 |
URL | http://papers.nips.cc/paper/6947-subset-selection-under-noise |
http://papers.nips.cc/paper/6947-subset-selection-under-noise.pdf | |
PWC | https://paperswithcode.com/paper/subset-selection-under-noise |
Repo | |
Framework | |
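The abstract above names the greedy algorithm as one of the noise-free baselines it analyzes. A minimal sketch of that standard greedy procedure follows: it repeatedly adds the element that gives the highest objective value until k elements are chosen. It does not implement POSS or PONSS, and the coverage objective and the `SETS` data are hypothetical stand-ins chosen only to make the example runnable.

```python
from typing import Callable, Hashable, Iterable, Set


def greedy_subset(
    universe: Iterable[Hashable],
    f: Callable[[Set[Hashable]], float],
    k: int,
) -> Set[Hashable]:
    """Pick up to k elements, each time adding the one that maximizes f."""
    items = sorted(universe)  # fixed iteration order keeps the result deterministic
    selected: Set[Hashable] = set()
    for _ in range(k):
        best_item, best_value = None, None
        for item in items:
            if item in selected:
                continue
            value = f(selected | {item})
            if best_value is None or value > best_value:
                best_item, best_value = item, value
        if best_item is None:  # universe exhausted before reaching k
            break
        selected.add(best_item)
    return selected


# Hypothetical monotone objective: number of distinct points covered.
SETS = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1}}


def coverage(chosen: Set[Hashable]) -> float:
    covered = set()
    for name in chosen:
        covered |= SETS[name]
    return float(len(covered))


best_pair = greedy_subset(SETS.keys(), coverage, k=2)
```

In the noisy setting the abstract describes, each call to `f` would return a perturbed value, so a single comparison between two candidates can be wrong; that is the situation a noise-aware strategy is meant to address.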
Visual Vibrometry: Estimating Material Properties from Small Motions in Video
Title | Visual Vibrometry: Estimating Material Properties from Small Motions in Video |
Authors | Abe Davis, Katherine L. Bouman, Justin G. Chen, Michael Rubinstein, Oral Buyukozturk, Fredo Durand, William T. Freeman |
Abstract | The estimation of material properties is important for scene understanding, with many applications in vision, robotics, and structural engineering. This paper connects fundamentals of vibration mechanics with computer vision techniques in order to infer material properties from small, often imperceptible motions in video. Objects tend to vibrate in a set of preferred modes. The frequencies of these modes depend on the structure and material properties of an object. We show that by extracting these frequencies from video of a vibrating object, we can often make inferences about that object’s material properties. We demonstrate our approach by estimating material properties for a variety of objects by observing their motion in high-speed and regular frame rate video. |
Tasks | Scene Understanding |
Published | 2017-04-15 |
URL | http://visualvibrometry.com/ |
http://www.visualvibrometry.com/publications/visvib_pami.pdf | |
PWC | https://paperswithcode.com/paper/visual-vibrometry-estimating |
Repo | |
Framework | |
Multilingual Ontologies for the Representation and Processing of Folktales
Title | Multilingual Ontologies for the Representation and Processing of Folktales |
Authors | Thierry Declerck, Anastasija Aman, Martin Banzer, Dominik Macháček, Lisa Schäfer, Natalia Skachkova |
Abstract | We describe work done in the field of folkloristics and consisting in creating ontologies based on well-established studies proposed by "classical" folklorists. This work is supporting the availability of a huge amount of digital and structured knowledge on folktales to digital humanists. The ontological encoding of past and current motif-indexation and classification systems for folktales was in the first step limited to English language data. This led us to focus on making those newly generated formal knowledge sources available in a few more languages, like German, Russian and Bulgarian. We stress the importance of achieving this multilingual extension of our ontologies at a larger scale, in order for example to support the automated analysis and classification of such narratives in a large variety of languages, as those are getting more and more accessible on the Web. | |
Tasks | Text Generation |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-8103/ |
http://doi.org/10.26615/978-954-452-046-5_003 | |
PWC | https://paperswithcode.com/paper/multilingual-ontologies-for-the |
Repo | |
Framework | |
Survey: Multiword Expression Processing: A Survey
Title | Survey: Multiword Expression Processing: A Survey |
Authors | Mathieu Constant, Gülşen Eryiğit, Johanna Monti, Lonneke van der Plas, Carlos Ramisch, Michael Rosner, Amalia Todirascu |
Abstract | Multiword expressions (MWEs) are a class of linguistic forms spanning conventional word boundaries that are both idiosyncratic and pervasive across different languages. The structure of linguistic processing that depends on the clear distinction between words and phrases has to be re-thought to accommodate MWEs. The issue of MWE handling is crucial for NLP applications, where it raises a number of challenges. The emergence of solutions in the absence of guiding principles motivates this survey, whose aim is not only to provide a focused review of MWE processing, but also to clarify the nature of interactions between MWE processing and downstream applications. We propose a conceptual framework within which challenges and research contributions can be positioned. It offers a shared understanding of what is meant by "MWE processing," distinguishing the subtasks of MWE discovery and identification. It also elucidates the interactions between MWE processing and two use cases: Parsing and machine translation. Many of the approaches in the literature can be differentiated according to how MWE processing is timed with respect to underlying use cases. We discuss how such orchestration choices affect the scope of MWE-aware systems. For each of the two MWE processing subtasks and for each of the two use cases, we conclude on open issues and research perspectives. | |
Tasks | Machine Translation |
Published | 2017-12-01 |
URL | https://www.aclweb.org/anthology/J17-4005/ |
https://www.aclweb.org/anthology/J17-4005 | |
PWC | https://paperswithcode.com/paper/survey-multiword-expression-processing-a |
Repo | |
Framework | |
QUINT: Interpretable Question Answering over Knowledge Bases
Title | QUINT: Interpretable Question Answering over Knowledge Bases |
Authors | Abdalghani Abujabal, Rishiraj Saha Roy, Mohamed Yahya, Gerhard Weikum |
Abstract | We present QUINT, a live system for question answering over knowledge bases. QUINT automatically learns role-aligned utterance-query templates from user questions paired with their answers. When QUINT answers a question, it visualizes the complete derivation sequence from the natural language utterance to the final answer. The derivation provides an explanation of how the syntactic structure of the question was used to derive the structure of a SPARQL query, and how the phrases in the question were used to instantiate different parts of the query. When an answer seems unsatisfactory, the derivation provides valuable insights towards reformulating the question. |
Tasks | Named Entity Recognition, Question Answering, Semantic Parsing |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/D17-2011/ |
https://www.aclweb.org/anthology/D17-2011 | |
PWC | https://paperswithcode.com/paper/quint-interpretable-question-answering-over |
Repo | |
Framework | |
Mahtab at SemEval-2017 Task 2: Combination of Corpus-based and Knowledge-based Methods to Measure Semantic Word Similarity
Title | Mahtab at SemEval-2017 Task 2: Combination of Corpus-based and Knowledge-based Methods to Measure Semantic Word Similarity |
Authors | Niloofar Ranjbar, Fatemeh Mashhadirajab, Mehrnoush Shamsfard, Rayeheh Hosseini pour, Aryan Vahid pour |
Abstract | In this paper, we describe our proposed method for measuring semantic similarity for a given pair of words at SemEval-2017 monolingual semantic word similarity task. We use a combination of knowledge-based and corpus-based techniques. We use FarsNet, the Persian Word Net, besides deep learning techniques to extract the similarity of words. We evaluated our proposed approach on Persian (Farsi) test data at SemEval-2017. It outperformed the other participants and ranked the first in the challenge. |
Tasks | Semantic Similarity, Semantic Textual Similarity |
Published | 2017-08-01 |
URL | https://www.aclweb.org/anthology/S17-2040/ |
https://www.aclweb.org/anthology/S17-2040 | |
PWC | https://paperswithcode.com/paper/mahtab-at-semeval-2017-task-2-combination-of |
Repo | |
Framework | |
Sinhala Word Joiner
Title | Sinhala Word Joiner |
Authors | Rajith Priyanga, Surangika Ranatunga, Gihan Dias |
Abstract | |
Tasks | |
Published | 2017-12-01 |
URL | https://www.aclweb.org/anthology/W17-7528/ |
https://www.aclweb.org/anthology/W17-7528 | |
PWC | https://paperswithcode.com/paper/sinhala-word-joiner |
Repo | |
Framework | |
A Dataset for Sanskrit Word Segmentation
Title | A Dataset for Sanskrit Word Segmentation |
Authors | Amrith Krishna, Pavan Kumar Satuluri, Pawan Goyal |
Abstract | The last decade saw a surge in digitisation efforts for ancient manuscripts in Sanskrit. Due to various linguistic peculiarities inherent to the language, even the preliminary tasks such as word segmentation are non-trivial in Sanskrit. Elegant models for Word Segmentation in Sanskrit are indispensable for further syntactic and semantic processing of the manuscripts. Current works in word segmentation for Sanskrit, though commendable in their novelty, often have variations in their objective and evaluation criteria. In this work, we set the record straight. We formally define the objectives and the requirements for the word segmentation task. In order to encourage research in the field and to alleviate the time and effort required in pre-processing, we release a dataset of 115,000 sentences for word segmentation. For each sentence in the dataset we include the input character sequence, ground truth segmentation, and additionally lexical and morphological information about all the phonetically possible segments for the given sentence. In this work, we also discuss the linguistic considerations made while generating the candidate space of the possible segments. |
Tasks | Transfer Learning |
Published | 2017-08-01 |
URL | https://www.aclweb.org/anthology/W17-2214/ |
https://www.aclweb.org/anthology/W17-2214 | |
PWC | https://paperswithcode.com/paper/a-dataset-for-sanskrit-word-segmentation |
Repo | |
Framework | |
Word Embeddings based on Fixed-Size Ordinally Forgetting Encoding
Title | Word Embeddings based on Fixed-Size Ordinally Forgetting Encoding |
Authors | Joseph Sanu, Mingbin Xu, Hui Jiang, Quan Liu |
Abstract | In this paper, we propose to learn word embeddings based on the recent fixed-size ordinally forgetting encoding (FOFE) method, which can almost uniquely encode any variable-length sequence into a fixed-size representation. We use FOFE to fully encode the left and right context of each word in a corpus to construct a novel word-context matrix, which is further weighted and factorized using truncated SVD to generate low-dimension word embedding vectors. We evaluate this alternate method in encoding word-context statistics and show the new FOFE method has a notable effect on the resulting word embeddings. Experimental results on several popular word similarity tasks have demonstrated that the proposed method outperforms other SVD models that use canonical count based techniques to generate word context matrices. |
Tasks | Language Modelling, Semantic Textual Similarity, Word Embeddings |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/D17-1031/ |
https://www.aclweb.org/anthology/D17-1031 | |
PWC | https://paperswithcode.com/paper/word-embeddings-based-on-fixed-size-ordinally |
Repo | |
Framework | |
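The abstract above relies on FOFE to turn a variable-length context into a fixed-size vector. As commonly described in the FOFE literature, the code is built by the recursion z_t = alpha * z_(t-1) + e_t, where e_t is the one-hot vector of the t-th token and alpha is a forgetting factor between 0 and 1. The sketch below shows only that recursion; the toy vocabulary, the value of alpha, and the function name are illustrative assumptions, and the paper's word-context matrix, weighting, and truncated SVD steps are not reproduced.

```python
from typing import List, Sequence


def fofe_encode(token_ids: Sequence[int], vocab_size: int, alpha: float) -> List[float]:
    """Encode a token-id sequence into one vector of length vocab_size.

    Applies z_t = alpha * z_(t-1) + e_t for each token in order, so earlier
    tokens are discounted by higher powers of alpha.
    """
    z = [0.0] * vocab_size
    for token_id in token_ids:
        z = [alpha * value for value in z]  # decay everything seen so far
        z[token_id] += 1.0                  # add the one-hot vector of the current token
    return z


# Toy vocabulary {A: 0, B: 1, C: 2}; the sequence is "A B C B C".
code = fofe_encode([0, 1, 2, 1, 2], vocab_size=3, alpha=0.5)
# code == [alpha**4, alpha**3 + alpha, alpha**2 + 1] == [0.0625, 0.625, 1.25]
```

Encoding the left context of a word in reading order and the right context in reverse order would give the two fixed-size codes per word that the abstract refers to.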
Classifying Illegal Activities on Tor Network Based on Web Textual Contents
Title | Classifying Illegal Activities on Tor Network Based on Web Textual Contents |
Authors | Mhd Wesam Al Nabki, Eduardo Fidalgo, Enrique Alegre, Ivan de Paz |
Abstract | The freedom of the Deep Web offers a safe place where people can express themselves anonymously but they also can conduct illegal activities. In this paper, we present and make publicly available a new dataset for Darknet active domains, which we call "Darknet Usage Text Addresses" (DUTA). We built DUTA by sampling the Tor network during two months and manually labeled each address into 26 classes. Using DUTA, we conducted a comparison between two well-known text representation techniques crossed by three different supervised classifiers to categorize the Tor hidden services. We also fixed the pipeline elements and identified the aspects that have a critical influence on the classification results. We found that the combination of TFIDF words representation with Logistic Regression classifier achieves 96.6% of 10 folds cross-validation accuracy and a macro F1 score of 93.7% when classifying a subset of illegal activities from DUTA. The good performance of the classifier might support potential tools to help the authorities in the detection of these activities. |
Tasks | |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/E17-1004/ |
https://www.aclweb.org/anthology/E17-1004 | |
PWC | https://paperswithcode.com/paper/classifying-illegal-activities-on-tor-network |
Repo | |
Framework | |
Proceedings of ACL 2017, System Demonstrations
Title | Proceedings of ACL 2017, System Demonstrations |
Authors | |
Abstract | |
Tasks | |
Published | 2017-07-01 |
URL | https://www.aclweb.org/anthology/P17-4000/ |
https://www.aclweb.org/anthology/P17-4000 | |
PWC | https://paperswithcode.com/paper/proceedings-of-acl-2017-system-demonstrations |
Repo | |
Framework | |
Identifying Where to Focus in Reading Comprehension for Neural Question Generation
Title | Identifying Where to Focus in Reading Comprehension for Neural Question Generation |
Authors | Xinya Du, Claire Cardie |
Abstract | A first step in the task of automatically generating questions for testing reading comprehension is to identify question-worthy sentences, i.e. sentences in a text passage that humans find it worthwhile to ask questions about. We propose a hierarchical neural sentence-level sequence tagging model for this task, which existing approaches to question generation have ignored. The approach is fully data-driven, with no sophisticated NLP pipelines or any hand-crafted rules/features, and compares favorably to a number of baselines when evaluated on the SQuAD data set. When incorporated into an existing neural question generation system, the resulting end-to-end system achieves state-of-the-art performance for paragraph-level question generation for reading comprehension. |
Tasks | Dependency Parsing, Machine Translation, Named Entity Recognition, Question Generation, Reading Comprehension, Sentence Classification, Sentiment Analysis, Text Summarization |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/D17-1219/ |
https://www.aclweb.org/anthology/D17-1219 | |
PWC | https://paperswithcode.com/paper/identifying-where-to-focus-in-reading |
Repo | |
Framework | |
On Frank-Wolfe and Equilibrium Computation
Title | On Frank-Wolfe and Equilibrium Computation |
Authors | Jacob D. Abernethy, Jun-Kun Wang |
Abstract | We consider the Frank-Wolfe (FW) method for constrained convex optimization, and we show that this classical technique can be interpreted from a different perspective: FW emerges as the computation of an equilibrium (saddle point) of a special convex-concave zero sum game. This saddle-point trick relies on the existence of no-regret online learning to both generate a sequence of iterates but also to provide a proof of convergence through vanishing regret. We show that our stated equivalence has several nice properties, as it exhibits a modularity that gives rise to various old and new algorithms. We explore a few such resulting methods, and provide experimental results to demonstrate correctness and efficiency. |
Tasks | |
Published | 2017-12-01 |
URL | http://papers.nips.cc/paper/7236-on-frank-wolfe-and-equilibrium-computation |
http://papers.nips.cc/paper/7236-on-frank-wolfe-and-equilibrium-computation.pdf | |
PWC | https://paperswithcode.com/paper/on-frank-wolfe-and-equilibrium-computation |
Repo | |
Framework | |
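For readers unfamiliar with the method named in the abstract above, the following is a minimal sketch of the classical Frank-Wolfe iteration, not of the paper's game-theoretic reinterpretation. At each step it solves a linear minimization over the feasible set and moves toward the solution with step size 2/(t+2). The feasible set (the probability simplex), the quadratic objective, the target vector, and all names are assumptions made for illustration.

```python
from typing import Callable, List, Sequence


def frank_wolfe_simplex(
    grad: Callable[[Sequence[float]], List[float]],
    x0: Sequence[float],
    num_iters: int = 1000,
) -> List[float]:
    """Minimize a smooth convex function over the probability simplex."""
    x = list(x0)
    for t in range(num_iters):
        g = grad(x)
        # Linear minimization oracle: over the simplex, the minimizer of <g, s>
        # is the vertex at the smallest gradient coordinate.
        i = min(range(len(x)), key=lambda j: g[j])
        gamma = 2.0 / (t + 2.0)
        # Convex combination x <- (1 - gamma) * x + gamma * e_i keeps x feasible.
        x = [(1.0 - gamma) * xj for xj in x]
        x[i] += gamma
    return x


# Hypothetical objective: f(x) = ||x - TARGET||^2 with TARGET inside the simplex.
TARGET = [0.2, 0.5, 0.3]


def objective(x: Sequence[float]) -> float:
    return sum((xj - bj) ** 2 for xj, bj in zip(x, TARGET))


def objective_grad(x: Sequence[float]) -> List[float]:
    return [2.0 * (xj - bj) for xj, bj in zip(x, TARGET)]


solution = frank_wolfe_simplex(objective_grad, [1.0, 0.0, 0.0])
```

Because every iterate is a convex combination of simplex vertices, no projection step is needed; that projection-free property is the usual reason for choosing Frank-Wolfe over projected gradient descent.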
Large-scale evaluation of dependency-based DSMs: Are they worth the effort?
Title | Large-scale evaluation of dependency-based DSMs: Are they worth the effort? |
Authors | Gabriella Lapesa, Stefan Evert |
Abstract | This paper presents a large-scale evaluation study of dependency-based distributional semantic models. We evaluate dependency-filtered and dependency-structured DSMs in a number of standard semantic similarity tasks, systematically exploring their parameter space in order to give them a "fair shot" against window-based models. Our results show that properly tuned window-based DSMs still outperform the dependency-based models in most tasks. There appears to be little need for the language-dependent resources and computational cost associated with syntactic analysis. | |
Tasks | Semantic Similarity, Semantic Textual Similarity |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/E17-2063/ |
https://www.aclweb.org/anthology/E17-2063 | |
PWC | https://paperswithcode.com/paper/large-scale-evaluation-of-dependency-based |
Repo | |
Framework | |