Paper Group NANR 19
K-best Iterative Viterbi Parsing. CAT: Credibility Analysis of Arabic Content on Twitter. The Impact of Figurative Language on Sentiment Analysis. Neural Post-Editing Based on Quality Estimation. Online Automatic Post-editing for MT in a Multi-Domain Translation Environment. Consistent Translation of Repeated Nouns using Syntactic and Semantic Cues …
K-best Iterative Viterbi Parsing
Title | K-best Iterative Viterbi Parsing |
Authors | Katsuhiko Hayashi, Masaaki Nagata |
Abstract | This paper presents an efficient and optimal parsing algorithm for probabilistic context-free grammars (PCFGs). To achieve faster parsing, our proposal employs a pruning technique to reduce unnecessary edges in the search space. The key is to repeatedly conduct Viterbi inside and outside parsing while gradually expanding the search space, so as to efficiently compute the heuristic bounds used for pruning. Our experimental results on the English Penn Treebank corpus show that the proposed algorithm is faster than the standard CKY parsing algorithm. In addition, we show how to extend this algorithm to extract k-best Viterbi parse trees. |
Tasks | |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/E17-2049/ |
PDF | https://www.aclweb.org/anthology/E17-2049 |
PWC | https://paperswithcode.com/paper/k-best-iterative-viterbi-parsing |
Repo | |
Framework | |
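The baseline this abstract compares against, standard Viterbi CKY parsing for a PCFG, can be sketched in Python as below. This is a minimal sketch, not the authors' algorithm: it performs no iterative inside/outside pruning and returns only the single best tree rather than the k best. It assumes a grammar in Chomsky normal form encoded as a `lexical` dictionary and a `binary` rule list with `S` as the root symbol; these names and this encoding are hypothetical choices for illustration.

```python
import math


def viterbi_cky(words, lexical, binary, root="S"):
    """Viterbi CKY for a PCFG in Chomsky normal form (illustrative sketch).

    lexical: dict word -> list of (nonterminal, prob) for rules A -> word
    binary:  list of (parent, left, right, prob) for rules A -> B C
    Returns (best log-probability, tree) for the root symbol, or None if no parse.
    """
    n = len(words)
    # chart[i][j][A] = (best log-prob of A spanning words[i:j], backpointer)
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for lhs, prob in lexical.get(word, []):
            score = math.log(prob)
            if score > chart[i][i + 1].get(lhs, (-math.inf, None))[0]:
                chart[i][i + 1][lhs] = (score, word)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = chart[i][j]
            for k in range(i + 1, j):  # split point between left and right child
                for parent, left, right, prob in binary:
                    if left in chart[i][k] and right in chart[k][j]:
                        score = math.log(prob) + chart[i][k][left][0] + chart[k][j][right][0]
                        # Viterbi: keep only the best derivation per (span, label)
                        if score > cell.get(parent, (-math.inf, None))[0]:
                            cell[parent] = (score, (k, left, right))
    if root not in chart[0][n]:
        return None

    def build(i, j, label):
        _, back = chart[i][j][label]
        if isinstance(back, str):  # preterminal: the backpointer is the word itself
            return (label, back)
        k, left, right = back
        return (label, build(i, k, left), build(k, j, right))

    return chart[0][n][root][0], build(0, n, root)
```

For example, with the toy grammar `lexical = {"fish": [("NP", 0.6), ("V", 0.4)]}` and `binary = [("S", "NP", "VP", 1.0), ("VP", "V", "NP", 1.0)]`, parsing `["fish", "fish", "fish"]` yields `("S", ("NP", "fish"), ("VP", ("V", "fish"), ("NP", "fish")))` with log-probability log(0.6 × 0.4 × 0.6). The nested loops over spans, split points, and rules are what make exhaustive CKY cubic in sentence length, which is the cost that pruning unnecessary edges aims to reduce.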
CAT: Credibility Analysis of Arabic Content on Twitter
Title | CAT: Credibility Analysis of Arabic Content on Twitter |
Authors | Rim El Ballouli, Wassim El-Hajj, Ahmad Ghandour, Shady Elbassuoni, Hazem Hajj, Khaled Shaban |
Abstract | Data generated on Twitter has become a rich source for various data mining tasks. Data analysis tasks that depend on tweet semantics, such as sentiment analysis, emotion mining, and rumor detection, suffer considerably if a tweet is not credible, not real, or spam. In this paper, we perform an extensive analysis of the credibility of Arabic content on Twitter. We also build a classification model (CAT) to automatically predict the credibility of a given Arabic tweet. Of particular originality is the inclusion of features extracted directly or indirectly from the author's profile and timeline. To train and test CAT, we annotated a topic-independent data set of 9,000 Arabic tweets for credibility. CAT achieved consistent improvements in predicting tweet credibility compared to several baselines, and an improvement of 21% in weighted average F-measure over the state-of-the-art approach. We also conducted experiments to highlight the importance of user-based features as opposed to content-based features. We conclude with a feature reduction experiment that highlights the most indicative features of credibility. |
Tasks | Emotion Recognition, Opinion Mining, Sentiment Analysis |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/W17-1308/ |
PDF | https://www.aclweb.org/anthology/W17-1308 |
PWC | https://paperswithcode.com/paper/cat-credibility-analysis-of-arabic-content-on |
Repo | |
Framework | |
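The abstract reports gains in weighted average F-measure without defining it. A common definition, used for example by scikit-learn's `f1_score(..., average="weighted")`, computes F1 for each class and averages the per-class scores weighted by each class's number of true instances (its support). The sketch below assumes that definition; the label names `credible` and `not_credible` in the test are hypothetical.

```python
def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to each class's support in y_true."""
    labels = sorted(set(y_true))
    total = len(y_true)
    score = 0.0
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        support = tp + fn  # number of true instances of this class
        score += f1 * support / total
    return score
```

Because the weights follow class frequency, the majority class dominates the average; a predicted label that never occurs in `y_true` has zero support and contributes nothing.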
The Impact of Figurative Language on Sentiment Analysis
Title | The Impact of Figurative Language on Sentiment Analysis |
Authors | Tomáš Hercig, Ladislav Lenc |
Abstract | Figurative language such as irony, sarcasm, and metaphor is considered a significant challenge in sentiment analysis. These figurative devices can sculpt the affect of an utterance and test the limits of sentiment analysis of supposedly literal texts. We explore the effect of figurative language on sentiment analysis. We incorporate figurative language indicators into the sentiment analysis process and compare the results with and without this additional information. We evaluate on the SemEval-2015 Task 11 data: with our convolutional neural network model and additional training data, we outperform the top-ranked team in terms of mean squared error and finish close behind it in terms of cosine similarity. |
Tasks | Sarcasm Detection, Sentiment Analysis |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/R17-1041/ |
PDF | https://doi.org/10.26615/978-954-452-049-6_041 |
PWC | https://paperswithcode.com/paper/the-impact-of-figurative-language-on |
Repo | |
Framework | |
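The two metrics named in this abstract can rank systems differently, which is how a model can come first on one and second on the other. A minimal sketch of both, assuming gold and predicted sentiment scores are given as equal-length lists of numbers (the values in the test are invented for illustration, not task data):

```python
import math


def mean_squared_error(gold, predicted):
    """Average squared difference between gold and predicted scores (lower is better)."""
    return sum((g - p) ** 2 for g, p in zip(gold, predicted)) / len(gold)


def cosine_similarity(gold, predicted):
    """Cosine of the angle between the gold and predicted score vectors (higher is better)."""
    dot = sum(g * p for g, p in zip(gold, predicted))
    norm = math.sqrt(sum(g * g for g in gold)) * math.sqrt(sum(p * p for p in predicted))
    return dot / norm if norm else 0.0
```

Mean squared error penalizes each tweet's error independently, whereas cosine similarity depends only on the direction of the whole score vector, so it is unaffected if all predictions are multiplied by the same positive constant.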
Neural Post-Editing Based on Quality Estimation
Title | Neural Post-Editing Based on Quality Estimation |
Authors | Yiming Tan, Zhiming Chen, Liu Huang, Lilin Zhang, Maoxi Li, Mingwen Wang |
Abstract | |
Tasks | Automatic Post-Editing, Machine Translation |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-4776/ |
PDF | https://www.aclweb.org/anthology/W17-4776 |
PWC | https://paperswithcode.com/paper/neural-post-editing-based-on-quality |
Repo | |
Framework | |
Online Automatic Post-editing for MT in a Multi-Domain Translation Environment
Title | Online Automatic Post-editing for MT in a Multi-Domain Translation Environment |
Authors | Rajen Chatterjee, Gebremedhen Gebremelak, Matteo Negri, Marco Turchi |
Abstract | Automatic post-editing (APE) for machine translation (MT) aims to fix recurrent errors made by the MT decoder by learning from correction examples. In controlled evaluation scenarios, the representativeness of the training set with respect to the test data is a key factor in achieving good performance. Real-life scenarios, however, do not guarantee such favorable learning conditions. Ideally, to be integrated into a real professional translation workflow (e.g. to play a role in a computer-assisted translation framework), APE tools should be flexible enough to cope with continuous streams of diverse data coming from different domains/genres. To address this problem, we propose an online APE framework that is: i) robust to data diversity (i.e. capable of learning and applying correction rules in the right contexts) and ii) able to evolve over time (by continuously extending and refining its knowledge). In a comparative evaluation with English-German test data coming in random order from two different domains, we show the effectiveness of our approach, which outperforms a strong batch system and the state of the art in online APE. |
Tasks | Automatic Post-Editing, Machine Translation |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/E17-1050/ |
PDF | https://www.aclweb.org/anthology/E17-1050 |
PWC | https://paperswithcode.com/paper/online-automatic-post-editing-for-mt-in-a |
Repo | |
Framework | |
Consistent Translation of Repeated Nouns using Syntactic and Semantic Cues
Title | Consistent Translation of Repeated Nouns using Syntactic and Semantic Cues |
Authors | Xiao Pu, Laura Mascarell, Andrei Popescu-Belis |
Abstract | We propose a method to decide whether two occurrences of the same noun in a source text should be translated consistently, i.e. using the same noun in the target text as well. We train and test classifiers that predict consistent translations based on lexical, syntactic, and semantic features. We first evaluate our classifiers intrinsically, in terms of the accuracy of their consistency predictions, over a subset of the UN Corpus. Then we evaluate them in combination with phrase-based statistical MT systems for Chinese-to-English and German-to-English. We compare automatic post-editing of noun translations with re-ranking of the translation hypotheses based on the classifiers' output, and also use these methods in combination. This improves over the baseline and closes up to 50% of the gap in BLEU scores between the baseline and an oracle classifier. |
Tasks | Automatic Post-Editing, Machine Translation |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/E17-1089/ |
PDF | https://www.aclweb.org/anthology/E17-1089 |
PWC | https://paperswithcode.com/paper/consistent-translation-of-repeated-nouns |
Repo | |
Framework | |
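The "gap closed" figure in this abstract is a simple ratio. Assuming it is computed as the system's BLEU gain over the baseline divided by the oracle classifier's gain over the baseline, it can be sketched as follows; the scores in the test are hypothetical, not the paper's results.

```python
def fraction_of_gap_closed(baseline, system, oracle):
    """Share of the baseline-to-oracle score gap recovered by the system."""
    gap = oracle - baseline
    if gap == 0:
        raise ValueError("oracle and baseline scores are equal; the gap is undefined")
    return (system - baseline) / gap
```

For instance, with a hypothetical baseline of 30.0 BLEU and oracle of 32.0 BLEU, a system at 31.0 BLEU closes half of the gap.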
Attention Modeling for Targeted Sentiment
Title | Attention Modeling for Targeted Sentiment |
Authors | Jiangming Liu, Yue Zhang |
Abstract | Neural network models have been used for target-dependent sentiment analysis. Previous work focuses on learning a target-specific representation of a given input sentence, which is then used for classification. However, it does not explicitly model the contribution of each word in a sentence with respect to targeted sentiment polarities. We investigate an attention model to this end. In particular, a vanilla LSTM model is used to induce an attention value of the whole sentence. Following previous work, the model is further extended to differentiate left and right contexts given a certain target. Results show that by using attention to model the contribution of each word with respect to the target, our model gives significantly improved results over two standard benchmarks. We report the best accuracy for this task. |
Tasks | Sentiment Analysis, Word Embeddings |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/E17-2091/ |
PDF | https://www.aclweb.org/anthology/E17-2091 |
PWC | https://paperswithcode.com/paper/attention-modeling-for-targeted-sentiment |
Repo | |
Framework | |
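As a rough illustration of attention over word representations, the sketch below scores each word's hidden state against a target vector, normalizes the scores with a softmax, and returns the weights together with the weighted sum of hidden states. It assumes plain dot-product scoring and precomputed hidden states given as lists of floats; the paper's actual scoring function and LSTM encoder are not reproduced here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(hidden_states, target_vector):
    """Weight each word's hidden state by its dot-product relevance to the target."""
    scores = [sum(h_k * t_k for h_k, t_k in zip(h, target_vector)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    # Context vector: attention-weighted sum of the hidden states
    context = [sum(w * h[k] for w, h in zip(weights, hidden_states)) for k in range(dim)]
    return weights, context
```

To mirror the left/right extension, one could call `attend` separately on the hidden states to the left and to the right of the target and concatenate the two context vectors; this is an illustrative assumption, not the paper's exact architecture.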
On a Chat Bot Finding Answers with Optimal Rhetoric Representation
Title | On a Chat Bot Finding Answers with Optimal Rhetoric Representation |
Authors | Boris Galitsky, Dmitry Ilvovsky |
Abstract | We demo a chat bot focused on complex, multi-sentence questions that enforce what we call rhetoric agreement of answers with questions. The chat bot finds answers that are not only relevant by topic but also match the question by style, argumentation patterns, communication means, experience level, and other attributes. The system achieves rhetoric agreement by learning pairs of discourse trees (DTs) for question (Q) and answer (A). We build a library of best answer DTs for most types of complex questions. To better recognize a valid rhetoric agreement between Q and A, DTs are extended with labels for communicative actions. An algorithm for finding the best DT for an A, given a Q, is evaluated. |
Tasks | |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/R17-1035/ |
PDF | https://doi.org/10.26615/978-954-452-049-6_035 |
PWC | https://paperswithcode.com/paper/on-a-chat-bot-finding-answers-with-optimal |
Repo | |
Framework | |
An Unsupervised Speaker Clustering Technique based on SOM and I-vectors for Speech Recognition Systems
Title | An Unsupervised Speaker Clustering Technique based on SOM and I-vectors for Speech Recognition Systems |
Authors | Hany Ahmed, Mohamed Elaraby, Abdullah M. Mousa, Mostafa Elhosiny, Sherif Abdou, Mohsen Rashwan |
Abstract | In this paper, we introduce an enhancement for speech recognition systems using an unsupervised speaker clustering technique. The proposed technique is mainly based on I-vectors and a Self-Organizing Map (SOM) neural network. The input to the proposed algorithm is a set of speech utterances. For each utterance, we extract a 100-dimensional I-vector, and SOM is then used to group the utterances by speaker. In our experiments, we compared our technique with Normalized Cross Likelihood Ratio (NCLR) clustering. Results show that the proposed technique reduces the speaker error rate in comparison with NCLR. Finally, we examined the effect of speaker clustering on Speaker Adaptive Training (SAT) in a speech recognition system implemented to test the performance of the proposed technique, and found that the proposed technique reduced the WER compared with clustering speakers with NCLR. |
Tasks | Large Vocabulary Continuous Speech Recognition, Speaker Identification, Speech Recognition |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/W17-1310/ |
PDF | https://www.aclweb.org/anthology/W17-1310 |
PWC | https://paperswithcode.com/paper/an-unsupervised-speaker-clustering-technique |
Repo | |
Framework | |
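The clustering step described in this abstract, mapping one fixed-length vector per utterance onto SOM units and treating each unit as a speaker cluster, can be sketched with a one-dimensional self-organizing map in plain Python. This is a minimal sketch under several assumptions: toy 2-D vectors stand in for the 100-dimensional I-vectors, the number of units is fixed in advance, and the initialization, learning-rate schedule, and Gaussian neighbourhood are generic textbook choices rather than the paper's settings.

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the unit whose weight vector is closest (squared Euclidean) to x."""
    return min(range(len(weights)),
               key=lambda u: sum((w - xi) ** 2 for w, xi in zip(weights[u], x)))


def train_som(vectors, n_units, epochs=50, lr0=0.5, seed=0):
    """Train a 1-D self-organizing map; returns one weight vector per unit."""
    rng = random.Random(seed)
    # Simplified deterministic initialisation: evenly spaced input vectors.
    weights = [list(vectors[(u * len(vectors)) // n_units]) for u in range(n_units)]
    radius0 = max(n_units / 2, 1.0)
    total_steps = epochs * len(vectors)
    step = 0
    for _ in range(epochs):
        order = list(range(len(vectors)))
        rng.shuffle(order)
        for idx in order:
            x = vectors[idx]
            frac = step / total_steps
            lr = lr0 * (1 - frac)                     # learning rate decays linearly
            radius = max(radius0 * (1 - frac), 0.01)  # neighbourhood shrinks over time
            bmu = best_matching_unit(weights, x)
            for u in range(n_units):
                # Gaussian neighbourhood on the 1-D grid of units
                h = math.exp(-((u - bmu) ** 2) / (2 * radius ** 2))
                weights[u] = [w + lr * h * (xi - w) for w, xi in zip(weights[u], x)]
            step += 1
    return weights
```

After training, `best_matching_unit(weights, v)` assigns each utterance vector `v` to a cluster; for two well-separated groups of toy vectors and two units, each group ends up on its own unit.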
Solid Harmonic Wavelet Scattering: Predicting Quantum Molecular Energy from Invariant Descriptors of 3D Electronic Densities
Title | Solid Harmonic Wavelet Scattering: Predicting Quantum Molecular Energy from Invariant Descriptors of 3D Electronic Densities |
Authors | Michael Eickenberg, Georgios Exarchakis, Matthew Hirn, Stephane Mallat |
Abstract | We introduce a solid harmonic wavelet scattering representation, invariant to rigid motion and stable to deformations, for regression and classification of 2D and 3D signals. Solid harmonic wavelets are computed by multiplying solid harmonic functions with Gaussian windows dilated at different scales. Invariant scattering coefficients are obtained by cascading such wavelet transforms with the complex modulus nonlinearity. We study an application of solid harmonic scattering invariants to the estimation of quantum molecular energies, which are also invariant to rigid motion and stable with respect to deformations. A multilinear regression over scattering invariants provides close to state-of-the-art results on small and large databases of organic molecules. |
Tasks | |
Published | 2017-12-01 |
URL | http://papers.nips.cc/paper/7232-solid-harmonic-wavelet-scattering-predicting-quantum-molecular-energy-from-invariant-descriptors-of-3d-electronic-densities |
PDF | http://papers.nips.cc/paper/7232-solid-harmonic-wavelet-scattering-predicting-quantum-molecular-energy-from-invariant-descriptors-of-3d-electronic-densities.pdf |
PWC | https://paperswithcode.com/paper/solid-harmonic-wavelet-scattering-predicting |
Repo | |
Framework | |
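The wavelet construction in this abstract (a solid harmonic multiplied by a Gaussian window, then dilated) can be written out as follows. This is one common way to write it, assuming a unit-width Gaussian, dilation by powers of 2, and a 2^{-3j} normalization in 3D; normalization conventions may differ from the paper's. Here Y_l^m is the spherical harmonic of degree l and order m, so |x|^l Y_l^m(x/|x|) is the corresponding solid harmonic. Summing squared moduli over m gives a modulus that commutes with rotations of the density rho, and integrating such quantities over x is one way to obtain rigid-motion invariants.

```latex
\psi_{\ell}^{m}(x) = \frac{1}{(2\pi)^{3/2}}\, e^{-|x|^{2}/2}\, |x|^{\ell}\, Y_{\ell}^{m}\!\left(\frac{x}{|x|}\right),
\qquad
\psi_{j,\ell}^{m}(x) = 2^{-3j}\, \psi_{\ell}^{m}\!\left(2^{-j}x\right),

U[j,\ell]\rho(x) = \Big( \sum_{m=-\ell}^{\ell} \big| (\rho * \psi_{j,\ell}^{m})(x) \big|^{2} \Big)^{1/2}
```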
Deep Learning for Punctuation Restoration in Medical Reports
Title | Deep Learning for Punctuation Restoration in Medical Reports |
Authors | Wael Salloum, Greg Finley, Erik Edwards, Mark Miller, David Suendermann-Oeft |
Abstract | In clinical dictation, speakers try to be as concise as possible to save time, often producing utterances without explicit punctuation commands. Since the end product of a dictated report, e.g. an out-patient letter, requires correct orthography, including exact punctuation, the missing punctuation needs to be restored, preferably by automated means. This paper describes a method for punctuation restoration based on a state-of-the-art stack of NLP and machine learning techniques, including B-RNNs with an attention mechanism and late fusion, as well as a feature extraction technique tailored to processing medical terminology using a novel vocabulary reduction model. To the best of our knowledge, the resulting performance is superior to that reported in prior work on similar tasks. |
Tasks | Speech Recognition |
Published | 2017-08-01 |
URL | https://www.aclweb.org/anthology/W17-2319/ |
PDF | https://www.aclweb.org/anthology/W17-2319 |
PWC | https://paperswithcode.com/paper/deep-learning-for-punctuation-restoration-in |
Repo | |
Framework | |
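One common way to frame punctuation restoration as sequence labeling is to tag each word with the punctuation mark that follows it. The sketch below shows only that data framing, assuming a small hypothetical tag set (`COMMA`, `PERIOD`, `QUESTION`, and `O` for none) and whitespace tokenization; it does not implement the paper's B-RNN with attention, late fusion, or vocabulary reduction, which would be the component predicting the labels.

```python
# Hypothetical tag set: punctuation mark -> label
PUNCTUATION = {",": "COMMA", ".": "PERIOD", "?": "QUESTION"}


def to_training_pairs(text):
    """Turn punctuated text into (word, label) pairs; the label names the mark
    that follows the word, or 'O' if there is none."""
    pairs = []
    for token in text.split():
        mark = token[-1] if token[-1] in PUNCTUATION else None
        word = token[:-1] if mark else token
        pairs.append((word, PUNCTUATION[mark] if mark else "O"))
    return pairs


def restore(words, labels):
    """Re-insert punctuation into a word sequence from (predicted) labels."""
    inverse = {label: mark for mark, label in PUNCTUATION.items()}
    return " ".join(word + inverse.get(label, "") for word, label in zip(words, labels))
```

For example, `to_training_pairs("no acute distress, lungs are clear.")` tags `distress` with `COMMA` and `clear` with `PERIOD`, and `restore` turns a label sequence back into punctuated text.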
Cross-Lingual Transfer Learning for POS Tagging without Cross-Lingual Resources
Title | Cross-Lingual Transfer Learning for POS Tagging without Cross-Lingual Resources |
Authors | Joo-Kyung Kim, Young-Bum Kim, Ruhi Sarikaya, Eric Fosler-Lussier |
Abstract | Training a POS tagging model with cross-lingual transfer learning usually requires linguistic knowledge and resources about the relation between the source language and the target language. In this paper, we introduce a cross-lingual transfer learning model for POS tagging without ancillary resources such as parallel corpora. The proposed cross-lingual model utilizes a common BLSTM that enables knowledge transfer from other languages, and private BLSTMs for language-specific representations. The cross-lingual model is trained with language-adversarial training and bidirectional language modeling as auxiliary objectives to better represent language-general information while not losing the information about a specific target language. Evaluating on POS datasets from 14 languages in the Universal Dependencies corpus, we show that the proposed transfer learning model improves the POS tagging performance of the target languages without exploiting any linguistic knowledge between the source language and the target language. |
Tasks | Cross-Lingual Transfer, Language Modelling, Named Entity Recognition, Part-Of-Speech Tagging, Slot Filling, Transfer Learning, Word Embeddings |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/D17-1302/ |
PDF | https://www.aclweb.org/anthology/D17-1302 |
PWC | https://paperswithcode.com/paper/cross-lingual-transfer-learning-for-pos |
Repo | |
Framework | |
An enhanced automatic speech recognition system for Arabic
Title | An enhanced automatic speech recognition system for Arabic |
Authors | Mohamed Amine Menacer, Odile Mella, Dominique Fohr, Denis Jouvet, David Langlois, Kamel Smaili |
Abstract | Automatic speech recognition for Arabic is a very challenging task. Although classical Automatic Speech Recognition (ASR) techniques can be efficiently applied to Arabic speech recognition, it is essential to take the specificities of the language into consideration to improve system performance. In this article, we focus on Modern Standard Arabic (MSA) speech recognition. We introduce the challenges related to the Arabic language, namely its complex morphology and the absence of short vowels in written text, which leads to several potential vowelizations for each grapheme, often conflicting ones. We develop an ASR system for MSA using the Kaldi toolkit. Several acoustic and language models are trained. We obtain a Word Error Rate (WER) of 14.42% for the baseline system and a 12.2% relative improvement by rescoring the lattice and by rewriting the output with the correct hamza above or below the Alif. |
Tasks | Speech Recognition |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/W17-1319/ |
PDF | https://www.aclweb.org/anthology/W17-1319 |
PWC | https://paperswithcode.com/paper/an-enhanced-automatic-speech-recognition |
Repo | |
Framework | |
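WER, the metric reported in this abstract, is the word-level edit distance (substitutions, deletions, and insertions) between hypothesis and reference, divided by the number of reference words. A minimal Python sketch, assuming simple whitespace tokenization (toolkits such as Kaldi provide their own scoring tools):

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)


def relative_improvement(baseline_wer, new_wer):
    """Fractional WER reduction relative to the baseline."""
    return (baseline_wer - new_wer) / baseline_wer
```

Read as percentages, a 12.2% relative improvement on the 14.42% baseline would correspond to a WER of roughly 14.42 × (1 − 0.122) ≈ 12.66%; this back-of-the-envelope figure assumes the relative improvement is measured against that baseline.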
Coordination Boundary Identification with Similarity and Replaceability
Title | Coordination Boundary Identification with Similarity and Replaceability |
Authors | Hiroki Teranishi, Hiroyuki Shindo, Yuji Matsumoto |
Abstract | We propose a neural network model for coordination boundary detection. Our method relies on two common properties of conjuncts, similarity and replaceability, in order to detect both similar and dissimilar pairs of conjuncts. The model improves identification of clause-level coordination using bidirectional RNNs that incorporate the two properties as features. We show that our model outperforms existing state-of-the-art methods on the coordination-annotated Penn Treebank and the Genia corpus without any syntactic information from parsers. |
Tasks | Boundary Detection |
Published | 2017-11-01 |
URL | https://www.aclweb.org/anthology/I17-1027/ |
PDF | https://www.aclweb.org/anthology/I17-1027 |
PWC | https://paperswithcode.com/paper/coordination-boundary-identification-with |
Repo | |
Framework | |
Literal or idiomatic? Identifying the reading of single occurrences of German multiword expressions using word embeddings
Title | Literal or idiomatic? Identifying the reading of single occurrences of German multiword expressions using word embeddings |
Authors | Rafael Ehren |
Abstract | Non-compositional multiword expressions (MWEs) still pose serious issues for a variety of natural language processing tasks, and their ubiquity makes methods that automatically identify this kind of MWE indispensable. The method presented in this paper was inspired by Sporleder and Li (2009) and is able to discriminate between the literal and non-literal use of an MWE in an unsupervised way. It is based on the assumption that words in a text form cohesive units. If the cohesion of these units is weakened by an expression, it is classified as literal, and otherwise as idiomatic. While Sporleder and Li used Normalized Google Distance to model semantic similarity, the present work examines the use of a variety of different word embeddings. |
Tasks | Machine Translation, Semantic Similarity, Semantic Textual Similarity, Word Embeddings |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/E17-4011/ |
PDF | https://www.aclweb.org/anthology/E17-4011 |
PWC | https://paperswithcode.com/paper/literal-or-idiomatic-identifying-the-reading |
Repo | |
Framework | |
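The cohesion idea in this abstract can be illustrated by comparing the average pairwise similarity of context words with and without the expression's words. This is a simplified sketch, not the paper's or Sporleder and Li's exact procedure: it assumes word embeddings are available as a dictionary of equal-length float lists and uses mean cosine similarity as the cohesion measure. The toy 2-D vectors in the test (for the German idiom "den Löffel abgeben") are invented for illustration.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cohesion(words, embeddings):
    """Average pairwise cosine similarity of the words that have an embedding."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def cohesion_change(context, expression_words, embeddings):
    """Cohesion with the expression minus cohesion without it;
    a negative value means the expression weakens cohesion."""
    return cohesion(context + expression_words, embeddings) - cohesion(context, embeddings)
```

The sign of the returned value shows whether adding the expression strengthens or weakens cohesion; the decision rule stated in the abstract is then applied to that comparison.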