July 26, 2019

Paper Group NANR 19

K-best Iterative Viterbi Parsing. CAT: Credibility Analysis of Arabic Content on Twitter. The Impact of Figurative Language on Sentiment Analysis. Neural Post-Editing Based on Quality Estimation. Online Automatic Post-editing for MT in a Multi-Domain Translation Environment. Consistent Translation of Repeated Nouns using Syntactic and Semantic Cues …

K-best Iterative Viterbi Parsing

Title K-best Iterative Viterbi Parsing
Authors Katsuhiko Hayashi, Masaaki Nagata
Abstract This paper presents an efficient and optimal parsing algorithm for probabilistic context-free grammars (PCFGs). To achieve faster parsing, our proposal employs a pruning technique to reduce unnecessary edges in the search space. The key is to conduct repetitively Viterbi inside and outside parsing, while gradually expanding the search space to efficiently compute heuristic bounds used for pruning. Our experimental results using the English Penn Treebank corpus show that the proposed algorithm is faster than the standard CKY parsing algorithm. In addition, we also show how to extend this algorithm to extract k-best Viterbi parse trees.
Tasks
Published 2017-04-01
URL https://www.aclweb.org/anthology/E17-2049/
PDF https://www.aclweb.org/anthology/E17-2049
PWC https://paperswithcode.com/paper/k-best-iterative-viterbi-parsing
Repo
Framework
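
As a point of reference for the baseline named in the abstract, here is a minimal sketch of standard Viterbi CKY parsing for a PCFG in Chomsky normal form. It is not the authors' iterative pruning algorithm; the function name `viterbi_cky`, the toy grammar, and its probabilities are assumptions made purely for illustration.

```python
import math
from collections import defaultdict


def viterbi_cky(words, lexical, binary, start="S"):
    """Return (log_prob, tree) for the best parse, or None if there is none.

    lexical maps (A, word) -> P(A -> word)
    binary  maps (A, B, C) -> P(A -> B C)
    """
    n = len(words)
    # chart[(i, j)][A] = (best log-probability, backpointer) for A over words[i:j]
    chart = defaultdict(dict)

    # Width-1 spans come from lexical rules.
    for i, w in enumerate(words):
        for (a, word), p in lexical.items():
            if word == w:
                chart[(i, i + 1)][a] = (math.log(p), w)

    # Wider spans combine two adjacent sub-spans with a binary rule.
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for (a, b, c), p in binary.items():
                    if b in chart[(i, k)] and c in chart[(k, j)]:
                        score = (math.log(p)
                                 + chart[(i, k)][b][0]
                                 + chart[(k, j)][c][0])
                        if a not in chart[(i, j)] or score > chart[(i, j)][a][0]:
                            chart[(i, j)][a] = (score, (k, b, c))

    if start not in chart[(0, n)]:
        return None

    def build(i, j, a):
        """Follow backpointers to rebuild the best tree as nested tuples."""
        back = chart[(i, j)][a][1]
        if isinstance(back, str):
            return (a, back)
        k, b, c = back
        return (a, build(i, k, b), build(k, j, c))

    return chart[(0, n)][start][0], build(0, n, start)


# Hypothetical toy grammar, for illustration only.
LEXICAL = {("NP", "she"): 0.5, ("NP", "fish"): 0.5, ("V", "eats"): 1.0}
BINARY = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0}

best = viterbi_cky(["she", "eats", "fish"], LEXICAL, BINARY)
```

This sketch fills every chart cell exhaustively, which is the kind of work the pruning described in the abstract aims to avoid.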

CAT: Credibility Analysis of Arabic Content on Twitter

Title CAT: Credibility Analysis of Arabic Content on Twitter
Authors Rim El Ballouli, Wassim El-Hajj, Ahmad Ghandour, Shady Elbassuoni, Hazem Hajj, Khaled Shaban
Abstract Data generated on Twitter has become a rich source for various data mining tasks. Those data analysis tasks that are dependent on the tweet semantics, such as sentiment analysis, emotion mining, and rumor detection among others, suffer considerably if the tweet is not credible, not real, or spam. In this paper, we perform an extensive analysis on credibility of Arabic content on Twitter. We also build a classification model (CAT) to automatically predict the credibility of a given Arabic tweet. Of particular originality is the inclusion of features extracted directly or indirectly from the author's profile and timeline. To train and test CAT, we annotated for credibility a data set of 9,000 Arabic tweets that are topic independent. CAT achieved consistent improvements in predicting the credibility of the tweets when compared to several baselines and when compared to the state-of-the-art approach with an improvement of 21% in weighted average F-measure. We also conducted experiments to highlight the importance of the user-based features as opposed to the content-based features. We conclude our work with a feature reduction experiment that highlights the best indicative features of credibility.
Tasks Emotion Recognition, Opinion Mining, Sentiment Analysis
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-1308/
PDF https://www.aclweb.org/anthology/W17-1308
PWC https://paperswithcode.com/paper/cat-credibility-analysis-of-arabic-content-on
Repo
Framework
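
The abstract reports its gain in weighted average F-measure. A minimal sketch of how that metric is usually computed follows, assuming the common definition in which each class's F1 is weighted by its share of the gold labels. The function name, the label strings, and the four-item example are hypothetical and are not data from the paper.

```python
def weighted_f_measure(gold, pred):
    """Average of per-class F1 scores, weighted by class support in gold."""
    total = 0.0
    for label in sorted(set(gold)):
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        total += f1 * (tp + fn)  # tp + fn is the support of this class
    return total / len(gold)


# Made-up labels, for illustration only.
gold_labels = ["credible", "credible", "credible", "not_credible"]
pred_labels = ["credible", "credible", "not_credible", "not_credible"]
score = weighted_f_measure(gold_labels, pred_labels)
```

Weighting by support keeps a rare class from dominating the average, which matters when credible and non-credible tweets are unevenly represented.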

The Impact of Figurative Language on Sentiment Analysis

Title The Impact of Figurative Language on Sentiment Analysis
Authors Tomáš Hercig, Ladislav Lenc
Abstract Figurative language such as irony, sarcasm, and metaphor is considered a significant challenge in sentiment analysis. These figurative devices can sculpt the affect of an utterance and test the limits of sentiment analysis of supposedly literal texts. We explore the effect of figurative language on sentiment analysis. We incorporate the figurative language indicators into the sentiment analysis process and compare the results with and without the additional information about them. We evaluate on the SemEval-2015 Task 11 data and outperform the first team with our convolutional neural network model and additional training data in terms of mean squared error and we follow closely behind the first place in terms of cosine similarity.
Tasks Sarcasm Detection, Sentiment Analysis
Published 2017-09-01
URL https://www.aclweb.org/anthology/R17-1041/
PDF https://doi.org/10.26615/978-954-452-049-6_041
PWC https://paperswithcode.com/paper/the-impact-of-figurative-language-on
Repo
Framework
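
The two evaluation measures named in the abstract, mean squared error and cosine similarity, can be sketched as below. This assumes real-valued sentiment scores compared as plain vectors; the function names and the three-item example are illustrative and are not taken from the SemEval data.

```python
import math


def mean_squared_error(gold, pred):
    """Average squared difference between gold and predicted scores."""
    return sum((g - p) ** 2 for g, p in zip(gold, pred)) / len(gold)


def cosine_similarity(gold, pred):
    """Cosine of the angle between the gold and predicted score vectors."""
    dot = sum(g * p for g, p in zip(gold, pred))
    norm = (math.sqrt(sum(g * g for g in gold))
            * math.sqrt(sum(p * p for p in pred)))
    return dot / norm if norm else 0.0


# Made-up scores, for illustration only.
gold_scores = [-3.0, 1.0, 2.0]
pred_scores = [-2.0, 1.0, 1.0]
mse = mean_squared_error(gold_scores, pred_scores)
cos = cosine_similarity(gold_scores, pred_scores)
```

The two measures can rank systems differently: cosine similarity ignores the overall scale of the predictions, while mean squared error penalizes it, which is consistent with a system leading on one measure and trailing slightly on the other.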

Neural Post-Editing Based on Quality Estimation

Title Neural Post-Editing Based on Quality Estimation
Authors Yiming Tan, Zhiming Chen, Liu Huang, Lilin Zhang, Maoxi Li, Mingwen Wang
Abstract
Tasks Automatic Post-Editing, Machine Translation
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-4776/
PDF https://www.aclweb.org/anthology/W17-4776
PWC https://paperswithcode.com/paper/neural-post-editing-based-on-quality
Repo
Framework

Online Automatic Post-editing for MT in a Multi-Domain Translation Environment

Title Online Automatic Post-editing for MT in a Multi-Domain Translation Environment
Authors Rajen Chatterjee, Gebremedhen Gebremelak, Matteo Negri, Marco Turchi
Abstract Automatic post-editing (APE) for machine translation (MT) aims to fix recurrent errors made by the MT decoder by learning from correction examples. In controlled evaluation scenarios, the representativeness of the training set with respect to the test data is a key factor to achieve good performance. Real-life scenarios, however, do not guarantee such favorable learning conditions. Ideally, to be integrated in a real professional translation workflow (e.g. to play a role in computer-assisted translation framework), APE tools should be flexible enough to cope with continuous streams of diverse data coming from different domains/genres. To cope with this problem, we propose an online APE framework that is: i) robust to data diversity (i.e. capable to learn and apply correction rules in the right contexts) and ii) able to evolve over time (by continuously extending and refining its knowledge). In a comparative evaluation, with English-German test data coming in random order from two different domains, we show the effectiveness of our approach, which outperforms a strong batch system and the state of the art in online APE.
Tasks Automatic Post-Editing, Machine Translation
Published 2017-04-01
URL https://www.aclweb.org/anthology/E17-1050/
PDF https://www.aclweb.org/anthology/E17-1050
PWC https://paperswithcode.com/paper/online-automatic-post-editing-for-mt-in-a
Repo
Framework

Consistent Translation of Repeated Nouns using Syntactic and Semantic Cues

Title Consistent Translation of Repeated Nouns using Syntactic and Semantic Cues
Authors Xiao Pu, Laura Mascarell, Andrei Popescu-Belis
Abstract We propose a method to decide whether two occurrences of the same noun in a source text should be translated consistently, i.e. using the same noun in the target text as well. We train and test classifiers that predict consistent translations based on lexical, syntactic, and semantic features. We first evaluate the accuracy of our classifiers intrinsically, in terms of the accuracy of consistency predictions, over a subset of the UN Corpus. Then, we also evaluate them in combination with phrase-based statistical MT systems for Chinese-to-English and German-to-English. We compare the automatic post-editing of noun translations with the re-ranking of the translation hypotheses based on the classifiers' output, and also use these methods in combination. This improves over the baseline and closes up to 50% of the gap in BLEU scores between the baseline and an oracle classifier.
Tasks Automatic Post-Editing, Machine Translation
Published 2017-04-01
URL https://www.aclweb.org/anthology/E17-1089/
PDF https://www.aclweb.org/anthology/E17-1089
PWC https://paperswithcode.com/paper/consistent-translation-of-repeated-nouns
Repo
Framework

Attention Modeling for Targeted Sentiment

Title Attention Modeling for Targeted Sentiment
Authors Jiangming Liu, Yue Zhang
Abstract Neural network models have been used for target-dependent sentiment analysis. Previous work focuses on learning a target-specific representation for a given input sentence, which is used for classification. However, it does not explicitly model the contribution of each word in a sentence with respect to targeted sentiment polarities. We investigate an attention model to this end. In particular, a vanilla LSTM model is used to induce an attention value of the whole sentence. The model is further extended to differentiate left and right contexts given a certain target following previous work. Results show that by using attention to model the contribution of each word with respect to the target, our model gives significantly improved results over two standard benchmarks. We report the best accuracy for this task.
Tasks Sentiment Analysis, Word Embeddings
Published 2017-04-01
URL https://www.aclweb.org/anthology/E17-2091/
PDF https://www.aclweb.org/anthology/E17-2091
PWC https://paperswithcode.com/paper/attention-modeling-for-targeted-sentiment
Repo
Framework
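
To illustrate what it means for attention to model the contribution of each word with respect to the target, here is a minimal sketch of dot-product attention over per-word hidden states. The scoring function is an assumption chosen for brevity and is not claimed to be the paper's parameterization; the names `softmax` and `attend` and the toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(hidden_states, target):
    """Weight each word's hidden state by its dot-product match to the target.

    Returns (weights, context), where context is the weighted sum of states.
    """
    scores = [sum(h_d * t_d for h_d, t_d in zip(h, target))
              for h in hidden_states]
    weights = softmax(scores)
    dim = len(target)
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states))
               for d in range(dim)]
    return weights, context


# Toy 2-dimensional "hidden states" for a three-word sentence (made up).
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
target_vec = [1.0, 0.0]
weights, context = attend(states, target_vec)
```

The weights give a per-word contribution that can be inspected directly, and the context vector is what a classifier layer would consume.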

On a Chat Bot Finding Answers with Optimal Rhetoric Representation

Title On a Chat Bot Finding Answers with Optimal Rhetoric Representation
Authors Boris Galitsky, Dmitry Ilvovsky
Abstract We demo a chat bot with the focus on complex, multi-sentence questions that enforce what we call rhetoric agreement of answers with questions. Chat bot finds answers which are not only relevant by topic but also match the question by style, argumentation patterns, communication means, experience level and other attributes. The system achieves rhetoric agreement by learning pairs of discourse trees (DTs) for question (Q) and answer (A). We build a library of best answer DTs for most types of complex questions. To better recognize a valid rhetoric agreement between Q and A, DTs are extended with the labels for communicative actions. An algorithm for finding the best DT for an A, given a Q, is evaluated.
Tasks
Published 2017-09-01
URL https://www.aclweb.org/anthology/R17-1035/
PDF https://doi.org/10.26615/978-954-452-049-6_035
PWC https://paperswithcode.com/paper/on-a-chat-bot-finding-answers-with-optimal
Repo
Framework

An Unsupervised Speaker Clustering Technique based on SOM and I-vectors for Speech Recognition Systems

Title An Unsupervised Speaker Clustering Technique based on SOM and I-vectors for Speech Recognition Systems
Authors Hany Ahmed, Mohamed Elaraby, Abdullah M. Mousa, Mostafa Elhosiny, Sherif Abdou, Mohsen Rashwan
Abstract In this paper, we introduce an enhancement for speech recognition systems using an unsupervised speaker clustering technique. The proposed technique is mainly based on I-vectors and a Self-Organizing Map Neural Network (SOM). The input to the proposed algorithm is a set of speech utterances. For each utterance, we extract a 100-dimensional I-vector, and then SOM is used to group the utterances by speaker. In our experiments, we compared our technique with Normalized Cross Likelihood Ratio Clustering (NCLR). Results show that the proposed technique reduces the speaker error rate in comparison with NCLR. Finally, we examined the effect of speaker clustering on Speaker Adaptive Training (SAT) in a speech recognition system implemented to test the performance of the proposed technique. The proposed technique reduced the WER compared with clustering speakers with NCLR.
Tasks Large Vocabulary Continuous Speech Recognition, Speaker Identification, Speech Recognition
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-1310/
PDF https://www.aclweb.org/anthology/W17-1310
PWC https://paperswithcode.com/paper/an-unsupervised-speaker-clustering-technique
Repo
Framework

Solid Harmonic Wavelet Scattering: Predicting Quantum Molecular Energy from Invariant Descriptors of 3D Electronic Densities

Title Solid Harmonic Wavelet Scattering: Predicting Quantum Molecular Energy from Invariant Descriptors of 3D Electronic Densities
Authors Michael Eickenberg, Georgios Exarchakis, Matthew Hirn, Stephane Mallat
Abstract We introduce a solid harmonic wavelet scattering representation, invariant to rigid motion and stable to deformations, for regression and classification of 2D and 3D signals. Solid harmonic wavelets are computed by multiplying solid harmonic functions with Gaussian windows dilated at different scales. Invariant scattering coefficients are obtained by cascading such wavelet transforms with the complex modulus nonlinearity. We study an application of solid harmonic scattering invariants to the estimation of quantum molecular energies, which are also invariant to rigid motion and stable with respect to deformations. A multilinear regression over scattering invariants provides close to state of the art results over small and large databases of organic molecules.
Tasks
Published 2017-12-01
URL http://papers.nips.cc/paper/7232-solid-harmonic-wavelet-scattering-predicting-quantum-molecular-energy-from-invariant-descriptors-of-3d-electronic-densities
PDF http://papers.nips.cc/paper/7232-solid-harmonic-wavelet-scattering-predicting-quantum-molecular-energy-from-invariant-descriptors-of-3d-electronic-densities.pdf
PWC https://paperswithcode.com/paper/solid-harmonic-wavelet-scattering-predicting
Repo
Framework

Deep Learning for Punctuation Restoration in Medical Reports

Title Deep Learning for Punctuation Restoration in Medical Reports
Authors Wael Salloum, Greg Finley, Erik Edwards, Mark Miller, David Suendermann-Oeft
Abstract In clinical dictation, speakers try to be as concise as possible to save time, often resulting in utterances without explicit punctuation commands. Since the end product of a dictated report, e.g. an out-patient letter, does require correct orthography, including exact punctuation, the latter need to be restored, preferably by automated means. This paper describes a method for punctuation restoration based on a state-of-the-art stack of NLP and machine learning techniques including B-RNNs with an attention mechanism and late fusion, as well as a feature extraction technique tailored to the processing of medical terminology using a novel vocabulary reduction model. To the best of our knowledge, the resulting performance is superior to that reported in prior art on similar tasks.
Tasks Speech Recognition
Published 2017-08-01
URL https://www.aclweb.org/anthology/W17-2319/
PDF https://www.aclweb.org/anthology/W17-2319
PWC https://paperswithcode.com/paper/deep-learning-for-punctuation-restoration-in
Repo
Framework

Cross-Lingual Transfer Learning for POS Tagging without Cross-Lingual Resources

Title Cross-Lingual Transfer Learning for POS Tagging without Cross-Lingual Resources
Authors Joo-Kyung Kim, Young-Bum Kim, Ruhi Sarikaya, Eric Fosler-Lussier
Abstract Training a POS tagging model with crosslingual transfer learning usually requires linguistic knowledge and resources about the relation between the source language and the target language. In this paper, we introduce a cross-lingual transfer learning model for POS tagging without ancillary resources such as parallel corpora. The proposed cross-lingual model utilizes a common BLSTM that enables knowledge transfer from other languages, and private BLSTMs for language-specific representations. The cross-lingual model is trained with language-adversarial training and bidirectional language modeling as auxiliary objectives to better represent language-general information while not losing the information about a specific target language. Evaluating on POS datasets from 14 languages in the Universal Dependencies corpus, we show that the proposed transfer learning model improves the POS tagging performance of the target languages without exploiting any linguistic knowledge between the source language and the target language.
Tasks Cross-Lingual Transfer, Language Modelling, Named Entity Recognition, Part-Of-Speech Tagging, Slot Filling, Transfer Learning, Word Embeddings
Published 2017-09-01
URL https://www.aclweb.org/anthology/D17-1302/
PDF https://www.aclweb.org/anthology/D17-1302
PWC https://paperswithcode.com/paper/cross-lingual-transfer-learning-for-pos
Repo
Framework

An enhanced automatic speech recognition system for Arabic

Title An enhanced automatic speech recognition system for Arabic
Authors Mohamed Amine Menacer, Odile Mella, Dominique Fohr, Denis Jouvet, David Langlois, Kamel Smaili
Abstract Automatic speech recognition for Arabic is a very challenging task. Although the classical techniques for Automatic Speech Recognition (ASR) can be efficiently applied to Arabic speech recognition, it is essential to take the language's specificities into consideration to improve system performance. In this article, we focus on Modern Standard Arabic (MSA) speech recognition. We introduce the challenges related to the Arabic language, namely the complex morphology of the language and the absence of short vowels in written text, which leads to several potential and often conflicting vowelizations for each grapheme. We develop an ASR system for MSA using the Kaldi toolkit. Several acoustic and language models are trained. We obtain a Word Error Rate (WER) of 14.42% for the baseline system and a 12.2% relative improvement by rescoring the lattice and by rewriting the output with the right Z hamoza above or below Alif.
Tasks Speech Recognition
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-1319/
PDF https://www.aclweb.org/anthology/W17-1319
PWC https://paperswithcode.com/paper/an-enhanced-automatic-speech-recognition
Repo
Framework
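
The two quantities reported in the abstract, Word Error Rate and relative improvement, can be sketched as follows. WER is computed here as word-level edit distance divided by reference length, which is the usual definition; the function names and the example sentences and numbers are illustrative and are not taken from the paper.

```python
def word_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the processed reference prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)


def relative_improvement(baseline, improved):
    """Fraction of the baseline error that the improved system removes."""
    return (baseline - improved) / baseline


# Made-up example: one substitution ("sat" -> "sit") and one deletion ("the").
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
gain = relative_improvement(20.0, 15.0)
```

A relative improvement is measured against the baseline error rate, not in absolute points, so the same absolute drop looks larger when the baseline is already low.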

Coordination Boundary Identification with Similarity and Replaceability

Title Coordination Boundary Identification with Similarity and Replaceability
Authors Hiroki Teranishi, Hiroyuki Shindo, Yuji Matsumoto
Abstract We propose a neural network model for coordination boundary detection. Our method relies on the two common properties - similarity and replaceability in conjuncts - in order to detect both similar pairs of conjuncts and dissimilar pairs of conjuncts. The model improves identification of clause-level coordination using bidirectional RNNs incorporating two properties as features. We show that our model outperforms the existing state-of-the-art methods on the coordination annotated Penn Treebank and Genia corpus without any syntactic information from parsers.
Tasks Boundary Detection
Published 2017-11-01
URL https://www.aclweb.org/anthology/I17-1027/
PDF https://www.aclweb.org/anthology/I17-1027
PWC https://paperswithcode.com/paper/coordination-boundary-identification-with
Repo
Framework

Literal or idiomatic? Identifying the reading of single occurrences of German multiword expressions using word embeddings

Title Literal or idiomatic? Identifying the reading of single occurrences of German multiword expressions using word embeddings
Authors Rafael Ehren
Abstract Non-compositional multiword expressions (MWEs) still pose serious issues for a variety of natural language processing tasks and their ubiquity makes it impossible to get around methods which automatically identify this kind of MWE. The method presented in this paper was inspired by Sporleder and Li (2009) and is able to discriminate between the literal and non-literal use of an MWE in an unsupervised way. It is based on the assumption that words in a text form cohesive units. If the cohesion of these units is weakened by an expression, it is classified as literal, and otherwise as idiomatic. While Sporleder and Li used Normalized Google Distance to model semantic similarity, the present work examines the use of a variety of different word embeddings.
Tasks Machine Translation, Semantic Similarity, Semantic Textual Similarity, Word Embeddings
Published 2017-04-01
URL https://www.aclweb.org/anthology/E17-4011/
PDF https://www.aclweb.org/anthology/E17-4011
PWC https://paperswithcode.com/paper/literal-or-idiomatic-identifying-the-reading
Repo
Framework