July 26, 2019

Paper Group NANR 109

Spherical Structured Feature Maps for Kernel Approximation


Title	Spherical Structured Feature Maps for Kernel Approximation
Authors	Yueming Lyu
Abstract	We propose Spherical Structured Feature (SSF) maps to approximate shift- and rotation-invariant kernels as well as $b^{th}$-order arc-cosine kernels (Cho & Saul, 2009). We construct SSF maps from a point set on the $(d-1)$-dimensional sphere $\mathbb{S}^{d-1}$. We prove that the inner products of SSF maps are unbiased estimates of these kernels when the point set on $\mathbb{S}^{d-1}$ is asymptotically uniformly distributed. According to Brauchart & Grabner (2015), optimizing the discrete Riesz s-energy yields asymptotically uniformly distributed point sets on $\mathbb{S}^{d-1}$. We therefore propose an efficient coordinate descent method to find a local optimum of the discrete Riesz s-energy for SSF map construction. Theoretically, SSF map construction has linear space complexity and log-linear time complexity. Empirically, SSF maps outperform the other methods compared.
Tasks
Published	2017-08-01
URL	https://icml.cc/Conferences/2017/Schedule?showEvent=537
PDF	http://proceedings.mlr.press/v70/lyu17a/lyu17a.pdf
PWC	https://paperswithcode.com/paper/spherical-structured-feature-maps-for-kernel
Repo
Framework
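The abstract combines two ingredients: kernel estimates built from directions on $\mathbb{S}^{d-1}$, and point sets spread out by minimising the discrete Riesz s-energy. The sketch below illustrates both in plain Python under stated assumptions: the energy is taken as $E_s = \sum_{i<j} \|x_i - x_j\|^{-s}$ over unordered pairs, and the first-order arc-cosine kernel is estimated with ordinary i.i.d. Gaussian directions (standard random features), not with the structured SSF construction from the paper. All function names are hypothetical.

```python
import math
import random


def riesz_s_energy(points, s):
    """Discrete Riesz s-energy: sum over unordered pairs i < j of |x_i - x_j|^(-s)."""
    energy = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            energy += math.dist(points[i], points[j]) ** (-s)
    return energy


def arccos_kernel_order1(x, y):
    """Closed-form first-order arc-cosine kernel of Cho & Saul (2009)."""
    norm_x = math.hypot(*x)
    norm_y = math.hypot(*y)
    cos_theta = sum(a * b for a, b in zip(x, y)) / (norm_x * norm_y)
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))  # clip rounding error
    return norm_x * norm_y * (math.sin(theta) + (math.pi - theta) * math.cos(theta)) / math.pi


def arccos_order1_monte_carlo(x, y, n_samples, seed=0):
    """Unbiased estimate of 2 * E[relu(w.x) * relu(w.y)] with w ~ N(0, I).

    Each sample acts as one random ReLU feature; averaging the feature
    products is an inner product of two random feature maps.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        w = [rng.gauss(0.0, 1.0) for _ in x]
        wx = sum(a * b for a, b in zip(w, x))
        wy = sum(a * b for a, b in zip(w, y))
        total += max(wx, 0.0) * max(wy, 0.0)
    return 2.0 * total / n_samples
```

Comparing `arccos_order1_monte_carlo` with `arccos_kernel_order1` for growing `n_samples` shows the estimate converging to the exact kernel. A Gaussian direction factors into a radius times a uniform point on the sphere; SSF maps instead take the directions from an asymptotically uniform point set on the sphere.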

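Event-Related Features in Feedforward Neural Networks Contribute to Identifying Causal Relations in Discourse


Title	Event-Related Features in Feedforward Neural Networks Contribute to Identifying Causal Relations in Discourse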
Authors	Edoardo Maria Ponti, Anna Korhonen
Abstract	Causal relations play a key role in information extraction and reasoning. Most of the time, their expression is ambiguous or implicit, i.e., without signals in the text. This makes their identification challenging. We aim to improve their identification by implementing a Feedforward Neural Network with a novel set of features for this task. In particular, these features are based on the position of event mentions and the semantics of events and participants. The resulting classifier outperforms strong baselines on two datasets (the Penn Discourse Treebank and the CSTNews corpus) annotated with different schemes and containing examples in two languages, English and Portuguese. This result demonstrates the importance of events for identifying discourse relations.
Tasks	Question Answering, Text Summarization
Published	2017-04-01
URL	https://www.aclweb.org/anthology/W17-0903/
PDF	https://www.aclweb.org/anthology/W17-0903
PWC	https://paperswithcode.com/paper/event-related-features-in-feedforward-neural
Repo
Framework

Pragmatic descriptions of perceptual stimuli


Title	Pragmatic descriptions of perceptual stimuli
Authors	Emiel van Miltenburg
Abstract	This research proposal discusses pragmatic factors in image description, arguing that current automatic image description systems do not take these factors into account. I present a general model of the human image description process, and propose to study this process using corpus analysis, experiments, and computational modeling. This will lead to a better characterization of human image description behavior, providing a road map for future research in automatic image description, and the automatic description of perceptual stimuli in general.
Tasks	Object Recognition
Published	2017-04-01
URL	https://www.aclweb.org/anthology/E17-4001/
PDF	https://www.aclweb.org/anthology/E17-4001
PWC	https://paperswithcode.com/paper/pragmatic-descriptions-of-perceptual-stimuli
Repo
Framework

Resource-Lean Modeling of Coherence in Commonsense Stories


Title	Resource-Lean Modeling of Coherence in Commonsense Stories
Authors	Niko Schenk, Christian Chiarcos
Abstract	We present a resource-lean neural recognizer for modeling coherence in commonsense stories. Our lightweight system is inspired by successful attempts at modeling discourse relations and stands out for its simplicity and ease of optimization compared to prior approaches to narrative script learning. We evaluate our approach on the Story Cloze Test, demonstrating an absolute improvement in accuracy of 4.7% over state-of-the-art implementations.
Tasks	Reading Comprehension, Semantic Role Labeling
Published	2017-04-01
URL	https://www.aclweb.org/anthology/W17-0910/
PDF	https://www.aclweb.org/anthology/W17-0910
PWC	https://paperswithcode.com/paper/resource-lean-modeling-of-coherence-in
Repo
Framework

IIT (BHU): System Description for LSDSem’17 Shared Task


Title	IIT (BHU): System Description for LSDSem’17 Shared Task
Authors	Pranav Goel, Anil Kumar Singh
Abstract	This paper describes an ensemble system submitted to the LSDSem 2017 Shared Task, the Story Cloze Test. The main conclusion from our results is that an approach based on semantic similarity alone may not be enough for this task. We test various approaches and compare them with two ensemble systems: one based on voting and the other on a logistic-regression classifier. Our final system outperforms the previous state of the art on the Story Cloze Test. Another notable observation is the performance of the sentiment-based approach, which works almost as well on its own as our final ensemble system.
Tasks	Common Sense Reasoning, Semantic Similarity, Semantic Textual Similarity
Published	2017-04-01
URL	https://www.aclweb.org/anthology/W17-0912/
PDF	https://www.aclweb.org/anthology/W17-0912
PWC	https://paperswithcode.com/paper/iit-bhu-system-description-for-lsdsem17
Repo
Framework
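The abstract compares a voting ensemble with a logistic-regression ensemble. A minimal sketch of the voting variant, assuming each component system outputs, for every Story Cloze item, which of the two candidate endings it prefers (labelled 1 and 2 here); `majority_vote` and `accuracy` are hypothetical helper names, not the authors' code.

```python
from collections import Counter


def majority_vote(predictions_per_system):
    """Combine per-system predictions by majority vote, item by item.

    predictions_per_system[k][i] is system k's chosen ending (1 or 2) for item i.
    Ties go to the label that appears first among the systems' votes.
    """
    combined = []
    for votes in zip(*predictions_per_system):
        # most_common keeps first-encountered order among equal counts.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


def accuracy(predicted, gold):
    """Fraction of items where the predicted ending matches the gold ending."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)
```

With toy outputs `[[1, 2, 2, 1], [1, 1, 2, 2], [2, 2, 2, 1]]` from three systems, `majority_vote` returns `[1, 2, 2, 1]`. An odd number of systems avoids ties for a two-way choice; the logistic-regression variant would instead learn weights over the systems' outputs.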

Gender and Dialect Bias in YouTube’s Automatic Captions


Title	Gender and Dialect Bias in YouTube’s Automatic Captions
Authors	Rachael Tatman
Abstract	This project evaluates the accuracy of YouTube's automatically generated captions across two genders and five dialect groups. Speakers' dialect and gender were controlled for by using videos uploaded as part of the "accent tag challenge", where speakers explicitly identify their language background. The results show robust differences in accuracy across both gender and dialect, with lower accuracy for (1) women and (2) speakers from Scotland. This finding builds on earlier research showing that a speaker's sociolinguistic identity may negatively impact their ability to use automatic speech recognition, and demonstrates the need for sociolinguistically stratified validation of systems.
Tasks	Speech Recognition
Published	2017-04-01
URL	https://www.aclweb.org/anthology/W17-1606/
PDF	https://www.aclweb.org/anthology/W17-1606
PWC	https://paperswithcode.com/paper/gender-and-dialect-bias-in-youtubes-automatic
Repo
Framework
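Caption accuracy of this kind is commonly measured with word error rate (WER): the word-level edit distance between a reference transcript and the automatic caption, divided by the number of reference words. A minimal sketch, assuming whitespace-tokenised, non-empty reference transcripts and averaging WER per speaker group; the exact metric and normalisation used in the study are assumptions here, and the function names are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j]: edit distance between the reference words processed so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[len(hyp)] / len(ref)


def mean_wer_by_group(samples):
    """samples: iterable of (group, reference, hypothesis) triples."""
    totals = {}
    for group, ref, hyp in samples:
        wer_sum, count = totals.get(group, (0.0, 0))
        totals[group] = (wer_sum + word_error_rate(ref, hyp), count + 1)
    return {group: wer_sum / count for group, (wer_sum, count) in totals.items()}
```

Lower WER means more accurate captions, so a gap between groups in `mean_wer_by_group` is the kind of difference the study reports.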

Machine Learning Approach to Evaluate MultiLingual Summaries


Title	Machine Learning Approach to Evaluate MultiLingual Summaries
Authors	Samira Ellouze, Maher Jaoua, Lamia Hadrich Belguith
Abstract	The present paper introduces a new MultiLing text summary evaluation method. This method relies on a machine learning approach that combines multiple features to build models that predict the human score (overall responsiveness) of a new summary. We tried several single and "ensemble learning" classifiers to build the best model. We evaluated our method at the summary level, where each text summary is evaluated separately. The correlation between the built models and the human score is better than the correlation between the baselines and the human score.
Tasks
Published	2017-04-01
URL	https://www.aclweb.org/anthology/W17-1007/
PDF	https://www.aclweb.org/anthology/W17-1007
PWC	https://paperswithcode.com/paper/machine-learning-approach-to-evaluate
Repo
Framework
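The abstract reports correlations between predicted scores and the human overall-responsiveness score without naming the coefficient. A sketch of Pearson's r, one common choice, assuming one predicted and one human score per summary (summary-level evaluation) and non-constant score lists; `pearson` is a hypothetical helper name.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

Comparing `pearson(model_scores, human_scores)` with `pearson(baseline_scores, human_scores)` is the kind of comparison the abstract describes; the score lists here are placeholders.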

A Twitter Corpus and Benchmark Resources for German Sentiment Analysis


Title	A Twitter Corpus and Benchmark Resources for German Sentiment Analysis
Authors	Mark Cieliebak, Jan Milan Deriu, Dominic Egger, Fatih Uzdilli
Abstract	In this paper we present SB10k, a new corpus for sentiment analysis with approximately 10,000 German tweets. We use this new corpus and two existing corpora to provide state-of-the-art benchmarks for sentiment analysis in German: we implemented a CNN (based on the winning system of SemEval-2016) and a feature-based SVM and compared their performance on all three corpora. For the CNN, we also created German word embeddings trained on 300M tweets. These word embeddings were then optimized for sentiment analysis using distant supervision. The new corpus, the German word embeddings (plain and optimized), and source code to re-run the benchmarks are publicly available.
Tasks	Named Entity Recognition, Sentiment Analysis, Word Embeddings
Published	2017-04-01
URL	https://www.aclweb.org/anthology/W17-1106/
PDF	https://www.aclweb.org/anthology/W17-1106
PWC	https://paperswithcode.com/paper/a-twitter-corpus-and-benchmark-resources-for
Repo
Framework

Author Profiling at PAN: from Age and Gender Identification to Language Variety Identification (invited talk)


Title	Author Profiling at PAN: from Age and Gender Identification to Language Variety Identification (invited talk)
Authors	Paolo Rosso
Abstract	Author profiling is the study of how language is shared by people, a problem of growing importance in applications dealing with security, in order to understand who could be behind an anonymous threat message, and marketing, where companies may be interested in knowing the demographics of the people who liked or disliked their products in online reviews. In this talk we will give an overview of the PAN shared tasks that have been organised at the CLEF and FIRE evaluation forums since 2013, mainly on age and gender identification in social media, although personality recognition on Twitter and in source code was also addressed. In 2017 the PAN author profiling shared task jointly addresses gender and language variety identification on Twitter, where tweets have been annotated with the authors' gender and the specific variety of their native language: English (Australia, Canada, Great Britain, Ireland, New Zealand, United States), Spanish (Argentina, Chile, Colombia, Mexico, Peru, Spain, Venezuela), Portuguese (Brazil, Portugal), and Arabic (Egypt, Gulf, Levantine, Maghrebi).
Tasks	Sentiment Analysis
Published	2017-04-01
URL	https://www.aclweb.org/anthology/W17-1205/
PDF	https://www.aclweb.org/anthology/W17-1205
PWC	https://paperswithcode.com/paper/author-profiling-at-pan-from-age-and-gender
Repo
Framework

Kurdish Interdialect Machine Translation


Title	Kurdish Interdialect Machine Translation
Authors	Hossein Hassani
Abstract	This research proposes a method for machine translation between two Kurdish dialects. We chose two widely spoken dialects, Kurmanji and Sorani, which are considered to be mutually unintelligible. Moreover, despite being spoken by about 30 million people in different countries, Kurdish is among the less-resourced languages. The research used bi-dialectal dictionaries and showed that the lack of parallel corpora is not a major obstacle to machine translation between the two dialects. The experiments showed that the machine-translated texts are comprehensible to those who do not speak the dialect. The research is the first attempt at inter-dialect machine translation in Kurdish and, in particular, could help make online texts in one dialect comprehensible to those who only speak the target dialect. The translated texts were rated as understandable in 71% and 79% of cases for Kurmanji and Sorani respectively, and as slightly understandable in 29% of cases for Kurmanji and 21% for Sorani.
Tasks	Machine Translation
Published	2017-04-01
URL	https://www.aclweb.org/anthology/W17-1208/
PDF	https://www.aclweb.org/anthology/W17-1208
PWC	https://paperswithcode.com/paper/kurdish-interdialect-machine-translation
Repo
Framework
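The abstract says the system relied on bi-dialectal dictionaries rather than parallel corpora. The simplest dictionary-driven baseline is word-for-word lookup; the sketch below assumes whitespace tokenisation and a plain mapping from source-dialect to target-dialect words. How the real system handles morphology and word order is not described in the abstract, and the dictionary entries in the example (`src_a`, `tgt_a`, ...) are hypothetical placeholders, not real Kurmanji or Sorani words.

```python
def translate_word_by_word(sentence, dictionary):
    """Translate by looking each token up in a bi-dialectal dictionary.

    Tokens missing from the dictionary are copied through unchanged, so the
    output keeps one token per input token.
    """
    return " ".join(dictionary.get(token, token) for token in sentence.split())


def dictionary_coverage(sentence, dictionary):
    """Fraction of tokens that the dictionary can translate."""
    tokens = sentence.split()
    return sum(token in dictionary for token in tokens) / len(tokens)
```

Coverage is a quick indicator of how much of a text such a baseline can actually translate before any human rating of understandability.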

Multi-channel BiLSTM-CRF Model for Emerging Named Entity Recognition in Social Media


Title	Multi-channel BiLSTM-CRF Model for Emerging Named Entity Recognition in Social Media
Authors	Bill Y. Lin, Frank Xu, Zhiyi Luo, Kenny Zhu
Abstract	In this paper, we present our multi-channel neural architecture for recognizing emerging named entities in social media messages, which we applied in the Novel and Emerging Named Entity Recognition shared task at the EMNLP 2017 Workshop on Noisy User-generated Text (W-NUT). We propose a novel approach that incorporates comprehensive word representations with multi-channel information and Conditional Random Fields (CRF) into a traditional Bidirectional Long Short-Term Memory (BiLSTM) neural network, without using any additional hand-crafted features such as gazetteers. Among the systems participating in the shared task, ours ranked 2nd.
Tasks	Named Entity Recognition
Published	2017-09-01
URL	https://www.aclweb.org/anthology/W17-4421/
PDF	https://www.aclweb.org/anthology/W17-4421
PWC	https://paperswithcode.com/paper/multi-channel-bilstm-crf-model-for-emerging
Repo
Framework
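The abstract adds a CRF layer on top of a BiLSTM. At inference time, a linear-chain CRF picks the label sequence that maximises the sum of per-token emission scores and label-to-label transition scores, and Viterbi decoding finds that sequence exactly. The sketch below is a generic version of this step, assuming emission scores already produced by the network; it omits start/end transition scores, is not the authors' code, and `viterbi_decode` is a hypothetical name.

```python
def viterbi_decode(emissions, transitions, labels):
    """Return the highest-scoring label sequence for a linear-chain CRF.

    emissions[t][k]: score of label k at position t (e.g. from a BiLSTM).
    transitions[j][k]: score of moving from label j to label k.
    """
    n_labels = len(labels)
    scores = list(emissions[0])  # best score of any path ending in each label
    backpointers = []
    for t in range(1, len(emissions)):
        new_scores, pointers = [], []
        for k in range(n_labels):
            # Best previous label for reaching label k at position t.
            best_prev = max(range(n_labels), key=lambda j: scores[j] + transitions[j][k])
            new_scores.append(scores[best_prev] + transitions[best_prev][k] + emissions[t][k])
            pointers.append(best_prev)
        scores = new_scores
        backpointers.append(pointers)
    # Trace the best path backwards from the best final label.
    path = [max(range(n_labels), key=lambda k: scores[k])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return [labels[k] for k in path]
```

A strongly negative `O -> I` transition score makes the decoder prefer `O B` over the ill-formed `O I` even when the emission for `I` alone is higher, which is why a CRF layer helps BIO-style tagging.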

Discriminating between Similar Languages with Word-level Convolutional Neural Networks


Title	Discriminating between Similar Languages with Word-level Convolutional Neural Networks
Authors	Marcelo Criscuolo, Sandra Maria Aluísio
Abstract	Discriminating between Similar Languages (DSL) is a challenging task addressed at the VarDial Workshop series. We report on our participation in the DSL shared task with a two-stage system. In the first stage, character n-grams are used to separate language groups; then specialized classifiers distinguish similar language varieties. We conducted experiments with three system configurations and submitted one run for each. Our main approach is a word-level convolutional neural network (CNN) that learns task-specific vectors with minimal text preprocessing. We also experiment with multi-layer perceptron (MLP) networks and another hybrid configuration. Our best run achieved an accuracy of 90.76%, ranking 8th among 11 participants and finishing less than 2 points behind the top-ranked system. Even though the CNN model did not achieve the best results, it is still a viable approach to discriminating between similar languages.
Tasks	Language Identification, Question Answering, Text Classification
Published	2017-04-01
URL	https://www.aclweb.org/anthology/W17-1215/
PDF	https://www.aclweb.org/anthology/W17-1215
PWC	https://paperswithcode.com/paper/discriminating-between-similar-languages-with
Repo
Framework
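The abstract describes a first stage that separates language groups with character n-grams, followed by specialised classifiers for similar varieties. A minimal sketch of that two-stage dispatch, assuming a simple n-gram overlap score as the classifier at both stages; in the paper the main second-stage model is a word-level CNN (with MLP alternatives), which the sketch does not reproduce. Function names and the toy data in the test are hypothetical.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Character n-gram counts, with the text padded by one space at each end."""
    padded = f" {text} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def train_profiles(examples, n=3):
    """Sum n-gram counts per label; examples is a list of (text, label) pairs."""
    profiles = {}
    for text, label in examples:
        profiles.setdefault(label, Counter()).update(char_ngrams(text, n))
    return profiles


def overlap_score(text, profile, n=3):
    """Number of n-gram occurrences the text shares with a label profile."""
    return sum(min(count, profile[gram]) for gram, count in char_ngrams(text, n).items())


def two_stage_classify(text, group_profiles, language_profiles_by_group):
    """Stage 1 picks a language group; stage 2 picks a language within that group."""
    group = max(group_profiles, key=lambda g: overlap_score(text, group_profiles[g]))
    in_group = language_profiles_by_group[group]
    language = max(in_group, key=lambda lang: overlap_score(text, in_group[lang]))
    return group, language
```

Splitting the decision this way lets the second-stage classifier focus only on the fine distinctions inside one group.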

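Cross-lingual dependency parsing for closely related languages - Helsinki’s submission to VarDial 2017


Title	Cross-lingual dependency parsing for closely related languages - Helsinki’s submission to VarDial 2017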
Authors	Jörg Tiedemann
Abstract	This paper describes the submission from the University of Helsinki to the shared task on cross-lingual dependency parsing at VarDial 2017. We present work on annotation projection and treebank translation that gave good results for all three target languages in the test set. In particular, Slovak seems to work well with information coming from the Czech treebank, which is in line with related work. The attachment scores for cross-lingual models even surpass the fully supervised models trained on the target language treebank. Croatian is the most difficult language in the test set and the improvements over the baseline are rather modest. Norwegian works best with information coming from Swedish whereas Danish contributes surprisingly little.
Tasks	Dependency Parsing, Machine Translation, Transfer Learning
Published	2017-04-01
URL	https://www.aclweb.org/anthology/W17-1216/
PDF	https://www.aclweb.org/anthology/W17-1216
PWC	https://paperswithcode.com/paper/cross-lingual-dependency-parsing-for-closely-1
Repo
Framework
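Annotation projection, one of the two strategies named in the abstract, transfers a source-language parse onto a target sentence through word alignments. The sketch below covers only the simplest case: a one-to-one alignment and CoNLL-style 1-based head indices with 0 for the root. Real projection must also deal with unaligned and many-to-many links (left as `None` here) and with dependency labels, which are not projected. `project_heads` is a hypothetical name.

```python
def project_heads(source_heads, alignment, target_length):
    """Project dependency heads from a source sentence onto a target sentence.

    source_heads[i]: 1-based head index of source token i+1 (0 = root).
    alignment: dict mapping 1-based source indices to 1-based target indices,
        assumed one-to-one.
    Returns 1-based target heads; None marks tokens that received no head.
    """
    target_heads = [None] * target_length
    for src_dep, tgt_dep in alignment.items():
        src_head = source_heads[src_dep - 1]
        if src_head == 0:
            target_heads[tgt_dep - 1] = 0  # the root stays the root
        elif src_head in alignment:
            target_heads[tgt_dep - 1] = alignment[src_head]
    return target_heads
```

The projected trees can then serve as (noisy) training data for a target-language parser.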

Discriminating between Similar Languages Using a Combination of Typed and Untyped Character N-grams and Words


Title	Discriminating between Similar Languages Using a Combination of Typed and Untyped Character N-grams and Words
Authors	Helena Gomez, Ilia Markov, Jorge Baptista, Grigori Sidorov, David Pinto
Abstract	This paper presents cic_ualg's system that took part in the Discriminating between Similar Languages (DSL) shared task, held at the VarDial 2017 Workshop. This year's task aims at identifying 14 languages across 6 language groups using a corpus of excerpts from journalistic texts. Two classification approaches were compared: a single-step (all languages) approach and a two-step (language group, then languages within the group) approach. The features used include lexical features (word unigrams) and character n-grams. Besides traditional (untyped) character n-grams, we introduce typed character n-grams to the DSL task. Experiments were carried out with different feature representations (binary and raw term frequency), frequency threshold values, and machine-learning algorithms: Support Vector Machines (SVM) and Multinomial Naive Bayes (MNB). Our best run in the DSL task achieved 91.46% accuracy.
Tasks	Information Retrieval, Machine Translation
Published	2017-04-01
URL	https://www.aclweb.org/anthology/W17-1217/
PDF	https://www.aclweb.org/anthology/W17-1217
PWC	https://paperswithcode.com/paper/discriminating-between-similar-languages-2
Repo
Framework
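Typed character n-grams label each n-gram with the position it occupies, so the same string can yield different features depending on where it occurs. The abstract does not list the type inventory, so the sketch assumes a simplified, within-word set (whole-word, prefix, suffix, mid-word); n-grams spanning spaces or punctuation, which a fuller inventory would type separately, are ignored. `typed_char_ngrams` is a hypothetical name.

```python
def typed_char_ngrams(text, n=3):
    """Return (type, ngram) pairs for character n-grams inside each word.

    Assumed, simplified type inventory:
      whole-word: the word has exactly n characters
      prefix:     the n-gram starts a longer word
      suffix:     the n-gram ends a longer word
      mid-word:   the n-gram touches neither edge of the word
    Words shorter than n characters produce no n-grams.
    """
    typed = []
    for word in text.split():
        if len(word) < n:
            continue
        if len(word) == n:
            typed.append(("whole-word", word))
            continue
        for i in range(len(word) - n + 1):
            if i == 0:
                kind = "prefix"
            elif i + n == len(word):
                kind = "suffix"
            else:
                kind = "mid-word"
            typed.append((kind, word[i:i + n]))
    return typed
```

Untyped n-grams would map the suffix "use" in "house" and the whole word "use" to the same feature; prefixing the type (for example `suffix:use`) keeps them apart, and both kinds can be combined in one feature vector.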

Learning Feature Engineering for Classification


Title	Learning Feature Engineering for Classification
Authors	Fatemeh Nargesian, Horst Samulowitz, Udayan Khurana, Elias B. Khalil, Deepak Turaga
Abstract	Feature engineering is the task of improving predictive modelling performance on a dataset by transforming its feature space. Existing approaches to automate this process rely on either transformed feature space exploration through evaluation-guided search, or explicit expansion of datasets with all transformed features followed by feature selection. Such approaches incur high computational costs in runtime and/or memory. We present a novel technique, called Learning Feature Engineering (LFE), for automating feature engineering in classification tasks. LFE is based on learning the effectiveness of applying a transformation (e.g., arithmetic or aggregate operators) on numerical features, from past feature engineering experiences. Given a new dataset, LFE recommends a set of useful transformations to be applied on features without relying on model evaluation or explicit feature expansion and selection. Using a collection of datasets, we train a set of neural networks, which aim at predicting the transformation that impacts classification performance positively. Our empirical results show that LFE outperforms other feature engineering approaches for an overwhelming majority (89%) of the datasets from various sources while incurring a substantially lower computational cost.
Tasks	Automated Feature Engineering, Feature Engineering, Feature Selection
Published	2017-01-01
URL	https://dl.acm.org/citation.cfm?id=3172240
PDF	https://www.ijcai.org/proceedings/2017/0352.pdf
PWC	https://paperswithcode.com/paper/learning-feature-engineering-for
Repo
Framework
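For neural networks to predict useful transformations across datasets of different sizes, each numerical feature has to be summarised as a fixed-length input. The abstract does not say how this is done; the sketch below assumes a per-class, equal-width histogram over the min-max-scaled feature values as one plausible representation. The function name and the bin count are hypothetical.

```python
def feature_histogram(values, labels, n_bins=4):
    """Summarise one numerical feature as per-class normalised histograms.

    Values are scaled to [0, 1] with the feature's min and max, split into
    n_bins equal-width bins, and counted separately for each class label
    (classes sorted for a stable order). The concatenated vector always has
    n_bins * number_of_classes entries, whatever the dataset size.
    """
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # constant features all fall into the first bin
    vector = []
    for cls in sorted(set(labels)):
        members = [v for v, y in zip(values, labels) if y == cls]
        counts = [0] * n_bins
        for v in members:
            counts[min(int((v - lo) / span * n_bins), n_bins - 1)] += 1
        vector.extend(count / len(members) for count in counts)
    return vector
```

Computing this vector for a feature before and after a candidate transformation (say, a logarithm) gives inputs on which a classifier could be trained to predict whether the transformation helps, in the spirit of the LFE setup.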
