October 15, 2019

Paper Group NANR 207

Contemporary Amharic Corpus: Automatically Morpho-Syntactically Tagged Amharic Corpus

Title Contemporary Amharic Corpus: Automatically Morpho-Syntactically Tagged Amharic Corpus
Authors Andargachew Mekonnen Gezmu, Binyam Ephrem Seyoum, Michael Gasser, Andreas Nürnberger
Abstract We introduce the Contemporary Amharic Corpus, which is automatically tagged for morpho-syntactic information. Texts are collected from 25,199 documents from different domains, and about 24 million orthographic words are tokenized. Since it is partly a web corpus, we performed some automatic spelling error correction. We have also modified the existing morphological analyzer, HornMorpho, to use it for the automatic tagging.
Tasks
Published 2018-08-01
URL https://www.aclweb.org/anthology/W18-3809/
PDF https://www.aclweb.org/anthology/W18-3809
PWC https://paperswithcode.com/paper/contemporary-amharic-corpus-automatically
Repo
Framework

Level-Set Methods for Finite-Sum Constrained Convex Optimization

Title Level-Set Methods for Finite-Sum Constrained Convex Optimization
Authors Qihang Lin, Runchao Ma, Tianbao Yang
Abstract We consider constrained optimization problems where the objective function and the constraints are defined as sums of finitely many loss functions. This model has applications in machine learning such as Neyman-Pearson classification. We consider two level-set methods to solve this class of problems: an existing inexact Newton method and a new feasible level-set method. To update the level parameter towards optimality, both methods require an oracle that generates upper and lower bounds as well as an affine minorant of the level function. To construct the desired oracle, we reformulate the level function as the value of a saddle-point problem using the conjugate and perspective of the loss functions. Then a stochastic variance-reduced gradient method with a special Bregman divergence is proposed as the oracle for solving that saddle-point problem. The special divergence ensures that the proximal mapping in each iteration can be solved in closed form. The total complexity of both level-set methods using the proposed oracle is analyzed.
Tasks
Published 2018-07-01
URL https://icml.cc/Conferences/2018/Schedule?showEvent=2122
PDF http://proceedings.mlr.press/v80/lin18c/lin18c.pdf
PWC https://paperswithcode.com/paper/level-set-methods-for-finite-sum-constrained
Repo
Framework

Character Level Based Detection of DGA Domain Names

Title Character Level Based Detection of DGA Domain Names
Authors Bin Yu, Jie Pan, Jiaming Hu, Anderson Nascimento, Martine De Cock
Abstract Recently, several different deep learning architectures have been proposed that take a string of characters as the raw input signal and automatically derive features for text classification. Few studies are available that compare the effectiveness of these approaches for character-based text classification with each other. In this paper we perform such an empirical comparison for the important cybersecurity problem of DGA detection: classifying domain names as either benign or produced by malware (i.e., by a Domain Generation Algorithm). Training and evaluating on a dataset with 2M domain names shows that there is surprisingly little difference between various convolutional neural network (CNN) and recurrent neural network (RNN) based architectures in terms of accuracy, prompting a preference for the simpler architectures, since they are faster to train and less prone to overfitting.
Tasks Text Classification
Published 2018-01-01
URL https://openreview.net/forum?id=BJLmN8xRW
PDF https://openreview.net/pdf?id=BJLmN8xRW
PWC https://paperswithcode.com/paper/character-level-based-detection-of-dga-domain
Repo
Framework
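
The abstract above describes models that take a string of characters as raw input. A minimal sketch of what such an input encoding could look like is given here, assuming a fixed alphabet and zero padding; the alphabet, the `encode_domain` name, and the `max_len` default are illustrative assumptions and are not taken from the paper.

```python
import string

# Hypothetical alphabet: lowercase letters, digits, hyphen, and dot.
# Index 0 is reserved for padding and for characters outside the alphabet.
ALPHABET = string.ascii_lowercase + string.digits + "-."
CHAR_TO_ID = {ch: i + 1 for i, ch in enumerate(ALPHABET)}


def encode_domain(domain, max_len=16):
    """Map a domain name to a fixed-length list of integer character IDs."""
    ids = [CHAR_TO_ID.get(ch, 0) for ch in domain.lower()[:max_len]]
    # Right-pad with zeros so every encoded domain has the same length.
    return ids + [0] * (max_len - len(ids))


if __name__ == "__main__":
    print(encode_domain("example.com"))
```

A sequence of this kind is what a CNN or RNN embedding layer would typically consume; the actual preprocessing used by the authors may differ.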

Hybrid Camera Pose Estimation

Title Hybrid Camera Pose Estimation
Authors Federico Camposeco, Andrea Cohen, Marc Pollefeys, Torsten Sattler
Abstract In this paper, we aim to solve the pose estimation problem of calibrated pinhole and generalized cameras w.r.t. a Structure-from-Motion (SfM) model by leveraging both 2D-3D and 2D-2D correspondences. Traditional approaches focus either on the use of 2D-3D matches, known as structure-based pose estimation, or solely on 2D-2D matches (structure-less pose estimation). Absolute pose approaches are limited in their performance by the quality of the 3D point triangulations as well as the completeness of the 3D model. Relative pose approaches, on the other hand, while being more accurate, also tend to be far more computationally costly and often return dozens of possible solutions. This work aims to bridge the gap between these two paradigms. We propose a new RANSAC-based approach that automatically chooses the best type of solver to use at each iteration in a data-driven way. The solvers chosen by our RANSAC can range from pure structure-based or structure-less solvers to any possible combination of hybrid solvers (i.e., using both types of matches) in between. A number of these new hybrid minimal solvers are also presented in this paper. Both synthetic and real data experiments show our approach to be as accurate as structure-less approaches, while staying close to the efficiency of structure-based methods.
Tasks Pose Estimation
Published 2018-06-01
URL http://openaccess.thecvf.com/content_cvpr_2018/html/Camposeco_Hybrid_Camera_Pose_CVPR_2018_paper.html
PDF http://openaccess.thecvf.com/content_cvpr_2018/papers/Camposeco_Hybrid_Camera_Pose_CVPR_2018_paper.pdf
PWC https://paperswithcode.com/paper/hybrid-camera-pose-estimation
Repo
Framework

Predict Responsibly: Increasing Fairness by Learning to Defer

Title Predict Responsibly: Increasing Fairness by Learning to Defer
Authors David Madras, Toniann Pitassi, Richard Zemel
Abstract When machine learning models are used for high-stakes decisions, they should predict accurately, fairly, and responsibly. To fulfill these three requirements, a model must be able to output a reject option (i.e. say “I Don’t Know”) when it is not qualified to make a prediction. In this work, we propose learning to defer, a method by which a model can defer judgment to a downstream decision-maker such as a human user. We show that learning to defer generalizes the rejection learning framework in two ways: by considering the effect of other agents in the decision-making process, and by allowing for optimization of complex objectives. We propose a learning algorithm which accounts for potential biases held by decision-makers later in a pipeline. Experiments on real-world datasets demonstrate that learning to defer can make a model not only more accurate but also less biased. Even when operated by highly biased users, we show that deferring models can still greatly improve the fairness of the entire pipeline.
Tasks Decision Making
Published 2018-01-01
URL https://openreview.net/forum?id=SJUX_MWCZ
PDF https://openreview.net/pdf?id=SJUX_MWCZ
PWC https://paperswithcode.com/paper/predict-responsibly-increasing-fairness-by
Repo
Framework

HiNTS: A Tagset for Middle Low German

Title HiNTS: A Tagset for Middle Low German
Authors Fabian Barteld, Sarah Ihden, Katharina Dreessen, Ingrid Schröder
Abstract
Tasks
Published 2018-05-01
URL https://www.aclweb.org/anthology/L18-1622/
PDF https://www.aclweb.org/anthology/L18-1622
PWC https://paperswithcode.com/paper/hints-a-tagset-for-middle-low-german
Repo
Framework

Part-of-Speech Tagging for Arabic Gulf Dialect Using Bi-LSTM

Title Part-of-Speech Tagging for Arabic Gulf Dialect Using Bi-LSTM
Authors Randah Alharbi, Walid Magdy, Kareem Darwish, Ahmed AbdelAli, Hamdy Mubarak
Abstract
Tasks Part-Of-Speech Tagging
Published 2018-05-01
URL https://www.aclweb.org/anthology/L18-1620/
PDF https://www.aclweb.org/anthology/L18-1620
PWC https://paperswithcode.com/paper/part-of-speech-tagging-for-arabic-gulf
Repo
Framework

Aggression Detection in Social Media: Using Deep Neural Networks, Data Augmentation, and Pseudo Labeling

Title Aggression Detection in Social Media: Using Deep Neural Networks, Data Augmentation, and Pseudo Labeling
Authors Segun Taofeek Aroyehun, Alexander Gelbukh
Abstract With the advent of the read-write web which facilitates social interactions in online spaces, the rise of anti-social behaviour in online spaces has attracted the attention of researchers. In this paper, we address the challenge of automatically identifying aggression in social media posts. Our team, saroyehun, participated in the English track of the Aggression Detection in Social Media Shared Task. On this task, we investigate the efficacy of deep neural network models of varying complexity. Our results reveal that deep neural network models require more data points to do better than an NBSVM linear baseline based on character n-grams. Our improved deep neural network models were trained on augmented data and pseudo labeled examples. Our LSTM classifier receives a weighted macro-F1 score of 0.6425 to rank first overall on the Facebook subtask of the shared task. On the social media sub-task, our CNN-LSTM model records a weighted macro-F1 score of 0.5920 to place third overall.
Tasks Data Augmentation, Feature Engineering, Hate Speech Detection
Published 2018-08-01
URL https://www.aclweb.org/anthology/W18-4411/
PDF https://www.aclweb.org/anthology/W18-4411
PWC https://paperswithcode.com/paper/aggression-detection-in-social-media-using
Repo
Framework
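
The scores quoted in the abstract above are weighted F1 values. As a rough illustration of how such a metric is commonly computed, the sketch below averages per-class F1 scores with weights proportional to class support. This assumes the standard support-weighted definition; the shared task's own evaluation script is not reproduced here, and `weighted_f1` and the toy labels are hypothetical.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1
    return score


if __name__ == "__main__":
    gold = ["a", "a", "b", "b"]
    pred = ["a", "b", "b", "b"]
    print(round(weighted_f1(gold, pred), 4))
```

Weighting by support means that frequent classes dominate the score, which matters when class frequencies are imbalanced.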

Why Swear? Analyzing and Inferring the Intentions of Vulgar Expressions

Title Why Swear? Analyzing and Inferring the Intentions of Vulgar Expressions
Authors Eric Holgate, Isabel Cachola, Daniel Preoţiuc-Pietro, Junyi Jessy Li
Abstract Vulgar words are employed in language use for several different functions, ranging from expressing aggression to signaling group identity or the informality of the communication. This versatility of usage of a restricted set of words is challenging for downstream applications and has yet to be studied quantitatively or using natural language processing techniques. We introduce a novel data set of 7,800 tweets from users with known demographic traits where all instances of vulgar words are annotated with one of the six categories of vulgar word use. Using this data set, we present the first analysis of the pragmatic aspects of vulgarity and how they relate to social factors. We build a model able to predict the category of a vulgar word based on the immediate context it appears in with 67.4 macro F1 across six classes. Finally, we demonstrate the utility of modeling the type of vulgar word use in context by using this information to achieve state-of-the-art performance in hate speech detection on a benchmark data set.
Tasks Hate Speech Detection
Published 2018-10-01
URL https://www.aclweb.org/anthology/D18-1471/
PDF https://www.aclweb.org/anthology/D18-1471
PWC https://paperswithcode.com/paper/why-swear-analyzing-and-inferring-the
Repo
Framework

Semi-Supervised Clustering for Short Answer Scoring

Title Semi-Supervised Clustering for Short Answer Scoring
Authors Andrea Horbach, Manfred Pinkal
Abstract
Tasks Metric Learning, Natural Language Inference, Reading Comprehension
Published 2018-05-01
URL https://www.aclweb.org/anthology/L18-1641/
PDF https://www.aclweb.org/anthology/L18-1641
PWC https://paperswithcode.com/paper/semi-supervised-clustering-for-short-answer
Repo
Framework

The Annotated Transformer

Title The Annotated Transformer
Authors Alexander Rush
Abstract A major goal of open-source NLP is to quickly and accurately reproduce the results of new work, in a manner that the community can easily use and modify. While most papers publish enough detail for replication, it still may be difficult to achieve good results in practice. This paper presents a worked exercise of paper reproduction with the goal of implementing the results of the recent Transformer model. The replication exercise aims at simple code structure that follows closely with the original work, while achieving an efficient usable system.
Tasks
Published 2018-07-01
URL https://www.aclweb.org/anthology/W18-2509/
PDF https://www.aclweb.org/anthology/W18-2509
PWC https://paperswithcode.com/paper/the-annotated-transformer
Repo
Framework

The Distribution and Prosodic Realization of Verb Forms in German Infant-Directed Speech

Title The Distribution and Prosodic Realization of Verb Forms in German Infant-Directed Speech
Authors Bettina Braun, Katharina Zahner
Abstract
Tasks
Published 2018-05-01
URL https://www.aclweb.org/anthology/L18-1645/
PDF https://www.aclweb.org/anthology/L18-1645
PWC https://paperswithcode.com/paper/the-distribution-and-prosodic-realization-of
Repo
Framework

Towards a Testable Notion of Generalization for Generative Adversarial Networks

Title Towards a Testable Notion of Generalization for Generative Adversarial Networks
Authors Robert Cornish, Hongseok Yang, Frank Wood
Abstract We consider the question of how to assess generative adversarial networks, in particular with respect to whether or not they generalise beyond memorising the training data. We propose a simple procedure for assessing generative adversarial network performance based on a principled consideration of what the actual goal of generalisation is. Our approach involves using a test set to estimate the Wasserstein distance between the generative distribution produced by our procedure, and the underlying data distribution. We use this procedure to assess the performance of several modern generative adversarial network architectures. We find that this procedure is sensitive to the choice of ground metric on the underlying data space, and suggest a choice of ground metric that substantially improves performance. We finally suggest that attending to the ground metric used in Wasserstein generative adversarial network training may be fruitful, and outline a concrete pathway towards doing so.
Tasks
Published 2018-01-01
URL https://openreview.net/forum?id=ByuI-mW0W
PDF https://openreview.net/pdf?id=ByuI-mW0W
PWC https://paperswithcode.com/paper/towards-a-testable-notion-of-generalization
Repo
Framework
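
The abstract above proposes using a test set to estimate the Wasserstein distance between generated and real data. As a sketch of the underlying quantity only, the following handles the simplest case: one-dimensional samples of equal size with ground metric |x - y|, where the empirical 1-Wasserstein distance reduces to the mean absolute difference between sorted samples. The paper's actual estimator and ground metric are not specified here; `wasserstein_1d` and the sample values are hypothetical.

```python
def wasserstein_1d(xs, ys):
    """Empirical 1-Wasserstein distance between two equal-sized 1D samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    # In 1D the optimal transport plan matches the i-th smallest point of one
    # sample to the i-th smallest point of the other.
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


if __name__ == "__main__":
    generated = [0.0, 1.0, 2.0]
    held_out = [1.0, 2.0, 3.0]
    print(wasserstein_1d(generated, held_out))
```

For high-dimensional data such as images, no such closed form exists, and the choice of ground metric, which the abstract highlights, changes the result.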

Evaluating the text quality, human likeness and tailoring component of PASS: A Dutch data-to-text system for soccer

Title Evaluating the text quality, human likeness and tailoring component of PASS: A Dutch data-to-text system for soccer
Authors Chris van der Lee, Bart Verduijn, Emiel Krahmer, Sander Wubben
Abstract We present an evaluation of PASS, a data-to-text system that generates Dutch soccer reports from match statistics which are automatically tailored towards fans of one club or the other. The evaluation in this paper consists of two studies. An intrinsic human-based evaluation of the system's output is described in the first study. In this study it was found that compared to human-written texts, computer-generated texts were rated slightly lower on style-related text components (fluency and clarity) and slightly higher in terms of the correctness of given information. Furthermore, results from the first study showed that tailoring was accurately recognized in most cases, and that participants struggled with correctly identifying whether a text was written by a human or computer. The second study investigated if tailoring affects perceived text quality, for which no results were garnered. This lack of results might be due to negative preconceptions about computer-generated texts which were found in the first study.
Tasks Text Generation
Published 2018-08-01
URL https://www.aclweb.org/anthology/C18-1082/
PDF https://www.aclweb.org/anthology/C18-1082
PWC https://paperswithcode.com/paper/evaluating-the-text-quality-human-likeness
Repo
Framework

FEIDEGGER: A Multi-modal Corpus of Fashion Images and Descriptions in German

Title FEIDEGGER: A Multi-modal Corpus of Fashion Images and Descriptions in German
Authors Leonidas Lefakis, Alan Akbik, Roland Vollgraf
Abstract
Tasks Image Retrieval
Published 2018-05-01
URL https://www.aclweb.org/anthology/L18-1070/
PDF https://www.aclweb.org/anthology/L18-1070
PWC https://paperswithcode.com/paper/feidegger-a-multi-modal-corpus-of-fashion
Repo
Framework