October 15, 2019

Paper Group NANR 207

Contemporary Amharic Corpus: Automatically Morpho-Syntactically Tagged Amharic Corpus. Level-Set Methods for Finite-Sum Constrained Convex Optimization. Character Level Based Detection of DGA Domain Names. Hybrid Camera Pose Estimation. Predict Responsibly: Increasing Fairness by Learning to Defer. HiNTS: A Tagset for Middle Low German. Part-of-Spe …

Contemporary Amharic Corpus: Automatically Morpho-Syntactically Tagged Amharic Corpus


Title	Contemporary Amharic Corpus: Automatically Morpho-Syntactically Tagged Amharic Corpus
Authors	Andargachew Mekonnen Gezmu, Binyam Ephrem Seyoum, Michael Gasser, Andreas Nürnberger
Abstract	We introduce the contemporary Amharic corpus, which is automatically tagged for morpho-syntactic information. Texts were collected from 25,199 documents across different domains, and about 24 million orthographic words were tokenized. Since it is partly a web corpus, we applied automatic spelling error correction. We also modified the existing morphological analyzer, HornMorpho, to use it for the automatic tagging.
Tasks
Published	2018-08-01
URL	https://www.aclweb.org/anthology/W18-3809/
PDF	https://www.aclweb.org/anthology/W18-3809
PWC	https://paperswithcode.com/paper/contemporary-amharic-corpus-automatically
Repo
Framework
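
As a rough illustration of what counting "orthographic words" involves for Amharic text, the sketch below splits on whitespace and on the Ethiopic wordspace (U+1361) and keeps the other Ethiopic punctuation marks (U+1362 to U+1368) as separate tokens. This is an assumption for illustration only; the corpus authors' actual tokenizer and its treatment of punctuation are not described in the abstract, and `tokenize` and `count_orthographic_words` are hypothetical names.

```python
import re
from typing import List

# Ethiopic punctuation: U+1361 wordspace, U+1362 full stop, U+1363 comma,
# U+1364 semicolon, U+1365 colon, U+1366 preface colon, U+1367 question mark,
# U+1368 paragraph separator.
# Assumption: whitespace and the wordspace only separate words, while the other
# marks are emitted as tokens of their own.
TOKEN_RE = re.compile(r"[\u1362-\u1368]|[^\s\u1361-\u1368]+")
PUNCT_RE = re.compile(r"[\u1362-\u1368]")


def tokenize(text: str) -> List[str]:
    """Split text into word tokens and Ethiopic punctuation tokens."""
    return TOKEN_RE.findall(text)


def count_orthographic_words(text: str) -> int:
    """Count tokens that are not Ethiopic punctuation marks."""
    return sum(1 for tok in tokenize(text) if not PUNCT_RE.fullmatch(tok))
```

For example, `tokenize("ሰላም ዓለም።")` yields the two words plus the full stop as a third token, and `count_orthographic_words` on the same string counts only the two words.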

Level-Set Methods for Finite-Sum Constrained Convex Optimization


Title	Level-Set Methods for Finite-Sum Constrained Convex Optimization
Authors	Qihang Lin, Runchao Ma, Tianbao Yang
Abstract	We consider constrained optimization problems in which the objective function and the constraints are each defined as a sum of finitely many loss functions. This model has applications in machine learning such as Neyman-Pearson classification. We consider two level-set methods to solve this class of problems: an existing inexact Newton method and a new feasible level-set method. To update the level parameter towards optimality, both methods require an oracle that generates upper and lower bounds as well as an affine minorant of the level function. To construct the desired oracle, we reformulate the level function as the value of a saddle-point problem using the conjugate and perspective of the loss functions. Then a stochastic variance-reduced gradient method with a special Bregman divergence is proposed as the oracle for solving that saddle-point problem. The special divergence ensures that the proximal mapping in each iteration can be solved in closed form. The total complexity of both level-set methods using the proposed oracle is analyzed.
Tasks
Published	2018-07-01
URL	https://icml.cc/Conferences/2018/Schedule?showEvent=2122
PDF	http://proceedings.mlr.press/v80/lin18c/lin18c.pdf
PWC	https://paperswithcode.com/paper/level-set-methods-for-finite-sum-constrained
Repo
Framework
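
The basic idea behind level-set methods can be shown on a toy problem. For min f0(x) subject to f1(x) <= 0, one common form of the level function is H(r) = min_x max{f0(x) - r, f1(x)}; it is non-increasing in r, and its root is the optimal value. The sketch below finds that root by plain bisection and solves each inner problem by ternary search on a 1-D convex function. It is a simplified stand-in: it is neither the inexact Newton method nor the feasible level-set method from the paper, and it does not use the stochastic variance-reduced oracle. The functions `level_function` and `bisection_level_set` and the toy objective are hypothetical.

```python
from typing import Callable, Tuple

Fn = Callable[[float], float]


def level_function(r: float, f0: Fn, f1: Fn,
                   lo: float = -10.0, hi: float = 10.0,
                   iters: int = 200) -> Tuple[float, float]:
    """Evaluate H(r) = min_x max(f0(x) - r, f1(x)) for 1-D convex f0, f1.

    The inner objective is convex (a max of convex functions), so ternary
    search on [lo, hi] converges to its minimum.
    """
    def g(x: float) -> float:
        return max(f0(x) - r, f1(x))

    a, b = lo, hi
    for _ in range(iters):
        m1 = a + (b - a) / 3.0
        m2 = b - (b - a) / 3.0
        if g(m1) < g(m2):
            b = m2  # a minimizer lies in [a, m2]
        else:
            a = m1  # a minimizer lies in [m1, b]
    x = (a + b) / 2.0
    return g(x), x


def bisection_level_set(f0: Fn, f1: Fn, r_lo: float, r_hi: float,
                        tol: float = 1e-8) -> float:
    """Locate the root of the non-increasing level function by bisection.

    H(r) > 0 means r is below the optimal value; H(r) <= 0 means r is at or above it.
    """
    while r_hi - r_lo > tol:
        r = (r_lo + r_hi) / 2.0
        h, _ = level_function(r, f0, f1)
        if h > 0:
            r_lo = r
        else:
            r_hi = r
    return r_hi


# Toy problem: minimize (x - 3)^2 subject to x - 1 <= 0 (optimum 4.0 at x = 1).
f0 = lambda x: (x - 3.0) ** 2
f1 = lambda x: x - 1.0
r_star = bisection_level_set(f0, f1, 0.0, 20.0)
```

Bisection needs only the sign of H(r). The methods in the paper additionally use upper and lower bounds and an affine minorant of H so that each update of the level parameter can take a larger, better-informed step.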

Character Level Based Detection of DGA Domain Names


Title	Character Level Based Detection of DGA Domain Names
Authors	Bin Yu, Jie Pan, Jiaming Hu, Anderson Nascimento, Martine De Cock
Abstract	Recently, several different deep learning architectures have been proposed that take a string of characters as the raw input signal and automatically derive features for text classification. Few studies compare the effectiveness of these approaches for character-based text classification with each other. In this paper we perform such an empirical comparison for the important cybersecurity problem of DGA detection: classifying domain names as either benign or produced by malware (i.e., by a Domain Generation Algorithm). Training and evaluating on a dataset with 2M domain names shows that there is surprisingly little difference between various convolutional neural network (CNN) and recurrent neural network (RNN) based architectures in terms of accuracy, prompting a preference for the simpler architectures, since they are faster to train and less prone to overfitting.
Tasks	Text Classification
Published	2018-01-01
URL	https://openreview.net/forum?id=BJLmN8xRW
PDF	https://openreview.net/pdf?id=BJLmN8xRW
PWC	https://paperswithcode.com/paper/character-level-based-detection-of-dga-domain
Repo
Framework
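
Character-level models like the CNN and RNN architectures compared here consume a domain name as a sequence of character indices. A minimal sketch of such an input encoding follows. The alphabet, the left padding, the reserved index 0 for padding and unknown characters, and the maximum length of 75 are illustrative assumptions, not the paper's preprocessing; `encode_domain` is a hypothetical name.

```python
import string
from typing import List

# Assumed alphabet: lowercase letters, digits, hyphen and dot.
# Index 0 is reserved for padding and for any character outside the alphabet.
ALPHABET = string.ascii_lowercase + string.digits + "-."
CHAR_TO_ID = {c: i + 1 for i, c in enumerate(ALPHABET)}


def encode_domain(domain: str, max_len: int = 75) -> List[int]:
    """Map a domain name to a fixed-length, left-padded list of character ids."""
    ids = [CHAR_TO_ID.get(c, 0) for c in domain.lower()[:max_len]]
    return [0] * (max_len - len(ids)) + ids
```

The resulting integer sequences would then feed an embedding layer followed by the convolutional or recurrent layers; left padding keeps the end of the name (including the top-level domain) in a fixed position.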

Hybrid Camera Pose Estimation


Title	Hybrid Camera Pose Estimation
Authors	Federico Camposeco, Andrea Cohen, Marc Pollefeys, Torsten Sattler
Abstract	In this paper, we aim to solve the pose estimation problem of calibrated pinhole and generalized cameras w.r.t. a Structure-from-Motion (SfM) model by leveraging both 2D-3D and 2D-2D correspondences. Traditional approaches focus either on 2D-3D matches, known as structure-based pose estimation, or solely on 2D-2D matches (structure-less pose estimation). Absolute pose approaches are limited in their performance by the quality of the 3D point triangulations as well as the completeness of the 3D model. Relative pose approaches, on the other hand, while being more accurate, also tend to be far more computationally costly and often return dozens of possible solutions. This work aims to bridge the gap between these two paradigms. We propose a new RANSAC-based approach that automatically chooses the best type of solver to use at each iteration in a data-driven way. The solvers chosen by our RANSAC can range from purely structure-based or structure-less solvers to any combination of hybrid solvers (i.e. solvers using both types of matches) in between. A number of these new hybrid minimal solvers are also presented in this paper. Both synthetic and real data experiments show our approach to be as accurate as structure-less approaches, while staying close to the efficiency of structure-based methods.
Tasks	Pose Estimation
Published	2018-06-01
URL	http://openaccess.thecvf.com/content_cvpr_2018/html/Camposeco_Hybrid_Camera_Pose_CVPR_2018_paper.html
PDF	http://openaccess.thecvf.com/content_cvpr_2018/papers/Camposeco_Hybrid_Camera_Pose_CVPR_2018_paper.pdf
PWC	https://paperswithcode.com/paper/hybrid-camera-pose-estimation
Repo
Framework
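
Why the choice of minimal solver matters inside RANSAC can be seen from the standard iteration bound N = log(1 - p) / log(1 - w^s), where p is the desired confidence, w the inlier probability per match and s the sample size. The sketch below extends it to samples that mix 2D-3D and 2D-2D matches with different inlier ratios and picks the configuration with the smallest bound. The independence assumption, the candidate configurations and `pick_solver` are hypothetical; the paper's own data-driven selection rule is not reproduced here.

```python
import math
from typing import Sequence, Tuple


def ransac_iterations(p_all_inliers: float, confidence: float = 0.99) -> int:
    """Standard RANSAC bound: number of samples needed so that, with probability
    `confidence`, at least one minimal sample consists only of inliers."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_all_inliers))


def pick_solver(configs: Sequence[Tuple[int, int]], w_2d3d: float, w_2d2d: float,
                confidence: float = 0.99) -> Tuple[int, int]:
    """Hypothetical rule: among (n_2d3d, n_2d2d) sample configurations, choose the
    one with the fewest required iterations, treating matches as independent."""
    def cost(cfg: Tuple[int, int]) -> int:
        n_3d, n_2d = cfg
        return ransac_iterations(w_2d3d ** n_3d * w_2d2d ** n_2d, confidence)

    return min(configs, key=cost)
```

With an unreliable 3D model (say 30% inlier 2D-3D matches) and cleaner image matches (60% inlier 2D-2D matches), `pick_solver([(3, 0), (1, 3), (0, 5)], 0.3, 0.6)` favours the sample drawn entirely from 2D-2D matches, even though it needs more correspondences.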

Predict Responsibly: Increasing Fairness by Learning to Defer


Title	Predict Responsibly: Increasing Fairness by Learning to Defer
Authors	David Madras, Toniann Pitassi, Richard Zemel
Abstract	When machine learning models are used for high-stakes decisions, they should predict accurately, fairly, and responsibly. To fulfill these three requirements, a model must be able to output a reject option (i.e. say “I Don’t Know”) when it is not qualified to make a prediction. In this work, we propose learning to defer, a method by which a model can defer judgment to a downstream decision-maker such as a human user. We show that learning to defer generalizes the rejection learning framework in two ways: by considering the effect of other agents in the decision-making process, and by allowing for optimization of complex objectives. We propose a learning algorithm which accounts for potential biases held by decision-makers later in a pipeline. Experiments on real-world datasets demonstrate that learning to defer can make a model not only more accurate but also less biased. Even when operated by highly biased users, we show that deferring models can still greatly improve the fairness of the entire pipeline.
Tasks	Decision Making
Published	2018-01-01
URL	https://openreview.net/forum?id=SJUX_MWCZ
PDF	https://openreview.net/pdf?id=SJUX_MWCZ
PWC	https://paperswithcode.com/paper/predict-responsibly-increasing-fairness-by
Repo
Framework
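
To make deferral concrete, here is a fixed decision rule for binary classification under 0-1 loss: the model defers when the downstream decision-maker's expected error, plus an optional deferral cost, is lower than its own. This is only a hand-written simplification under those assumptions. The paper instead learns when to defer, and its objectives also cover fairness, which this sketch ignores; `defer_decision` and its parameters are hypothetical.

```python
def defer_decision(p_model: float, dm_error: float, defer_cost: float = 0.0) -> str:
    """Hypothetical fixed deferral rule for binary 0-1 loss.

    p_model:    the model's predicted probability of the positive class.
    dm_error:   assumed probability that the downstream decision-maker errs here.
    defer_cost: optional extra penalty charged whenever the model defers.
    """
    # Expected 0-1 loss if the model predicts its more likely class.
    model_error = min(p_model, 1.0 - p_model)
    return "defer" if dm_error + defer_cost < model_error else "predict"
```

An uncertain model (`p_model=0.55`) hands the case to a decision-maker who errs 20% of the time, while a confident one (`p_model=0.95`) predicts itself; raising `defer_cost` shifts the balance back towards predicting.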

HiNTS: A Tagset for Middle Low German


Title	HiNTS: A Tagset for Middle Low German
Authors	Fabian Barteld, Sarah Ihden, Katharina Dreessen, Ingrid Schröder
Abstract
Tasks
Published	2018-05-01
URL	https://www.aclweb.org/anthology/L18-1622/
PDF	https://www.aclweb.org/anthology/L18-1622
PWC	https://paperswithcode.com/paper/hints-a-tagset-for-middle-low-german
Repo
Framework

Part-of-Speech Tagging for Arabic Gulf Dialect Using Bi-LSTM


Title	Part-of-Speech Tagging for Arabic Gulf Dialect Using Bi-LSTM
Authors	Randah Alharbi, Walid Magdy, Kareem Darwish, Ahmed AbdelAli, Hamdy Mubarak
Abstract
Tasks	Part-Of-Speech Tagging
Published	2018-05-01
URL	https://www.aclweb.org/anthology/L18-1620/
PDF	https://www.aclweb.org/anthology/L18-1620
PWC	https://paperswithcode.com/paper/part-of-speech-tagging-for-arabic-gulf
Repo
Framework

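Aggression Detection in Social Media: Using Deep Neural Networks, Data Augmentation, and Pseudo Labeling
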
Title	Aggression Detection in Social Media: Using Deep Neural Networks, Data Augmentation, and Pseudo Labeling
Authors	Segun Taofeek Aroyehun, Alexander Gelbukh
Abstract	With the advent of the read-write web, which facilitates social interaction online, the rise of anti-social behaviour in online spaces has attracted the attention of researchers. In this paper, we address the challenge of automatically identifying aggression in social media posts. Our team, saroyehun, participated in the English track of the Aggression Detection in Social Media Shared Task. On this task, we investigate the efficacy of deep neural network models of varying complexity. Our results reveal that deep neural network models require more data points to outperform an NBSVM linear baseline based on character n-grams. Our improved deep neural network models were trained on augmented data and pseudo-labeled examples. Our LSTM classifier achieves a weighted macro-F1 score of 0.6425 to rank first overall on the Facebook subtask of the shared task. On the social media subtask, our CNN-LSTM model records a weighted macro-F1 score of 0.5920 to place third overall.
Tasks	Data Augmentation, Feature Engineering, Hate Speech Detection
Published	2018-08-01
URL	https://www.aclweb.org/anthology/W18-4411/
PDF	https://www.aclweb.org/anthology/W18-4411
PWC	https://paperswithcode.com/paper/aggression-detection-in-social-media-using
Repo
Framework
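
The NBSVM baseline mentioned above scales binarized n-gram features by a Naive Bayes log-count ratio before training a linear classifier. A minimal sketch of the character n-gram extraction and the log-count ratio follows, assuming binary labels (1 = aggressive) and add-alpha smoothing. The linear-classifier step is omitted, and the n-gram range and function names are illustrative choices, not the team's configuration.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def char_ngrams(text: str, n_min: int = 1, n_max: int = 3) -> List[str]:
    """All character n-grams of lengths n_min..n_max from the lowercased text."""
    text = text.lower()
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]


def nb_log_count_ratio(docs: Sequence[str], labels: Sequence[int],
                       alpha: float = 1.0) -> Dict[str, float]:
    """NB log-count ratio r = log((p / ||p||_1) / (q / ||q||_1)) with
    p = alpha + feature counts in positive docs, q = alpha + counts in negative docs."""
    pos, neg = Counter(), Counter()
    for doc, y in zip(docs, labels):
        feats = set(char_ngrams(doc))  # binarize: each n-gram counted once per document
        (pos if y == 1 else neg).update(feats)
    vocab = set(pos) | set(neg)
    p_norm = sum(pos[f] + alpha for f in vocab)
    q_norm = sum(neg[f] + alpha for f in vocab)
    return {f: math.log(((pos[f] + alpha) / p_norm) / ((neg[f] + alpha) / q_norm))
            for f in vocab}
```

Positive weights mark n-grams more typical of aggressive posts and negative weights the opposite. In NBSVM, each document's binarized feature vector is multiplied elementwise by these weights and passed to a linear SVM or logistic regression.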

Why Swear? Analyzing and Inferring the Intentions of Vulgar Expressions


Title	Why Swear? Analyzing and Inferring the Intentions of Vulgar Expressions
Authors	Eric Holgate, Isabel Cachola, Daniel Preoţiuc-Pietro, Junyi Jessy Li
Abstract	Vulgar words are used for several different functions, ranging from expressing aggression to signaling group identity or the informality of the communication. This versatility in the use of a restricted set of words is challenging for downstream applications and has yet to be studied quantitatively or with natural language processing techniques. We introduce a novel data set of 7,800 tweets from users with known demographic traits where all instances of vulgar words are annotated with one of six categories of vulgar word use. Using this data set, we present the first analysis of the pragmatic aspects of vulgarity and how they relate to social factors. We build a model that predicts the category of a vulgar word from its immediate context, reaching 67.4 macro F1 across six classes. Finally, we demonstrate the utility of modeling the type of vulgar word use in context by using this information to achieve state-of-the-art performance in hate speech detection on a benchmark data set.
Tasks	Hate Speech Detection
Published	2018-10-01
URL	https://www.aclweb.org/anthology/D18-1471/
PDF	https://www.aclweb.org/anthology/D18-1471
PWC	https://paperswithcode.com/paper/why-swear-analyzing-and-inferring-the
Repo
Framework

Semi-Supervised Clustering for Short Answer Scoring


Title	Semi-Supervised Clustering for Short Answer Scoring
Authors	Andrea Horbach, Manfred Pinkal
Abstract
Tasks	Metric Learning, Natural Language Inference, Reading Comprehension
Published	2018-05-01
URL	https://www.aclweb.org/anthology/L18-1641/
PDF	https://www.aclweb.org/anthology/L18-1641
PWC	https://paperswithcode.com/paper/semi-supervised-clustering-for-short-answer
Repo
Framework

The Annotated Transformer


Title	The Annotated Transformer
Authors	Alexander Rush
Abstract	A major goal of open-source NLP is to quickly and accurately reproduce the results of new work, in a manner that the community can easily use and modify. While most papers publish enough detail for replication, it still may be difficult to achieve good results in practice. This paper presents a worked exercise of paper reproduction with the goal of implementing the results of the recent Transformer model. The replication exercise aims for a simple code structure that closely follows the original work while still yielding an efficient, usable system.
Tasks
Published	2018-07-01
URL	https://www.aclweb.org/anthology/W18-2509/
PDF	https://www.aclweb.org/anthology/W18-2509
PWC	https://paperswithcode.com/paper/the-annotated-transformer
Repo
Framework
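
The central operation the Annotated Transformer walks through is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. The original post implements it in PyTorch; the dependency-free sketch below computes the same formula on plain Python lists, without masking or multiple heads, purely to show the arithmetic. The function names are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q: Matrix, K: Matrix, V: Matrix) -> Matrix:
    """Compute softmax(Q K^T / sqrt(d_k)) V row by row (one row per query)."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out
```

When a query scores two keys equally, the weights split evenly and the output is the mean of the two value vectors; the sqrt(d_k) scaling keeps dot products from growing with dimension and saturating the softmax.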

The Distribution and Prosodic Realization of Verb Forms in German Infant-Directed Speech


Title	The Distribution and Prosodic Realization of Verb Forms in German Infant-Directed Speech
Authors	Bettina Braun, Katharina Zahner
Abstract
Tasks
Published	2018-05-01
URL	https://www.aclweb.org/anthology/L18-1645/
PDF	https://www.aclweb.org/anthology/L18-1645
PWC	https://paperswithcode.com/paper/the-distribution-and-prosodic-realization-of
Repo
Framework

Towards a Testable Notion of Generalization for Generative Adversarial Networks


Title	Towards a Testable Notion of Generalization for Generative Adversarial Networks
Authors	Robert Cornish, Hongseok Yang, Frank Wood
Abstract	We consider the question of how to assess generative adversarial networks, in particular with respect to whether or not they generalise beyond memorising the training data. We propose a simple procedure for assessing generative adversarial network performance based on a principled consideration of what the actual goal of generalisation is. Our approach involves using a test set to estimate the Wasserstein distance between the generative distribution produced by our procedure, and the underlying data distribution. We use this procedure to assess the performance of several modern generative adversarial network architectures. We find that this procedure is sensitive to the choice of ground metric on the underlying data space, and suggest a choice of ground metric that substantially improves performance. We finally suggest that attending to the ground metric used in Wasserstein generative adversarial network training may be fruitful, and outline a concrete pathway towards doing so.
Tasks
Published	2018-01-01
URL	https://openreview.net/forum?id=ByuI-mW0W
PDF	https://openreview.net/pdf?id=ByuI-mW0W
PWC	https://paperswithcode.com/paper/towards-a-testable-notion-of-generalization
Repo
Framework
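
The procedure relies on estimating a Wasserstein distance between generated samples and held-out test data. In one dimension, with equal sample sizes and the absolute-difference ground metric, the empirical Wasserstein-1 distance reduces to the mean gap between sorted samples, as in the sketch below. This only illustrates the quantity; the paper works in high-dimensional data spaces, where the choice of ground metric matters and a general optimal-transport computation is needed. `wasserstein_1d` is a hypothetical name.

```python
from typing import Sequence


def wasserstein_1d(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Empirical Wasserstein-1 distance between two equal-size 1-D samples."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("expects two non-empty samples of equal size")
    # In 1-D with equal sample sizes, the optimal coupling matches sorted values.
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)
```

A generator that merely memorized the training set would score well against training data but not necessarily against a fresh test set, which is why the test set is used for the estimate.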

Evaluating the text quality, human likeness and tailoring component of PASS: A Dutch data-to-text system for soccer


Title	Evaluating the text quality, human likeness and tailoring component of PASS: A Dutch data-to-text system for soccer
Authors	Chris van der Lee, Bart Verduijn, Emiel Krahmer, Sander Wubben
Abstract	We present an evaluation of PASS, a data-to-text system that generates Dutch soccer reports from match statistics which are automatically tailored towards fans of one club or the other. The evaluation in this paper consists of two studies. An intrinsic human-based evaluation of the system's output is described in the first study. In this study it was found that, compared to human-written texts, computer-generated texts were rated slightly lower on style-related text components (fluency and clarity) and slightly higher in terms of the correctness of given information. Furthermore, results from the first study showed that tailoring was accurately recognized in most cases, and that participants struggled to correctly identify whether a text was written by a human or a computer. The second study investigated whether tailoring affects perceived text quality, but it yielded no results. This lack of results might be due to negative preconceptions about computer-generated texts, which were found in the first study.
Tasks	Text Generation
Published	2018-08-01
URL	https://www.aclweb.org/anthology/C18-1082/
PDF	https://www.aclweb.org/anthology/C18-1082
PWC	https://paperswithcode.com/paper/evaluating-the-text-quality-human-likeness
Repo
Framework

FEIDEGGER: A Multi-modal Corpus of Fashion Images and Descriptions in German

Title	FEIDEGGER: A Multi-modal Corpus of Fashion Images and Descriptions in German
Authors	Leonidas Lefakis, Alan Akbik, Roland Vollgraf
Abstract
Tasks	Image Retrieval
Published	2018-05-01
URL	https://www.aclweb.org/anthology/L18-1070/
PDF	https://www.aclweb.org/anthology/L18-1070
PWC	https://paperswithcode.com/paper/feidegger-a-multi-modal-corpus-of-fashion
Repo
Framework