July 26, 2019

Paper Group NANR 35

Tilde MODEL - Multilingual Open Data for EU Languages. SZTE-NLP at SemEval-2017 Task 10: A High Precision Sequence Model for Keyphrase Extraction Utilizing Sparse Coding for Feature Generation. Simple Queries as Distant Labels for Predicting Gender on Twitter. TALERUM - Learning Danish by Doing Danish. NTU-1 at SemEval-2017 Task 12: Detection and c …

Tilde MODEL - Multilingual Open Data for EU Languages

Title Tilde MODEL - Multilingual Open Data for EU Languages
Authors Roberts Rozis, Raivis Skadiņš
Abstract
Tasks Machine Translation
Published 2017-05-01
URL https://www.aclweb.org/anthology/W17-0235/
PDF https://www.aclweb.org/anthology/W17-0235
PWC https://paperswithcode.com/paper/tilde-model-multilingual-open-data-for-eu
Repo
Framework

SZTE-NLP at SemEval-2017 Task 10: A High Precision Sequence Model for Keyphrase Extraction Utilizing Sparse Coding for Feature Generation

Title SZTE-NLP at SemEval-2017 Task 10: A High Precision Sequence Model for Keyphrase Extraction Utilizing Sparse Coding for Feature Generation
Authors Gábor Berend
Abstract In this paper we introduce our system participating at the 2017 SemEval shared task on keyphrase extraction from scientific documents. We aimed at the creation of a keyphrase extraction approach which relies on as little external resources as possible. Without applying any hand-crafted external resources, and only utilizing a transformed version of word embeddings trained at Wikipedia, our proposed system manages to perform among the best participating systems in terms of precision.
Tasks Named Entity Recognition, Part-Of-Speech Tagging, Word Embeddings
Published 2017-08-01
URL https://www.aclweb.org/anthology/S17-2173/
PDF https://www.aclweb.org/anthology/S17-2173
PWC https://paperswithcode.com/paper/szte-nlp-at-semeval-2017-task-10-a-high
Repo
Framework

Simple Queries as Distant Labels for Predicting Gender on Twitter

Title Simple Queries as Distant Labels for Predicting Gender on Twitter
Authors Chris Emmery, Grzegorz Chrupała, Walter Daelemans
Abstract The majority of research on extracting missing user attributes from social media profiles uses costly hand-annotated labels for supervised learning. Distantly supervised methods exist, although these generally rely on knowledge gathered using external sources. This paper demonstrates the effectiveness of gathering distant labels for self-reported gender on Twitter using simple queries. We confirm the reliability of this query heuristic by comparing with manual annotation. Moreover, using these labels for distant supervision, we demonstrate competitive model performance on the same data as models trained on manual annotations. As such, we offer a cheap, extensible, and fast alternative that can be employed beyond the task of gender classification.
Tasks
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-4407/
PDF https://www.aclweb.org/anthology/W17-4407
PWC https://paperswithcode.com/paper/simple-queries-as-distant-labels-for
Repo
Framework
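
The query heuristic described in the abstract above can be illustrated with a small sketch. The regular expressions and the `distant_label` helper below are hypothetical stand-ins chosen for illustration; the paper's actual queries and filtering steps may differ.

```python
import re

# Hypothetical self-report patterns (assumption: not the paper's actual queries).
PATTERNS = {
    "female": re.compile(r"\bi(?:'m| am) an? (?:girl|woman|female)\b", re.IGNORECASE),
    "male": re.compile(r"\bi(?:'m| am) an? (?:boy|man|male|guy)\b", re.IGNORECASE),
}


def distant_label(tweet):
    """Return 'female' or 'male' if exactly one pattern matches, else None."""
    hits = [label for label, pattern in PATTERNS.items() if pattern.search(tweet)]
    # Ambiguous tweets (both patterns match) are left unlabeled.
    return hits[0] if len(hits) == 1 else None
```

Labels collected this way are noisy by design, which is why the abstract reports checking the heuristic against manual annotation before using it for training.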

TALERUM - Learning Danish by Doing Danish

Title TALERUM - Learning Danish by Doing Danish
Authors Peter Juel Henrichsen
Abstract
Tasks
Published 2017-05-01
URL https://www.aclweb.org/anthology/W17-0246/
PDF https://www.aclweb.org/anthology/W17-0246
PWC https://paperswithcode.com/paper/talerum-learning-danish-by-doing-danish
Repo
Framework

NTU-1 at SemEval-2017 Task 12: Detection and classification of temporal events in clinical data with domain adaptation

Title NTU-1 at SemEval-2017 Task 12: Detection and classification of temporal events in clinical data with domain adaptation
Authors Po-Yu Huang, Hen-Hsen Huang, Yu-Wun Wang, Ching Huang, Hsin-Hsi Chen
Abstract This study proposes a system to participate in the Clinical TempEval 2017 shared task, a part of the SemEval 2017 Tasks. Domain adaptation was the main challenge this year. We took part in the supervised domain adaptation where data of 591 records of colon cancer patients and 30 records of brain cancer patients from Mayo clinic were given and we were asked to analyze the records from brain cancer patients. Based on the THYME corpus released by the organizer of Clinical TempEval, we propose a framework that automatically analyzes clinical temporal events in a fine-grained level. Support vector machine (SVM) and conditional random field (CRF) were implemented in our system for different subtasks, including detecting clinically relevant events and time expressions, determining their attributes, and identifying their relations with each other within the document. The results showed the capability of domain adaptation of our system.
Tasks Domain Adaptation
Published 2017-08-01
URL https://www.aclweb.org/anthology/S17-2177/
PDF https://www.aclweb.org/anthology/S17-2177
PWC https://paperswithcode.com/paper/ntu-1-at-semeval-2017-task-12-detection-and
Repo
Framework

Simple Compound Splitting for German

Title Simple Compound Splitting for German
Authors Marion Weller-Di Marco
Abstract This paper presents a simple method for German compound splitting that combines a basic frequency-based approach with a form-to-lemma mapping to approximate morphological operations. With the exception of a small set of hand-crafted rules for modeling transitional elements, this approach is resource-poor. In our evaluation, the simple splitter outperforms a splitter relying on rich morphological resources.
Tasks Information Retrieval, Machine Translation
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-1722/
PDF https://www.aclweb.org/anthology/W17-1722
PWC https://paperswithcode.com/paper/simple-compound-splitting-for-german
Repo
Framework
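
A minimal sketch of the frequency-based part of such a splitter follows, assuming the common scoring rule that prefers the split whose parts have the highest geometric mean of corpus frequencies. The frequency table, the linker list, and the `split_compound` helper are hypothetical, and the form-to-lemma mapping mentioned in the abstract is left out.

```python
from math import sqrt

# Toy corpus frequencies (hypothetical numbers, lowercased word forms).
FREQ = {
    "arbeit": 500, "zimmer": 300, "arbeitszimmer": 5,
    "bahn": 700, "hof": 400, "bahnhof": 2000,
    "tisch": 350, "decke": 250,
}
LINKERS = ("", "s", "es", "n")  # small hand-written set of transitional elements
MIN_PART_LEN = 3


def split_compound(word, freq=FREQ):
    """Return the best two-part split, or [word] if no split scores higher."""
    word = word.lower()
    best_parts, best_score = [word], float(freq.get(word, 0))
    for i in range(MIN_PART_LEN, len(word) - MIN_PART_LEN + 1):
        head, tail = word[:i], word[i:]
        if tail not in freq:
            continue
        for linker in LINKERS:
            if not head.endswith(linker):
                continue
            # Strip the transitional element to get the modifier's base form.
            modifier = head[: len(head) - len(linker)]
            if modifier in freq:
                score = sqrt(freq[modifier] * freq[tail])
                if score > best_score:
                    best_parts, best_score = [modifier, tail], score
    return best_parts
```

With these toy counts, `split_compound("Arbeitszimmer")` yields `['arbeit', 'zimmer']`, while the frequent lexicalized compound `"Bahnhof"` stays unsplit because its own frequency outscores the split.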

Audience Segmentation in Social Media

Title Audience Segmentation in Social Media
Authors Verena Henrich, Alexander Lang
Abstract Understanding the social media audience is becoming increasingly important for social media analysis. This paper presents an approach that detects various audience attributes, including author location, demographics, behavior and interests. It works both for a variety of social media sources and for multiple languages. The approach has been implemented within IBM Watson Analytics for Social Media and creates author profiles for more than 300 different analysis domains every day.
Tasks Sentiment Analysis
Published 2017-04-01
URL https://www.aclweb.org/anthology/E17-3014/
PDF https://www.aclweb.org/anthology/E17-3014
PWC https://paperswithcode.com/paper/audience-segmentation-in-social-media
Repo
Framework

Source-Side Left-to-Right or Target-Side Left-to-Right? An Empirical Comparison of Two Phrase-Based Decoding Algorithms

Title Source-Side Left-to-Right or Target-Side Left-to-Right? An Empirical Comparison of Two Phrase-Based Decoding Algorithms
Authors Yin-Wen Chang, Michael Collins
Abstract This paper describes an empirical study of the phrase-based decoding algorithm proposed by Chang and Collins (2017). The algorithm produces a translation by processing the source-language sentence in strictly left-to-right order, differing from commonly used approaches that build the target-language sentence in left-to-right order. Our results show that the new algorithm is competitive with Moses (Koehn et al., 2007) in terms of both speed and BLEU scores.
Tasks Machine Translation
Published 2017-09-01
URL https://www.aclweb.org/anthology/D17-1157/
PDF https://www.aclweb.org/anthology/D17-1157
PWC https://paperswithcode.com/paper/source-side-left-to-right-or-target-side-left
Repo
Framework

Solving Most Systems of Random Quadratic Equations

Title Solving Most Systems of Random Quadratic Equations
Authors Gang Wang, Georgios Giannakis, Yousef Saad, Jie Chen
Abstract This paper deals with finding an $n$-dimensional solution $\bm{x}$ to a system of quadratic equations $y_i=\langle\bm{a}_i,\bm{x}\rangle^2$, $1\le i \le m$, which in general is known to be NP-hard. We put forth a novel procedure, that starts with a \emph{weighted maximal correlation initialization} obtainable with a few power iterations, followed by successive refinements based on \emph{iteratively reweighted gradient-type iterations}. The novel techniques distinguish themselves from prior works by the inclusion of a fresh (re)weighting regularization. For certain random measurement models, the proposed procedure returns the true solution $\bm{x}$ with high probability in time proportional to reading the data $\{(\bm{a}_i;y_i)\}_{1\le i \le m}$, provided that the number $m$ of equations is some constant $c>0$ times the number $n$ of unknowns, that is, $m\ge cn$. Empirically, the upshots of this contribution are: i) perfect signal recovery in the high-dimensional regime given only an \emph{information-theoretic limit number} of equations; and, ii) (near-)optimal statistical accuracy in the presence of additive noise. Extensive numerical tests using both synthetic data and real images corroborate its improved signal recovery performance and computational efficiency relative to state-of-the-art approaches.
Tasks
Published 2017-12-01
URL http://papers.nips.cc/paper/6783-solving-most-systems-of-random-quadratic-equations
PDF http://papers.nips.cc/paper/6783-solving-most-systems-of-random-quadratic-equations.pdf
PWC https://paperswithcode.com/paper/solving-most-systems-of-random-quadratic
Repo
Framework

Parsing and MWE Detection: Fips at the PARSEME Shared Task

Title Parsing and MWE Detection: Fips at the PARSEME Shared Task
Authors Luka Nerima, Vasiliki Foufi, Éric Wehrli
Abstract Identifying multiword expressions (MWEs) in a sentence in order to ensure their proper processing in subsequent applications, like machine translation, and performing the syntactic analysis of the sentence are interrelated processes. In our approach, priority is given to parsing alternatives involving collocations, and hence collocational information helps the parser through the maze of alternatives, with the aim to lead to substantial improvements in the performance of both tasks (collocation identification and parsing), and in that of a subsequent task (machine translation). In this paper, we are going to present our system and the procedure that we have followed in order to participate in the open track of the PARSEME shared task on automatic identification of verbal multiword expressions (VMWEs) in running texts.
Tasks Lexical Analysis, Machine Translation
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-1706/
PDF https://www.aclweb.org/anthology/W17-1706
PWC https://paperswithcode.com/paper/parsing-and-mwe-detection-fips-at-the-parseme
Repo
Framework

Dissipativity Theory for Nesterov’s Accelerated Method

Title Dissipativity Theory for Nesterov’s Accelerated Method
Authors Bin Hu, Laurent Lessard
Abstract In this paper, we adapt the control theoretic concept of dissipativity theory to provide a natural understanding of Nesterov’s accelerated method. Our theory ties rigorous convergence rate analysis to the physically intuitive notion of energy dissipation. Moreover, dissipativity allows one to efficiently construct Lyapunov functions (either numerically or analytically) by solving a small semidefinite program. Using novel supply rate functions, we show how to recover known rate bounds for Nesterov’s method and we generalize the approach to certify both linear and sublinear rates in a variety of settings. Finally, we link the continuous-time version of dissipativity to recent works on algorithm analysis that use discretizations of ordinary differential equations.
Tasks
Published 2017-08-01
URL https://icml.cc/Conferences/2017/Schedule?showEvent=891
PDF http://proceedings.mlr.press/v70/hu17a/hu17a.pdf
PWC https://paperswithcode.com/paper/dissipativity-theory-for-nesterovs
Repo
Framework
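
For readers unfamiliar with the method being analyzed, here is a minimal sketch of Nesterov's accelerated method in its standard constant-step form for an m-strongly convex, L-smooth objective. The quadratic test function, the constants, and the helper names are assumptions made for illustration; the dissipativity analysis itself (the semidefinite program and supply rates) is not reproduced.

```python
from math import sqrt

M_CONST, L_CONST = 1.0, 100.0  # strong convexity and smoothness constants (toy values)


def f(x):
    """Toy objective f(x) = 0.5 * (m * x0^2 + L * x1^2), minimized at the origin."""
    return 0.5 * (M_CONST * x[0] ** 2 + L_CONST * x[1] ** 2)


def grad_f(x):
    return (M_CONST * x[0], L_CONST * x[1])


def nesterov(x0, steps):
    """Run the constant-step scheme and return f(x_k) for k = 0..steps."""
    kappa = L_CONST / M_CONST
    alpha = 1.0 / L_CONST                       # step size
    beta = (sqrt(kappa) - 1.0) / (sqrt(kappa) + 1.0)  # momentum weight
    x_prev, x = x0, x0
    history = [f(x)]
    for _ in range(steps):
        # Extrapolate, then take a gradient step from the extrapolated point.
        y = tuple(xi + beta * (xi - pi) for xi, pi in zip(x, x_prev))
        g = grad_f(y)
        x_prev, x = x, tuple(yi - alpha * gi for yi, gi in zip(y, g))
        history.append(f(x))
    return history
```

The returned values can be compared against the classical bound f(x_k) - f* <= (1 - sqrt(m/L))^k (f(x_0) - f* + (m/2)||x_0 - x*||^2), which is the kind of linear rate the abstract says the dissipativity framework recovers.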

Chained Cascade Network for Object Detection

Title Chained Cascade Network for Object Detection
Authors Wanli Ouyang, Kun Wang, Xin Zhu, Xiaogang Wang
Abstract Cascade is a widely used approach that rejects obvious negative samples at early stages for learning better classifier and faster inference. This paper presents chained cascade network (CC-Net). In this CC-Net, there are many cascade stages. Preceding cascade stages are placed at shallow layers. Easy hard examples are rejected at shallow layers so that the computation for deeper or wider layers is not required. In this way, features and classifiers at latter stages handle more difficult samples with the help of features and classifiers in previous stages. It yields consistent boost in detection performance on PASCAL VOC 2007 and ImageNet for both fast RCNN and Faster RCNN. CC-Net saves computation for both training and testing. Code is available on.
Tasks Object Detection
Published 2017-10-01
URL http://openaccess.thecvf.com/content_iccv_2017/html/Ouyang_Chained_Cascade_Network_ICCV_2017_paper.html
PDF http://openaccess.thecvf.com/content_ICCV_2017/papers/Ouyang_Chained_Cascade_Network_ICCV_2017_paper.pdf
PWC https://paperswithcode.com/paper/chained-cascade-network-for-object-detection
Repo
Framework

An Extensible Framework for Verification of Numerical Claims

Title An Extensible Framework for Verification of Numerical Claims
Authors James Thorne, Andreas Vlachos
Abstract In this paper we present our automated fact checking system demonstration which we developed in order to participate in the Fast and Furious Fact Check challenge. We focused on simple numerical claims such as "population of Germany in 2015 was 80 million" which comprised a quarter of the test instances in the challenge, achieving 68% accuracy. Our system extends previous work on semantic parsing and claim identification to handle temporal expressions and knowledge bases consisting of multiple tables, while relying solely on automatically generated training data. We demonstrate the extensible nature of our system by evaluating it on relations used in previous work. We make our system publicly available so that it can be used and extended by the community.
Tasks Rumour Detection, Semantic Parsing
Published 2017-04-01
URL https://www.aclweb.org/anthology/E17-3010/
PDF https://www.aclweb.org/anthology/E17-3010
PWC https://paperswithcode.com/paper/an-extensible-framework-for-verification-of
Repo
Framework

Engineering a direct k-way Hypergraph Partitioning Algorithm

Title Engineering a direct k-way Hypergraph Partitioning Algorithm
Authors Yaroslav Akhremtsev, Tobias Heuer, Peter Sanders, Sebastian Schlag
Abstract We develop a fast and high quality multilevel algorithm that directly partitions hypergraphs into k balanced blocks – without the detour over recursive bipartitioning. In particular, our algorithm efficiently implements the powerful FM local search heuristics for the complicated k-way case. This is important for objective functions which depend on the number of blocks connected by a hyperedge. We also remove several further bottlenecks in processing large hyperedges, develop a faster contraction algorithm, and a new adaptive stopping rule for local search. To further reduce the size of hyperedges, we develop a pin-sparsifier based on the min-hashing technique that clusters vertices with similar neighborhood. Extensive experiments indicate that our KaHyPar-partitioner compares favorably with the best previous systems. KaHyPar is faster than hMetis and computes better solutions. KaHyPar’s results are considerably better than the (faster) PaToH partitioner.
Tasks graph partitioning, hypergraph partitioning
Published 2017-01-18
URL https://epubs.siam.org/doi/10.1137/1.9781611974768.3
PDF https://epubs.siam.org/doi/10.1137/1.9781611974768.3
PWC https://paperswithcode.com/paper/engineering-a-direct-k-way-hypergraph
Repo
Framework
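
The min-hashing idea mentioned in the abstract above can be sketched as follows: each vertex gets a fingerprint made of the minimum hash values over its incident hyperedges, and vertices with equal fingerprints are grouped. This is only an illustration under assumed hash parameters and helper names, not KaHyPar's implementation; every vertex is assumed to have at least one incident hyperedge.

```python
from collections import defaultdict

PRIME = 101  # small prime modulus, sufficient for toy hyperedge ids
# (a, b) pairs for hash functions h(x) = (a * x + b) mod PRIME (arbitrary choices).
HASH_PARAMS = [(3, 1), (5, 2), (7, 3), (11, 5)]


def minhash_fingerprint(edge_ids):
    """One minimum per hash function, taken over the vertex's incident hyperedges."""
    return tuple(min((a * e + b) % PRIME for e in edge_ids) for a, b in HASH_PARAMS)


def cluster_by_fingerprint(incident_edges):
    """Group vertices whose min-hash fingerprints coincide."""
    buckets = defaultdict(list)
    for vertex, edges in sorted(incident_edges.items()):
        buckets[minhash_fingerprint(edges)].append(vertex)
    return sorted(buckets.values())
```

Vertices with identical neighborhoods always share a fingerprint, and vertices with highly overlapping neighborhoods are likely to; for example, `cluster_by_fingerprint({0: {0, 1, 2}, 1: {0, 1, 2}, 2: {7, 8}, 3: {2, 1, 0}})` groups vertices 0, 1 and 3 and keeps vertex 2 apart.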

Improving Document Clustering by Removing Unnatural Language

Title Improving Document Clustering by Removing Unnatural Language
Authors Myungha Jang, Jinho D. Choi, James Allan
Abstract Technical documents contain a fair amount of unnatural language, such as tables, formulas, and pseudo-code. Unnatural language can be an important factor of confusing existing NLP tools. This paper presents an effective method of distinguishing unnatural language from natural language, and evaluates the impact of unnatural language detection on NLP tasks such as document clustering. We view this problem as an information extraction task and build a multiclass classification model identifying unnatural language components into four categories. First, we create a new annotated corpus by collecting slides and papers in various formats, PPT, PDF, and HTML, where unnatural language components are annotated into four categories. We then explore features available from plain text to build a statistical model that can handle any format as long as it is converted into plain text. Our experiments show that removing unnatural language components gives an absolute improvement in document clustering by up to 15%. Our corpus and tool are publicly available.
Tasks Document Layout Analysis, Optical Character Recognition
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-4416/
PDF https://www.aclweb.org/anthology/W17-4416
PWC https://paperswithcode.com/paper/improving-document-clustering-by-removing
Repo
Framework