Paper Group NANR 233
Large-Scale Hierarchical Alignment for Data-driven Text Rewriting
Title | Large-Scale Hierarchical Alignment for Data-driven Text Rewriting |
Authors | Nikola Nikolov, Richard Hahnloser |
Abstract | We propose a simple unsupervised method for extracting pseudo-parallel monolingual sentence pairs from comparable corpora representative of two different text styles, such as news articles and scientific papers. Our approach does not require a seed parallel corpus, but instead relies solely on hierarchical search over pre-trained embeddings of documents and sentences. We demonstrate the effectiveness of our method through automatic and extrinsic evaluation on text simplification from the normal to the Simple Wikipedia. We show that pseudo-parallel sentences extracted with our method not only supplement existing parallel data, but can even lead to competitive performance on their own. |
Tasks | Text Simplification |
Published | 2019-09-01 |
URL | https://www.aclweb.org/anthology/R19-1098/ |
https://www.aclweb.org/anthology/R19-1098 | |
PWC | https://paperswithcode.com/paper/large-scale-hierarchical-alignment-for-data |
Repo | |
Framework | |
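The hierarchical search described in the abstract — match documents first, then align sentences only within matched document pairs — can be sketched as follows. This is a toy illustration, not the authors' implementation: the embedding format, thresholds, and greedy nearest-neighbour matching are all assumptions.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def hierarchical_align(src_docs, tgt_docs, doc_thresh=0.5, sent_thresh=0.7):
    """Two-level pseudo-parallel extraction. Each document is a list of
    (sentence_text, embedding) pairs; a document embedding is taken as
    the mean of its sentence embeddings."""
    pairs = []
    for sdoc in src_docs:
        s_emb = np.mean([e for _, e in sdoc], axis=0)
        # Level 1: find the closest target document.
        best = max(tgt_docs,
                   key=lambda td: cosine(s_emb, np.mean([e for _, e in td], axis=0)))
        if cosine(s_emb, np.mean([e for _, e in best], axis=0)) < doc_thresh:
            continue
        # Level 2: greedily match sentences inside the matched documents.
        for s_text, s_vec in sdoc:
            t_text, score = None, sent_thresh
            for cand_text, t_vec in best:
                sim = cosine(s_vec, t_vec)
                if sim >= score:
                    t_text, score = cand_text, sim
            if t_text is not None:
                pairs.append((s_text, t_text, score))
    return pairs
```

Restricting the sentence search to matched documents is what keeps the approach tractable over large comparable corpora.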
Simplification-induced transformations: typology and some characteristics
Title | Simplification-induced transformations: typology and some characteristics |
Authors | Anaïs Koptient, Rémi Cardon, Natalia Grabar |
Abstract | The purpose of automatic text simplification is to transform technical or difficult-to-understand texts into a friendlier version. The semantics must be preserved during this transformation. Automatic text simplification can be done at different levels (lexical, syntactic, semantic, stylistic…) and relies on the corresponding knowledge and resources (lexicon, rules…). Our objective is to propose methods and material for the creation of transformation rules from a small set of parallel sentences differentiated by their technicity. We also propose a typology of transformations and quantify them. We work with French-language data related to the medical domain, although we assume that the method can be exploited on texts in any language and from any domain. |
Tasks | Text Simplification |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-5033/ |
https://www.aclweb.org/anthology/W19-5033 | |
PWC | https://paperswithcode.com/paper/simplification-induced-transformations |
Repo | |
Framework | |
Multi-Class Part Parsing With Joint Boundary-Semantic Awareness
Title | Multi-Class Part Parsing With Joint Boundary-Semantic Awareness |
Authors | Yifan Zhao, Jia Li, Yu Zhang, Yonghong Tian |
Abstract | Object part parsing in the wild, which requires simultaneously detecting multiple object classes in the scene and accurately segmenting semantic parts within each class, is challenging for the joint presence of class-level and part-level ambiguities. Despite its importance, however, this problem is not sufficiently explored in existing works. In this paper, we propose a joint parsing framework with boundary and semantic awareness to address this challenging problem. To handle part-level ambiguity, a boundary awareness module is proposed to make mid-level features at multiple scales attend to part boundaries for accurate part localization, which are then fused with high-level features for effective part recognition. For class-level ambiguity, we further present a semantic awareness module that selects discriminative part features relevant to a category to prevent irrelevant features from being merged together. The proposed modules are lightweight and implementation friendly, improving the performance substantially when plugged into various baseline architectures. Without bells and whistles, the full model sets new state-of-the-art results on the Pascal-Part dataset, in both the multi-class and the conventional single-class setting, while running substantially faster than recent high-performance approaches. |
Tasks | |
Published | 2019-10-01 |
URL | http://openaccess.thecvf.com/content_ICCV_2019/html/Zhao_Multi-Class_Part_Parsing_With_Joint_Boundary-Semantic_Awareness_ICCV_2019_paper.html |
http://openaccess.thecvf.com/content_ICCV_2019/papers/Zhao_Multi-Class_Part_Parsing_With_Joint_Boundary-Semantic_Awareness_ICCV_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/multi-class-part-parsing-with-joint-boundary |
Repo | |
Framework | |
Barrage of Random Transforms for Adversarially Robust Defense
Title | Barrage of Random Transforms for Adversarially Robust Defense |
Authors | Edward Raff, Jared Sylvester, Steven Forsyth, Mark McLean |
Abstract | Defenses against adversarial examples, when using the ImageNet dataset, are historically easy to defeat. The common understanding is that a combination of simple image transformations and other various defenses are insufficient to provide the necessary protection when the obfuscated gradient is taken into account. In this paper, we explore the idea of stochastically combining a large number of individually weak defenses into a single barrage of randomized transformations to build a strong defense against adversarial attacks. We show that, even after accounting for obfuscated gradients, the Barrage of Random Transforms (BaRT) is a resilient defense against even the most difficult attacks, such as PGD. BaRT achieves up to a 24x improvement in accuracy compared to previous work, and has even extended effectiveness out to a previously untested maximum adversarial perturbation of ε=32. |
Tasks | |
Published | 2019-06-01 |
URL | http://openaccess.thecvf.com/content_CVPR_2019/html/Raff_Barrage_of_Random_Transforms_for_Adversarially_Robust_Defense_CVPR_2019_paper.html |
http://openaccess.thecvf.com/content_CVPR_2019/papers/Raff_Barrage_of_Random_Transforms_for_Adversarially_Robust_Defense_CVPR_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/barrage-of-random-transforms-for |
Repo | |
Framework | |
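The core idea — stochastically composing many individually weak transforms into one randomized pipeline — can be sketched as follows. This is a toy stand-in, not the paper's defense: the three transforms, their parameter ranges, and the subset size `k` are invented for illustration (the paper operates on images with a much larger transform pool).

```python
import random

# Individually weak, parameterized "transforms"; here an image is just
# a flat list of pixel intensities to keep the sketch self-contained.
def add_noise(img, rng):
    return [p + rng.uniform(-0.05, 0.05) for p in img]

def rescale(img, rng):
    s = rng.uniform(0.9, 1.1)
    return [p * s for p in img]

def shift(img, rng):
    d = rng.uniform(-0.05, 0.05)
    return [p + d for p in img]

TRANSFORMS = [add_noise, rescale, shift]

def barrage(img, k=2, seed=None):
    """Apply a random subset of k transforms, in random order, with
    random parameters. The attacker must cope with this stochastic
    pipeline when crafting an adversarial example."""
    rng = random.Random(seed)
    chosen = rng.sample(TRANSFORMS, k)
    rng.shuffle(chosen)
    for t in chosen:
        img = t(img, rng)
    return img
```

The defense's strength comes from the combinatorics: the attacker faces a different subset, ordering, and parameterization on every forward pass.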
Sentence Length
Title | Sentence Length |
Authors | Gábor Borbély, András Kornai |
Abstract | |
Tasks | |
Published | 2019-07-01 |
URL | https://www.aclweb.org/anthology/W19-5710/ |
https://www.aclweb.org/anthology/W19-5710 | |
PWC | https://paperswithcode.com/paper/sentence-length-1 |
Repo | |
Framework | |
VerbAtlas: a Novel Large-Scale Verbal Semantic Resource and Its Application to Semantic Role Labeling
Title | VerbAtlas: a Novel Large-Scale Verbal Semantic Resource and Its Application to Semantic Role Labeling |
Authors | Andrea Di Fabio, Simone Conia, Roberto Navigli |
Abstract | We present VerbAtlas, a new, hand-crafted lexical-semantic resource whose goal is to bring together all verbal synsets from WordNet into semantically-coherent frames. The frames define a common, prototypical argument structure while at the same time providing new concept-specific information. In contrast to PropBank, which defines enumerative semantic roles, VerbAtlas comes with an explicit, cross-frame set of semantic roles linked to selectional preferences expressed in terms of WordNet synsets, and is the first resource enriched with semantic information about implicit, shadow, and default arguments. We demonstrate the effectiveness of VerbAtlas in the task of dependency-based Semantic Role Labeling and show how its integration into a high-performance system leads to improvements on both the in-domain and out-of-domain test sets of CoNLL-2009. VerbAtlas is available at http://verbatlas.org. |
Tasks | Semantic Role Labeling |
Published | 2019-11-01 |
URL | https://www.aclweb.org/anthology/D19-1058/ |
https://www.aclweb.org/anthology/D19-1058 | |
PWC | https://paperswithcode.com/paper/verbatlas-a-novel-large-scale-verbal-semantic |
Repo | |
Framework | |
Accelerated Gravitational Point Set Alignment With Altered Physical Laws
Title | Accelerated Gravitational Point Set Alignment With Altered Physical Laws |
Authors | Vladislav Golyanik, Christian Theobalt, Didier Stricker |
Abstract | This work describes Barnes-Hut Rigid Gravitational Approach (BH-RGA) – a new rigid point set registration method relying on principles of particle dynamics. Interpreting the inputs as two interacting particle swarms, we directly minimise the gravitational potential energy of the system using non-linear least squares. Compared to solutions obtained by solving systems of second-order ordinary differential equations, our approach is more robust and less dependent on the parameter choice. We accelerate otherwise exhaustive particle interactions with a Barnes-Hut tree and efficiently handle massive point sets in quasilinear time while preserving the globally multiply-linked character of interactions. Among the advantages of BH-RGA is the possibility to define boundary conditions or additional alignment cues through varying point masses. Systematic experiments demonstrate that BH-RGA surpasses the performance of baseline methods in terms of the convergence basin and accuracy when handling incomplete, noisy and perturbed data. The proposed approach also compares favourably to the competing method for alignment with prior matches. |
Tasks | |
Published | 2019-10-01 |
URL | http://openaccess.thecvf.com/content_ICCV_2019/html/Golyanik_Accelerated_Gravitational_Point_Set_Alignment_With_Altered_Physical_Laws_ICCV_2019_paper.html |
http://openaccess.thecvf.com/content_ICCV_2019/papers/Golyanik_Accelerated_Gravitational_Point_Set_Alignment_With_Altered_Physical_Laws_ICCV_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/accelerated-gravitational-point-set-alignment |
Repo | |
Framework | |
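The objective the abstract describes — treat the two point sets as particle swarms and minimise their gravitational potential energy over a rigid transform — can be sketched in 2D as follows. This is an illustration only: it uses a generic local optimizer instead of the paper's Barnes-Hut-accelerated non-linear least squares, and the softening constant `eps` (to avoid the singularity at zero distance) is an assumption.

```python
import numpy as np
from scipy.optimize import minimize

def potential(params, X, Y, eps=0.1):
    """Gravitational potential energy between template X, moved by a 2D
    rigid transform (theta, tx, ty), and reference Y; unit point masses,
    distances softened by eps."""
    theta, tx, ty = params
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    moved = X @ R.T + np.array([tx, ty])
    d = np.linalg.norm(moved[:, None, :] - Y[None, :, :], axis=2)
    return -np.sum(1.0 / (d + eps))

def align(X, Y):
    # Lower energy = better overlap; a generic derivative-free optimizer
    # stands in for the paper's accelerated solver.
    return minimize(potential, x0=np.zeros(3), args=(X, Y),
                    method="Nelder-Mead").x
```

Because every point attracts every other point, the energy landscape stays informative even for incomplete or noisy data — the property the paper exploits for a wide convergence basin.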
From Monolingual to Multilingual FAQ Assistant using Multilingual Co-training
Title | From Monolingual to Multilingual FAQ Assistant using Multilingual Co-training |
Authors | Mayur Patidar, Surabhi Kumari, Manasi Patwardhan, Shirish Karande, Puneet Agarwal, Lovekesh Vig, Gautam Shroff |
Abstract | Recent research on cross-lingual transfer shows state-of-the-art results on benchmark datasets using pre-trained language representation models (PLRM) like BERT. These results are achieved with the traditional training approaches, such as Zero-shot with no data, Translate-train or Translate-test with machine translated data. In this work, we propose an approach of "Multilingual Co-training" (MCT) where we augment the expert annotated dataset in the source language (English) with the corresponding machine translations in the target languages (e.g. Arabic, Spanish) and fine-tune the PLRM jointly. We observe that the proposed approach provides consistent gains in the performance of BERT for multiple benchmark datasets (e.g. 1.0% gain on MLDocs, and 1.2% gain on XNLI over translate-train with BERT), while requiring a single model for multiple languages. We further consider a FAQ dataset where the available English test dataset is translated by experts into Arabic and Spanish. On such a dataset, we observe an average gain of 4.9% over all other cross-lingual transfer protocols with BERT. We further observe that domain-specific joint pre-training of the PLRM using HR policy documents in English along with the machine translations in the target languages, followed by the joint finetuning, provides a further improvement of 2.8% in average accuracy. |
Tasks | Cross-Lingual Transfer |
Published | 2019-11-01 |
URL | https://www.aclweb.org/anthology/D19-6113/ |
https://www.aclweb.org/anthology/D19-6113 | |
PWC | https://paperswithcode.com/paper/from-monolingual-to-multilingual-faq |
Repo | |
Framework | |
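The data-augmentation step behind Multilingual Co-training — keep the English annotations and add machine-translated copies in each target language, then fine-tune one model on the joint set — can be sketched as follows. The `translate` function and the language codes are hypothetical stand-ins; the actual work uses an MT system and fine-tunes BERT.

```python
def multilingual_cotrain_data(annotated_en, translate, target_langs=("ar", "es")):
    """Build a joint training set: each English (text, label) example is
    kept and also machine-translated into every target language with its
    label preserved, so a single model sees all languages during
    fine-tuning."""
    joint = []
    for text, label in annotated_en:
        joint.append((text, label, "en"))
        for lang in target_langs:
            joint.append((translate(text, lang), label, lang))
    return joint
```

The appeal over Translate-test is that translation cost is paid once at training time, and one fine-tuned model serves all languages.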
Learning to Link Grammar and Encyclopedic Information to Assist ESL Learners
Title | Learning to Link Grammar and Encyclopedic Information to Assist ESL Learners |
Authors | Jhih-Jie Chen, Chingyu Yang, Peichen Ho, Ming Chiao Tsai, Chia-Fang Ho, Kai-Wen Tuan, Chung-Ting Tsai, Wen-Bin Han, Jason Chang |
Abstract | We introduce a system aimed at improving and expanding second language learners' English vocabulary. In addition to word definitions, we provide rich lexical information such as collocations and grammar patterns for target words. We present Linggle Booster that takes an article, identifies target vocabulary, provides lexical information, and generates a quiz on target words. Linggle Booster also links named entities to their corresponding Wikipedia pages. Evaluation on a set of target words shows that the method has reasonably good performance in generating useful information for learning vocabulary. |
Tasks | |
Published | 2019-07-01 |
URL | https://www.aclweb.org/anthology/P19-3034/ |
https://www.aclweb.org/anthology/P19-3034 | |
PWC | https://paperswithcode.com/paper/learning-to-link-grammar-and-encyclopedic |
Repo | |
Framework | |
Discourse Analysis and Its Applications
Title | Discourse Analysis and Its Applications |
Authors | Shafiq Joty, Giuseppe Carenini, Raymond Ng, Gabriel Murray |
Abstract | Discourse processing is a suite of Natural Language Processing (NLP) tasks to uncover linguistic structures from texts at several levels, which can support many downstream applications. This involves identifying the topic structure, the coherence structure, the coreference structure, and the conversation structure for conversational discourse. Taken together, these structures can inform text summarization, machine translation, essay scoring, sentiment analysis, information extraction, question answering, and thread recovery. The tutorial starts with an overview of basic concepts in discourse analysis: monologue vs. conversation, synchronous vs. asynchronous conversation, and key linguistic structures in discourse analysis. We also give an overview of linguistic structures and corresponding discourse analysis tasks that discourse researchers are generally interested in, as well as key applications on which these discourse structures have an impact. |
Tasks | Machine Translation, Question Answering, Sentiment Analysis, Text Summarization |
Published | 2019-07-01 |
URL | https://www.aclweb.org/anthology/P19-4003/ |
https://www.aclweb.org/anthology/P19-4003 | |
PWC | https://paperswithcode.com/paper/discourse-analysis-and-its-applications |
Repo | |
Framework | |
Journalist-in-the-Loop: Continuous Learning as a Service for Rumour Analysis
Title | Journalist-in-the-Loop: Continuous Learning as a Service for Rumour Analysis |
Authors | Twin Karmakharm, Nikolaos Aletras, Kalina Bontcheva |
Abstract | Automatically identifying rumours in social media and assessing their veracity is an important task with downstream applications in journalism. A significant challenge is how to keep rumour analysis tools up-to-date as new information becomes available for particular rumours that spread in a social network. This paper presents a novel open-source web-based rumour analysis tool that can continuously learn from journalists. The system features a rumour annotation service that allows journalists to easily provide feedback for a given social media post through a web-based interface. The feedback allows the system to improve an underlying state-of-the-art neural network-based rumour classification model. The system can be easily integrated as a service into existing tools and platforms used by journalists using a REST API. |
Tasks | Rumour Detection |
Published | 2019-11-01 |
URL | https://www.aclweb.org/anthology/D19-3020/ |
https://www.aclweb.org/anthology/D19-3020 | |
PWC | https://paperswithcode.com/paper/journalist-in-the-loop-continuous-learning-as |
Repo | |
Framework | |
Unsupervised Learning of Consensus Maximization for 3D Vision Problems
Title | Unsupervised Learning of Consensus Maximization for 3D Vision Problems |
Authors | Thomas Probst, Danda Pani Paudel, Ajad Chhatkuli, Luc Van Gool |
Abstract | Consensus maximization is a key strategy in 3D vision for robust geometric model estimation from measurements with outliers. Generic methods for consensus maximization, such as Random Sampling and Consensus (RANSAC), have played a tremendous role in the success of 3D vision, in spite of the ubiquity of outliers. However, replicating the same generic behaviour in a deeply learned architecture, using supervised approaches, has proven to be difficult. In that context, unsupervised methods have a huge potential to adapt to any unseen data distribution, and therefore are highly desirable. In this paper, we propose for the first time an unsupervised learning framework for consensus maximization, in the context of solving 3D vision problems. For that purpose, we establish a relationship between inlier measurements, represented by an ideal of inlier set, and the subspace of polynomials representing the space of target transformations. Using this relationship, we derive a constraint that must be satisfied by the sought inlier set. This constraint can be tested without knowing the transformation parameters, therefore allows us to efficiently define the geometric model fitting cost. This model fitting cost is used as a supervisory signal for learning consensus maximization, where the learning process seeks for the largest measurement set that minimizes the proposed model fitting cost. Using our method, we solve a diverse set of 3D vision problems, including 3D-3D matching, non-rigid 3D shape matching with piece-wise rigidity and image-to-image matching. Despite being unsupervised, our method outperforms RANSAC in all three tasks for several datasets. |
Tasks | |
Published | 2019-06-01 |
URL | http://openaccess.thecvf.com/content_CVPR_2019/html/Probst_Unsupervised_Learning_of_Consensus_Maximization_for_3D_Vision_Problems_CVPR_2019_paper.html |
http://openaccess.thecvf.com/content_CVPR_2019/papers/Probst_Unsupervised_Learning_of_Consensus_Maximization_for_3D_Vision_Problems_CVPR_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-learning-of-consensus |
Repo | |
Framework | |
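For context, the classic baseline the paper measures against — RANSAC — can be sketched in a few lines: repeatedly fit a model to a minimal random sample and keep the model with the largest consensus (inlier) set, which is exactly the quantity the paper's unsupervised network learns to maximize directly. The 2D line-fitting setting and thresholds below are illustrative choices, not from the paper.

```python
import random

def ransac_line(points, iters=200, thresh=0.1, seed=0):
    """Classic RANSAC for 2D line fitting (y = a*x + b): sample two
    points, fit the line through them, count inliers within thresh,
    and keep the model with the largest consensus set."""
    rng = random.Random(seed)
    best_inliers, best_model = [], None
    for _ in range(iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:  # skip degenerate (vertical) samples
            continue
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        inliers = [(x, y) for x, y in points if abs(y - (a * x + b)) < thresh]
        if len(inliers) > len(best_inliers):
            best_inliers, best_model = inliers, (a, b)
    return best_model, best_inliers
```

RANSAC needs the model fit inside the loop; the paper's contribution is a learned, differentiable stand-in for this consensus objective that works without transformation parameters.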
Unsupervised Bilingual Word Embedding Agreement for Unsupervised Neural Machine Translation
Title | Unsupervised Bilingual Word Embedding Agreement for Unsupervised Neural Machine Translation |
Authors | Haipeng Sun, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita, Tiejun Zhao |
Abstract | Unsupervised bilingual word embedding (UBWE), together with other technologies such as back-translation and denoising, has helped unsupervised neural machine translation (UNMT) achieve remarkable results in several language pairs. In previous methods, UBWE is first trained using non-parallel monolingual corpora and then this pre-trained UBWE is used to initialize the word embedding in the encoder and decoder of UNMT. That is, the training of UBWE and UNMT are separate. In this paper, we first empirically investigate the relationship between UBWE and UNMT. The empirical findings show that the performance of UNMT is significantly affected by the performance of UBWE. Thus, we propose two methods that train UNMT with UBWE agreement. Empirical results on several language pairs show that the proposed methods significantly outperform conventional UNMT. |
Tasks | Denoising, Machine Translation |
Published | 2019-07-01 |
URL | https://www.aclweb.org/anthology/P19-1119/ |
https://www.aclweb.org/anthology/P19-1119 | |
PWC | https://paperswithcode.com/paper/unsupervised-bilingual-word-embedding |
Repo | |
Framework | |
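As background for the UBWE component the abstract discusses: a standard building block in unsupervised bilingual word embedding pipelines is the closed-form orthogonal (Procrustes) mapping between the two embedding spaces. The sketch below shows that step only — it is not the paper's joint UBWE-UNMT training, and the toy matrices are invented.

```python
import numpy as np

def procrustes_mapping(X, Y):
    """Closed-form orthogonal W minimizing ||X @ W - Y||_F, where rows of
    X and Y are source- and target-language word embeddings assumed to be
    in correspondence. The solution is W = U @ Vt from the SVD of X^T Y."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt
```

Constraining W to be orthogonal preserves distances within the source space, which is why this refinement step is robust even with noisy correspondences; the paper's point is that keeping such a mapping in agreement with UNMT training, rather than freezing it after pre-training, improves translation.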
Learning Outcomes and Their Relatedness in a Medical Curriculum
Title | Learning Outcomes and Their Relatedness in a Medical Curriculum |
Authors | Sneha Mondal, Tejas Dhamecha, Shantanu Godbole, Smriti Pathak, Red Mendoza, K Gayathri Wijayarathna, Nabil Zary, Swarnadeep Saha, Malolan Chetlur |
Abstract | A typical medical curriculum is organized in a hierarchy of instructional objectives called Learning Outcomes (LOs); a few thousand LOs span five years of study. Gaining a thorough understanding of the curriculum requires learners to recognize and apply related LOs across years, and across different parts of the curriculum. However, given the large scope of the curriculum, manually labeling related LOs is tedious, and almost impossible to scale. In this paper, we build a system that learns relationships between LOs, and we achieve up to human-level performance in the LO relationship extraction task. We then present an application where the proposed system is employed to build a map of related LOs and Learning Resources (LRs) pertaining to a virtual patient case. We believe that our system can help medical students grasp the curriculum better, within classroom as well as in Intelligent Tutoring Systems (ITS) settings. |
Tasks | |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-4442/ |
https://www.aclweb.org/anthology/W19-4442 | |
PWC | https://paperswithcode.com/paper/learning-outcomes-and-their-relatedness-in-a |
Repo | |
Framework | |
Weak contraction mapping and optimization
Title | Weak contraction mapping and optimization |
Authors | Siwei Luo |
Abstract | The weak contraction mapping is a self-mapping whose range is always a subset of the domain and which admits a unique fixed point. The iteration of a weak contraction mapping is a Cauchy sequence that yields the unique fixed point. A gradient-free optimization method is proposed as an application of weak contraction mapping to achieve global minimum convergence. The optimization method is robust to local minima and to the position of the initial point. |
Tasks | |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=SygJSiA5YQ |
https://openreview.net/pdf?id=SygJSiA5YQ | |
PWC | https://paperswithcode.com/paper/weak-contraction-mapping-and-optimization |
Repo | |
Framework | |
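The fixed-point iteration underlying the abstract — iterate the mapping and let the Cauchy sequence converge to the unique fixed point, as in the Banach fixed-point theorem that weak contraction mappings generalize — can be illustrated with an ordinary contraction. The example below uses cos on [0, 1] (a contraction, since |sin x| ≤ sin 1 < 1 there); it demonstrates the iteration scheme, not the paper's optimization method.

```python
import math

def fixed_point(f, x0, tol=1e-10, max_iter=1000):
    """Iterate x_{n+1} = f(x_n) until successive iterates are within tol.
    For a contraction this sequence is Cauchy and converges to the unique
    fixed point regardless of the starting point x0."""
    x = x0
    for _ in range(max_iter):
        x_next = f(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x

# The fixed point of cos is the Dottie number, approx. 0.7390851332.
```

The paper's optimizer builds on the same mechanism: design a mapping whose unique fixed point is the global minimizer, so the iteration itself performs gradient-free minimization.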