October 16, 2019

Paper Group NAWR 1

Adaptive Multi-Task Transfer Learning for Chinese Word Segmentation in Medical Text. Varying image description tasks: spoken versus written descriptions. Creating a Translation Matrix of the Bible’s Names Across 591 Languages. Encoding Sentiment Information into Word Vectors for Sentiment Analysis. Named Entity Recognition With Parallel Recurrent N …

Adaptive Multi-Task Transfer Learning for Chinese Word Segmentation in Medical Text


Title	Adaptive Multi-Task Transfer Learning for Chinese Word Segmentation in Medical Text
Authors	Junjie Xing, Kenny Zhu, Shaodian Zhang
Abstract	Chinese word segmentation (CWS) trained on open-source corpora suffers a dramatic performance drop when dealing with domain text, especially for a domain with many special terms and diverse writing styles, such as the biomedical domain. However, building a domain-specific CWS requires extremely high annotation cost. In this paper, we propose an approach that transfers domain-invariant knowledge from high-resource to low-resource domains. Extensive experiments show that our model achieves consistently higher accuracy than the single-task CWS and other transfer learning baselines, especially when there is a large disparity between source and target domains.
Tasks	Chinese Word Segmentation, Transfer Learning
Published	2018-08-01
URL	https://www.aclweb.org/anthology/C18-1307/
PDF	https://www.aclweb.org/anthology/C18-1307
PWC	https://paperswithcode.com/paper/adaptive-multi-task-transfer-learning-for
Repo	https://github.com/adapt-sjtu/AMTTL
Framework	tf

Varying image description tasks: spoken versus written descriptions


Title	Varying image description tasks: spoken versus written descriptions
Authors	Emiel van Miltenburg, Ruud Koolen, Emiel Krahmer
Abstract	Automatic image description systems are commonly trained and evaluated on written image descriptions. At the same time, these systems are often used to provide spoken descriptions (e.g. for visually impaired users) through apps like TapTapSee or Seeing AI. This is not a problem, as long as spoken and written descriptions are very similar. However, linguistic research suggests that spoken language often differs from written language. These differences are not regular, and vary from context to context. Therefore, this paper investigates whether there are differences between written and spoken image descriptions, even if they are elicited through similar tasks. We compare descriptions produced in two languages (English and Dutch), and in both languages observe substantial differences between spoken and written descriptions. Future research should see if users prefer the spoken over the written style and, if so, aim to emulate spoken descriptions.
Tasks
Published	2018-08-01
URL	https://www.aclweb.org/anthology/W18-3910/
PDF	https://www.aclweb.org/anthology/W18-3910
PWC	https://paperswithcode.com/paper/varying-image-description-tasks-spoken-versus
Repo	https://github.com/cltl/Spoken-versus-Written
Framework	none

Creating a Translation Matrix of the Bible’s Names Across 591 Languages


Title	Creating a Translation Matrix of the Bible’s Names Across 591 Languages
Authors	Winston Wu, Nidhi Vyas, David Yarowsky
Abstract
Tasks	Entity Alignment, Machine Translation, Morphological Analysis, Part-Of-Speech Tagging, Transliteration, Word Alignment
Published	2018-05-01
URL	https://www.aclweb.org/anthology/L18-1263/
PDF	https://www.aclweb.org/anthology/L18-1263
PWC	https://paperswithcode.com/paper/creating-a-translation-matrix-of-the-bibleas
Repo	https://github.com/wswu/trabina
Framework	none

Encoding Sentiment Information into Word Vectors for Sentiment Analysis


Title	Encoding Sentiment Information into Word Vectors for Sentiment Analysis
Authors	Zhe Ye, Fang Li, Timothy Baldwin
Abstract	General-purpose pre-trained word embeddings have become a mainstay of natural language processing, and more recently, methods have been proposed to encode external knowledge into word embeddings to benefit specific downstream tasks. The goal of this paper is to encode sentiment knowledge into pre-trained word vectors to improve the performance of sentiment analysis. Our proposed method is based on a convolutional neural network (CNN) and an external sentiment lexicon. Experiments on four popular sentiment analysis datasets show that this method improves the accuracy of sentiment analysis compared to a number of benchmark methods.
Tasks	Learning Word Embeddings, Sentiment Analysis, Word Embeddings
Published	2018-08-01
URL	https://www.aclweb.org/anthology/C18-1085/
PDF	https://www.aclweb.org/anthology/C18-1085
PWC	https://paperswithcode.com/paper/encoding-sentiment-information-into-word
Repo	https://github.com/yezhejack/SentiNet
Framework	pytorch

Named Entity Recognition With Parallel Recurrent Neural Networks


Title	Named Entity Recognition With Parallel Recurrent Neural Networks
Authors	Andrej Žukov-Gregorič, Yoram Bachrach, Sam Coope
Abstract	We present a new architecture for named entity recognition. Our model employs multiple independent bidirectional LSTM units across the same input and promotes diversity among them by employing an inter-model regularization term. By distributing computation across multiple smaller LSTMs we find a significant reduction in the total number of parameters. We find our architecture achieves state-of-the-art performance on the CoNLL 2003 NER dataset.
Tasks	Feature Engineering, Named Entity Recognition, Word Embeddings
Published	2018-07-01
URL	https://www.aclweb.org/anthology/P18-2012/
PDF	https://www.aclweb.org/anthology/P18-2012
PWC	https://paperswithcode.com/paper/named-entity-recognition-with-parallel
Repo	https://github.com/speedcell4/UOI-P18-2012
Framework	none

Frame- and Entity-Based Knowledge for Common-Sense Argumentative Reasoning


Title	Frame- and Entity-Based Knowledge for Common-Sense Argumentative Reasoning
Authors	Teresa Botschen, Daniil Sorokin, Iryna Gurevych
Abstract	Common-sense argumentative reasoning is a challenging task that requires holistic understanding of the argumentation where external knowledge about the world is hypothesized to play a key role. We explore the idea of using event knowledge about prototypical situations from FrameNet and fact knowledge about concrete entities from Wikidata to solve the task. We find that both resources can contribute to an improvement over the non-enriched approach and point out two persisting challenges: first, integration of many annotations of the same type, and second, fusion of complementary annotations. After our explorations, we question the key role of external world knowledge with respect to the argumentative reasoning task and rather point towards a logic-based analysis of the chain of reasoning.
Tasks	Argument Mining, Common Sense Reasoning, Dependency Parsing, Natural Language Inference, Question Answering, Relation Classification, Semantic Parsing, Semantic Role Labeling, Word Embeddings
Published	2018-11-01
URL	https://www.aclweb.org/anthology/W18-5211/
PDF	https://www.aclweb.org/anthology/W18-5211
PWC	https://paperswithcode.com/paper/frame-and-entity-based-knowledge-for-common
Repo	https://github.com/UKPLab/emnlp2018-argmin-commonsense-knowledge
Framework	none

CATS: A Tool for Customized Alignment of Text Simplification Corpora


Title	CATS: A Tool for Customized Alignment of Text Simplification Corpora
Authors	Sanja Štajner, Marc Franco-Salvador, Paolo Rosso, Simone Paolo Ponzetto
Abstract
Tasks	Text Simplification
Published	2018-05-01
URL	https://www.aclweb.org/anthology/L18-1615/
PDF	https://www.aclweb.org/anthology/L18-1615
PWC	https://paperswithcode.com/paper/cats-a-tool-for-customized-alignment-of-text
Repo	https://github.com/neosyon/SimpTextAlign
Framework	tf

ASTER: An Attentional Scene Text Recognizer with Flexible Rectification


Title	ASTER: An Attentional Scene Text Recognizer with Flexible Rectification
Authors	Baoguang Shi, Mingkun Yang, Xinggang Wang, Pengyuan Lyu, Cong Yao, Xiang Bai
Abstract	Scene text recognition has attracted great interest from academia and industry in recent years owing to its importance in a wide range of applications. Despite the maturity of Optical Character Recognition (OCR) systems dedicated to document text, scene text recognition remains a challenging problem. The large variations in background, appearance, and layout pose significant challenges, which the traditional OCR methods cannot handle effectively. Recent advances in scene text recognition are driven by the success of deep learning-based recognition models. Among them are methods that recognize text by characters using convolutional neural networks (CNN), methods that classify words with CNNs [24], [26], and methods that recognize character sequences using a combination of a CNN and a recurrent neural network (RNN) [54]. In spite of their success, these methods do not explicitly address the problem of irregular text, which is text that is not horizontal and frontal, has curved layout, etc. Instances of irregular text frequently appear in natural scenes. As exemplified in Figure 1, typical cases include oriented text, perspective text [49], and curved text. Designed without the invariance to such irregularities, previous methods often struggle in recognizing such text instances.
Tasks	Optical Character Recognition, Scene Text Recognition
Published	2018-06-25
URL	http://122.205.5.5:8071/UpLoadFiles/Papers/ASTER_PAMI18.pdf
PDF	http://122.205.5.5:8071/UpLoadFiles/Papers/ASTER_PAMI18.pdf
PWC	https://paperswithcode.com/paper/aster-an-attentional-scene-text-recognizer
Repo	https://github.com/bgshih/aster
Framework	tf

Ranking-Based Automatic Seed Selection and Noise Reduction for Weakly Supervised Relation Extraction


Title	Ranking-Based Automatic Seed Selection and Noise Reduction for Weakly Supervised Relation Extraction
Authors	Van-Thuy Phi, Joan Santoso, Masashi Shimbo, Yuji Matsumoto
Abstract	This paper addresses the tasks of automatic seed selection for bootstrapping relation extraction, and noise reduction for distantly supervised relation extraction. We first point out that these tasks are related. Then, inspired by ranking relation instances and patterns computed by the HITS algorithm, and selecting cluster centroids using the K-means, LSA, or NMF method, we propose methods for selecting the initial seeds from an existing resource, or reducing the level of noise in the distantly labeled data. Experiments show that our proposed methods achieve a better performance than the baseline systems in both tasks.
Tasks	Relation Extraction, Word Sense Disambiguation
Published	2018-07-01
URL	https://www.aclweb.org/anthology/P18-2015/
PDF	https://www.aclweb.org/anthology/P18-2015
PWC	https://paperswithcode.com/paper/ranking-based-automatic-seed-selection-and
Repo	https://github.com/pvthuy/part-whole-relations
Framework	none
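
The abstract above mentions ranking relation instances and patterns with the HITS algorithm. As a rough illustration of that idea, here is a minimal sketch of HITS on a bipartite instance-pattern graph in plain Python. It is not the authors' implementation: the function name `hits`, the toy co-occurrence data, and the choice to take the top-ranked instances as seed candidates are all assumptions made for this example.

```python
from math import sqrt


def hits(edges, iterations=50):
    """Run HITS on a bipartite instance-pattern graph.

    edges: iterable of (instance, pattern) pairs, meaning the instance was
    observed with the pattern. Returns (instance_scores, pattern_scores).
    """
    edges = list(edges)
    instances = {i for i, _ in edges}
    patterns = {p for _, p in edges}
    inst_score = {i: 1.0 for i in instances}

    for _ in range(iterations):
        # A pattern scores highly if it co-occurs with high-scoring instances.
        new_patt = {p: 0.0 for p in patterns}
        for i, p in edges:
            new_patt[p] += inst_score[i]
        # An instance scores highly if it co-occurs with high-scoring patterns.
        new_inst = {i: 0.0 for i in instances}
        for i, p in edges:
            new_inst[i] += new_patt[p]
        # L2-normalise so the scores do not grow without bound.
        patt_norm = sqrt(sum(v * v for v in new_patt.values())) or 1.0
        inst_norm = sqrt(sum(v * v for v in new_inst.values())) or 1.0
        patt_score = {p: v / patt_norm for p, v in new_patt.items()}
        inst_score = {i: v / inst_norm for i, v in new_inst.items()}
    return inst_score, patt_score


# Hypothetical toy data: (entity pair, textual pattern) co-occurrences.
toy_edges = [
    (("wheel", "car"), "X is part of Y"),
    (("wheel", "car"), "Y has a X"),
    (("engine", "car"), "X is part of Y"),
    (("engine", "car"), "Y has a X"),
    (("page", "book"), "X is part of Y"),
    (("dog", "park"), "X ran in Y"),
]
instance_scores, pattern_scores = hits(toy_edges)
# Highest-scoring instances first; these would be the seed candidates.
ranked_seeds = sorted(instance_scores, key=instance_scores.get, reverse=True)
```

In this toy graph the pair that only occurs with an unrelated pattern ends up at the bottom of the ranking, which is the behaviour one would want from a seed selector or a noise filter.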

Using Formulaic Expressions in Writing Assistance Systems


Title	Using Formulaic Expressions in Writing Assistance Systems
Authors	Kenichi Iwatsuki, Akiko Aizawa
Abstract	Formulaic expressions (FEs) used in scholarly papers, such as 'there has been little discussion about', are helpful for non-native English speakers. However, it is time-consuming for users to manually search for an appropriate expression every time they want to consult FE dictionaries. For this reason, we tackle the task of semantic searches of FE dictionaries. At the start of our research, we identified two salient difficulties in this task. First, the paucity of example sentences in existing FE dictionaries results in a shortage of context information, which is necessary for acquiring semantic representation of FEs. Second, while a semantic category label is assigned to each FE in many FE dictionaries, it is difficult to predict the labels from user input, forcing users to manually designate the semantic category when searching. To address these difficulties, we propose a new framework for semantic searches of FEs and propose a new method to leverage both existing dictionaries and domain sentence corpora. Further, we expand an existing FE dictionary to consider building a more comprehensive and domain-specific FE dictionary and to verify the effectiveness of our method.
Tasks
Published	2018-08-01
URL	https://www.aclweb.org/anthology/C18-1227/
PDF	https://www.aclweb.org/anthology/C18-1227
PWC	https://paperswithcode.com/paper/using-formulaic-expressions-in-writing
Repo	https://github.com/Alab-NII/FE
Framework	none

Differentiable Monte Carlo Ray Tracing through Edge Sampling


Title	Differentiable Monte Carlo Ray Tracing through Edge Sampling
Authors	Tzu-Mao Li, Miika Aittala, Frédo Durand, Jaakko Lehtinen
Abstract	Gradient-based methods are becoming increasingly important for computer graphics, machine learning, and computer vision. The ability to compute gradients is crucial to optimization, inverse problems, and deep learning. In rendering, the gradient is required with respect to variables such as camera parameters, light sources, scene geometry, or material appearance. However, computing the gradient of rendering is challenging because the rendering integral includes visibility terms that are not differentiable. Previous work on differentiable rendering has focused on approximate solutions. They often do not handle secondary effects such as shadows or global illumination, or they do not provide the gradient with respect to variables other than pixel coordinates. We introduce a general-purpose differentiable ray tracer, which, to our knowledge, is the first comprehensive solution that is able to compute derivatives of scalar functions over a rendered image with respect to arbitrary scene parameters such as camera pose, scene geometry, materials, and lighting parameters. The key to our method is a novel edge sampling algorithm that directly samples the Dirac delta functions introduced by the derivatives of the discontinuous integrand. We also develop efficient importance sampling methods based on spatial hierarchies. Our method can generate gradients in times running from seconds to minutes depending on scene complexity and desired precision. We interface our differentiable ray tracer with the deep learning library PyTorch and show prototype applications in inverse rendering and the generation of adversarial examples for neural networks.
Tasks
Published	2018-08-12
URL	https://people.csail.mit.edu/tzumao/diffrt/
PDF	https://people.csail.mit.edu/tzumao/diffrt/diffrt.pdf
PWC	https://paperswithcode.com/paper/differentiable-monte-carlo-ray-tracing
Repo	https://github.com/BachiLi/redner
Framework	pytorch
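
The abstract above notes that visibility terms make the rendering integral non-differentiable and that the method samples the discontinuities directly. The following is only a toy 1D analogue of that problem, not the paper's edge sampling algorithm: the integrand `f`, the step at `theta`, and all function names are assumptions chosen for illustration. It contrasts differentiating a sampled estimator, which misses the moving discontinuity, with evaluating the integrand on both sides of the edge.

```python
def f(x):
    return x * x  # smooth part of the integrand


def estimate_integral(theta, n=1000):
    # Midpoint-rule estimate of the integral of f(x) * [x < theta] on [0, 1].
    xs = ((k + 0.5) / n for k in range(n))
    return sum(f(x) for x in xs if x < theta) / n


def naive_gradient(theta, eps=1e-6):
    # Finite difference of the sampled estimator. The tiny shift in theta
    # moves the step across no sample point, so the result is zero and the
    # contribution of the discontinuity is lost.
    return (estimate_integral(theta + eps) - estimate_integral(theta)) / eps


def edge_gradient(theta):
    # Evaluate the integrand just inside and just outside the edge at
    # x = theta and take the difference: f(theta) * 1 - f(theta) * 0.
    inside = f(theta) * 1.0
    outside = f(theta) * 0.0
    return inside - outside


theta = 0.3
exact = theta ** 2  # derivative of theta**3 / 3 with respect to theta
naive = naive_gradient(theta)
edge = edge_gradient(theta)
```

Here `naive` is zero while `edge` matches the analytic derivative. In two dimensions the discontinuity is a set of edges rather than a single point, which is why sampling along them becomes necessary.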

Porter 5: fast, state-of-the-art ab initio prediction of protein secondary structure in 3 and 8 classes


Title	Porter 5: fast, state-of-the-art ab initio prediction of protein secondary structure in 3 and 8 classes
Authors	Mirko Torrisi, Manaz Kaleel, Gianluca Pollastri
Abstract	Motivation: Although secondary structure predictors have been developed for decades, current ab initio methods have still some way to go to reach their theoretical limits. Moreover, the continuous effort towards harnessing ever-expanding data sets and more sophisticated, deeper Machine Learning techniques, has not come to an end. Results: Here we present Porter 5, the latest release of one of the best performing ab initio secondary structure predictors. Version 5 achieves 84% accuracy (84% SOV) when tested on 3 classes, and 73% accuracy (77% SOV) on 8 classes, on a large independent set, significantly outperforming all the most recent ab initio predictors we have tested. Availability: The web and standalone versions of Porter5 are available at http://distilldeep.ucd.ie/porter/.
Tasks	Protein Secondary Structure Prediction
Published	2018-10-05
URL	https://doi.org/10.1101/289033
PDF	https://www.biorxiv.org/content/early/2018/10/05/289033.full.pdf
PWC	https://paperswithcode.com/paper/porter-5-fast-state-of-the-art-ab-initio
Repo	https://github.com/mircare/Porter5
Framework	none
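
The abstract above reports accuracy on 3 and on 8 secondary structure classes. As a small sketch of how per-residue accuracy relates across the two label sets, the code below reduces 8-class labels to 3 classes and scores a prediction at both levels. The reduction table is a commonly used convention and an assumption here, since the abstract does not say which mapping Porter 5 uses; the sequences are made up, and SOV is not computed.

```python
# Assumed reduction from 8 DSSP classes to 3 (a common convention; the
# abstract does not state which mapping Porter 5 uses).
DSSP_8_TO_3 = {
    "H": "H", "G": "H", "I": "H",   # helices
    "E": "E", "B": "E",             # strands
    "T": "C", "S": "C", "C": "C",   # everything else is treated as coil
}


def reduce_to_3_classes(labels):
    """Map a string of 8-class labels to the 3-class alphabet."""
    return "".join(DSSP_8_TO_3[ch] for ch in labels)


def per_residue_accuracy(predicted, observed):
    """Fraction of residues whose predicted label equals the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Hypothetical 15-residue example.
observed_8 = "CHHHHGGTEEEEBSC"
predicted_8 = "CHHHHHHTEEEECSC"
q8 = per_residue_accuracy(predicted_8, observed_8)
q3 = per_residue_accuracy(reduce_to_3_classes(predicted_8),
                          reduce_to_3_classes(observed_8))
```

Confusing `G` with `H` counts as an error at the 8-class level but not after reduction, which is one reason 3-class accuracy is typically higher than 8-class accuracy for the same predictor.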

SUNNYNLP at SemEval-2018 Task 10: A Support-Vector-Machine-Based Method for Detecting Semantic Difference using Taxonomy and Word Embedding Features


Title	SUNNYNLP at SemEval-2018 Task 10: A Support-Vector-Machine-Based Method for Detecting Semantic Difference using Taxonomy and Word Embedding Features
Authors	Sunny Lai, Kwong Sak Leung, Yee Leung
Abstract	We present SUNNYNLP, our system for solving SemEval 2018 Task 10: "Capturing Discriminative Attributes". Our Support-Vector-Machine (SVM)-based system combines features extracted from pre-trained embeddings and statistical information from an Is-A taxonomy to detect semantic differences between concept pairs. Our system is demonstrated to be effective in detecting semantic difference and is ranked 1st in the competition in terms of F1 measure. Our code is released as open source under the name SUNNYNLP.
Tasks	Dialogue State Tracking, Question Answering, Semantic Textual Similarity
Published	2018-06-01
URL	https://www.aclweb.org/anthology/S18-1118/
PDF	https://www.aclweb.org/anthology/S18-1118
PWC	https://paperswithcode.com/paper/sunnynlp-at-semeval-2018-task-10-a-support
Repo	https://github.com/Yermouth/sunnynlp
Framework	none

Extremely Randomized CNets for Multi-label Classification


Title	Extremely Randomized CNets for Multi-label Classification
Authors	Teresa M.A. Basile, Nicola Di Mauro, Floriana Esposito
Abstract	Multi-label classification (MLC) is a challenging task in machine learning consisting in the prediction of multiple labels associated with a single instance. Promising approaches for MLC are those able to capture label dependencies by learning a single probabilistic model, unlike other competitive approaches that require learning many models. The model is then exploited to compute the most probable label configuration given the observed attributes. Cutset Networks (CNets) are density estimators leveraging context-specific independencies providing exact inference in polynomial time. The recently introduced Extremely Randomized CNets (XCNets) reduce the structure learning complexity, making it possible to learn ensembles of XCNets that outperform state-of-the-art density estimators. In this paper we employ XCNets for MLC by exploiting efficient Most Probable Explanations (MPE). An experimental evaluation on real-world datasets shows how the proposed approach is competitive w.r.t. other sophisticated methods for MLC.
Tasks	Density Estimation, Multi-Label Classification
Published	2018-10-01
URL	https://link.springer.com/chapter/10.1007/978-3-030-03840-3_25
PDF	http://www.di.uniba.it/~ndm/pubs/basile18aixia.pdf
PWC	https://paperswithcode.com/paper/extremely-randomized-cnets-for-multi-label
Repo	https://github.com/nicoladimauro/mlxcnet
Framework	none

Large Scale Image Segmentation with Structured Loss based Deep Learning for Connectome Reconstruction


Title	Large Scale Image Segmentation with Structured Loss based Deep Learning for Connectome Reconstruction
Authors	Jan Funke, Fabian David Tschopp, William Grisaitis, Arlo Sheridan, Chandan Singh, Stephan Saalfeld, Srinivas C. Turaga
Abstract	We present a method combining affinity prediction with region agglomeration, which improves significantly upon the state of the art of neuron segmentation from electron microscopy (EM) in accuracy and scalability. Our method consists of a 3D U-net, trained to predict affinities between voxels, followed by iterative region agglomeration. We train using a structured loss based on MALIS, encouraging topologically correct segmentations obtained from affinity thresholding. Our extension consists of two parts: First, we present a quasi-linear method to compute the loss gradient, improving over the original quadratic algorithm. Second, we compute the gradient in two separate passes to avoid spurious gradient contributions in early training stages. Our predictions are accurate enough that simple learning-free percentile-based agglomeration outperforms more involved methods used earlier on inferior predictions. We present results on three diverse EM datasets, achieving relative improvements over previous results of 27%, 15%, and 250%. Our findings suggest that a single method can be applied to both nearly isotropic block-face EM data and anisotropic serial sectioned EM data. The runtime of our method scales linearly with the size of the volume and achieves a throughput of ~2.6 seconds per megavoxel, qualifying our method for the processing of very large datasets.
Tasks	Brain Image Segmentation, Semantic Segmentation
Published	2018-05-24
URL	https://ieeexplore.ieee.org/abstract/document/8364622/authors#authors
PDF	https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8364622
PWC	https://paperswithcode.com/paper/large-scale-image-segmentation-with
Repo	https://github.com/funkey/mala
Framework	tf
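
The abstract above refers to segmentations obtained from affinity thresholding. A minimal sketch of that single step, under simplifying assumptions, is shown below: neighbouring pixels whose predicted affinity is at or above a threshold are merged with a union-find structure, and the connected components become segments. It uses a 2D grid and a plain dictionary of affinities for brevity, the names are hypothetical, and it covers neither the 3D U-net nor the iterative region agglomeration described in the paper.

```python
def segment_by_affinity(shape, affinities, threshold):
    """Label a 2D grid by merging neighbours whose affinity >= threshold.

    shape: (rows, cols).
    affinities: dict mapping ((r1, c1), (r2, c2)) neighbour pairs to a
    value in [0, 1]. Returns a dict mapping each pixel to a segment id.
    """
    rows, cols = shape
    parent = {(r, c): (r, c) for r in range(rows) for c in range(cols)}

    def find(x):
        # Follow parent links to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), affinity in affinities.items():
        if affinity >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b

    # Assign consecutive ids to the roots in the order they are first seen.
    roots = {}
    return {p: roots.setdefault(find(p), len(roots)) for p in parent}


# Hypothetical 1 x 4 strip with a weak link in the middle.
toy_affinities = {
    ((0, 0), (0, 1)): 0.9,
    ((0, 1), (0, 2)): 0.2,
    ((0, 2), (0, 3)): 0.8,
}
labels = segment_by_affinity((1, 4), toy_affinities, threshold=0.5)
merged = segment_by_affinity((1, 4), toy_affinities, threshold=0.1)
```

With a threshold of 0.5 the strip splits into two segments at the weak link, while a threshold of 0.1 merges everything into one, which illustrates how a single wrong affinity can change the topology of the result.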