January 24, 2020

2864 words 14 mins read

Paper Group NANR 124

EU 4 U: An educational platform for the cultural heritage of the EU

Title EU 4 U: An educational platform for the cultural heritage of the EU
Authors Maria Stambolieva
Abstract The paper presents an ongoing project of the NBU Laboratory for Language Technology aiming to create a multilingual, CEFR-graded electronic didactic resource for online learning, centered on the history and cultural heritage of the EU (e-EULearn). The resource is developed within the e-Platform of the NBU Laboratory for Language Technology and re-uses the rich corpus of educational material created at the Laboratory for the needs of NBU program modules, distance and blended learning language courses and other projects. As the focus is not just on foreign language tuition but above all on people, places and events in the history and culture of the EU member states, the annotation modules of the e-Platform have been extended accordingly. Current and upcoming activities are directed at: 1/ enriching the English corpus of didactic materials on EU history and culture; 2/ translating the texts into the other official EU languages and aligning the translations with the English texts; 3/ developing new test modules. In the process of developing this resource, a database of important people, places, objects and events in the cultural history of the EU will be created.
Tasks
Published 2019-09-01
URL https://www.aclweb.org/anthology/W19-9006/
PDF https://www.aclweb.org/anthology/W19-9006
PWC https://paperswithcode.com/paper/eu-4-u-an-educational-platform-for-the
Repo
Framework

Supervised neural machine translation based on data augmentation and improved training & inference process

Title Supervised neural machine translation based on data augmentation and improved training & inference process
Authors Yixuan Tong, Liang Liang, Boyan Liu, Shanshan Jiang, Bin Dong
Abstract This is the second time SRCB has participated in WAT. This paper describes our neural machine translation systems for the shared translation tasks of WAT 2019. We participated in the ASPEC tasks and submitted results on four language pairs: English-Japanese, Japanese-English, Chinese-Japanese, and Japanese-Chinese. We employed the Transformer model as the baseline and experimented with relative position representation, data augmentation, deeper models, and ensembling. Experiments show that all these methods yield substantial improvements.
Tasks Data Augmentation, Machine Translation
Published 2019-11-01
URL https://www.aclweb.org/anthology/D19-5218/
PDF https://www.aclweb.org/anthology/D19-5218
PWC https://paperswithcode.com/paper/supervised-neural-machine-translation-based
Repo
Framework
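
The abstract above lists ensembling among the techniques that improved translation quality. As a rough illustration of what ensembling means at decode time (not the authors' actual system, and with a purely hypothetical model interface), one can average the next-token distributions of several trained models before choosing the next word:

import torch

def ensemble_next_token_logprobs(models, src, prefix):
    # Each model is assumed to expose next_token_logits(src, prefix) returning a
    # [vocab_size] tensor of unnormalised scores; this interface is illustrative only.
    probs = torch.stack([
        torch.softmax(m.next_token_logits(src, prefix), dim=-1) for m in models
    ])
    # Average the probability distributions, then return to log space for beam search.
    return torch.log(probs.mean(dim=0))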

Feature Transfer Learning for Face Recognition With Under-Represented Data

Title Feature Transfer Learning for Face Recognition With Under-Represented Data
Authors Xi Yin, Xiang Yu, Kihyuk Sohn, Xiaoming Liu, Manmohan Chandraker
Abstract Despite the large volume of face recognition datasets, there is a significant portion of subjects whose samples are insufficient and thus under-represented. Ignoring this portion results in insufficient training data, and training with under-represented data leads to biased classifiers in conventionally trained deep networks. In this paper, we propose a center-based feature transfer framework to augment the feature space of under-represented subjects from the regular subjects that have sufficiently diverse samples. A Gaussian prior on the variance is assumed across all subjects, and the variance from regular subjects is transferred to the under-represented ones. This encourages the under-represented distribution to be closer to the regular distribution. Further, an alternating training regimen is proposed to simultaneously achieve less biased classifiers and a more discriminative feature representation. We conduct an ablative study that mimics under-represented datasets by varying the proportion of under-represented classes on the MS-Celeb-1M dataset. Advantageous results on LFW, IJB-A and MS-Celeb-1M demonstrate the effectiveness of our feature transfer and training strategy compared to both general baselines and state-of-the-art methods. Moreover, our feature transfer supports smooth visual interpolation, disentangling the identity of a class from non-identity variations such as pose and lighting while augmenting its feature space.
Tasks Face Recognition, Transfer Learning
Published 2019-06-01
URL http://openaccess.thecvf.com/content_CVPR_2019/html/Yin_Feature_Transfer_Learning_for_Face_Recognition_With_Under-Represented_Data_CVPR_2019_paper.html
PDF http://openaccess.thecvf.com/content_CVPR_2019/papers/Yin_Feature_Transfer_Learning_for_Face_Recognition_With_Under-Represented_Data_CVPR_2019_paper.pdf
PWC https://paperswithcode.com/paper/feature-transfer-learning-for-face
Repo
Framework
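
As a minimal sketch of the center-based feature transfer idea described above (a simplification, not the authors' released code), one can borrow intra-class offsets from a regular class and re-apply them around the center of an under-represented class, assuming the classes share a common variance structure:

import numpy as np

rng = np.random.default_rng(0)

def transfer_features(ur_feats, regular_feats, n_new=20):
    # Augment an under-represented (UR) class by transferring intra-class
    # variation from a regular class; the sampling scheme here is a simplification.
    ur_center = ur_feats.mean(axis=0)
    reg_center = regular_feats.mean(axis=0)
    # Intra-class offsets of the regular class, assumed to share a common
    # (Gaussian) variance structure with the UR class.
    offsets = regular_feats - reg_center
    picks = rng.choice(len(offsets), size=n_new, replace=True)
    return ur_center + offsets[picks]

# Example: 3 UR samples, 200 regular samples, 128-d features.
ur = rng.normal(size=(3, 128))
regular = rng.normal(size=(200, 128))
augmented = transfer_features(ur, regular)
print(augmented.shape)  # (20, 128)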

Multilabel reductions: what is my loss optimising?

Title Multilabel reductions: what is my loss optimising?
Authors Aditya K. Menon, Ankit Singh Rawat, Sashank Reddi, Sanjiv Kumar
Abstract Multilabel classification is a challenging problem arising in applications ranging from information retrieval to image tagging. A popular approach to this problem is to employ a reduction to a suitable series of binary or multiclass problems (e.g., computing a softmax based cross-entropy over the relevant labels). While such methods have seen empirical success, less is understood about how well they approximate two fundamental performance measures: precision@$k$ and recall@$k$. In this paper, we study five commonly used reductions, including the one-versus-all reduction, a reduction to multiclass classification, and normalised versions of the same, wherein the contribution of each instance is normalised by the number of relevant labels. Our main result is a formal justification of each reduction: we explicate their underlying risks, and show they are each consistent with respect to either precision or recall. Further, we show that in general no reduction can be optimal for both measures. We empirically validate our results, demonstrating scenarios where normalised reductions yield recall gains over unnormalised counterparts.
Tasks Information Retrieval
Published 2019-12-01
URL http://papers.nips.cc/paper/9245-multilabel-reductions-what-is-my-loss-optimising
PDF http://papers.nips.cc/paper/9245-multilabel-reductions-what-is-my-loss-optimising.pdf
PWC https://paperswithcode.com/paper/multilabel-reductions-what-is-my-loss
Repo
Framework
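
For readers unfamiliar with the quantities in the abstract, the snippet below sketches precision@k and recall@k together with a one-versus-all reduction and one normalised reduction; the function names and the exact normalisation shown are our illustrative choices, not the paper's formal definitions.

import numpy as np

def precision_at_k(scores, labels, k):
    # Fraction of the top-k scored labels that are relevant.
    topk = np.argsort(-scores)[:k]
    return labels[topk].sum() / k

def recall_at_k(scores, labels, k):
    # Fraction of the relevant labels recovered in the top k.
    topk = np.argsort(-scores)[:k]
    return labels[topk].sum() / max(labels.sum(), 1)

def ova_loss(scores, labels):
    # One-versus-all reduction: a sum of independent binary logistic losses.
    margins = (2 * labels - 1) * scores  # +score for relevant labels, -score otherwise
    return np.sum(np.log1p(np.exp(-margins)))

def normalised_pal_loss(scores, labels):
    # Pick-all-labels softmax reduction, normalised by the number of relevant
    # labels (one normalised variant; the naming here is ours, not the authors').
    log_softmax = scores - np.log(np.sum(np.exp(scores)))
    relevant = labels == 1
    return -np.sum(log_softmax[relevant]) / max(relevant.sum(), 1)

scores = np.array([2.0, -1.0, 0.5, 3.0])
labels = np.array([1, 0, 0, 1])
print(precision_at_k(scores, labels, k=2), recall_at_k(scores, labels, k=2))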

Towards Rich Feature Discovery With Class Activation Maps Augmentation for Person Re-Identification

Title Towards Rich Feature Discovery With Class Activation Maps Augmentation for Person Re-Identification
Authors Wenjie Yang, Houjing Huang, Zhang Zhang, Xiaotang Chen, Kaiqi Huang, Shu Zhang
Abstract The fundamental challenge of small inter-person variation requires Person Re-Identification (Re-ID) models to capture sufficient fine-grained information. This paper proposes to discover diverse discriminative visual cues without extra assistance such as pose estimation or human parsing. Specifically, a Class Activation Maps (CAM) augmentation model is proposed to expand the activation scope of a baseline Re-ID model to explore rich visual cues: the backbone network is extended by a series of ordered branches which share the same input but output complementary CAMs. A novel Overlapped Activation Penalty is proposed to force each new branch to pay more attention to the image regions less activated by the older ones, such that spatially diverse visual features can be discovered. The proposed model achieves state-of-the-art results on three person Re-ID benchmarks. Moreover, a visualization approach termed ranking activation map (RAM) is proposed to explicitly interpret the ranking results at test time, which gives qualitative validation of the proposed method.
Tasks Human Parsing, Person Re-Identification, Pose Estimation
Published 2019-06-01
URL http://openaccess.thecvf.com/content_CVPR_2019/html/Yang_Towards_Rich_Feature_Discovery_With_Class_Activation_Maps_Augmentation_for_CVPR_2019_paper.html
PDF http://openaccess.thecvf.com/content_CVPR_2019/papers/Yang_Towards_Rich_Feature_Discovery_With_Class_Activation_Maps_Augmentation_for_CVPR_2019_paper.pdf
PWC https://paperswithcode.com/paper/towards-rich-feature-discovery-with-class
Repo
Framework
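
A schematic form of the Overlapped Activation Penalty described above might look as follows; the paper's exact formulation may differ, and the shapes and normalisation here are assumptions:

import torch

def overlapped_activation_penalty(new_cam, old_cams):
    # Discourage the new branch from activating where older branches already do.
    # new_cam: [B, H, W]; old_cams: list of [B, H, W] tensors.
    def norm(x):
        x = x - x.amin(dim=(1, 2), keepdim=True)
        return x / (x.amax(dim=(1, 2), keepdim=True) + 1e-6)

    old = torch.stack([norm(c) for c in old_cams]).amax(dim=0)  # union of old activations
    return (norm(new_cam) * old).mean()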

GEOBIT: A Geodesic-Based Binary Descriptor Invariant to Non-Rigid Deformations for RGB-D Images

Title GEOBIT: A Geodesic-Based Binary Descriptor Invariant to Non-Rigid Deformations for RGB-D Images
Authors Erickson R. Nascimento, Guilherme Potje, Renato Martins, Felipe Cadar, Mario F. M. Campos, Ruzena Bajcsy
Abstract At the core of most three-dimensional alignment and tracking tasks resides the critical problem of point correspondence. In this context, the design of descriptors that efficiently and uniquely identify keypoints to be matched is of central importance. Numerous descriptors have been developed for dealing with affine/perspective warps, but few can also handle non-rigid deformations. In this paper, we introduce a novel binary RGB-D descriptor invariant to isometric deformations. Our method uses geodesic isocurves on smooth textured manifolds. It combines appearance and geometric information from RGB-D images to tackle non-rigid transformations. We used our descriptor to track multiple textured depth maps and demonstrate that it produces reliable feature descriptors even in the presence of strong non-rigid deformations and depth noise. The experiments show that our descriptor outperforms different state-of-the-art descriptors in both precision-recall and recognition rate metrics. We also provide the community with a new dataset composed of annotated RGB-D images of different objects (shirts, cloths, paintings, bags), subjected to strong non-rigid deformations, to evaluate point correspondence algorithms.
Tasks
Published 2019-10-01
URL http://openaccess.thecvf.com/content_ICCV_2019/html/Nascimento_GEOBIT_A_Geodesic-Based_Binary_Descriptor_Invariant_to_Non-Rigid_Deformations_for_ICCV_2019_paper.html
PDF http://openaccess.thecvf.com/content_ICCV_2019/papers/Nascimento_GEOBIT_A_Geodesic-Based_Binary_Descriptor_Invariant_to_Non-Rigid_Deformations_for_ICCV_2019_paper.pdf
PWC https://paperswithcode.com/paper/geobit-a-geodesic-based-binary-descriptor
Repo
Framework
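
GEOBIT produces binary descriptors, which are typically matched by Hamming distance. The snippet below shows only that generic matching step, not the geodesic descriptor extraction itself:

import numpy as np

def hamming_distances(desc_a, desc_b):
    # Pairwise Hamming distances between two sets of binary descriptors.
    # desc_a: [n, bits], desc_b: [m, bits], entries in {0, 1}.
    a = np.packbits(desc_a.astype(np.uint8), axis=1)
    b = np.packbits(desc_b.astype(np.uint8), axis=1)
    xor = np.bitwise_xor(a[:, None, :], b[None, :, :])
    return np.unpackbits(xor, axis=2).sum(axis=2)

rng = np.random.default_rng(0)
da = rng.integers(0, 2, size=(5, 256))
db = rng.integers(0, 2, size=(8, 256))
matches = hamming_distances(da, db).argmin(axis=1)  # nearest neighbour per descriptor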

Supervising Unsupervised Open Information Extraction Models

Title Supervising Unsupervised Open Information Extraction Models
Authors Arpita Roy, Youngja Park, Taesung Lee, Shimei Pan
Abstract We propose a novel supervised open information extraction (Open IE) framework that leverages an ensemble of unsupervised Open IE systems and a small amount of labeled data to improve system performance. It uses the outputs of multiple unsupervised Open IE systems plus a diverse set of lexical and syntactic information, such as word embeddings, part-of-speech embeddings, syntactic role embeddings and dependency structure, as its input features and produces a sequence of word labels indicating whether each word belongs to a relation, to an argument of the relation, or to neither. Compared with existing supervised Open IE systems, our approach leverages the knowledge in existing unsupervised Open IE systems to overcome the problem of insufficient training data. By employing multiple unsupervised Open IE systems, our system learns to combine the strengths and avoid the weaknesses of each individual Open IE system. We have conducted experiments on multiple labeled benchmark data sets. Our evaluation results demonstrate the superiority of the proposed method over existing supervised and unsupervised models by a significant margin.
Tasks Open Information Extraction, Role Embedding
Published 2019-11-01
URL https://www.aclweb.org/anthology/D19-1067/
PDF https://www.aclweb.org/anthology/D19-1067
PWC https://paperswithcode.com/paper/supervising-unsupervised-open-information
Repo
Framework
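
To make the feature-combination idea concrete, here is a schematic sequence tagger in the spirit of the description above: per-token word and POS embeddings are concatenated with binary votes from K unsupervised Open IE systems and fed to a BiLSTM. All dimensions and the label set are illustrative assumptions, not the authors' configuration.

import torch
import torch.nn as nn

class OpenIETagger(nn.Module):
    # Schematic tagger: word embedding + POS embedding + votes from K unsupervised
    # Open IE systems -> BiLSTM -> per-token label logits.
    def __init__(self, vocab=10000, pos=50, k_systems=3, labels=4):
        super().__init__()
        self.word_emb = nn.Embedding(vocab, 100)
        self.pos_emb = nn.Embedding(pos, 16)
        self.encoder = nn.LSTM(100 + 16 + k_systems, 128,
                               bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * 128, labels)  # e.g. relation / argument / neither / pad

    def forward(self, words, pos_tags, system_votes):
        x = torch.cat([self.word_emb(words), self.pos_emb(pos_tags),
                       system_votes.float()], dim=-1)
        h, _ = self.encoder(x)
        return self.out(h)  # [batch, seq_len, labels] logits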

CaRB: A Crowdsourced Benchmark for Open IE

Title CaRB: A Crowdsourced Benchmark for Open IE
Authors Sangnie Bhardwaj, Samarth Aggarwal, Mausam Mausam
Abstract Open Information Extraction (Open IE) systems have been traditionally evaluated via manual annotation. Recently, an automated evaluator with a benchmark dataset (OIE2016) was released: it scores Open IE systems automatically by matching system predictions with predictions in the benchmark dataset. Unfortunately, our analysis reveals that its data is rather noisy, and the tuple matching in the evaluator has issues, making the results of automated comparisons less trustworthy. We contribute CaRB, an improved dataset and framework for testing Open IE systems. To the best of our knowledge, CaRB is the first crowdsourced Open IE dataset and it also makes substantive changes in the matching code and metrics. NLP experts annotate CaRB's dataset to be more accurate than OIE2016. Moreover, we find that on one pair of Open IE systems, the CaRB framework provides contradictory results to OIE2016. Human assessment verifies that CaRB's ranking of the two systems is the accurate ranking. We release the CaRB framework along with its crowdsourced dataset.
Tasks Open Information Extraction
Published 2019-11-01
URL https://www.aclweb.org/anthology/D19-1651/
PDF https://www.aclweb.org/anthology/D19-1651
PWC https://paperswithcode.com/paper/carb-a-crowdsourced-benchmark-for-open-ie
Repo
Framework
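
The kind of tuple matching an Open IE evaluator performs can be illustrated with a simple token-overlap scorer; CaRB's released matcher is more involved, so treat this only as a sketch of the idea:

def tuple_token_scores(pred, gold):
    # Token-level precision/recall between a predicted and a gold extraction,
    # each a (subject, relation, object) triple of strings.
    pred_tokens = " ".join(pred).lower().split()
    gold_tokens = " ".join(gold).lower().split()
    common = sum(min(pred_tokens.count(t), gold_tokens.count(t))
                 for t in set(pred_tokens))
    precision = common / len(pred_tokens) if pred_tokens else 0.0
    recall = common / len(gold_tokens) if gold_tokens else 0.0
    return precision, recall

p, r = tuple_token_scores(("the cat", "sat on", "the mat"),
                          ("cat", "sat on", "mat"))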

Crisis Detection from Arabic Tweets

Title Crisis Detection from Arabic Tweets
Authors Alaa Alharbi, Mark Lee
Abstract
Tasks
Published 2019-07-01
URL https://www.aclweb.org/anthology/W19-5609/
PDF https://www.aclweb.org/anthology/W19-5609
PWC https://paperswithcode.com/paper/crisis-detection-from-arabic-tweets
Repo
Framework

A Parallel Corpus Mixtec-Spanish

Title A Parallel Corpus Mixtec-Spanish
Authors Cynthia Montaño, Gerardo Sierra Martínez, Gemma Bel-Enguix, Helena Gomez
Abstract This work describes the compilation process of parallel Spanish-Mixtec documents. There are not many Spanish-Mixtec parallel texts and most of the sources are non-digital books. Due to this, we need to face the errors introduced when digitizing the sources and difficulties in sentence alignment, as well as the fact that no standard orthography exists. Our parallel corpus consists of sixty texts coming from books and digital repositories. These documents belong to different domains: history, traditional stories, didactic material, recipes, ethnographical descriptions of each town and instruction manuals for disease prevention. We have classified this material in five major categories: didactic (6 texts), educative (6 texts), interpretative (7 texts), narrative (39 texts), and poetic (2 texts). The final total of tokens is 49,814 Spanish words and 47,774 Mixtec words. The texts belong to the states of Oaxaca (48 texts), Guerrero (9 texts) and Puebla (3 texts). According to this data, we see that the corpus is unbalanced with respect to the representation of the different territories: while 55% of speakers are in Oaxaca, 80% of the texts come from this region; Guerrero has 30% of the speakers and 15% of the texts; and Puebla, with 15% of the speakers, has a representation of 5% in the corpus.
Tasks
Published 2019-08-01
URL https://www.aclweb.org/anthology/papers/W/W19/W19-3650/
PDF https://www.aclweb.org/anthology/W19-3650
PWC https://paperswithcode.com/paper/a-parallel-corpus-mixtec-spanish
Repo
Framework

Flambé: A Customizable Framework for Machine Learning Experiments

Title Flambé: A Customizable Framework for Machine Learning Experiments
Authors Jeremy Wohlwend, Nicholas Matthews, Ivan Itzcovich
Abstract Flambé is a machine learning experimentation framework built to accelerate the entire research life cycle. Flambé's main objective is to provide a unified interface for prototyping models, running experiments containing complex pipelines, monitoring those experiments in real-time, reporting results, and deploying a final model for inference. Flambé achieves both flexibility and simplicity by allowing users to write custom code but instantly include that code as a component in a larger system which is represented by a concise configuration file format. We demonstrate the application of the framework through a cutting-edge multistage use case: fine-tuning and distillation of a state-of-the-art pretrained language model used for text classification.
Tasks Language Modelling, Text Classification
Published 2019-07-01
URL https://www.aclweb.org/anthology/P19-3029/
PDF https://www.aclweb.org/anthology/P19-3029
PWC https://paperswithcode.com/paper/flambe-a-customizable-framework-for-machine
Repo
Framework
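
The abstract describes assembling pipelines from custom components via a concise configuration format. The snippet below is a generic illustration of that pattern (a registry plus a config-driven builder); it is not Flambé's actual schema or API.

# Generic config-driven pipeline assembly; NOT Flambé's real configuration format.
REGISTRY = {}

def register(name):
    def wrap(cls):
        REGISTRY[name] = cls
        return cls
    return wrap

@register("lowercase")
class Lowercase:
    def __call__(self, text):
        return text.lower()

def build_pipeline(config):
    # config: list of {"component": name, "params": {...}} entries.
    return [REGISTRY[step["component"]](**step.get("params", {})) for step in config]

pipeline = build_pipeline([{"component": "lowercase"}])
print(pipeline[0]("Hello Flambé"))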

SpherePHD: Applying CNNs on a Spherical PolyHeDron Representation of 360° Images

Title SpherePHD: Applying CNNs on a Spherical PolyHeDron Representation of 360° Images
Authors Yeonkun Lee, Jaeseok Jeong, Jongseob Yun, Wonjune Cho, Kuk-Jin Yoon
Abstract Omni-directional cameras have many advantages over conventional cameras in that they have a much wider field-of-view (FOV). Accordingly, several approaches have been proposed recently to apply convolutional neural networks (CNNs) to omni-directional images for various visual tasks. However, most of them use image representations defined in the Euclidean space after transforming the omni-directional views originally formed in the non-Euclidean space. This transformation leads to shape distortion due to nonuniform spatial resolving power and the loss of continuity. These effects make existing convolution kernels experience difficulties in extracting meaningful information. This paper presents a novel method to resolve such problems of applying CNNs to omni-directional images. The proposed method utilizes a spherical polyhedron to represent omni-directional views. This method minimizes the variance of the spatial resolving power on the sphere surface, and includes new convolution and pooling methods for the proposed representation. The proposed method can also be adopted by any existing CNN-based methods. The feasibility of the proposed method is demonstrated through classification, detection, and semantic segmentation tasks with synthetic and real datasets.
Tasks Semantic Segmentation
Published 2019-06-01
URL http://openaccess.thecvf.com/content_CVPR_2019/html/Lee_SpherePHD_Applying_CNNs_on_a_Spherical_PolyHeDron_Representation_of_360deg_CVPR_2019_paper.html
PDF http://openaccess.thecvf.com/content_CVPR_2019/papers/Lee_SpherePHD_Applying_CNNs_on_a_Spherical_PolyHeDron_Representation_of_360deg_CVPR_2019_paper.pdf
PWC https://paperswithcode.com/paper/spherephd-applying-cnns-on-a-spherical-1
Repo
Framework
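
A convolution over polyhedron faces can be sketched as gathering each face's neighbours via a precomputed index table and contracting with a learned kernel; SpherePHD's actual kernel layout and neighbour ordering on the subdivided polyhedron are assumed to be provided elsewhere.

import torch

def face_convolution(features, neighbours, weight):
    # features: [F, C_in]; neighbours: [F, K] long indices; weight: [K + 1, C_in, C_out].
    # Each face aggregates its own feature and those of its K neighbouring faces.
    gathered = torch.cat([features.unsqueeze(1),          # the face itself
                          features[neighbours]], dim=1)   # [F, K + 1, C_in]
    return torch.einsum("fkc,kcd->fd", gathered, weight)  # [F, C_out]

F, K, C_in, C_out = 80, 3, 8, 16
feats = torch.randn(F, C_in)
neigh = torch.randint(0, F, (F, K))
w = torch.randn(K + 1, C_in, C_out)
out = face_convolution(feats, neigh, w)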

Neural Machine Translation: Hindi-Nepali

Title Neural Machine Translation: Hindi-Nepali
Authors Sahinur Rahman Laskar, Partha Pakray, Sivaji Bandyopadhyay
Abstract With the extensive use of Machine Translation (MT) technology, there is growing interest in directly translating between pairs of similar languages, where the main challenge is to overcome the limitation of available parallel data to produce a precise MT output. The current work relies on Neural Machine Translation (NMT) with an attention mechanism for the similar-language translation task of the WMT19 shared task in the context of the Hindi-Nepali pair. The NMT systems were trained on the Hindi-Nepali parallel corpus and then tested and analyzed on Hindi ⇔ Nepali translation. The official results declared at the WMT19 shared task show that our NMT system obtained a Bilingual Evaluation Understudy (BLEU) score of 24.6 for the primary configuration in Nepali-to-Hindi translation. We also achieved BLEU scores of 53.7 (Hindi to Nepali) and 49.1 (Nepali to Hindi) for the contrastive system type.
Tasks Machine Translation
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-5427/
PDF https://www.aclweb.org/anthology/W19-5427
PWC https://paperswithcode.com/paper/neural-machine-translation-hindi-nepali
Repo
Framework
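
The BLEU scores quoted above are corpus-level scores; the snippet below shows how such a score is commonly computed with the sacrebleu toolkit. The example strings are placeholders, and sacrebleu is just one common scorer, not necessarily the exact tool used by the WMT19 organisers.

import sacrebleu

# Placeholder strings; real use would pass detokenised Hindi/Nepali system output
# and reference translations.
hypotheses = ["this is a sample translation"]
references = ["this is a sample translation"]
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(round(bleu.score, 1))  # 100.0 for an exact match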

GWU NLP Lab at SemEval-2019 Task 3: EmoContext: Effectiveness of Contextual Information in Models for Emotion Detection in Sentence-level at Multi-genre Corpus

Title GWU NLP Lab at SemEval-2019 Task 3: EmoContext: Effectiveness of Contextual Information in Models for Emotion Detection in Sentence-level at Multi-genre Corpus
Authors Shabnam Tafreshi, Mona Diab
Abstract In this paper we present an emotion classifier model submitted to SemEval-2019 Task 3: EmoContext. Our approach is a gated recurrent unit (GRU) model with an attention layer, bootstrapped with contextual information and trained on a multi-genre corpus that combines several popular emotion data sets. We utilize different word embeddings to empirically select the one best suited to represent our features. Our aim is to build a robust emotion classifier that can generalize emotion detection, i.e., learn emotion cues in a noisy training environment. To fulfill this aim we train our model on a multi-genre emotion corpus, thereby benefiting from a larger training set. We achieved an overall F1-score of 56.05% and placed 144th. Given our aim and the noisy training environment, the results are as anticipated.
Tasks Word Embeddings
Published 2019-06-01
URL https://www.aclweb.org/anthology/S19-2038/
PDF https://www.aclweb.org/anthology/S19-2038
PWC https://paperswithcode.com/paper/gwu-nlp-lab-at-semeval-2019-task-3-emocontext-1
Repo
Framework
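
A minimal GRU-with-attention classifier of the general kind described above could be sketched as follows; hidden sizes, vocabulary size, and the number of emotion classes are illustrative assumptions rather than the authors' exact settings.

import torch
import torch.nn as nn

class GRUAttentionClassifier(nn.Module):
    # Bidirectional GRU encoder with a simple additive attention pooling layer.
    def __init__(self, vocab=20000, emb=100, hidden=64, classes=4):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.gru = nn.GRU(emb, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)
        self.out = nn.Linear(2 * hidden, classes)    # e.g. happy / sad / angry / others

    def forward(self, tokens):                       # tokens: [batch, seq_len]
        h, _ = self.gru(self.emb(tokens))            # [batch, seq_len, 2*hidden]
        weights = torch.softmax(self.attn(h), dim=1) # attention over time steps
        context = (weights * h).sum(dim=1)
        return self.out(context)                     # class logits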

EASY-M: Evaluation System for Multilingual Summarizers

Title EASY-M: Evaluation System for Multilingual Summarizers
Authors
Abstract Automatic text summarization aims at producing a shorter version of a document (or a document set). Evaluation of summarization quality is a challenging task. Because human evaluations are expensive and evaluators often disagree among themselves, many researchers prefer to evaluate their systems automatically, with the help of software tools. Such a tool usually requires a point of reference in the form of one or more human-written summaries for each text in the corpus. Then, a system-generated summary is compared to one or more human-written summaries, according to selected metrics. However, a single metric cannot reflect all quality-related aspects of a summary. In this paper we present the EvAluation SYstem for Multilingual Summarization (EASY-M), which enables the evaluation of system-generated summaries in 17 different languages with several quality measures, based on comparison with their human-generated counterparts. The system also provides comparative results with two built-in baselines. The source code and both the online and offline versions of EASY-M are freely available to the NLP community.
Tasks Text Summarization
Published 2019-09-01
URL https://www.aclweb.org/anthology/W19-8908/
PDF https://www.aclweb.org/anthology/W19-8908
PWC https://paperswithcode.com/paper/easy-m-evaluation-system-for-multilingual
Repo
Framework
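
As a single-metric sketch of the reference-based comparison described above, the function below computes a ROUGE-1-style unigram recall of a system summary against one or more human references; EASY-M itself combines several such measures, so this is only an illustration of the principle.

def unigram_recall(system_summary, reference_summaries):
    # How many reference unigrams the system summary recovers, averaged over references.
    sys_tokens = system_summary.lower().split()
    scores = []
    for ref in reference_summaries:
        ref_tokens = ref.lower().split()
        overlap = sum(min(sys_tokens.count(t), ref_tokens.count(t))
                      for t in set(ref_tokens))
        scores.append(overlap / max(len(ref_tokens), 1))
    return sum(scores) / len(scores)

print(unigram_recall("the cat sat on the mat",
                     ["a cat sat on a mat", "the cat is on the mat"]))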