January 25, 2020


Paper Group NANR 24

Reading Like HER: Human Reading Inspired Extractive Summarization. Improving Multi-label Emotion Classification by Integrating both General and Domain-specific Knowledge. Findings of the WMT 2019 Shared Task on Parallel Corpus Filtering for Low-Resource Conditions. Learning Personalized Modular Network Guided by Structured Knowledge. Typography Wit …

Reading Like HER: Human Reading Inspired Extractive Summarization


Title	Reading Like HER: Human Reading Inspired Extractive Summarization
Authors	Ling Luo, Xiang Ao, Yan Song, Feiyang Pan, Min Yang, Qing He
Abstract	In this work, we re-examine the problem of extractive text summarization for long documents. We observe that the way humans produce an extractive summary can be divided into two stages: 1) a rough reading stage to look for sketched information, and 2) a subsequent careful reading stage to select key sentences to form the summary. By simulating such a two-stage process, we propose a novel approach for extractive summarization. We formulate the problem as a contextual-bandit problem and solve it with policy gradient. We adopt a convolutional neural network to encode the gist of paragraphs for rough reading, and a decision-making policy with an adapted termination mechanism for careful reading. Experiments on the CNN and DailyMail datasets show that our proposed method can provide high-quality summaries of varied length, and significantly outperforms the state-of-the-art extractive methods in terms of ROUGE metrics.
Tasks	Decision Making, Text Summarization
Published	2019-11-01
URL	https://www.aclweb.org/anthology/D19-1300/
PDF	https://www.aclweb.org/anthology/D19-1300
PWC	https://paperswithcode.com/paper/reading-like-her-human-reading-inspired
Repo
Framework

Improving Multi-label Emotion Classification by Integrating both General and Domain-specific Knowledge


Title	Improving Multi-label Emotion Classification by Integrating both General and Domain-specific Knowledge
Authors	Wenhao Ying, Rong Xiang, Qin Lu
Abstract	Deep learning based general language models have achieved state-of-the-art results in many popular tasks such as sentiment analysis and QA tasks. Text in domains like social media has its own salient characteristics. Domain knowledge should be helpful in domain-relevant tasks. In this work, we devise a simple method to obtain domain knowledge and further propose a method to integrate domain knowledge with general knowledge based on deep language models to improve the performance of emotion classification. Experiments on Twitter data show that even though a deep language model fine-tuned on target-domain data attains results comparable to those of previous state-of-the-art models, this fine-tuned model can still benefit from our extracted domain knowledge to obtain further improvement. This highlights the importance of making use of domain knowledge in domain-specific applications.
Tasks	Emotion Classification, Language Modelling, Sentiment Analysis
Published	2019-11-01
URL	https://www.aclweb.org/anthology/D19-5541/
PDF	https://www.aclweb.org/anthology/D19-5541
PWC	https://paperswithcode.com/paper/improving-multi-label-emotion-classification-1
Repo
Framework

Findings of the WMT 2019 Shared Task on Parallel Corpus Filtering for Low-Resource Conditions


Title	Findings of the WMT 2019 Shared Task on Parallel Corpus Filtering for Low-Resource Conditions
Authors	Philipp Koehn, Francisco Guzmán, Vishrav Chaudhary, Juan Pino
Abstract	Following the WMT 2018 Shared Task on Parallel Corpus Filtering, we posed the challenge of assigning sentence-level quality scores for very noisy corpora of sentence pairs crawled from the web, with the goal of sub-selecting 2% and 10% of the highest-quality data to be used to train machine translation systems. This year, the task tackled the low resource condition of Nepali-English and Sinhala-English. Eleven participants from companies, national research labs, and universities participated in this task.
Tasks	Machine Translation
Published	2019-08-01
URL	https://www.aclweb.org/anthology/W19-5404/
PDF	https://www.aclweb.org/anthology/W19-5404
PWC	https://paperswithcode.com/paper/findings-of-the-wmt-2019-shared-task-on-1
Repo
Framework
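
The sub-selection step described in the abstract can be sketched as follows. This is a minimal illustration, not the shared task's official tooling: it assumes each sentence pair already carries a quality score, and it measures the 2% or 10% budget by number of sentence pairs, which is an assumption made here for simplicity. The names `ScoredPair` and `select_top_fraction` are hypothetical.

```python
from typing import List, Tuple

# (quality score, source sentence, target sentence); a hypothetical layout.
ScoredPair = Tuple[float, str, str]


def select_top_fraction(pairs: List[ScoredPair], fraction: float) -> List[ScoredPair]:
    """Return the highest-scoring `fraction` of sentence pairs.

    The budget is counted in sentence pairs (an assumption of this sketch).
    At least one pair is kept when the input is non-empty.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if not pairs:
        return []
    keep = max(1, int(len(pairs) * fraction))
    # Sort on the score only, highest first. sorted() is stable, so pairs
    # with equal scores stay in their original corpus order.
    ranked = sorted(pairs, key=lambda pair: pair[0], reverse=True)
    return ranked[:keep]


if __name__ == "__main__":
    corpus = [(i / 10.0, f"src {i}", f"tgt {i}") for i in range(10)]
    for kept in select_top_fraction(corpus, 0.5):
        print(kept)
```

Sorting the whole corpus is the simplest choice; for very large crawls a heap-based selection such as `heapq.nlargest` would avoid a full sort.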

Learning Personalized Modular Network Guided by Structured Knowledge


Title	Learning Personalized Modular Network Guided by Structured Knowledge
Authors	Xiaodan Liang
Abstract	The dominant deep learning approaches use a “one-size-fits-all” paradigm with the hope that underlying characteristics of diverse inputs can be captured via a fixed structure. They also overlook the importance of explicitly modeling feature hierarchy. However, complex real-world tasks often require discovering diverse reasoning paths for different inputs to achieve satisfying predictions, especially for challenging large-scale recognition tasks with complex label relations. In this paper, we treat the structured commonsense knowledge (e.g. concept hierarchy) as the guidance of customizing more powerful and explainable network structures for distinct inputs, leading to dynamic and individualized inference paths. Given an off-the-shelf large network configuration, the proposed Personalized Modular Network (PMN) is learned by selectively activating a sequence of network modules where each of them is designated to recognize particular levels of structured knowledge. Learning semantic configurations and activation of modules to align well with structured knowledge can be regarded as a decision-making procedure, which is solved by a new graph-based reinforcement learning algorithm. Experiments on three semantic segmentation tasks and classification tasks show our PMN can achieve superior performance with a reduced number of network modules while discovering personalized and explainable module configurations for each input.
Tasks	Decision Making, Semantic Segmentation
Published	2019-06-01
URL	http://openaccess.thecvf.com/content_CVPR_2019/html/Liang_Learning_Personalized_Modular_Network_Guided_by_Structured_Knowledge_CVPR_2019_paper.html
PDF	http://openaccess.thecvf.com/content_CVPR_2019/papers/Liang_Learning_Personalized_Modular_Network_Guided_by_Structured_Knowledge_CVPR_2019_paper.pdf
PWC	https://paperswithcode.com/paper/learning-personalized-modular-network-guided
Repo
Framework

Typography With Decor: Intelligent Text Style Transfer


Title	Typography With Decor: Intelligent Text Style Transfer
Authors	Wenjing Wang, Jiaying Liu, Shuai Yang, Zongming Guo
Abstract	Text effects transfer can dramatically make the text visually pleasing. In this paper, we present a novel framework to stylize the text with exquisite decor, which is ignored by the previous text stylization methods. Decorative elements pose a challenge to spontaneously handle basal text effects and decor, which are two different styles. To address this issue, our key idea is to learn to separate, transfer and recombine the decors and the basal text effect. A novel text effect transfer network is proposed to infer the styled version of the target text. The stylized text is finally embellished with decor where the placement of the decor is carefully determined by a novel structure-aware strategy. Furthermore, we propose a domain adaptation strategy for decor detection and a one-shot training strategy for text effects transfer, which greatly enhance the robustness of our network to new styles. We base our experiments on our collected typography dataset including 59,000 professionally styled texts and demonstrate the superiority of our method over other state-of-the-art style transfer methods.
Tasks	Domain Adaptation, Style Transfer, Text Effects Transfer, Text Style Transfer
Published	2019-06-01
URL	http://openaccess.thecvf.com/content_CVPR_2019/html/Wang_Typography_With_Decor_Intelligent_Text_Style_Transfer_CVPR_2019_paper.html
PDF	http://openaccess.thecvf.com/content_CVPR_2019/papers/Wang_Typography_With_Decor_Intelligent_Text_Style_Transfer_CVPR_2019_paper.pdf
PWC	https://paperswithcode.com/paper/typography-with-decor-intelligent-text-style
Repo
Framework

Historical Text Normalization with Delayed Rewards


Title	Historical Text Normalization with Delayed Rewards
Authors	Simon Flachs, Marcel Bollmann, Anders Søgaard
Abstract	Training neural sequence-to-sequence models with simple token-level log-likelihood is now a standard approach to historical text normalization, albeit often outperformed by phrase-based models. Policy gradient training enables direct optimization for exact matches, and while the small datasets in historical text normalization are prohibitive of from-scratch reinforcement learning, we show that policy gradient fine-tuning leads to significant improvements across the board. Policy gradient training, in particular, leads to more accurate normalizations for long or unseen words.
Tasks
Published	2019-07-01
URL	https://www.aclweb.org/anthology/P19-1157/
PDF	https://www.aclweb.org/anthology/P19-1157
PWC	https://paperswithcode.com/paper/historical-text-normalization-with-delayed
Repo
Framework

Unsupervised Data Augmentation for Less-Resourced Languages with no Standardized Spelling


Title	Unsupervised Data Augmentation for Less-Resourced Languages with no Standardized Spelling
Authors	Alice Millour, Karën Fort
Abstract	Building representative linguistic resources and NLP tools for non-standardized languages is challenging: when spelling is not determined by a norm, multiple written forms can be encountered for a given word, inducing a large proportion of out-of-vocabulary (OOV) words. To embrace this diversity, we propose a methodology based on crowdsourced alternative spellings, from which we extract rules that are applied to match OOV words with one of their spelling variants. This virtuous process enables the unsupervised augmentation of multi-variant lexicons without expert rule definition. We apply this multilingual methodology to Alsatian, a French regional language, and provide an intrinsic evaluation of the correctness of the variant pairs, and an extrinsic evaluation on a downstream task. We show that in a low-resource scenario, 145 initial pairs can lead to the generation of 876 additional variant pairs, and to a reduction in OOV words that improves part-of-speech tagging performance by 1 to 4%.
Tasks	Data Augmentation, Part-Of-Speech Tagging
Published	2019-09-01
URL	https://www.aclweb.org/anthology/R19-1090/
PDF	https://www.aclweb.org/anthology/R19-1090
PWC	https://paperswithcode.com/paper/unsupervised-data-augmentation-for-less
Repo
Framework

Regret Bounds for Learning State Representations in Reinforcement Learning


Title	Regret Bounds for Learning State Representations in Reinforcement Learning
Authors	Ronald Ortner, Matteo Pirotta, Alessandro Lazaric, Ronan Fruit, Odalric-Ambrym Maillard
Abstract	We consider the problem of online reinforcement learning when several state representations (mapping histories to a discrete state space) are available to the learning agent. At least one of these representations is assumed to induce a Markov decision process (MDP), and the performance of the agent is measured in terms of cumulative regret against the optimal policy giving the highest average reward in this MDP representation. We propose an algorithm (UCB-MS) with O(sqrt(T)) regret in any communicating Markov decision process. The regret bound shows that UCB-MS automatically adapts to the Markov model. This improves over the currently known best results in the literature that gave regret bounds of order O(T^(2/3)).
Tasks
Published	2019-12-01
URL	http://papers.nips.cc/paper/9435-regret-bounds-for-learning-state-representations-in-reinforcement-learning
PDF	http://papers.nips.cc/paper/9435-regret-bounds-for-learning-state-representations-in-reinforcement-learning.pdf
PWC	https://paperswithcode.com/paper/regret-bounds-for-learning-state
Repo
Framework
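
The difference between the two regret rates quoted in the abstract can be made concrete numerically. The sketch below is only an illustration of the orders of growth: it ignores constants and logarithmic factors, which the abstract does not state, and the function name `rate_ratio` is hypothetical.

```python
import math


def rate_ratio(horizon: int) -> float:
    """Return T^(2/3) / sqrt(T), which simplifies to T^(1/6).

    Constants and logarithmic factors are ignored (an assumption of this
    sketch), so the value only shows how the gap between the two orders
    of growth widens with the horizon T.
    """
    if horizon <= 0:
        raise ValueError("horizon must be positive")
    return horizon ** (2.0 / 3.0) / math.sqrt(horizon)


if __name__ == "__main__":
    for horizon in (10**3, 10**6, 10**9):
        print(f"T = {horizon:>10}: T^(2/3) / sqrt(T) = {rate_ratio(horizon):.2f}")
```

Because the ratio grows as T^(1/6), the advantage of a sqrt(T) bound over a T^(2/3) bound keeps increasing with the number of steps, although slowly.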

Which Way Are You Going? Imitative Decision Learning for Path Forecasting in Dynamic Scenes


Title	Which Way Are You Going? Imitative Decision Learning for Path Forecasting in Dynamic Scenes
Authors	Yuke Li
Abstract	Path forecasting is a pivotal step toward understanding dynamic scenes and an emerging topic in the computer vision field. This task is challenging due to the multimodal nature of the future, namely, given a partial history, there is more than one plausible prediction. Yet, the state-of-the-art methods seem not fully responsive to this innate variability. Hence, how to better foresee the forthcoming trajectory in dynamic scenes has to be more thoroughly pursued. To this end, we propose a novel Imitative Decision Learning (IDL) approach. It delves deeper into the key that inherently characterizes the multimodality - the latent decision. The proposed IDL first infers the distribution of such latent decisions by learning from moving histories. A policy is then generated by taking the sampled latent decision into account to predict the future. A different plausible upcoming path corresponds to each sampled latent decision. This approach significantly differs from the mainstream literature that relies on a predefined latent variable to extrapolate diverse predictions. In order to augment the understanding of the latent decision and resultant multimodal future, we investigate their connection through mutual information optimization. Moreover, the proposed IDL integrates spatial and temporal dependencies into one single framework, in contrast to handling them with two-step settings. As a result, our approach enables simultaneous anticipation of the paths of all pedestrians in the scene. We assess our proposal on the large-scale SAP, ETH and UCY datasets. The experiments show that IDL introduces considerable margin improvements with respect to recent leading studies.
Tasks
Published	2019-06-01
URL	http://openaccess.thecvf.com/content_CVPR_2019/html/Li_Which_Way_Are_You_Going_Imitative_Decision_Learning_for_Path_CVPR_2019_paper.html
PDF	http://openaccess.thecvf.com/content_CVPR_2019/papers/Li_Which_Way_Are_You_Going_Imitative_Decision_Learning_for_Path_CVPR_2019_paper.pdf
PWC	https://paperswithcode.com/paper/which-way-are-you-going-imitative-decision
Repo
Framework

ANYTIME MINIBATCH: EXPLOITING STRAGGLERS IN ONLINE DISTRIBUTED OPTIMIZATION


Title	ANYTIME MINIBATCH: EXPLOITING STRAGGLERS IN ONLINE DISTRIBUTED OPTIMIZATION
Authors	Nuwan Ferdinand, Haider Al-Lawati, Stark Draper, Matthew Nokleby
Abstract	Distributed optimization is vital in solving large-scale machine learning problems. A widely-shared feature of distributed optimization techniques is the requirement that all nodes complete their assigned tasks in each computational epoch before the system can proceed to the next epoch. In such settings, slow nodes, called stragglers, can greatly slow progress. To mitigate the impact of stragglers, we propose an online distributed optimization method called Anytime Minibatch. In this approach, all nodes are given a fixed time to compute the gradients of as many data samples as possible. The result is a variable per-node minibatch size. Workers then get a fixed communication time to average their minibatch gradients via several rounds of consensus, which are then used to update primal variables via dual averaging. Anytime Minibatch prevents stragglers from holding up the system without wasting the work that stragglers can complete. We present a convergence analysis and analyze the wall time performance. Our numerical results show that our approach is up to 1.5 times faster on Amazon EC2 and up to five times faster when there is greater variability in compute node performance.
Tasks	Distributed Optimization
Published	2019-05-01
URL	https://openreview.net/forum?id=rkzDIiA5YQ
PDF	https://openreview.net/pdf?id=rkzDIiA5YQ
PWC	https://paperswithcode.com/paper/anytime-minibatch-exploiting-stragglers-in
Repo
Framework
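
The fixed-time compute phase described in the abstract can be sketched as follows. This is a simplified, single-process illustration under stated assumptions, not the authors' implementation: per-sample compute cost is a fixed hypothetical number per worker, the consensus rounds are replaced by an exact average, and the dual-averaging update is left out. All names (`anytime_minibatch_gradient`, `seconds_per_sample`, `compute_budget`) are hypothetical.

```python
from typing import Callable, Sequence


def anytime_minibatch_gradient(
    grad_fn: Callable[[float, float], float],
    w: float,
    worker_samples: Sequence[Sequence[float]],
    seconds_per_sample: Sequence[float],
    compute_budget: float,
) -> float:
    """Average the gradients that all workers finish within a fixed time budget.

    Each worker processes as many of its samples as fit into `compute_budget`,
    so slow workers contribute smaller minibatches instead of delaying the
    epoch. The average is taken over every gradient actually computed, which
    stands in for the consensus step (an assumption of this sketch).
    """
    total_grad = 0.0
    total_count = 0
    for samples, cost in zip(worker_samples, seconds_per_sample):
        # Variable per-node minibatch size: number of samples that fit in the budget.
        batch_size = min(len(samples), int(compute_budget / cost))
        total_grad += sum(grad_fn(w, x) for x in samples[:batch_size])
        total_count += batch_size
    if total_count == 0:
        raise ValueError("no worker finished a sample within the budget")
    return total_grad / total_count


if __name__ == "__main__":
    # Gradient of the loss 0.5 * (w - x)^2 with respect to w.
    def squared_loss_grad(w: float, x: float) -> float:
        return w - x

    workers = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]]
    costs = [1.0, 2.0]  # the second worker is the straggler
    print(anytime_minibatch_gradient(squared_loss_grad, 0.0, workers, costs, 2.0))
```

In the usage example the fast worker contributes two gradients and the straggler one, so the straggler's partial work is still used rather than discarded.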

Adversarial Removal of Demographic Attributes Revisited


Title	Adversarial Removal of Demographic Attributes Revisited
Authors	Maria Barrett, Yova Kementchedjhieva, Yanai Elazar, Desmond Elliott, Anders Søgaard
Abstract	Elazar and Goldberg (2018) showed that protected attributes can be extracted from the representations of a debiased neural network for mention detection at above-chance levels, by evaluating a diagnostic classifier on a held-out subsample of the data it was trained on. We revisit their experiments and conduct a series of follow-up experiments showing that, in fact, the diagnostic classifier generalizes poorly to both new in-domain samples and new domains, indicating that it relies on correlations specific to their particular data sample. We further show that a diagnostic classifier trained on the biased baseline neural network also does not generalize to new samples. In other words, the biases detected in Elazar and Goldberg (2018) seem restricted to their particular data sample, and would therefore not bias the decisions of the model on new samples, whether in-domain or out-of-domain. In light of this, we discuss better methodologies for detecting bias in our models.
Tasks
Published	2019-11-01
URL	https://www.aclweb.org/anthology/D19-1662/
PDF	https://www.aclweb.org/anthology/D19-1662
PWC	https://paperswithcode.com/paper/adversarial-removal-of-demographic-attributes-2
Repo
Framework

Analysis and classification of heart diseases using heartbeat features and machine learning algorithms


Title	Analysis and classification of heart diseases using heartbeat features and machine learning algorithms
Authors	Fajr Ibrahem Alarsan, Mamoon Younes
Abstract	This study proposes an ECG (electrocardiogram) classification approach using machine learning based on several ECG features. An ECG is a signal that measures the electrical activity of the heart. The proposed approach is implemented using MLlib and the Scala language on the Apache Spark framework; MLlib is Apache Spark’s scalable machine learning library. The key challenge in ECG classification is handling the irregularities in ECG signals, which is very important for detecting the patient's status. Therefore, we propose an efficient approach to classify ECG signals with high accuracy. Each heartbeat is a combination of action impulse waveforms produced by different specialized cardiac tissues. Heartbeat classification is difficult because these waveforms differ from one person to another. The waveforms are described by a set of features, and these features are the inputs of the machine learning algorithm. In general, using Spark–Scala tools simplifies the use of many algorithms, such as machine learning (ML) algorithms. In addition, Spark–Scala is preferred over other tools when the volume of data to be processed is very large. In our case, we used a dataset with 205,146 records to evaluate the performance of our approach. Machine learning libraries in Spark–Scala provide easy ways to implement many classification algorithms (Decision Tree, Random Forests, Gradient-Boosted Trees (GDB), etc.). The proposed method is evaluated and validated on the baseline MIT-BIH Arrhythmia and MIT-BIH Supraventricular Arrhythmia databases. The results show that our approach achieved an overall accuracy of 96.75% using the GDB Tree algorithm and 97.98% using Random Forest for binary classification. For multi-class classification, it achieved 98.03% accuracy using Random Forest; the Gradient-Boosted Tree implementation supports only binary classification.
Tasks	ECG Classification, Electrocardiography (ECG), Heartbeat Classification
Published	2019-08-31
URL	https://doi.org/10.1186/s40537-019-0244-x
PDF	https://journalofbigdata.springeropen.com/track/pdf/10.1186/s40537-019-0244-x
PWC	https://paperswithcode.com/paper/analysis-and-classification-of-heart-diseases
Repo
Framework

From Research to Production and Back: Ludicrously Fast Neural Machine Translation


Title	From Research to Production and Back: Ludicrously Fast Neural Machine Translation
Authors	Young Jin Kim, Marcin Junczys-Dowmunt, Hany Hassan, Alham Fikri Aji, Kenneth Heafield, Roman Grundkiewicz, Nikolay Bogoychev
Abstract	This paper describes the submissions of the “Marian” team to the WNGT 2019 efficiency shared task. Taking our dominating submissions to the previous edition of the shared task as a starting point, we develop improved teacher-student training via multi-agent dual-learning and noisy backward-forward translation for Transformer-based student models. For efficient CPU-based decoding, we propose pre-packed 8-bit matrix products, improved batched decoding, cache-friendly student architectures with parameter sharing and light-weight RNN-based decoder architectures. GPU-based decoding benefits from the same architecture changes, from pervasive 16-bit inference and concurrent streams. These modifications together with profiler-based C++ code optimization allow us to push the Pareto frontier established during the 2018 edition towards 24x (CPU) and 14x (GPU) faster models at comparable or higher BLEU values. Our fastest CPU model is more than 4x faster than last year's fastest submission at more than 3 points higher BLEU. Our fastest GPU model at 1.5 seconds translation time is slightly faster than last year's fastest RNN-based submissions, but outperforms them by more than 4 BLEU and 10 BLEU points respectively.
Tasks	Machine Translation
Published	2019-11-01
URL	https://www.aclweb.org/anthology/D19-5632/
PDF	https://www.aclweb.org/anthology/D19-5632
PWC	https://paperswithcode.com/paper/from-research-to-production-and-back
Repo
Framework

Transformer and seq2seq model for Paraphrase Generation


Title	Transformer and seq2seq model for Paraphrase Generation
Authors	Elozino Egonmwan, Yllias Chali
Abstract	Paraphrase generation aims to improve the clarity of a sentence by using different wording that conveys similar meaning. For better quality of generated paraphrases, we propose a framework that combines the effectiveness of two models: transformer and sequence-to-sequence (seq2seq). We design a two-layer stack of encoders. The first layer is a transformer model containing 6 stacked identical layers with multi-head self-attention, while the second layer is a seq2seq model with gated recurrent units (GRU-RNN). The transformer encoder layer learns to capture long-term dependencies, together with syntactic and semantic properties of the input sentence. This rich vector representation learned by the transformer serves as input to the GRU-RNN encoder responsible for producing the state vector for decoding. Experimental results on two datasets, QUORA and MSCOCO, show that our framework sets a new benchmark for paraphrase generation.
Tasks	Paraphrase Generation
Published	2019-11-01
URL	https://www.aclweb.org/anthology/D19-5627/
PDF	https://www.aclweb.org/anthology/D19-5627
PWC	https://paperswithcode.com/paper/transformer-and-seq2seq-model-for-paraphrase
Repo
Framework

Reasoning-RCNN: Unifying Adaptive Global Reasoning Into Large-Scale Object Detection


Title	Reasoning-RCNN: Unifying Adaptive Global Reasoning Into Large-Scale Object Detection
Authors	Hang Xu, Chenhan Jiang, Xiaodan Liang, Liang Lin, Zhenguo Li
Abstract	In this paper, we address the large-scale object detection problem with thousands of categories, which poses severe challenges due to long-tail data distributions, heavy occlusions, and class ambiguities. However, the dominant object detection paradigm is limited by treating each object region separately without considering crucial semantic dependencies among objects. In this work, we introduce a novel Reasoning-RCNN to endow any detection network with the capability of adaptive global reasoning over all object regions by exploiting diverse human commonsense knowledge. Instead of only propagating the visual features on the image directly, we evolve the high-level semantic representations of all categories globally to avoid distracted or poor visual features in the image. Specifically, built on the feature representations of a basic detection network, the proposed network first generates a global semantic pool by collecting the weights of the previous classification layer for each category, and then adaptively enhances each object's features by attending to different semantic contexts in the global semantic pool. Rather than propagating information from all semantic information that may be noisy, our adaptive global reasoning automatically discovers the most relevant categories for feature evolving. Our Reasoning-RCNN is light-weight and flexible enough to enhance any detection backbone network, and extensible for integrating any knowledge resources. Solid experiments on object detection benchmarks show the superiority of our Reasoning-RCNN, e.g. achieving around 16% improvement on VisualGenome, 37% on ADE in terms of mAP and 15% improvement on COCO.
Tasks	Object Detection
Published	2019-06-01
URL	http://openaccess.thecvf.com/content_CVPR_2019/html/Xu_Reasoning-RCNN_Unifying_Adaptive_Global_Reasoning_Into_Large-Scale_Object_Detection_CVPR_2019_paper.html
PDF	http://openaccess.thecvf.com/content_CVPR_2019/papers/Xu_Reasoning-RCNN_Unifying_Adaptive_Global_Reasoning_Into_Large-Scale_Object_Detection_CVPR_2019_paper.pdf
PWC	https://paperswithcode.com/paper/reasoning-rcnn-unifying-adaptive-global
Repo
Framework