Paper Group NANR 206
Distributionally Robust Optimization Leads to Better Generalization: on SGD and Beyond
Title | Distributionally Robust Optimization Leads to Better Generalization: on SGD and Beyond |
Authors | Jikai Hou, Kaixuan Huang, Zhihua Zhang |
Abstract | In this paper, we adopt distributionally robust optimization (DRO) (Ben-Tal et al., 2013) in the hope of achieving better generalization in deep learning tasks. We establish generalization guarantees and analyze the localized Rademacher complexity for DRO, and conduct experiments showing that DRO obtains better performance. We reveal a profound connection between SGD and DRO: selecting a batch can be viewed as choosing a distribution over the training set. From this perspective, we prove that SGD is prone to escape from bad stationary points and that small-batch SGD outperforms large-batch SGD. We give an upper bound on the robust loss when SGD converges and remains stable. We propose a novel Weighted SGD (WSGD) algorithm framework, which assigns high-variance weights to the data of the current batch. We devise a practical implementation of WSGD that can directly optimize the robust loss. We test our algorithm on CIFAR-10 and CIFAR-100, and WSGD achieves significant improvements over conventional SGD. |
Tasks | |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=S1fDssA5Y7 |
PDF | https://openreview.net/pdf?id=S1fDssA5Y7 |
PWC | https://paperswithcode.com/paper/distributionally-robust-optimization-leads-to |
Repo | |
Framework | |
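The Weighted SGD idea in the abstract can be illustrated with a short sketch. The snippet below is a minimal, hedged interpretation, assuming "high-variance weights" means noisy per-example weights drawn for each batch; the paper's exact weight distribution and schedule are not given here, so names and defaults are illustrative.

```python
# Hedged sketch of one WSGD step: reweight per-example losses in the current
# batch with high-variance, non-negative weights (illustrative scheme only).
import torch

def wsgd_step(model, loss_fn, x_batch, y_batch, optimizer, weight_std=1.0):
    optimizer.zero_grad()
    per_example_loss = loss_fn(model(x_batch), y_batch)             # shape: (batch,)
    weights = torch.relu(1.0 + weight_std * torch.randn_like(per_example_loss))
    weights = weights * (per_example_loss.numel() / weights.sum())  # keep mean weight ~1
    (weights.detach() * per_example_loss).mean().backward()
    optimizer.step()

# Usage: pass loss_fn = torch.nn.CrossEntropyLoss(reduction="none") so the loss stays per-example.
```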
Towards Comprehensive Description Generation from Factual Attribute-value Tables
Title | Towards Comprehensive Description Generation from Factual Attribute-value Tables |
Authors | Tianyu Liu, Fuli Luo, Pengcheng Yang, Wei Wu, Baobao Chang, Zhifang Sui |
Abstract | Comprehensive descriptions of factual attribute-value tables, which should be accurate, informative, and loyal to the input, can be very helpful for end users trying to understand structured data in this form. However, previous neural generators may suffer from missing key attributes, uninformative content, and groundless information, which impede the generation of high-quality comprehensive descriptions for tables. To alleviate these problems, we first propose a force attention (FA) method that encourages the generator to pay more attention to uncovered attributes, so that key attributes are not omitted. Furthermore, we propose reinforcement learning for information richness to generate more informative as well as more loyal descriptions for tables. In our experiments, we use the widely used WIKIBIO dataset as a benchmark. In addition, we create WB-filter, based on WIKIBIO, to test our model in simulated user-oriented scenarios in which the generated descriptions should accord with particular user interests. Experimental results show that our model outperforms state-of-the-art baselines on both automatic and human evaluation. |
Tasks | |
Published | 2019-07-01 |
URL | https://www.aclweb.org/anthology/P19-1600/ |
PWC | https://paperswithcode.com/paper/towards-comprehensive-description-generation |
Repo | |
Framework | |
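The force attention idea in the abstract can be sketched as coverage-biased attention: attributes that have already received attention are down-weighted so the decoder is pushed toward uncovered ones. This is a generic, hedged reading of FA, not the paper's exact formulation; the bias strength and update rule below are illustrative.

```python
# Coverage-biased attention sketch: penalize already-covered attribute slots.
import numpy as np

def forced_attention(scores, coverage, strength=1.0):
    """scores: attention logits over attributes; coverage: cumulative past attention."""
    biased = scores - strength * coverage
    weights = np.exp(biased - biased.max())
    weights /= weights.sum()
    return weights, coverage + weights

scores, coverage = np.array([2.0, 0.5, 0.1]), np.zeros(3)
for _ in range(3):
    weights, coverage = forced_attention(scores, coverage)  # attention gradually shifts to uncovered slots
```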
Pingan Smart Health and SJTU at COIN - Shared Task: utilizing Pre-trained Language Models and Common-sense Knowledge in Machine Reading Tasks
Title | Pingan Smart Health and SJTU at COIN - Shared Task: utilizing Pre-trained Language Models and Common-sense Knowledge in Machine Reading Tasks |
Authors | Xiepeng Li, Zhexi Zhang, Wei Zhu, Zheng Li, Yuan Ni, Peng Gao, Junchi Yan, Guotong Xie |
Abstract | To solve the shared tasks of the COIN (COmmonsense INference in Natural Language Processing) Workshop, we need to explore the impact of knowledge representation in modeling commonsense knowledge to boost the performance of machine reading comprehension beyond simple text matching. There are two approaches to representing knowledge in a low-dimensional space. The first is to leverage a large-scale unsupervised text corpus to train fixed or contextual language representations. The second is to explicitly express knowledge in a knowledge graph (KG) and then fit a model to represent the facts in the KG. We have experimented with both (a) improving the fine-tuning of pre-trained language models on a task with a small dataset by leveraging datasets of similar tasks, and (b) incorporating the distributional representations of a KG into the representations of pre-trained language models, via simple concatenation or multi-head attention. We find that: (a) for task 1, first fine-tuning on larger datasets such as RACE (Lai et al., 2017) and SWAG (Zellers et al., 2018), and then fine-tuning on the target task, improves performance significantly; (b) for task 2, incorporating a KG of commonsense knowledge, WordNet (Miller, 1995), into the BERT model (Devlin et al., 2018) is helpful, but it hurts the performance of XLNet (Yang et al., 2019), a more powerful pre-trained model. Our approaches achieve state-of-the-art results on both shared tasks' official test data, outperforming all other submissions. |
Tasks | Common Sense Reasoning, Machine Reading Comprehension, Reading Comprehension, Text Matching |
Published | 2019-11-01 |
URL | https://www.aclweb.org/anthology/D19-6011/ |
PWC | https://paperswithcode.com/paper/pingan-smart-health-and-sjtu-at-coin-shared |
Repo | |
Framework | |
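As a concrete illustration of the "simple concatenation" fusion mentioned in the abstract, the sketch below concatenates a pooled language-model vector with an averaged KG (e.g., WordNet) embedding before a classifier head. Dimensions, the pooling choice, and the KG lookup are assumptions for illustration only.

```python
# Hedged sketch: fuse a pre-trained LM representation with KG embeddings by concatenation.
import torch
import torch.nn as nn

class KnowledgeFusionHead(nn.Module):
    def __init__(self, lm_dim=768, kg_dim=100, num_choices=2):
        super().__init__()
        self.classifier = nn.Linear(lm_dim + kg_dim, num_choices)

    def forward(self, lm_cls_vec, kg_vec):
        # lm_cls_vec: (batch, lm_dim) pooled LM output
        # kg_vec:     (batch, kg_dim), e.g., mean of WordNet synset embeddings linked to the input
        fused = torch.cat([lm_cls_vec, kg_vec], dim=-1)
        return self.classifier(fused)
```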
BLCU-NLP at COIN-Shared Task1: Stagewise Fine-tuning BERT for Commonsense Inference in Everyday Narrations
Title | BLCU-NLP at COIN-Shared Task1: Stagewise Fine-tuning BERT for Commonsense Inference in Everyday Narrations |
Authors | Chunhua Liu, Dong Yu |
Abstract | This paper describes our system for COIN Shared Task 1: Commonsense Inference in Everyday Narrations. To inject more external knowledge and better reason over the narrative passage, question, and answer, the system adopts a stagewise fine-tuning method based on the pre-trained BERT model. More specifically, the first stage fine-tunes on an additional machine reading comprehension dataset to learn more commonsense knowledge; the second stage fine-tunes on the target task (MCScript 2.0), assisted by the MCScript (2018) dataset. Experimental results show that our system achieves significant improvements over the baseline systems, with 84.2% accuracy on the official test dataset. |
Tasks | Machine Reading Comprehension, Reading Comprehension |
Published | 2019-11-01 |
URL | https://www.aclweb.org/anthology/D19-6012/ |
PWC | https://paperswithcode.com/paper/blcu-nlp-at-coin-shared-task1-stagewise-fine |
Repo | |
Framework | |
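The stagewise fine-tuning recipe can be summarized as: fine-tune once on an auxiliary MRC dataset, then continue fine-tuning the same weights on the target task. The loop below is a minimal sketch under the assumption of a HuggingFace-style model that returns a `.loss`; loaders, epochs, and learning rates are placeholders, not the paper's settings.

```python
# Two-stage fine-tuning sketch: auxiliary MRC data first, then the target task.
import torch

def fine_tune(model, loader, epochs, lr):
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    for _ in range(epochs):
        for batch in loader:
            optimizer.zero_grad()
            loss = model(**batch).loss   # assumes a HuggingFace-style output with .loss
            loss.backward()
            optimizer.step()
    return model

# Stage 1: auxiliary reading-comprehension data; Stage 2: target task (MCScript 2.0).
# model = fine_tune(model, auxiliary_loader, epochs=2, lr=3e-5)
# model = fine_tune(model, target_loader, epochs=3, lr=2e-5)
```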
A Submodular Feature-Aware Framework for Label Subset Selection in Extreme Classification Problems
Title | A Submodular Feature-Aware Framework for Label Subset Selection in Extreme Classification Problems |
Authors | Elham J. Barezi, Ian D. Wood, Pascale Fung, Hamid R. Rabiee |
Abstract | Extreme classification is classification over an extremely large number of labels (tags). User-generated labels for any type of online data can be sparse per individual user yet intractably large across all users. It would be useful to automatically select a smaller, standard set of labels to represent the whole label set. We can then efficiently solve the problem of multi-label learning with an intractably large number of interdependent labels, such as automatic tagging of Wikipedia pages. We propose a submodular maximization framework with linear cost to find informative labels that are most relevant to other labels yet least redundant with each other. A simple prediction model can then be trained on this label subset. Our framework includes both label-label and label-feature dependencies, aiming to find the labels with the most representation and prediction ability. In addition, to avoid information loss, we extract and predict outlier labels with weak dependency on other labels. We apply our model to four standard natural language datasets, including Bibsonomy entries with user-assigned tags, web pages with user-assigned tags, legal texts with EUROVOC descriptors (a topic hierarchy with almost 4,000 categories covering different aspects of European law), and Wikipedia pages with tags from social bookmarking, as well as news videos for automated label detection from a lexicon of semantic concepts. Experimental results show that our proposed approach improves label prediction quality, in terms of precision and nDCG, by 3% to 5% in three of the five tasks and is competitive in the others, even with a simple linear prediction model. An ablation study shows how different datasets benefit from different aspects of our model, with all aspects contributing substantially to at least one dataset. |
Tasks | Multi-Label Learning |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/N19-1106/ |
PWC | https://paperswithcode.com/paper/a-submodular-feature-aware-framework-for |
Repo | |
Framework | |
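The selection step in the abstract, finding labels that are relevant to other labels yet non-redundant, can be illustrated with a greedy facility-location-style maximization, a standard way to optimize submodular coverage objectives. The objective below is a simplified stand-in, not the paper's exact feature-aware formulation.

```python
# Greedy submodular label selection sketch over a label-label similarity matrix.
import numpy as np

def greedy_label_selection(sim, k):
    """sim: (L, L) nonnegative label-label similarity; returns indices of k selected labels."""
    num_labels = sim.shape[0]
    selected, best_cover = [], np.zeros(num_labels)
    for _ in range(k):
        # Marginal gain of each candidate: how much it improves every label's
        # coverage by its most similar selected label (a submodular objective).
        gains = np.maximum(sim, best_cover).sum(axis=1) - best_cover.sum()
        gains[selected] = -np.inf
        best = int(np.argmax(gains))
        selected.append(best)
        best_cover = np.maximum(best_cover, sim[best])
    return selected
```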
Equipping Educational Applications with Domain Knowledge
Title | Equipping Educational Applications with Domain Knowledge |
Authors | Tarek Sakakini, Hongyu Gong, Jong Yoon Lee, Robert Schloss, JinJun Xiong, Suma Bhat |
Abstract | One of the challenges of building natural language processing (NLP) applications for education is finding a large domain-specific corpus for the subject of interest (e.g., history or science). To address this challenge, we propose a tool, Dexter, that extracts a subject-specific corpus from a heterogeneous corpus, such as Wikipedia, by relying on a small seed corpus and distributed document representations. We empirically show the impact of the generated corpus on language modeling, estimating word embeddings, and, consequently, distractor generation, obtaining better performance than with a general-domain corpus, a heuristically constructed domain-specific corpus, or a corpus generated by a popular system, BootCaT. |
Tasks | Language Modelling, Word Embeddings |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-4448/ |
PWC | https://paperswithcode.com/paper/equipping-educational-applications-with |
Repo | |
Framework | |
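A plausible minimal form of the corpus-extraction step, based on the abstract's "small seed corpus and distributed document representations", is to rank candidate documents by cosine similarity to the seed centroid. This is a generic retrieval baseline, not necessarily Dexter's exact procedure.

```python
# Hedged sketch: pick documents closest to the seed-corpus centroid in embedding space.
import numpy as np

def select_domain_documents(seed_vecs, candidate_vecs, top_k=1000):
    centroid = seed_vecs.mean(axis=0)
    centroid /= np.linalg.norm(centroid)
    candidates = candidate_vecs / np.linalg.norm(candidate_vecs, axis=1, keepdims=True)
    scores = candidates @ centroid                 # cosine similarity to the seed centroid
    return np.argsort(-scores)[:top_k]             # indices of the most subject-specific documents
```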
Controlling Contents in Data-to-Document Generation with Human-Designed Topic Labels
Title | Controlling Contents in Data-to-Document Generation with Human-Designed Topic Labels |
Authors | Kasumi Aoki, Akira Miyazawa, Tatsuya Ishigaki, Tatsuya Aoki, Hiroshi Noji, Keiichi Goshima, Ichiro Kobayashi, Hiroya Takamura, Yusuke Miyao |
Abstract | We propose a data-to-document generator that can easily control the contents of output texts based on a neural language model. A conventional data-to-text model is useful when a reader seeks a global summary of the data, because it only has to describe an important part that has been extracted beforehand. However, what readers are interested in differs from user to user, so it is necessary to develop a method for generating various summaries according to users' interests. We develop a model that generates various summaries and controls their contents by providing explicit reference targets to the model as controllable factors. In the experiments, we used five-minute and one-hour charts of nine indicators (e.g., Nikkei 225) as time-series data and daily summaries of Nikkei Quick News as textual data. We conducted comparative experiments using two kinds of referential information for generation: human-designed topic labels indicating the contents of a sentence, and automatically extracted keywords. |
Tasks | Language Modelling, Time Series |
Published | 2019-10-01 |
URL | https://www.aclweb.org/anthology/W19-8640/ |
PWC | https://paperswithcode.com/paper/controlling-contents-in-data-to-document |
Repo | |
Framework | |
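One simple way to realize the "explicit targets as controllable factors" described in the abstract is to prepend control information (topic labels or extracted keywords) to the generator's input. The token format below is purely hypothetical and only illustrates the conditioning idea.

```python
# Illustrative conditioning sketch: prepend topic labels as control tokens to the model input.
def build_controlled_input(topic_labels, data_tokens):
    control_tokens = ["<topic:{}>".format(label) for label in topic_labels]
    return control_tokens + ["<sep>"] + data_tokens

print(build_controlled_input(["nikkei225_movement"], ["t1=+0.3", "t2=-0.1"]))
```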
Panlingua-KMI MT System for Similar Language Translation Task at WMT 2019
Title | Panlingua-KMI MT System for Similar Language Translation Task at WMT 2019 |
Authors | Atul Kr. Ojha, Ritesh Kumar, Akanksha Bansal, Priya Rani |
Abstract | The present paper describes the development of the Panlingua-KMI Machine Translation (MT) systems for the Hindi ↔ Nepali language pair, designed as part of the Similar Language Translation Task at the WMT 2019 Shared Task. The Panlingua-KMI team conducted a series of experiments to explore both phrase-based statistical (PBSMT) and neural (NMT) methods. Among the 11 MT systems prepared under this task, 6 PBSMT systems were prepared for Nepali-Hindi, 1 PBSMT system for Hindi-Nepali, and 2 NMT systems for Nepali ↔ Hindi. The results show that PBSMT can be an effective method for developing MT systems for closely related languages. Our Hindi-Nepali PBSMT system was ranked 2nd among the 13 systems submitted for the pair, and our Nepali-Hindi PBSMT system was ranked 4th among the 12 systems submitted for the task. |
Tasks | Machine Translation |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-5429/ |
PWC | https://paperswithcode.com/paper/panlingua-kmi-mt-system-for-similar-language |
Repo | |
Framework | |
Where and when to look? Spatial-temporal attention for action recognition in videos
Title | Where and when to look? Spatial-temporal attention for action recognition in videos |
Authors | Lili Meng, Bo Zhao, Bo Chang, Gao Huang, Frederick Tung, Leonid Sigal |
Abstract | Inspired by the observation that humans are able to process videos efficiently by only paying attention when and where it is needed, we propose a novel spatial-temporal attention mechanism for video-based action recognition. For spatial attention, we learn a saliency mask to allow the model to focus on the most salient parts of the feature maps. For temporal attention, we employ a soft temporal attention mechanism to identify the most relevant frames from an input video. Further, we propose a set of regularizers that ensure that our attention mechanism attends to coherent regions in space and time. Our model is efficient, as it proposes a separable spatio-temporal mechanism for video attention, while being able to identify important parts of the video both spatially and temporally. We demonstrate the efficacy of our approach on three public video action recognition datasets. The proposed approach leads to state-of-the-art performance on all of them, including the new large-scale Moments in Time dataset. Furthermore, we quantitatively and qualitatively evaluate our model’s ability to accurately localize discriminative regions spatially and critical frames temporally. This is despite our model only being trained with per video classification labels. |
Tasks | Action Recognition In Videos, Temporal Action Localization, Video Classification |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=BkesJ3R9YX |
PDF | https://openreview.net/pdf?id=BkesJ3R9YX |
PWC | https://paperswithcode.com/paper/where-and-when-to-look-spatial-temporal |
Repo | |
Framework | |
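The spatial and temporal attention described in the abstract can be sketched as a per-frame saliency mask followed by softmax weights over frames. Shapes, pooling, and layer sizes below are illustrative assumptions rather than the paper's architecture.

```python
# Hedged sketch of separable spatial-temporal attention over per-frame feature maps.
import torch
import torch.nn as nn

class SpatialTemporalAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.spatial = nn.Conv2d(channels, 1, kernel_size=1)  # per-frame saliency mask
        self.temporal = nn.Linear(channels, 1)                 # per-frame importance score

    def forward(self, feats):                                  # feats: (T, C, H, W) for one video
        masks = torch.sigmoid(self.spatial(feats))             # (T, 1, H, W)
        frame_vecs = (feats * masks).mean(dim=(2, 3))          # (T, C) spatially attended
        frame_weights = torch.softmax(self.temporal(frame_vecs).squeeze(-1), dim=0)  # (T,)
        return (frame_weights.unsqueeze(-1) * frame_vecs).sum(dim=0)  # (C,) video descriptor
```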
CoAStaL at SemEval-2019 Task 3: Affect Classification in Dialogue using Attentive BiLSTMs
Title | CoAStaL at SemEval-2019 Task 3: Affect Classification in Dialogue using Attentive BiLSTMs |
Authors | Ana Valeria González, Victor Petrén Bach Hansen, Joachim Bingel, Anders Søgaard |
Abstract | This work describes the system presented by the CoAStaL Natural Language Processing group at the University of Copenhagen. Our main system uses the same attention mechanism presented in (Yang et al., 2016). Our overall model architecture is also inspired by their hierarchical classification model and is adapted to classification in dialogue by encoding information at the turn level. We use a different encoding for each turn to create a more expressive representation of dialogue context, which is then fed into our classifier. We also define a custom preprocessing step in order to deal with language commonly used in interactions across many social media outlets. Our proposed system achieves a micro F1 score of 0.7340 on the test set and shows significant gains in performance compared to a system using dialogue-level encoding. |
Tasks | |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/S19-2026/ |
PWC | https://paperswithcode.com/paper/coastal-at-semeval-2019-task-3-affect |
Repo | |
Framework | |
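The turn-level encoding with attention described in the abstract follows the familiar attentive BiLSTM pattern: run a BiLSTM over a turn's tokens and attention-pool the hidden states into one turn vector. The sketch below shows only that pattern; dimensions are illustrative, and the full system stacks these turn vectors for classification.

```python
# Minimal attentive BiLSTM turn encoder sketch.
import torch
import torch.nn as nn

class AttentiveTurnEncoder(nn.Module):
    def __init__(self, emb_dim=100, hidden=64):
        super().__init__()
        self.bilstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.attention = nn.Linear(2 * hidden, 1)

    def forward(self, token_embs):                 # (batch, seq_len, emb_dim)
        states, _ = self.bilstm(token_embs)        # (batch, seq_len, 2 * hidden)
        weights = torch.softmax(self.attention(states).squeeze(-1), dim=1)
        return (weights.unsqueeze(-1) * states).sum(dim=1)   # (batch, 2 * hidden) turn vector
```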
Predicted Variables in Programming
Title | Predicted Variables in Programming |
Authors | Victor Carbune, Thierry Coppey, Alexander Daryin, Thomas Deselaers, Nikhil Sarda, Jay Yagnik |
Abstract | We present Predicted Variables (PVars), an approach to making machine learning (ML) a first-class citizen in programming languages. There is a growing divide in approaches to building systems: using human experts (e.g., programming) on the one hand, and using behavior learned from data (e.g., ML) on the other. PVars aim to make using ML in programming easier by hybridizing the two. We leverage the existing concept of variables and create a new type, a predicted variable. PVars are akin to native variables with one important distinction: PVars determine their value using ML when evaluated. We describe PVars and their interface, how they can be used in programming, and demonstrate the feasibility of our approach on three algorithmic problems: binary search, QuickSort, and caches. We show experimentally that PVars are able to improve over the commonly used heuristics and lead to better performance than the original algorithms. As opposed to previous work applying ML to algorithmic problems, PVars have the advantage that they can be used within existing frameworks and do not require existing domain knowledge to be replaced. PVars allow for a seamless integration of ML into existing systems and algorithms. Our PVars implementation currently relies on standard reinforcement learning (RL) methods. To learn faster, PVars use the heuristic function they are replacing as an initial function. We show that PVars quickly pick up the behavior of the initial function and then improve performance beyond that without ever performing substantially worse, allowing for a safe deployment in critical applications. |
Tasks | |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=B1epooR5FX |
PDF | https://openreview.net/pdf?id=B1epooR5FX |
PWC | https://paperswithcode.com/paper/predicted-variables-in-programming |
Repo | |
Framework | |
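The predicted-variable interface can be caricatured as a variable that consults a learned policy but can fall back to the heuristic it replaces. The class below is a guess at the spirit of the idea, not the paper's actual API; the policy, exploration rate, and feedback hook are placeholders.

```python
# Hedged sketch of a predicted variable with the original heuristic as its initial function.
import random

class PredictedVariable:
    def __init__(self, initial_fn, epsilon=0.1):
        self.initial_fn = initial_fn   # existing heuristic, used as a safe default
        self.policy = None             # placeholder for an RL-trained policy
        self.epsilon = epsilon

    def value(self, observation):
        if self.policy is None or random.random() < self.epsilon:
            return self.initial_fn(observation)    # start from the heuristic's behavior
        return self.policy(observation)

    def feedback(self, reward):
        pass  # in the paper, observations and rewards drive standard RL updates

# Example: a pivot choice for QuickSort, initialized with the "middle element" heuristic.
pivot = PredictedVariable(lambda arr: len(arr) // 2)
print(pivot.value([3, 1, 4, 1, 5, 9]))
```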
MICHAEL: Mining Character-level Patterns for Arabic Dialect Identification (MADAR Challenge)
Title | MICHAEL: Mining Character-level Patterns for Arabic Dialect Identification (MADAR Challenge) |
Authors | Dhaou Ghoul, Gaël Lejeune |
Abstract | We present MICHAEL, a simple lightweight method for automatic Arabic Dialect Identification in the MADAR travel-domain Dialect Identification (DID) task. MICHAEL uses simple character-level features in order to perform preprocessing-free classification. More precisely, character n-grams extracted from the original sentences are used to train a Multinomial Naive Bayes classifier. This system achieved an official score (accuracy) of 53.25% with 1 ≤ n ≤ 3, but showed a much better result with character 4-grams (62.17% accuracy). |
Tasks | |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-4627/ |
PWC | https://paperswithcode.com/paper/michael-mining-character-level-patterns-for |
Repo | |
Framework | |
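Because the abstract fully specifies the model (character n-grams fed to Multinomial Naive Bayes), a scikit-learn pipeline reproduces the setup almost directly; only the data loading is omitted, and the n-gram range shown is the officially submitted 1-3 setting.

```python
# Character n-gram + Multinomial Naive Bayes pipeline for dialect identification.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

clf = make_pipeline(
    CountVectorizer(analyzer="char", ngram_range=(1, 3)),  # raw characters, no preprocessing
    MultinomialNB(),
)
# clf.fit(train_sentences, train_dialect_labels)
# predicted_dialects = clf.predict(test_sentences)
# Per the abstract, ngram_range=(4, 4) scored higher (62.17% accuracy) than the official run.
```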
Cross-lingual Multi-Level Adversarial Transfer to Enhance Low-Resource Name Tagging
Title | Cross-lingual Multi-Level Adversarial Transfer to Enhance Low-Resource Name Tagging |
Authors | Lifu Huang, Heng Ji, Jonathan May |
Abstract | We focus on improving name tagging for low-resource languages using annotations from related languages. Previous studies either directly project annotations from a source language to a target language using cross-lingual representations or use a shared encoder in a multitask network to transfer knowledge. These approaches inevitably introduce noise to the target language annotation due to mismatched source-target sentence structures. To effectively transfer the resources, we develop a new neural architecture that leverages multi-level adversarial transfer: (1) word-level adversarial training, which projects source language words into the same semantic space as those of the target language without using any parallel corpora or bilingual gazetteers, and (2) sentence-level adversarial training, which yields language-agnostic sequential features. Our neural architecture outperforms previous approaches on CoNLL data sets. Moreover, on 10 low-resource languages, our approach achieves up to 16% absolute F-score gain over all high-performing baselines on cross-lingual transfer without using any target-language resources. |
Tasks | Cross-Lingual Transfer |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/N19-1383/ |
PWC | https://paperswithcode.com/paper/cross-lingual-multi-level-adversarial |
Repo | |
Framework | |
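The word-level adversarial training described in the abstract follows the general embedding-alignment recipe: a linear mapping projects source-language word vectors, a discriminator tries to distinguish mapped-source from target vectors, and the mapping is trained to fool it. The sketch below shows that recipe only; the paper's full model adds sentence-level adversarial training and a tagging network on top.

```python
# Hedged sketch of word-level adversarial embedding alignment.
import torch
import torch.nn as nn

dim = 300
mapping = nn.Linear(dim, dim, bias=False)                      # projects source embeddings
discriminator = nn.Sequential(nn.Linear(dim, 128), nn.ReLU(), nn.Linear(128, 1))
bce = nn.BCEWithLogitsLoss()
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
opt_m = torch.optim.Adam(mapping.parameters(), lr=1e-4)

def adversarial_step(src_batch, tgt_batch):
    # 1) Discriminator: mapped source should get label 0, target label 1.
    opt_d.zero_grad()
    d_loss = bce(discriminator(mapping(src_batch).detach()), torch.zeros(len(src_batch), 1)) \
           + bce(discriminator(tgt_batch), torch.ones(len(tgt_batch), 1))
    d_loss.backward()
    opt_d.step()
    # 2) Mapping: make mapped source look like target (label 1) to the discriminator.
    opt_m.zero_grad()
    m_loss = bce(discriminator(mapping(src_batch)), torch.ones(len(src_batch), 1))
    m_loss.backward()
    opt_m.step()
```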
LAP-Net: Level-Aware Progressive Network for Image Dehazing
Title | LAP-Net: Level-Aware Progressive Network for Image Dehazing |
Authors | Yunan Li, Qiguang Miao, Wanli Ouyang, Zhenxin Ma, Huijuan Fang, Chao Dong, Yining Quan |
Abstract | In this paper, we propose a level-aware progressive network (LAP-Net) for single image dehazing. Unlike previous multi-stage algorithms that generally learn in a coarse-to-fine fashion, each stage of LAP-Net learns a different level of haze under different supervision, so the network can progressively handle increasingly heavy haze. With this design, each stage can focus on regions with a specific haze level and restore clear details. To effectively fuse the results of varying haze levels at different stages, we develop an adaptive integration strategy to yield the final dehazed image. This strategy is realized by a hierarchical integration scheme, which cooperates with a memory network and the domain knowledge of dehazing to highlight the best-restored regions of each stage. Extensive experiments on both real-world images and two dehazing benchmarks validate the effectiveness of our proposed method. |
Tasks | Image Dehazing, Single Image Dehazing |
Published | 2019-10-01 |
URL | http://openaccess.thecvf.com/content_ICCV_2019/html/Li_LAP-Net_Level-Aware_Progressive_Network_for_Image_Dehazing_ICCV_2019_paper.html |
PDF | http://openaccess.thecvf.com/content_ICCV_2019/papers/Li_LAP-Net_Level-Aware_Progressive_Network_for_Image_Dehazing_ICCV_2019_paper.pdf |
PWC | https://paperswithcode.com/paper/lap-net-level-aware-progressive-network-for |
Repo | |
Framework | |
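The adaptive integration step, combining the outputs of stages that handle different haze levels, can be sketched as a per-pixel softmax over stage confidence maps. This abstracts away the memory network and hierarchical scheme the abstract mentions; shapes are illustrative.

```python
# Hedged sketch of per-pixel adaptive fusion of multi-stage dehazing outputs.
import torch

def fuse_stages(stage_images, stage_confidences):
    """stage_images: (S, 3, H, W) restored images; stage_confidences: (S, 1, H, W) scores."""
    weights = torch.softmax(stage_confidences, dim=0)   # normalize across stages at every pixel
    return (weights * stage_images).sum(dim=0)          # (3, H, W) fused dehazed image
```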
Person Retrieval in Surveillance Video using Height, Color and Gender
Title | Person Retrieval in Surveillance Video using Height, Color and Gender |
Authors | Hiren Galiyawala, Kenil Shah, Vandit Gajjar, Mehul S. Raval |
Abstract | A person is commonly described by attributes like height, build, cloth color, cloth type, and gender. Such attributes are known as soft biometrics. They bridge the semantic gap between human description and person retrieval in surveillance video. The paper proposes a deep learning-based linear filtering approach for person retrieval using height, cloth color, and gender. The proposed approach uses Mask R-CNN for pixel-wise person segmentation, which removes background clutter and provides a precise boundary around the person. Color and gender models are fine-tuned using AlexNet, and the algorithm is tested on the SoftBioSearch dataset. It achieves good accuracy for person retrieval using semantic queries under challenging conditions. |
Tasks | Person Retrieval |
Published | 2019-02-14 |
URL | https://ieeexplore.ieee.org/document/8639145 |
PWC | https://paperswithcode.com/paper/person-retrieval-in-surveillance-video-using |
Repo | |
Framework | |
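The "linear filtering" retrieval described in the abstract can be read as applying the queried attributes one after another to the set of detected persons. The sketch below is only that filtering skeleton; the height, color, and gender estimators stand in for the trained Mask R-CNN / AlexNet-based models, and the query field names are hypothetical.

```python
# Hedged sketch of attribute-based linear filtering for person retrieval.
def retrieve_person(detections, query, estimate_height, estimate_color, estimate_gender):
    candidates = [d for d in detections
                  if abs(estimate_height(d) - query["height_cm"]) <= query.get("height_tol_cm", 10)]
    candidates = [d for d in candidates if estimate_color(d) == query["torso_color"]]
    candidates = [d for d in candidates if estimate_gender(d) == query["gender"]]
    return candidates
```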