Paper Group AWR 355
Adapting Multilingual Neural Machine Translation to Unseen Languages
Title | Adapting Multilingual Neural Machine Translation to Unseen Languages |
Authors | Surafel M. Lakew, Alina Karakanta, Marcello Federico, Matteo Negri, Marco Turchi |
Abstract | Multilingual Neural Machine Translation (MNMT) for low-resource languages (LRL) can be enhanced by the presence of related high-resource languages (HRL), but the relatedness of HRL usually relies on predefined linguistic assumptions about language similarity. Recently, adapting MNMT to an LRL has been shown to greatly improve performance. In this work, we explore the problem of adapting an MNMT model to an unseen LRL using data selection and model adaptation. In order to improve NMT for LRL, we employ perplexity to select HRL data that are most similar to the LRL on the basis of language distance. We extensively explore data selection in popular multilingual NMT settings, namely in (zero-shot) translation, and in adaptation from a multilingual pre-trained model, for both directions (LRL-en). We further show that dynamic adaptation of the model’s vocabulary results in a more favourable segmentation for the LRL in comparison with direct adaptation. Experiments show reductions in training time and significant performance gains over LRL baselines, even with zero LRL data (+13.0 BLEU), up to +17.0 BLEU for pre-trained multilingual model dynamic adaptation with related data selection. Our method outperforms current approaches, such as massively multilingual models and data augmentation, on four LRL. |
Tasks | Data Augmentation, Machine Translation |
Published | 2019-10-30 |
URL | https://arxiv.org/abs/1910.13998v1, https://arxiv.org/pdf/1910.13998v1.pdf |
PWC | https://paperswithcode.com/paper/adapting-multilingual-neural-machine |
Repo | https://github.com/surafelml/adapt-mnmt |
Framework | none |
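The perplexity-based data selection described in the abstract above can be illustrated with a minimal sketch. It assumes an add-one-smoothed unigram language model over whitespace tokens, which is a stand-in chosen for brevity; the abstract does not say which language model the authors use, and the function names train_unigram, perplexity and select_by_perplexity are hypothetical.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Fit an add-one-smoothed unigram model and return its log-probability function."""
    counts = Counter(tok for s in sentences for tok in s.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # the extra slot reserves probability mass for unseen tokens

    def logprob(tok):
        return math.log((counts.get(tok, 0) + 1) / (total + vocab))

    return logprob


def perplexity(sentence, logprob):
    """Per-token perplexity of one sentence under the given model."""
    toks = sentence.split()
    if not toks:
        return float("inf")  # empty lines are never selected first
    return math.exp(-sum(logprob(t) for t in toks) / len(toks))


def select_by_perplexity(lrl_sentences, hrl_sentences, k):
    """Keep the k HRL sentences that the LRL-trained model finds least surprising."""
    logprob = train_unigram(lrl_sentences)
    return sorted(hrl_sentences, key=lambda s: perplexity(s, logprob))[:k]
```

Lower perplexity under the LRL model is used here as the proxy for similarity to the LRL, so the selected HRL sentences are the ones that share the most vocabulary with the LRL sample.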
Adversarial Policy Gradient for Deep Learning Image Augmentation
Title | Adversarial Policy Gradient for Deep Learning Image Augmentation |
Authors | Kaiyang Cheng, Claudia Iriondo, Francesco Calivá, Justin Krogue, Sharmila Majumdar, Valentina Pedoia |
Abstract | The use of semantic segmentation for masking and cropping input images has proven to be a significant aid in medical imaging classification tasks by decreasing the noise and variance of the training dataset. However, implementing this approach with classical methods is challenging: the cost of obtaining a dense segmentation is high, and the precise input area that is most crucial to the classification task is difficult to determine a priori. We propose a novel joint-training deep reinforcement learning framework for image augmentation. A segmentation network, weakly supervised with policy gradient optimization, acts as an agent, and outputs masks as actions given samples as states, with the goal of maximizing reward signals from the classification network. In this way, the segmentation network learns to mask unimportant imaging features. Our method, Adversarial Policy Gradient Augmentation (APGA), shows promising results on Stanford’s MURA dataset and on a hip fracture classification task with an increase in global accuracy of up to 7.33% and improved performance over baseline methods in 9/10 tasks evaluated. We discuss the broad applicability of our joint training strategy to a variety of medical imaging tasks. |
Tasks | Image Augmentation, Semantic Segmentation |
Published | 2019-09-09 |
URL | https://arxiv.org/abs/1909.04108v1, https://arxiv.org/pdf/1909.04108v1.pdf |
PWC | https://paperswithcode.com/paper/adversarial-policy-gradient-for-deep-learning |
Repo | https://github.com/victorychain/Adversarial-Policy-Gradient-Augmentation |
Framework | pytorch |
TalkSumm: A Dataset and Scalable Annotation Method for Scientific Paper Summarization Based on Conference Talks
Title | TalkSumm: A Dataset and Scalable Annotation Method for Scientific Paper Summarization Based on Conference Talks |
Authors | Guy Lev, Michal Shmueli-Scheuer, Jonathan Herzig, Achiya Jerbi, David Konopnicki |
Abstract | Currently, no large-scale training data is available for the task of scientific paper summarization. In this paper, we propose a novel method that automatically generates summaries for scientific papers, by utilizing videos of talks at scientific conferences. We hypothesize that such talks constitute a coherent and concise description of the papers’ content, and can form the basis for good summaries. We collected 1716 papers and their corresponding videos, and created a dataset of paper summaries. A model trained on this dataset achieves performance similar to that of models trained on a dataset of summaries created manually. In addition, the quality of our summaries was validated by human experts. |
Tasks | |
Published | 2019-06-04 |
URL | https://arxiv.org/abs/1906.01351v2, https://arxiv.org/pdf/1906.01351v2.pdf |
PWC | https://paperswithcode.com/paper/talksumm-a-dataset-and-scalable-annotation |
Repo | https://github.com/levguy/talksumm |
Framework | none |
Story Ending Prediction by Transferable BERT
Title | Story Ending Prediction by Transferable BERT |
Authors | Zhongyang Li, Xiao Ding, Ting Liu |
Abstract | Recent advances, such as GPT and BERT, have shown success in incorporating a pre-trained transformer language model and fine-tuning operation to improve downstream NLP systems. However, this framework still has some fundamental problems in effectively incorporating supervised knowledge from other related tasks. In this study, we investigate a transferable BERT (TransBERT) training framework, which can transfer not only general language knowledge from large-scale unlabeled data but also specific kinds of knowledge from various semantically related supervised tasks, for a target task. Particularly, we propose utilizing three kinds of transfer tasks, including natural language inference, sentiment classification, and next action prediction, to further train BERT based on a pre-trained model. This enables the model to get a better initialization for the target task. We take story ending prediction as the target task to conduct experiments. The final result, an accuracy of 91.8%, dramatically outperforms previous state-of-the-art baseline methods. Several comparative experiments give some helpful suggestions on how to select transfer tasks. Error analysis shows the strengths and weaknesses of BERT-based models for story ending prediction. |
Tasks | Language Modelling, Natural Language Inference, Sentiment Analysis |
Published | 2019-05-17 |
URL | https://arxiv.org/abs/1905.07504v2, https://arxiv.org/pdf/1905.07504v2.pdf |
PWC | https://paperswithcode.com/paper/story-ending-prediction-by-transferable-bert |
Repo | https://github.com/eecrazy/TransBERT_ijcai2019 |
Framework | pytorch |
Learning Optimal Data Augmentation Policies via Bayesian Optimization for Image Classification Tasks
Title | Learning Optimal Data Augmentation Policies via Bayesian Optimization for Image Classification Tasks |
Authors | Chunxu Zhang, Jiaxu Cui, Bo Yang |
Abstract | In recent years, deep learning has made remarkable achievements in many fields, including computer vision, natural language processing, speech recognition and others. Adequate training data is the key to ensuring the effectiveness of deep models. However, obtaining valid data requires a lot of time and labor. Data augmentation (DA) is an effective alternative approach, which can generate new labeled data based on existing data using label-preserving transformations. Although we can benefit a lot from DA, designing appropriate DA policies requires a lot of expert experience and time, and evaluating candidate policies while searching for the optimal ones is costly. So we raise a new question in this paper: how to achieve automated data augmentation at as low a cost as possible? We propose a method named BO-Aug for automating the process by finding the optimal DA policies using the Bayesian optimization approach. Our method can find the optimal policies at a relatively low search cost, and the searched policies based on a specific dataset are transferable across different neural network architectures or even different datasets. We validate BO-Aug on three widely used image classification datasets, including CIFAR-10, CIFAR-100 and SVHN. Experimental results show that the proposed method can achieve state-of-the-art or near state-of-the-art classification accuracy. Code to reproduce our experiments is available at https://github.com/zhangxiaozao/BO-Aug. |
Tasks | Data Augmentation, Image Augmentation, Image Classification, Speech Recognition |
Published | 2019-05-06 |
URL | https://arxiv.org/abs/1905.02610v2, https://arxiv.org/pdf/1905.02610v2.pdf |
PWC | https://paperswithcode.com/paper/learning-optimal-data-augmentation-policies |
Repo | https://github.com/zhangxiaozao/BO-Aug |
Framework | tf |
Improved Image Augmentation for Convolutional Neural Networks by Copyout and CopyPairing
Title | Improved Image Augmentation for Convolutional Neural Networks by Copyout and CopyPairing |
Authors | Philip May |
Abstract | Image augmentation is a widely used technique to improve the performance of convolutional neural networks (CNNs). Commonly, image shifting, cropping, flipping, shearing and rotating are used for augmentation. But there are more advanced techniques like Cutout and SamplePairing. In this work we present two improvements of the state-of-the-art Cutout and SamplePairing techniques. Our new method called Copyout takes a square patch of another random training image and copies it onto a random location of each image used for training. The second technique we discovered is called CopyPairing. It combines Copyout and SamplePairing for further augmentation and even better performance. We run different experiments with these augmentation techniques on the CIFAR-10 dataset to evaluate and compare them under different configurations. In our experiments we show that Copyout reduces the test error rate by 8.18% compared with Cutout and 4.27% compared with SamplePairing. CopyPairing reduces the test error rate by 11.97% compared with Cutout and 8.21% compared with SamplePairing. Copyout and CopyPairing implementations are available at https://github.com/t-systems-on-site-services-gmbh/coocop. |
Tasks | Image Augmentation |
Published | 2019-09-01 |
URL | https://arxiv.org/abs/1909.00390v2, https://arxiv.org/pdf/1909.00390v2.pdf |
PWC | https://paperswithcode.com/paper/improved-image-augmentation-for-convolutional |
Repo | https://github.com/t-systems-on-site-services-gmbh/coocop |
Framework | none |
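The Copyout operation described in the abstract above (a square patch of another random training image copied onto a random location of each training image) can be sketched in NumPy as follows. This is not the code from the linked repository. The function names copyout and copyout_batch are hypothetical, and two details are assumptions the abstract does not settle: the patch is cut from a random location of the donor image, and it always lies fully inside the target image.

```python
import numpy as np


def copyout(image, donor, patch_size, rng):
    """Return a copy of `image` with one square patch of `donor` pasted in.

    `image` and `donor` are assumed to be arrays of identical shape (H, W, ...).
    """
    h, w = image.shape[:2]
    size = min(patch_size, h, w)
    # Assumption: the source location in the donor is random as well.
    sy = rng.integers(0, h - size + 1)
    sx = rng.integers(0, w - size + 1)
    # Random target location, kept fully inside the image (assumption).
    ty = rng.integers(0, h - size + 1)
    tx = rng.integers(0, w - size + 1)
    out = image.copy()
    out[ty:ty + size, tx:tx + size] = donor[sy:sy + size, sx:sx + size]
    return out


def copyout_batch(batch, patch_size, rng):
    """Apply copyout to every image in a batch of at least two images."""
    n = len(batch)
    out = np.empty_like(batch)
    for i in range(n):
        # Offset in 1..n-1 guarantees the donor index differs from i.
        j = (i + rng.integers(1, n)) % n
        out[i] = copyout(batch[i], batch[j], patch_size, rng)
    return out
```

Labels are left unchanged in this sketch, since the abstract describes Copyout as an operation on the image only.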
Augmented Memory for Correlation Filters in Real-Time UAV Tracking
Title | Augmented Memory for Correlation Filters in Real-Time UAV Tracking |
Authors | Yiming Li, Changhong Fu, Fangqiang Ding, Ziyuan Huang, Jia Pan |
Abstract | The outstanding computational efficiency of the discriminative correlation filter (DCF) fades away with various complicated improvements. Previous appearances are also gradually forgotten due to the exponential decay of historical views in the traditional appearance updating scheme of the DCF framework, reducing the model’s robustness. In this work, a novel tracker based on the DCF framework is proposed to augment memory of previously appeared views while running at real-time speed. Several historical views and the current view are simultaneously introduced in training to allow the tracker to adapt to new appearances as well as memorize previous ones. A novel rapid compressed context learning is proposed to increase the discriminative ability of the filter efficiently. Substantial experiments on the UAVDT and UAV123 datasets have validated that the proposed tracker performs competitively against 26 other top DCF and deep-based trackers, running at over 40 FPS on CPU. |
Tasks | |
Published | 2019-09-24 |
URL | https://arxiv.org/abs/1909.10989v1, https://arxiv.org/pdf/1909.10989v1.pdf |
PWC | https://paperswithcode.com/paper/augmented-memory-for-correlation-filters-in |
Repo | https://github.com/vision4robotics/AMCF-tracker |
Framework | none |
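The exponential decay of historical views mentioned in the abstract above can be made concrete with a small sketch. It assumes the running-average appearance update model_t = (1 - lr) * model_(t-1) + lr * view_t that is common in DCF trackers; the abstract does not spell out the update rule, and the names view_weights and lr are hypothetical.

```python
def view_weights(num_frames, lr):
    """Weight of each past view in the model after `num_frames` running-average updates.

    Index 0 is the first frame, the last index is the current frame.
    """
    weights = [1.0]  # the model is initialised from the first view alone
    for _ in range(num_frames - 1):
        # Every update shrinks all old weights by (1 - lr) and adds the new view with weight lr.
        weights = [w * (1.0 - lr) for w in weights] + [lr]
    return weights
```

Under this assumed rule, a view seen k updates ago contributes with weight lr * (1 - lr)**k, so early appearances fade quickly. Keeping several historical views in training, as the abstract proposes, is aimed at that forgetting.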
EDVR: Video Restoration with Enhanced Deformable Convolutional Networks
Title | EDVR: Video Restoration with Enhanced Deformable Convolutional Networks |
Authors | Xintao Wang, Kelvin C. K. Chan, Ke Yu, Chao Dong, Chen Change Loy |
Abstract | Video restoration tasks, including super-resolution, deblurring, etc., are drawing increasing attention in the computer vision community. A challenging benchmark named REDS is released in the NTIRE19 Challenge. This new benchmark challenges existing methods from two aspects: (1) how to align multiple frames given large motions, and (2) how to effectively fuse different frames with diverse motion and blur. In this work, we propose a novel Video Restoration framework with Enhanced Deformable networks, termed EDVR, to address these challenges. First, to handle large motions, we devise a Pyramid, Cascading and Deformable (PCD) alignment module, in which frame alignment is done at the feature level using deformable convolutions in a coarse-to-fine manner. Second, we propose a Temporal and Spatial Attention (TSA) fusion module, in which attention is applied both temporally and spatially, so as to emphasize important features for subsequent restoration. Thanks to these modules, our EDVR wins first place and outperforms the second place by a large margin in all four tracks in the NTIRE19 video restoration and enhancement challenges. EDVR also demonstrates superior performance to state-of-the-art published methods on video super-resolution and deblurring. The code is available at https://github.com/xinntao/EDVR. |
Tasks | Deblurring, Super-Resolution, Video Super-Resolution |
Published | 2019-05-07 |
URL | https://arxiv.org/abs/1905.02716v1, https://arxiv.org/pdf/1905.02716v1.pdf |
PWC | https://paperswithcode.com/paper/edvr-video-restoration-with-enhanced |
Repo | https://github.com/xinntao/EDVR |
Framework | pytorch |
How to Evaluate Word Representations of Informal Domain?
Title | How to Evaluate Word Representations of Informal Domain? |
Authors | Yekun Chai, Naomi Saphra, Adam Lopez |
Abstract | Diverse word representations have surged in most state-of-the-art natural language processing (NLP) applications. Nevertheless, how to efficiently evaluate such word embeddings in informal domains such as Twitter or forums remains an ongoing challenge due to the lack of sufficient evaluation datasets. We derived a large list of variant spelling pairs from UrbanDictionary with the automatic approaches of weakly-supervised pattern-based bootstrapping and self-training linear-chain conditional random field (CRF). With these extracted relation pairs we promote the odds of eliding the text normalization procedure of traditional NLP pipelines and directly adopting representations of non-standard words in the informal domain. Our code is available. |
Tasks | Word Embeddings |
Published | 2019-11-12 |
URL | https://arxiv.org/abs/1911.04669v2, https://arxiv.org/pdf/1911.04669v2.pdf |
PWC | https://paperswithcode.com/paper/how-to-evaluate-word-representations-of |
Repo | https://github.com/cyk1337/UrbanDict |
Framework | none |
KagNet: Knowledge-Aware Graph Networks for Commonsense Reasoning
Title | KagNet: Knowledge-Aware Graph Networks for Commonsense Reasoning |
Authors | Bill Yuchen Lin, Xinyue Chen, Jamin Chen, Xiang Ren |
Abstract | Commonsense reasoning aims to empower machines with the human ability to make presumptions about ordinary situations in our daily life. In this paper, we propose a textual inference framework for answering commonsense questions, which effectively utilizes external, structured commonsense knowledge graphs to perform explainable inferences. The framework first grounds a question-answer pair from the semantic space to the knowledge-based symbolic space as a schema graph, a related sub-graph of external knowledge graphs. It represents schema graphs with a novel knowledge-aware graph network module named KagNet, and finally scores answers with graph representations. Our model is based on graph convolutional networks and LSTMs, with a hierarchical path-based attention mechanism. The intermediate attention scores make it transparent and interpretable, thus producing trustworthy inferences. Using ConceptNet as the only external resource for BERT-based models, we achieved state-of-the-art performance on CommonsenseQA, a large-scale dataset for commonsense reasoning. |
Tasks | Common Sense Reasoning, Knowledge Base Question Answering, Knowledge Graphs, Natural Language Inference |
Published | 2019-09-04 |
URL | https://arxiv.org/abs/1909.02151v1, https://arxiv.org/pdf/1909.02151v1.pdf |
PWC | https://paperswithcode.com/paper/kagnet-knowledge-aware-graph-networks-for |
Repo | https://github.com/INK-USC/KagNet |
Framework | pytorch |
Mining Objects: Fully Unsupervised Object Discovery and Localization From a Single Image
Title | Mining Objects: Fully Unsupervised Object Discovery and Localization From a Single Image |
Authors | Runsheng Zhang, Yaping Huang, Mengyang Pu, Jian Zhang, Qingji Guan, Qi Zou, Haibin Ling |
Abstract | The goal of our work is to discover dominant objects without using any annotations. We focus on performing unsupervised object discovery and localization in a very general setting where only a single image is given. This is far more challenging than typical co-localization or weakly-supervised localization tasks. To tackle this problem, we propose a simple but effective pattern mining-based method, called Object Location Mining (OLM), which exploits the advantages of data mining and feature representation of pre-trained convolutional neural networks (CNNs). Specifically, we first convert the feature maps from a pre-trained CNN model into a set of transactions, and then discover frequent patterns from the transaction database through pattern mining techniques. We observe that those discovered patterns, i.e., co-occurrence highlighted regions, typically hold appearance and spatial consistency. Motivated by this observation, we can easily discover and localize possible objects by merging relevant meaningful patterns in an unsupervised manner. Extensive experiments on eleven benchmarks demonstrate that OLM achieves competitive localization performance compared with the state-of-the-art methods. We also evaluate our approach against unsupervised saliency detection methods and achieve the best results on four benchmark datasets. Moreover, we conduct experiments on fine-grained classification to show that our proposed method can locate the entire object and parts accurately, which can significantly improve the classification results. |
Tasks | Saliency Detection |
Published | 2019-02-26 |
URL | https://arxiv.org/abs/1902.09968v2, https://arxiv.org/pdf/1902.09968v2.pdf |
PWC | https://paperswithcode.com/paper/mining-objects-fully-unsupervised-object |
Repo | https://github.com/anandhupvr/Mining-Objects |
Framework | tf |
End-to-end Learning for GMI Optimized Geometric Constellation Shape
Title | End-to-end Learning for GMI Optimized Geometric Constellation Shape |
Authors | Rasmus T. Jones, Metodi P. Yankov, Darko Zibar |
Abstract | Autoencoder-based geometric shaping is proposed that includes optimizing bit mappings. Up to 0.2 bits/QAM symbol gain in GMI is achieved for a variety of data rates and in the presence of transceiver impairments. The gains can be harvested with standard binary FEC at no cost w.r.t. conventional BICM. |
Tasks | |
Published | 2019-07-19 |
URL | https://arxiv.org/abs/1907.08535v1, https://arxiv.org/pdf/1907.08535v1.pdf |
PWC | https://paperswithcode.com/paper/end-to-end-learning-for-gmi-optimized |
Repo | https://github.com/Rassibassi/claude |
Framework | tf |
A Discrete Hard EM Approach for Weakly Supervised Question Answering
Title | A Discrete Hard EM Approach for Weakly Supervised Question Answering |
Authors | Sewon Min, Danqi Chen, Hannaneh Hajishirzi, Luke Zettlemoyer |
Abstract | Many question answering (QA) tasks only provide weak supervision for how the answer should be computed. For example, TriviaQA answers are entities that can be mentioned multiple times in supporting documents, while DROP answers can be computed by deriving many different equations from numbers in the reference text. In this paper, we show it is possible to convert such tasks into discrete latent variable learning problems with a precomputed, task-specific set of possible “solutions” (e.g. different mentions or equations) that contains one correct option. We then develop a hard EM learning scheme that computes gradients relative to the most likely solution at each update. Despite its simplicity, we show that this approach significantly outperforms previous methods on six QA tasks, including absolute gains of 2–10%, and achieves the state-of-the-art on five of them. Using hard updates instead of maximizing marginal likelihood is key to these results as it encourages the model to find the one correct answer, which we show through detailed qualitative analysis. |
Tasks | Question Answering |
Published | 2019-09-11 |
URL | https://arxiv.org/abs/1909.04849v1, https://arxiv.org/pdf/1909.04849v1.pdf |
PWC | https://paperswithcode.com/paper/a-discrete-hard-em-approach-for-weakly |
Repo | https://github.com/shmsw25/qa-hard-em |
Framework | pytorch |
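The contrast the abstract above draws between hard updates and maximizing marginal likelihood can be shown with a short sketch of the two objectives. It assumes the model has already assigned a log-probability to each precomputed candidate solution for one question; the function names mml_loss and hard_em_loss are hypothetical, and the sketch only evaluates the losses rather than training a model.

```python
import math


def mml_loss(solution_logprobs):
    """Maximum marginal likelihood: negative log of the summed solution probabilities."""
    m = max(solution_logprobs)
    # Log-sum-exp with the maximum factored out for numerical stability.
    return -(m + math.log(sum(math.exp(lp - m) for lp in solution_logprobs)))


def hard_em_loss(solution_logprobs):
    """Hard EM: negative log-probability of the single most likely solution."""
    return -max(solution_logprobs)
```

In a training loop built on an automatic-differentiation library, the hard EM loss would pass gradients only through the currently most likely solution, whereas the marginal-likelihood loss spreads them over all candidates. The hard EM loss is never smaller than the marginal-likelihood loss for the same inputs.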
Simulating Emergent Properties of Human Driving Behavior Using Multi-Agent Reward Augmented Imitation Learning
Title | Simulating Emergent Properties of Human Driving Behavior Using Multi-Agent Reward Augmented Imitation Learning |
Authors | Raunak P. Bhattacharyya, Derek J. Phillips, Changliu Liu, Jayesh K. Gupta, Katherine Driggs-Campbell, Mykel J. Kochenderfer |
Abstract | Recent developments in multi-agent imitation learning have shown promising results for modeling the behavior of human drivers. However, it is challenging to capture emergent traffic behaviors that are observed in real-world datasets. Such behaviors arise due to the many local interactions between agents that are not commonly accounted for in imitation learning. This paper proposes Reward Augmented Imitation Learning (RAIL), which integrates reward augmentation into the multi-agent imitation learning framework and allows the designer to specify prior knowledge in a principled fashion. We prove that convergence guarantees for the imitation learning process are preserved under the application of reward augmentation. This method is validated in a driving scenario, where an entire traffic scene is controlled by driving policies learned using our proposed algorithm. Further, we demonstrate improved performance in comparison to traditional imitation learning algorithms both in terms of the local actions of a single agent and the behavior of emergent properties in complex, multi-agent settings. |
Tasks | Imitation Learning |
Published | 2019-03-14 |
URL | http://arxiv.org/abs/1903.05766v1, http://arxiv.org/pdf/1903.05766v1.pdf |
PWC | https://paperswithcode.com/paper/simulating-emergent-properties-of-human |
Repo | https://github.com/sisl/ngsim_env |
Framework | tf |
Personalization and Optimization of Decision Parameters via Heterogenous Causal Effects
Title | Personalization and Optimization of Decision Parameters via Heterogenous Causal Effects |
Authors | Ye Tu, Kinjal Basu, Jinyun Yan, Birjodh Tiwana, Shaunak Chatterjee |
Abstract | Randomized experimentation (also known as A/B testing or bucket testing) is very commonly used in the internet industry to measure the effect of a new treatment. Often, the decision on the basis of such A/B testing is to ramp the treatment variant that did best for the entire population. However, the effect of any given treatment varies across experimental units, and choosing a single variant to ramp to the whole population can be quite suboptimal. In this work, we propose a method which automatically identifies the collection of cohorts exhibiting heterogeneous treatment effect (using causal trees). We then use stochastic optimization to identify the optimal treatment variant in each cohort. We use two real-life examples - one related to serving notifications and the other related to modulating ads density on feed. In both examples, using offline simulation and online experimentation, we demonstrate the benefits of our approach. At the time of writing this paper, the method described has been deployed on the LinkedIn Ads and Notifications system. |
Tasks | Stochastic Optimization |
Published | 2019-01-29 |
URL | http://arxiv.org/abs/1901.10550v2, http://arxiv.org/pdf/1901.10550v2.pdf |
PWC | https://paperswithcode.com/paper/personalization-and-optimization-of-decision |
Repo | https://github.com/tuye0305/kdd2019prophet |
Framework | none |
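The abstract's point that ramping one variant to the whole population can be suboptimal is easy to see in a toy calculation. The sketch below is a deliberately simplified stand-in: it takes per-cohort effect estimates as given and picks the best variant with a plain argmax, whereas the paper discovers cohorts with causal trees and selects variants by stochastic optimization. The function names and the data layout (a dict of cohort to variant effects, plus cohort sizes) are hypothetical.

```python
def global_best(effects, sizes):
    """Pick the single variant with the largest population-weighted total effect."""
    variants = next(iter(effects.values())).keys()

    def total(variant):
        return sum(effects[c][variant] * sizes[c] for c in effects)

    best = max(variants, key=total)
    return best, total(best)


def per_cohort_best(effects, sizes):
    """Pick the best variant separately in every cohort and sum the weighted effects."""
    choice = {c: max(effects[c], key=effects[c].get) for c in effects}
    total = sum(effects[c][choice[c]] * sizes[c] for c in effects)
    return choice, total


# Toy numbers for illustration only; they are not taken from the paper.
effects = {"new_users": {"A": 1.0, "B": 3.0}, "old_users": {"A": 2.0, "B": 0.5}}
sizes = {"new_users": 100, "old_users": 300}
print(global_best(effects, sizes))
print(per_cohort_best(effects, sizes))
```

With these toy inputs the per-cohort assignment yields a larger total effect than any single variant, because the two cohorts respond best to different variants.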