Paper Group AWR 184
CEDR: Contextualized Embeddings for Document Ranking
Title | CEDR: Contextualized Embeddings for Document Ranking |
Authors | Sean MacAvaney, Andrew Yates, Arman Cohan, Nazli Goharian |
Abstract | Although considerable attention has been given to neural ranking architectures recently, far less attention has been paid to the term representations that are used as input to these models. In this work, we investigate how two pretrained contextualized language models (ELMo and BERT) can be utilized for ad-hoc document ranking. Through experiments on TREC benchmarks, we find that several existing neural ranking architectures can benefit from the additional context provided by contextualized language models. Furthermore, we propose a joint approach that incorporates BERT’s classification vector into existing neural models and show that it outperforms state-of-the-art ad-hoc ranking baselines. We call this joint approach CEDR (Contextualized Embeddings for Document Ranking). We also address practical challenges in using these models for ranking, including the maximum input length imposed by BERT and runtime performance impacts of contextualized language models. |
Tasks | Ad-Hoc Information Retrieval, Document Ranking |
Published | 2019-04-15 |
URL | https://arxiv.org/abs/1904.07094v3 |
PDF | https://arxiv.org/pdf/1904.07094v3.pdf |
PWC | https://paperswithcode.com/paper/190407094 |
Repo | https://github.com/Crysitna/CEDR_tpu |
Framework | pytorch |
$Σ$-net: Ensembled Iterative Deep Neural Networks for Accelerated Parallel MR Image Reconstruction
Title | $Σ$-net: Ensembled Iterative Deep Neural Networks for Accelerated Parallel MR Image Reconstruction |
Authors | Jo Schlemper, Chen Qin, Jinming Duan, Ronald M. Summers, Kerstin Hammernik |
Abstract | We explore an ensembled $\Sigma$-net for fast parallel MR imaging, including parallel coil networks, which perform implicit coil weighting, and sensitivity networks, involving explicit sensitivity maps. The networks in $\Sigma$-net are trained in a supervised way, including content and GAN losses, and with various ways of data consistency, i.e., proximal mappings, gradient descent and variable splitting. A semi-supervised finetuning scheme allows us to adapt to the k-space data at test time, which, however, decreases the quantitative metrics, although generating the visually most textured and sharp images. For this challenge, we focused on robust and high SSIM scores, which we achieved by ensembling all models to a $\Sigma$-net. |
Tasks | Image Reconstruction |
Published | 2019-12-11 |
URL | https://arxiv.org/abs/1912.05480v1 |
PDF | https://arxiv.org/pdf/1912.05480v1.pdf |
PWC | https://paperswithcode.com/paper/-net-ensembled-iterative-deep-neural-networks |
Repo | https://github.com/khammernik/sigmanet |
Framework | pytorch |
DADA: Depth-aware Domain Adaptation in Semantic Segmentation
Title | DADA: Depth-aware Domain Adaptation in Semantic Segmentation |
Authors | Tuan-Hung Vu, Himalaya Jain, Maxime Bucher, Matthieu Cord, Patrick Pérez |
Abstract | Unsupervised domain adaptation (UDA) is important for applications where large scale annotation of representative data is challenging. For semantic segmentation in particular, it helps deploy on real “target domain” data models that are trained on annotated images from a different “source domain”, notably a virtual environment. To this end, most previous works consider semantic segmentation as the only mode of supervision for source domain data, while ignoring other, possibly available, information like depth. In this work, we aim to make the best use of such privileged information while training the UDA model. We propose a unified depth-aware UDA framework that leverages in several complementary ways the knowledge of dense depth in the source domain. As a result, the performance of the trained semantic segmentation model on the target domain is boosted. Our novel approach indeed achieves state-of-the-art performance on different challenging synthetic-2-real benchmarks. |
Tasks | Domain Adaptation, Semantic Segmentation, Unsupervised Domain Adaptation |
Published | 2019-04-03 |
URL | https://arxiv.org/abs/1904.01886v3 |
PDF | https://arxiv.org/pdf/1904.01886v3.pdf |
PWC | https://paperswithcode.com/paper/dada-depth-aware-domain-adaptation-in |
Repo | https://github.com/valeoai/ADVENT |
Framework | pytorch |
Cost-sensitive Regularization for Label Confusion-aware Event Detection
Title | Cost-sensitive Regularization for Label Confusion-aware Event Detection |
Authors | Hongyu Lin, Yaojie Lu, Xianpei Han, Le Sun |
Abstract | In supervised event detection, most of the mislabeling occurs between a small number of confusing type pairs, including trigger-NIL pairs and sibling sub-types of the same coarse type. To address this label confusion problem, this paper proposes cost-sensitive regularization, which can force the training procedure to concentrate more on optimizing confusing type pairs. Specifically, we introduce a cost-weighted term into the training loss, which penalizes more on mislabeling between confusing label pairs. Furthermore, we also propose two estimators which can effectively measure such label confusion based on instance-level or population-level statistics. Experiments on TAC-KBP 2017 datasets demonstrate that the proposed method can significantly improve the performances of different models in both English and Chinese event detection. |
Tasks | |
Published | 2019-06-14 |
URL | https://arxiv.org/abs/1906.06003v1 |
PDF | https://arxiv.org/pdf/1906.06003v1.pdf |
PWC | https://paperswithcode.com/paper/cost-sensitive-regularization-for-label |
Repo | https://github.com/sanmusunrise/CSR |
Framework | tf |
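The cost-weighted loss term described in the abstract above can be illustrated with a minimal sketch. It is not the authors' formulation: the penalty form, the `cost` matrix, and the `lam` weight are assumptions chosen for illustration, and the label set is hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cost_sensitive_loss(logits, gold, cost, lam=1.0):
    """Cross-entropy plus a cost-weighted penalty (illustrative assumption).

    cost[gold][j] is a hypothetical confusion cost for predicting label j
    when the true label is gold; higher values mark more confusing pairs.
    """
    probs = softmax(logits)
    cross_entropy = -math.log(probs[gold])
    # Penalize probability mass placed on wrong labels, weighted by cost.
    penalty = sum(cost[gold][j] * p for j, p in enumerate(probs) if j != gold)
    return cross_entropy + lam * penalty

# Hypothetical labels: 0 = Attack, 1 = Injure (a confusing sibling), 2 = NIL.
logits = [2.0, 1.5, -1.0]
no_cost = [[0.0] * 3 for _ in range(3)]
confusing = [[0.0, 5.0, 0.0], [5.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

base_loss = cost_sensitive_loss(logits, gold=0, cost=no_cost)
weighted_loss = cost_sensitive_loss(logits, gold=0, cost=confusing)
```

With an all-zero cost matrix the loss reduces to plain cross-entropy, while a high cost on the Attack/Injure pair raises the loss whenever the model splits probability between those two labels.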
Semi-Supervised Learning by Disentangling and Self-Ensembling Over Stochastic Latent Space
Title | Semi-Supervised Learning by Disentangling and Self-Ensembling Over Stochastic Latent Space |
Authors | Prashnna Kumar Gyawali, Zhiyuan Li, Sandesh Ghimire, Linwei Wang |
Abstract | The success of deep learning in medical imaging is mostly achieved at the cost of a large labeled data set. Semi-supervised learning (SSL) provides a promising solution by leveraging the structure of unlabeled data to improve learning from a small set of labeled data. Self-ensembling is a simple approach used in SSL to encourage consensus among ensemble predictions of unknown labels, improving generalization of the model by making it more insensitive to the latent space. Currently, such an ensemble is obtained by randomization such as dropout regularization and random data augmentation. In this work, we hypothesize – from the generalization perspective – that self-ensembling can be improved by exploiting the stochasticity of a disentangled latent space. To this end, we present a stacked SSL model that utilizes unsupervised disentangled representation learning as the stochastic embedding for self-ensembling. We evaluate the presented model for multi-label classification using chest X-ray images, demonstrating its improved performance over related SSL models as well as the interpretability of its disentangled representations. |
Tasks | Data Augmentation, Multi-Label Classification, Representation Learning |
Published | 2019-07-22 |
URL | https://arxiv.org/abs/1907.09607v1 |
PDF | https://arxiv.org/pdf/1907.09607v1.pdf |
PWC | https://paperswithcode.com/paper/semi-supervised-learning-by-disentangling-and |
Repo | https://github.com/Prasanna1991/StochasticEnsembleSSL |
Framework | pytorch |
FANDA: A Novel Approach to Perform Follow-up Query Analysis
Title | FANDA: A Novel Approach to Perform Follow-up Query Analysis |
Authors | Qian Liu, Bei Chen, Jian-Guang Lou, Ge Jin, Dongmei Zhang |
Abstract | Recent work on Natural Language Interfaces to Databases (NLIDB) has attracted considerable attention. NLIDB allow users to search databases using natural language instead of SQL-like query languages. While saving the users from having to learn query languages, multi-turn interaction with NLIDB usually involves multiple queries where contextual information is vital to understand the users’ query intents. In this paper, we address a typical contextual understanding problem, termed follow-up query analysis. In spite of its ubiquity, follow-up query analysis has not been well studied due to two primary obstacles: the multifarious nature of follow-up query scenarios and the lack of high-quality datasets. Our work summarizes typical follow-up query scenarios and provides a new FollowUp dataset with 1000 query triples on 120 tables. Moreover, we propose a novel approach FANDA, which takes into account the structures of queries and employs a ranking model with weakly supervised max-margin learning. The experimental results on FollowUp demonstrate the superiority of FANDA over multiple baselines across multiple metrics. |
Tasks | |
Published | 2019-01-24 |
URL | http://arxiv.org/abs/1901.08259v1 |
PDF | http://arxiv.org/pdf/1901.08259v1.pdf |
PWC | https://paperswithcode.com/paper/fanda-a-novel-approach-to-perform-follow-up |
Repo | https://github.com/SivilTaram/FollowUp |
Framework | none |
A Step Toward Quantifying Independently Reproducible Machine Learning Research
Title | A Step Toward Quantifying Independently Reproducible Machine Learning Research |
Authors | Edward Raff |
Abstract | What makes a paper independently reproducible? Debates on reproducibility center around intuition or assumptions but lack empirical results. Our field focuses on releasing code, which is important, but is not sufficient for determining reproducibility. We take the first step toward a quantifiable answer by manually attempting to implement 255 papers published from 1984 until 2017, recording features of each paper, and performing statistical analysis of the results. For each paper, we did not look at the authors' code, if released, in order to prevent bias toward discrepancies between code and paper. |
Tasks | |
Published | 2019-09-14 |
URL | https://arxiv.org/abs/1909.06674v1 |
PDF | https://arxiv.org/pdf/1909.06674v1.pdf |
PWC | https://paperswithcode.com/paper/a-step-toward-quantifying-independently |
Repo | https://github.com/EdwardRaff/Quantifying-Independently-Reproducible-ML |
Framework | none |
How to Fine-Tune BERT for Text Classification?
Title | How to Fine-Tune BERT for Text Classification? |
Authors | Chi Sun, Xipeng Qiu, Yige Xu, Xuanjing Huang |
Abstract | Language model pre-training has proven to be useful in learning universal language representations. As a state-of-the-art language model pre-training model, BERT (Bidirectional Encoder Representations from Transformers) has achieved amazing results in many language understanding tasks. In this paper, we conduct exhaustive experiments to investigate different fine-tuning methods of BERT on text classification task and provide a general solution for BERT fine-tuning. Finally, the proposed solution obtains new state-of-the-art results on eight widely-studied text classification datasets. |
Tasks | Language Modelling, Text Classification |
Published | 2019-05-14 |
URL | https://arxiv.org/abs/1905.05583v3 |
PDF | https://arxiv.org/pdf/1905.05583v3.pdf |
PWC | https://paperswithcode.com/paper/how-to-fine-tune-bert-for-text-classification |
Repo | https://github.com/arctic-yen/Google_QUEST_Q-A_Labeling |
Framework | tf |
An Evaluation Dataset for Intent Classification and Out-of-Scope Prediction
Title | An Evaluation Dataset for Intent Classification and Out-of-Scope Prediction |
Authors | Stefan Larson, Anish Mahendran, Joseph J. Peper, Christopher Clarke, Andrew Lee, Parker Hill, Jonathan K. Kummerfeld, Kevin Leach, Michael A. Laurenzano, Lingjia Tang, Jason Mars |
Abstract | Task-oriented dialog systems need to know when a query falls outside their range of supported intents, but current text classification corpora only define label sets that cover every example. We introduce a new dataset that includes queries that are out-of-scope—i.e., queries that do not fall into any of the system’s supported intents. This poses a new challenge because models cannot assume that every query at inference time belongs to a system-supported intent class. Our dataset also covers 150 intent classes over 10 domains, capturing the breadth that a production task-oriented agent must handle. We evaluate a range of benchmark classifiers on our dataset along with several different out-of-scope identification schemes. We find that while the classifiers perform well on in-scope intent classification, they struggle to identify out-of-scope queries. Our dataset and evaluation fill an important gap in the field, offering a way of more rigorously and realistically benchmarking text classification in task-driven dialog systems. |
Tasks | Intent Classification, Text Classification |
Published | 2019-09-04 |
URL | https://arxiv.org/abs/1909.02027v1 |
PDF | https://arxiv.org/pdf/1909.02027v1.pdf |
PWC | https://paperswithcode.com/paper/an-evaluation-dataset-for-intent |
Repo | https://github.com/clinc/oos-eval |
Framework | none |
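One simple way to picture an out-of-scope identification scheme of the kind the abstract above mentions is a confidence threshold on the classifier's top score. The sketch below is an assumption for illustration, not the evaluation code from the paper: the function name, the `0.7` threshold, and the intent names are hypothetical.

```python
def predict_intent(scores, threshold=0.7, oos_label="out_of_scope"):
    """Return the top-scoring intent, or oos_label when confidence is low.

    scores maps intent name -> probability from any in-scope classifier.
    The threshold value is a hypothetical default that would be tuned on
    held-out data in practice.
    """
    intent, confidence = max(scores.items(), key=lambda kv: kv[1])
    return intent if confidence >= threshold else oos_label

# A confident prediction is passed through unchanged.
confident = predict_intent({"book_flight": 0.9, "weather": 0.1})

# A flat distribution suggests the query matches no supported intent.
uncertain = predict_intent({"book_flight": 0.4, "weather": 0.35, "timer": 0.25})
```

The design trade-off is that a higher threshold catches more out-of-scope queries but rejects more valid in-scope ones, which is consistent with the abstract's finding that classifiers do well in scope yet struggle with out-of-scope detection.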
LeafNATS: An Open-Source Toolkit and Live Demo System for Neural Abstractive Text Summarization
Title | LeafNATS: An Open-Source Toolkit and Live Demo System for Neural Abstractive Text Summarization |
Authors | Tian Shi, Ping Wang, Chandan K. Reddy |
Abstract | Neural abstractive text summarization (NATS) has received a lot of attention in the past few years from both industry and academia. In this paper, we introduce an open-source toolkit, namely LeafNATS, for training and evaluation of different sequence-to-sequence based models for the NATS task, and for deploying the pre-trained models to real-world applications. The toolkit is modularized and extensible in addition to maintaining competitive performance in the NATS task. A live news blogging system has also been implemented to demonstrate how these models can aid blog/news editors by providing them suggestions of headlines and summaries of their articles. |
Tasks | Abstractive Text Summarization, Text Summarization |
Published | 2019-05-28 |
URL | https://arxiv.org/abs/1906.01512v1 |
PDF | https://arxiv.org/pdf/1906.01512v1.pdf |
PWC | https://paperswithcode.com/paper/190601512 |
Repo | https://github.com/tshi04/LeafNATS |
Framework | pytorch |
Real-Time Lip Sync for Live 2D Animation
Title | Real-Time Lip Sync for Live 2D Animation |
Authors | Deepali Aneja, Wilmot Li |
Abstract | The emergence of commercial tools for real-time performance-based 2D animation has enabled 2D characters to appear on live broadcasts and streaming platforms. A key requirement for live animation is fast and accurate lip sync that allows characters to respond naturally to other actors or the audience through the voice of a human performer. In this work, we present a deep learning based interactive system that automatically generates live lip sync for layered 2D characters using a Long Short Term Memory (LSTM) model. Our system takes streaming audio as input and produces viseme sequences with less than 200ms of latency (including processing time). Our contributions include specific design decisions for our feature definition and LSTM configuration that provide a small but useful amount of lookahead to produce accurate lip sync. We also describe a data augmentation procedure that allows us to achieve good results with a very small amount of hand-animated training data (13-20 minutes). Extensive human judgement experiments show that our results are preferred over several competing methods, including those that only support offline (non-live) processing. Video summary and supplementary results at GitHub link: https://github.com/deepalianeja/CharacterLipSync2D |
Tasks | Data Augmentation |
Published | 2019-10-19 |
URL | https://arxiv.org/abs/1910.08685v1 |
PDF | https://arxiv.org/pdf/1910.08685v1.pdf |
PWC | https://paperswithcode.com/paper/real-time-lip-sync-for-live-2d-animation |
Repo | https://github.com/deepalianeja/CharacterLipSync2D |
Framework | none |
NeMo: a toolkit for building AI applications using Neural Modules
Title | NeMo: a toolkit for building AI applications using Neural Modules |
Authors | Oleksii Kuchaiev, Jason Li, Huyen Nguyen, Oleksii Hrinchuk, Ryan Leary, Boris Ginsburg, Samuel Kriman, Stanislav Beliaev, Vitaly Lavrukhin, Jack Cook, Patrice Castonguay, Mariya Popova, Jocelyn Huang, Jonathan M. Cohen |
Abstract | NeMo (Neural Modules) is a Python framework-agnostic toolkit for creating AI applications through re-usability, abstraction, and composition. NeMo is built around neural modules, conceptual blocks of neural networks that take typed inputs and produce typed outputs. Such modules typically represent data layers, encoders, decoders, language models, loss functions, or methods of combining activations. NeMo makes it easy to combine and re-use these building blocks while providing a level of semantic correctness checking via its neural type system. The toolkit comes with extendable collections of pre-built modules for automatic speech recognition and natural language processing. Furthermore, NeMo provides built-in support for distributed training and mixed precision on latest NVIDIA GPUs. NeMo is open-source https://github.com/NVIDIA/NeMo |
Tasks | Speech Recognition |
Published | 2019-09-14 |
URL | https://arxiv.org/abs/1909.09577v1 |
PDF | https://arxiv.org/pdf/1909.09577v1.pdf |
PWC | https://paperswithcode.com/paper/nemo-a-toolkit-for-building-ai-applications |
Repo | https://github.com/NVIDIA/NeMo |
Framework | pytorch |
Image-based table recognition: data, model, and evaluation
Title | Image-based table recognition: data, model, and evaluation |
Authors | Xu Zhong, Elaheh ShafieiBavani, Antonio Jimeno Yepes |
Abstract | Important information that relates to a specific topic in a document is often organized in tabular format to assist readers with information retrieval and comparison, which may be difficult to provide in natural language. However, tabular data in unstructured digital documents, e.g., Portable Document Format (PDF) and images, are difficult to parse into structured machine-readable format, due to complexity and diversity in their structure and style. To facilitate image-based table recognition with deep learning, we develop the largest publicly available table recognition dataset PubTabNet (https://github.com/ibm-aur-nlp/PubTabNet), containing 568k table images with corresponding structured HTML representation. PubTabNet is automatically generated by matching the XML and PDF representations of the scientific articles in PubMed Central Open Access Subset (PMCOA). We also propose a novel attention-based encoder-dual-decoder (EDD) architecture that converts images of tables into HTML code. The model has a structure decoder which reconstructs the table structure and helps the cell decoder to recognize cell content. In addition, we propose a new Tree-Edit-Distance-based Similarity (TEDS) metric for table recognition, which more appropriately captures multi-hop cell misalignment and OCR errors than the pre-established metric. The experiments demonstrate that the EDD model can accurately recognize complex tables solely relying on the image representation, outperforming the state-of-the-art by 9.7% absolute TEDS score. |
Tasks | Information Retrieval, Optical Character Recognition |
Published | 2019-11-25 |
URL | https://arxiv.org/abs/1911.10683v5 |
PDF | https://arxiv.org/pdf/1911.10683v5.pdf |
PWC | https://paperswithcode.com/paper/image-based-table-recognition-data-model-and |
Repo | https://github.com/ibm-aur-nlp/PubTabNet |
Framework | none |
Kurdish (Sorani) Speech to Text: Presenting an Experimental Dataset
Title | Kurdish (Sorani) Speech to Text: Presenting an Experimental Dataset |
Authors | Akam Qader, Hossein Hassani |
Abstract | We present an experimental dataset, Basic Dataset for Sorani Kurdish Automatic Speech Recognition (BD-4SK-ASR), which we used in a first attempt at developing an automatic speech recognition system for Sorani Kurdish. The objective of the project was to develop a system that could automatically recognize simple sentences based on the vocabulary used in grades one to three of the primary schools in the Kurdistan Region of Iraq. We used CMUSphinx as our experimental environment. We developed a dataset to train the system. The dataset is publicly available for non-commercial use under the CC BY-NC-SA 4.0 license. |
Tasks | Speech Recognition |
Published | 2019-11-29 |
URL | https://arxiv.org/abs/1911.13087v2 |
PDF | https://arxiv.org/pdf/1911.13087v2.pdf |
PWC | https://paperswithcode.com/paper/kurdish-sorani-speech-to-text-presenting-an |
Repo | https://github.com/KurdishBLARK/BD-4SK-ASR |
Framework | none |
Enhancing Quality for VVC Compressed Videos by Jointly Exploiting Spatial Details and Temporal Structure
Title | Enhancing Quality for VVC Compressed Videos by Jointly Exploiting Spatial Details and Temporal Structure |
Authors | Xiandong Meng, Xuan Deng, Shuyuan Zhu, Bing Zeng |
Abstract | In this paper, we propose a quality enhancement network for versatile video coding (VVC) compressed videos that jointly exploits spatial details and temporal structure (SDTS). The proposed network consists of a temporal structure fusion subnet and a spatial detail enhancement subnet. The former subnet is used to estimate and compensate for the temporal motion across frames, and the latter subnet is used to reduce the compression artifacts and enhance the reconstruction quality of compressed video. Experimental results demonstrate the effectiveness of our SDTS-based method. |
Tasks | Video Compression |
Published | 2019-01-28 |
URL | https://arxiv.org/abs/1901.09575v2 |
PDF | https://arxiv.org/pdf/1901.09575v2.pdf |
PWC | https://paperswithcode.com/paper/enhancing-quality-for-vvc-compressed-videos |
Repo | https://github.com/mengab/Versatile-Video-Coding |
Framework | tf |