February 1, 2020

Paper Group AWR 184

CEDR: Contextualized Embeddings for Document Ranking. $Σ$-net: Ensembled Iterative Deep Neural Networks for Accelerated Parallel MR Image Reconstruction. DADA: Depth-aware Domain Adaptation in Semantic Segmentation. Cost-sensitive Regularization for Label Confusion-aware Event Detection. Semi-Supervised Learning by Disentangling and Self-Ensembling …

CEDR: Contextualized Embeddings for Document Ranking

Title CEDR: Contextualized Embeddings for Document Ranking
Authors Sean MacAvaney, Andrew Yates, Arman Cohan, Nazli Goharian
Abstract Although considerable attention has been given to neural ranking architectures recently, far less attention has been paid to the term representations that are used as input to these models. In this work, we investigate how two pretrained contextualized language models (ELMo and BERT) can be utilized for ad-hoc document ranking. Through experiments on TREC benchmarks, we find that several existing neural ranking architectures can benefit from the additional context provided by contextualized language models. Furthermore, we propose a joint approach that incorporates BERT’s classification vector into existing neural models and show that it outperforms state-of-the-art ad-hoc ranking baselines. We call this joint approach CEDR (Contextualized Embeddings for Document Ranking). We also address practical challenges in using these models for ranking, including the maximum input length imposed by BERT and runtime performance impacts of contextualized language models.
Tasks Ad-Hoc Information Retrieval, Document Ranking
Published 2019-04-15
URL https://arxiv.org/abs/1904.07094v3
PDF https://arxiv.org/pdf/1904.07094v3.pdf
PWC https://paperswithcode.com/paper/190407094
Repo https://github.com/Crysitna/CEDR_tpu
Framework pytorch

$Σ$-net: Ensembled Iterative Deep Neural Networks for Accelerated Parallel MR Image Reconstruction

Title $Σ$-net: Ensembled Iterative Deep Neural Networks for Accelerated Parallel MR Image Reconstruction
Authors Jo Schlemper, Chen Qin, Jinming Duan, Ronald M. Summers, Kerstin Hammernik
Abstract We explore an ensembled $\Sigma$-net for fast parallel MR imaging, including parallel coil networks, which perform implicit coil weighting, and sensitivity networks, involving explicit sensitivity maps. The networks in $\Sigma$-net are trained in a supervised way, including content and GAN losses, and with various ways of data consistency, i.e., proximal mappings, gradient descent and variable splitting. A semi-supervised finetuning scheme allows us to adapt to the k-space data at test time, which, however, decreases the quantitative metrics, although generating the visually most textured and sharp images. For this challenge, we focused on robust and high SSIM scores, which we achieved by ensembling all models to a $\Sigma$-net.
Tasks Image Reconstruction
Published 2019-12-11
URL https://arxiv.org/abs/1912.05480v1
PDF https://arxiv.org/pdf/1912.05480v1.pdf
PWC https://paperswithcode.com/paper/-net-ensembled-iterative-deep-neural-networks
Repo https://github.com/khammernik/sigmanet
Framework pytorch

DADA: Depth-aware Domain Adaptation in Semantic Segmentation

Title DADA: Depth-aware Domain Adaptation in Semantic Segmentation
Authors Tuan-Hung Vu, Himalaya Jain, Maxime Bucher, Matthieu Cord, Patrick Pérez
Abstract Unsupervised domain adaptation (UDA) is important for applications where large scale annotation of representative data is challenging. For semantic segmentation in particular, it helps deploy on real “target domain” data models that are trained on annotated images from a different “source domain”, notably a virtual environment. To this end, most previous works consider semantic segmentation as the only mode of supervision for source domain data, while ignoring other, possibly available, information like depth. In this work, we aim to make the best use of such privileged information while training the UDA model. We propose a unified depth-aware UDA framework that leverages the knowledge of dense depth in the source domain in several complementary ways. As a result, the performance of the trained semantic segmentation model on the target domain is boosted. Our novel approach indeed achieves state-of-the-art performance on different challenging synthetic-2-real benchmarks.
Tasks Domain Adaptation, Semantic Segmentation, Unsupervised Domain Adaptation
Published 2019-04-03
URL https://arxiv.org/abs/1904.01886v3
PDF https://arxiv.org/pdf/1904.01886v3.pdf
PWC https://paperswithcode.com/paper/dada-depth-aware-domain-adaptation-in
Repo https://github.com/valeoai/ADVENT
Framework pytorch

Cost-sensitive Regularization for Label Confusion-aware Event Detection

Title Cost-sensitive Regularization for Label Confusion-aware Event Detection
Authors Hongyu Lin, Yaojie Lu, Xianpei Han, Le Sun
Abstract In supervised event detection, most of the mislabeling occurs between a small number of confusing type pairs, including trigger-NIL pairs and sibling sub-types of the same coarse type. To address this label confusion problem, this paper proposes cost-sensitive regularization, which can force the training procedure to concentrate more on optimizing confusing type pairs. Specifically, we introduce a cost-weighted term into the training loss, which penalizes more on mislabeling between confusing label pairs. Furthermore, we also propose two estimators which can effectively measure such label confusion based on instance-level or population-level statistics. Experiments on TAC-KBP 2017 datasets demonstrate that the proposed method can significantly improve the performances of different models in both English and Chinese event detection.
Tasks
Published 2019-06-14
URL https://arxiv.org/abs/1906.06003v1
PDF https://arxiv.org/pdf/1906.06003v1.pdf
PWC https://paperswithcode.com/paper/cost-sensitive-regularization-for-label
Repo https://github.com/sanmusunrise/CSR
Framework tf
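
The cost-weighted term described in the abstract above can be illustrated with a minimal sketch. It assumes one plausible form of the loss: cross-entropy plus a penalty on the probability mass given to labels that are costly to confuse with the gold label. The names `cost`, `lam`, and `cost_sensitive_loss` are hypothetical and are not taken from the authors' repository.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cost_sensitive_loss(logits, gold, cost, lam=1.0):
    """Cross-entropy plus a cost-weighted penalty (assumed form).

    cost[gold][j] is a hypothetical confusion cost between the gold label
    and label j; lam is a hypothetical weight on the regularizer.
    """
    probs = softmax(logits)
    cross_entropy = -math.log(probs[gold])
    # Penalize probability assigned to wrong labels, scaled by how
    # confusable each wrong label is with the gold label.
    penalty = sum(cost[gold][j] * p for j, p in enumerate(probs) if j != gold)
    return cross_entropy + lam * penalty
```

With an all-zero cost matrix the function reduces to plain cross-entropy; raising the cost of a confusable pair raises the loss whenever the model puts mass on that pair. How the costs are estimated (the instance-level or population-level estimators mentioned in the abstract) is not shown here.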

Semi-Supervised Learning by Disentangling and Self-Ensembling Over Stochastic Latent Space

Title Semi-Supervised Learning by Disentangling and Self-Ensembling Over Stochastic Latent Space
Authors Prashnna Kumar Gyawali, Zhiyuan Li, Sandesh Ghimire, Linwei Wang
Abstract The success of deep learning in medical imaging is mostly achieved at the cost of a large labeled data set. Semi-supervised learning (SSL) provides a promising solution by leveraging the structure of unlabeled data to improve learning from a small set of labeled data. Self-ensembling is a simple approach used in SSL to encourage consensus among ensemble predictions of unknown labels, improving generalization of the model by making it more insensitive to the latent space. Currently, such an ensemble is obtained by randomization such as dropout regularization and random data augmentation. In this work, we hypothesize – from the generalization perspective – that self-ensembling can be improved by exploiting the stochasticity of a disentangled latent space. To this end, we present a stacked SSL model that utilizes unsupervised disentangled representation learning as the stochastic embedding for self-ensembling. We evaluate the presented model for multi-label classification using chest X-ray images, demonstrating its improved performance over related SSL models as well as the interpretability of its disentangled representations.
Tasks Data Augmentation, Multi-Label Classification, Representation Learning
Published 2019-07-22
URL https://arxiv.org/abs/1907.09607v1
PDF https://arxiv.org/pdf/1907.09607v1.pdf
PWC https://paperswithcode.com/paper/semi-supervised-learning-by-disentangling-and
Repo https://github.com/Prasanna1991/StochasticEnsembleSSL
Framework pytorch
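
As a rough illustration of the self-ensembling idea in the abstract above, the sketch below computes a consistency term between two stochastic predictions for the same unlabeled input. It assumes a mean-squared-error consistency measure, which is a common choice but is not confirmed by the abstract; the function name is hypothetical.

```python
def consistency_loss(pred_a, pred_b):
    """Mean squared difference between two predictions of the same input.

    pred_a and pred_b are per-label probabilities obtained under two
    different stochastic perturbations (assumed setup).
    """
    if len(pred_a) != len(pred_b):
        raise ValueError("predictions must have the same length")
    return sum((a - b) ** 2 for a, b in zip(pred_a, pred_b)) / len(pred_a)
```

The term is zero when both passes agree and grows as they diverge, so minimizing it on unlabeled data encourages consensus among the ensemble members.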

FANDA: A Novel Approach to Perform Follow-up Query Analysis

Title FANDA: A Novel Approach to Perform Follow-up Query Analysis
Authors Qian Liu, Bei Chen, Jian-Guang Lou, Ge Jin, Dongmei Zhang
Abstract Recent work on Natural Language Interfaces to Databases (NLIDB) has attracted considerable attention. NLIDB allow users to search databases using natural language instead of SQL-like query languages. While saving the users from having to learn query languages, multi-turn interaction with NLIDB usually involves multiple queries where contextual information is vital to understand the users’ query intents. In this paper, we address a typical contextual understanding problem, termed as follow-up query analysis. In spite of its ubiquity, follow-up query analysis has not been well studied due to two primary obstacles: the multifarious nature of follow-up query scenarios and the lack of high-quality datasets. Our work summarizes typical follow-up query scenarios and provides a new FollowUp dataset with 1000 query triples on 120 tables. Moreover, we propose a novel approach FANDA, which takes into account the structures of queries and employs a ranking model with weakly supervised max-margin learning. The experimental results on FollowUp demonstrate the superiority of FANDA over multiple baselines across multiple metrics.
Tasks
Published 2019-01-24
URL http://arxiv.org/abs/1901.08259v1
PDF http://arxiv.org/pdf/1901.08259v1.pdf
PWC https://paperswithcode.com/paper/fanda-a-novel-approach-to-perform-follow-up
Repo https://github.com/SivilTaram/FollowUp
Framework none
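
The max-margin learning mentioned in the abstract above can be sketched generically. The example assumes a hinge loss that compares the best-scoring acceptable candidate with the best-scoring unacceptable one; the weak-supervision details of FANDA are not given in the abstract, and the function and parameter names are hypothetical.

```python
def max_margin_loss(pos_scores, neg_scores, margin=1.0):
    """Hinge loss between the top positive and top negative candidate.

    pos_scores: ranking scores of candidates consistent with the target
    (assumed to be non-empty); neg_scores: scores of all other candidates.
    """
    best_pos = max(pos_scores)
    best_neg = max(neg_scores)
    # The loss is zero once the best positive beats the best negative
    # by at least `margin`.
    return max(0.0, margin - best_pos + best_neg)
```

Taking the maximum over the positive set is one way to handle supervision that only identifies a set of acceptable candidates rather than a single gold one.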

A Step Toward Quantifying Independently Reproducible Machine Learning Research

Title A Step Toward Quantifying Independently Reproducible Machine Learning Research
Authors Edward Raff
Abstract What makes a paper independently reproducible? Debates on reproducibility center around intuition or assumptions but lack empirical results. Our field focuses on releasing code, which is important, but is not sufficient for determining reproducibility. We take the first step toward a quantifiable answer by manually attempting to implement 255 papers published from 1984 until 2017, recording features of each paper, and performing statistical analysis of the results. For each paper, we did not look at the authors' code, if released, in order to prevent bias toward discrepancies between code and paper.
Tasks
Published 2019-09-14
URL https://arxiv.org/abs/1909.06674v1
PDF https://arxiv.org/pdf/1909.06674v1.pdf
PWC https://paperswithcode.com/paper/a-step-toward-quantifying-independently
Repo https://github.com/EdwardRaff/Quantifying-Independently-Reproducible-ML
Framework none

How to Fine-Tune BERT for Text Classification?

Title How to Fine-Tune BERT for Text Classification?
Authors Chi Sun, Xipeng Qiu, Yige Xu, Xuanjing Huang
Abstract Language model pre-training has proven to be useful in learning universal language representations. As a state-of-the-art language model pre-training model, BERT (Bidirectional Encoder Representations from Transformers) has achieved amazing results in many language understanding tasks. In this paper, we conduct exhaustive experiments to investigate different fine-tuning methods of BERT on the text classification task and provide a general solution for BERT fine-tuning. Finally, the proposed solution obtains new state-of-the-art results on eight widely-studied text classification datasets.
Tasks Language Modelling, Text Classification
Published 2019-05-14
URL https://arxiv.org/abs/1905.05583v3
PDF https://arxiv.org/pdf/1905.05583v3.pdf
PWC https://paperswithcode.com/paper/how-to-fine-tune-bert-for-text-classification
Repo https://github.com/arctic-yen/Google_QUEST_Q-A_Labeling
Framework tf

An Evaluation Dataset for Intent Classification and Out-of-Scope Prediction

Title An Evaluation Dataset for Intent Classification and Out-of-Scope Prediction
Authors Stefan Larson, Anish Mahendran, Joseph J. Peper, Christopher Clarke, Andrew Lee, Parker Hill, Jonathan K. Kummerfeld, Kevin Leach, Michael A. Laurenzano, Lingjia Tang, Jason Mars
Abstract Task-oriented dialog systems need to know when a query falls outside their range of supported intents, but current text classification corpora only define label sets that cover every example. We introduce a new dataset that includes queries that are out-of-scope—i.e., queries that do not fall into any of the system’s supported intents. This poses a new challenge because models cannot assume that every query at inference time belongs to a system-supported intent class. Our dataset also covers 150 intent classes over 10 domains, capturing the breadth that a production task-oriented agent must handle. We evaluate a range of benchmark classifiers on our dataset along with several different out-of-scope identification schemes. We find that while the classifiers perform well on in-scope intent classification, they struggle to identify out-of-scope queries. Our dataset and evaluation fill an important gap in the field, offering a way of more rigorously and realistically benchmarking text classification in task-driven dialog systems.
Tasks Intent Classification, Text Classification
Published 2019-09-04
URL https://arxiv.org/abs/1909.02027v1
PDF https://arxiv.org/pdf/1909.02027v1.pdf
PWC https://paperswithcode.com/paper/an-evaluation-dataset-for-intent
Repo https://github.com/clinc/oos-eval
Framework none
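
The abstract above refers to several out-of-scope identification schemes without naming them. One simple scheme, assumed here purely for illustration, is to reject a query when the classifier's top confidence falls below a threshold. The threshold value, the label string, and the function name are hypothetical.

```python
def predict_intent(probs, labels, threshold=0.7, oos_label="out_of_scope"):
    """Return the top intent, or oos_label if the classifier is unsure.

    probs: class probabilities from any in-scope intent classifier;
    labels: intent names aligned with probs.
    """
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best] if probs[best] >= threshold else oos_label
```

For example, `predict_intent([0.55, 0.45], ["book_flight", "weather"])` returns `"out_of_scope"` because neither intent reaches the assumed 0.7 threshold.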

LeafNATS: An Open-Source Toolkit and Live Demo System for Neural Abstractive Text Summarization

Title LeafNATS: An Open-Source Toolkit and Live Demo System for Neural Abstractive Text Summarization
Authors Tian Shi, Ping Wang, Chandan K. Reddy
Abstract Neural abstractive text summarization (NATS) has received a lot of attention in the past few years from both industry and academia. In this paper, we introduce an open-source toolkit, namely LeafNATS, for training and evaluation of different sequence-to-sequence based models for the NATS task, and for deploying the pre-trained models to real-world applications. The toolkit is modularized and extensible in addition to maintaining competitive performance in the NATS task. A live news blogging system has also been implemented to demonstrate how these models can aid blog/news editors by providing them suggestions of headlines and summaries of their articles.
Tasks Abstractive Text Summarization, Text Summarization
Published 2019-05-28
URL https://arxiv.org/abs/1906.01512v1
PDF https://arxiv.org/pdf/1906.01512v1.pdf
PWC https://paperswithcode.com/paper/190601512
Repo https://github.com/tshi04/LeafNATS
Framework pytorch

Real-Time Lip Sync for Live 2D Animation

Title Real-Time Lip Sync for Live 2D Animation
Authors Deepali Aneja, Wilmot Li
Abstract The emergence of commercial tools for real-time performance-based 2D animation has enabled 2D characters to appear on live broadcasts and streaming platforms. A key requirement for live animation is fast and accurate lip sync that allows characters to respond naturally to other actors or the audience through the voice of a human performer. In this work, we present a deep learning based interactive system that automatically generates live lip sync for layered 2D characters using a Long Short Term Memory (LSTM) model. Our system takes streaming audio as input and produces viseme sequences with less than 200ms of latency (including processing time). Our contributions include specific design decisions for our feature definition and LSTM configuration that provide a small but useful amount of lookahead to produce accurate lip sync. We also describe a data augmentation procedure that allows us to achieve good results with a very small amount of hand-animated training data (13-20 minutes). Extensive human judgement experiments show that our results are preferred over several competing methods, including those that only support offline (non-live) processing. Video summary and supplementary results at GitHub link: https://github.com/deepalianeja/CharacterLipSync2D
Tasks Data Augmentation
Published 2019-10-19
URL https://arxiv.org/abs/1910.08685v1
PDF https://arxiv.org/pdf/1910.08685v1.pdf
PWC https://paperswithcode.com/paper/real-time-lip-sync-for-live-2d-animation
Repo https://github.com/deepalianeja/CharacterLipSync2D
Framework none

NeMo: a toolkit for building AI applications using Neural Modules

Title NeMo: a toolkit for building AI applications using Neural Modules
Authors Oleksii Kuchaiev, Jason Li, Huyen Nguyen, Oleksii Hrinchuk, Ryan Leary, Boris Ginsburg, Samuel Kriman, Stanislav Beliaev, Vitaly Lavrukhin, Jack Cook, Patrice Castonguay, Mariya Popova, Jocelyn Huang, Jonathan M. Cohen
Abstract NeMo (Neural Modules) is a Python framework-agnostic toolkit for creating AI applications through re-usability, abstraction, and composition. NeMo is built around neural modules, conceptual blocks of neural networks that take typed inputs and produce typed outputs. Such modules typically represent data layers, encoders, decoders, language models, loss functions, or methods of combining activations. NeMo makes it easy to combine and re-use these building blocks while providing a level of semantic correctness checking via its neural type system. The toolkit comes with extendable collections of pre-built modules for automatic speech recognition and natural language processing. Furthermore, NeMo provides built-in support for distributed training and mixed precision on latest NVIDIA GPUs. NeMo is open-source https://github.com/NVIDIA/NeMo
Tasks Speech Recognition
Published 2019-09-14
URL https://arxiv.org/abs/1909.09577v1
PDF https://arxiv.org/pdf/1909.09577v1.pdf
PWC https://paperswithcode.com/paper/nemo-a-toolkit-for-building-ai-applications
Repo https://github.com/NVIDIA/NeMo
Framework pytorch

Image-based table recognition: data, model, and evaluation

Title Image-based table recognition: data, model, and evaluation
Authors Xu Zhong, Elaheh ShafieiBavani, Antonio Jimeno Yepes
Abstract Important information that relates to a specific topic in a document is often organized in tabular format to assist readers with information retrieval and comparison, which may be difficult to provide in natural language. However, tabular data in unstructured digital documents, e.g., Portable Document Format (PDF) and images, are difficult to parse into structured machine-readable format, due to complexity and diversity in their structure and style. To facilitate image-based table recognition with deep learning, we develop the largest publicly available table recognition dataset PubTabNet (https://github.com/ibm-aur-nlp/PubTabNet), containing 568k table images with corresponding structured HTML representation. PubTabNet is automatically generated by matching the XML and PDF representations of the scientific articles in PubMed Central Open Access Subset (PMCOA). We also propose a novel attention-based encoder-dual-decoder (EDD) architecture that converts images of tables into HTML code. The model has a structure decoder which reconstructs the table structure and helps the cell decoder to recognize cell content. In addition, we propose a new Tree-Edit-Distance-based Similarity (TEDS) metric for table recognition, which more appropriately captures multi-hop cell misalignment and OCR errors than the pre-established metric. The experiments demonstrate that the EDD model can accurately recognize complex tables solely relying on the image representation, outperforming the state-of-the-art by 9.7% absolute TEDS score.
Tasks Information Retrieval, Optical Character Recognition
Published 2019-11-25
URL https://arxiv.org/abs/1911.10683v5
PDF https://arxiv.org/pdf/1911.10683v5.pdf
PWC https://paperswithcode.com/paper/image-based-table-recognition-data-model-and
Repo https://github.com/ibm-aur-nlp/PubTabNet
Framework none
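
To make the TEDS metric in the abstract above more concrete, the sketch below turns a tree edit distance into a similarity score. It assumes the distance is normalized by the size of the larger tree, so that identical trees score 1; computing the tree edit distance itself is out of scope here, and the function name is hypothetical.

```python
def teds(edit_distance, n_nodes_a, n_nodes_b):
    """Similarity in [0, 1] from a tree edit distance (assumed normalization).

    edit_distance: edit distance between the two table trees, computed
    elsewhere; n_nodes_a, n_nodes_b: node counts of the two trees.
    """
    largest = max(n_nodes_a, n_nodes_b)
    if largest == 0:
        return 1.0  # two empty trees are treated as identical
    return 1.0 - edit_distance / largest
```

Under this normalization, a distance of 5 between trees with 10 and 8 nodes gives a score of 0.5.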

Kurdish (Sorani) Speech to Text: Presenting an Experimental Dataset

Title Kurdish (Sorani) Speech to Text: Presenting an Experimental Dataset
Authors Akam Qader, Hossein Hassani
Abstract We present an experimental dataset, Basic Dataset for Sorani Kurdish Automatic Speech Recognition (BD-4SK-ASR), which we used in a first attempt at developing an automatic speech recognition system for Sorani Kurdish. The objective of the project was to develop a system that could automatically recognize simple sentences based on the vocabulary used in grades one to three of the primary schools in the Kurdistan Region of Iraq. We used CMUSphinx as our experimental environment. We developed a dataset to train the system. The dataset is publicly available for non-commercial use under the CC BY-NC-SA 4.0 license.
Tasks Speech Recognition
Published 2019-11-29
URL https://arxiv.org/abs/1911.13087v2
PDF https://arxiv.org/pdf/1911.13087v2.pdf
PWC https://paperswithcode.com/paper/kurdish-sorani-speech-to-text-presenting-an
Repo https://github.com/KurdishBLARK/BD-4SK-ASR
Framework none

Enhancing Quality for VVC Compressed Videos by Jointly Exploiting Spatial Details and Temporal Structure

Title Enhancing Quality for VVC Compressed Videos by Jointly Exploiting Spatial Details and Temporal Structure
Authors Xiandong Meng, Xuan Deng, Shuyuan Zhu, Bing Zeng
Abstract In this paper, we propose a quality enhancement network for versatile video coding (VVC) compressed videos that jointly exploits spatial details and temporal structure (SDTS). The proposed network consists of a temporal structure fusion subnet and a spatial detail enhancement subnet. The former subnet is used to estimate and compensate the temporal motion across frames, and the latter subnet is used to reduce the compression artifacts and enhance the reconstruction quality of compressed video. Experimental results demonstrate the effectiveness of our SDTS-based method.
Tasks Video Compression
Published 2019-01-28
URL https://arxiv.org/abs/1901.09575v2
PDF https://arxiv.org/pdf/1901.09575v2.pdf
PWC https://paperswithcode.com/paper/enhancing-quality-for-vvc-compressed-videos
Repo https://github.com/mengab/Versatile-Video-Coding
Framework tf