February 1, 2020

Paper Group AWR 184

CEDR: Contextualized Embeddings for Document Ranking. $Σ$-net: Ensembled Iterative Deep Neural Networks for Accelerated Parallel MR Image Reconstruction. DADA: Depth-aware Domain Adaptation in Semantic Segmentation. Cost-sensitive Regularization for Label Confusion-aware Event Detection. Semi-Supervised Learning by Disentangling and Self-Ensembling …

CEDR: Contextualized Embeddings for Document Ranking


Title	CEDR: Contextualized Embeddings for Document Ranking
Authors	Sean MacAvaney, Andrew Yates, Arman Cohan, Nazli Goharian
Abstract	Although considerable attention has been given to neural ranking architectures recently, far less attention has been paid to the term representations that are used as input to these models. In this work, we investigate how two pretrained contextualized language models (ELMo and BERT) can be utilized for ad-hoc document ranking. Through experiments on TREC benchmarks, we find that several existing neural ranking architectures can benefit from the additional context provided by contextualized language models. Furthermore, we propose a joint approach that incorporates BERT’s classification vector into existing neural models and show that it outperforms state-of-the-art ad-hoc ranking baselines. We call this joint approach CEDR (Contextualized Embeddings for Document Ranking). We also address practical challenges in using these models for ranking, including the maximum input length imposed by BERT and runtime performance impacts of contextualized language models.
Tasks	Ad-Hoc Information Retrieval, Document Ranking
Published	2019-04-15
URL	https://arxiv.org/abs/1904.07094v3
PDF	https://arxiv.org/pdf/1904.07094v3.pdf
PWC	https://paperswithcode.com/paper/190407094
Repo	https://github.com/Crysitna/CEDR_tpu
Framework	pytorch
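
The abstract names BERT's maximum input length as a practical challenge. One common workaround, sketched here under assumptions and not taken from the authors' code, is to split a long document into the fewest chunks that fit a token budget and then average one vector per chunk. `split_evenly` and `mean_vector` are hypothetical helper names, and the budget passed in is illustrative.

```python
import math
from typing import List, Sequence


def split_evenly(tokens: Sequence[str], max_len: int) -> List[List[str]]:
    """Split tokens into the fewest chunks of at most max_len tokens.

    All chunks share one target length; only the last may be shorter.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not tokens:
        return []
    n_chunks = math.ceil(len(tokens) / max_len)
    chunk_len = math.ceil(len(tokens) / n_chunks)
    return [list(tokens[i:i + chunk_len]) for i in range(0, len(tokens), chunk_len)]


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of equally sized vectors (for example, one per chunk)."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
```

In a real pipeline each chunk would be encoded together with the query, and the per-chunk classification vectors would be passed to `mean_vector`.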

$Σ$-net: Ensembled Iterative Deep Neural Networks for Accelerated Parallel MR Image Reconstruction


Title	$Σ$-net: Ensembled Iterative Deep Neural Networks for Accelerated Parallel MR Image Reconstruction
Authors	Jo Schlemper, Chen Qin, Jinming Duan, Ronald M. Summers, Kerstin Hammernik
Abstract	We explore an ensembled $\Sigma$-net for fast parallel MR imaging, including parallel coil networks, which perform implicit coil weighting, and sensitivity networks, involving explicit sensitivity maps. The networks in $\Sigma$-net are trained in a supervised way, including content and GAN losses, and with various ways of data consistency, i.e., proximal mappings, gradient descent and variable splitting. A semi-supervised finetuning scheme allows us to adapt to the k-space data at test time, which, however, decreases the quantitative metrics, although generating the visually most textured and sharp images. For this challenge, we focused on robust and high SSIM scores, which we achieved by ensembling all models to a $\Sigma$-net.
Tasks	Image Reconstruction
Published	2019-12-11
URL	https://arxiv.org/abs/1912.05480v1
PDF	https://arxiv.org/pdf/1912.05480v1.pdf
PWC	https://paperswithcode.com/paper/-net-ensembled-iterative-deep-neural-networks
Repo	https://github.com/khammernik/sigmanet
Framework	pytorch

DADA: Depth-aware Domain Adaptation in Semantic Segmentation


Title	DADA: Depth-aware Domain Adaptation in Semantic Segmentation
Authors	Tuan-Hung Vu, Himalaya Jain, Maxime Bucher, Matthieu Cord, Patrick Pérez
Abstract	Unsupervised domain adaptation (UDA) is important for applications where large scale annotation of representative data is challenging. For semantic segmentation in particular, it helps deploy on real “target domain” data models that are trained on annotated images from a different “source domain”, notably a virtual environment. To this end, most previous works consider semantic segmentation as the only mode of supervision for source domain data, while ignoring other, possibly available, information like depth. In this work, we aim to make the best use of such privileged information while training the UDA model. We propose a unified depth-aware UDA framework that leverages, in several complementary ways, the knowledge of dense depth in the source domain. As a result, the performance of the trained semantic segmentation model on the target domain is boosted. Our novel approach indeed achieves state-of-the-art performance on different challenging synthetic-2-real benchmarks.
Tasks	Domain Adaptation, Semantic Segmentation, Unsupervised Domain Adaptation
Published	2019-04-03
URL	https://arxiv.org/abs/1904.01886v3
PDF	https://arxiv.org/pdf/1904.01886v3.pdf
PWC	https://paperswithcode.com/paper/dada-depth-aware-domain-adaptation-in
Repo	https://github.com/valeoai/ADVENT
Framework	pytorch

Cost-sensitive Regularization for Label Confusion-aware Event Detection


Title	Cost-sensitive Regularization for Label Confusion-aware Event Detection
Authors	Hongyu Lin, Yaojie Lu, Xianpei Han, Le Sun
Abstract	In supervised event detection, most of the mislabeling occurs between a small number of confusing type pairs, including trigger-NIL pairs and sibling sub-types of the same coarse type. To address this label confusion problem, this paper proposes cost-sensitive regularization, which can force the training procedure to concentrate more on optimizing confusing type pairs. Specifically, we introduce a cost-weighted term into the training loss, which penalizes more on mislabeling between confusing label pairs. Furthermore, we also propose two estimators which can effectively measure such label confusion based on instance-level or population-level statistics. Experiments on TAC-KBP 2017 datasets demonstrate that the proposed method can significantly improve the performances of different models in both English and Chinese event detection.
Tasks
Published	2019-06-14
URL	https://arxiv.org/abs/1906.06003v1
PDF	https://arxiv.org/pdf/1906.06003v1.pdf
PWC	https://paperswithcode.com/paper/cost-sensitive-regularization-for-label
Repo	https://github.com/sanmusunrise/CSR
Framework	tf
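
The cost-weighted term described in the abstract can be illustrated with a small sketch. This is one plausible form, assumed for illustration and not confirmed as the paper's exact loss: standard cross-entropy plus a penalty that grows when probability mass lands on labels that are costly to confuse with the gold label. The function name, the `cost` matrix layout (`cost[gold][other]`), and the weight `lam` are all hypothetical.

```python
import math
from typing import Sequence


def cost_sensitive_loss(
    probs: Sequence[float],
    gold: int,
    cost: Sequence[Sequence[float]],
    lam: float = 1.0,
) -> float:
    """Cross-entropy plus a cost-weighted penalty on wrong-label probability."""
    eps = 1e-12  # guards against log(0)
    cross_entropy = -math.log(probs[gold] + eps)
    penalty = 0.0
    for j, p in enumerate(probs):
        if j != gold:
            # A higher cost[gold][j] punishes mass on a confusable label more.
            penalty += cost[gold][j] * -math.log(1.0 - p + eps)
    return cross_entropy + lam * penalty
```

With an all-zero cost matrix the function reduces to plain cross-entropy, so the regularizer only changes training for label pairs given a non-zero cost.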

Semi-Supervised Learning by Disentangling and Self-Ensembling Over Stochastic Latent Space


Title	Semi-Supervised Learning by Disentangling and Self-Ensembling Over Stochastic Latent Space
Authors	Prashnna Kumar Gyawali, Zhiyuan Li, Sandesh Ghimire, Linwei Wang
Abstract	The success of deep learning in medical imaging is mostly achieved at the cost of a large labeled data set. Semi-supervised learning (SSL) provides a promising solution by leveraging the structure of unlabeled data to improve learning from a small set of labeled data. Self-ensembling is a simple approach used in SSL to encourage consensus among ensemble predictions of unknown labels, improving generalization of the model by making it more insensitive to the latent space. Currently, such an ensemble is obtained by randomization such as dropout regularization and random data augmentation. In this work, we hypothesize – from the generalization perspective – that self-ensembling can be improved by exploiting the stochasticity of a disentangled latent space. To this end, we present a stacked SSL model that utilizes unsupervised disentangled representation learning as the stochastic embedding for self-ensembling. We evaluate the presented model for multi-label classification using chest X-ray images, demonstrating its improved performance over related SSL models as well as the interpretability of its disentangled representations.
Tasks	Data Augmentation, Multi-Label Classification, Representation Learning
Published	2019-07-22
URL	https://arxiv.org/abs/1907.09607v1
PDF	https://arxiv.org/pdf/1907.09607v1.pdf
PWC	https://paperswithcode.com/paper/semi-supervised-learning-by-disentangling-and
Repo	https://github.com/Prasanna1991/StochasticEnsembleSSL
Framework	pytorch
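
The consensus idea behind self-ensembling can be sketched as a consistency term between two stochastic predictions for the same unlabeled input. This is a generic illustration under that assumption, not the stacked model from the paper; `consistency_loss` is a hypothetical name.

```python
from typing import Sequence


def consistency_loss(pred_a: Sequence[float], pred_b: Sequence[float]) -> float:
    """Mean squared difference between two predictions for the same input."""
    if len(pred_a) != len(pred_b):
        raise ValueError("predictions must have the same length")
    if not pred_a:
        raise ValueError("predictions must be non-empty")
    return sum((a - b) ** 2 for a, b in zip(pred_a, pred_b)) / len(pred_a)
```

The term is zero when the two passes agree and grows as they diverge, which is what pushes the model toward consensus on unlabeled data.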

FANDA: A Novel Approach to Perform Follow-up Query Analysis


Title	FANDA: A Novel Approach to Perform Follow-up Query Analysis
Authors	Qian Liu, Bei Chen, Jian-Guang Lou, Ge Jin, Dongmei Zhang
Abstract	Recent work on Natural Language Interfaces to Databases (NLIDB) has attracted considerable attention. NLIDB allow users to search databases using natural language instead of SQL-like query languages. While saving the users from having to learn query languages, multi-turn interaction with NLIDB usually involves multiple queries where contextual information is vital to understand the users’ query intents. In this paper, we address a typical contextual understanding problem, termed follow-up query analysis. In spite of its ubiquity, follow-up query analysis has not been well studied due to two primary obstacles: the multifarious nature of follow-up query scenarios and the lack of high-quality datasets. Our work summarizes typical follow-up query scenarios and provides a new FollowUp dataset with 1000 query triples on 120 tables. Moreover, we propose a novel approach FANDA, which takes into account the structures of queries and employs a ranking model with weakly supervised max-margin learning. The experimental results on FollowUp demonstrate the superiority of FANDA over multiple baselines across multiple metrics.
Tasks
Published	2019-01-24
URL	http://arxiv.org/abs/1901.08259v1
PDF	http://arxiv.org/pdf/1901.08259v1.pdf
PWC	https://paperswithcode.com/paper/fanda-a-novel-approach-to-perform-follow-up
Repo	https://github.com/SivilTaram/FollowUp
Framework	none
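
Max-margin learning for a ranking model is commonly written as a hinge loss over candidate scores. The sketch below shows that generic form only; how FANDA selects the preferred candidate under weak supervision is not described in the abstract, so `best_score` is simply assumed to be given. The function name and default margin are hypothetical.

```python
from typing import Sequence


def max_margin_loss(
    best_score: float,
    other_scores: Sequence[float],
    margin: float = 1.0,
) -> float:
    """Hinge loss that wants the preferred candidate to beat the strongest rival by `margin`."""
    if not other_scores:
        return 0.0
    strongest_rival = max(other_scores)
    return max(0.0, margin - best_score + strongest_rival)
```

The loss is zero once the preferred candidate leads every rival by at least the margin, so only violating rankings contribute to training.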

A Step Toward Quantifying Independently Reproducible Machine Learning Research


Title	A Step Toward Quantifying Independently Reproducible Machine Learning Research
Authors	Edward Raff
Abstract	What makes a paper independently reproducible? Debates on reproducibility center around intuition or assumptions but lack empirical results. Our field focuses on releasing code, which is important, but is not sufficient for determining reproducibility. We take the first step toward a quantifiable answer by manually attempting to implement 255 papers published from 1984 until 2017, recording features of each paper, and performing statistical analysis of the results. For each paper, we did not look at the authors' code, if released, in order to prevent bias toward discrepancies between code and paper.
Tasks
Published	2019-09-14
URL	https://arxiv.org/abs/1909.06674v1
PDF	https://arxiv.org/pdf/1909.06674v1.pdf
PWC	https://paperswithcode.com/paper/a-step-toward-quantifying-independently
Repo	https://github.com/EdwardRaff/Quantifying-Independently-Reproducible-ML
Framework	none

How to Fine-Tune BERT for Text Classification?


Title	How to Fine-Tune BERT for Text Classification?
Authors	Chi Sun, Xipeng Qiu, Yige Xu, Xuanjing Huang
Abstract	Language model pre-training has proven to be useful in learning universal language representations. As a state-of-the-art language model pre-training model, BERT (Bidirectional Encoder Representations from Transformers) has achieved amazing results in many language understanding tasks. In this paper, we conduct exhaustive experiments to investigate different fine-tuning methods of BERT on text classification task and provide a general solution for BERT fine-tuning. Finally, the proposed solution obtains new state-of-the-art results on eight widely-studied text classification datasets.
Tasks	Language Modelling, Text Classification
Published	2019-05-14
URL	https://arxiv.org/abs/1905.05583v3
PDF	https://arxiv.org/pdf/1905.05583v3.pdf
PWC	https://paperswithcode.com/paper/how-to-fine-tune-bert-for-text-classification
Repo	https://github.com/arctic-yen/Google_QUEST_Q-A_Labeling
Framework	tf

An Evaluation Dataset for Intent Classification and Out-of-Scope Prediction


Title	An Evaluation Dataset for Intent Classification and Out-of-Scope Prediction
Authors	Stefan Larson, Anish Mahendran, Joseph J. Peper, Christopher Clarke, Andrew Lee, Parker Hill, Jonathan K. Kummerfeld, Kevin Leach, Michael A. Laurenzano, Lingjia Tang, Jason Mars
Abstract	Task-oriented dialog systems need to know when a query falls outside their range of supported intents, but current text classification corpora only define label sets that cover every example. We introduce a new dataset that includes queries that are out-of-scope—i.e., queries that do not fall into any of the system’s supported intents. This poses a new challenge because models cannot assume that every query at inference time belongs to a system-supported intent class. Our dataset also covers 150 intent classes over 10 domains, capturing the breadth that a production task-oriented agent must handle. We evaluate a range of benchmark classifiers on our dataset along with several different out-of-scope identification schemes. We find that while the classifiers perform well on in-scope intent classification, they struggle to identify out-of-scope queries. Our dataset and evaluation fill an important gap in the field, offering a way of more rigorously and realistically benchmarking text classification in task-driven dialog systems.
Tasks	Intent Classification, Text Classification
Published	2019-09-04
URL	https://arxiv.org/abs/1909.02027v1
PDF	https://arxiv.org/pdf/1909.02027v1.pdf
PWC	https://paperswithcode.com/paper/an-evaluation-dataset-for-intent
Repo	https://github.com/clinc/oos-eval
Framework	none
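
One simple out-of-scope identification scheme is to threshold classifier confidence: if no supported intent scores high enough, the query is treated as out-of-scope. The sketch below assumes per-intent confidence scores are already available; the label string, function name, and default threshold are hypothetical and not taken from the dataset repository.

```python
from typing import Dict

OUT_OF_SCOPE = "out_of_scope"  # hypothetical label name


def predict_intent(scores: Dict[str, float], threshold: float = 0.5) -> str:
    """Return the top-scoring intent, or OUT_OF_SCOPE if its score is below threshold."""
    if not scores:
        return OUT_OF_SCOPE
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else OUT_OF_SCOPE
```

The threshold trades in-scope accuracy against out-of-scope recall, so it would normally be tuned on held-out data that contains both kinds of query.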

LeafNATS: An Open-Source Toolkit and Live Demo System for Neural Abstractive Text Summarization


Title	LeafNATS: An Open-Source Toolkit and Live Demo System for Neural Abstractive Text Summarization
Authors	Tian Shi, Ping Wang, Chandan K. Reddy
Abstract	Neural abstractive text summarization (NATS) has received a lot of attention in the past few years from both industry and academia. In this paper, we introduce an open-source toolkit, namely LeafNATS, for training and evaluation of different sequence-to-sequence based models for the NATS task, and for deploying the pre-trained models to real-world applications. The toolkit is modularized and extensible in addition to maintaining competitive performance in the NATS task. A live news blogging system has also been implemented to demonstrate how these models can aid blog/news editors by providing them suggestions of headlines and summaries of their articles.
Tasks	Abstractive Text Summarization, Text Summarization
Published	2019-05-28
URL	https://arxiv.org/abs/1906.01512v1
PDF	https://arxiv.org/pdf/1906.01512v1.pdf
PWC	https://paperswithcode.com/paper/190601512
Repo	https://github.com/tshi04/LeafNATS
Framework	pytorch

Real-Time Lip Sync for Live 2D Animation


Title	Real-Time Lip Sync for Live 2D Animation
Authors	Deepali Aneja, Wilmot Li
Abstract	The emergence of commercial tools for real-time performance-based 2D animation has enabled 2D characters to appear on live broadcasts and streaming platforms. A key requirement for live animation is fast and accurate lip sync that allows characters to respond naturally to other actors or the audience through the voice of a human performer. In this work, we present a deep learning based interactive system that automatically generates live lip sync for layered 2D characters using a Long Short Term Memory (LSTM) model. Our system takes streaming audio as input and produces viseme sequences with less than 200ms of latency (including processing time). Our contributions include specific design decisions for our feature definition and LSTM configuration that provide a small but useful amount of lookahead to produce accurate lip sync. We also describe a data augmentation procedure that allows us to achieve good results with a very small amount of hand-animated training data (13-20 minutes). Extensive human judgement experiments show that our results are preferred over several competing methods, including those that only support offline (non-live) processing. Video summary and supplementary results at GitHub link: https://github.com/deepalianeja/CharacterLipSync2D
Tasks	Data Augmentation
Published	2019-10-19
URL	https://arxiv.org/abs/1910.08685v1
PDF	https://arxiv.org/pdf/1910.08685v1.pdf
PWC	https://paperswithcode.com/paper/real-time-lip-sync-for-live-2d-animation
Repo	https://github.com/deepalianeja/CharacterLipSync2D
Framework	none

NeMo: a toolkit for building AI applications using Neural Modules


Title	NeMo: a toolkit for building AI applications using Neural Modules
Authors	Oleksii Kuchaiev, Jason Li, Huyen Nguyen, Oleksii Hrinchuk, Ryan Leary, Boris Ginsburg, Samuel Kriman, Stanislav Beliaev, Vitaly Lavrukhin, Jack Cook, Patrice Castonguay, Mariya Popova, Jocelyn Huang, Jonathan M. Cohen
Abstract	NeMo (Neural Modules) is a Python framework-agnostic toolkit for creating AI applications through re-usability, abstraction, and composition. NeMo is built around neural modules, conceptual blocks of neural networks that take typed inputs and produce typed outputs. Such modules typically represent data layers, encoders, decoders, language models, loss functions, or methods of combining activations. NeMo makes it easy to combine and re-use these building blocks while providing a level of semantic correctness checking via its neural type system. The toolkit comes with extendable collections of pre-built modules for automatic speech recognition and natural language processing. Furthermore, NeMo provides built-in support for distributed training and mixed precision on latest NVIDIA GPUs. NeMo is open-source https://github.com/NVIDIA/NeMo
Tasks	Speech Recognition
Published	2019-09-14
URL	https://arxiv.org/abs/1909.09577v1
PDF	https://arxiv.org/pdf/1909.09577v1.pdf
PWC	https://paperswithcode.com/paper/nemo-a-toolkit-for-building-ai-applications
Repo	https://github.com/NVIDIA/NeMo
Framework	pytorch

Image-based table recognition: data, model, and evaluation


Title	Image-based table recognition: data, model, and evaluation
Authors	Xu Zhong, Elaheh ShafieiBavani, Antonio Jimeno Yepes
Abstract	Important information that relates to a specific topic in a document is often organized in tabular format to assist readers with information retrieval and comparison, which may be difficult to provide in natural language. However, tabular data in unstructured digital documents, e.g., Portable Document Format (PDF) and images, are difficult to parse into structured machine-readable format, due to complexity and diversity in their structure and style. To facilitate image-based table recognition with deep learning, we develop the largest publicly available table recognition dataset PubTabNet (https://github.com/ibm-aur-nlp/PubTabNet), containing 568k table images with corresponding structured HTML representation. PubTabNet is automatically generated by matching the XML and PDF representations of the scientific articles in PubMed Central Open Access Subset (PMCOA). We also propose a novel attention-based encoder-dual-decoder (EDD) architecture that converts images of tables into HTML code. The model has a structure decoder which reconstructs the table structure and helps the cell decoder to recognize cell content. In addition, we propose a new Tree-Edit-Distance-based Similarity (TEDS) metric for table recognition, which more appropriately captures multi-hop cell misalignment and OCR errors than the pre-established metric. The experiments demonstrate that the EDD model can accurately recognize complex tables solely relying on the image representation, outperforming the state-of-the-art by 9.7% absolute TEDS score.
Tasks	Information Retrieval, Optical Character Recognition
Published	2019-11-25
URL	https://arxiv.org/abs/1911.10683v5
PDF	https://arxiv.org/pdf/1911.10683v5.pdf
PWC	https://paperswithcode.com/paper/image-based-table-recognition-data-model-and
Repo	https://github.com/ibm-aur-nlp/PubTabNet
Framework	none
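
For reference, the TEDS metric mentioned in the abstract is, to our understanding of the paper, a normalized tree edit distance between the HTML trees of the predicted and ground-truth tables. Here $T_a$ and $T_b$ are the two trees, $|T|$ is the number of nodes in $T$, and $\mathrm{EditDist}$ is the tree edit distance:

$$
\mathrm{TEDS}(T_a, T_b) = 1 - \frac{\mathrm{EditDist}(T_a, T_b)}{\max\left(|T_a|, |T_b|\right)}
$$

A score of 1 means the two trees are identical, and lower scores indicate more structural or content edits relative to the size of the larger tree.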

Kurdish (Sorani) Speech to Text: Presenting an Experimental Dataset


Title	Kurdish (Sorani) Speech to Text: Presenting an Experimental Dataset
Authors	Akam Qader, Hossein Hassani
Abstract	We present an experimental dataset, Basic Dataset for Sorani Kurdish Automatic Speech Recognition (BD-4SK-ASR), which we used in the first attempt at developing an automatic speech recognition system for Sorani Kurdish. The objective of the project was to develop a system that could automatically recognize simple sentences based on the vocabulary used in grades one to three of the primary schools in the Kurdistan Region of Iraq. We used CMUSphinx as our experimental environment. We developed a dataset to train the system. The dataset is publicly available for non-commercial use under the CC BY-NC-SA 4.0 license.
Tasks	Speech Recognition
Published	2019-11-29
URL	https://arxiv.org/abs/1911.13087v2
PDF	https://arxiv.org/pdf/1911.13087v2.pdf
PWC	https://paperswithcode.com/paper/kurdish-sorani-speech-to-text-presenting-an
Repo	https://github.com/KurdishBLARK/BD-4SK-ASR
Framework	none

Enhancing Quality for VVC Compressed Videos by Jointly Exploiting Spatial Details and Temporal Structure


Title	Enhancing Quality for VVC Compressed Videos by Jointly Exploiting Spatial Details and Temporal Structure
Authors	Xiandong Meng, Xuan Deng, Shuyuan Zhu, Bing Zeng
Abstract	In this paper, we propose a quality enhancement network of versatile video coding (VVC) compressed videos by jointly exploiting spatial details and temporal structure (SDTS). The proposed network consists of a temporal structure fusion subnet and a spatial detail enhancement subnet. The former subnet is used to estimate and compensate the temporal motion across frames, and the latter subnet is used to reduce the compression artifacts and enhance the reconstruction quality of compressed video. Experimental results demonstrate the effectiveness of our SDTS-based method.
Tasks	Video Compression
Published	2019-01-28
URL	https://arxiv.org/abs/1901.09575v2
PDF	https://arxiv.org/pdf/1901.09575v2.pdf
PWC	https://paperswithcode.com/paper/enhancing-quality-for-vvc-compressed-videos
Repo	https://github.com/mengab/Versatile-Video-Coding
Framework	tf