January 25, 2020

3243 words 16 mins read

Paper Group ANR 1775


Training Quantized Neural Networks with a Full-precision Auxiliary Module

Title Training Quantized Neural Networks with a Full-precision Auxiliary Module
Authors Bohan Zhuang, Lingqiao Liu, Mingkui Tan, Chunhua Shen, Ian Reid
Abstract In this paper, we seek to tackle a key challenge in training low-precision networks: the notorious difficulty of propagating gradients through a low-precision network due to the non-differentiable quantization function. We propose a solution: training the low-precision network with a full-precision auxiliary module. Specifically, during training, we construct a mixed-precision network by augmenting the original low-precision network with the full-precision auxiliary module. The augmented mixed-precision network and the low-precision network are then jointly optimized. This strategy creates additional full-precision routes for updating the parameters of the low-precision model, allowing gradients to back-propagate more easily. At inference time, we discard the auxiliary module, so it introduces no computational cost to the low-precision network. We evaluate the proposed method on image classification and object detection over various quantization approaches and show consistent performance increases. In particular, we achieve near-lossless performance relative to the full-precision model using a 4-bit detector, which is of great practical value.
Tasks Image Classification, Object Detection, Quantization
Published 2019-03-27
URL https://arxiv.org/abs/1903.11236v3
PDF https://arxiv.org/pdf/1903.11236v3.pdf
PWC https://paperswithcode.com/paper/training-quantized-network-with-auxiliary
Repo
Framework
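
The core training obstacle this abstract describes, a non-differentiable quantization function, is commonly handled with a straight-through estimator (STE). Below is a minimal NumPy sketch of k-bit uniform quantization with an STE-style gradient rule; the function names and the [-1, 1] clipping range are illustrative assumptions, not the paper's exact scheme (the paper's contribution, the auxiliary full-precision branch, sits on top of a quantizer like this one).

```python
import numpy as np

def quantize_uniform(w, bits=4):
    """Uniformly quantize values clipped to [-1, 1] onto 2^bits - 1 intervals."""
    levels = 2 ** bits - 1
    w = np.clip(w, -1.0, 1.0)
    # map to [0, 1], round to the nearest level, map back to [-1, 1]
    return np.round((w + 1) / 2 * levels) / levels * 2 - 1

def ste_grad(upstream_grad, w):
    """Straight-through estimator: treat rounding as the identity in the
    backward pass, zeroing gradients outside the clipping range."""
    return upstream_grad * ((w >= -1.0) & (w <= 1.0))

w = np.array([-1.5, -0.3, 0.0, 0.42, 0.9])
q = quantize_uniform(w, bits=2)
g = ste_grad(np.ones_like(w), w)
```

The auxiliary-module idea then amounts to adding a second, full-precision path whose loss gradients also reach the shared low-precision parameters.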

Style-based Variational Autoencoder for Real-World Super-Resolution

Title Style-based Variational Autoencoder for Real-World Super-Resolution
Authors Xin Ma, Yi Li, Huaibo Huang, Mandi Luo, Tanhao Hu, Ran He
Abstract Real-world image super-resolution is a challenging image translation problem. Low-resolution (LR) images are often generated by various unknown transformations rather than by applying simple bilinear down-sampling to HR images. To address this issue, this paper proposes a novel Style-based Super-Resolution Variational Autoencoder network (SSRVAE) that contains a style Variational Autoencoder (styleVAE) and an SR network. To obtain realistic real-world low-quality images paired with the HR images, we design styleVAE to transfer the complex nuisance factors in real-world LR images to the generated LR images. We also use mutual information (MI) estimation to obtain better style information. For our SR network, we first propose a global attention residual block to learn long-range dependencies in images. We then propose a local attention residual block that steers the attention of the SR network to local areas of images where texture detail will be filled in. It is worth noting that styleVAE is presented in a plug-and-play manner and thus can help promote the generalization and robustness of our SR method as well as other SR methods. Extensive experiments demonstrate that our SSRVAE surpasses state-of-the-art methods, both quantitatively and qualitatively.
Tasks Image Super-Resolution, Super-Resolution
Published 2019-12-21
URL https://arxiv.org/abs/1912.10227v1
PDF https://arxiv.org/pdf/1912.10227v1.pdf
PWC https://paperswithcode.com/paper/style-based-variational-autoencoder-for-real
Repo
Framework
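
The styleVAE component builds on standard VAE machinery. A minimal sketch of the reparameterization trick that keeps sampling a latent style code differentiable; the latent dimension and names here are illustrative, not the paper's architecture:

```python
import numpy as np

def reparameterize(mu, logvar, rng=None):
    """Draw z = mu + sigma * eps with eps ~ N(0, I), so the sampling step
    stays differentiable with respect to mu and logvar."""
    rng = rng or np.random.default_rng(0)
    eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(0.5 * logvar) * eps

# a latent style code for one image, assuming a 4-dimensional latent space
z = reparameterize(np.zeros(4), np.zeros(4))
```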

Deep Neural Network for Fast and Accurate Single Image Super-Resolution via Channel-Attention-based Fusion of Orientation-aware Features

Title Deep Neural Network for Fast and Accurate Single Image Super-Resolution via Channel-Attention-based Fusion of Orientation-aware Features
Authors Du Chen, Zewei He, Yanpeng Cao, Jiangxin Yang, Yanlong Cao, Michael Ying Yang, Siliang Tang, Yueting Zhuang
Abstract Recently, Convolutional Neural Networks (CNNs) have been successfully adopted to solve the ill-posed single image super-resolution (SISR) problem. A commonly used strategy to boost the performance of CNN-based SISR models is deploying very deep networks, which inevitably incurs obvious drawbacks (e.g., a large number of network parameters, heavy computational loads, and difficult model training). In this paper, we aim to build more accurate and faster SISR models by developing better-performing feature extraction and fusion techniques. First, we propose a novel Orientation-Aware feature extraction and fusion Module (OAM), which contains a mixture of 1D and 2D convolutional kernels (i.e., 5 x 1, 1 x 5, and 3 x 3) for extracting orientation-aware features. Second, we adopt the channel attention mechanism as an effective technique to adaptively fuse features extracted in different directions and in hierarchically stacked convolutional stages. Based on these two improvements, we present a compact but powerful CNN-based model for high-quality SISR via Channel Attention-based fusion of Orientation-Aware features (SISR-CA-OA). Extensive experimental results verify the superiority of the proposed SISR-CA-OA model, which performs favorably against state-of-the-art SISR models in terms of both restoration accuracy and computational efficiency. The source code will be made publicly available.
Tasks Image Super-Resolution, Super-Resolution
Published 2019-12-09
URL https://arxiv.org/abs/1912.04016v1
PDF https://arxiv.org/pdf/1912.04016v1.pdf
PWC https://paperswithcode.com/paper/deep-neural-network-for-fast-and-accurate
Repo
Framework
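
As a rough illustration of the two ideas in this abstract, the sketch below extracts horizontal (1 x 5) and vertical (5 x 1) feature maps with fixed averaging kernels and fuses them with softmax weights derived from global average pooling. The learned convolutional kernels and the real channel-attention block of SISR-CA-OA are replaced here by toy stand-ins:

```python
import numpy as np

def orientation_features(img):
    """Horizontal (1x5) and vertical (5x1) feature maps using simple
    averaging kernels as stand-ins for learned orientation-aware kernels."""
    k = np.ones(5) / 5
    horiz = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    vert = np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, img)
    return horiz, vert

def channel_attention_fuse(features):
    """Weight each feature map by a softmax over its global average pool
    (a minimal stand-in for a learned channel-attention block)."""
    gap = np.array([f.mean() for f in features])
    w = np.exp(gap) / np.exp(gap).sum()
    return sum(wi * f for wi, f in zip(w, features))

img = np.ones((8, 8))
fused = channel_attention_fuse(orientation_features(img))
```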

A unified sequence-to-sequence front-end model for Mandarin text-to-speech synthesis

Title A unified sequence-to-sequence front-end model for Mandarin text-to-speech synthesis
Authors Junjie Pan, Xiang Yin, Zhiling Zhang, Shichao Liu, Yang Zhang, Zejun Ma, Yuxuan Wang
Abstract In a Mandarin text-to-speech (TTS) system, the front-end text processing module significantly influences the intelligibility and naturalness of synthesized speech. Building a typical pipeline-based front-end consisting of multiple individual components requires extensive effort. In this paper, we propose a unified sequence-to-sequence front-end model for Mandarin TTS that converts raw text to linguistic features directly. Compared to the pipeline-based front-end, our unified front-end achieves comparable performance in polyphone disambiguation and prosodic word prediction, and improves intonation phrase prediction by 0.0738 in F1 score. We also combined the unified front-end with Tacotron and WaveRNN to build a Mandarin TTS system. The resulting system achieved a MOS (4.38) comparable to the pipeline-based front-end (4.37) and close to human recordings (4.49).
Tasks Speech Synthesis, Text-To-Speech Synthesis
Published 2019-11-11
URL https://arxiv.org/abs/1911.04111v1
PDF https://arxiv.org/pdf/1911.04111v1.pdf
PWC https://paperswithcode.com/paper/a-unified-sequence-to-sequence-front-end
Repo
Framework

Voice for the Voiceless: Active Sampling to Detect Comments Supporting the Rohingyas

Title Voice for the Voiceless: Active Sampling to Detect Comments Supporting the Rohingyas
Authors Shriphani Palakodety, Ashiqur R. KhudaBukhsh, Jaime G. Carbonell
Abstract The Rohingya refugee crisis is one of the biggest humanitarian crises of modern times, with more than 600,000 Rohingyas rendered homeless according to the United Nations High Commissioner for Refugees. While it has received sustained press attention globally, no comprehensive research has been performed on social media pertaining to this large evolving crisis. In this work, we construct a substantial corpus of YouTube video comments (263,482 comments from 113,250 users in 5,153 relevant videos) with an aim to analyze the possible role of AI in helping a marginalized community. Using a novel combination of multiple Active Learning strategies and a novel active sampling strategy based on nearest-neighbors in the comment-embedding space, we construct a classifier that can detect comments defending the Rohingyas among larger numbers of disparaging and neutral ones. We advocate that beyond the burgeoning field of hate-speech detection, automatic detection of help-speech can lend voice to voiceless people and make the internet safer for marginalized communities.
Tasks Active Learning, Hate Speech Detection
Published 2019-10-08
URL https://arxiv.org/abs/1910.03206v2
PDF https://arxiv.org/pdf/1910.03206v2.pdf
PWC https://paperswithcode.com/paper/voice-for-the-voiceless-active-sampling-to
Repo
Framework
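
The nearest-neighbour active sampling idea can be caricatured in a few lines: score each unlabeled comment embedding by its cosine similarity to the closest labeled positive, and surface the top candidates for annotation. This is a generic sketch with made-up 2-D embeddings, not the authors' exact strategy or data:

```python
import numpy as np

def nn_active_sample(labeled_pos, unlabeled, k=2):
    """Rank unlabeled embeddings by cosine similarity to their nearest
    labeled positive; return indices of the top-k candidates to annotate."""
    def normalize(X):
        return X / np.linalg.norm(X, axis=1, keepdims=True)
    sims = normalize(unlabeled) @ normalize(labeled_pos).T  # cosine similarities
    nearest = sims.max(axis=1)        # similarity to the closest positive
    return np.argsort(-nearest)[:k]   # most similar first

pos = np.array([[1.0, 0.0], [0.9, 0.1]])          # labeled help-speech embeddings
pool = np.array([[0.95, 0.05], [0.0, 1.0], [0.7, 0.7]])  # unlabeled pool
picks = nn_active_sample(pos, pool, k=2)
```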

mfEGRA: Multifidelity Efficient Global Reliability Analysis

Title mfEGRA: Multifidelity Efficient Global Reliability Analysis
Authors Anirban Chaudhuri, Alexandre N. Marques, Karen E. Willcox
Abstract This paper develops mfEGRA, a multifidelity active learning method that uses data-driven, adaptively refined surrogates for locating the failure boundary in reliability analysis. This work addresses the prohibitive cost of reliability analysis via Monte Carlo sampling for expensive-to-evaluate high-fidelity models by using cheaper-to-evaluate approximations of the high-fidelity model. The method builds on the Efficient Global Reliability Analysis (EGRA) method, a surrogate-based method that uses adaptive sampling to refine Gaussian process surrogates for failure boundary location using a single-fidelity model. Our method introduces a two-stage adaptive sampling criterion that uses a multifidelity Gaussian process surrogate to leverage multiple information sources with different fidelities. The method combines the expected feasibility criterion from EGRA with a one-step lookahead information gain to refine the surrogate around the failure boundary. The computational savings from mfEGRA depend on the discrepancy between the different models and on their cost of evaluation relative to the high-fidelity model. We show that accurate estimation of reliability using mfEGRA leads to computational savings of $\sim$46% for an analytical multimodal test problem and 24% for an acoustic horn problem, compared to single-fidelity EGRA.
Tasks Active Learning
Published 2019-10-06
URL https://arxiv.org/abs/1910.02497v2
PDF https://arxiv.org/pdf/1910.02497v2.pdf
PWC https://paperswithcode.com/paper/mfegra-multifidelity-efficient-global
Repo
Framework
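
EGRA's adaptive sampling is driven by an expected feasibility function that rewards points whose surrogate prediction N(mu, sigma^2) is likely to lie near the failure boundary. The sketch below implements one common statement of that single-fidelity criterion with the usual band eps = 2*sigma; mfEGRA's two-stage multifidelity criterion differs and is not reproduced here, so treat the exact form as an assumption:

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def norm_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def expected_feasibility(mu, sigma, zbar=0.0, eps=None):
    """Expected feasibility of the surrogate prediction N(mu, sigma^2)
    lying near the failure boundary g(x) = zbar (EGRA-style criterion)."""
    eps = 2 * sigma if eps is None else eps
    lo, hi = zbar - eps, zbar + eps
    t, tl, th = (zbar - mu) / sigma, (lo - mu) / sigma, (hi - mu) / sigma
    return ((mu - zbar) * (2 * norm_cdf(t) - norm_cdf(tl) - norm_cdf(th))
            - sigma * (2 * norm_pdf(t) - norm_pdf(tl) - norm_pdf(th))
            + eps * (norm_cdf(th) - norm_cdf(tl)))
```

Points with predictions far from the boundary score near zero, so the next high-fidelity evaluation is spent where the failure boundary is most uncertain.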

Effect of choice of probability distribution, randomness, and search methods for alignment modeling in sequence-to-sequence text-to-speech synthesis using hard alignment

Title Effect of choice of probability distribution, randomness, and search methods for alignment modeling in sequence-to-sequence text-to-speech synthesis using hard alignment
Authors Yusuke Yasuda, Xin Wang, Junichi Yamagishi
Abstract Sequence-to-sequence text-to-speech (TTS) is dominated by soft-attention-based methods. Recently, hard-attention-based methods have been proposed to prevent fatal alignment errors, but their methods for sampling discrete alignments are poorly investigated. This research investigates various combinations of sampling methods and probability distributions for alignment transition modeling in SSNT-TTS, a hard-alignment-based sequence-to-sequence TTS method. We clarify the common sampling methods for discrete variables, including greedy search, beam search, and random sampling from a Bernoulli distribution, in a more general way. Furthermore, we introduce the binary Concrete distribution to model discrete variables more properly. The results of a listening test show that deterministic search is preferable to stochastic search, and that the binary Concrete distribution is robust with stochastic search for natural alignment transitions.
Tasks Speech Synthesis, Text-To-Speech Synthesis
Published 2019-10-28
URL https://arxiv.org/abs/1910.12383v1
PDF https://arxiv.org/pdf/1910.12383v1.pdf
PWC https://paperswithcode.com/paper/effect-of-choice-of-probability-distribution
Repo
Framework
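
The choices the paper compares reduce to how a binary alignment-transition decision is drawn from its probability. A toy sketch of the deterministic (greedy) versus stochastic (Bernoulli) options; beam search and the binary Concrete distribution are omitted, and the names here are illustrative:

```python
import random

def sample_transition(p_shift, method="greedy", rng=None):
    """Draw a binary shift/emit decision from transition probability p_shift.
    'greedy' takes the argmax deterministically; 'bernoulli' draws a sample."""
    if method == "greedy":
        return int(p_shift >= 0.5)
    if method == "bernoulli":
        rng = rng or random.Random()
        return int(rng.random() < p_shift)
    raise ValueError(method)
```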

Investigating the Effectiveness of Representations Based on Word-Embeddings in Active Learning for Labelling Text Datasets

Title Investigating the Effectiveness of Representations Based on Word-Embeddings in Active Learning for Labelling Text Datasets
Authors Jinghui Lu, Maeve Henchion, Brian Mac Namee
Abstract Manually labelling large collections of text data is a time-consuming, expensive, and laborious task, but one that is necessary to support machine learning based on text datasets. Active learning has been shown to be an effective way to alleviate some of the effort required to utilise large collections of unlabelled data for machine learning tasks without needing to fully label them. The representation mechanism used to represent text documents during active learning, however, has a significant influence on how effective the process will be. While simple vector representations such as bag of words have been shown to be an effective way to represent documents during active learning, the emergence of representation mechanisms based on the word embeddings prevalent in neural network research (e.g. word2vec and transformer-based models like BERT) offers a promising, and as yet not fully explored, alternative. This paper describes a large-scale evaluation of the effectiveness of different text representation mechanisms for active learning across 8 datasets from varied domains. This evaluation shows that using representations based on modern word embeddings, especially BERT, which have not yet been widely used in active learning, achieves a significant improvement over more commonly used vector-based methods like bag of words.
Tasks Active Learning, Word Embeddings
Published 2019-10-04
URL https://arxiv.org/abs/1910.03505v2
PDF https://arxiv.org/pdf/1910.03505v2.pdf
PWC https://paperswithcode.com/paper/investigating-the-effectiveness-of-word
Repo
Framework
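
Whatever the document representation, pool-based active learning comes down to a query strategy over model outputs. A minimal sketch of least-confidence uncertainty sampling, one common strategy (the paper evaluates several; the class probabilities below are made up):

```python
def least_confident(probabilities, batch_size=1):
    """Select indices of pool items whose top predicted class probability
    is lowest (least-confidence uncertainty sampling)."""
    confidence = [max(p) for p in probabilities]
    ranked = sorted(range(len(confidence)), key=lambda i: confidence[i])
    return ranked[:batch_size]

# predicted class probabilities for three unlabeled documents
pool_probs = [[0.9, 0.1], [0.55, 0.45], [0.7, 0.3]]
query = least_confident(pool_probs, batch_size=2)
```

The selected documents are sent for labelling, the model is retrained, and the loop repeats.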

DeepIris: Iris Recognition Using A Deep Learning Approach

Title DeepIris: Iris Recognition Using A Deep Learning Approach
Authors Shervin Minaee, Amirali Abdolrashidi
Abstract Iris recognition has been an active research area during the last few decades because of its wide applications in security, from airports to homeland-security border control. Different features and algorithms have been proposed for iris recognition in the past. In this paper, we propose an end-to-end deep learning framework for iris recognition based on a residual convolutional neural network (CNN), which can jointly learn the feature representation and perform recognition. We train our model on a well-known iris recognition dataset using only a few training images from each class, and show promising results and improvements over previous approaches. We also present a visualization technique that detects the areas of iris images that most impact the recognition results. We believe this framework can be widely used for other biometric recognition tasks, helping to build more scalable and accurate systems.
Tasks Iris Recognition
Published 2019-07-22
URL https://arxiv.org/abs/1907.09380v1
PDF https://arxiv.org/pdf/1907.09380v1.pdf
PWC https://paperswithcode.com/paper/deepiris-iris-recognition-using-a-deep
Repo
Framework

Incremental processing of noisy user utterances in the spoken language understanding task

Title Incremental processing of noisy user utterances in the spoken language understanding task
Authors Stefan Constantin, Jan Niehues, Alex Waibel
Abstract The state-of-the-art neural network architectures make it possible to create spoken language understanding systems with high quality and fast processing times. One major challenge for real-world applications is the high latency of these systems caused by triggered actions with high execution times. If an action can be separated into subactions, the reaction time of the system can be improved through incremental processing of the user utterance, starting subactions while the utterance is still being spoken. In this work, we present a model-agnostic method to achieve high quality in processing incrementally produced partial utterances. Based on clean and noisy versions of the ATIS dataset, we show how to use our method to create datasets for low-latency natural language understanding components. We obtain improvements of up to 47.91 absolute percentage points in F1 score.
Tasks Spoken Language Understanding
Published 2019-09-30
URL https://arxiv.org/abs/1909.13790v1
PDF https://arxiv.org/pdf/1909.13790v1.pdf
PWC https://paperswithcode.com/paper/incremental-processing-of-noisy-user
Repo
Framework
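
The dataset-construction idea, expanding each utterance into the partial utterances an incremental system would see, can be sketched model-agnostically. Tokenization and labeling in the real ATIS-based datasets are more involved, and the example utterance is invented:

```python
def partial_utterances(tokens):
    """Expand one utterance into all incremental prefixes, a simple way to
    build training data for processing partially produced utterances."""
    return [tokens[:i] for i in range(1, len(tokens) + 1)]

utt = ["show", "flights", "to", "boston"]
prefixes = partial_utterances(utt)
```

Training on such prefixes lets the understanding component commit to subactions before the utterance is complete.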

Understanding Semantics from Speech Through Pre-training

Title Understanding Semantics from Speech Through Pre-training
Authors Pengwei Wang, Liangchen Wei, Yong Cao, Jinghui Xie, Yuji Cao, Zaiqing Nie
Abstract End-to-end Spoken Language Understanding (SLU) is proposed to infer the semantic meaning directly from audio features without an intermediate text representation. Although the acoustic model component of an end-to-end SLU system can be pre-trained with Automatic Speech Recognition (ASR) targets, the SLU component can only learn semantic features from limited task-specific training data. In this paper, for the first time, we propose large-scale unsupervised pre-training for the SLU component of an end-to-end SLU system, so that the SLU component may preserve semantic features from massive unlabeled audio data. As the output of the acoustic model component, i.e. phoneme posterior sequences, has very different characteristics from text sequences, we propose a novel pre-training model called BERT-PLM, which stands for Bidirectional Encoder Representations from Transformers through Permutation Language Modeling. BERT-PLM trains the SLU component on unlabeled data through a regression objective equivalent to the partial permutation language modeling objective, while leveraging full bi-directional context information with BERT networks. The experimental results show that our approach outperforms state-of-the-art end-to-end systems with over 12.5% error reduction.
Tasks Language Modelling, Speech Recognition, Spoken Language Understanding
Published 2019-09-24
URL https://arxiv.org/abs/1909.10924v1
PDF https://arxiv.org/pdf/1909.10924v1.pdf
PWC https://paperswithcode.com/paper/understanding-semantics-from-speech-through
Repo
Framework

Extractive Summarization via Weighted Dissimilarity and Importance Aligned Key Iterative Algorithm

Title Extractive Summarization via Weighted Dissimilarity and Importance Aligned Key Iterative Algorithm
Authors Ryohto Sawada
Abstract We present an importance-aligned key iterative algorithm for extractive summarization that is faster than conventional algorithms while maintaining accuracy. The computational complexity of our algorithm is $O(SN \log N)$ to summarize $N$ original sentences into a final $S$ sentences. Our algorithm maximizes the weighted dissimilarity, defined as the product of importance and cosine dissimilarity, so that the summary represents the document while its sentences are not similar to each other. The weighted dissimilarity is heuristically maximized by iterative greedy search and binary search over the sentences ordered by importance. Finally, we show a benchmark score based on summarization of customer reviews of products, which highlights the quality of our algorithm, comparable to humans and existing algorithms. We provide the source code of our algorithm on GitHub: https://github.com/qhapaq-49/imakita
Tasks
Published 2019-05-15
URL https://arxiv.org/abs/1906.02126v1
PDF https://arxiv.org/pdf/1906.02126v1.pdf
PWC https://paperswithcode.com/paper/190602126
Repo
Framework
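
The weighted-dissimilarity objective can be illustrated with a small greedy loop: repeatedly pick the sentence maximizing importance times (1 - cosine similarity) to the closest already-selected sentence. This sketch omits the binary search over importance-ordered sentences that gives the paper its O(SN log N) complexity, and the vectors and scores below are invented:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def greedy_summary(vectors, importance, s):
    """Greedily pick s sentences maximizing importance times the cosine
    dissimilarity (1 - max similarity) to sentences already chosen."""
    chosen = []
    while len(chosen) < s:
        best, best_score = None, -1.0
        for i, v in enumerate(vectors):
            if i in chosen:
                continue
            if chosen:
                score = importance[i] * min(1 - cosine(v, vectors[j]) for j in chosen)
            else:
                score = importance[i]
            if score > best_score:
                best, best_score = i, score
        chosen.append(best)
    return chosen
```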

Deep Decomposition Learning for Inverse Imaging Problems

Title Deep Decomposition Learning for Inverse Imaging Problems
Authors Dongdong Chen, Mike E. Davies
Abstract Deep learning is emerging as a new paradigm for solving inverse imaging problems. However, deep learning methods often lack the guarantees of traditional physics-based methods, because physical information is not considered when training and deploying the networks. Appropriate supervision and explicit calibration using information from the physics model can enhance neural network learning and its practical performance. In this paper, inspired by the fact that data can be decomposed into two components, one in the null-space of the forward operator and one in the range space of its pseudo-inverse, we train neural networks to learn the two components and therefore the decomposition; i.e., we explicitly reformulate the neural network layers as learning range-nullspace decomposition functions with reference to the layer inputs, instead of learning unreferenced functions. We show that the decomposition networks not only produce superior results, but also enjoy good interpretability and generalization. We demonstrate the advantages of decomposition learning on different inverse problems, including compressive sensing and image super-resolution.
Tasks Calibration, Compressive Sensing, Image Super-Resolution, Super-Resolution
Published 2019-11-25
URL https://arxiv.org/abs/1911.11028v1
PDF https://arxiv.org/pdf/1911.11028v1.pdf
PWC https://paperswithcode.com/paper/deep-decomposition-learning-for-inverse
Repo
Framework
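
The decomposition the abstract describes is the standard linear-algebra identity x = A_pinv A x + (I - A_pinv A) x for a forward operator A. The paper trains networks to learn the two components; the identity itself is easy to check with a toy operator (this operator and signal are invented for illustration):

```python
import numpy as np

# A toy forward operator that observes only the first two coordinates.
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
A_pinv = np.linalg.pinv(A)

x = np.array([3.0, -2.0, 5.0])
x_range = A_pinv @ A @ x   # component A can "see" (range space of A_pinv)
x_null = x - x_range       # component invisible to A (null-space of A)
```

The network pair then only has to fill in the null-space component consistently with the observed range-space component.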

GCN-LASE: Towards Adequately Incorporating Link Attributes in Graph Convolutional Networks

Title GCN-LASE: Towards Adequately Incorporating Link Attributes in Graph Convolutional Networks
Authors Ziyao Li, Liang Zhang, Guojie Song
Abstract Graph Convolutional Networks (GCNs) have proved to be among the most powerful architectures for aggregating local neighborhood information for individual graph nodes. Low-rank proximities and node features are successfully leveraged in existing GCNs; however, attributes that graph links may carry are commonly ignored, as almost all of these models simplify graph links into binary or scalar values describing node connectedness. In our paper, instead, links are treated as substantive relationships between entities with descriptive attributes. We propose GCN-LASE (GCN with Link Attributes and Sampling Estimation), a novel GCN model taking both node and link attributes as inputs. To adequately capture the interactions between link and node attributes, their tensor product is used as neighbor features, based on which we define several graph kernels and further develop corresponding architectures for LASE. Besides, to accelerate the training process, the sums of features over entire neighborhoods are estimated through a Monte Carlo method, with novel sampling strategies designed for LASE to minimize the estimation variance. Our experiments show that LASE outperforms strong baselines over various graph datasets, and further experiments corroborate the informativeness of link attributes and our model's ability to adequately leverage them.
Tasks
Published 2019-02-26
URL https://arxiv.org/abs/1902.09817v2
PDF https://arxiv.org/pdf/1902.09817v2.pdf
PWC https://paperswithcode.com/paper/gcn-lase-towards-adequately-incorporating
Repo
Framework

Fine-grained Attention and Feature-sharing Generative Adversarial Networks for Single Image Super-Resolution

Title Fine-grained Attention and Feature-sharing Generative Adversarial Networks for Single Image Super-Resolution
Authors Yitong Yan, Chuangchuang Liu, Changyou Chen, Xianfang Sun, Longcun Jin, Xiang Zhou
Abstract Traditional super-resolution methods that aim to minimize the mean squared error usually produce images with over-smoothed and blurry edges due to the loss of high-frequency details. In this paper, we propose two novel techniques within generative adversarial networks to produce photo-realistic images for image super-resolution. First, instead of producing a single score to discriminate between real and fake images, we propose a variant, called Fine-grained Attention Generative Adversarial Network for image super-resolution (FASRGAN), which discriminates each pixel as real or fake. FASRGAN adopts a U-Net-like network as the discriminator with two outputs: an image score and an image score map. The score map has the same spatial size as the HR/SR images, serving as fine-grained attention that represents the degree of reconstruction difficulty for each pixel. Second, instead of using different networks for the generator and the discriminator, we use a feature-sharing network (Fs-SRGAN) for both. By sharing the network, certain information is shared between the generator and the discriminator, which in turn improves their ability to produce high-quality images. Quantitative and visual comparisons with state-of-the-art methods on benchmark datasets demonstrate the superiority of our methods. The application of the super-resolved images to object recognition further demonstrates the strong reconstruction capability and excellent super-resolution effects of the proposed methods.
Tasks Image Super-Resolution, Object Recognition, Super-Resolution
Published 2019-11-25
URL https://arxiv.org/abs/1911.10773v1
PDF https://arxiv.org/pdf/1911.10773v1.pdf
PWC https://paperswithcode.com/paper/fine-grained-attention-and-feature-sharing
Repo
Framework