January 29, 2020


Paper Group ANR 526

MRI to CT Translation with GANs. Contextual Out-of-Domain Utterance Handling With Counterfeit Data Augmentation. Noisier2Noise: Learning to Denoise from Unpaired Noisy Data. VrR-VG: Refocusing Visually-Relevant Relationships. Accelerating Deconvolution on Unmodified CNN Accelerators for Generative Adversarial Networks – A Software Approach. Fully …

MRI to CT Translation with GANs

Title MRI to CT Translation with GANs
Authors Bodo Kaiser, Shadi Albarqouni
Abstract We present a detailed description and reference implementation of the preprocessing steps necessary to prepare the public Retrospective Image Registration Evaluation (RIRE) dataset for the task of magnetic resonance imaging (MRI) to X-ray computed tomography (CT) translation. Furthermore, we describe and implement three state-of-the-art convolutional neural network (CNN) and generative adversarial network (GAN) models, and report statistics and visual results for two of them.
Tasks Computed Tomography (CT), Image Registration
Published 2019-01-16
URL http://arxiv.org/abs/1901.05259v1
PDF http://arxiv.org/pdf/1901.05259v1.pdf
PWC https://paperswithcode.com/paper/mri-to-ct-translation-with-gans
Repo
Framework

Contextual Out-of-Domain Utterance Handling With Counterfeit Data Augmentation

Title Contextual Out-of-Domain Utterance Handling With Counterfeit Data Augmentation
Authors Sungjin Lee, Igor Shalyminov
Abstract Neural dialog models often lack robustness to anomalous user input and produce inappropriate responses, which leads to a frustrating user experience. Although there is a set of prior approaches to out-of-domain (OOD) utterance detection, they share a few restrictions: they rely on OOD data or multiple sub-domains, and their OOD detection is context-independent, which leads to suboptimal performance in a dialog. The goal of this paper is to propose a novel OOD detection method that does not require OOD data, by utilizing counterfeit OOD turns in the context of a dialog. For the sake of fostering further research, we also release new dialog datasets which are 3 publicly available dialog corpora augmented with OOD turns in a controllable way. Our method outperforms state-of-the-art dialog models equipped with a conventional OOD detection mechanism by a large margin in the presence of OOD utterances.
Tasks Data Augmentation
Published 2019-05-24
URL https://arxiv.org/abs/1905.10247v1
PDF https://arxiv.org/pdf/1905.10247v1.pdf
PWC https://paperswithcode.com/paper/contextual-out-of-domain-utterance-handling
Repo
Framework
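The counterfeit-OOD idea in the abstract above can be sketched directly: splice utterances drawn from unrelated corpora into in-domain dialogs, so that the inserted turns are OOD *in context*. The sketch below is my own illustration under assumed data structures; `counterfeit_ood_dialogs`, its label strings, and the turn format are hypothetical, not the paper's actual code.

```python
import random

def counterfeit_ood_dialogs(dialogs, foreign_utterances, rate=0.2, seed=0):
    """Augment in-domain dialogs with counterfeit OOD turns.

    dialogs: list of dialogs, each a list of (utterance, label) tuples.
    foreign_utterances: utterances from unrelated corpora; dropped into an
    in-domain dialog's context, they become OOD turns.
    rate: probability of replacing each turn with a counterfeit OOD turn.
    """
    rng = random.Random(seed)
    augmented = []
    for dialog in dialogs:
        new_dialog = []
        for utterance, label in dialog:
            if rng.random() < rate:
                # In this dialog's context the foreign utterance is OOD.
                new_dialog.append((rng.choice(foreign_utterances), "OOD"))
            else:
                new_dialog.append((utterance, label))
        augmented.append(new_dialog)
    return augmented
```

Setting `rate` controls how many OOD turns each dialog receives, mirroring the "controllable way" the released datasets are described as being built.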

Noisier2Noise: Learning to Denoise from Unpaired Noisy Data

Title Noisier2Noise: Learning to Denoise from Unpaired Noisy Data
Authors Nick Moran, Dan Schmidt, Yu Zhong, Patrick Coady
Abstract We present a method for training a neural network to perform image denoising without access to clean training examples or access to paired noisy training examples. Our method requires only a single noisy realization of each training example and a statistical model of the noise distribution, and is applicable to a wide variety of noise models, including spatially structured noise. Our model produces results which are competitive with other learned methods which require richer training data, and outperforms traditional non-learned denoising methods. We present derivations of our method for arbitrary additive noise, an improvement specific to Gaussian additive noise, and an extension to multiplicative Bernoulli noise.
Tasks Denoising, Image Denoising
Published 2019-10-25
URL https://arxiv.org/abs/1910.11908v1
PDF https://arxiv.org/pdf/1910.11908v1.pdf
PWC https://paperswithcode.com/paper/noisier2noise-learning-to-denoise-from
Repo
Framework
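For additive Gaussian noise, the Noisier2Noise recipe fits in a few lines: given a single noisy observation Y = X + N, synthesize a doubly noisy input Z = Y + M with M drawn from the same noise model, train f to predict Y from Z, and recover the clean estimate as 2·f(Z) − Z (since N and M are exchangeable given Z, E[N|Z] = E[M|Z], hence E[X|Z] = 2·E[Y|Z] − Z). A minimal numpy sketch with hypothetical function names; the paper's actual training loop and extensions are not reproduced:

```python
import numpy as np

def noisier2noise_pair(noisy, noise_std, rng):
    """Build a (doubly noisy input, singly noisy target) training pair.

    `noisy` = clean + n is the only observation available; we add a second
    noise draw m from the same model and train f to predict `noisy`.
    """
    m = rng.normal(0.0, noise_std, size=noisy.shape)
    return noisy + m, noisy

def noisier2noise_estimate(f, noisier):
    """Correction step: clean estimate = 2 * E[Y|Z] - Z for additive noise."""
    return 2.0 * f(noisier) - noisier
```

A quick sanity check of the correction identity: if f returns the exact posterior mean E[Y|Z] = (X + Z)/2, the estimate recovers the clean signal X.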

VrR-VG: Refocusing Visually-Relevant Relationships

Title VrR-VG: Refocusing Visually-Relevant Relationships
Authors Yuanzhi Liang, Yalong Bai, Wei Zhang, Xueming Qian, Li Zhu, Tao Mei
Abstract Relationships encode the interactions among individual instances, and play a critical role in deep visual scene understanding. Suffering from the high predictability with non-visual information, existing methods tend to fit the statistical bias rather than “learning” to “infer” the relationships from images. To encourage further development in visual relationships, we propose a novel method to automatically mine more valuable relationships by pruning visually-irrelevant ones. We construct a new scene-graph dataset named Visually-Relevant Relationships Dataset (VrR-VG) based on Visual Genome. Compared with existing datasets, the performance gap between learnable and statistical methods is more significant in VrR-VG, and frequency-based analysis no longer works. Moreover, we propose to learn a relationship-aware representation by jointly considering instances, attributes and relationships. By applying the relationship-aware features learned on VrR-VG, the performance of image captioning and visual question answering is systematically improved by a large margin, which demonstrates the gain of our dataset and the feature embedding schema. VrR-VG is available via http://vrr-vg.com/.
Tasks Image Captioning, Question Answering, Scene Graph Generation, Scene Understanding, Visual Question Answering
Published 2019-02-01
URL https://arxiv.org/abs/1902.00313v2
PDF https://arxiv.org/pdf/1902.00313v2.pdf
PWC https://paperswithcode.com/paper/rethinking-visual-relationships-for-high
Repo
Framework

Accelerating Deconvolution on Unmodified CNN Accelerators for Generative Adversarial Networks – A Software Approach

Title Accelerating Deconvolution on Unmodified CNN Accelerators for Generative Adversarial Networks – A Software Approach
Authors Kaijie Tu
Abstract Generative Adversarial Networks (GANs) are an emerging machine learning technology that can learn to automatically create labeled datasets in application domains such as speech, image, video and texts. A GAN typically includes a generative model that is taught to generate any distribution of data, and a discriminator trained to distinguish the synthetic data from real-world data. Both convolutional and deconvolutional layers are the major sources of performance overhead for GANs and directly impact the efficiency of GAN-based systems. There are many prior works investigating specialized hardware architectures that can accelerate convolution and deconvolution simultaneously, but they entail intensive hardware modifications to existing deep learning accelerators like the Google TPU and DianNao that focus on convolution acceleration. In contrast, this work proposes a novel deconvolution layer implementation with a software approach and enables fast and efficient generative network inference on legacy deep learning processors. Our proposed method reorganizes the computation of the deconvolutional layer and allows the deep learning processor to treat it as a standard convolutional layer after we split the original deconvolutional filters into multiple small filters. The proposed data flow is implemented on representative deep learning processors including the dot-product array and the regular 2D PE array architectures. Compared to prior acceleration schemes, the implemented acceleration scheme achieves 2.41X - 4.34X performance speedup and reduces the energy consumption by 27.7% - 54.5% on a set of realistic benchmarks. In addition, we also applied the deconvolution computing approach to off-the-shelf commodity deep learning processor chips. The performance of GANs on the Google TPU chip and Intel NCS2 exhibits 1.67X - 3.04X speedup on average over prior deconvolution implementations.
Tasks
Published 2019-07-03
URL https://arxiv.org/abs/1907.01773v2
PDF https://arxiv.org/pdf/1907.01773v2.pdf
PWC https://paperswithcode.com/paper/accelerating-deconvolution-on-unmodified-cnn
Repo
Framework
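The filter-splitting trick described above is easy to verify in one dimension: a stride-s transposed convolution with kernel k equals s standard convolutions with the sub-kernels k[r::s], whose outputs interleave across the s output phases. The numpy check below is my own sketch of that equivalence, not the paper's implementation (which targets 2D accelerator data flows):

```python
import numpy as np

def deconv1d_reference(x, k, stride):
    """Direct transposed convolution: y[i*stride + j] += x[i] * k[j]."""
    y = np.zeros((len(x) - 1) * stride + len(k))
    for i in range(len(x)):
        y[i * stride:i * stride + len(k)] += x[i] * k
    return y

def deconv1d_split(x, k, stride):
    """Same output via filter splitting (assumes stride <= len(k) so no
    sub-filter is empty): run `stride` standard convolutions with the
    small filters k[r::stride] and interleave their outputs."""
    y = np.zeros((len(x) - 1) * stride + len(k))
    for r in range(stride):
        # The full convolution with sub-filter k[r::stride] fills exactly
        # the outputs y[r], y[r + stride], y[r + 2*stride], ...
        y[r::stride] = np.convolve(x, k[r::stride])
    return y
```

Each sub-convolution is a plain dense convolution, which is why an unmodified convolution accelerator can execute it; the interleaving is a cheap memory reshuffle.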

Fully Learnable Group Convolution for Acceleration of Deep Neural Networks

Title Fully Learnable Group Convolution for Acceleration of Deep Neural Networks
Authors Xijun Wang, Meina Kan, Shiguang Shan, Xilin Chen
Abstract Benefiting from its great success on many tasks, deep learning is increasingly used on low-computational-cost devices, e.g. smartphones, embedded devices, etc. To reduce the high computational and memory cost, in this work we propose a fully learnable group convolution module (FLGC for short) which is quite efficient and can be embedded into any deep neural network for acceleration. Specifically, our proposed method automatically learns the group structure in the training stage in a fully end-to-end manner, leading to a better structure than existing pre-defined, two-step, or iterative strategies. Moreover, our method can be further combined with depthwise separable convolution, resulting in a 5x acceleration over the vanilla ResNet50 on a single CPU. An additional advantage is that in our FLGC the number of groups can be set to any value, not necessarily 2^k as in most existing methods, allowing a better tradeoff between accuracy and speed. As evaluated in our experiments, our method achieves better performance than existing learnable group convolution and standard group convolution when using the same number of groups.
Tasks
Published 2019-03-31
URL http://arxiv.org/abs/1904.00346v1
PDF http://arxiv.org/pdf/1904.00346v1.pdf
PWC https://paperswithcode.com/paper/fully-learnable-group-convolution-for
Repo
Framework
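At inference time, a learned group structure reduces to a hard channel-to-group assignment and a block connectivity mask between input and output channels. The sketch below illustrates only that final step under my own assumptions (argmax over per-channel group logits); the paper's end-to-end training-time relaxation is not shown.

```python
import numpy as np

def group_connectivity_mask(in_logits, out_logits):
    """Binary channel connectivity from learned group-assignment logits.

    in_logits: (C_in, G) scores assigning each input channel to a group;
    out_logits: (C_out, G) likewise for output channels. At inference each
    channel takes its argmax group, and an output channel may read an input
    channel only if the two share a group.
    """
    in_group = in_logits.argmax(axis=1)    # (C_in,)
    out_group = out_logits.argmax(axis=1)  # (C_out,)
    # mask[o, i] = 1 iff output channel o and input channel i share a group.
    return (out_group[:, None] == in_group[None, :]).astype(np.float32)
```

Because the group count G is just a dimension of the logits here, it can be any value, which matches the abstract's point about not being restricted to powers of two.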

Effective 3D Humerus and Scapula Extraction using Low-contrast and High-shape-variability MR Data

Title Effective 3D Humerus and Scapula Extraction using Low-contrast and High-shape-variability MR Data
Authors Xiaoxiao He, Chaowei Tan, Yuting Qiao, Virak Tan, Dimitris Metaxas, Kang Li
Abstract For initial shoulder preoperative diagnosis, it is essential to obtain a three-dimensional (3D) bone mask from medical images, e.g., magnetic resonance (MR). However, obtaining high-resolution and dense medical scans is both costly and time-consuming. In addition, the imaging parameters for each 3D scan may vary from time to time, which increases the variance between images. Therefore, it is practical to consider bone extraction on low-resolution data, which may affect imaging contrast and make the segmentation work difficult. In this paper, we present a joint segmentation of the humerus and scapula bones on a small dataset with low-contrast and high-shape-variability 3D MR images. The proposed network has a deep end-to-end architecture to obtain the initial 3D bone masks. Because the existing human-labeled ground truth is scarce and inaccurate, we design a self-reinforced learning strategy to increase performance. Compared with the non-reinforced segmentation and a classical multi-atlas method with joint label fusion, the proposed approach obtains better results.
Tasks
Published 2019-02-22
URL http://arxiv.org/abs/1902.08527v1
PDF http://arxiv.org/pdf/1902.08527v1.pdf
PWC https://paperswithcode.com/paper/effective-3d-humerus-and-scapula-extraction
Repo
Framework

Abstract Solvers for Computing Cautious Consequences of ASP programs

Title Abstract Solvers for Computing Cautious Consequences of ASP programs
Authors Giovanni Amendola, Carmine Dodaro, Marco Maratea
Abstract Abstract solvers are a method to formally analyze algorithms that have been profitably used for describing, comparing and composing solving techniques in various fields such as Propositional Satisfiability (SAT), Quantified SAT, Satisfiability Modulo Theories, Answer Set Programming (ASP), and Constraint ASP. In this paper, we design, implement and test novel abstract solutions for cautious reasoning tasks in ASP. We show how to improve the current abstract solvers for cautious reasoning in ASP with new techniques borrowed from backbone computation in SAT, in order to design new solving algorithms. By doing so, we also formally show that the algorithms for solving cautious reasoning tasks in ASP are strongly related to those for computing backbones of Boolean formulas. We implement some of the new solutions in the ASP solver WASP and show that their performance is comparable to state-of-the-art solutions on the benchmark problems from the past ASP Competitions. Under consideration for acceptance in TPLP.
Tasks
Published 2019-07-22
URL https://arxiv.org/abs/1907.09402v1
PDF https://arxiv.org/pdf/1907.09402v1.pdf
PWC https://paperswithcode.com/paper/abstract-solvers-for-computing-cautious
Repo
Framework

Adversarial Example Detection by Classification for Deep Speech Recognition

Title Adversarial Example Detection by Classification for Deep Speech Recognition
Authors Saeid Samizade, Zheng-Hua Tan, Chao Shen, Xiaohong Guan
Abstract Machine learning systems are vulnerable to adversarial attacks and will highly likely produce incorrect outputs under these attacks. Attacks are categorized as white-box or black-box according to the adversary’s access level to the victim learning algorithm. To defend learning systems from these attacks, existing methods in the speech domain focus on modifying input signals and testing the behaviours of speech recognizers. We, however, formulate the defense as a classification problem and present a strategy for systematically generating adversarial example datasets: one for white-box attacks and one for black-box attacks, containing both adversarial and normal examples. The white-box attack is a gradient-based method on Baidu DeepSpeech with the Mozilla Common Voice database, while the black-box attack is a gradient-free method on a deep model-based keyword spotting system with the Google Speech Commands dataset. The generated datasets are used to train a proposed convolutional neural network (CNN), together with cepstral features, to detect adversarial examples. Experimental results show that it is possible to accurately distinguish between adversarial and normal examples for known attacks, in both single-condition and multi-condition training settings, while the performance degrades dramatically for unknown attacks. The adversarial datasets and the source code are made publicly available.
Tasks Keyword Spotting, Speech Recognition
Published 2019-10-22
URL https://arxiv.org/abs/1910.10013v1
PDF https://arxiv.org/pdf/1910.10013v1.pdf
PWC https://paperswithcode.com/paper/adversarial-example-detection-by
Repo
Framework

Keyword Spotter Model for Crop Pest and Disease Monitoring from Community Radio Data

Title Keyword Spotter Model for Crop Pest and Disease Monitoring from Community Radio Data
Authors Benjamin Akera, Joyce Nakatumba-Nabende, Jonathan Mukiibi, Ali Hussein, Nathan Baleeta, Daniel Ssendiwala, Samiiha Nalwooga
Abstract In societies with well-developed internet infrastructure, social media is the leading medium of communication for various social issues, especially in breaking news situations. In rural Uganda, however, public community radio is still a dominant means of news dissemination. Community radio gives audience to the general public, especially to individuals living in rural areas, and thus plays an important role in giving a voice to those living in the broadcast area. It is an avenue for participatory communication and a tool relevant to both economic and social development. This is supported by the rise to ubiquity of mobile phones providing access to phone-in or text-in talk shows. In this paper, we describe an approach to analysing the readily available community radio data with machine learning-based speech keyword spotting techniques. We identify keywords of interest related to agriculture and build models to automatically identify these keywords from audio streams. Our contribution through these techniques is a cost-efficient and effective way to monitor food security concerns, particularly in rural areas. Through keyword spotting and radio talk show analysis, issues such as crop diseases, pests, drought and famine can be captured and fed into an early warning system for stakeholders and policy makers.
Tasks Keyword Spotting
Published 2019-10-05
URL https://arxiv.org/abs/1910.02292v1
PDF https://arxiv.org/pdf/1910.02292v1.pdf
PWC https://paperswithcode.com/paper/keyword-spotter-model-for-crop-pest-and-1
Repo
Framework

A Channel-Pruned and Weight-Binarized Convolutional Neural Network for Keyword Spotting

Title A Channel-Pruned and Weight-Binarized Convolutional Neural Network for Keyword Spotting
Authors Jiancheng Lyu, Spencer Sheen
Abstract We study channel number reduction in combination with weight binarization (1-bit weight precision) to trim a convolutional neural network for a keyword spotting (classification) task. We adopt a group-wise splitting method based on the group Lasso penalty to achieve over 50% channel sparsity while maintaining the network performance within 0.25% accuracy loss. We show an effective three-stage procedure to balance accuracy and sparsity in network training.
Tasks Keyword Spotting
Published 2019-09-12
URL https://arxiv.org/abs/1909.05623v1
PDF https://arxiv.org/pdf/1909.05623v1.pdf
PWC https://paperswithcode.com/paper/a-channel-pruned-and-weight-binarized
Repo
Framework
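The group-wise sparsity idea above can be shown directly: treating each output channel of a conv weight tensor as one group, the group Lasso penalty sums the groups' L2 norms, which drives entire channels toward zero so they can be pruned. A minimal numpy sketch of the penalty and the pruning criterion; the paper's group-wise splitting method and three-stage training schedule are not reproduced here.

```python
import numpy as np

def group_lasso_penalty(weights):
    """Group Lasso over output channels of a conv weight tensor.

    weights: (C_out, C_in, kH, kW). Each output channel is one group; the
    penalty is the sum of the groups' L2 norms (not squared), which is what
    pushes whole groups exactly to zero.
    """
    flat = weights.reshape(weights.shape[0], -1)
    return np.sqrt((flat ** 2).sum(axis=1)).sum()

def prunable_channels(weights, tol=1e-8):
    """Indices of output channels whose weights have (numerically) vanished."""
    flat = weights.reshape(weights.shape[0], -1)
    return np.where(np.sqrt((flat ** 2).sum(axis=1)) <= tol)[0]
```

During training the penalty would be added to the task loss with a sparsity-controlling coefficient; after training, channels returned by `prunable_channels` can be removed outright.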

A unified representation network for segmentation with missing modalities

Title A unified representation network for segmentation with missing modalities
Authors Kenneth Lau, Jonas Adler, Jens Sjölund
Abstract Over the last few years machine learning has demonstrated groundbreaking results in many areas of medical image analysis, including segmentation. A key assumption, however, is that the training and test distributions match. We study a realistic scenario where this assumption is clearly violated, namely segmentation with missing input modalities. We describe two neural network approaches that can handle a variable number of input modalities. The first is modality dropout: a simple but surprisingly effective modification of the training. The second is the unified representation network: a network architecture that maps a variable number of input modalities into a unified representation that can be used for downstream tasks such as segmentation. We demonstrate that modality dropout makes a standard segmentation network reasonably robust to missing modalities, but that the same network works even better if trained on the unified representation.
Tasks
Published 2019-08-19
URL https://arxiv.org/abs/1908.06683v1
PDF https://arxiv.org/pdf/1908.06683v1.pdf
PWC https://paperswithcode.com/paper/a-unified-representation-network-for
Repo
Framework
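Modality dropout is simple enough to sketch: during training, zero out whole input modalities at random (keeping at least one per sample), so the network learns to cope with missing inputs at test time. A numpy illustration under my own assumption about the batch layout; the paper's exact sampling scheme may differ.

```python
import numpy as np

def modality_dropout(batch, p_drop, rng):
    """Zero out entire input modalities at random during training.

    batch: (N, M, H, W) with M stacked modalities (e.g. MR sequences).
    Each sample is guaranteed to keep at least one modality.
    """
    n, m = batch.shape[:2]
    keep = rng.random((n, m)) >= p_drop  # True = modality survives
    for i in range(n):
        if not keep[i].any():
            # Never drop everything: resurrect one random modality.
            keep[i, rng.integers(m)] = True
    return batch * keep[:, :, None, None]
```

Applied every training step, this exposes the segmentation network to all the missing-modality patterns it may face after deployment.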

Multi-layer Attention Mechanism for Speech Keyword Recognition

Title Multi-layer Attention Mechanism for Speech Keyword Recognition
Authors Ruisen Luo, Tianran Sun, Chen Wang, Miao Du, Zuodong Tang, Kai Zhou, Xiaofeng Gong, Xiaomei Yang
Abstract As an important part of speech recognition technology, automatic speech keyword recognition has been intensively studied in recent years. Such technology becomes especially pivotal in situations with limited infrastructure and computational resources, such as voice command recognition in vehicles and robot interaction. At present, the mainstream methods in automatic speech keyword recognition are based on long short-term memory (LSTM) networks with an attention mechanism. However, due to inevitable information losses in the LSTM layer during feature extraction, the calculated attention weights are biased. In this paper, a novel approach, namely the Multi-layer Attention Mechanism, is proposed to handle the inaccurate attention weights problem. The key idea is that, in addition to the conventional attention mechanism, information from layers prior to feature extraction and the LSTM is introduced into the attention weight calculations. The attention weights are therefore more accurate because the overall model can attend to more precise and focused areas. We conduct a comprehensive comparison and analysis of keyword spotting performance for a convolutional neural network, a bi-directional LSTM recurrent neural network, and a recurrent neural network with the proposed attention mechanism on the Google Speech Commands V2 dataset. Experimental results are favorable for the proposed method and demonstrate its validity. The proposed multi-layer attention method can also be useful for other research related to spotting tasks.
Tasks Keyword Spotting, Speech Recognition
Published 2019-07-10
URL https://arxiv.org/abs/1907.04536v1
PDF https://arxiv.org/pdf/1907.04536v1.pdf
PWC https://paperswithcode.com/paper/multi-layer-attention-mechanism-for-speech
Repo
Framework
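The multi-layer idea can be sketched as attention scores computed from several layers jointly rather than from the LSTM output alone. This is my own loose reading of the abstract (summing per-layer scores before a single softmax); the paper's exact formulation, layer choices, and score functions may differ.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def multi_layer_attention(layers, score_vectors):
    """Attention weights computed from several layers jointly.

    layers: list of (T, D_l) frame-aligned feature sequences, e.g. raw
    acoustic features, pre-LSTM CNN features, and LSTM outputs.
    score_vectors: matching learned (D_l,) scoring vectors (hypothetical).
    Per-frame scores from all layers are summed before the softmax, so
    information lost inside the LSTM can still shape the weights.
    """
    scores = sum(layer @ v for layer, v in zip(layers, score_vectors))  # (T,)
    weights = softmax(scores)
    context = weights @ layers[-1]  # pool the top layer with shared weights
    return weights, context
```

The pooled `context` vector would then feed the keyword classifier, exactly as in a conventional single-layer attention head.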

Short Isometric Shapelet Transform for Binary Time Series Classification

Title Short Isometric Shapelet Transform for Binary Time Series Classification
Authors Weibo Shu, Yaqiang Yao, Huanhuan Chen
Abstract In the research area of time series classification (TSC), the ensemble shapelet transform (ST) algorithm is one of the state-of-the-art algorithms for classification. However, its time complexity is often higher than that of other algorithms. Hence, two strategies for reducing this high time complexity are proposed in this paper. The first is to exploit only shapelet candidates whose length is a given small value, whereas the ensemble ST uses shapelet candidates of all feasible lengths. The second is to train a single linear classifier in the feature space, whereas the ensemble ST requires an ensemble classifier trained in the feature space. This paper focuses on the theoretical evidence for and the empirical implementation of these two strategies. The theoretical part guarantees near-lossless accuracy under some preconditions while reducing the time complexity. In the empirical part, an algorithm is proposed as a model implementation of the two strategies. The superior performance of the proposed algorithm in several experiments shows the effectiveness of the two strategies.
Tasks Time Series, Time Series Classification
Published 2019-12-27
URL https://arxiv.org/abs/1912.11982v1
PDF https://arxiv.org/pdf/1912.11982v1.pdf
PWC https://paperswithcode.com/paper/short-isometric-shapelet-transform-for-binary
Repo
Framework
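The second strategy, a single linear classifier in the feature space, rests on the standard shapelet transform feature: the minimum distance between a shapelet and all equal-length subsequences of a series. A minimal sketch of that feature map (not the paper's optimized short-isometric variant; the distance convention here is squared Euclidean, an assumption):

```python
import numpy as np

def shapelet_feature(series, shapelet):
    """Minimum squared Euclidean distance between a shapelet and every
    equal-length subsequence of the series: one ST feature value."""
    length = len(shapelet)
    return min(
        ((series[i:i + length] - shapelet) ** 2).sum()
        for i in range(len(series) - length + 1)
    )

def shapelet_transform(series_list, shapelets):
    """Map each series to its vector of shapelet features; a single linear
    classifier can then be trained on these vectors."""
    return np.array([[shapelet_feature(s, sh) for sh in shapelets]
                     for s in series_list])
```

Restricting all shapelets to one short length (the paper's first strategy) shrinks the candidate set and the per-feature cost simultaneously.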

SAIS: Single-stage Anchor-free Instance Segmentation

Title SAIS: Single-stage Anchor-free Instance Segmentation
Authors Canqun Xiang, Shishun Tian, Wenbin Zou, Chen Xu
Abstract In this paper, we propose a simple yet efficient instance segmentation approach based on the single-stage anchor-free detector, termed SAIS. In our approach, the instance segmentation task consists of two parallel subtasks which respectively predict the mask coefficients and the mask prototypes. Then, instance masks are generated by linearly combining the prototypes with the mask coefficients. To enhance the quality of instance masks, the information from regression and classification is fused to predict the mask coefficients. In addition, a center-aware target is designed to preserve the center coordinate of each instance, which achieves a stable improvement in instance segmentation. Experiments on MS COCO show that SAIS achieves the performance of the existing state-of-the-art single-stage methods with a much smaller memory footprint.
Tasks Instance Segmentation, Semantic Segmentation
Published 2019-12-03
URL https://arxiv.org/abs/1912.01176v1
PDF https://arxiv.org/pdf/1912.01176v1.pdf
PWC https://paperswithcode.com/paper/sais-single-stage-anchor-free-instance
Repo
Framework
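The prototype/coefficient decomposition described above is one line of linear algebra: each instance mask is the sigmoid of a coefficient-weighted sum of shared prototype maps. A numpy illustration under assumed tensor shapes; the actual SAIS heads, losses, and cropping steps are not shown.

```python
import numpy as np

def assemble_masks(prototypes, coefficients):
    """Linearly combine shared mask prototypes with per-instance coefficients.

    prototypes: (H, W, P) prototype maps shared across the image;
    coefficients: (N, P), one row per detected instance.
    Returns (N, H, W) sigmoid mask probabilities.
    """
    # logits[n, h, w] = sum_p prototypes[h, w, p] * coefficients[n, p]
    logits = np.einsum('hwp,np->nhw', prototypes, coefficients)
    return 1.0 / (1.0 + np.exp(-logits))
```

Because the prototypes are computed once per image and the combination is a small matrix product, adding instances is nearly free, which is the appeal of this decomposition for single-stage detectors.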