Paper Group ANR 1045
Quantitative Impact of Label Noise on the Quality of Segmentation of Brain Tumors on MRI scans. Depth-Map Generation using Pixel Matching in Stereoscopic Pair of Images. G2G: TTS-Driven Pronunciation Learning for Graphemic Hybrid ASR. Exploiting Motion Information from Unlabeled Videos for Static Image Action Recognition. OpenEI: An Open Framework …
Quantitative Impact of Label Noise on the Quality of Segmentation of Brain Tumors on MRI scans
Title | Quantitative Impact of Label Noise on the Quality of Segmentation of Brain Tumors on MRI scans |
Authors | Michał Marcinkiewicz, Grzegorz Mrukwa |
Abstract | Over the last few years, deep learning has proven to be a great solution to many problems, such as image or text classification. Recently, deep learning-based solutions have outperformed humans on selected benchmark datasets, yielding a promising future for scientific and real-world applications. Training of deep learning models requires vast amounts of high-quality data to achieve such supreme performance. In real-world scenarios, obtaining a large, coherent, and properly labeled dataset is a challenging task. This is especially true in medical applications, where high-quality data and annotations are scarce and the number of expert annotators is limited. In this paper, we investigate the impact of corrupted ground-truth masks on the performance of a neural network for a brain tumor segmentation task. Our findings suggest that a) the performance degrades about 8% less than could be expected from simulations, b) a neural network learns the simulated biases of annotators, c) biases can be partially mitigated by using an inversely-biased Dice loss function. |
Tasks | Brain Tumor Segmentation, Text Classification |
Published | 2019-09-18 |
URL | https://arxiv.org/abs/1909.08959v1, https://arxiv.org/pdf/1909.08959v1.pdf |
PWC | https://paperswithcode.com/paper/quantitative-impact-of-label-noise-on-the |
Repo | |
Framework | |
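The abstract's "inversely-biased Dice loss" is not spelled out above, so the following is a minimal sketch under assumptions: a Tversky-style soft Dice in which the false-positive and false-negative weights (`alpha`, `beta`, both illustrative) would be set inversely to a simulated annotator bias.

```python
import numpy as np

def biased_dice_loss(pred, target, alpha=0.7, beta=0.3, eps=1e-7):
    """Soft Dice loss with asymmetric (Tversky-style) weighting.

    A sketch only: the paper's exact inversely-biased formulation is not
    reproduced here. `alpha` weights false positives and `beta` false
    negatives; choosing them inversely to a simulated annotator bias
    would counteract over- or under-segmented training masks.
    """
    pred = pred.ravel().astype(np.float64)       # predicted probabilities in [0, 1]
    target = target.ravel().astype(np.float64)   # binary ground-truth mask
    tp = np.sum(pred * target)                   # soft true positives
    fp = np.sum(pred * (1.0 - target))           # soft false positives
    fn = np.sum((1.0 - pred) * target)           # soft false negatives
    score = (tp + eps) / (tp + alpha * fp + beta * fn + eps)
    return 1.0 - score

# Toy usage: an over-segmenting prediction is penalized more when alpha > beta.
rng = np.random.default_rng(0)
target = (rng.random((64, 64)) > 0.8).astype(np.float64)
pred = np.clip(target + 0.3 * rng.random((64, 64)), 0.0, 1.0)
print(biased_dice_loss(pred, target))
```

With `alpha = beta = 0.5` this reduces to the standard soft Dice loss.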
Depth-Map Generation using Pixel Matching in Stereoscopic Pair of Images
Title | Depth-Map Generation using Pixel Matching in Stereoscopic Pair of Images |
Authors | Asra Aslam, Mohd. Samar Ansari |
Abstract | Modern day multimedia content generation and dissemination is moving towards the presentation of more and more 'realistic' scenarios. The switch from 2-dimensional (2D) to 3-dimensional (3D) has been a major driving force in that direction. Over the recent past, a large number of approaches have been proposed for creating 3D images/videos, most of which are based on the generation of depth-maps. This paper presents a new algorithm for obtaining depth information pertaining to a depicted scene from an available pair of stereoscopic images. The proposed algorithm performs a pixel-to-pixel matching of the two images in the stereo pair for estimation of depth. It is shown that the obtained depth-maps show improvements over the reported counterparts. |
Tasks | |
Published | 2019-02-09 |
URL | https://arxiv.org/abs/1902.03471v3, https://arxiv.org/pdf/1902.03471v3.pdf |
PWC | https://paperswithcode.com/paper/depth-map-generation-using-pixel-matching-in |
Repo | |
Framework | |
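The paper's exact matching algorithm is not reproduced here; the sketch below shows the classic pixel/block-matching baseline such methods build on, assuming a rectified stereo pair so matches lie on the same row (the window size and disparity range are illustrative).

```python
import numpy as np

def disparity_map(left, right, max_disp=32, window=5):
    """Naive block matching between a rectified stereo pair (a sketch of the
    classic baseline, not the authors' algorithm): for each left-image pixel,
    slide along the same row of the right image and keep the horizontal shift
    (disparity) with the lowest sum-of-squared-differences cost. Depth is then
    proportional to focal_length * baseline / disparity.
    """
    h, w = left.shape
    half = window // 2
    disp = np.zeros((h, w), dtype=np.float64)
    for y in range(half, h - half):
        for x in range(half, w - half):
            patch = left[y - half:y + half + 1, x - half:x + half + 1]
            best_cost, best_d = np.inf, 0
            for d in range(0, min(max_disp, x - half) + 1):
                cand = right[y - half:y + half + 1, x - d - half:x - d + half + 1]
                cost = np.sum((patch - cand) ** 2)
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y, x] = best_d
    return disp

# Synthetic check: shifting an image by 4 pixels yields disparity 4.
rng = np.random.default_rng(0)
right = rng.random((32, 48))
left = np.roll(right, 4, axis=1)
print(disparity_map(left, right).max())   # 4.0
```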
G2G: TTS-Driven Pronunciation Learning for Graphemic Hybrid ASR
Title | G2G: TTS-Driven Pronunciation Learning for Graphemic Hybrid ASR |
Authors | Duc Le, Thilo Koehler, Christian Fuegen, Michael L. Seltzer |
Abstract | Grapheme-based acoustic modeling has recently been shown to outperform phoneme-based approaches in both hybrid and end-to-end automatic speech recognition (ASR), even on non-phonemic languages like English. However, graphemic ASR still has problems with rare long-tail words that do not follow the standard spelling conventions seen in training, such as entity names. In this work, we present a novel method to train a statistical grapheme-to-grapheme (G2G) model on text-to-speech data that can rewrite an arbitrary character sequence into more phonetically consistent forms. We show that using G2G to provide alternative pronunciations during decoding reduces Word Error Rate by 3% to 11% relative over a strong graphemic baseline and bridges the gap on rare name recognition with an equivalent phonetic setup. Unlike many previously proposed methods, our method does not require any change to the acoustic model training procedure. This work reaffirms the efficacy of grapheme-based modeling and shows that specialized linguistic knowledge, when available, can be leveraged to improve graphemic ASR. |
Tasks | Speech Recognition |
Published | 2019-10-22 |
URL | https://arxiv.org/abs/1910.12612v2, https://arxiv.org/pdf/1910.12612v2.pdf |
PWC | https://paperswithcode.com/paper/g2g-tts-driven-pronunciation-learning-for |
Repo | |
Framework | |
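The G2G model itself is statistical and trained on TTS-derived data, which cannot be reproduced here; the toy below only illustrates the decoding-side idea of expanding a lexicon with alternative graphemic forms, with a few hand-written rewrite rules standing in for the learned model (all rules and names are illustrative assumptions).

```python
# Toy stand-in only: the paper trains a statistical G2G model on TTS data.
# A few hand-written English grapheme rewrites merely illustrate how
# alternative forms could expand a decoding lexicon.
REWRITE_RULES = [("ph", "f"), ("ck", "k"), ("ey", "y"), ("ai", "ay")]

def g2g_alternatives(word, max_alts=4):
    """Generate phonetically plausible respellings of `word` (illustrative)."""
    alts = set()
    for src, dst in REWRITE_RULES:
        if src in word:
            alts.add(word.replace(src, dst))
    alts.discard(word)
    return sorted(alts)[:max_alts]

def expand_lexicon(words):
    """Map each word to itself plus its G2G alternates as extra decoding forms."""
    return {w: [w] + g2g_alternatives(w) for w in words}

print(expand_lexicon(["kaepernick", "phoebe"]))
# {'kaepernick': ['kaepernick', 'kaepernik'], 'phoebe': ['phoebe', 'foebe']}
```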
Exploiting Motion Information from Unlabeled Videos for Static Image Action Recognition
Title | Exploiting Motion Information from Unlabeled Videos for Static Image Action Recognition |
Authors | Yiyi Zhang, Li Niu, Ziqi Pan, Meichao Luo, Jianfu Zhang, Dawei Cheng, Liqing Zhang |
Abstract | Static image action recognition, which aims to recognize actions from a single image, usually relies on expensive human labeling effort, such as adequately labeled action images or a large-scale labeled image dataset. In contrast, abundant unlabeled videos can be economically obtained. Therefore, several works have explored using unlabeled videos to facilitate image action recognition, which can be categorized into the following two groups: (a) enhance visual representations of action images with a designed proxy task on unlabeled videos, which falls into the scope of self-supervised learning; (b) generate auxiliary representations for action images with a generator learned from unlabeled videos. In this paper, we integrate the above two strategies in a unified framework, which consists of a Visual Representation Enhancement (VRE) module and a Motion Representation Augmentation (MRA) module. Specifically, the VRE module includes a proxy task which imposes a pseudo motion label constraint and a temporal coherence constraint on unlabeled videos, while the MRA module predicts the motion information of a static action image by exploiting unlabeled videos. We demonstrate the superiority of our framework on four benchmark human action datasets with limited labeled data. |
Tasks | |
Published | 2019-12-01 |
URL | https://arxiv.org/abs/1912.00308v1, https://arxiv.org/pdf/1912.00308v1.pdf |
PWC | https://paperswithcode.com/paper/exploiting-motion-information-from-unlabeled |
Repo | |
Framework | |
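A minimal sketch of what the VRE module's proxy objective could look like, assuming the pseudo motion label constraint is a classification loss and temporal coherence is approximated by adjacent-frame feature similarity; the paper's exact losses are not reproduced, and all names and weights are illustrative.

```python
import torch
import torch.nn.functional as F

def vre_proxy_loss(feat_t, feat_t1, motion_logits, pseudo_motion_labels,
                   coherence_weight=0.5):
    """Sketch of a VRE-style proxy objective (assumptions, not the paper's
    exact formulation): `motion_logits` predict pseudo motion labels mined
    from unlabeled video, and temporal coherence is encouraged by making
    features of neighboring frames t and t+1 agree."""
    # Pseudo motion label constraint: standard classification loss.
    motion_loss = F.cross_entropy(motion_logits, pseudo_motion_labels)
    # Temporal coherence constraint: adjacent-frame features should be similar.
    coherence_loss = 1.0 - F.cosine_similarity(feat_t, feat_t1, dim=1).mean()
    return motion_loss + coherence_weight * coherence_loss

# Toy usage with random tensors (batch of 8, 128-d features, 10 motion bins).
feat_t = torch.randn(8, 128)
feat_t1 = feat_t + 0.1 * torch.randn(8, 128)
logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
print(vre_proxy_loss(feat_t, feat_t1, logits, labels).item())
```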
OpenEI: An Open Framework for Edge Intelligence
Title | OpenEI: An Open Framework for Edge Intelligence |
Authors | Xingzhou Zhang, Yifan Wang, Sidi Lu, Liangkai Liu, Lanyu Xu, Weisong Shi |
Abstract | In the last five years, edge computing has attracted tremendous attention from industry and academia due to its promise to reduce latency, save bandwidth, improve availability, and protect data privacy and security. At the same time, we have witnessed the proliferation of AI algorithms and models which accelerate the successful deployment of intelligence, mainly in cloud services. These two trends, combined, have created a new horizon: Edge Intelligence (EI). The development of EI requires much attention from both the computer systems research community and the AI community to meet its demands. However, existing computing techniques used in the cloud are not directly applicable to edge computing due to the diversity of computing resources and the distribution of data sources. We envision that a framework that can be rapidly deployed at the edge and enable edge AI capabilities is still missing. To address this challenge, in this paper we first present the definition and a systematic review of EI. Then, we introduce an Open Framework for Edge Intelligence (OpenEI), a lightweight software platform to equip edges with intelligent processing and data-sharing capability. We analyze four fundamental EI techniques which are used to build OpenEI and identify several open problems and potential research directions. Finally, four typical application scenarios enabled by OpenEI are presented. |
Tasks | |
Published | 2019-06-05 |
URL | https://arxiv.org/abs/1906.01864v1, https://arxiv.org/pdf/1906.01864v1.pdf |
PWC | https://paperswithcode.com/paper/openei-an-open-framework-for-edge |
Repo | |
Framework | |
Task-oriented Design through Deep Reinforcement Learning
Title | Task-oriented Design through Deep Reinforcement Learning |
Authors | Junyoung Choi, Minsung Hyun, Nojun Kwak |
Abstract | We propose a new low-cost machine-learning-based methodology that assists designers in reducing the gap between the problem and the solution in the design process. Our work applies reinforcement learning (RL) to find the optimal task-oriented design solution through the construction of the design action for each task. For this task-oriented design, the 3D design process in product design is assigned to an action space in Deep RL, and the desired 3D model is obtained by training each design action according to the task. By showing that this method achieves satisfactory designs even when applied to a task pursuing multiple goals, we suggest a direction in which machine learning can contribute to the design process. We have also validated with product designers that this methodology can assist the creative part of the design process. |
Tasks | |
Published | 2019-03-13 |
URL | http://arxiv.org/abs/1903.05271v1, http://arxiv.org/pdf/1903.05271v1.pdf |
PWC | https://paperswithcode.com/paper/task-oriented-design-through-deep |
Repo | |
Framework | |
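As a toy illustration of assigning a design process to an action space, the sketch below optimizes a parameter vector against multiple task goals, with random hill-climbing standing in for the Deep RL policy; everything here, including the quadratic goals, is an illustrative assumption rather than the paper's setup.

```python
import numpy as np

# Toy illustration only: the paper maps 3D design operations to a Deep RL
# action space; here a "design" is just a parameter vector, the "tasks" are
# hypothetical quadratic goals, and hill-climbing stands in for RL training.
rng = np.random.default_rng(1)
GOALS = [np.array([0.8, 0.2, 0.5]), np.array([0.3, 0.9, 0.4])]

def reward(design):
    # Higher when the design satisfies all task goals simultaneously.
    return -sum(np.sum((design - g) ** 2) for g in GOALS)

design = rng.random(3)            # initial design parameters
step = 0.05
for _ in range(500):              # each action nudges one parameter
    idx = rng.integers(0, design.size)
    cand = design.copy()
    cand[idx] += rng.choice([-step, step])
    if reward(cand) > reward(design):
        design = cand             # keep actions that improve the task reward
print(design, reward(design))
```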
A Conditional Perspective for Iterated Belief Contraction
Title | A Conditional Perspective for Iterated Belief Contraction |
Authors | Kai Sauerwald, Gabriele Kern-Isberner, Christoph Beierle |
Abstract | According to Boutilier, Darwiche, Pearl, and others, principles for iterated revision can be characterised in terms of changing beliefs about conditionals. For iterated contraction, a similar formulation is not known. This is mainly because, for iterated belief change, the connection between revision and contraction via the Levi and Harper identities is not straightforward, and therefore characterisation results do not transfer easily between iterated revision and contraction. In this article, we develop an axiomatisation of iterated contraction in terms of changing conditional beliefs. We prove that the new set of postulates conforms semantically to the class of operators like the ones given by Konieczny and Pino Pérez for iterated contraction. |
Tasks | |
Published | 2019-11-20 |
URL | https://arxiv.org/abs/1911.08833v1, https://arxiv.org/pdf/1911.08833v1.pdf |
PWC | https://paperswithcode.com/paper/a-conditional-perspective-for-iterated-belief |
Repo | |
Framework | |
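For reference, the Levi and Harper identities mentioned in the abstract connect revision (∗) and contraction (÷) in the single-step setting:

```latex
% Levi identity: revision as contraction by \neg\varphi followed by expansion
K \ast \varphi = (K \div \neg\varphi) + \varphi
% Harper identity: contraction as intersection with the revised set
K \div \varphi = K \cap (K \ast \neg\varphi)
```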
Sequential Convolutional Recurrent Neural Networks for Fast Automatic Modulation Classification
Title | Sequential Convolutional Recurrent Neural Networks for Fast Automatic Modulation Classification |
Authors | Kaisheng Liao, Guanhong Tao, Yi Zhong, Yaping Zhang, Zhenghong Zhang |
Abstract | A novel and efficient end-to-end learning model for automatic modulation classification (AMC) is proposed for wireless spectrum monitoring applications, which automatically learns from time-domain in-phase and quadrature (IQ) data without requiring the design of hand-crafted expert features. With the intuition of convolutional layers with pooling serving as front-end feature distillation and dimensionality reduction, sequential convolutional recurrent neural networks (SCRNNs) are developed to take complementary advantage of the parallel computing capability of convolutional neural networks (CNNs) and the temporal sensitivity of recurrent neural networks (RNNs). Experimental results demonstrate that the proposed architecture delivers overall superior performance in the signal-to-noise ratio (SNR) range above -10 dB, achieves significantly improved classification accuracy from 80% to 92.1% at high SNRs, and drastically reduces the training and prediction time by approximately 74% and 67%, respectively. Furthermore, a comparative study is performed to investigate the impact of various SCRNN structure settings on classification performance. A representative SCRNN architecture with a two-layer CNN followed by a two-layer long short-term memory (LSTM) network is developed as a suggested option for fast AMC. |
Tasks | Dimensionality Reduction |
Published | 2019-09-09 |
URL | https://arxiv.org/abs/1909.03050v1, https://arxiv.org/pdf/1909.03050v1.pdf |
PWC | https://paperswithcode.com/paper/sequential-convolutional-recurrent-neural |
Repo | |
Framework | |
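A minimal PyTorch sketch of the representative SCRNN layout (two convolutional layers with pooling feeding a two-layer LSTM); channel widths, kernel sizes, and the 11-class output are illustrative assumptions, not the authors' exact hyperparameters.

```python
import torch
import torch.nn as nn

class SCRNN(nn.Module):
    """Sketch of the representative SCRNN layout under assumed settings:
    a convolutional front end for feature distillation and downsampling,
    followed by a two-layer LSTM for temporal modeling."""

    def __init__(self, num_classes=11):
        super().__init__()
        self.frontend = nn.Sequential(
            nn.Conv1d(2, 64, kernel_size=7, padding=3),   # 2 = I/Q channels
            nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(64, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.lstm = nn.LSTM(64, 128, num_layers=2, batch_first=True)
        self.head = nn.Linear(128, num_classes)

    def forward(self, iq):                   # iq: (batch, 2, time)
        x = self.frontend(iq)                # (batch, 64, time / 4)
        x = x.transpose(1, 2)                # (batch, time / 4, 64) for the LSTM
        out, _ = self.lstm(x)
        return self.head(out[:, -1])         # classify from the last time step

logits = SCRNN()(torch.randn(4, 2, 128))     # e.g. 128-sample IQ snippets
print(logits.shape)                          # torch.Size([4, 11])
```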
Approximating exponential family models (not single distributions) with a two-network architecture
Title | Approximating exponential family models (not single distributions) with a two-network architecture |
Authors | Sean R. Bittner, John P. Cunningham |
Abstract | Recently, much attention has been paid to deep generative models, since they have been used to great success for variational inference, generation of complex data types, and more. In almost all of these settings, the goal has been to find a particular member of that model family: optimized parameters index a distribution that is close (via a divergence or classification metric) to a target distribution. Much less attention, however, has been paid to the problem of learning a model itself. Here we introduce a two-network architecture and optimization procedure for learning intractable exponential family models (not a single distribution from those models). These exponential families are learned accurately, allowing operations like posterior inference to be executed directly and generically with an input choice of natural parameters, rather than performing inference via optimization for each particular distribution within that model. |
Tasks | |
Published | 2019-03-18 |
URL | http://arxiv.org/abs/1903.07515v1, http://arxiv.org/pdf/1903.07515v1.pdf |
PWC | https://paperswithcode.com/paper/approximating-exponential-family-models-not |
Repo | |
Framework | |
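For reference, the natural-parameter form of an exponential family that such a model-level approach targets; learning the model means handling all natural parameters η at once rather than one fitted member:

```latex
% Natural-parameter form of an exponential family: the family is indexed
% by \eta, with sufficient statistic T(x) and log-partition function A(\eta).
p(x \mid \eta) = h(x)\, \exp\!\big( \eta^{\top} T(x) - A(\eta) \big)
```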
Variational Langevin Hamiltonian Monte Carlo for Distant Multi-modal Sampling
Title | Variational Langevin Hamiltonian Monte Carlo for Distant Multi-modal Sampling |
Authors | Minghao Gu, Shiliang Sun |
Abstract | The Hamiltonian Monte Carlo (HMC) sampling algorithm exploits Hamiltonian dynamics to construct efficient Markov Chain Monte Carlo (MCMC) proposals, and has become increasingly popular in machine learning and statistics. Since HMC uses the gradient information of the target distribution, it can explore the state space much more efficiently than random-walk proposals. However, probabilistic inference involving multi-modal distributions is very difficult for the standard HMC method, especially when the modes are far away from each other, as sampling algorithms are then often incapable of traveling across regions of low probability. In this paper, we propose a novel MCMC algorithm which aims to sample from multi-modal distributions effectively. The method improves Hamiltonian dynamics to reduce the autocorrelation of the samples and uses a variational distribution to explore the phase space and find new modes. A formal proof is provided which shows that the proposed method can converge to target distributions. Both synthetic and real datasets are used to evaluate its properties and performance. The experimental results verify the theory and show superior performance in multi-modal sampling. |
Tasks | |
Published | 2019-06-01 |
URL | https://arxiv.org/abs/1906.00229v1, https://arxiv.org/pdf/1906.00229v1.pdf |
PWC | https://paperswithcode.com/paper/190600229 |
Repo | |
Framework | |
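For context, a sketch of the standard HMC transition that the proposed method improves on (the variational Langevin machinery itself is not reproduced here); step size and leapfrog count are illustrative.

```python
import numpy as np

def hmc_step(q, log_prob, grad_log_prob, step_size=0.1, n_leapfrog=20,
             rng=np.random.default_rng()):
    """One standard HMC transition: resample momentum, simulate Hamiltonian
    dynamics with the leapfrog integrator, then Metropolis-correct."""
    p = rng.standard_normal(q.shape)                    # resample momentum
    q_new, p_new = q.copy(), p.copy()
    p_new += 0.5 * step_size * grad_log_prob(q_new)     # half momentum step
    for _ in range(n_leapfrog - 1):
        q_new += step_size * p_new                      # full position step
        p_new += step_size * grad_log_prob(q_new)       # full momentum step
    q_new += step_size * p_new
    p_new += 0.5 * step_size * grad_log_prob(q_new)     # final half step
    # Accept/reject with the Hamiltonian H(q, p) = -log p(q) + |p|^2 / 2.
    h_old = -log_prob(q) + 0.5 * np.sum(p ** 2)
    h_new = -log_prob(q_new) + 0.5 * np.sum(p_new ** 2)
    return q_new if np.log(rng.random()) < h_old - h_new else q

# Toy target: standard 2-D Gaussian.
q = np.zeros(2)
for _ in range(100):
    q = hmc_step(q, lambda x: -0.5 * np.sum(x ** 2), lambda x: -x)
print(q)
```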
On Online Learning in Kernelized Markov Decision Processes
Title | On Online Learning in Kernelized Markov Decision Processes |
Authors | Sayak Ray Chowdhury, Aditya Gopalan |
Abstract | We develop algorithms with low regret for learning episodic Markov decision processes based on kernel approximation techniques. The algorithms are based on both the Upper Confidence Bound (UCB) as well as Posterior or Thompson Sampling (PSRL) philosophies, and work in the general setting of continuous state and action spaces when the true unknown transition dynamics are assumed to have smoothness induced by an appropriate Reproducing Kernel Hilbert Space (RKHS). |
Tasks | |
Published | 2019-11-04 |
URL | https://arxiv.org/abs/1911.01871v1, https://arxiv.org/pdf/1911.01871v1.pdf |
PWC | https://paperswithcode.com/paper/on-online-learning-in-kernelized-markov |
Repo | |
Framework | |
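For reference, the UCB philosophy mentioned in the abstract selects actions optimistically from a kernel-based posterior; schematically (illustrative notation, not the paper's exact rule):

```latex
% Act optimistically w.r.t. a kernel-based posterior mean \mu and standard
% deviation \sigma, with exploration parameter \beta_t.
a_t = \operatorname*{arg\,max}_{a} \; \mu_{t-1}(s_t, a) + \beta_t \, \sigma_{t-1}(s_t, a)
```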
PS^2-Net: A Locally and Globally Aware Network for Point-Based Semantic Segmentation
Title | PS^2-Net: A Locally and Globally Aware Network for Point-Based Semantic Segmentation |
Authors | Na Zhao, Tat-Seng Chua, Gim Hee Lee |
Abstract | In this paper, we present the PS^2-Net – a locally and globally aware deep learning framework for semantic segmentation on 3D scene-level point clouds. In order to deeply incorporate local structures and global context to support 3D scene segmentation, our network is built on four repeatedly stacked encoders, where each encoder has two basic components: EdgeConv, which captures local structures, and NetVLAD, which models global context. Different from existing state-of-the-art methods for point-based scene semantic segmentation that either violate or do not achieve permutation invariance, our PS^2-Net is designed to be permutation invariant, which is an essential property of any deep network used to process unordered point clouds. We further provide a theoretical proof to guarantee the permutation-invariance property of our network. We perform extensive experiments on two large-scale 3D indoor scene datasets and demonstrate that our PS^2-Net achieves state-of-the-art performance compared to existing approaches. |
Tasks | Scene Segmentation, Semantic Segmentation |
Published | 2019-08-15 |
URL | https://arxiv.org/abs/1908.05425v1, https://arxiv.org/pdf/1908.05425v1.pdf |
PWC | https://paperswithcode.com/paper/ps2-net-a-locally-and-globally-aware-network |
Repo | |
Framework | |
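A small sketch of the EdgeConv-style local component the paper stacks with NetVLAD, with a random linear map standing in for the learned MLP; the max over neighbors is what keeps the layer permutation invariant.

```python
import numpy as np

def edgeconv(points, k=8, rng=np.random.default_rng(2)):
    """Sketch of a single EdgeConv-style layer (illustrative, not the
    paper's exact design): per point, build edge features [x_i, x_j - x_i]
    over its k nearest neighbors and aggregate with a max, which makes the
    output invariant to the input point ordering."""
    n, d = points.shape
    w = rng.standard_normal((2 * d, 16))            # stand-in for the learned MLP
    dists = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
    knn = np.argsort(dists, axis=1)[:, 1:k + 1]     # k nearest neighbors (self excluded)
    out = np.empty((n, 16))
    for i in range(n):
        edges = np.concatenate(
            [np.repeat(points[i][None], k, axis=0), points[knn[i]] - points[i]],
            axis=1)                                 # edge feature [x_i, x_j - x_i]
        out[i] = np.maximum.reduce(np.tanh(edges @ w))  # max over neighbors
    return out

print(edgeconv(np.random.default_rng(3).random((32, 3))).shape)  # (32, 16)
```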
IMP: Instance Mask Projection for High Accuracy Semantic Segmentation of Things
Title | IMP: Instance Mask Projection for High Accuracy Semantic Segmentation of Things |
Authors | Cheng-Yang Fu, Tamara L. Berg, Alexander C. Berg |
Abstract | In this work, we present a new operator, called Instance Mask Projection (IMP), which projects a predicted instance segmentation as a new feature for semantic segmentation. It also supports back-propagation and is thus trainable end-to-end. Our experiments show the effectiveness of IMP on both Clothing Parsing (with complex layering, large deformations, and non-convex objects) and Street Scene Segmentation (with many overlapping instances and small objects). On the Varied Clothing Parsing dataset (VCP), we show that instance mask projection improves mIoU by 3 points over a state-of-the-art Panoptic FPN segmentation approach. On the ModaNet clothing parsing dataset, we show a dramatic absolute improvement of 20.4% compared to existing baseline semantic segmentation results. In addition, the instance mask projection operator works well on other (non-clothing) datasets, providing an improvement of 3 points in mIoU on the Thing classes of Cityscapes, a self-driving dataset, on top of a state-of-the-art approach. |
Tasks | Instance Segmentation, Scene Segmentation, Semantic Segmentation |
Published | 2019-06-15 |
URL | https://arxiv.org/abs/1906.06597v1, https://arxiv.org/pdf/1906.06597v1.pdf |
PWC | https://paperswithcode.com/paper/imp-instance-mask-projection-for-high |
Repo | |
Framework | |
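A minimal sketch of what the IMP operator could look like, assuming it paints each instance's class scores onto the pixels its mask covers to form an extra semantic feature map; the shapes and names are illustrative, and the paper's exact projection is not reproduced.

```python
import numpy as np

def instance_mask_projection(masks, class_logits, num_classes):
    """Sketch of an IMP-style operator under assumptions: project each
    predicted instance into a semantic score map by broadcasting its class
    scores over the pixels its mask covers.

    masks: (N, H, W) soft or binary instance masks in [0, 1].
    class_logits: (N, C) per-instance class scores.
    Returns a (C, H, W) feature for the semantic-segmentation head.
    """
    n, h, w = masks.shape
    out = np.zeros((num_classes, h, w))
    for mask, logits in zip(masks, class_logits):
        out += logits[:, None, None] * mask[None]   # scores spread over the mask
    return out

masks = (np.random.default_rng(4).random((3, 16, 16)) > 0.7).astype(float)
logits = np.random.default_rng(5).standard_normal((3, 5))
print(instance_mask_projection(masks, logits, 5).shape)   # (5, 16, 16)
```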
Extracting localized information from a Twitter corpus for flood prevention
Title | Extracting localized information from a Twitter corpus for flood prevention |
Authors | Etienne Brangbour, Pierrick Bruneau, Stéphane Marchand-Maillet, Renaud Hostache, Patrick Matgen, Marco Chini, Thomas Tamisier |
Abstract | In this paper, we discuss the collection of a corpus associated to tropical storm Harvey, as well as its analysis from both spatial and topical perspectives. From the spatial perspective, our goal here is to get a first estimation of the quality and precision of the geographical information featured in the collected corpus. From a topical perspective, we discuss the representation of Twitter posts, and strategies to process an initially unlabeled corpus of tweets. |
Tasks | |
Published | 2019-03-12 |
URL | https://arxiv.org/abs/1903.04748v2, https://arxiv.org/pdf/1903.04748v2.pdf |
PWC | https://paperswithcode.com/paper/extracting-localized-information-from-a |
Repo | |
Framework | |
UcoSLAM: Simultaneous Localization and Mapping by Fusion of KeyPoints and Squared Planar Markers
Title | UcoSLAM: Simultaneous Localization and Mapping by Fusion of KeyPoints and Squared Planar Markers |
Authors | Rafael Munoz-Salinas, Rafael Medina-Carnicer |
Abstract | This paper proposes a novel approach to Simultaneous Localization and Mapping by fusing natural and artificial landmarks. Most SLAM approaches use natural landmarks (such as keypoints). However, these are unstable over time, repetitive in many cases, or insufficient for robust tracking (e.g. in indoor buildings). On the other hand, other approaches have employed artificial landmarks (such as squared fiducial markers) placed in the environment to help tracking and relocalization. We propose a method that integrates both approaches in order to achieve long-term robust tracking in many scenarios. Our method has been compared to the state-of-the-art methods ORB-SLAM2 and LDSO on the public datasets KITTI, EuRoC-MAV, TUM, and SPM, obtaining better precision, robustness, and speed. Our tests also show that the combination of markers and keypoints achieves better accuracy than either of them independently. |
Tasks | Simultaneous Localization and Mapping |
Published | 2019-02-11 |
URL | http://arxiv.org/abs/1902.03729v1, http://arxiv.org/pdf/1902.03729v1.pdf |
PWC | https://paperswithcode.com/paper/ucoslam-simultaneous-localization-and-mapping |
Repo | |
Framework | |