Paper Group ANR 1045
Quantitative Impact of Label Noise on the Quality of Segmentation of Brain Tumors on MRI scans. Depth-Map Generation using Pixel Matching in Stereoscopic Pair of Images. G2G: TTS-Driven Pronunciation Learning for Graphemic Hybrid ASR. Exploiting Motion Information from Unlabeled Videos for Static Image Action Recognition. OpenEI: An Open Framework …
Quantitative Impact of Label Noise on the Quality of Segmentation of Brain Tumors on MRI scans
Title | Quantitative Impact of Label Noise on the Quality of Segmentation of Brain Tumors on MRI scans |
Authors | Michał Marcinkiewicz, Grzegorz Mrukwa |
Abstract | Over the last few years, deep learning has proven to be a great solution to many problems, such as image or text classification. Recently, deep learning-based solutions have outperformed humans on selected benchmark datasets, yielding a promising future for scientific and real-world applications. Training of deep learning models requires vast amounts of high-quality data to achieve such supreme performance. In real-world scenarios, obtaining a large, coherent, and properly labeled dataset is a challenging task. This is especially true in medical applications, where high-quality data and annotations are scarce and the number of expert annotators is limited. In this paper, we investigate the impact of corrupted ground-truth masks on the performance of a neural network for a brain tumor segmentation task. Our findings suggest that a) the performance degrades about 8% less than could be expected from simulations, b) a neural network learns the simulated biases of annotators, c) biases can be partially mitigated by using an inversely-biased Dice loss function. |
Tasks | Brain Tumor Segmentation, Text Classification |
Published | 2019-09-18 |
URL | https://arxiv.org/abs/1909.08959v1, https://arxiv.org/pdf/1909.08959v1.pdf |
PWC | https://paperswithcode.com/paper/quantitative-impact-of-label-noise-on-the |
Repo | |
Framework | |
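The abstract's "inversely-biased Dice loss" is not spelled out above, so the following is a minimal sketch under assumptions: a Tversky-style soft Dice in which the false-positive and false-negative weights (`alpha`, `beta`, both illustrative) would be set inversely to a simulated annotator bias.

```python
import numpy as np

def biased_dice_loss(pred, target, alpha=0.7, beta=0.3, eps=1e-7):
    """Soft Dice loss with asymmetric (Tversky-style) weighting.

    A sketch only: the paper's exact inversely-biased formulation is not
    reproduced here. `alpha` weights false positives and `beta` false
    negatives; choosing them inversely to a simulated annotator bias
    would counteract over- or under-segmented training masks.
    """
    pred = pred.ravel().astype(np.float64)       # predicted probabilities in [0, 1]
    target = target.ravel().astype(np.float64)   # binary ground-truth mask
    tp = np.sum(pred * target)                   # soft true positives
    fp = np.sum(pred * (1.0 - target))           # soft false positives
    fn = np.sum((1.0 - pred) * target)           # soft false negatives
    score = (tp + eps) / (tp + alpha * fp + beta * fn + eps)
    return 1.0 - score

# Toy usage: an over-segmenting prediction is penalized more when alpha > beta.
rng = np.random.default_rng(0)
target = (rng.random((64, 64)) > 0.8).astype(np.float64)
pred = np.clip(target + 0.3 * rng.random((64, 64)), 0.0, 1.0)
print(biased_dice_loss(pred, target))
```

With `alpha = beta = 0.5` this reduces to the standard soft Dice loss.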
Depth-Map Generation using Pixel Matching in Stereoscopic Pair of Images
Title | Depth-Map Generation using Pixel Matching in Stereoscopic Pair of Images |
Authors | Asra Aslam, Mohd. Samar Ansari |
Abstract | Modern day multimedia content generation and dissemination is moving towards the presentation of more and more 'realistic' scenarios. The switch from 2-dimensional (2D) to 3-dimensional (3D) has been a major driving force in that direction. Over the recent past, a large number of approaches have been proposed for creating 3D images/videos, most of which are based on the generation of depth-maps. This paper presents a new algorithm for obtaining depth information pertaining to a depicted scene from an available pair of stereoscopic images. The proposed algorithm performs a pixel-to-pixel matching of the two images in the stereo pair for estimation of depth. It is shown that the obtained depth-maps show improvements over the reported counterparts. |
Tasks | |
Published | 2019-02-09 |
URL | https://arxiv.org/abs/1902.03471v3, https://arxiv.org/pdf/1902.03471v3.pdf |
PWC | https://paperswithcode.com/paper/depth-map-generation-using-pixel-matching-in |
Repo | |
Framework | |
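The paper's exact matching algorithm is not reproduced here; the sketch below shows the classic pixel/block-matching baseline such methods build on, assuming a rectified stereo pair so matches lie on the same row (the window size and disparity range are illustrative).

```python
import numpy as np

def disparity_map(left, right, max_disp=32, window=5):
    """Naive block matching between a rectified stereo pair (a sketch of the
    classic baseline, not the authors' algorithm): for each left-image pixel,
    slide along the same row of the right image and keep the horizontal shift
    (disparity) with the lowest sum-of-squared-differences cost. Depth is then
    proportional to focal_length * baseline / disparity.
    """
    h, w = left.shape
    half = window // 2
    disp = np.zeros((h, w), dtype=np.float64)
    for y in range(half, h - half):
        for x in range(half, w - half):
            patch = left[y - half:y + half + 1, x - half:x + half + 1]
            best_cost, best_d = np.inf, 0
            for d in range(0, min(max_disp, x - half) + 1):
                cand = right[y - half:y + half + 1, x - d - half:x - d + half + 1]
                cost = np.sum((patch - cand) ** 2)
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y, x] = best_d
    return disp

# Synthetic check: shifting an image by 4 pixels yields disparity 4.
rng = np.random.default_rng(0)
right = rng.random((32, 48))
left = np.roll(right, 4, axis=1)
print(disparity_map(left, right).max())   # 4.0
```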
G2G: TTS-Driven Pronunciation Learning for Graphemic Hybrid ASR
Title | G2G: TTS-Driven Pronunciation Learning for Graphemic Hybrid ASR |
Authors | Duc Le, Thilo Koehler, Christian Fuegen, Michael L. Seltzer |
Abstract | Grapheme-based acoustic modeling has recently been shown to outperform phoneme-based approaches in both hybrid and end-to-end automatic speech recognition (ASR), even on non-phonemic languages like English. However, graphemic ASR still has problems with rare long-tail words that do not follow the standard spelling conventions seen in training, such as entity names. In this work, we present a novel method to train a statistical grapheme-to-grapheme (G2G) model on text-to-speech data that can rewrite an arbitrary character sequence into more phonetically consistent forms. We show that using G2G to provide alternative pronunciations during decoding reduces Word Error Rate by 3% to 11% relative over a strong graphemic baseline and bridges the gap on rare name recognition with an equivalent phonetic setup. Unlike many previously proposed methods, our method does not require any change to the acoustic model training procedure. This work reaffirms the efficacy of grapheme-based modeling and shows that specialized linguistic knowledge, when available, can be leveraged to improve graphemic ASR. |
Tasks | Speech Recognition |
Published | 2019-10-22 |
URL | https://arxiv.org/abs/1910.12612v2, https://arxiv.org/pdf/1910.12612v2.pdf |
PWC | https://paperswithcode.com/paper/g2g-tts-driven-pronunciation-learning-for |
Repo | |
Framework | |
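The G2G model itself is statistical and trained on TTS-derived data, which cannot be reproduced here; the toy below only illustrates the decoding-side idea of expanding a lexicon with alternative graphemic forms, with a few hand-written rewrite rules standing in for the learned model (all rules and names are illustrative assumptions).

```python
# Toy stand-in only: the paper trains a statistical G2G model on TTS data.
# A few hand-written English grapheme rewrites merely illustrate how
# alternative forms could expand a decoding lexicon.
REWRITE_RULES = [("ph", "f"), ("ck", "k"), ("ey", "y"), ("ai", "ay")]

def g2g_alternatives(word, max_alts=4):
    """Generate phonetically plausible respellings of `word` (illustrative)."""
    alts = set()
    for src, dst in REWRITE_RULES:
        if src in word:
            alts.add(word.replace(src, dst))
    alts.discard(word)
    return sorted(alts)[:max_alts]

def expand_lexicon(words):
    """Map each word to itself plus its G2G alternates as extra decoding forms."""
    return {w: [w] + g2g_alternatives(w) for w in words}

print(expand_lexicon(["kaepernick", "phoebe"]))
# {'kaepernick': ['kaepernick', 'kaepernik'], 'phoebe': ['phoebe', 'foebe']}
```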
Exploiting Motion Information from Unlabeled Videos for Static Image Action Recognition
Title | Exploiting Motion Information from Unlabeled Videos for Static Image Action Recognition |
Authors | Yiyi Zhang, Li Niu, Ziqi Pan, Meichao Luo, Jianfu Zhang, Dawei Cheng, Liqing Zhang |
Abstract | Static image action recognition, which aims to recognize actions from a single image, usually relies on expensive human labeling effort, such as adequately labeled action images or a large-scale labeled image dataset. In contrast, abundant unlabeled videos can be economically obtained. Therefore, several works have explored using unlabeled videos to facilitate image action recognition, which can be categorized into the following two groups: (a) enhance visual representations of action images with a designed proxy task on unlabeled videos, which falls into the scope of self-supervised learning; (b) generate auxiliary representations for action images with a generator learned from unlabeled videos. In this paper, we integrate the above two strategies in a unified framework, which consists of a Visual Representation Enhancement (VRE) module and a Motion Representation Augmentation (MRA) module. Specifically, the VRE module includes a proxy task which imposes a pseudo motion label constraint and a temporal coherence constraint on unlabeled videos, while the MRA module predicts the motion information of a static action image by exploiting unlabeled videos. We demonstrate the superiority of our framework on four benchmark human action datasets with limited labeled data. |
Tasks | |
Published | 2019-12-01 |
URL | https://arxiv.org/abs/1912.00308v1, https://arxiv.org/pdf/1912.00308v1.pdf |
PWC | https://paperswithcode.com/paper/exploiting-motion-information-from-unlabeled |
Repo | |
Framework | |
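A minimal sketch of what the VRE module's proxy objective could look like, assuming the pseudo motion label constraint is a classification loss and temporal coherence is approximated by adjacent-frame feature similarity; the paper's exact losses are not reproduced, and all names and weights are illustrative.

```python
import torch
import torch.nn.functional as F

def vre_proxy_loss(feat_t, feat_t1, motion_logits, pseudo_motion_labels,
                   coherence_weight=0.5):
    """Sketch of a VRE-style proxy objective (assumptions, not the paper's
    exact formulation): `motion_logits` predict pseudo motion labels mined
    from unlabeled video, and temporal coherence is encouraged by making
    features of neighboring frames t and t+1 agree."""
    # Pseudo motion label constraint: standard classification loss.
    motion_loss = F.cross_entropy(motion_logits, pseudo_motion_labels)
    # Temporal coherence constraint: adjacent-frame features should be similar.
    coherence_loss = 1.0 - F.cosine_similarity(feat_t, feat_t1, dim=1).mean()
    return motion_loss + coherence_weight * coherence_loss

# Toy usage with random tensors (batch of 8, 128-d features, 10 motion bins).
feat_t = torch.randn(8, 128)
feat_t1 = feat_t + 0.1 * torch.randn(8, 128)
logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
print(vre_proxy_loss(feat_t, feat_t1, logits, labels).item())
```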
OpenEI: An Open Framework for Edge Intelligence
Title | OpenEI: An Open Framework for Edge Intelligence |
Authors | Xingzhou Zhang, Yifan Wang, Sidi Lu, Liangkai Liu, Lanyu Xu, Weisong Shi |
Abstract | In the last five years, edge computing has attracted tremendous attention from industry and academia due to its promise to reduce latency, save bandwidth, improve availability, and protect data privacy and security. At the same time, we have witnessed the proliferation of AI algorithms and models which accelerate the successful deployment of intelligence, mainly in cloud services. These two trends, combined, have created a new horizon: Edge Intelligence (EI). The development of EI requires much attention from both the computer systems research community and the AI community to meet its demands. However, existing computing techniques used in the cloud are not directly applicable to edge computing due to the diversity of computing resources and the distribution of data sources. We envision that a framework that can be rapidly deployed at the edge and enable edge AI capabilities is still missing. To address this challenge, in this paper we first present the definition and a systematic review of EI. Then, we introduce an Open Framework for Edge Intelligence (OpenEI), a lightweight software platform to equip edges with intelligent processing and data-sharing capability. We analyze four fundamental EI techniques which are used to build OpenEI and identify several open problems and potential research directions. Finally, four typical application scenarios enabled by OpenEI are presented. |
Tasks | |
Published | 2019-06-05 |
URL | https://arxiv.org/abs/1906.01864v1, https://arxiv.org/pdf/1906.01864v1.pdf |
PWC | https://paperswithcode.com/paper/openei-an-open-framework-for-edge |
Repo | |
Framework | |
Task-oriented Design through Deep Reinforcement Learning
Title | Task-oriented Design through Deep Reinforcement Learning |
Authors | Junyoung Choi, Minsung Hyun, Nojun Kwak |
Abstract | We propose a new low-cost machine-learning-based methodology that assists designers in reducing the gap between the problem and the solution in the design process. Our work applies reinforcement learning (RL) to find the optimal task-oriented design solution through the construction of the design action for each task. For this task-oriented design, the 3D design process in product design is assigned to an action space in Deep RL, and the desired 3D model is obtained by training each design action according to the task. By showing that this method achieves satisfactory designs even when applied to a task pursuing multiple goals, we suggest a direction in which machine learning can contribute to the design process. We have also validated with product designers that this methodology can assist the creative part of the design process. |
Tasks | |
Published | 2019-03-13 |
URL | http://arxiv.org/abs/1903.05271v1, http://arxiv.org/pdf/1903.05271v1.pdf |
PWC | https://paperswithcode.com/paper/task-oriented-design-through-deep |
Repo | |
Framework | |
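As a toy illustration of assigning a design process to an action space, the sketch below optimizes a parameter vector against multiple task goals, with random hill-climbing standing in for the Deep RL policy; everything here, including the quadratic goals, is an illustrative assumption rather than the paper's setup.

```python
import numpy as np

# Toy illustration only: the paper maps 3D design operations to a Deep RL
# action space; here a "design" is just a parameter vector, the "tasks" are
# hypothetical quadratic goals, and hill-climbing stands in for RL training.
rng = np.random.default_rng(1)
GOALS = [np.array([0.8, 0.2, 0.5]), np.array([0.3, 0.9, 0.4])]

def reward(design):
    # Higher when the design satisfies all task goals simultaneously.
    return -sum(np.sum((design - g) ** 2) for g in GOALS)

design = rng.random(3)            # initial design parameters
step = 0.05
for _ in range(500):              # each action nudges one parameter
    idx = rng.integers(0, design.size)
    cand = design.copy()
    cand[idx] += rng.choice([-step, step])
    if reward(cand) > reward(design):
        design = cand             # keep actions that improve the task reward
print(design, reward(design))
```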
A Conditional Perspective for Iterated Belief Contraction
Title | A Conditional Perspective for Iterated Belief Contraction |
Authors | Kai Sauerwald, Gabriele Kern-Isberner, Christoph Beierle |
Abstract | According to Boutilier, Darwiche, Pearl, and others, principles for iterated revision can be characterised in terms of changing beliefs about conditionals. For iterated contraction, a similar formulation is not known. This is mainly because, for iterated belief change, the connection between revision and contraction via the Levi and Harper identities is not straightforward, and therefore characterisation results do not transfer easily between iterated revision and contraction. In this article, we develop an axiomatisation of iterated contraction in terms of changing conditional beliefs. We prove that the new set of postulates conforms semantically to the class of operators like the ones given by Konieczny and Pino Pérez for iterated contraction. |
Tasks | |
Published | 2019-11-20 |
URL | https://arxiv.org/abs/1911.08833v1, https://arxiv.org/pdf/1911.08833v1.pdf |
PWC | https://paperswithcode.com/paper/a-conditional-perspective-for-iterated-belief |
Repo | |
Framework | |
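For reference, the Levi and Harper identities mentioned in the abstract connect revision (∗) and contraction (÷) in the single-step setting:

```latex
% Levi identity: revision as contraction by \neg\varphi followed by expansion
K \ast \varphi = (K \div \neg\varphi) + \varphi
% Harper identity: contraction as intersection with the revised set
K \div \varphi = K \cap (K \ast \neg\varphi)
```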
Sequential Convolutional Recurrent Neural Networks for Fast Automatic Modulation Classification
Title | Sequential Convolutional Recurrent Neural Networks for Fast Automatic Modulation Classification |
Authors | Kaisheng Liao, Guanhong Tao, Yi Zhong, Yaping Zhang, Zhenghong Zhang |
Abstract | A novel and efficient end-to-end learning model for automatic modulation classification (AMC) is proposed for wireless spectrum monitoring applications, which automatically learns from time-domain in-phase and quadrature (IQ) data without requiring the design of hand-crafted expert features. With the intuition of convolutional layers with pooling serving as front-end feature distillation and dimensionality reduction, sequential convolutional recurrent neural networks (SCRNNs) are developed to take complementary advantage of the parallel computing capability of convolutional neural networks (CNNs) and the temporal sensitivity of recurrent neural networks (RNNs). Experimental results demonstrate that the proposed architecture delivers overall superior performance in the signal-to-noise ratio (SNR) range above -10 dB, achieves significantly improved classification accuracy from 80% to 92.1% at high SNRs, and drastically reduces the training and prediction time by approximately 74% and 67%, respectively. Furthermore, a comparative study is performed to investigate the impact of various SCRNN structure settings on classification performance. A representative SCRNN architecture with a two-layer CNN followed by a two-layer long short-term memory (LSTM) network is developed as a suggested option for fast AMC. |
Tasks | Dimensionality Reduction |
Published | 2019-09-09 |
URL | https://arxiv.org/abs/1909.03050v1, https://arxiv.org/pdf/1909.03050v1.pdf |
PWC | https://paperswithcode.com/paper/sequential-convolutional-recurrent-neural |
Repo | |
Framework | |
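A minimal PyTorch sketch of the representative SCRNN layout (two convolutional layers with pooling feeding a two-layer LSTM); channel widths, kernel sizes, and the 11-class output are illustrative assumptions, not the authors' exact hyperparameters.

```python
import torch
import torch.nn as nn

class SCRNN(nn.Module):
    """Sketch of the representative SCRNN layout under assumed settings:
    a convolutional front end for feature distillation and downsampling,
    followed by a two-layer LSTM for temporal modeling."""

    def __init__(self, num_classes=11):
        super().__init__()
        self.frontend = nn.Sequential(
            nn.Conv1d(2, 64, kernel_size=7, padding=3),   # 2 = I/Q channels
            nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(64, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.lstm = nn.LSTM(64, 128, num_layers=2, batch_first=True)
        self.head = nn.Linear(128, num_classes)

    def forward(self, iq):                   # iq: (batch, 2, time)
        x = self.frontend(iq)                # (batch, 64, time / 4)
        x = x.transpose(1, 2)                # (batch, time / 4, 64) for the LSTM
        out, _ = self.lstm(x)
        return self.head(out[:, -1])         # classify from the last time step

logits = SCRNN()(torch.randn(4, 2, 128))     # e.g. 128-sample IQ snippets
print(logits.shape)                          # torch.Size([4, 11])
```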
Approximating exponential family models (not single distributions) with a two-network architecture
Title | Approximating exponential family models (not single distributions) with a two-network architecture |
Authors | Sean R. Bittner, John P. Cunningham |
Abstract | Recently, much attention has been paid to deep generative models, since they have been used to great success for variational inference, generation of complex data types, and more. In almost all of these settings, the goal has been to find a particular member of that model family: optimized parameters index a distribution that is close (via a divergence or classification metric) to a target distribution. Much less attention, however, has been paid to the problem of learning a model itself. Here we introduce a two-network architecture and optimization procedure for learning intractable exponential family models (not a single distribution from those models). These exponential families are learned accurately, allowing operations like posterior inference to be executed directly and generically with an input choice of natural parameters, rather than performing inference via optimization for each particular distribution within that model. |
Tasks | |
Published | 2019-03-18 |
URL | http://arxiv.org/abs/1903.07515v1, http://arxiv.org/pdf/1903.07515v1.pdf |
PWC | https://paperswithcode.com/paper/approximating-exponential-family-models-not |
Repo | |
Framework | |
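For reference, the natural-parameter form of an exponential family that such a model-level approach targets; learning the model means handling all natural parameters η at once rather than one fitted member:

```latex
% Natural-parameter form of an exponential family: the family is indexed
% by \eta, with sufficient statistic T(x) and log-partition function A(\eta).
p(x \mid \eta) = h(x)\, \exp\!\big( \eta^{\top} T(x) - A(\eta) \big)
```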
Variational Langevin Hamiltonian Monte Carlo for Distant Multi-modal Sampling
Title | Variational Langevin Hamiltonian Monte Carlo for Distant Multi-modal Sampling |
Authors | Minghao Gu, Shiliang Sun |
Abstract | The Hamiltonian Monte Carlo (HMC) sampling algorithm exploits Hamiltonian dynamics to construct efficient Markov Chain Monte Carlo (MCMC) proposals, and has become increasingly popular in machine learning and statistics. Since HMC uses the gradient information of the target distribution, it can explore the state space much more efficiently than random-walk proposals. However, probabilistic inference involving multi-modal distributions is very difficult for the standard HMC method, especially when the modes are far away from each other, as sampling algorithms are then often incapable of traveling across regions of low probability. In this paper, we propose a novel MCMC algorithm which aims to sample from multi-modal distributions effectively. The method improves Hamiltonian dynamics to reduce the autocorrelation of the samples and uses a variational distribution to explore the phase space and find new modes. A formal proof is provided which shows that the proposed method can converge to target distributions. Both synthetic and real datasets are used to evaluate its properties and performance. The experimental results verify the theory and show superior performance in multi-modal sampling. |
Tasks | |
Published | 2019-06-01 |
URL | https://arxiv.org/abs/1906.00229v1, https://arxiv.org/pdf/1906.00229v1.pdf |
PWC | https://paperswithcode.com/paper/190600229 |
Repo | |
Framework | |
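For context, a sketch of the standard HMC transition that the proposed method improves on (the variational Langevin machinery itself is not reproduced here); step size and leapfrog count are illustrative.

```python
import numpy as np

def hmc_step(q, log_prob, grad_log_prob, step_size=0.1, n_leapfrog=20,
             rng=np.random.default_rng()):
    """One standard HMC transition: resample momentum, simulate Hamiltonian
    dynamics with the leapfrog integrator, then Metropolis-correct."""
    p = rng.standard_normal(q.shape)                    # resample momentum
    q_new, p_new = q.copy(), p.copy()
    p_new += 0.5 * step_size * grad_log_prob(q_new)     # half momentum step
    for _ in range(n_leapfrog - 1):
        q_new += step_size * p_new                      # full position step
        p_new += step_size * grad_log_prob(q_new)       # full momentum step
    q_new += step_size * p_new
    p_new += 0.5 * step_size * grad_log_prob(q_new)     # final half step
    # Accept/reject with the Hamiltonian H(q, p) = -log p(q) + |p|^2 / 2.
    h_old = -log_prob(q) + 0.5 * np.sum(p ** 2)
    h_new = -log_prob(q_new) + 0.5 * np.sum(p_new ** 2)
    return q_new if np.log(rng.random()) < h_old - h_new else q

# Toy target: standard 2-D Gaussian.
q = np.zeros(2)
for _ in range(100):
    q = hmc_step(q, lambda x: -0.5 * np.sum(x ** 2), lambda x: -x)
print(q)
```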
On Online Learning in Kernelized Markov Decision Processes
Title | On Online Learning in Kernelized Markov Decision Processes |
Authors | Sayak Ray Chowdhury, Aditya Gopalan |
Abstract | We develop algorithms with low regret for learning episodic Markov decision processes based on kernel approximation techniques. The algorithms are based on both the Upper Confidence Bound (UCB) as well as Posterior or Thompson Sampling (PSRL) philosophies, and work in the general setting of continuous state and action spaces when the true unknown transition dynamics are assumed to have smoothness induced by an appropriate Reproducing Kernel Hilbert Space (RKHS). |
Tasks | |
Published | 2019-11-04 |
URL | https://arxiv.org/abs/1911.01871v1, https://arxiv.org/pdf/1911.01871v1.pdf |
PWC | https://paperswithcode.com/paper/on-online-learning-in-kernelized-markov |
Repo | |
Framework | |
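For reference, the UCB philosophy mentioned in the abstract selects actions optimistically from a kernel-based posterior; schematically (illustrative notation, not the paper's exact rule):

```latex
% Act optimistically w.r.t. a kernel-based posterior mean \mu and standard
% deviation \sigma, with exploration parameter \beta_t.
a_t = \operatorname*{arg\,max}_{a} \; \mu_{t-1}(s_t, a) + \beta_t \, \sigma_{t-1}(s_t, a)
```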
PS^2-Net: A Locally and Globally Aware Network for Point-Based Semantic Segmentation
Title | PS^2-Net: A Locally and Globally Aware Network for Point-Based Semantic Segmentation |
Authors | Na Zhao, Tat-Seng Chua, Gim Hee Lee |
Abstract | In this paper, we present the PS^2-Net – a locally and globally aware deep learning framework for semantic segmentation on 3D scene-level point clouds. In order to deeply incorporate local structures and global context to support 3D scene segmentation, our network is built on four repeatedly stacked encoders, where each encoder has two basic components: EdgeConv, which captures local structures, and NetVLAD, which models global context. Different from existing state-of-the-art methods for point-based scene semantic segmentation that either violate or do not achieve permutation invariance, our PS^2-Net is designed to be permutation invariant, which is an essential property of any deep network used to process unordered point clouds. We further provide a theoretical proof to guarantee the permutation-invariance property of our network. We perform extensive experiments on two large-scale 3D indoor scene datasets and demonstrate that our PS^2-Net achieves state-of-the-art performance compared to existing approaches. |
Tasks | Scene Segmentation, Semantic Segmentation |
Published | 2019-08-15 |
URL | https://arxiv.org/abs/1908.05425v1, https://arxiv.org/pdf/1908.05425v1.pdf |
PWC | https://paperswithcode.com/paper/ps2-net-a-locally-and-globally-aware-network |
Repo | |
Framework | |
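A small sketch of the EdgeConv-style local component the paper stacks with NetVLAD, with a random linear map standing in for the learned MLP; the max over neighbors is what keeps the layer permutation invariant.

```python
import numpy as np

def edgeconv(points, k=8, rng=np.random.default_rng(2)):
    """Sketch of a single EdgeConv-style layer (illustrative, not the
    paper's exact design): per point, build edge features [x_i, x_j - x_i]
    over its k nearest neighbors and aggregate with a max, which makes the
    output invariant to the input point ordering."""
    n, d = points.shape
    w = rng.standard_normal((2 * d, 16))            # stand-in for the learned MLP
    dists = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
    knn = np.argsort(dists, axis=1)[:, 1:k + 1]     # k nearest neighbors (self excluded)
    out = np.empty((n, 16))
    for i in range(n):
        edges = np.concatenate(
            [np.repeat(points[i][None], k, axis=0), points[knn[i]] - points[i]],
            axis=1)                                 # edge feature [x_i, x_j - x_i]
        out[i] = np.maximum.reduce(np.tanh(edges @ w))  # max over neighbors
    return out

print(edgeconv(np.random.default_rng(3).random((32, 3))).shape)  # (32, 16)
```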
IMP: Instance Mask Projection for High Accuracy Semantic Segmentation of Things
Title | IMP: Instance Mask Projection for High Accuracy Semantic Segmentation of Things |
Authors | Cheng-Yang Fu, Tamara L. Berg, Alexander C. Berg |
Abstract | In this work, we present a new operator, called Instance Mask Projection (IMP), which projects a predicted instance segmentation as a new feature for semantic segmentation. It also supports back-propagation and is thus trainable end-to-end. Our experiments show the effectiveness of IMP on both Clothing Parsing (with complex layering, large deformations, and non-convex objects) and Street Scene Segmentation (with many overlapping instances and small objects). On the Varied Clothing Parsing dataset (VCP), we show that instance mask projection improves mIoU by 3 points over a state-of-the-art Panoptic FPN segmentation approach. On the ModaNet clothing parsing dataset, we show a dramatic absolute improvement of 20.4% compared to existing baseline semantic segmentation results. In addition, the instance mask projection operator works well on other (non-clothing) datasets, providing an improvement of 3 points in mIoU on the Thing classes of Cityscapes, a self-driving dataset, on top of a state-of-the-art approach. |
Tasks | Instance Segmentation, Scene Segmentation, Semantic Segmentation |
Published | 2019-06-15 |
URL | https://arxiv.org/abs/1906.06597v1, https://arxiv.org/pdf/1906.06597v1.pdf |
PWC | https://paperswithcode.com/paper/imp-instance-mask-projection-for-high |
Repo | |
Framework | |
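A minimal sketch of what the IMP operator could look like, assuming it paints each instance's class scores onto the pixels its mask covers to form an extra semantic feature map; the shapes and names are illustrative, and the paper's exact projection is not reproduced.

```python
import numpy as np

def instance_mask_projection(masks, class_logits, num_classes):
    """Sketch of an IMP-style operator under assumptions: project each
    predicted instance into a semantic score map by broadcasting its class
    scores over the pixels its mask covers.

    masks: (N, H, W) soft or binary instance masks in [0, 1].
    class_logits: (N, C) per-instance class scores.
    Returns a (C, H, W) feature for the semantic-segmentation head.
    """
    n, h, w = masks.shape
    out = np.zeros((num_classes, h, w))
    for mask, logits in zip(masks, class_logits):
        out += logits[:, None, None] * mask[None]   # scores spread over the mask
    return out

masks = (np.random.default_rng(4).random((3, 16, 16)) > 0.7).astype(float)
logits = np.random.default_rng(5).standard_normal((3, 5))
print(instance_mask_projection(masks, logits, 5).shape)   # (5, 16, 16)
```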
Extracting localized information from a Twitter corpus for flood prevention
Title | Extracting localized information from a Twitter corpus for flood prevention |
Authors | Etienne Brangbour, Pierrick Bruneau, Stéphane Marchand-Maillet, Renaud Hostache, Patrick Matgen, Marco Chini, Thomas Tamisier |
Abstract | In this paper, we discuss the collection of a corpus associated to tropical storm Harvey, as well as its analysis from both spatial and topical perspectives. From the spatial perspective, our goal here is to get a first estimation of the quality and precision of the geographical information featured in the collected corpus. From a topical perspective, we discuss the representation of Twitter posts, and strategies to process an initially unlabeled corpus of tweets. |
Tasks | |
Published | 2019-03-12 |
URL | https://arxiv.org/abs/1903.04748v2, https://arxiv.org/pdf/1903.04748v2.pdf |
PWC | https://paperswithcode.com/paper/extracting-localized-information-from-a |
Repo | |
Framework | |
UcoSLAM: Simultaneous Localization and Mapping by Fusion of KeyPoints and Squared Planar Markers
Title | UcoSLAM: Simultaneous Localization and Mapping by Fusion of KeyPoints and Squared Planar Markers |
Authors | Rafael Munoz-Salinas, Rafael Medina-Carnicer |
Abstract | This paper proposes a novel approach to Simultaneous Localization and Mapping by fusing natural and artificial landmarks. Most SLAM approaches use natural landmarks (such as keypoints). However, these are unstable over time, repetitive in many cases, or insufficient for robust tracking (e.g. in indoor buildings). On the other hand, other approaches have employed artificial landmarks (such as squared fiducial markers) placed in the environment to help tracking and relocalization. We propose a method that integrates both approaches in order to achieve long-term robust tracking in many scenarios. Our method has been compared to the state-of-the-art methods ORB-SLAM2 and LDSO on the public datasets KITTI, EuRoC-MAV, TUM, and SPM, obtaining better precision, robustness, and speed. Our tests also show that the combination of markers and keypoints achieves better accuracy than either of them independently. |
Tasks | Simultaneous Localization and Mapping |
Published | 2019-02-11 |
URL | http://arxiv.org/abs/1902.03729v1, http://arxiv.org/pdf/1902.03729v1.pdf |
PWC | https://paperswithcode.com/paper/ucoslam-simultaneous-localization-and-mapping |
Repo | |
Framework | |