Paper Group ANR 1559
RoIMix: Proposal-Fusion among Multiple Images for Underwater Object Detection. Density Matrices with metric for Derivational Ambiguity. Path-Restore: Learning Network Path Selection for Image Restoration. PROPS: Probabilistic personalization of black-box sequence models. Learning Shape Priors for Robust Cardiac MR Segmentation from Multi-view Image …
RoIMix: Proposal-Fusion among Multiple Images for Underwater Object Detection
Title | RoIMix: Proposal-Fusion among Multiple Images for Underwater Object Detection |
Authors | Wei-Hong Lin, Jia-Xing Zhong, Shan Liu, Thomas Li, Ge Li |
Abstract | Generic object detection algorithms have proven their excellent performance in recent years. However, object detection on underwater datasets is still less explored. In contrast to generic datasets, underwater images usually have color shift and low contrast, and sediment can cause blurring. In addition, underwater creatures often appear close to each other in images due to their living habits. To address these issues, our work investigates augmentation policies to simulate overlapping, occluded and blurred objects, and we construct a model capable of achieving better generalization. We propose an augmentation method called RoIMix, which characterizes interactions among images: proposals extracted from different images are mixed together. Previous data augmentation methods operate on a single image, while we apply RoIMix to multiple images to create enhanced samples as training data. Experiments show that our proposed method improves the performance of region-based object detectors on both the Pascal VOC and URPC datasets. |
Tasks | Data Augmentation, Object Detection |
Published | 2019-11-08 |
URL | https://arxiv.org/abs/1911.03029v2 |
https://arxiv.org/pdf/1911.03029v2.pdf | |
PWC | https://paperswithcode.com/paper/roimix-proposal-fusion-among-multiple-images |
Repo | |
Framework | |
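As a rough sketch of the mixup-style blending behind RoIMix (not the authors' implementation; the Beta parameter and all names below are hypothetical), two region proposals from different images can be fused as a convex combination:

```python
import random

def roimix(roi_a, roi_b, alpha=1.5):
    """Mix two region proposals (flattened pixel lists of equal size)
    with a mixup-style ratio drawn from a Beta(alpha, alpha) prior.
    Taking the larger of the two draws keeps the first proposal
    dominant in the blend (an assumption, not necessarily the paper's
    exact rule)."""
    lam = random.betavariate(alpha, alpha)
    lam = max(lam, 1.0 - lam)  # keep roi_a as the dominant proposal
    mixed = [lam * a + (1.0 - lam) * b for a, b in zip(roi_a, roi_b)]
    return mixed, lam

random.seed(0)
patch_a = [0.8, 0.6, 0.4]   # toy "proposal" from image 1
patch_b = [0.1, 0.2, 0.3]   # toy "proposal" from image 2
mixed, lam = roimix(patch_a, patch_b)
```

Because the mix is convex, every blended value stays between the corresponding values of the two source proposals.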
Density Matrices with metric for Derivational Ambiguity
Title | Density Matrices with metric for Derivational Ambiguity |
Authors | A. D. Correia, M. Moortgat, H. T. C. Stoof |
Abstract | Recent work on vector-based compositional natural language semantics has proposed the use of density matrices to model lexical ambiguity and (graded) entailment (e.g. Piedeleu et al 2015, Bankova et al 2019, Sadrzadeh et al 2018). Ambiguous word meanings, in this work, are represented as mixed states, and the compositional interpretation of phrases out of their constituent parts takes the form of a strongly monoidal functor sending the derivational morphisms of a pregroup syntax to linear maps in FdHilb. Our aims in this paper are threefold. Firstly, we replace the pregroup front end by a Lambek categorial grammar with directional implications expressing a word’s selectional requirements. By the Curry-Howard correspondence, the derivations of the grammar’s type logic are associated with terms of the (ordered) linear lambda calculus; these terms can be read as programs for compositional meaning assembly with density matrices as the target semantic spaces. Secondly, we extend the existing literature and introduce a symmetric, nondegenerate bilinear form called a “metric” that defines a canonical isomorphism between a vector space and its dual, allowing us to keep a distinction between left and right implication. Thirdly, we use this metric to define density matrix spaces in a directional form, modeling the ubiquitous derivational ambiguity of natural language syntax, and show how this allows an integrated treatment of lexical and derivational forms of ambiguity controlled at the level of the interpretation. |
Tasks | |
Published | 2019-08-20 |
URL | https://arxiv.org/abs/1908.07347v2 |
https://arxiv.org/pdf/1908.07347v2.pdf | |
PWC | https://paperswithcode.com/paper/density-matrices-for-derivational-ambiguity |
Repo | |
Framework | |
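The notion of an ambiguous word as a mixed state can be illustrated with a toy density matrix; the word senses, weights, and two-dimensional space below are hypothetical, purely to show the construction rho = sum_i p_i |v_i><v_i|:

```python
def outer(v):
    """Outer product |v><v| of a real vector with itself."""
    return [[x * y for y in v] for x in v]

def mix(states, weights):
    """Density matrix of a mixed state: rho = sum_i p_i |v_i><v_i|."""
    n = len(states[0])
    rho = [[0.0] * n for _ in range(n)]
    for v, p in zip(states, weights):
        proj = outer(v)
        for i in range(n):
            for j in range(n):
                rho[i][j] += p * proj[i][j]
    return rho

def trace(m):
    return sum(m[i][i] for i in range(len(m)))

# toy ambiguous word "bank": equal mixture of two orthogonal senses
bank_finance = [1.0, 0.0]
bank_river = [0.0, 1.0]
rho = mix([bank_finance, bank_river], [0.5, 0.5])
```

A valid density matrix has unit trace; for this maximally mixed toy example the off-diagonal entries vanish, reflecting that the two senses do not interfere.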
Path-Restore: Learning Network Path Selection for Image Restoration
Title | Path-Restore: Learning Network Path Selection for Image Restoration |
Authors | Ke Yu, Xintao Wang, Chao Dong, Xiaoou Tang, Chen Change Loy |
Abstract | Very deep Convolutional Neural Networks (CNNs) have greatly improved the performance on various image restoration tasks. However, this comes at the price of an increasing computational burden, which limits their practical usage. We believe that some corrupted image regions are inherently easier to restore than others, since the distortion and content vary within an image. To this end, we propose Path-Restore, a multi-path CNN with a pathfinder that can dynamically select an appropriate route for each image region. We train the pathfinder using reinforcement learning with a difficulty-regulated reward, which is related to the performance, complexity and “the difficulty of restoring a region”. We conduct experiments on denoising and mixed restoration tasks. The results show that our method can achieve comparable or superior performance to existing approaches with less computational cost. In particular, our method is effective for real-world denoising, where the noise distribution varies across different regions of a single image. We surpass the state-of-the-art CBDNet by 0.94 dB and run 29% faster on the realistic Darmstadt Noise Dataset. Models and code will be released. |
Tasks | Denoising, Image Restoration |
Published | 2019-04-23 |
URL | http://arxiv.org/abs/1904.10343v1 |
http://arxiv.org/pdf/1904.10343v1.pdf | |
PWC | https://paperswithcode.com/paper/path-restore-learning-network-path-selection |
Repo | |
Framework | |
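The difficulty-regulated reward can be illustrated with a toy scalar version; the coefficients and the exact functional form below are hypothetical, not taken from the paper, but they capture the stated trade-off between performance, path complexity, and region difficulty:

```python
def difficulty_regulated_reward(psnr_gain, path_cost, difficulty, penalty=0.05):
    """Toy reward: restoration gain on a region, scaled by how hard the
    region is to restore and discounted by the computational cost of
    the chosen path. Hard regions justify costly paths; easy regions
    do not."""
    return difficulty * psnr_gain - penalty * path_cost

# easy region: a cheap path should beat an expensive one
easy_short = difficulty_regulated_reward(psnr_gain=0.5, path_cost=1, difficulty=0.2)
easy_long = difficulty_regulated_reward(psnr_gain=0.6, path_cost=4, difficulty=0.2)
# hard region: the expensive path's extra gain pays off
hard_short = difficulty_regulated_reward(psnr_gain=0.5, path_cost=1, difficulty=1.0)
hard_long = difficulty_regulated_reward(psnr_gain=1.5, path_cost=4, difficulty=1.0)
```

Under this sketch the pathfinder is rewarded for routing easy regions through short paths and hard regions through long ones, which is the behavior the abstract describes.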
PROPS: Probabilistic personalization of black-box sequence models
Title | PROPS: Probabilistic personalization of black-box sequence models |
Authors | Michael Thomas Wojnowicz, Xuan Zhao |
Abstract | We present PROPS, a lightweight transfer learning mechanism for sequential data. PROPS learns probabilistic perturbations around the predictions of one or more arbitrarily complex, pre-trained black box models (such as recurrent neural networks). The technique pins the black-box prediction functions to “source nodes” of a hidden Markov model (HMM), and uses the remaining nodes as “perturbation nodes” for learning customized perturbations around those predictions. In this paper, we describe the PROPS model, provide an algorithm for online learning of its parameters, and demonstrate the consistency of this estimation. We also explore the utility of PROPS in the context of personalized language modeling. In particular, we construct a baseline language model by training an LSTM on the entire Wikipedia corpus of 2.5 million articles (around 6.6 billion words), and then use PROPS to provide lightweight customization into a personalized language model of President Donald J. Trump’s tweeting. We achieve good customization after only 2,000 additional words, and find that the PROPS model, being fully probabilistic, provides insight into when President Trump’s speech departs from generic patterns in the Wikipedia corpus. Python code (for both the PROPS training algorithm as well as experiment reproducibility) is available at https://github.com/cylance/perturbed-sequence-model. |
Tasks | Language Modelling, Transfer Learning |
Published | 2019-03-05 |
URL | http://arxiv.org/abs/1903.02013v1 |
http://arxiv.org/pdf/1903.02013v1.pdf | |
PWC | https://paperswithcode.com/paper/props-probabilistic-personalization-of-black |
Repo | |
Framework | |
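A drastically simplified view of how PROPS blends a frozen black-box distribution with a learned perturbation: real PROPS weights the two via HMM state dynamics, whereas the fixed state posterior below is a hypothetical stand-in, as are all names and probabilities:

```python
def props_predict(blackbox_probs, perturb_probs, state_posterior):
    """Blend a frozen black-box next-word distribution with a learned
    perturbation distribution, weighted by a posterior over 'source'
    vs. 'perturbation' hidden states (a toy simplification of the HMM)."""
    p_source, p_perturb = state_posterior
    vocab = set(blackbox_probs) | set(perturb_probs)
    return {w: p_source * blackbox_probs.get(w, 0.0)
               + p_perturb * perturb_probs.get(w, 0.0)
            for w in vocab}

generic = {"the": 0.6, "great": 0.1, "a": 0.3}    # toy black-box LM
personal = {"great": 0.7, "tremendous": 0.3}      # toy learned perturbation
blended = props_predict(generic, personal, (0.5, 0.5))
```

Since both inputs are probability distributions and the weights sum to one, the blend is again a distribution; words favored by the perturbation gain mass relative to the generic model.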
Learning Shape Priors for Robust Cardiac MR Segmentation from Multi-view Images
Title | Learning Shape Priors for Robust Cardiac MR Segmentation from Multi-view Images |
Authors | Chen Chen, Carlo Biffi, Giacomo Tarroni, Steffen Petersen, Wenjia Bai, Daniel Rueckert |
Abstract | Cardiac MR image segmentation is essential for the morphological and functional analysis of the heart. Inspired by how experienced clinicians assess the cardiac morphology and function across multiple standard views (i.e. long- and short-axis views), we propose a novel approach which learns anatomical shape priors across different 2D standard views and leverages these priors to segment the left ventricular (LV) myocardium from short-axis MR image stacks. The proposed segmentation method has the advantage of being a 2D network but at the same time incorporates spatial context from multiple, complementary views that span a 3D space. Our method achieves accurate and robust segmentation of the myocardium across different short-axis slices (from apex to base), outperforming baseline models (e.g. 2D U-Net, 3D U-Net) while achieving higher data efficiency. Compared to the 2D U-Net, the proposed method reduces the mean Hausdorff distance (mm) from 3.24 to 2.49 on the apical slices, from 2.34 to 2.09 on the middle slices and from 3.62 to 2.76 on the basal slices on the test set, when only 10% of the training data was used. |
Tasks | Semantic Segmentation |
Published | 2019-07-23 |
URL | https://arxiv.org/abs/1907.09983v2 |
https://arxiv.org/pdf/1907.09983v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-shape-priors-for-robust-cardiac-mr |
Repo | |
Framework | |
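The Hausdorff distance quoted in the abstract can be computed for toy contours as follows; this is a naive O(n·m) sketch over 2-D point sets, and the paper's evaluation pipeline may differ in details such as units and slice averaging:

```python
def hausdorff(a, b):
    """Symmetric Hausdorff distance between two 2-D point sets: the
    largest distance from a point in one set to its nearest neighbor
    in the other."""
    def d(p, q):
        return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5
    def directed(xs, ys):
        return max(min(d(p, q) for q in ys) for p in xs)
    return max(directed(a, b), directed(b, a))

contour_pred = [(0, 0), (1, 0), (1, 1)]  # toy predicted contour
contour_gt = [(0, 0), (1, 0), (1, 2)]    # toy ground-truth contour
hd = hausdorff(contour_pred, contour_gt)
```

Lower values mean the predicted and ground-truth boundaries stay close everywhere, which is why the reductions from 3.24 to 2.49 mm (apical) etc. indicate more robust segmentation.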
CSSegNet: Fine-Grained Cardiac Structures Segmentation Using Dilated Pyramid Pooling in U-net
Title | CSSegNet: Fine-Grained Cardiac Structures Segmentation Using Dilated Pyramid Pooling in U-net |
Authors | Fei Feng, Jiajia Luo |
Abstract | Cardiac structure segmentation plays an important role in medical analysis procedures. Blurred boundaries in images often limit segmentation performance. To address this difficult problem, we present a novel network structure that embeds a dilated pyramid pooling block in the skip connections between the network’s encoding and decoding stages. A dilated pyramid pooling block is made up of convolutions and pooling operations with different vision scopes, so equipping the model with such a module endows it with multi-scale vision. Combined with other techniques, the model includes a multi-scale initial feature extraction module and a multi-resolution prediction aggregation module. For the backbone feature extraction network, we follow the basic idea of the Xception network, which benefits from separable convolutions. Evaluated on the post-2017 MICCAI-ACDC challenge phase data, our proposed model achieves state-of-the-art performance on left ventricle cavity (LVC) and right ventricle cavity (RVC) segmentation tasks. Results reveal that our method has advantages in both geometrical (Dice coefficient, Hausdorff distance) and clinical (Ejection Fraction, Volume) evaluation, indicating closer boundaries and greater statistical significance, respectively. |
Tasks | |
Published | 2019-07-02 |
URL | https://arxiv.org/abs/1907.01390v1 |
https://arxiv.org/pdf/1907.01390v1.pdf | |
PWC | https://paperswithcode.com/paper/cssegnet-fine-grained-cardiac-structures |
Repo | |
Framework | |
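The multi-scale effect of a dilated pyramid can be seen from the receptive-field arithmetic of a single dilated convolution; the dilation rates below are illustrative, not necessarily the paper's:

```python
def dilated_rf(kernel, dilation):
    """Effective receptive field of one dilated convolution:
    k_eff = k + (k - 1) * (d - 1). Dilation inserts gaps between
    kernel taps, widening the view without adding parameters."""
    return kernel + (kernel - 1) * (dilation - 1)

# a pyramid of parallel 3x3 branches with growing dilation rates:
# each branch sees the feature map at a different scale
pyramid = {d: dilated_rf(3, d) for d in (1, 2, 4, 8)}
```

Concatenating such branches in a skip connection is what gives the block its "different vision scopes": the d=1 branch keeps fine boundary detail while the d=8 branch sees a 17-pixel context at the same cost.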
Detecting dementia in Mandarin Chinese using transfer learning from a parallel corpus
Title | Detecting dementia in Mandarin Chinese using transfer learning from a parallel corpus |
Authors | Bai Li, Yi-Te Hsu, Frank Rudzicz |
Abstract | Machine learning has shown promise for automatic detection of Alzheimer’s disease (AD) through speech; however, efforts are hampered by a scarcity of data, especially in languages other than English. We propose a method to learn a correspondence between independently engineered lexicosyntactic features in two languages, using a large parallel corpus of out-of-domain movie dialogue data. We apply it to dementia detection in Mandarin Chinese, and demonstrate that our method outperforms both unilingual and machine translation-based baselines. This appears to be the first study that transfers feature domains in detecting cognitive decline. |
Tasks | Machine Translation, Transfer Learning |
Published | 2019-03-03 |
URL | https://arxiv.org/abs/1903.00933v2 |
https://arxiv.org/pdf/1903.00933v2.pdf | |
PWC | https://paperswithcode.com/paper/detecting-dementia-in-mandarin-chinese-using |
Repo | |
Framework | |
Active and Semi-Supervised Learning in ASR: Benefits on the Acoustic and Language Models
Title | Active and Semi-Supervised Learning in ASR: Benefits on the Acoustic and Language Models |
Authors | Thomas Drugman, Janne Pylkkonen, Reinhard Kneser |
Abstract | The goal of this paper is to simulate the benefits of jointly applying active learning (AL) and semi-supervised training (SST) in a new speech recognition application. Our data selection approach relies on confidence filtering, and its impact on both the acoustic and language models (AM and LM) is studied. While AL is known to be beneficial to AM training, we show that it also brings substantial improvements to the LM when combined with SST. Sophisticated confidence models, on the other hand, did not prove to yield any data selection gain. Our results indicate that, while SST is crucial at the beginning of the labeling process, its gains degrade rapidly once AL is in place. The final simulation reports that AL allows a transcription cost reduction of about 70% over random selection. Alternatively, for a fixed transcription budget, the proposed approach improves the word error rate by about 12.5% relative. |
Tasks | Active Learning, Speech Recognition |
Published | 2019-03-07 |
URL | http://arxiv.org/abs/1903.02852v1 |
http://arxiv.org/pdf/1903.02852v1.pdf | |
PWC | https://paperswithcode.com/paper/active-and-semi-supervised-learning-in-asr |
Repo | |
Framework | |
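Confidence-filtered data selection for joint AL/SST can be sketched as a simple threshold split; the thresholds and the three-way policy below are illustrative assumptions, not the paper's exact procedure:

```python
def split_by_confidence(utterances, low=0.4, high=0.9):
    """Route recognized utterances by ASR confidence: low-confidence
    ones go to human transcription (active learning), high-confidence
    ones keep their automatic hypothesis as a label (semi-supervised
    training), and the ambiguous middle band is discarded."""
    to_label, auto_label, dropped = [], [], []
    for utt, conf in utterances:
        if conf < low:
            to_label.append(utt)
        elif conf >= high:
            auto_label.append(utt)
        else:
            dropped.append(utt)
    return to_label, auto_label, dropped

batch = [("u1", 0.2), ("u2", 0.95), ("u3", 0.6), ("u4", 0.1)]
to_label, auto_label, dropped = split_by_confidence(batch)
```

Spending the human budget only on the low-confidence band is what drives the reported ~70% transcription cost reduction over random selection.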
Probabilistic Filtered Soft Labels for Domain Adaptation
Title | Probabilistic Filtered Soft Labels for Domain Adaptation |
Authors | Wei Wang, Zhihui Wang, Haojie Li, Zhengming Ding |
Abstract | Many domain adaptation (DA) methods aim to project the source and target domains into a common feature space, where the inter-domain distributional differences are reduced and some intra-domain properties preserved. Recent research obtains the respective new representations using some predefined statistics. However, such work usually formulates the class-wise statistics, such as the class-wise MMD and class scatter matrices, using pseudo hard labels, since no labeled target data are available. The probabilities of data points belonging to each class given by hard labels are either 0 or 1, while soft labels can relax this strong constraint and provide a value between them. Although existing works have noticed the advantage of soft labels, they either handle those class-wise statistics inadequately or introduce the small, irrelevant probabilities contained in soft labels. Therefore, we propose filtered soft labels to discard those confusing probabilities, and model both the class-wise MMD and the class scatter matrices in this way. In order to obtain more accurate filtered soft labels, we take advantage of a well-designed Graph-based Label Propagation (GLP) method and incorporate it into the DA procedure to formulate a unified framework. |
Tasks | Domain Adaptation |
Published | 2019-12-24 |
URL | https://arxiv.org/abs/1912.12209v1 |
https://arxiv.org/pdf/1912.12209v1.pdf | |
PWC | https://paperswithcode.com/paper/probabilistic-filtered-soft-labels-for-domain |
Repo | |
Framework | |
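The core filtering step for soft labels can be sketched as follows; the threshold is illustrative, and the paper couples this with GLP and the class-wise statistics rather than using it standalone:

```python
def filter_soft_label(probs, threshold=0.1):
    """Discard small, confusing class probabilities from a soft label
    and renormalize the surviving ones, keeping a distribution that is
    softer than a hard label but free of irrelevant mass."""
    kept = {c: p for c, p in probs.items() if p >= threshold}
    total = sum(kept.values())
    return {c: p / total for c, p in kept.items()}

soft = {"dog": 0.62, "wolf": 0.30, "cat": 0.05, "car": 0.03}
filtered = filter_soft_label(soft)
```

The result retains the graded uncertainty between plausible classes ("dog" vs. "wolf") while removing the tiny probabilities that would otherwise contaminate class-wise MMD and scatter matrices.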
The Past and the Present of the Color Checker Dataset Misuse
Title | The Past and the Present of the Color Checker Dataset Misuse |
Authors | Nikola Banić, Karlo Koščević, Marko Subašić, Sven Lončarić |
Abstract | The pipelines of digital cameras contain a part for computational color constancy, which aims to remove the influence of the illumination on the scene colors. One of the best known and most widely used benchmark datasets for this problem is the Color Checker dataset. However, due to the improper handling of the black level in its images, this dataset has been widely misused and while some recent publications tried to alleviate the problem, they nevertheless erred and created additional wrong data. This paper gives a history of the Color Checker dataset usage, it describes the origins and reasons for its misuses, and it explains the old and new mistakes introduced in the most recent publications that tried to handle the issue. This should, hopefully, help to prevent similar future misuses. |
Tasks | Color Constancy |
Published | 2019-03-11 |
URL | http://arxiv.org/abs/1903.04473v1 |
http://arxiv.org/pdf/1903.04473v1.pdf | |
PWC | https://paperswithcode.com/paper/the-past-and-the-present-of-the-color-checker |
Repo | |
Framework | |
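Proper black-level handling, whose omission caused the misuse described above, amounts to subtracting the sensor's black offset and clipping at zero before any color constancy computation; 129 is the offset commonly cited for the Canon 5D images in this dataset, but treat the value and this one-liner as an illustrative assumption rather than the full correction procedure:

```python
def subtract_black_level(raw, black_level=129):
    """Remove the sensor's black-level offset from raw pixel values,
    clipping at zero. Skipping this step leaves a constant additive
    bias that distorts illumination estimates."""
    return [max(v - black_level, 0) for v in raw]

pixels = [129, 200, 4095, 50]
corrected = subtract_black_level(pixels)
```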
An End-to-End Joint Unsupervised Learning of Deep Model and Pseudo-Classes for Remote Sensing Scene Representation
Title | An End-to-End Joint Unsupervised Learning of Deep Model and Pseudo-Classes for Remote Sensing Scene Representation |
Authors | Zhiqiang Gong, Ping Zhong, Weidong Hu, Fang Liu, Bingwei Hui |
Abstract | This work develops a novel end-to-end deep unsupervised learning method based on a convolutional neural network (CNN) with pseudo-classes for remote sensing scene representation. First, we introduce center points as the centers of the pseudo-classes, so that the training samples can be allocated pseudo labels based on the center points. Therefore, the CNN model, which is used to extract features from the scenes, can be trained in a supervised manner with the pseudo labels. Moreover, a pseudo-center loss is developed to decrease the variance between the samples and the corresponding pseudo center point. The pseudo-center loss is important since it simultaneously updates both the center points with the training samples and the CNN model with the center points during training. Finally, joint learning of the pseudo-center loss and the pseudo softmax loss, which is formulated with the samples and the pseudo labels, is developed for unsupervised remote sensing scene representation to obtain discriminative representations of the scenes. Experiments are conducted over two commonly used remote sensing scene datasets to validate the effectiveness of the proposed method, and the experimental results show its superiority when compared with other state-of-the-art methods. |
Tasks | |
Published | 2019-03-18 |
URL | http://arxiv.org/abs/1903.07224v1 |
http://arxiv.org/pdf/1903.07224v1.pdf | |
PWC | https://paperswithcode.com/paper/an-end-to-end-joint-unsupervised-learning-of |
Repo | |
Framework | |
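Pseudo-label assignment and the pseudo-center loss can be sketched in a few lines; this is a simplified, non-differentiable toy, whereas the paper optimizes the loss jointly with the CNN and updates the centers during training:

```python
def assign_pseudo_labels(features, centers):
    """Give each sample the pseudo label of its nearest center point."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(centers)), key=lambda k: dist2(f, centers[k]))
            for f in features]

def pseudo_center_loss(features, centers, labels):
    """Mean squared distance between samples and their assigned pseudo
    centers; minimizing it pulls features toward their centers."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return sum(dist2(f, centers[l])
               for f, l in zip(features, labels)) / len(features)

feats = [(0.1, 0.0), (0.9, 1.0), (0.0, 0.2)]  # toy CNN features
centers = [(0.0, 0.0), (1.0, 1.0)]            # toy pseudo-class centers
labels = assign_pseudo_labels(feats, centers)
loss = pseudo_center_loss(feats, centers, labels)
```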
Learning Programmatic Idioms for Scalable Semantic Parsing
Title | Learning Programmatic Idioms for Scalable Semantic Parsing |
Authors | Srinivasan Iyer, Alvin Cheung, Luke Zettlemoyer |
Abstract | Programmers typically organize executable source code using high-level coding patterns or idiomatic structures such as nested loops, exception handlers and recursive blocks, rather than as individual code tokens. In contrast, state of the art (SOTA) semantic parsers still map natural language instructions to source code by building the code syntax tree one node at a time. In this paper, we introduce an iterative method to extract code idioms from large source code corpora by repeatedly collapsing most-frequent depth-2 subtrees of their syntax trees, and train semantic parsers to apply these idioms during decoding. Applying idiom-based decoding on a recent context-dependent semantic parsing task improves the SOTA by 2.2% BLEU score while reducing training time by more than 50%. This improved speed enables us to scale up the model by training on an extended training set that is 5$\times$ larger, to further move up the SOTA by an additional 2.3% BLEU and 0.9% exact match. Finally, idioms also significantly improve accuracy of semantic parsing to SQL on the ATIS-SQL dataset, when training data is limited. |
Tasks | Code Generation, Semantic Parsing |
Published | 2019-04-19 |
URL | https://arxiv.org/abs/1904.09086v2 |
https://arxiv.org/pdf/1904.09086v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-programmatic-idioms-for-scalable |
Repo | |
Framework | |
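The idiom-extraction step, counting depth-2 subtrees so the most frequent one can be collapsed into a single unit, can be sketched over toy syntax trees; the tuple tree encoding and node labels below are hypothetical:

```python
from collections import Counter

def depth2_subtrees(tree, out):
    """Count the (parent-label, child-labels) signature of every
    depth-2 subtree. Trees are nested tuples: (label, child, ...)."""
    label, *kids = tree
    if kids:
        out[(label, tuple(k[0] for k in kids))] += 1
        for k in kids:
            depth2_subtrees(k, out)

# two toy syntax trees sharing an "if -> (cond, block)" pattern
trees = [("if", ("cond",), ("block", ("stmt",))),
         ("if", ("cond",), ("block", ("stmt",), ("stmt",)))]
counts = Counter()
for t in trees:
    depth2_subtrees(t, counts)
most_frequent_idiom = counts.most_common(1)[0]
```

Repeatedly collapsing the winning signature into a single nonterminal is what lets the parser emit a whole idiom in one decoding step instead of one node at a time.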
S4G: Amodal Single-view Single-Shot SE(3) Grasp Detection in Cluttered Scenes
Title | S4G: Amodal Single-view Single-Shot SE(3) Grasp Detection in Cluttered Scenes |
Authors | Yuzhe Qin, Rui Chen, Hao Zhu, Meng Song, Jing Xu, Hao Su |
Abstract | Grasping is among the most fundamental and long-standing problems in robotics research. This paper studies the problem of 6-DoF (degree-of-freedom) grasping by a parallel gripper in a cluttered scene captured by a commodity depth sensor from a single viewpoint. We address the problem in a learning-based framework. At the high level, we rely on a single-shot grasp proposal network, trained with synthetic data and tested in real-world scenarios. Our single-shot neural network architecture can predict amodal grasp proposals efficiently and effectively. Our training data synthesis pipeline can generate scenes of complex object configurations and leverages an innovative gripper contact model to create dense and high-quality grasp annotations. Experiments in synthetic and real environments demonstrate that the proposed approach outperforms the state of the art by a large margin. |
Tasks | |
Published | 2019-10-31 |
URL | https://arxiv.org/abs/1910.14218v1 |
https://arxiv.org/pdf/1910.14218v1.pdf | |
PWC | https://paperswithcode.com/paper/s4g-amodal-single-view-single-shot-se3-grasp |
Repo | |
Framework | |
Super Resolution Convolutional Neural Network Models for Enhancing Resolution of Rock Micro-CT Images
Title | Super Resolution Convolutional Neural Network Models for Enhancing Resolution of Rock Micro-CT Images |
Authors | Ying Da Wang, Ryan Armstrong, Peyman Mostaghimi |
Abstract | Single Image Super Resolution (SISR) techniques based on Super Resolution Convolutional Neural Networks (SRCNN) are applied to micro-computed tomography (μCT) images of sandstone and carbonate rocks. Digital rock imaging is limited by the capability of the scanning device, resulting in trade-offs between resolution and field of view, and the super resolution methods tested in this study aim to compensate for these limits. The SRCNN models SR-Resnet, Enhanced Deep SR (EDSR), and Wide-Activation Deep SR (WDSR) are used on the Digital Rock Super Resolution 1 (DRSRD1) Dataset of 4x downsampled images, comprising 2,000 high-resolution (800x800) raw micro-CT images of Bentheimer sandstone and Estaillades carbonate. The trained models are applied to the validation and test data within the dataset and show a 3-5 dB rise in image quality compared to bicubic interpolation, with all tested models performing within a 0.1 dB range. Difference maps indicate that edge sharpness is completely recovered in images within the scope of the trained model, with only high-frequency, noise-related detail loss. We find that aside from the generation of high-resolution images, a beneficial side effect of super resolution methods applied to synthetically downgraded images is the removal of image noise while recovering edge sharpness, which is beneficial for the segmentation process. The model is also tested against real low-resolution images of Bentheimer rock with image augmentation to account for natural noise and blur. The SRCNN method is shown to act as a preconditioner for image segmentation under these circumstances, which naturally leads to future development and training of models that segment an image directly. Image restoration by SRCNN on the rock images is of significantly higher quality than with traditional methods and suggests SRCNN methods are a viable processing step in a digital rock workflow. |
Tasks | Image Augmentation, Image Restoration, Image Super-Resolution, Semantic Segmentation, Super-Resolution |
Published | 2019-04-16 |
URL | http://arxiv.org/abs/1904.07470v1 |
http://arxiv.org/pdf/1904.07470v1.pdf | |
PWC | https://paperswithcode.com/paper/super-resolution-convolutional-neural-network |
Repo | |
Framework | |
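The dB comparisons above refer to PSNR, which can be computed directly; the toy 1-D signals below are hypothetical stand-ins for the 800x800 images the paper evaluates on:

```python
import math

def psnr(reference, restored, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference signal and
    a restored one; higher is better, and a 3-5 dB gap is the margin
    the abstract reports over bicubic interpolation."""
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    return float("inf") if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)

ref = [100, 120, 140, 160]
bicubic = [90, 110, 150, 170]   # toy, coarser reconstruction
srcnn = [98, 118, 143, 163]     # toy, closer reconstruction
gain = psnr(ref, srcnn) - psnr(ref, bicubic)
```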
Enabling Machine Learning Across Heterogeneous Sensor Networks with Graph Autoencoders
Title | Enabling Machine Learning Across Heterogeneous Sensor Networks with Graph Autoencoders |
Authors | Johan Medrano, Fuchun Joseph Lin |
Abstract | Machine Learning (ML) has been applied to enable many life-assisting applications, such as abnormality detection and emergency requests for the solitary elderly. However, in most cases machine learning algorithms depend on the layout of the target Internet of Things (IoT) sensor network. Hence, to deploy an application across Heterogeneous Sensor Networks (HSNs), i.e. sensor networks with different sensor types or layouts, the process of data collection and ML algorithm training must be repeated. In this paper, we introduce a novel framework leveraging deep learning for graphs to enable using the same activity recognition system across HSNs deployed in different smart homes. Using our framework, we were able to transfer activity classifiers trained with activity labels on a source HSN to a target HSN, reaching about 75% of the baseline accuracy on the target HSN without using target activity labels. Moreover, our model can quickly adapt to unseen sensor layouts, which makes it highly suitable for the gradual deployment of real-world ML-based applications. In addition, we show that our framework is resilient to suboptimal graph representations of HSNs. |
Tasks | Activity Recognition, Anomaly Detection |
Published | 2019-12-12 |
URL | https://arxiv.org/abs/1912.05879v1 |
https://arxiv.org/pdf/1912.05879v1.pdf | |
PWC | https://paperswithcode.com/paper/enabling-machine-learning-across |
Repo | |
Framework | |
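A minimal flavor of graph-based feature sharing across sensor layouts: one mean-aggregation step over a sensor graph. This is a toy stand-in, since the paper uses graph autoencoders rather than this exact update, and the graph and readings are hypothetical:

```python
def aggregate(features, adjacency):
    """One mean-aggregation step over a sensor graph: each node's new
    feature averages its own reading with its neighbors'. Because the
    update is defined by graph structure rather than sensor indices,
    it applies unchanged to networks with different layouts."""
    new = []
    for i, f in enumerate(features):
        neigh = [features[j] for j in adjacency[i]] + [f]
        new.append(sum(neigh) / len(neigh))
    return new

# toy HSN: three sensors in a line
feats = [1.0, 2.0, 9.0]
adj = {0: [1], 1: [0, 2], 2: [1]}
smoothed = aggregate(feats, adj)
```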