January 28, 2020

3123 words 15 mins read

Paper Group ANR 795



Two-layer Near-lossless HDR Coding with Backward Compatibility to JPEG

Title Two-layer Near-lossless HDR Coding with Backward Compatibility to JPEG
Authors Hiroyuki Kobayashi, Osamu Watanabe, Hitoshi Kiya
Abstract We propose an efficient two-layer near-lossless coding method using an extended histogram-packing technique with backward compatibility to the legacy JPEG standard. JPEG XT, the international standard for compressing HDR images, adopts a two-layer coding method for backward compatibility to the legacy JPEG standard. However, this two-layer coding method has two problems. One is that it does not exhibit better near-lossless performance than single-layer methods for HDR image compression. The other is that appropriate coding-parameter values may have to be determined for each input image to achieve good near-lossless compression performance. To solve these problems, we focus on a histogram-packing technique that takes into account the histogram sparseness of HDR images, and we use zero-skip quantization, an extension of the histogram-packing technique proposed for lossless coding, to implement the proposed near-lossless coding method. The experimental results indicate that the proposed method not only exhibits better near-lossless compression performance than the two-layer coding method of JPEG XT, but also raises no issues regarding the combination of parameter values, all without losing backward compatibility to the JPEG standard.
Tasks Image Compression, Quantization
Published 2019-05-09
URL https://arxiv.org/abs/1905.04129v1
PDF https://arxiv.org/pdf/1905.04129v1.pdf
PWC https://paperswithcode.com/paper/two-layer-near-lossless-hdr-coding-with
Repo
Framework
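
To make the histogram-packing idea concrete, here is a minimal sketch of packing and unpacking a sparse-histogram image in Python. It shows only the lossless remapping step that the paper extends; zero-skip quantization itself and the JPEG XT layering are not shown, and all function names are our own.

```python
import numpy as np

def pack_histogram(img):
    """Map the sparse set of values actually present in `img` to
    consecutive integers, so the packed image compresses better.
    Returns the packed image and the lookup table for inversion."""
    values = np.unique(img)                     # sorted distinct values
    lut = {v: i for i, v in enumerate(values)}
    packed = np.vectorize(lut.get)(img)
    return packed.astype(np.int32), values

def unpack_histogram(packed, values):
    """Invert pack_histogram: index back into the original values."""
    return values[packed]

# toy HDR-like image with a sparse histogram
img = np.random.choice([0, 7, 300, 301, 65535], size=(4, 4))
packed, values = pack_histogram(img)
assert np.array_equal(unpack_histogram(packed, values), img)
```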

Perturbation Validation: A New Heuristic to Validate Machine Learning Models

Title Perturbation Validation: A New Heuristic to Validate Machine Learning Models
Authors Jie M. Zhang, Mark Harman, Benjamin Guedj, Earl T. Barr, John Shawe-Taylor
Abstract This paper introduces Perturbation Validation (PV), a new heuristic to validate machine learning models. PV does not rely on test data. Instead, it perturbs training data labels, re-trains the model against the perturbed data, then uses the consequent training accuracy decrease rate to assess model fit. PV also differs from traditional statistical approaches, which make judgements without considering label distribution. We evaluate PV on 10 real-world datasets and 6 synthetic datasets. Our results demonstrate that PV is more discriminating about model fit than existing validation approaches and it accords well with widely-held intuitions concerning the properties of a good model fit measurement. We also show that PV complements existing validation approaches, allowing us to give explanations for some of the issues present in the recently-debated “apparent paradox” that high capacity (potentially “overfitted”) models may, nevertheless, exhibit good generalisation ability.
Tasks
Published 2019-05-24
URL https://arxiv.org/abs/1905.10201v3
PDF https://arxiv.org/pdf/1905.10201v3.pdf
PWC https://paperswithcode.com/paper/perturbed-model-validation-a-new-framework-to
Repo
Framework
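
A rough sketch of the PV loop described above, assuming binary labels and using scikit-learn: flip a fraction of training labels, retrain, and track how training accuracy falls with the noise rate. The paper's exact fit metric may be defined differently; this only illustrates the mechanism.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

def pv_slope(make_model, X, y, rates=(0.0, 0.1, 0.2, 0.3), seed=0):
    """Perturb a fraction of training labels, retrain, and record the
    training accuracy at each noise rate. A well-fitted model's training
    accuracy should fall roughly in step with the injected noise."""
    rng = np.random.default_rng(seed)
    accs = []
    for r in rates:
        y_pert = y.copy()
        flip = rng.random(len(y)) < r
        y_pert[flip] = 1 - y_pert[flip]      # binary labels assumed
        model = make_model().fit(X, y_pert)
        accs.append(model.score(X, y_pert))  # training accuracy
    # slope of training accuracy vs. perturbation rate
    return np.polyfit(rates, accs, 1)[0]

X, y = make_classification(n_samples=500, random_state=0)
print(pv_slope(lambda: LogisticRegression(max_iter=1000), X, y))
```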

CNN based Extraction of Panels/Characters from Bengali Comic Book Page Images

Title CNN based Extraction of Panels/Characters from Bengali Comic Book Page Images
Authors Arpita Dutta, Samit Biswas
Abstract People nowadays prefer to use digital gadgets like cameras or mobile phones for capturing documents. Automatic extraction of panels/characters from comic document images is challenging due to the wide variety of drawing styles adopted by writers; it is also beneficial for readers who want to read comics on mobile devices at any time, and useful for automatic digitization. Most methods for panel/character localization rely on connected component analysis or page background masks and are applicable only to limited comic datasets. This work proposes a panel/character localization architecture based on the features of YOLO and CNN for extracting both panels and characters from comic book page images. The method achieved remarkable results on the Bengali Comic Book Image dataset (BCBId), developed by us and consisting of a total of 4130 images, as well as on a variety of publicly available comic datasets in other languages, i.e. eBDtheque, Manga 109 and the DCM dataset.
Tasks
Published 2019-10-21
URL https://arxiv.org/abs/1910.09233v1
PDF https://arxiv.org/pdf/1910.09233v1.pdf
PWC https://paperswithcode.com/paper/cnn-based-extraction-of-panelscharacters-from
Repo
Framework

PointNLM: Point Nonlocal-Means for vegetation segmentation based on middle echo point clouds

Title PointNLM: Point Nonlocal-Means for vegetation segmentation based on middle echo point clouds
Authors Jonathan Li, Rongren Wu, Yiping Chen, Qing Zhu, Zhipeng Luo, Cheng Wang
Abstract Middle-echo point clouds, each covering one or a few corresponding points, are a specific type of 3D point cloud acquired by a multi-echo laser scanner. In this paper, we propose a novel approach for automatic segmentation of trees that leverages middle-echo information from LiDAR point clouds. First, using a convolution classification method, the points reflected by the middle echoes are identified among all points and distinguished from the first and last echoes. Hence, the crown positions of the trees are quickly detected from the huge number of points. Second, to accurately extract trees from all point clouds, we propose a 3D deep learning network, PointNLM, to semantically segment tree crowns. PointNLM captures the long-range relationship between points via a non-local branch and extracts high-level features via max-pooling applied to unordered points. The whole framework is evaluated on the Semantic3D reduced test set, where the IoU of tree point cloud segmentation reached 0.864. In addition, the semantic segmentation network was tested on the Paris-Lille-3D dataset, where the average IoU outperformed several other popular methods. The experimental results indicate that the proposed algorithm provides an excellent solution for vegetation segmentation from LiDAR point clouds.
Tasks Semantic Segmentation
Published 2019-06-20
URL https://arxiv.org/abs/1906.08476v2
PDF https://arxiv.org/pdf/1906.08476v2.pdf
PWC https://paperswithcode.com/paper/pointnlm-point-nonlocal-means-for-vegetation
Repo
Framework
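
The non-local branch can be pictured as self-attention over point features. Below is a minimal PyTorch sketch of such a block; the layer names and dimensions are our assumptions, not the paper's actual PointNLM architecture.

```python
import torch
import torch.nn as nn

class PointNonLocal(nn.Module):
    """Sketch of a non-local (self-attention) block over a point set:
    every point's feature is updated with a weighted sum of all other
    points' features, capturing long-range relationships."""
    def __init__(self, dim, dim_inner=32):
        super().__init__()
        self.theta = nn.Linear(dim, dim_inner)   # query embedding
        self.phi = nn.Linear(dim, dim_inner)     # key embedding
        self.g = nn.Linear(dim, dim_inner)       # value embedding
        self.out = nn.Linear(dim_inner, dim)

    def forward(self, x):                        # x: (batch, n_points, dim)
        attn = torch.softmax(
            self.theta(x) @ self.phi(x).transpose(1, 2), dim=-1)
        return x + self.out(attn @ self.g(x))    # residual update

feats = torch.randn(2, 1024, 64)                 # 1024 points, 64-d features
print(PointNonLocal(64)(feats).shape)            # torch.Size([2, 1024, 64])
```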

The Effect of Visual Design in Image Classification

Title The Effect of Visual Design in Image Classification
Authors Naftali Cohen, Tucker Balch, Manuela Veloso
Abstract Financial companies continuously analyze the state of the markets to rethink and adjust their investment strategies. While the analysis is done on the digital form of the data, decisions are often made based on graphical representations in white papers or presentation slides. In this study, we examine whether binary decisions are better made based on the numeric or the visual representation of the same data. Using two datasets, a matrix of numerical data with spatial dependencies and financial data describing the state of the S&P index, we compare the results of supervised classification based on the original numerical representation and on a visual transformation of the same data. We show that, for these datasets, the visual transformation results in higher predictive skill compared to the original form of the data. We suggest thinking of the visual representation of numeric data, effectively, as a combination of dimensionality reduction and feature engineering, in particular when the visual layout encapsulates the full complexity of the data. In this view, thoughtful visual design can guard against overfitting or introduce new features, all of which benefit the learning process and effectively lead to better recognition of meaningful patterns.
Tasks Feature Engineering, Image Classification
Published 2019-07-22
URL https://arxiv.org/abs/1907.09567v2
PDF https://arxiv.org/pdf/1907.09567v2.pdf
PWC https://paperswithcode.com/paper/the-effect-of-visual-design-in-image
Repo
Framework
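
As a toy illustration of the "visual transformation" idea, the snippet below renders a numeric series as a small grayscale chart image that a CNN could consume, using matplotlib's off-screen Agg backend. The rendering choices (figure size, line style) are ours, not the paper's.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")                     # render off-screen
import matplotlib.pyplot as plt

def to_image(series, size=64):
    """Render a 1-D numeric series as a small grayscale line-chart
    image -- the 'visual transformation' a CNN would then consume."""
    fig, ax = plt.subplots(figsize=(1, 1), dpi=size)
    ax.plot(series, color="black", linewidth=1)
    ax.axis("off")
    fig.canvas.draw()
    buf = np.asarray(fig.canvas.buffer_rgba())[..., :3]
    plt.close(fig)
    return buf.mean(axis=-1) / 255.0      # (size, size) grayscale

img = to_image(np.cumsum(np.random.randn(100)))
print(img.shape)                           # (64, 64)
```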

Incorporating Task-Specific Structural Knowledge into CNNs for Brain Midline Shift Detection

Title Incorporating Task-Specific Structural Knowledge into CNNs for Brain Midline Shift Detection
Authors Maxim Pisov, Mikhail Goncharov, Nadezhda Kurochkina, Sergey Morozov, Victor Gombolevskiy, Valeria Chernina, Anton Vladzymyrskyy, Ksenia Zamyatina, Anna Chesnokova, Igor Pronin, Michael Shifrin, Mikhail Belyaev
Abstract Midline shift (MLS) is a well-established factor used for outcome prediction in traumatic brain injury, stroke and brain tumors. The importance of automatic estimation of MLS was recently highlighted by the ACR Data Science Institute. In this paper, we introduce a novel deep learning based approach to the problem of MLS detection, which exploits task-specific structural knowledge. We evaluate our method on a large dataset containing heterogeneous images with significant MLS and show that its mean error approaches the inter-expert variability. Finally, we show the robustness of our approach by validating it on an external dataset acquired during routine clinical practice.
Tasks
Published 2019-08-13
URL https://arxiv.org/abs/1908.04568v3
PDF https://arxiv.org/pdf/1908.04568v3.pdf
PWC https://paperswithcode.com/paper/incorporating-task-specific-structural
Repo
Framework
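
The abstract does not detail the architecture, but the quantity being estimated is standard: MLS is commonly read as the maximum deviation of the (possibly deformed) midline from the ideal straight midline. A sketch of that final step, with a hypothetical predicted midline curve:

```python
import numpy as np

def midline_shift(pred_midline_x, ideal_x, mm_per_pixel=0.5):
    """Given the predicted midline x-coordinate for each image row and
    the ideal (straight) midline, report the maximum deviation in
    millimetres -- the usual clinical reading of MLS."""
    return np.max(np.abs(pred_midline_x - ideal_x)) * mm_per_pixel

rows = np.arange(200)
ideal = np.full_like(rows, 128, dtype=float)
pred = ideal + 8 * np.exp(-((rows - 100) / 30.0) ** 2)   # simulated bulge
print(f"MLS = {midline_shift(pred, ideal):.1f} mm")       # ~4.0 mm
```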

Successor Options: An Option Discovery Framework for Reinforcement Learning

Title Successor Options: An Option Discovery Framework for Reinforcement Learning
Authors Rahul Ramesh, Manan Tomar, Balaraman Ravindran
Abstract The options framework in reinforcement learning models the notion of a skill or a temporally extended sequence of actions. The discovery of a reusable set of skills has typically entailed building options that navigate to bottleneck states. This work adopts a complementary approach, where we attempt to discover options that navigate to landmark states. These states are prototypical representatives of well-connected regions and can hence access the associated region with relative ease. In this work, we propose Successor Options, which leverages Successor Representations to build a model of the state space. The intra-option policies are learnt using a novel pseudo-reward, and the model scales to high-dimensional spaces easily. Additionally, we propose an Incremental Successor Options model that iterates between constructing Successor Representations and building options, which is useful when robust Successor Representations cannot be built solely from primitive actions. We demonstrate the efficacy of our approach on a collection of grid-worlds, and on the high-dimensional robotic control environment of Fetch.
Tasks
Published 2019-05-14
URL https://arxiv.org/abs/1905.05731v1
PDF https://arxiv.org/pdf/1905.05731v1.pdf
PWC https://paperswithcode.com/paper/successor-options-an-option-discovery
Repo
Framework
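
A minimal sketch of the two ingredients the abstract names: TD-learning a successor representation from transitions, then clustering SRs so cluster-representative states can serve as landmark sub-goals. The ring-world data and the choice of k-means are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
from sklearn.cluster import KMeans

def successor_representation(transitions, n_states, gamma=0.95, lr=0.1):
    """TD-learn psi(s)[s'] = expected discounted future occupancy of s'
    starting from s, under the policy that generated `transitions`."""
    psi = np.zeros((n_states, n_states))
    for s, s_next in transitions:
        onehot = np.eye(n_states)[s]
        psi[s] += lr * (onehot + gamma * psi[s_next] - psi[s])
    return psi

# random walk on a ring of 10 states
rng = np.random.default_rng(0)
s, transitions = 0, []
for _ in range(5000):
    s_next = (s + rng.choice([-1, 1])) % 10
    transitions.append((s, s_next))
    s = s_next

psi = successor_representation(transitions, 10)
# landmark discovery: cluster the SR rows; each cluster's representative
# state would become the target of one option
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit(psi)
print(clusters.labels_)
```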

EXPERTNet Exigent Features Preservative Network for Facial Expression Recognition

Title EXPERTNet Exigent Features Preservative Network for Facial Expression Recognition
Authors Monu Verma, Jaspreet Kaur Bhui, Santosh Vipparthi, Girdhari Singh
Abstract Facial expressions carry essential cues for inferring a human's state of mind and convey adequate information for understanding an individual's actual feelings. Thus, automatic facial expression recognition is an interesting and crucial task for interpreting a human's cognitive state through a machine. In this paper, we propose an Exigent Features Preservative Network (EXPERTNet) to describe the features of facial expressions. EXPERTNet extracts only pertinent features and neglects others by using an exigent feature (ExFeat) block, which mainly comprises an elective layer. Specifically, the elective layer selects the desired edge-variation features from the previous layer's outcomes, which are generated by applying filters of different sizes: 1 x 1, 3 x 3, 5 x 5 and 7 x 7. The different filter sizes help elicit both micro- and high-level features, which enhances the learnability of the neurons. The ExFeat block preserves the spatial structural information of the facial expression, which allows discrimination between different classes of facial expressions. Visual representations of the proposed method over different facial expressions show the learning capability of the neurons in different layers. Experimental and comparative analysis over four comprehensive datasets, CK+, MMI, DISFA and GEMEP-FERA, confirms the better performance of the proposed network compared to existing networks.
Tasks Facial Expression Recognition
Published 2019-04-14
URL http://arxiv.org/abs/1904.06658v1
PDF http://arxiv.org/pdf/1904.06658v1.pdf
PWC https://paperswithcode.com/paper/expertnet-exigent-features-preservative
Repo
Framework
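
A sketch of how an ExFeat-style block might look in PyTorch: parallel 1x1/3x3/5x5/7x7 convolutions followed by an "elective" selection. We implement the elective layer as an element-wise max over branches, which is our assumption; the paper's exact selection rule may differ.

```python
import torch
import torch.nn as nn

class ExFeatBlock(nn.Module):
    """Parallel multi-scale convolutions elicit micro- and high-level
    edge variations; an 'elective' step keeps only the strongest
    response per position (element-wise max, assumed here)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, k, padding=k // 2)
            for k in (1, 3, 5, 7))

    def forward(self, x):
        responses = torch.stack([b(x) for b in self.branches])
        return responses.max(dim=0).values       # elective selection

x = torch.randn(1, 3, 48, 48)
print(ExFeatBlock(3, 16)(x).shape)                # torch.Size([1, 16, 48, 48])
```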

Self-Supervised Physics-Based Deep Learning MRI Reconstruction Without Fully-Sampled Data

Title Self-Supervised Physics-Based Deep Learning MRI Reconstruction Without Fully-Sampled Data
Authors Burhaneddin Yaman, Seyed Amir Hossein Hosseini, Steen Moeller, Jutta Ellermann, Kâmil Uǧurbil, Mehmet Akçakaya
Abstract Deep learning (DL) has emerged as a tool for improving accelerated MRI reconstruction. A common strategy among DL methods is the physics-based approach, where a regularized iterative algorithm alternating between data consistency and a regularizer is unrolled for a finite number of iterations. This unrolled network is then trained end-to-end in a supervised manner, using fully-sampled data as ground truth for the network output. However, in a number of scenarios, it is difficult to obtain fully-sampled datasets, due to physiological constraints such as organ motion or physical constraints such as signal decay. In this work, we tackle this issue and propose a self-supervised learning strategy that enables physics-based DL reconstruction without fully-sampled data. Our approach is to divide the acquired sub-sampled points for each scan into training and validation subsets. During training, data consistency is enforced over the training subset, while the validation subset is used to define the loss function. Results show that the proposed self-supervised learning method successfully reconstructs images without fully-sampled data, performing similarly to the supervised approach trained with fully-sampled references. This has implications for physics-based inverse problem approaches in other settings where fully-sampled data is unavailable or impossible to acquire.
Tasks
Published 2019-10-21
URL https://arxiv.org/abs/1910.09116v1
PDF https://arxiv.org/pdf/1910.09116v1.pdf
PWC https://paperswithcode.com/paper/self-supervised-physics-based-deep-learning
Repo
Framework
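
The core training trick is easy to state in code: split the acquired k-space locations of each scan into a data-consistency subset (used inside the unrolled network) and a loss subset (used only to define the training loss). A sketch, with the hold-out fraction rho as an assumed hyperparameter:

```python
import numpy as np

def split_kspace_mask(sampling_mask, rho=0.4, seed=0):
    """Split acquired k-space locations into a data-consistency mask
    and a disjoint loss mask, enabling self-supervised training
    without fully sampled references. `rho` is the fraction of
    acquired samples held out for the loss."""
    rng = np.random.default_rng(seed)
    acquired = np.flatnonzero(sampling_mask)
    loss_idx = rng.choice(acquired, int(rho * acquired.size),
                          replace=False)
    loss_mask = np.zeros_like(sampling_mask)
    loss_mask.flat[loss_idx] = 1
    dc_mask = sampling_mask - loss_mask     # disjoint by construction
    return dc_mask, loss_mask

mask = (np.random.default_rng(1).random((256, 256)) < 0.3).astype(int)
dc, loss = split_kspace_mask(mask)
assert (dc + loss == mask).all()
```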

Neural Machine Translation with 4-Bit Precision and Beyond

Title Neural Machine Translation with 4-Bit Precision and Beyond
Authors Alham Fikri Aji, Kenneth Heafield
Abstract Neural Machine Translation (NMT) is resource intensive. We design a quantization procedure to better compress NMT models for devices with limited hardware capability. Because most neural network parameters are near zero, we employ logarithmic quantization in lieu of fixed-point quantization. However, we find bias terms are less amenable to log quantization but note they comprise a tiny fraction of the model, so we leave them uncompressed. We also propose to use an error-feedback mechanism during retraining, to preserve the compressed model as a stale gradient. We empirically show that NMT models based on the Transformer or RNN architecture can be compressed to 4-bit precision without any noticeable quality degradation. Models can be compressed to binary precision, albeit with lower quality. The RNN architecture seems to be more robust to quantization than the Transformer.
Tasks Machine Translation, Quantization
Published 2019-09-13
URL https://arxiv.org/abs/1909.06091v2
PDF https://arxiv.org/pdf/1909.06091v2.pdf
PWC https://paperswithcode.com/paper/neural-machine-translation-with-4-bit
Repo
Framework
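
A minimal sketch of logarithmic quantization: snap each weight to a signed power of two below a per-tensor scale, so the code levels concentrate near zero, where most parameters live. The retraining and error-feedback parts of the method are not shown, and the level allocation here is our assumption.

```python
import numpy as np

def log_quantize(w, bits=4):
    """Quantize weights to signed powers of two (logarithmic
    quantization). One bit is reserved for sign, leaving
    2**(bits-1) magnitude levels below the max-abs scale."""
    scale = np.abs(w).max()
    # nearest power-of-two exponent relative to the scale
    exp = np.round(np.log2(np.maximum(np.abs(w), 1e-12) / scale))
    exp = np.clip(exp, -(2 ** (bits - 1)) + 1, 0)
    return np.sign(w) * scale * 2.0 ** exp

w = np.random.randn(5).astype(np.float32) * 0.1
print(w, log_quantize(w), sep="\n")
```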

Multiple Pretext-Task for Self-Supervised Learning via Mixing Multiple Image Transformations

Title Multiple Pretext-Task for Self-Supervised Learning via Mixing Multiple Image Transformations
Authors Shin’ya Yamaguchi, Sekitoshi Kanai, Tetsuya Shioda, Shoichiro Takeda
Abstract Self-supervised learning is one of the most promising approaches to learning representations that capture semantic features in images without any manual annotation cost. To learn useful representations, a self-supervised model solves a pretext-task defined by the data itself. Among a number of pretext-tasks, the rotation prediction task (Rotation) achieves better representations for solving various target tasks despite the simplicity of its implementation. However, we found that Rotation can fail to capture semantic features related to image textures and colors. To tackle this problem, we introduce a learning technique called multiple pretext-task for self-supervised learning (MP-SSL), which solves multiple pretext-tasks in addition to Rotation simultaneously. To capture features of textures and colors, we employ image-enhancement transformations (e.g., sharpening and solarizing) as the additional pretext-tasks. MP-SSL efficiently trains a model by leveraging a Frank-Wolfe based multi-task training algorithm. Our experimental results show that MP-SSL models outperform Rotation on multiple standard benchmarks and achieve state-of-the-art performance on Places-205.
Tasks
Published 2019-12-25
URL https://arxiv.org/abs/1912.11603v1
PDF https://arxiv.org/pdf/1912.11603v1.pdf
PWC https://paperswithcode.com/paper/multiple-pretext-task-for-self-supervised
Repo
Framework
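
A sketch of how a batch for two pretext-tasks might be built: each image gets a random rotation label and a random enhancement label (solarization here, as one example of the image enhancements the paper mentions). A shared encoder would then be trained on both prediction losses, e.g. with Frank-Wolfe-derived task weights; that training loop is omitted.

```python
import torch
import torchvision.transforms.functional as TF

def pretext_batch(images):
    """Build targets for two pretext-tasks on one batch: predict the
    rotation applied (0/90/180/270 degrees) and whether an enhancement
    (solarize, as an example) was applied."""
    rot_labels = torch.randint(0, 4, (images.size(0),))
    enh_labels = torch.randint(0, 2, (images.size(0),))
    out = []
    for img, r, e in zip(images, rot_labels, enh_labels):
        img = torch.rot90(img, int(r), dims=(1, 2))   # CHW rotation
        if e:
            img = TF.solarize(img, threshold=0.5)
        out.append(img)
    return torch.stack(out), rot_labels, enh_labels

x = torch.rand(8, 3, 32, 32)
x_aug, rot_y, enh_y = pretext_batch(x)
print(x_aug.shape, rot_y.tolist(), enh_y.tolist())
```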

TextCohesion: Detecting Text for Arbitrary Shapes

Title TextCohesion: Detecting Text for Arbitrary Shapes
Authors Weijia Wu, Jici Xing, Hong Zhou
Abstract In this paper, we propose a pixel-wise method named TextCohesion for scene text detection, which splits a text instance into five key components: a text skeleton and four directional pixel regions. These components are easier to handle than the entire text instance. A confidence scoring mechanism is designed to filter out characters that are similar to text. Our method can integrate text contexts intensively when backgrounds are complex. Experiments on two challenging curved-text benchmarks demonstrate that TextCohesion outperforms state-of-the-art methods, achieving an F-measure of 84.6% on Total-Text and 86.3% on SCUT-CTW1500.
Tasks Curved Text Detection, Scene Text Detection
Published 2019-04-22
URL http://arxiv.org/abs/1904.12640v2
PDF http://arxiv.org/pdf/1904.12640v2.pdf
PWC https://paperswithcode.com/paper/190412640
Repo
Framework

Deep Gradient Boosting – Layer-wise Input Normalization of Neural Networks

Title Deep Gradient Boosting – Layer-wise Input Normalization of Neural Networks
Authors Erhan Bilal
Abstract Stochastic gradient descent (SGD) has been the dominant optimization method for training deep neural networks due to its many desirable properties. One of the more remarkable and least understood qualities of SGD is that it generalizes relatively well on unseen data even when the neural network has millions of parameters. We hypothesize that in certain cases it is desirable to relax its intrinsic generalization properties, and we introduce an extension of SGD called deep gradient boosting (DGB). The key idea of DGB is that back-propagated gradients inferred using the chain rule can be viewed as pseudo-residual targets of a gradient boosting problem. Thus, at each layer of a neural network, the weight update is calculated by solving the corresponding boosting problem using a linear base learner. The resulting weight update formula can also be viewed as a normalization procedure for the data that arrives at each layer during the forward pass. When implemented as a separate input normalization layer (INN), the new architecture shows improved performance on image recognition tasks compared to the same architecture without normalization layers. As opposed to batch normalization (BN), INN has no learnable parameters; however, it matches BN's performance on CIFAR10 and ImageNet classification tasks.
Tasks
Published 2019-07-29
URL https://arxiv.org/abs/1907.12608v2
PDF https://arxiv.org/pdf/1907.12608v2.pdf
PWC https://paperswithcode.com/paper/deep-gradient-boosting
Repo
Framework
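
The key step can be sketched directly from the abstract: treat the backpropagated gradient at a layer as pseudo-residuals and fit them from the layer's input with a linear base learner, which amounts to a least-squares solve. The ridge term lam is our addition for numerical stability, and the paper's exact update may differ.

```python
import numpy as np

def dgb_update(X, G, lam=1e-3):
    """One layer's deep-gradient-boosting step: fit the backpropagated
    gradient G (batch x d_out), viewed as pseudo-residuals, from the
    layer input X (batch x d_in) with a ridge-regularized linear base
    learner. The solution doubles as a normalization of X, which
    motivates the INN layer."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ G)

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 20))       # layer input activations
G = rng.normal(size=(64, 10))       # gradient w.r.t. layer output
delta_W = dgb_update(X, G)
print(delta_W.shape)                 # (20, 10)
```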

An Efficient Explorative Sampling Considering the Generative Boundaries of Deep Generative Neural Networks

Title An Efficient Explorative Sampling Considering the Generative Boundaries of Deep Generative Neural Networks
Authors Giyoung Jeon, Haedong Jeong, Jaesik Choi
Abstract Deep generative neural networks (DGNNs) have achieved realistic and high-quality data generation. In particular, the adversarial training scheme has been applied to many DGNNs and has exhibited powerful performance. Despite recent advances in generative networks, identifying the image generation mechanism still remains challenging. In this paper, we present an explorative sampling algorithm to analyze the generation mechanism of DGNNs. Our method efficiently obtains samples with identical attributes to a query image from the perspective of the trained model. We define generative boundaries, which determine the activation of nodes in an internal layer, and probe inside the model with this information. To handle a large number of boundaries, we obtain the essential set of boundaries using optimization. By gathering samples within the region surrounded by generative boundaries, we can empirically reveal the characteristics of the internal layers of DGNNs. We also demonstrate that our algorithm can find more homogeneous, model-specific samples compared to variations of the ε-based sampling method.
Tasks Image Generation
Published 2019-12-12
URL https://arxiv.org/abs/1912.05827v1
PDF https://arxiv.org/pdf/1912.05827v1.pdf
PWC https://paperswithcode.com/paper/an-efficient-explorative-sampling-considering
Repo
Framework
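
One way to read "generative boundaries" is as the on/off switching surfaces of an internal layer's units: two latent codes lie in the same bounded region when the layer's activation pattern matches. A toy sketch of that acceptance rule with a stand-in linear layer; the paper's optimization for the essential boundary set is not shown.

```python
import torch
import torch.nn as nn

def same_region(layer, z_a, z_b):
    """Two latent codes are in the same region bounded by the layer's
    generative boundaries if its units switch on/off identically,
    i.e. the pre-activation sign patterns match."""
    return bool(torch.equal(layer(z_a) > 0, layer(z_b) > 0))

torch.manual_seed(0)
layer = nn.Linear(8, 16)              # stand-in internal layer
z = torch.randn(1, 8)                 # query code
candidates = [z + 0.05 * torch.randn(1, 8) for _ in range(100)]
accepted = [c for c in candidates if same_region(layer, z, c)]
print(f"{len(accepted)} / 100 perturbations stay in the region")
```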

Differentiable Mask Pruning for Neural Networks

Title Differentiable Mask Pruning for Neural Networks
Authors Ramchalam Kinattinkara Ramakrishnan, Eyyüb Sari, Vahid Partovi Nia
Abstract Pruning of neural networks is one of the well-known and promising model simplification techniques. Most neural network models are large and require expensive computations to predict new instances. It is imperative to compress the network to deploy models on low-resource devices. Most compression techniques, especially pruning, have focused on computer vision and convolutional neural networks. Existing techniques are complex and require multi-stage optimization and fine-tuning to recover state-of-the-art accuracy. We introduce Differentiable Mask Pruning (DMP), which simplifies the network while training and can be used to induce sparsity on weights, filters, nodes or sub-networks. Our method achieves competitive results on standard vision and NLP benchmarks, and is easy to integrate within the deep learning toolbox. DMP bridges the gap between neural model compression and differentiable neural architecture search.
Tasks Model Compression, Neural Architecture Search
Published 2019-09-10
URL https://arxiv.org/abs/1909.04567v1
PDF https://arxiv.org/pdf/1909.04567v1.pdf
PWC https://paperswithcode.com/paper/differentiable-mask-pruning-for-neural
Repo
Framework
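
A minimal sketch of the differentiable-mask idea on a single linear layer: give every weight a trainable score, use a sigmoid of the score as a soft mask, and penalize the mask so sparsity is induced while training. The paper's exact mask granularity (weight/filter/node/sub-network) and penalty may differ.

```python
import torch
import torch.nn as nn

class MaskedLinear(nn.Module):
    """Linear layer whose weights are gated by a differentiable soft
    mask; an L1-style penalty on the mask drives weights toward zero
    during ordinary gradient training."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(d_out, d_in) * 0.1)
        self.score = nn.Parameter(torch.zeros(d_out, d_in))

    def forward(self, x):
        mask = torch.sigmoid(self.score)          # soft, differentiable
        return x @ (self.weight * mask).t()

    def sparsity_penalty(self):
        return torch.sigmoid(self.score).mean()   # push mask toward 0

layer = MaskedLinear(16, 4)
loss = layer(torch.randn(2, 16)).pow(2).mean() + 0.1 * layer.sparsity_penalty()
loss.backward()                                   # scores get gradients too
print(layer.score.grad.shape)                     # torch.Size([4, 16])
```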