Paper Group ANR 795
Two-layer Near-lossless HDR Coding with Backward Compatibility to JPEG
Title | Two-layer Near-lossless HDR Coding with Backward Compatibility to JPEG |
Authors | Hiroyuki Kobayashi, Osamu Watanabe, Hitoshi Kiya |
Abstract | We propose an efficient two-layer near-lossless coding method using an extended histogram-packing technique with backward compatibility to the legacy JPEG standard. JPEG XT, the international standard for compressing HDR images, adopts a two-layer coding method for backward compatibility to the legacy JPEG standard. However, this two-layer coding method has two problems. One is that it does not exhibit better near-lossless performance than other single-layer methods for HDR image compression. The other is that appropriate coding-parameter values may have to be determined for each input image to achieve good near-lossless compression performance with the two-layer coding method of JPEG XT. To solve these problems, we focus on a histogram-packing technique that takes into account the histogram sparseness of HDR images. We use zero-skip quantization, an extension of the histogram-packing technique proposed for lossless coding, to implement the proposed near-lossless coding method. The experimental results indicate that the proposed method not only exhibits better near-lossless compression performance than the two-layer coding method of JPEG XT, but also raises no issues regarding the combination of parameter values, all without losing backward compatibility to the JPEG standard. |
Tasks | Image Compression, Quantization |
Published | 2019-05-09 |
URL | https://arxiv.org/abs/1905.04129v1 |
https://arxiv.org/pdf/1905.04129v1.pdf | |
PWC | https://paperswithcode.com/paper/two-layer-near-lossless-hdr-coding-with |
Repo | |
Framework | |
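The histogram-packing idea at the core of the method above is simple enough to sketch. Below is a minimal numpy illustration of plain (lossless) histogram packing, the building block the paper extends with zero-skip quantization; the function names and the toy image are ours, and the paper's actual packing and quantization steps may differ in detail.

```python
import numpy as np

def pack_histogram(img):
    """Map the sparse set of occurring pixel values onto consecutive
    integers, returning the packed image plus the lookup table needed
    to invert the mapping."""
    values = np.unique(img)                  # sorted values that occur
    packed = np.searchsorted(values, img)    # consecutive indices 0..K-1
    return packed, values

def unpack_histogram(packed, values):
    return values[packed]

# Toy "HDR" image whose histogram is sparse: few distinct 16-bit values.
rng = np.random.default_rng(0)
img = rng.choice([0, 7, 300, 4096, 65000], size=(4, 4))
packed, lut = pack_histogram(img)
assert np.array_equal(unpack_histogram(packed, lut), img)   # lossless
```

Packing shrinks the value range the base-layer codec has to represent, which is why it helps when an HDR image uses only a small fraction of its nominal dynamic range.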
Perturbation Validation: A New Heuristic to Validate Machine Learning Models
Title | Perturbation Validation: A New Heuristic to Validate Machine Learning Models |
Authors | Jie M. Zhang, Mark Harman, Benjamin Guedj, Earl T. Barr, John Shawe-Taylor |
Abstract | This paper introduces Perturbation Validation (PV), a new heuristic to validate machine learning models. PV does not rely on test data. Instead, it perturbs training data labels, re-trains the model against the perturbed data, then uses the consequent training accuracy decrease rate to assess model fit. PV also differs from traditional statistical approaches, which make judgements without considering label distribution. We evaluate PV on 10 real-world datasets and 6 synthetic datasets. Our results demonstrate that PV is more discriminating about model fit than existing validation approaches and it accords well with widely-held intuitions concerning the properties of a good model fit measurement. We also show that PV complements existing validation approaches, allowing us to give explanations for some of the issues present in the recently-debated “apparent paradox” that high capacity (potentially “overfitted”) models may, nevertheless, exhibit good generalisation ability. |
Tasks | |
Published | 2019-05-24 |
URL | https://arxiv.org/abs/1905.10201v3 |
https://arxiv.org/pdf/1905.10201v3.pdf | |
PWC | https://paperswithcode.com/paper/perturbed-model-validation-a-new-framework-to |
Repo | |
Framework | |
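To make the heuristic concrete, here is a schematic scikit-learn sketch: flip an increasing fraction of training labels, retrain, and look at how quickly training accuracy decays. The noise levels, the linear fit, and the helper name `pv_slope` are our assumptions; the paper's precise fit measure may differ.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def pv_slope(model, X, y, noise_levels=(0.0, 0.1, 0.2, 0.3), seed=0):
    """Perturb a fraction of binary training labels, retrain, and fit a
    line to the resulting *training* accuracies; a steeper decrease
    suggests the model fits real signal rather than memorising noise."""
    rng = np.random.default_rng(seed)
    accs = []
    for r in noise_levels:
        y_pert = y.copy()
        flip = rng.random(len(y)) < r
        y_pert[flip] = 1 - y_pert[flip]       # flip labels at rate r
        m = clone(model).fit(X, y_pert)
        accs.append(m.score(X, y_pert))       # training accuracy
    slope = np.polyfit(noise_levels, accs, 1)[0]
    return slope, accs

X, y = make_classification(n_samples=500, random_state=0)
print(pv_slope(LogisticRegression(max_iter=1000), X, y))
```

Note that no test set appears anywhere above, which is the point of the heuristic.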
CNN based Extraction of Panels/Characters from Bengali Comic Book Page Images
Title | CNN based Extraction of Panels/Characters from Bengali Comic Book Page Images |
Authors | Arpita Dutta, Samit Biswas |
Abstract | People nowadays prefer to use digital gadgets such as cameras or mobile phones for capturing documents. Automatic extraction of panels/characters from the images of a comic document is challenging due to the wide variety of drawing styles adopted by writers, yet it is beneficial for readers who wish to read comics on mobile devices at any time and is useful for automatic digitization. Most methods for panel/character localization rely on connected-component analysis or a page background mask and are applicable only to limited comic datasets. This work proposes a panel/character localization architecture based on the features of YOLO and CNN for extraction of both panels and characters from comic book images. The method achieved remarkable results on the Bengali Comic Book Image dataset (BCBId), developed by us and consisting of a total of $4130$ images, as well as on a variety of publicly available comic datasets in other languages, i.e. eBDtheque, Manga 109 and the DCM dataset. |
Tasks | |
Published | 2019-10-21 |
URL | https://arxiv.org/abs/1910.09233v1 |
https://arxiv.org/pdf/1910.09233v1.pdf | |
PWC | https://paperswithcode.com/paper/cnn-based-extraction-of-panelscharacters-from |
Repo | |
Framework | |
PointNLM: Point Nonlocal-Means for vegetation segmentation based on middle echo point clouds
Title | PointNLM: Point Nonlocal-Means for vegetation segmentation based on middle echo point clouds |
Authors | Jonathan Li, Rongren Wu, Yiping Chen, Qing Zhu, Zhipeng Luo, Cheng Wang |
Abstract | Middle-echo, which covers one or a few corresponding points, is a specific type of 3D point cloud acquired by a multi-echo laser scanner. In this paper, we propose a novel approach for automatic segmentation of trees that leverages middle-echo information from LiDAR point clouds. First, using a convolutional classification method, the points reflected by the middle echoes are identified among all point clouds and distinguished from the first and last echoes. Hence, the crown positions of the trees are quickly detected within the huge number of points. Second, to accurately extract trees from all point clouds, we propose a 3D deep learning network, PointNLM, to semantically segment tree crowns. PointNLM captures the long-range relationships between points via a non-local branch and extracts high-level features via max-pooling applied to unordered points. The whole framework is evaluated using the Semantic3D reduced test set. The IoU of tree point cloud segmentation reached 0.864. In addition, the semantic segmentation network was tested using the Paris-Lille-3D dataset. The average IoU outperformed several other popular methods. The experimental results indicate that the proposed algorithm provides an excellent solution for vegetation segmentation from LiDAR point clouds. |
Tasks | Semantic Segmentation |
Published | 2019-06-20 |
URL | https://arxiv.org/abs/1906.08476v2 |
https://arxiv.org/pdf/1906.08476v2.pdf | |
PWC | https://paperswithcode.com/paper/pointnlm-point-nonlocal-means-for-vegetation |
Repo | |
Framework | |
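The non-local branch the abstract mentions follows the general non-local-means/attention pattern; a bare-bones torch sketch is below. We use an unparameterised embedded-Gaussian similarity over raw point features; PointNLM itself presumably uses learned embeddings and sits inside a larger segmentation network.

```python
import torch

def nonlocal_point_block(feats):
    """Parameter-free non-local operation over a set of point features
    (N, C): every point aggregates every other point's features, weighted
    by pairwise similarity, capturing long-range relationships."""
    attn = torch.softmax(feats @ feats.t() / feats.shape[1] ** 0.5, dim=-1)
    return feats + attn @ feats               # residual connection

feats = torch.randn(1024, 64)                 # toy per-point features
print(nonlocal_point_block(feats).shape)      # torch.Size([1024, 64])
```

Because the operation is order-invariant, it pairs naturally with max-pooling over unordered points, as the abstract describes.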
The Effect of Visual Design in Image Classification
Title | The Effect of Visual Design in Image Classification |
Authors | Naftali Cohen, Tucker Balch, Manuela Veloso |
Abstract | Financial companies continuously analyze the state of the markets to rethink and adjust their investment strategies. While the analysis is done on the digital form of data, decisions are often made based on graphical representations in white papers or presentation slides. In this study, we examine whether binary decisions are better made from the numeric or the visual representation of the same data. Using two data sets, a matrix of numerical data with spatial dependencies and financial data describing the state of the S&P index, we compare the results of supervised classification based on the original numerical representation and on a visual transformation of the same data. We show that, for these data sets, the visual transformation yields higher predictive skill than the original form of the data. We suggest thinking of the visual representation of numeric data, effectively, as a combination of dimensionality reduction and feature engineering, particularly when the visual layout encapsulates the full complexity of the data. In this view, thoughtful visual design can guard against overfitting or introduce new features, all of which benefit the learning process and effectively lead to better recognition of meaningful patterns. |
Tasks | Feature Engineering, Image Classification |
Published | 2019-07-22 |
URL | https://arxiv.org/abs/1907.09567v2 |
https://arxiv.org/pdf/1907.09567v2.pdf | |
PWC | https://paperswithcode.com/paper/the-effect-of-visual-design-in-image |
Repo | |
Framework | |
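The "visual transformation" is simply rendering numbers as a picture before classification. A minimal sketch follows; the choice of a line chart, the image size, and the grayscale conversion are ours, not necessarily the authors' exact layouts.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")                          # render off-screen
import matplotlib.pyplot as plt

def to_image(series, size=64):
    """Render a 1-D numeric series as a small grayscale line-chart image,
    i.e. the kind of visual transformation fed to an image classifier."""
    fig, ax = plt.subplots(figsize=(1, 1), dpi=size)
    ax.plot(series, color="black", linewidth=1)
    ax.axis("off")
    fig.canvas.draw()
    rgba = np.asarray(fig.canvas.buffer_rgba())    # size x size x 4
    plt.close(fig)
    return rgba[..., :3].mean(axis=-1) / 255.0     # grayscale in [0, 1]

img = to_image(np.cumsum(np.random.default_rng(0).normal(size=100)))
print(img.shape)                               # (64, 64)
```

The resulting arrays can then be fed to any standard image classifier in place of the raw numeric vectors.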
Incorporating Task-Specific Structural Knowledge into CNNs for Brain Midline Shift Detection
Title | Incorporating Task-Specific Structural Knowledge into CNNs for Brain Midline Shift Detection |
Authors | Maxim Pisov, Mikhail Goncharov, Nadezhda Kurochkina, Sergey Morozov, Victor Gombolevskiy, Valeria Chernina, Anton Vladzymyrskyy, Ksenia Zamyatina, Anna Chesnokova, Igor Pronin, Michael Shifrin, Mikhail Belyaev |
Abstract | Midline shift (MLS) is a well-established factor used for outcome prediction in traumatic brain injury, stroke and brain tumors. The importance of automatic estimation of MLS was recently highlighted by the ACR Data Science Institute. In this paper, we introduce a novel deep-learning-based approach to MLS detection that exploits task-specific structural knowledge. We evaluate our method on a large dataset containing heterogeneous images with significant MLS and show that its mean error approaches the inter-expert variability. Finally, we show the robustness of our approach by validating it on an external dataset acquired during routine clinical practice. |
Tasks | |
Published | 2019-08-13 |
URL | https://arxiv.org/abs/1908.04568v3 |
https://arxiv.org/pdf/1908.04568v3.pdf | |
PWC | https://paperswithcode.com/paper/incorporating-task-specific-structural |
Repo | |
Framework | |
Successor Options: An Option Discovery Framework for Reinforcement Learning
Title | Successor Options: An Option Discovery Framework for Reinforcement Learning |
Authors | Rahul Ramesh, Manan Tomar, Balaraman Ravindran |
Abstract | The options framework in reinforcement learning models the notion of a skill, i.e. a temporally extended sequence of actions. The discovery of a reusable set of skills has typically entailed building options that navigate to bottleneck states. This work adopts a complementary approach, where we attempt to discover options that navigate to landmark states. These states are prototypical representatives of well-connected regions and can hence access their associated regions with relative ease. We propose Successor Options, which leverages successor representations to build a model of the state space. The intra-option policies are learnt using a novel pseudo-reward, and the model scales easily to high-dimensional spaces. Additionally, we propose an Incremental Successor Options model that iterates between constructing successor representations and building options, which is useful when robust successor representations cannot be built solely from primitive actions. We demonstrate the efficacy of our approach on a collection of grid-worlds and on the high-dimensional robotic control environment Fetch. |
Tasks | |
Published | 2019-05-14 |
URL | https://arxiv.org/abs/1905.05731v1 |
https://arxiv.org/pdf/1905.05731v1.pdf | |
PWC | https://paperswithcode.com/paper/successor-options-an-option-discovery |
Repo | |
Framework | |
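On a small enough MDP the successor representation (SR) has a closed form, which makes the landmark idea easy to sketch: cluster SR rows and take the state nearest each cluster centre as a landmark. The toy chain, the use of k-means, and the closed-form SR (the paper learns it with temporal-difference updates instead) are our simplifications.

```python
import numpy as np
from sklearn.cluster import KMeans

def successor_representation(P, gamma=0.95):
    """Closed-form successor representation under a fixed policy with
    state-transition matrix P: Psi = (I - gamma * P)^{-1}."""
    return np.linalg.inv(np.eye(P.shape[0]) - gamma * P)

# Toy 4-state chain under a uniform random policy.
P = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 0.5, 0.5]])
psi = successor_representation(P)

# Landmark discovery sketch: cluster SR rows; the state whose SR is closest
# to each cluster centre acts as a landmark (sub-goal) for an option. An
# intra-option pseudo-reward can then reward gains in the landmark's SR.
centres = KMeans(n_clusters=2, n_init=10, random_state=0).fit(psi).cluster_centers_
landmarks = [int(np.argmin(np.linalg.norm(psi - c, axis=1))) for c in centres]
print(landmarks)
```

States with similar SR rows are reachable in similar ways, so cluster centres naturally pick out representatives of well-connected regions.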
EXPERTNet Exigent Features Preservative Network for Facial Expression Recognition
Title | EXPERTNet Exigent Features Preservative Network for Facial Expression Recognition |
Authors | Monu Verma, Jaspreet Kaur Bhui, Santosh Vipparthi, Girdhari Singh |
Abstract | Facial expressions carry essential cues for inferring a human's state of mind and convey adequate information for understanding individuals' actual feelings. Thus, automatic facial expression recognition is an interesting and crucial task for interpreting the human cognitive state through a machine. In this paper, we propose an Exigent Features Preservative Network (EXPERTNet) to describe the features of facial expressions. EXPERTNet extracts only pertinent features and neglects others by using an exigent feature (ExFeat) block, which mainly comprises an elective layer. Specifically, the elective layer selects the desired edge-variation features from the previous layer's outcomes, which are generated by applying filters of different sizes: 1 x 1, 3 x 3, 5 x 5 and 7 x 7. The different filter sizes help elicit both micro- and high-level features that enhance the learnability of neurons. The ExFeat block preserves the spatial structural information of the facial expression, which allows the network to discriminate between different classes of facial expressions. Visual representations of the proposed method over different facial expressions show the learning capability of the neurons in different layers. Experimental and comparative analyses over four comprehensive datasets, CK+, MMI, DISFA and GEMEP-FERA, confirm the better performance of the proposed network compared to existing networks. |
Tasks | Facial Expression Recognition |
Published | 2019-04-14 |
URL | http://arxiv.org/abs/1904.06658v1 |
http://arxiv.org/pdf/1904.06658v1.pdf | |
PWC | https://paperswithcode.com/paper/expertnet-exigent-features-preservative |
Repo | |
Framework | |
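A hedged torch sketch of the multi-scale block described above: parallel 1x1/3x3/5x5/7x7 convolutions, with the "elective" selection approximated here as an element-wise maximum over branch responses. That maximum is our assumption; the paper's elective layer may combine branches differently.

```python
import torch
import torch.nn as nn

class ExFeatSketch(nn.Module):
    """Multi-scale block in the spirit of the ExFeat description: parallel
    convolutions with kernel sizes 1, 3, 5 and 7, then an 'elective' step
    sketched as a per-position maximum over the branch responses."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, k, padding=k // 2) for k in (1, 3, 5, 7)
        )
    def forward(self, x):
        # stack: (4, B, C, H, W); max over branches keeps spatial structure
        return torch.stack([b(x) for b in self.branches]).max(dim=0).values

block = ExFeatSketch(3, 16)
print(block(torch.randn(1, 3, 48, 48)).shape)   # torch.Size([1, 16, 48, 48])
```

The odd kernel sizes with `padding=k // 2` keep all branch outputs spatially aligned, which is what makes the element-wise selection well defined.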
Self-Supervised Physics-Based Deep Learning MRI Reconstruction Without Fully-Sampled Data
Title | Self-Supervised Physics-Based Deep Learning MRI Reconstruction Without Fully-Sampled Data |
Authors | Burhaneddin Yaman, Seyed Amir Hossein Hosseini, Steen Moeller, Jutta Ellermann, Kâmil Uǧurbil, Mehmet Akçakaya |
Abstract | Deep learning (DL) has emerged as a tool for improving accelerated MRI reconstruction. A common strategy among DL methods is the physics-based approach, where a regularized iterative algorithm alternating between data consistency and a regularizer is unrolled for a finite number of iterations. This unrolled network is then trained end-to-end in a supervised manner, using fully-sampled data as ground truth for the network output. However, in a number of scenarios, it is difficult to obtain fully-sampled datasets, due to physiological constraints such as organ motion or physical constraints such as signal decay. In this work, we tackle this issue and propose a self-supervised learning strategy that enables physics-based DL reconstruction without fully-sampled data. Our approach is to divide the acquired sub-sampled points for each scan into training and validation subsets. During training, data consistency is enforced over the training subset, while the validation subset is used to define the loss function. Results show that the proposed self-supervised learning method successfully reconstructs images without fully-sampled data, performing similarly to the supervised approach trained with fully-sampled references. This has implications for physics-based inverse problem approaches in other settings, where fully-sampled data is unavailable or impossible to acquire. |
Tasks | |
Published | 2019-10-21 |
URL | https://arxiv.org/abs/1910.09116v1 |
https://arxiv.org/pdf/1910.09116v1.pdf | |
PWC | https://paperswithcode.com/paper/self-supervised-physics-based-deep-learning |
Repo | |
Framework | |
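The core trick is just a partition of each scan's acquired k-space samples: one subset enforces data consistency inside the unrolled network, and the held-out subset defines the loss. A numpy sketch is below; the 60/40 split ratio and the uniform-random selection are our assumptions.

```python
import numpy as np

def split_kspace(mask, loss_frac=0.4, seed=0):
    """Split the acquired k-space locations of one scan into a subset used
    for data consistency during unrolling and a subset held out to define
    the training loss, enabling training without fully sampled references."""
    rng = np.random.default_rng(seed)
    acquired = np.flatnonzero(mask)
    rng.shuffle(acquired)
    n_loss = int(loss_frac * acquired.size)
    loss_idx, dc_idx = acquired[:n_loss], acquired[n_loss:]
    dc_mask, loss_mask = np.zeros_like(mask), np.zeros_like(mask)
    dc_mask.flat[dc_idx] = 1
    loss_mask.flat[loss_idx] = 1
    return dc_mask, loss_mask

mask = (np.random.default_rng(1).random((128, 128)) < 0.3).astype(int)
dc, held_out = split_kspace(mask)
assert np.array_equal(dc + held_out, mask)   # disjoint partition of samples
```

Because the loss is computed only on acquired (but withheld) measurements, no fully sampled ground truth ever enters training.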
Neural Machine Translation with 4-Bit Precision and Beyond
Title | Neural Machine Translation with 4-Bit Precision and Beyond |
Authors | Alham Fikri Aji, Kenneth Heafield |
Abstract | Neural Machine Translation (NMT) is resource-intensive. We design a quantization procedure to compress NMT models for devices with limited hardware capability. Because most neural network parameters are near zero, we employ logarithmic quantization in lieu of fixed-point quantization. However, we find bias terms are less amenable to log quantization; since they comprise only a tiny fraction of the model, we leave them uncompressed. We also propose to use an error-feedback mechanism during retraining, to preserve the compressed model as a stale gradient. We empirically show that NMT models based on the Transformer or RNN architecture can be compressed to 4-bit precision without any noticeable quality degradation. Models can be compressed down to binary precision, albeit with lower quality. The RNN architecture appears to be more robust to quantization than the Transformer. |
Tasks | Machine Translation, Quantization |
Published | 2019-09-13 |
URL | https://arxiv.org/abs/1909.06091v2 |
https://arxiv.org/pdf/1909.06091v2.pdf | |
PWC | https://paperswithcode.com/paper/neural-machine-translation-with-4-bit |
Repo | |
Framework | |
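Logarithmic quantization stores a sign plus a small exponent, so representable magnitudes are powers of two, which suits the near-zero concentration of NMT parameters better than evenly spaced fixed-point steps. A numpy sketch follows; the clipping scheme and codebook alignment here are our choices, not necessarily the paper's.

```python
import numpy as np

def log_quantize(w, bits=4):
    """Sketch of logarithmic quantization: keep a sign plus a (bits-1)-bit
    exponent, so representable magnitudes are powers of two. The codebook
    is aligned to the largest weight so big weights stay most accurate."""
    sign = np.sign(w)
    mag = np.where(w == 0, np.finfo(float).tiny, np.abs(w))
    exp = np.round(np.log2(mag))
    levels = 2 ** (bits - 1)                      # exponent codebook size
    top = np.round(np.log2(np.max(np.abs(w))))    # align codebook to max |w|
    exp = np.clip(exp, top - levels + 1, top)
    return sign * 2.0 ** exp                      # zeros stay zero (sign == 0)

w = np.random.default_rng(0).normal(scale=0.05, size=5)
print(w)
print(log_quantize(w))
```

With 4 bits this gives one sign bit and eight exponent values, so relative (rather than absolute) quantization error stays roughly constant across magnitudes.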
Multiple Pretext-Task for Self-Supervised Learning via Mixing Multiple Image Transformations
Title | Multiple Pretext-Task for Self-Supervised Learning via Mixing Multiple Image Transformations |
Authors | Shin’ya Yamaguchi, Sekitoshi Kanai, Tetsuya Shioda, Shoichiro Takeda |
Abstract | Self-supervised learning is one of the most promising approaches to learning representations that capture semantic features in images without any manual annotation cost. To learn useful representations, a self-supervised model solves a pretext-task, which is defined by the data itself. Among a number of pretext-tasks, the rotation prediction task (Rotation) achieves better representations for solving various target tasks despite the simplicity of its implementation. However, we found that Rotation can fail to capture semantic features related to image textures and colors. To tackle this problem, we introduce a learning technique called multiple pretext-task for self-supervised learning (MP-SSL), which solves multiple pretext-tasks simultaneously, in addition to Rotation. In order to capture features of textures and colors, we employ image-enhancement transformations (e.g., sharpening and solarizing) as the additional pretext-tasks. MP-SSL efficiently trains a model by leveraging a Frank-Wolfe-based multi-task training algorithm. Our experimental results show that MP-SSL models outperform Rotation on multiple standard benchmarks and achieve state-of-the-art performance on Places-205. |
Tasks | |
Published | 2019-12-25 |
URL | https://arxiv.org/abs/1912.11603v1 |
https://arxiv.org/pdf/1912.11603v1.pdf | |
PWC | https://paperswithcode.com/paper/multiple-pretext-task-for-self-supervised |
Repo | |
Framework | |
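Structurally, MP-SSL is a shared encoder with one head per pretext-task. The sketch below simply sums two pretext losses with equal weights; the paper instead balances the tasks with a Frank-Wolfe-based multi-task algorithm, and the toy encoder, heads, and "solarize" stand-in here are ours.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128), nn.ReLU())
rot_head = nn.Linear(128, 4)   # which of {0, 90, 180, 270} deg was applied
enh_head = nn.Linear(128, 2)   # was an enhancement applied or not

x = torch.rand(8, 3, 32, 32)
rot_y = torch.randint(0, 4, (8,))
x_rot = torch.stack([torch.rot90(img, int(k), dims=(1, 2))
                     for img, k in zip(x, rot_y)])
enh_y = torch.randint(0, 2, (8,))
x_in = torch.where(enh_y.view(8, 1, 1, 1).bool(), 1.0 - x_rot, x_rot)  # toy "solarize"

z = encoder(x_in)              # one forward pass serves every pretext head
loss = F.cross_entropy(rot_head(z), rot_y) + F.cross_entropy(enh_head(z), enh_y)
loss.backward()
```

The enhancement-prediction head is what forces the shared features to encode texture and color information that rotation prediction alone can ignore.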
TextCohesion: Detecting Text for Arbitrary Shapes
Title | TextCohesion: Detecting Text for Arbitrary Shapes |
Authors | Weijia Wu, Jici Xing, Hong Zhou |
Abstract | In this paper, we propose a pixel-wise method named TextCohesion for scene text detection, which splits a text instance into five key components: a Text Skeleton and four Directional Pixel Regions. These components are easier to handle than the entire text instance. A confidence scoring mechanism is designed to filter out components that merely resemble text. Our method can integrate text contexts intensively when backgrounds are complex. Experiments on two challenging curved-text benchmarks demonstrate that TextCohesion outperforms state-of-the-art methods, achieving F-measures of 84.6% on Total-Text and 86.3% on SCUT-CTW1500. |
Tasks | Curved Text Detection, Scene Text Detection |
Published | 2019-04-22 |
URL | http://arxiv.org/abs/1904.12640v2 |
http://arxiv.org/pdf/1904.12640v2.pdf | |
PWC | https://paperswithcode.com/paper/190412640 |
Repo | |
Framework | |
Deep Gradient Boosting – Layer-wise Input Normalization of Neural Networks
Title | Deep Gradient Boosting – Layer-wise Input Normalization of Neural Networks |
Authors | Erhan Bilal |
Abstract | Stochastic gradient descent (SGD) has been the dominant optimization method for training deep neural networks due to its many desirable properties. One of the more remarkable and least understood qualities of SGD is that it generalizes relatively well on unseen data even when the neural network has millions of parameters. We hypothesize that in certain cases it is desirable to relax its intrinsic generalization properties, and we introduce an extension of SGD called deep gradient boosting (DGB). The key idea of DGB is that the back-propagated gradients obtained via the chain rule can be viewed as pseudo-residual targets of a gradient boosting problem. Thus, at each layer of a neural network, the weight update is calculated by solving the corresponding boosting problem using a linear base learner. The resulting weight update formula can also be viewed as a normalization procedure applied to the data that arrives at each layer during the forward pass. When implemented as a separate input normalization layer (INN), the new architecture shows improved performance on image recognition tasks compared to the same architecture without normalization layers. As opposed to batch normalization (BN), INN has no learnable parameters; nevertheless, it matches BN's performance on the CIFAR10 and ImageNet classification tasks. |
Tasks | |
Published | 2019-07-29 |
URL | https://arxiv.org/abs/1907.12608v2 |
https://arxiv.org/pdf/1907.12608v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-gradient-boosting |
Repo | |
Framework | |
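Since the paper frames the DGB weight update as a normalization of each layer's incoming data, the INN layer can be sketched as a parameter-free rescaling of each sample's feature vector before the linear map. This is our reading of the idea, not the paper's derived formula, which may normalize differently.

```python
import torch
import torch.nn as nn

class INNSketch(nn.Module):
    """Parameter-free input normalization sketch: rescale each sample's
    incoming feature vector by its L2 norm before the next linear layer
    (our reading of the paper's INN; it has nothing to learn)."""
    def forward(self, x):
        return x / (x.norm(dim=-1, keepdim=True) + 1e-8)

net = nn.Sequential(INNSketch(), nn.Linear(32, 10))
print(net(torch.randn(8, 32)).shape)   # torch.Size([8, 10])
```

Like BN, such a layer stabilizes the scale of activations reaching each weight matrix, but it does so per sample, with no running statistics or learnable affine parameters.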
An Efficient Explorative Sampling Considering the Generative Boundaries of Deep Generative Neural Networks
Title | An Efficient Explorative Sampling Considering the Generative Boundaries of Deep Generative Neural Networks |
Authors | Giyoung Jeon, Haedong Jeong, Jaesik Choi |
Abstract | Deep generative neural networks (DGNNs) have achieved realistic and high-quality data generation. In particular, the adversarial training scheme has been applied to many DGNNs and has exhibited powerful performance. Despite recent advances in generative networks, identifying the image generation mechanism still remains challenging. In this paper, we present an explorative sampling algorithm to analyze the generation mechanism of DGNNs. Our method efficiently obtains samples with identical attributes to a query image, from the perspective of the trained model. We define generative boundaries, which determine the activation of nodes in an internal layer, and probe inside the model with this information. To handle the large number of boundaries, we obtain an essential set of boundaries using optimization. By gathering samples within the region surrounded by the generative boundaries, we can empirically reveal the characteristics of the internal layers of DGNNs. We also demonstrate that our algorithm finds more homogeneous, model-specific samples than variants of the $\epsilon$-based sampling method. |
Tasks | Image Generation |
Published | 2019-12-12 |
URL | https://arxiv.org/abs/1912.05827v1 |
https://arxiv.org/pdf/1912.05827v1.pdf | |
PWC | https://paperswithcode.com/paper/an-efficient-explorative-sampling-considering |
Repo | |
Framework | |
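A generative boundary is the hyperplane where an internal unit's pre-activation crosses zero, so "same region" means "same ReLU on/off pattern". The toy below collects latent samples whose first-layer activation pattern matches a query's; the paper additionally reduces this to an essential set of boundaries via optimization, which this sketch skips.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
gen = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 16))  # toy DGNN

def pattern(z):
    """Side of each first-layer generative boundary (pre-ReLU sign) z falls on."""
    return gen[0](z) > 0

z_query = torch.randn(8)
candidates = z_query + 0.1 * torch.randn(200, 8)   # perturbations of the query
same_region = [z for z in candidates
               if torch.equal(pattern(z), pattern(z_query))]
print(len(same_region), "of 200 candidates share the query's region")
```

Inside one such region the layer acts linearly, which is why samples gathered there tend to share the query's attributes.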
Differentiable Mask Pruning for Neural Networks
Title | Differentiable Mask Pruning for Neural Networks |
Authors | Ramchalam Kinattinkara Ramakrishnan, Eyyüb Sari, Vahid Partovi Nia |
Abstract | Pruning is one of the well-known and promising model simplification techniques for neural networks. Most neural network models are large and require expensive computation to predict new instances, so compressing the network is imperative for deploying models on low-resource devices. Most compression techniques, especially pruning, have focused on computer vision and convolutional neural networks. Existing techniques are complex and require multi-stage optimization and fine-tuning to recover state-of-the-art accuracy. We introduce Differentiable Mask Pruning (DMP), which simplifies the network while training and can be used to induce sparsity at the weight, filter, node or sub-network level. Our method achieves competitive results on standard vision and NLP benchmarks and is easy to integrate within the deep learning toolbox. DMP bridges the gap between neural model compression and differentiable neural architecture search. |
Tasks | Model Compression, Neural Architecture Search |
Published | 2019-09-10 |
URL | https://arxiv.org/abs/1909.04567v1 |
https://arxiv.org/pdf/1909.04567v1.pdf | |
PWC | https://paperswithcode.com/paper/differentiable-mask-pruning-for-neural |
Repo | |
Framework | |
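One common way to make a pruning mask differentiable, in the spirit of DMP, is a soft gate on each weight plus a sparsity penalty trained jointly with the task loss. The parameterisation below (sigmoid gates at weight granularity, a mean-gate penalty) is our sketch, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn as nn

class MaskedLinear(nn.Module):
    """Sketch of a differentiable pruning mask at weight granularity: a
    soft gate m = sigmoid(s) multiplies each weight, and a penalty on m
    pushes gates toward zero during training. After training, gates below
    a threshold would be snapped to exact zeros to realise the pruning."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.scores = nn.Parameter(torch.zeros(out_features, in_features))  # mask logits
    def forward(self, x):
        return x @ (self.weight * torch.sigmoid(self.scores)).t()
    def sparsity_penalty(self):
        return torch.sigmoid(self.scores).mean()

layer = MaskedLinear(32, 10)
out = layer(torch.randn(4, 32))
loss = out.pow(2).mean() + 1e-2 * layer.sparsity_penalty()
loss.backward()    # both weights and mask logits receive gradients
```

Because the gates are ordinary parameters, the same mechanism can be applied per filter, node, or sub-network by changing the shape of `scores`, which is the sense in which such masking shades into differentiable architecture search.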