Paper Group ANR 795
Two-layer Near-lossless HDR Coding with Backward Compatibility to JPEG
Title | Two-layer Near-lossless HDR Coding with Backward Compatibility to JPEG |
Authors | Hiroyuki Kobayashi, Osamu Watanabe, Hitoshi Kiya |
Abstract | We propose an efficient two-layer near-lossless coding method using an extended histogram-packing technique with backward compatibility to the legacy JPEG standard. JPEG XT, the international standard for compressing HDR images, adopts a two-layer coding method for backward compatibility to the legacy JPEG standard. However, this two-layer coding method has two problems. One is that it does not exhibit better near-lossless performance than other single-layer methods for HDR image compression. The other is that appropriate coding-parameter values may have to be determined for each input image to achieve good near-lossless compression performance with the two-layer coding method of JPEG XT. To solve these problems, we focus on a histogram-packing technique that takes into account the histogram sparseness of HDR images. We use zero-skip quantization, an extension of the histogram-packing technique proposed for lossless coding, to implement the proposed near-lossless coding method. The experimental results indicate that the proposed method not only exhibits better near-lossless compression performance than the two-layer coding method of JPEG XT, but also raises no issues regarding the combination of parameter values, all without losing backward compatibility to the JPEG standard. |
Tasks | Image Compression, Quantization |
Published | 2019-05-09 |
URL | https://arxiv.org/abs/1905.04129v1 |
https://arxiv.org/pdf/1905.04129v1.pdf | |
PWC | https://paperswithcode.com/paper/two-layer-near-lossless-hdr-coding-with |
Repo | |
Framework | |
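The histogram-packing idea at the core of the method above is simple enough to sketch. Below is a minimal numpy illustration of plain (lossless) histogram packing, the building block the paper extends with zero-skip quantization; the function names and the toy image are ours, and the paper's actual packing and quantization steps may differ in detail.

```python
import numpy as np

def pack_histogram(img):
    """Map the sparse set of occurring pixel values onto consecutive
    integers, returning the packed image plus the lookup table needed
    to invert the mapping."""
    values = np.unique(img)                  # sorted values that occur
    packed = np.searchsorted(values, img)    # consecutive indices 0..K-1
    return packed, values

def unpack_histogram(packed, values):
    return values[packed]

# Toy "HDR" image whose histogram is sparse: few distinct 16-bit values.
rng = np.random.default_rng(0)
img = rng.choice([0, 7, 300, 4096, 65000], size=(4, 4))
packed, lut = pack_histogram(img)
assert np.array_equal(unpack_histogram(packed, lut), img)   # lossless
```

Packing shrinks the value range the base-layer codec has to represent, which is why it helps when an HDR image uses only a small fraction of its nominal dynamic range.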
Perturbation Validation: A New Heuristic to Validate Machine Learning Models
Title | Perturbation Validation: A New Heuristic to Validate Machine Learning Models |
Authors | Jie M. Zhang, Mark Harman, Benjamin Guedj, Earl T. Barr, John Shawe-Taylor |
Abstract | This paper introduces Perturbation Validation (PV), a new heuristic to validate machine learning models. PV does not rely on test data. Instead, it perturbs training data labels, re-trains the model against the perturbed data, then uses the consequent training accuracy decrease rate to assess model fit. PV also differs from traditional statistical approaches, which make judgements without considering label distribution. We evaluate PV on 10 real-world datasets and 6 synthetic datasets. Our results demonstrate that PV is more discriminating about model fit than existing validation approaches and it accords well with widely-held intuitions concerning the properties of a good model fit measurement. We also show that PV complements existing validation approaches, allowing us to give explanations for some of the issues present in the recently-debated “apparent paradox” that high capacity (potentially “overfitted”) models may, nevertheless, exhibit good generalisation ability. |
Tasks | |
Published | 2019-05-24 |
URL | https://arxiv.org/abs/1905.10201v3 |
https://arxiv.org/pdf/1905.10201v3.pdf | |
PWC | https://paperswithcode.com/paper/perturbed-model-validation-a-new-framework-to |
Repo | |
Framework | |
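To make the heuristic concrete, here is a schematic scikit-learn sketch: flip an increasing fraction of training labels, retrain, and look at how quickly training accuracy decays. The noise levels, the linear fit, and the helper name `pv_slope` are our assumptions; the paper's precise fit measure may differ.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def pv_slope(model, X, y, noise_levels=(0.0, 0.1, 0.2, 0.3), seed=0):
    """Perturb a fraction of binary training labels, retrain, and fit a
    line to the resulting *training* accuracies; a steeper decrease
    suggests the model fits real signal rather than memorising noise."""
    rng = np.random.default_rng(seed)
    accs = []
    for r in noise_levels:
        y_pert = y.copy()
        flip = rng.random(len(y)) < r
        y_pert[flip] = 1 - y_pert[flip]       # flip labels at rate r
        m = clone(model).fit(X, y_pert)
        accs.append(m.score(X, y_pert))       # training accuracy
    slope = np.polyfit(noise_levels, accs, 1)[0]
    return slope, accs

X, y = make_classification(n_samples=500, random_state=0)
print(pv_slope(LogisticRegression(max_iter=1000), X, y))
```

Note that no test set appears anywhere above, which is the point of the heuristic.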
CNN based Extraction of Panels/Characters from Bengali Comic Book Page Images
Title | CNN based Extraction of Panels/Characters from Bengali Comic Book Page Images |
Authors | Arpita Dutta, Samit Biswas |
Abstract | People nowadays prefer to use digital gadgets such as cameras or mobile phones for capturing documents. Automatic extraction of panels/characters from the images of a comic document is challenging due to the wide variety of drawing styles adopted by writers, yet it is beneficial for readers who wish to read comics on mobile devices at any time and is useful for automatic digitization. Most methods for panel/character localization rely on connected-component analysis or a page background mask and are applicable only to limited comic datasets. This work proposes a panel/character localization architecture based on the features of YOLO and CNN for extraction of both panels and characters from comic book images. The method achieved remarkable results on the Bengali Comic Book Image dataset (BCBId), developed by us and consisting of a total of $4130$ images, as well as on a variety of publicly available comic datasets in other languages, i.e. eBDtheque, Manga 109 and the DCM dataset. |
Tasks | |
Published | 2019-10-21 |
URL | https://arxiv.org/abs/1910.09233v1 |
https://arxiv.org/pdf/1910.09233v1.pdf | |
PWC | https://paperswithcode.com/paper/cnn-based-extraction-of-panelscharacters-from |
Repo | |
Framework | |
PointNLM: Point Nonlocal-Means for vegetation segmentation based on middle echo point clouds
Title | PointNLM: Point Nonlocal-Means for vegetation segmentation based on middle echo point clouds |
Authors | Jonathan Li, Rongren Wu, Yiping Chen, Qing Zhu, Zhipeng Luo, Cheng Wang |
Abstract | Middle-echo, which covers one or a few corresponding points, is a specific type of 3D point cloud acquired by a multi-echo laser scanner. In this paper, we propose a novel approach for automatic segmentation of trees that leverages middle-echo information from LiDAR point clouds. First, using a convolutional classification method, the points reflected by the middle echoes are identified among all point clouds and distinguished from the first and last echoes. Hence, the crown positions of the trees are quickly detected within the huge number of points. Second, to accurately extract trees from all point clouds, we propose a 3D deep learning network, PointNLM, to semantically segment tree crowns. PointNLM captures the long-range relationships between points via a non-local branch and extracts high-level features via max-pooling applied to unordered points. The whole framework is evaluated using the Semantic3D reduced test set. The IoU of tree point cloud segmentation reached 0.864. In addition, the semantic segmentation network was tested using the Paris-Lille-3D dataset. The average IoU outperformed several other popular methods. The experimental results indicate that the proposed algorithm provides an excellent solution for vegetation segmentation from LiDAR point clouds. |
Tasks | Semantic Segmentation |
Published | 2019-06-20 |
URL | https://arxiv.org/abs/1906.08476v2 |
https://arxiv.org/pdf/1906.08476v2.pdf | |
PWC | https://paperswithcode.com/paper/pointnlm-point-nonlocal-means-for-vegetation |
Repo | |
Framework | |
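The non-local branch the abstract mentions follows the general non-local-means/attention pattern; a bare-bones torch sketch is below. We use an unparameterised embedded-Gaussian similarity over raw point features; PointNLM itself presumably uses learned embeddings and sits inside a larger segmentation network.

```python
import torch

def nonlocal_point_block(feats):
    """Parameter-free non-local operation over a set of point features
    (N, C): every point aggregates every other point's features, weighted
    by pairwise similarity, capturing long-range relationships."""
    attn = torch.softmax(feats @ feats.t() / feats.shape[1] ** 0.5, dim=-1)
    return feats + attn @ feats               # residual connection

feats = torch.randn(1024, 64)                 # toy per-point features
print(nonlocal_point_block(feats).shape)      # torch.Size([1024, 64])
```

Because the operation is order-invariant, it pairs naturally with max-pooling over unordered points, as the abstract describes.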
The Effect of Visual Design in Image Classification
Title | The Effect of Visual Design in Image Classification |
Authors | Naftali Cohen, Tucker Balch, Manuela Veloso |
Abstract | Financial companies continuously analyze the state of the markets to rethink and adjust their investment strategies. While the analysis is done on the digital form of data, decisions are often made based on graphical representations in white papers or presentation slides. In this study, we examine whether binary decisions are better made from the numeric or the visual representation of the same data. Using two data sets, a matrix of numerical data with spatial dependencies and financial data describing the state of the S&P index, we compare the results of supervised classification based on the original numerical representation and on a visual transformation of the same data. We show that, for these data sets, the visual transformation yields higher predictive skill than the original form of the data. We suggest thinking of the visual representation of numeric data, effectively, as a combination of dimensionality reduction and feature engineering, particularly when the visual layout encapsulates the full complexity of the data. In this view, thoughtful visual design can guard against overfitting or introduce new features, all of which benefit the learning process and effectively lead to better recognition of meaningful patterns. |
Tasks | Feature Engineering, Image Classification |
Published | 2019-07-22 |
URL | https://arxiv.org/abs/1907.09567v2 |
https://arxiv.org/pdf/1907.09567v2.pdf | |
PWC | https://paperswithcode.com/paper/the-effect-of-visual-design-in-image |
Repo | |
Framework | |
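The "visual transformation" is simply rendering numbers as a picture before classification. A minimal sketch follows; the choice of a line chart, the image size, and the grayscale conversion are ours, not necessarily the authors' exact layouts.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")                          # render off-screen
import matplotlib.pyplot as plt

def to_image(series, size=64):
    """Render a 1-D numeric series as a small grayscale line-chart image,
    i.e. the kind of visual transformation fed to an image classifier."""
    fig, ax = plt.subplots(figsize=(1, 1), dpi=size)
    ax.plot(series, color="black", linewidth=1)
    ax.axis("off")
    fig.canvas.draw()
    rgba = np.asarray(fig.canvas.buffer_rgba())    # size x size x 4
    plt.close(fig)
    return rgba[..., :3].mean(axis=-1) / 255.0     # grayscale in [0, 1]

img = to_image(np.cumsum(np.random.default_rng(0).normal(size=100)))
print(img.shape)                               # (64, 64)
```

The resulting arrays can then be fed to any standard image classifier in place of the raw numeric vectors.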
Incorporating Task-Specific Structural Knowledge into CNNs for Brain Midline Shift Detection
Title | Incorporating Task-Specific Structural Knowledge into CNNs for Brain Midline Shift Detection |
Authors | Maxim Pisov, Mikhail Goncharov, Nadezhda Kurochkina, Sergey Morozov, Victor Gombolevskiy, Valeria Chernina, Anton Vladzymyrskyy, Ksenia Zamyatina, Anna Chesnokova, Igor Pronin, Michael Shifrin, Mikhail Belyaev |
Abstract | Midline shift (MLS) is a well-established factor used for outcome prediction in traumatic brain injury, stroke and brain tumors. The importance of automatic estimation of MLS was recently highlighted by the ACR Data Science Institute. In this paper, we introduce a novel deep-learning-based approach to MLS detection that exploits task-specific structural knowledge. We evaluate our method on a large dataset containing heterogeneous images with significant MLS and show that its mean error approaches the inter-expert variability. Finally, we show the robustness of our approach by validating it on an external dataset acquired during routine clinical practice. |
Tasks | |
Published | 2019-08-13 |
URL | https://arxiv.org/abs/1908.04568v3 |
https://arxiv.org/pdf/1908.04568v3.pdf | |
PWC | https://paperswithcode.com/paper/incorporating-task-specific-structural |
Repo | |
Framework | |
Successor Options: An Option Discovery Framework for Reinforcement Learning
Title | Successor Options: An Option Discovery Framework for Reinforcement Learning |
Authors | Rahul Ramesh, Manan Tomar, Balaraman Ravindran |
Abstract | The options framework in reinforcement learning models the notion of a skill, i.e. a temporally extended sequence of actions. The discovery of a reusable set of skills has typically entailed building options that navigate to bottleneck states. This work adopts a complementary approach, where we attempt to discover options that navigate to landmark states. These states are prototypical representatives of well-connected regions and can hence access their associated regions with relative ease. We propose Successor Options, which leverages successor representations to build a model of the state space. The intra-option policies are learnt using a novel pseudo-reward, and the model scales easily to high-dimensional spaces. Additionally, we propose an Incremental Successor Options model that iterates between constructing successor representations and building options, which is useful when robust successor representations cannot be built solely from primitive actions. We demonstrate the efficacy of our approach on a collection of grid-worlds and on the high-dimensional robotic control environment Fetch. |
Tasks | |
Published | 2019-05-14 |
URL | https://arxiv.org/abs/1905.05731v1 |
https://arxiv.org/pdf/1905.05731v1.pdf | |
PWC | https://paperswithcode.com/paper/successor-options-an-option-discovery |
Repo | |
Framework | |
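On a small enough MDP the successor representation (SR) has a closed form, which makes the landmark idea easy to sketch: cluster SR rows and take the state nearest each cluster centre as a landmark. The toy chain, the use of k-means, and the closed-form SR (the paper learns it with temporal-difference updates instead) are our simplifications.

```python
import numpy as np
from sklearn.cluster import KMeans

def successor_representation(P, gamma=0.95):
    """Closed-form successor representation under a fixed policy with
    state-transition matrix P: Psi = (I - gamma * P)^{-1}."""
    return np.linalg.inv(np.eye(P.shape[0]) - gamma * P)

# Toy 4-state chain under a uniform random policy.
P = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 0.5, 0.5]])
psi = successor_representation(P)

# Landmark discovery sketch: cluster SR rows; the state whose SR is closest
# to each cluster centre acts as a landmark (sub-goal) for an option. An
# intra-option pseudo-reward can then reward gains in the landmark's SR.
centres = KMeans(n_clusters=2, n_init=10, random_state=0).fit(psi).cluster_centers_
landmarks = [int(np.argmin(np.linalg.norm(psi - c, axis=1))) for c in centres]
print(landmarks)
```

States with similar SR rows are reachable in similar ways, so cluster centres naturally pick out representatives of well-connected regions.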
EXPERTNet Exigent Features Preservative Network for Facial Expression Recognition
Title | EXPERTNet Exigent Features Preservative Network for Facial Expression Recognition |
Authors | Monu Verma, Jaspreet Kaur Bhui, Santosh Vipparthi, Girdhari Singh |
Abstract | Facial expressions carry essential cues for inferring a human's state of mind and convey adequate information for understanding individuals' actual feelings. Thus, automatic facial expression recognition is an interesting and crucial task for interpreting the human cognitive state through a machine. In this paper, we propose an Exigent Features Preservative Network (EXPERTNet) to describe the features of facial expressions. EXPERTNet extracts only pertinent features and neglects others by using an exigent feature (ExFeat) block, which mainly comprises an elective layer. Specifically, the elective layer selects the desired edge-variation features from the previous layer's outcomes, which are generated by applying filters of different sizes: 1 x 1, 3 x 3, 5 x 5 and 7 x 7. The different filter sizes help elicit both micro- and high-level features that enhance the learnability of neurons. The ExFeat block preserves the spatial structural information of the facial expression, which allows the network to discriminate between different classes of facial expressions. Visual representations of the proposed method over different facial expressions show the learning capability of the neurons in different layers. Experimental and comparative analyses over four comprehensive datasets, CK+, MMI, DISFA and GEMEP-FERA, confirm the better performance of the proposed network compared to existing networks. |
Tasks | Facial Expression Recognition |
Published | 2019-04-14 |
URL | http://arxiv.org/abs/1904.06658v1 |
http://arxiv.org/pdf/1904.06658v1.pdf | |
PWC | https://paperswithcode.com/paper/expertnet-exigent-features-preservative |
Repo | |
Framework | |
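A hedged torch sketch of the multi-scale block described above: parallel 1x1/3x3/5x5/7x7 convolutions, with the "elective" selection approximated here as an element-wise maximum over branch responses. That maximum is our assumption; the paper's elective layer may combine branches differently.

```python
import torch
import torch.nn as nn

class ExFeatSketch(nn.Module):
    """Multi-scale block in the spirit of the ExFeat description: parallel
    convolutions with kernel sizes 1, 3, 5 and 7, then an 'elective' step
    sketched as a per-position maximum over the branch responses."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, k, padding=k // 2) for k in (1, 3, 5, 7)
        )
    def forward(self, x):
        # stack: (4, B, C, H, W); max over branches keeps spatial structure
        return torch.stack([b(x) for b in self.branches]).max(dim=0).values

block = ExFeatSketch(3, 16)
print(block(torch.randn(1, 3, 48, 48)).shape)   # torch.Size([1, 16, 48, 48])
```

The odd kernel sizes with `padding=k // 2` keep all branch outputs spatially aligned, which is what makes the element-wise selection well defined.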
Self-Supervised Physics-Based Deep Learning MRI Reconstruction Without Fully-Sampled Data
Title | Self-Supervised Physics-Based Deep Learning MRI Reconstruction Without Fully-Sampled Data |
Authors | Burhaneddin Yaman, Seyed Amir Hossein Hosseini, Steen Moeller, Jutta Ellermann, Kâmil Uǧurbil, Mehmet Akçakaya |
Abstract | Deep learning (DL) has emerged as a tool for improving accelerated MRI reconstruction. A common strategy among DL methods is the physics-based approach, where a regularized iterative algorithm alternating between data consistency and a regularizer is unrolled for a finite number of iterations. This unrolled network is then trained end-to-end in a supervised manner, using fully-sampled data as ground truth for the network output. However, in a number of scenarios, it is difficult to obtain fully-sampled datasets, due to physiological constraints such as organ motion or physical constraints such as signal decay. In this work, we tackle this issue and propose a self-supervised learning strategy that enables physics-based DL reconstruction without fully-sampled data. Our approach is to divide the acquired sub-sampled points for each scan into training and validation subsets. During training, data consistency is enforced over the training subset, while the validation subset is used to define the loss function. Results show that the proposed self-supervised learning method successfully reconstructs images without fully-sampled data, performing similarly to the supervised approach trained with fully-sampled references. This has implications for physics-based inverse problem approaches in other settings, where fully-sampled data is unavailable or impossible to acquire. |
Tasks | |
Published | 2019-10-21 |
URL | https://arxiv.org/abs/1910.09116v1 |
https://arxiv.org/pdf/1910.09116v1.pdf | |
PWC | https://paperswithcode.com/paper/self-supervised-physics-based-deep-learning |
Repo | |
Framework | |
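The core trick is just a partition of each scan's acquired k-space samples: one subset enforces data consistency inside the unrolled network, and the held-out subset defines the loss. A numpy sketch is below; the 60/40 split ratio and the uniform-random selection are our assumptions.

```python
import numpy as np

def split_kspace(mask, loss_frac=0.4, seed=0):
    """Split the acquired k-space locations of one scan into a subset used
    for data consistency during unrolling and a subset held out to define
    the training loss, enabling training without fully sampled references."""
    rng = np.random.default_rng(seed)
    acquired = np.flatnonzero(mask)
    rng.shuffle(acquired)
    n_loss = int(loss_frac * acquired.size)
    loss_idx, dc_idx = acquired[:n_loss], acquired[n_loss:]
    dc_mask, loss_mask = np.zeros_like(mask), np.zeros_like(mask)
    dc_mask.flat[dc_idx] = 1
    loss_mask.flat[loss_idx] = 1
    return dc_mask, loss_mask

mask = (np.random.default_rng(1).random((128, 128)) < 0.3).astype(int)
dc, held_out = split_kspace(mask)
assert np.array_equal(dc + held_out, mask)   # disjoint partition of samples
```

Because the loss is computed only on acquired (but withheld) measurements, no fully sampled ground truth ever enters training.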
Neural Machine Translation with 4-Bit Precision and Beyond
Title | Neural Machine Translation with 4-Bit Precision and Beyond |
Authors | Alham Fikri Aji, Kenneth Heafield |
Abstract | Neural Machine Translation (NMT) is resource-intensive. We design a quantization procedure to compress NMT models for devices with limited hardware capability. Because most neural network parameters are near zero, we employ logarithmic quantization in lieu of fixed-point quantization. However, we find bias terms are less amenable to log quantization; since they comprise only a tiny fraction of the model, we leave them uncompressed. We also propose to use an error-feedback mechanism during retraining, to preserve the compressed model as a stale gradient. We empirically show that NMT models based on the Transformer or RNN architecture can be compressed to 4-bit precision without any noticeable quality degradation. Models can be compressed down to binary precision, albeit with lower quality. The RNN architecture appears to be more robust to quantization than the Transformer. |
Tasks | Machine Translation, Quantization |
Published | 2019-09-13 |
URL | https://arxiv.org/abs/1909.06091v2 |
https://arxiv.org/pdf/1909.06091v2.pdf | |
PWC | https://paperswithcode.com/paper/neural-machine-translation-with-4-bit |
Repo | |
Framework | |
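Logarithmic quantization stores a sign plus a small exponent, so representable magnitudes are powers of two, which suits the near-zero concentration of NMT parameters better than evenly spaced fixed-point steps. A numpy sketch follows; the clipping scheme and codebook alignment here are our choices, not necessarily the paper's.

```python
import numpy as np

def log_quantize(w, bits=4):
    """Sketch of logarithmic quantization: keep a sign plus a (bits-1)-bit
    exponent, so representable magnitudes are powers of two. The codebook
    is aligned to the largest weight so big weights stay most accurate."""
    sign = np.sign(w)
    mag = np.where(w == 0, np.finfo(float).tiny, np.abs(w))
    exp = np.round(np.log2(mag))
    levels = 2 ** (bits - 1)                      # exponent codebook size
    top = np.round(np.log2(np.max(np.abs(w))))    # align codebook to max |w|
    exp = np.clip(exp, top - levels + 1, top)
    return sign * 2.0 ** exp                      # zeros stay zero (sign == 0)

w = np.random.default_rng(0).normal(scale=0.05, size=5)
print(w)
print(log_quantize(w))
```

With 4 bits this gives one sign bit and eight exponent values, so relative (rather than absolute) quantization error stays roughly constant across magnitudes.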
Multiple Pretext-Task for Self-Supervised Learning via Mixing Multiple Image Transformations
Title | Multiple Pretext-Task for Self-Supervised Learning via Mixing Multiple Image Transformations |
Authors | Shin’ya Yamaguchi, Sekitoshi Kanai, Tetsuya Shioda, Shoichiro Takeda |
Abstract | Self-supervised learning is one of the most promising approaches to learning representations that capture semantic features in images without any manual annotation cost. To learn useful representations, a self-supervised model solves a pretext-task, which is defined by the data itself. Among a number of pretext-tasks, the rotation prediction task (Rotation) achieves better representations for solving various target tasks despite the simplicity of its implementation. However, we found that Rotation can fail to capture semantic features related to image textures and colors. To tackle this problem, we introduce a learning technique called multiple pretext-task for self-supervised learning (MP-SSL), which solves multiple pretext-tasks simultaneously, in addition to Rotation. In order to capture features of textures and colors, we employ image-enhancement transformations (e.g., sharpening and solarizing) as the additional pretext-tasks. MP-SSL efficiently trains a model by leveraging a Frank-Wolfe-based multi-task training algorithm. Our experimental results show that MP-SSL models outperform Rotation on multiple standard benchmarks and achieve state-of-the-art performance on Places-205. |
Tasks | |
Published | 2019-12-25 |
URL | https://arxiv.org/abs/1912.11603v1 |
https://arxiv.org/pdf/1912.11603v1.pdf | |
PWC | https://paperswithcode.com/paper/multiple-pretext-task-for-self-supervised |
Repo | |
Framework | |
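Structurally, MP-SSL is a shared encoder with one head per pretext-task. The sketch below simply sums two pretext losses with equal weights; the paper instead balances the tasks with a Frank-Wolfe-based multi-task algorithm, and the toy encoder, heads, and "solarize" stand-in here are ours.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128), nn.ReLU())
rot_head = nn.Linear(128, 4)   # which of {0, 90, 180, 270} deg was applied
enh_head = nn.Linear(128, 2)   # was an enhancement applied or not

x = torch.rand(8, 3, 32, 32)
rot_y = torch.randint(0, 4, (8,))
x_rot = torch.stack([torch.rot90(img, int(k), dims=(1, 2))
                     for img, k in zip(x, rot_y)])
enh_y = torch.randint(0, 2, (8,))
x_in = torch.where(enh_y.view(8, 1, 1, 1).bool(), 1.0 - x_rot, x_rot)  # toy "solarize"

z = encoder(x_in)              # one forward pass serves every pretext head
loss = F.cross_entropy(rot_head(z), rot_y) + F.cross_entropy(enh_head(z), enh_y)
loss.backward()
```

The enhancement-prediction head is what forces the shared features to encode texture and color information that rotation prediction alone can ignore.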
TextCohesion: Detecting Text for Arbitrary Shapes
Title | TextCohesion: Detecting Text for Arbitrary Shapes |
Authors | Weijia Wu, Jici Xing, Hong Zhou |
Abstract | In this paper, we propose a pixel-wise method named TextCohesion for scene text detection, which splits a text instance into five key components: a Text Skeleton and four Directional Pixel Regions. These components are easier to handle than the entire text instance. A confidence scoring mechanism is designed to filter out components that merely resemble text. Our method can integrate text contexts intensively when backgrounds are complex. Experiments on two challenging curved-text benchmarks demonstrate that TextCohesion outperforms state-of-the-art methods, achieving F-measures of 84.6% on Total-Text and 86.3% on SCUT-CTW1500. |
Tasks | Curved Text Detection, Scene Text Detection |
Published | 2019-04-22 |
URL | http://arxiv.org/abs/1904.12640v2 |
http://arxiv.org/pdf/1904.12640v2.pdf | |
PWC | https://paperswithcode.com/paper/190412640 |
Repo | |
Framework | |
Deep Gradient Boosting – Layer-wise Input Normalization of Neural Networks
Title | Deep Gradient Boosting – Layer-wise Input Normalization of Neural Networks |
Authors | Erhan Bilal |
Abstract | Stochastic gradient descent (SGD) has been the dominant optimization method for training deep neural networks due to its many desirable properties. One of the more remarkable and least understood qualities of SGD is that it generalizes relatively well on unseen data even when the neural network has millions of parameters. We hypothesize that in certain cases it is desirable to relax its intrinsic generalization properties, and we introduce an extension of SGD called deep gradient boosting (DGB). The key idea of DGB is that the back-propagated gradients obtained via the chain rule can be viewed as pseudo-residual targets of a gradient boosting problem. Thus, at each layer of a neural network, the weight update is calculated by solving the corresponding boosting problem using a linear base learner. The resulting weight update formula can also be viewed as a normalization procedure applied to the data that arrives at each layer during the forward pass. When implemented as a separate input normalization layer (INN), the new architecture shows improved performance on image recognition tasks compared to the same architecture without normalization layers. As opposed to batch normalization (BN), INN has no learnable parameters; nevertheless, it matches BN's performance on the CIFAR10 and ImageNet classification tasks. |
Tasks | |
Published | 2019-07-29 |
URL | https://arxiv.org/abs/1907.12608v2 |
https://arxiv.org/pdf/1907.12608v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-gradient-boosting |
Repo | |
Framework | |
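Since the paper frames the DGB weight update as a normalization of each layer's incoming data, the INN layer can be sketched as a parameter-free rescaling of each sample's feature vector before the linear map. This is our reading of the idea, not the paper's derived formula, which may normalize differently.

```python
import torch
import torch.nn as nn

class INNSketch(nn.Module):
    """Parameter-free input normalization sketch: rescale each sample's
    incoming feature vector by its L2 norm before the next linear layer
    (our reading of the paper's INN; it has nothing to learn)."""
    def forward(self, x):
        return x / (x.norm(dim=-1, keepdim=True) + 1e-8)

net = nn.Sequential(INNSketch(), nn.Linear(32, 10))
print(net(torch.randn(8, 32)).shape)   # torch.Size([8, 10])
```

Like BN, such a layer stabilizes the scale of activations reaching each weight matrix, but it does so per sample, with no running statistics or learnable affine parameters.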
An Efficient Explorative Sampling Considering the Generative Boundaries of Deep Generative Neural Networks
Title | An Efficient Explorative Sampling Considering the Generative Boundaries of Deep Generative Neural Networks |
Authors | Giyoung Jeon, Haedong Jeong, Jaesik Choi |
Abstract | Deep generative neural networks (DGNNs) have achieved realistic and high-quality data generation. In particular, the adversarial training scheme has been applied to many DGNNs and has exhibited powerful performance. Despite recent advances in generative networks, identifying the image generation mechanism still remains challenging. In this paper, we present an explorative sampling algorithm to analyze the generation mechanism of DGNNs. Our method efficiently obtains samples with identical attributes to a query image, from the perspective of the trained model. We define generative boundaries, which determine the activation of nodes in an internal layer, and probe inside the model with this information. To handle the large number of boundaries, we obtain an essential set of boundaries using optimization. By gathering samples within the region surrounded by the generative boundaries, we can empirically reveal the characteristics of the internal layers of DGNNs. We also demonstrate that our algorithm finds more homogeneous, model-specific samples than variants of the $\epsilon$-based sampling method. |
Tasks | Image Generation |
Published | 2019-12-12 |
URL | https://arxiv.org/abs/1912.05827v1 |
https://arxiv.org/pdf/1912.05827v1.pdf | |
PWC | https://paperswithcode.com/paper/an-efficient-explorative-sampling-considering |
Repo | |
Framework | |
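A generative boundary is the hyperplane where an internal unit's pre-activation crosses zero, so "same region" means "same ReLU on/off pattern". The toy below collects latent samples whose first-layer activation pattern matches a query's; the paper additionally reduces this to an essential set of boundaries via optimization, which this sketch skips.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
gen = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 16))  # toy DGNN

def pattern(z):
    """Side of each first-layer generative boundary (pre-ReLU sign) z falls on."""
    return gen[0](z) > 0

z_query = torch.randn(8)
candidates = z_query + 0.1 * torch.randn(200, 8)   # perturbations of the query
same_region = [z for z in candidates
               if torch.equal(pattern(z), pattern(z_query))]
print(len(same_region), "of 200 candidates share the query's region")
```

Inside one such region the layer acts linearly, which is why samples gathered there tend to share the query's attributes.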
Differentiable Mask Pruning for Neural Networks
Title | Differentiable Mask Pruning for Neural Networks |
Authors | Ramchalam Kinattinkara Ramakrishnan, Eyyüb Sari, Vahid Partovi Nia |
Abstract | Pruning is one of the well-known and promising model simplification techniques for neural networks. Most neural network models are large and require expensive computation to predict new instances, so compressing the network is imperative for deploying models on low-resource devices. Most compression techniques, especially pruning, have focused on computer vision and convolutional neural networks. Existing techniques are complex and require multi-stage optimization and fine-tuning to recover state-of-the-art accuracy. We introduce Differentiable Mask Pruning (DMP), which simplifies the network while training and can be used to induce sparsity at the weight, filter, node or sub-network level. Our method achieves competitive results on standard vision and NLP benchmarks and is easy to integrate within the deep learning toolbox. DMP bridges the gap between neural model compression and differentiable neural architecture search. |
Tasks | Model Compression, Neural Architecture Search |
Published | 2019-09-10 |
URL | https://arxiv.org/abs/1909.04567v1 |
https://arxiv.org/pdf/1909.04567v1.pdf | |
PWC | https://paperswithcode.com/paper/differentiable-mask-pruning-for-neural |
Repo | |
Framework | |
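One common way to make a pruning mask differentiable, in the spirit of DMP, is a soft gate on each weight plus a sparsity penalty trained jointly with the task loss. The parameterisation below (sigmoid gates at weight granularity, a mean-gate penalty) is our sketch, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn as nn

class MaskedLinear(nn.Module):
    """Sketch of a differentiable pruning mask at weight granularity: a
    soft gate m = sigmoid(s) multiplies each weight, and a penalty on m
    pushes gates toward zero during training. After training, gates below
    a threshold would be snapped to exact zeros to realise the pruning."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.scores = nn.Parameter(torch.zeros(out_features, in_features))  # mask logits
    def forward(self, x):
        return x @ (self.weight * torch.sigmoid(self.scores)).t()
    def sparsity_penalty(self):
        return torch.sigmoid(self.scores).mean()

layer = MaskedLinear(32, 10)
out = layer(torch.randn(4, 32))
loss = out.pow(2).mean() + 1e-2 * layer.sparsity_penalty()
loss.backward()    # both weights and mask logits receive gradients
```

Because the gates are ordinary parameters, the same mechanism can be applied per filter, node, or sub-network by changing the shape of `scores`, which is the sense in which such masking shades into differentiable architecture search.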