Paper Group AWR 19
Post-Training Piecewise Linear Quantization for Deep Neural Networks. Camera Trace Erasing. Multi-organ Segmentation over Partially Labeled Datasets with Multi-scale Feature Abstraction. Self-Supervised Log Parsing. Deep Learning Algorithms for Rotating Machinery Intelligent Diagnosis: An Open Source Benchmark Study. Adaptive binarization based on …
Post-Training Piecewise Linear Quantization for Deep Neural Networks
Title | Post-Training Piecewise Linear Quantization for Deep Neural Networks |
Authors | Jun Fang, Ali Shafiee, Hamzah Abdel-Aziz, David Thorsley, Georgios Georgiadis, Joseph Hassoun |
Abstract | Quantization plays an important role in the energy-efficient deployment of deep neural networks on resource-limited devices. Post-training quantization is highly desirable since it does not require retraining or access to the full training dataset. The well-established uniform scheme for post-training quantization achieves satisfactory results by converting neural networks from full-precision to 8-bit fixed-point integers. However, it suffers from significant performance degradation when quantizing to lower bit-widths. In this paper, we propose a piecewise linear quantization (PWLQ) scheme to enable accurate approximation for tensor values that have bell-shaped distributions with long tails. Our approach breaks the entire quantization range into non-overlapping regions for each tensor, with each region being assigned an equal number of quantization levels. Optimal breakpoints that divide the entire range are found by minimizing the quantization error. Compared to state-of-the-art post-training quantization methods, experimental results show that our proposed method achieves superior performance on image classification, semantic segmentation, and object detection with minor overhead. |
Tasks | Image Classification, Object Detection, Quantization, Semantic Segmentation |
Published | 2020-01-31 |
URL | https://arxiv.org/abs/2002.00104v2 |
https://arxiv.org/pdf/2002.00104v2.pdf | |
PWC | https://paperswithcode.com/paper/near-lossless-post-training-quantization-of |
Repo | https://github.com/jakc4103/piecewise-quantization |
Framework | pytorch |
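Note: the abstract's idea of splitting the quantization range at a breakpoint and giving each region the same number of levels can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the synthetic Laplace-like data, the two-region layout (a centre and a tail region whose levels are split between both signs), and the grid search over breakpoints are all hypothetical choices made for the example.

```python
import random


def uniform_quantize(x, lo, hi, levels):
    """Clamp x to [lo, hi] and round it to the nearest of `levels` evenly spaced points."""
    x = min(max(x, lo), hi)
    step = (hi - lo) / (levels - 1)
    return lo + round((x - lo) / step) * step


def piecewise_quantize(x, p, m, levels):
    """Two-region quantizer on [-m, m] with breakpoint p (0 < p < m).

    Assumption for this sketch: the centre [-p, p] gets `levels` points and the
    tail region gets `levels` points too, split evenly between both signs.
    """
    if abs(x) <= p:
        return uniform_quantize(x, -p, p, levels)
    magnitude = uniform_quantize(abs(x), p, m, levels // 2)
    return magnitude if x > 0 else -magnitude


def mse(values, quantizer):
    """Mean squared quantization error of `quantizer` over `values`."""
    return sum((v - quantizer(v)) ** 2 for v in values) / len(values)


def best_breakpoint(values, m, levels, grid=20):
    """Hypothetical grid search: try p = m * k / grid and keep the lowest-error one."""
    candidates = [m * k / grid for k in range(1, grid)]
    return min(
        candidates,
        key=lambda p: mse(values, lambda v: piecewise_quantize(v, p, m, levels)),
    )


random.seed(0)
# Synthetic long-tailed "weights": a random sign times an exponential draw.
weights = [random.choice((-1, 1)) * random.expovariate(1.0) for _ in range(2000)]
m = max(abs(w) for w in weights)
levels = 8  # points per region, so 2 * levels points in total

p = best_breakpoint(weights, m, levels)
pwl_error = mse(weights, lambda v: piecewise_quantize(v, p, m, levels))
uniform_error = mse(weights, lambda v: uniform_quantize(v, -m, m, 2 * levels))
print(f"breakpoint={p:.3f} piecewise MSE={pwl_error:.5f} uniform MSE={uniform_error:.5f}")
```

Both quantizers use the same total number of points, so the two printed errors are directly comparable. Which one is lower depends on the data and the bit-width; the sketch makes no claim about the paper's reported results.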
Camera Trace Erasing
Title | Camera Trace Erasing |
Authors | Chang Chen, Zhiwei Xiong, Xiaoming Liu, Feng Wu |
Abstract | Camera trace is a unique noise produced in the digital imaging process. Most existing forensic methods analyze camera trace to identify image origins. In this paper, we address a new low-level vision problem, camera trace erasing, to reveal the weakness of trace-based forensic methods. A comprehensive investigation of existing anti-forensic methods reveals that it is non-trivial to effectively erase camera trace while avoiding the destruction of content signal. To reconcile these two demands, we propose Siamese Trace Erasing (SiamTE), in which a novel hybrid loss is designed on the basis of a Siamese architecture for network training. Specifically, we propose embedded similarity, truncated fidelity, and cross identity to form the hybrid loss. Compared with existing anti-forensic methods, SiamTE has a clear advantage for camera trace erasing, which is demonstrated in three representative tasks. Code and dataset are available at https://github.com/ngchc/CameraTE. |
Tasks | |
Published | 2020-03-16 |
URL | https://arxiv.org/abs/2003.06951v1 |
https://arxiv.org/pdf/2003.06951v1.pdf | |
PWC | https://paperswithcode.com/paper/camera-trace-erasing |
Repo | https://github.com/ngchc/CameraTE |
Framework | pytorch |
Multi-organ Segmentation over Partially Labeled Datasets with Multi-scale Feature Abstraction
Title | Multi-organ Segmentation over Partially Labeled Datasets with Multi-scale Feature Abstraction |
Authors | Xi Fang, Pingkun Yan |
Abstract | This paper presents a unified training strategy that enables a novel multi-scale deep neural network to be trained on multiple partially labeled datasets for multi-organ segmentation. Multi-scale contextual information is effective for pixel-level label prediction, i.e. image segmentation. However, such important information is only partially exploited by the existing methods. In this paper, we propose a new network architecture for multi-scale feature abstraction, which integrates pyramid feature analysis into an image segmentation model. To bridge the semantic gap caused by directly merging features from different scales, an equal convolutional depth mechanism is proposed. In addition, we develop a deep supervision mechanism for refining outputs in different scales. To fully leverage the segmentation features from different scales, we design an adaptive weighting layer to fuse the outputs in an automatic fashion. All these features together integrate into a pyramid-input pyramid-output network for efficient feature extraction. Last but not least, to alleviate the hunger for fully annotated data in training deep segmentation models, a unified training strategy is proposed to train one segmentation model on multiple partially labeled datasets for multi-organ segmentation with a novel target adaptive loss. Our proposed method was evaluated on four publicly available datasets, including BTCV, LiTS, KiTS and Spleen, where very promising performance has been achieved. The source code of this work is publicly shared at https://github.com/DIAL-RPI/PIPO-FAN for others to easily reproduce the work and build their own models with the introduced mechanisms. |
Tasks | Semantic Segmentation |
Published | 2020-01-01 |
URL | https://arxiv.org/abs/2001.00208v1 |
https://arxiv.org/pdf/2001.00208v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-organ-segmentation-over-partially |
Repo | https://github.com/DIAL-RPI/PIPO-FAN |
Framework | pytorch |
Self-Supervised Log Parsing
Title | Self-Supervised Log Parsing |
Authors | Sasho Nedelkoski, Jasmin Bogatinovski, Alexander Acker, Jorge Cardoso, Odej Kao |
Abstract | Logs are extensively used during the development and maintenance of software systems. They collect runtime events and allow tracking of code execution, which enables a variety of critical tasks such as troubleshooting and fault detection. However, large-scale software systems generate massive volumes of semi-structured log records, posing a major challenge for automated analysis. Parsing semi-structured records with free-form text log messages into structured templates is the first and crucial step that enables further analysis. Existing approaches rely on log-specific heuristics or manual rule extraction. These are often specialized in parsing certain log types, which limits their performance scores and generalization. We propose a novel parsing technique called NuLog that utilizes a self-supervised learning model and formulates the parsing task as masked language modeling (MLM). In the process of parsing, the model extracts summarizations from the logs in the form of a vector embedding. This allows the coupling of the MLM as pre-training with a downstream anomaly detection task. We evaluate the parsing performance of NuLog on 10 real-world log datasets and compare the results with 12 parsing techniques. The results show that NuLog outperforms existing methods in parsing accuracy with an average of 99% and achieves the lowest edit distance to the ground truth templates. Additionally, two case studies are conducted to demonstrate the ability of the approach for log-based anomaly detection in both supervised and unsupervised scenarios. The results show that NuLog can be successfully used to support troubleshooting tasks. The implementation is available at https://github.com/nulog/nulog. |
Tasks | Anomaly Detection, Fault Detection, Language Modelling |
Published | 2020-03-17 |
URL | https://arxiv.org/abs/2003.07905v1 |
https://arxiv.org/pdf/2003.07905v1.pdf | |
PWC | https://paperswithcode.com/paper/self-supervised-log-parsing |
Repo | https://github.com/nulog/nulog |
Framework | none |
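Note: formulating log parsing as masked language modeling means hiding a token of a log message and asking a model to predict it from the rest. The sketch below only shows how such training pairs could be built. It is an illustration under assumptions, not NuLog's code: whitespace tokenization, the `<MASK>` symbol, the example log line, and masking every position in turn (rather than a random subset) are all hypothetical choices.

```python
def masked_examples(message, mask="<MASK>"):
    """Build one (masked_tokens, position, target_token) triple per token.

    Assumes whitespace tokenization; a real parser may split differently.
    """
    tokens = message.split()
    examples = []
    for i, token in enumerate(tokens):
        masked = tokens[:i] + [mask] + tokens[i + 1:]
        examples.append((masked, i, token))
    return examples


# Hypothetical log line, used only for illustration.
for masked, position, target in masked_examples("Deleting block blk_1 from disk"):
    print(" ".join(masked), "->", target)
```

Intuitively, tokens that are easy to predict from their context behave like the constant part of a template, while hard-to-predict tokens behave like parameters. How the model turns predictions into templates is not specified in the abstract and is not modeled here.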
Deep Learning Algorithms for Rotating Machinery Intelligent Diagnosis: An Open Source Benchmark Study
Title | Deep Learning Algorithms for Rotating Machinery Intelligent Diagnosis: An Open Source Benchmark Study |
Authors | Zhibin Zhao, Tianfu Li, Jingyao Wu, Chuang Sun, Shibin Wang, Ruqiang Yan, Xuefeng Chen |
Abstract | With the development of artificial intelligence and deep learning (DL) techniques, rotating machinery intelligent diagnosis has gone through tremendous progress with verified success and the classification accuracies of many DL-based intelligent diagnosis algorithms are tending to 100%. However, different datasets, configurations, and hyper-parameters are often recommended to be used in performance verification for different types of models, and few open source codes are made public for evaluation and comparisons. Therefore, unfair comparisons and ineffective improvement may exist in rotating machinery intelligent diagnosis, which limits the advancement of this field. To address these issues, we perform an extensive evaluation of four kinds of models with various datasets to provide a benchmark study within the same framework. In this paper, we first gather most of the publicly available datasets and give the complete benchmark study of DL-based intelligent algorithms under two data split strategies, five input formats, three normalization methods, and four augmentation methods. Second, we integrate the whole evaluation codes into a code library and release this code library to the public for better development of this field. Third, we use the specific-designed cases to point out the existing issues, including class imbalance, generalization ability, interpretability, few-shot learning, and model selection. By these works, we release a unified code framework for comparing and testing models fairly and quickly, emphasize the importance of open source codes, provide the baseline accuracy (a lower bound) to avoid useless improvement, and discuss potential future directions in this field. The code library is available at https://github.com/ZhaoZhibin/DL-based-Intelligent-Diagnosis-Benchmark. |
Tasks | Few-Shot Learning, Model Selection |
Published | 2020-03-06 |
URL | https://arxiv.org/abs/2003.03315v1 |
https://arxiv.org/pdf/2003.03315v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-algorithms-for-rotating |
Repo | https://github.com/ZhaoZhibin/DL-based-Intelligent-Diagnosis-Benchmark |
Framework | pytorch |
Adaptive binarization based on fuzzy integrals
Title | Adaptive binarization based on fuzzy integrals |
Authors | Francesco Bardozzo, Borja De La Osa, Lubomira Horanska, Javier Fumanal-Idocin, Mattia delli Priscoli, Luigi Troiano, Roberto Tagliaferri, Javier Fernandez, Humberto Bustince |
Abstract | Adaptive binarization methodologies threshold the intensity of the pixels with respect to adjacent pixels by exploiting integral images. In turn, the integral images are generally computed optimally using the summed-area-table algorithm (SAT). This document presents a new adaptive binarization technique based on fuzzy integral images through an efficient design of a modified SAT for fuzzy integrals. We define this new methodology as FLAT (Fuzzy Local Adaptive Thresholding). The experimental results show that the proposed methodology often produces better thresholding quality than traditional algorithms and saliency neural networks. We propose a new generalization of the Sugeno and CF 1,2 integrals to improve existing results with an efficient integral image computation. Therefore, these new generalized fuzzy integrals can be used as a tool for grayscale processing in real-time and deep-learning applications. Index Terms: Image Thresholding, Image Processing, Fuzzy Integrals, Aggregation Functions |
Tasks | |
Published | 2020-03-04 |
URL | https://arxiv.org/abs/2003.08755v1 |
https://arxiv.org/pdf/2003.08755v1.pdf | |
PWC | https://paperswithcode.com/paper/adaptive-binarization-based-on-fuzzy |
Repo | https://github.com/lodeguns/FuzzyAdaptiveBinarization |
Framework | none |
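Note: the abstract builds on the classical pipeline of integral images computed with a summed-area table (SAT), followed by thresholding each pixel against its neighbourhood. The sketch below shows that classical baseline only, using a plain local mean as the aggregation. It is not FLAT and contains no fuzzy integral; the function names, the `radius` and `ratio` parameters, and the comparison rule are assumptions for illustration.

```python
def summed_area_table(img):
    """Return a (h+1) x (w+1) table where sat[y][x] is the sum of img[:y][:x]."""
    h, w = len(img), len(img[0])
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def window_sum(sat, y0, x0, y1, x1):
    """Sum of the pixels in rows y0..y1-1 and columns x0..x1-1, in O(1)."""
    return sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]


def adaptive_binarize(img, radius=1, ratio=0.9):
    """Set a pixel to 1 if it is brighter than `ratio` times its local mean."""
    h, w = len(img), len(img[0])
    sat = summed_area_table(img)
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Clip the window at the image border.
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            area = (y1 - y0) * (x1 - x0)
            local_mean = window_sum(sat, y0, x0, y1, x1) / area
            row.append(1 if img[y][x] > ratio * local_mean else 0)
        out.append(row)
    return out


image = [
    [10, 10, 10],
    [10, 0, 10],
    [10, 10, 10],
]
print(adaptive_binarize(image))  # → [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
```

The table is built once in a single pass, after which every window sum costs four lookups regardless of the window size. That property is what makes local thresholding cheap enough for real-time use.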
Predictive Coding for Locally-Linear Control
Title | Predictive Coding for Locally-Linear Control |
Authors | Rui Shu, Tung Nguyen, Yinlam Chow, Tuan Pham, Khoat Than, Mohammad Ghavamzadeh, Stefano Ermon, Hung H. Bui |
Abstract | High-dimensional observations and unknown dynamics are major challenges when applying optimal control to many real-world decision making tasks. The Learning Controllable Embedding (LCE) framework addresses these challenges by embedding the observations into a lower dimensional latent space, estimating the latent dynamics, and then performing control directly in the latent space. To ensure the learned latent dynamics are predictive of next-observations, all existing LCE approaches decode back into the observation space and explicitly perform next-observation prediction—a challenging high-dimensional task that furthermore introduces a large number of nuisance parameters (i.e., the decoder) which are discarded during control. In this paper, we propose a novel information-theoretic LCE approach and show theoretically that explicit next-observation prediction can be replaced with predictive coding. We then use predictive coding to develop a decoder-free LCE model whose latent dynamics are amenable to locally-linear control. Extensive experiments on benchmark tasks show that our model reliably learns a controllable latent space that leads to superior performance when compared with state-of-the-art LCE baselines. |
Tasks | Decision Making |
Published | 2020-03-02 |
URL | https://arxiv.org/abs/2003.01086v1 |
https://arxiv.org/pdf/2003.01086v1.pdf | |
PWC | https://paperswithcode.com/paper/predictive-coding-for-locally-linear-control |
Repo | https://github.com/VinAIResearch/PC3-pytorch |
Framework | pytorch |
OS2D: One-Stage One-Shot Object Detection by Matching Anchor Features
Title | OS2D: One-Stage One-Shot Object Detection by Matching Anchor Features |
Authors | Anton Osokin, Denis Sumin, Vasily Lomakin |
Abstract | In this paper, we consider the task of one-shot object detection, which consists of detecting objects defined by a single demonstration. Unlike standard object detection, the classes of objects used for training and testing do not overlap. We build a one-stage system that performs localization and recognition jointly. We use dense correlation matching of learned local features to find correspondences, a feed-forward geometric transformation model to align features, and bilinear resampling of the correlation tensor to compute the detection score of the aligned features. All the components are differentiable, which allows end-to-end training. Experimental evaluation on several challenging domains (retail products, 3D objects, buildings and logos) shows that our method can detect unseen classes (e.g., toothpaste when trained on groceries) and outperforms several baselines by a significant margin. Our code is available online: https://github.com/aosokin/os2d. |
Tasks | Object Detection, One-Shot Object Detection |
Published | 2020-03-15 |
URL | https://arxiv.org/abs/2003.06800v1 |
https://arxiv.org/pdf/2003.06800v1.pdf | |
PWC | https://paperswithcode.com/paper/os2d-one-stage-one-shot-object-detection-by |
Repo | https://github.com/aosokin/os2d |
Framework | pytorch |
Identification of Non-Linear RF Systems Using Backpropagation
Title | Identification of Non-Linear RF Systems Using Backpropagation |
Authors | Andreas Toftegaard Kristensen, Andreas Burg, Alexios Balatsoukas-Stimming |
Abstract | In this work, we use deep unfolding to view cascaded non-linear RF systems as model-based neural networks. This view enables the direct use of a wide range of neural network tools and optimizers to efficiently identify such cascaded models. We demonstrate the effectiveness of this approach through the example of digital self-interference cancellation in full-duplex communications where an IQ imbalance model and a non-linear PA model are cascaded in series. For a self-interference cancellation performance of approximately 44.5 dB, the number of model parameters can be reduced by 74% and the number of operations per sample can be reduced by 79% compared to an expanded linear-in-parameters polynomial model. |
Tasks | |
Published | 2020-01-27 |
URL | https://arxiv.org/abs/2001.09877v2 |
https://arxiv.org/pdf/2001.09877v2.pdf | |
PWC | https://paperswithcode.com/paper/identification-of-non-linear-rf-systems-using |
Repo | https://github.com/A-T-Kristensen/rf_unfolding |
Framework | tf |
Probably Approximately Correct Vision-Based Planning using Motion Primitives
Title | Probably Approximately Correct Vision-Based Planning using Motion Primitives |
Authors | Sushant Veer, Anirudha Majumdar |
Abstract | This paper presents a deep reinforcement learning approach for synthesizing vision-based planners that provably generalize to novel environments (i.e., environments unseen during training). We leverage the Probably Approximately Correct (PAC)-Bayes framework to obtain an upper bound on the expected cost of policies across all environments. Minimizing the PAC-Bayes upper bound thus trains policies that are accompanied by a certificate of performance on novel environments. The training pipeline we propose provides strong generalization guarantees for deep neural network policies by (a) obtaining a good prior distribution on the space of policies using Evolutionary Strategies (ES) followed by (b) formulating the PAC-Bayes optimization as an efficiently-solvable parametric convex optimization problem. We demonstrate the efficacy of our approach for producing strong generalization guarantees for learned vision-based motion planners through two simulated examples: (1) an Unmanned Aerial Vehicle (UAV) navigating obstacle fields with an onboard vision sensor, and (2) a dynamic quadrupedal robot traversing rough terrains with proprioceptive and exteroceptive sensors. |
Tasks | |
Published | 2020-02-28 |
URL | https://arxiv.org/abs/2002.12852v1 |
https://arxiv.org/pdf/2002.12852v1.pdf | |
PWC | https://paperswithcode.com/paper/probably-approximately-correct-vision-based |
Repo | https://github.com/irom-lab/PAC-Vision-Planning |
Framework | pytorch |
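Note: a PAC-Bayes upper bound of the kind the abstract mentions adds a complexity term to the empirical cost, and that term grows with the KL divergence between the learned policy distribution and the prior. The sketch below evaluates one widely used form (the Maurer-style bound relaxed via Pinsker's inequality) for costs in [0, 1]. Whether the paper optimises exactly this form is an assumption; the function and argument names are hypothetical.

```python
import math


def pac_bayes_bound(empirical_cost, kl, n, delta):
    """Upper bound on expected cost that holds with probability at least 1 - delta.

    empirical_cost: average cost over the n training environments, in [0, 1]
    kl:             KL divergence between the posterior and the prior over policies
    n:              number of training environments
    delta:          allowed failure probability, in (0, 1)
    """
    complexity = (kl + math.log(2.0 * math.sqrt(n) / delta)) / (2.0 * n)
    return empirical_cost + math.sqrt(complexity)


for n in (1_000, 10_000):
    print(n, round(pac_bayes_bound(0.1, kl=1.0, n=n, delta=0.01), 4))
```

The bound tightens as the number of training environments grows and loosens as the posterior moves away from the prior. This is why a good prior, such as the one the abstract obtains with Evolutionary Strategies, matters for the final certificate.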
Stable behaviour of infinitely wide deep neural networks
Title | Stable behaviour of infinitely wide deep neural networks |
Authors | Stefano Favaro, Sandra Fortini, Stefano Peluchetti |
Abstract | We consider fully connected feed-forward deep neural networks (NNs) where weights and biases are independent and identically distributed as symmetric centered stable distributions. Then, we show that the infinitely wide limit of the NN, under suitable scaling on the weights, is a stochastic process whose finite-dimensional distributions are multivariate stable distributions. The limiting process is referred to as the stable process, and it generalizes the class of Gaussian processes recently obtained as infinitely wide limits of NNs (Matthews et al., 2018b). Parameters of the stable process can be computed via an explicit recursion over the layers of the network. Our result contributes to the theory of fully connected feed-forward deep NNs, and it paves the way to expand recent lines of research that rely on Gaussian infinitely wide limits. |
Tasks | Gaussian Processes |
Published | 2020-03-01 |
URL | https://arxiv.org/abs/2003.00394v1 |
https://arxiv.org/pdf/2003.00394v1.pdf | |
PWC | https://paperswithcode.com/paper/stable-behaviour-of-infinitely-wide-deep |
Repo | https://github.com/stepelu/deep-stable |
Framework | none |
ORCSolver: An Efficient Solver for Adaptive GUI Layout with OR-Constraints
Title | ORCSolver: An Efficient Solver for Adaptive GUI Layout with OR-Constraints |
Authors | Yue Jiang, Wolfgang Stuerzlinger, Matthias Zwicker, Christof Lutteroth |
Abstract | OR-constrained (ORC) graphical user interface layouts unify conventional constraint-based layouts with flow layouts, which enables the definition of flexible layouts that adapt to screens with different sizes, orientations, or aspect ratios with only a single layout specification. Unfortunately, solving ORC layouts with current solvers is time-consuming and the needed time increases exponentially with the number of widgets and constraints. To address this challenge, we propose ORCSolver, a novel solving technique for adaptive ORC layouts, based on a branch-and-bound approach with heuristic preprocessing. We demonstrate that ORCSolver simplifies ORC specifications at runtime and our approach can solve ORC layout specifications efficiently at near-interactive rates. |
Tasks | |
Published | 2020-02-23 |
URL | https://arxiv.org/abs/2002.09925v1 |
https://arxiv.org/pdf/2002.09925v1.pdf | |
PWC | https://paperswithcode.com/paper/orcsolver-an-efficient-solver-for-adaptive |
Repo | https://github.com/YueJiang-nj/ORCSolver-CHI2020 |
Framework | none |
Semantically Multi-modal Image Synthesis
Title | Semantically Multi-modal Image Synthesis |
Authors | Zhen Zhu, Zhiliang Xu, Ansheng You, Xiang Bai |
Abstract | In this paper, we focus on the semantically multi-modal image synthesis (SMIS) task, namely, generating multi-modal images at the semantic level. Previous work seeks to use multiple class-specific generators, which restricts its use to datasets with a small number of classes. We instead propose a novel Group Decreasing Network (GroupDNet) that leverages group convolutions in the generator and progressively decreases the group numbers of the convolutions in the decoder. Consequently, GroupDNet offers much more controllability in translating semantic labels to natural images and yields plausible, high-quality results for datasets with many classes. Experiments on several challenging datasets demonstrate the superiority of GroupDNet in performing the SMIS task. We also show that GroupDNet is capable of performing a wide range of interesting synthesis applications. Codes and models are available at: https://github.com/Seanseattle/SMIS. |
Tasks | Image Generation |
Published | 2020-03-28 |
URL | https://arxiv.org/abs/2003.12697v2 |
https://arxiv.org/pdf/2003.12697v2.pdf | |
PWC | https://paperswithcode.com/paper/semantically-mutil-modal-image-synthesis |
Repo | https://github.com/Seanseattle/SMIS |
Framework | none |
ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training
Title | ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training |
Authors | Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang, Ming Zhou |
Abstract | In this paper, we present a new sequence-to-sequence pre-training model called ProphetNet, which introduces a novel self-supervised objective named future n-gram prediction and the proposed n-stream self-attention mechanism. Instead of optimizing one-step-ahead prediction as in traditional sequence-to-sequence models, ProphetNet is optimized by n-step-ahead prediction, which predicts the next n tokens simultaneously based on previous context tokens at each time step. The future n-gram prediction explicitly encourages the model to plan for the future tokens and prevents overfitting on strong local correlations. We pre-train ProphetNet using a base scale dataset (16GB) and a large scale dataset (160GB) respectively. Then we conduct experiments on CNN/DailyMail, Gigaword, and SQuAD 1.1 benchmarks for abstractive summarization and question generation tasks. Experimental results show that ProphetNet achieves new state-of-the-art results on all these datasets compared to the models using the same scale pre-training corpus. |
Tasks | Abstractive Text Summarization, Question Generation, Text Summarization |
Published | 2020-01-13 |
URL | https://arxiv.org/abs/2001.04063v2 |
https://arxiv.org/pdf/2001.04063v2.pdf | |
PWC | https://paperswithcode.com/paper/prophetnet-predicting-future-n-gram-for |
Repo | https://github.com/microsoft/ProphetNet |
Framework | pytorch |
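Note: the difference between one-step-ahead and n-step-ahead prediction described in the abstract comes down to what the target is at each time step. The sketch below builds those targets for a toy token sequence. It is only an illustration of the objective's shape: the function name, the padding symbol, and the toy sentence are assumptions, and no model, attention mechanism, or loss weighting is shown.

```python
def future_ngram_targets(tokens, n, pad="<pad>"):
    """Pair each context tokens[:t] with the next n tokens to be predicted.

    With n = 1 this reduces to ordinary next-token prediction.
    """
    pairs = []
    for t in range(len(tokens)):
        window = tokens[t:t + n]
        window = window + [pad] * (n - len(window))  # pad past the sequence end
        pairs.append((tokens[:t], window))
    return pairs


sentence = ["the", "cat", "sat", "down"]
for context, targets in future_ngram_targets(sentence, n=2):
    print(context, "->", targets)
```

At every step the model is asked for the immediate next token and the one after it, which is the sense in which the objective pushes it to plan beyond the next token.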
A Two-Stream Symmetric Network with Bidirectional Ensemble for Aerial Image Matching
Title | A Two-Stream Symmetric Network with Bidirectional Ensemble for Aerial Image Matching |
Authors | Jae-Hyun Park, Woo-Jeoung Nam, Seong-Whan Lee |
Abstract | In this paper, we propose a novel method to precisely match two aerial images that were obtained in different environments via a two-stream deep network. By internally augmenting the target image, the network considers the two-stream with the three input images and reflects the additional augmented pair in the training. As a result, the training process of the deep network is regularized and the network becomes robust for the variance of aerial images. Furthermore, we introduce an ensemble method that is based on the bidirectional network, which is motivated by the isomorphic nature of the geometric transformation. We obtain two global transformation parameters without any additional network or parameters, which alleviate asymmetric matching results and enable significant improvement in performance by fusing two outcomes. For the experiment, we adopt aerial images from Google Earth and the International Society for Photogrammetry and Remote Sensing (ISPRS). To quantitatively assess our result, we apply the probability of correct keypoints (PCK) metric, which measures the degree of matching. The qualitative and quantitative results show the sizable gap of performance compared to the conventional methods for matching the aerial images. All code and our trained model, as well as the dataset are available online. |
Tasks | |
Published | 2020-02-04 |
URL | https://arxiv.org/abs/2002.01325v1 |
https://arxiv.org/pdf/2002.01325v1.pdf | |
PWC | https://paperswithcode.com/paper/a-two-stream-symmetric-network-with |
Repo | https://github.com/jaehyunnn/DeepAerialMatching |
Framework | pytorch |
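Note: the probability of correct keypoints (PCK) metric named in the abstract counts a keypoint as correct when its predicted location falls within a tolerance of the ground-truth location. The sketch below uses a common convention in which the tolerance is a fraction `alpha` of the larger image side. That normalisation, the default `alpha`, and the function name are assumptions, since the abstract does not state them.

```python
import math


def pck(predicted, target, image_size, alpha=0.05):
    """Fraction of keypoints whose error is at most alpha * max(height, width).

    predicted, target: equal-length lists of (x, y) keypoint coordinates
    image_size:        (height, width) of the target image
    """
    height, width = image_size
    threshold = alpha * max(height, width)
    correct = sum(
        1
        for (px, py), (tx, ty) in zip(predicted, target)
        if math.hypot(px - tx, py - ty) <= threshold
    )
    return correct / len(target)


# Hypothetical keypoints: errors of 2, 20 and 3 pixels against a 12-pixel tolerance.
warped = [(10, 10), (50, 50), (100, 100)]
ground_truth = [(12, 10), (50, 70), (100, 103)]
print(pck(warped, ground_truth, image_size=(240, 240)))
```

In the example, two of the three keypoints fall within the tolerance, so the score is two thirds. Reporting PCK at several `alpha` values shows how quickly the matching degrades as the tolerance tightens.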