Paper Group ANR 580
Robust Natural Language Inference Models with Example Forgetting. SpoC: Spoofing Camera Fingerprints. Stiffness: A New Perspective on Generalization in Neural Networks. Probabilistic Neural-symbolic Models for Interpretable Visual Question Answering. Merging-ISP: Multi-Exposure High Dynamic Range Image Signal Processing. Sex Trafficking Detection w …
Robust Natural Language Inference Models with Example Forgetting
Title | Robust Natural Language Inference Models with Example Forgetting |
Authors | Yadollah Yaghoobzadeh, Remi Tachet, T. J. Hazen, Alessandro Sordoni |
Abstract | We investigate whether example forgetting, a recently introduced measure of hardness of examples, can be used to select training examples in order to increase robustness of natural language understanding models in a natural language inference task (MNLI). We analyze forgetting events for MNLI and provide evidence that forgettable examples under simpler models can be used to increase robustness of the recently proposed BERT model, measured by testing an MNLI-trained model on HANS, a curated test set that exhibits a shift in distribution compared to the MNLI test set. Moreover, we show that the “large” version of BERT is more robust than its “base” version, but that its robustness can still be improved with our approach. |
Tasks | Natural Language Inference |
Published | 2019-11-10 |
URL | https://arxiv.org/abs/1911.03861v1 |
https://arxiv.org/pdf/1911.03861v1.pdf | |
PWC | https://paperswithcode.com/paper/robust-natural-language-inference-models-with |
Repo | |
Framework | |
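The forgetting statistic used for example selection is straightforward to log during training. The sketch below is a minimal illustration, assuming a standard epoch/minibatch loop and a fixed integer index per training example; it is not the authors' released code, and the function names are ours.

```python
# Sketch: counting forgetting events per training example.
# Assumption: each example has a stable integer index and is visited repeatedly
# across epochs; this is an illustration, not the paper's implementation.
from collections import defaultdict

forgetting_events = defaultdict(int)    # example index -> number of forgetting events
previously_correct = defaultdict(bool)  # example index -> correct on its last visit?

def update_forgetting_stats(example_indices, predictions, labels):
    """Call once per minibatch with the dataset indices of its examples."""
    for idx, pred, label in zip(example_indices, predictions, labels):
        correct = (pred == label)
        # A forgetting event: correctly classified on the previous visit,
        # misclassified now.
        if previously_correct[idx] and not correct:
            forgetting_events[idx] += 1
        previously_correct[idx] = correct

def most_forgettable(k):
    """Indices of the k examples with the most forgetting events."""
    return sorted(forgetting_events, key=forgetting_events.get, reverse=True)[:k]
```

After training a simpler model, the most forgettable examples can then be used to fine-tune or re-weight training of a stronger model such as BERT, as the abstract describes.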
SpoC: Spoofing Camera Fingerprints
Title | SpoC: Spoofing Camera Fingerprints |
Authors | Davide Cozzolino, Justus Thies, Andreas Rössler, Matthias Nießner, Luisa Verdoliva |
Abstract | Thanks to the fast progress in synthetic media generation, creating realistic false images has become very easy. Such images can be used to wrap rich fake news with enhanced credibility, spawning a new wave of high-impact, high-risk misinformation campaigns. Therefore, there is a fast-growing interest in reliable detectors of manipulated media. The most powerful detectors, to date, rely on the subtle traces left by any device on all images acquired by it. In particular, due to proprietary in-camera processes, like demosaicing or compression, each camera model leaves trademark traces that can be exploited for forensic analyses. The absence or distortion of such traces in the target image is a strong hint of manipulation. In this paper, we challenge such detectors to gain better insight into their vulnerabilities. This is an important step towards building better forgery detectors able to withstand malicious attacks. Our proposal consists of a GAN-based approach that injects camera traces into synthetic images. Given a GAN-generated image, we insert the traces of a specific camera model into it and deceive state-of-the-art detectors into believing the image was acquired by that model. Likewise, we deceive independent detectors of synthetic GAN images into believing the image is real. Experiments prove the effectiveness of the proposed method in a wide array of conditions. Moreover, no prior information on the attacked detectors is needed, only sample images from the target camera. |
Tasks | Demosaicking |
Published | 2019-11-27 |
URL | https://arxiv.org/abs/1911.12069v1 |
https://arxiv.org/pdf/1911.12069v1.pdf | |
PWC | https://paperswithcode.com/paper/spoc-spoofing-camera-fingerprints |
Repo | |
Framework | |
Stiffness: A New Perspective on Generalization in Neural Networks
Title | Stiffness: A New Perspective on Generalization in Neural Networks |
Authors | Stanislav Fort, Paweł Krzysztof Nowak, Stanislaw Jastrzebski, Srini Narayanan |
Abstract | In this paper we develop a new perspective on generalization of neural networks by proposing and investigating the concept of a neural network stiffness. We measure how stiff a network is by looking at how a small gradient step in the network’s parameters on one example affects the loss on another example. Higher stiffness suggests that a network is learning features that generalize. In particular, we study how stiffness depends on 1) class membership, 2) distance between data points in the input space, 3) training iteration, and 4) learning rate. We present experiments on MNIST, FASHION MNIST, and CIFAR-10/100 using fully-connected and convolutional neural networks, as well as on a transformer-based NLP model. We demonstrate the connection between stiffness and generalization, and observe its dependence on learning rate. When training on CIFAR-100, the stiffness matrix exhibits a coarse-grained behavior indicative of the model’s awareness of super-class membership. In addition, we measure how stiffness between two data points depends on their mutual input-space distance, and establish the concept of a dynamical critical length – a distance below which a parameter update based on a data point influences its neighbors. |
Tasks | |
Published | 2019-01-28 |
URL | https://arxiv.org/abs/1901.09491v3 |
https://arxiv.org/pdf/1901.09491v3.pdf | |
PWC | https://paperswithcode.com/paper/stiffness-a-new-perspective-on-generalization |
Repo | |
Framework | |
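The stiffness notion above is often realized as the alignment between per-example loss gradients. The PyTorch sketch below computes the cosine-similarity variant under that assumption; it is an illustration of the measure, not the authors' code, and the function names are ours.

```python
# Sketch: stiffness between two examples as gradient alignment.
# Assumption: stiffness is measured via the cosine similarity of the two
# per-example loss gradients with respect to the parameters.
import torch
import torch.nn.functional as F

def per_example_grad(model, loss_fn, x, y):
    """Flattened gradient of the loss on a single example (x, y)."""
    loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
    grads = torch.autograd.grad(loss, [p for p in model.parameters() if p.requires_grad])
    return torch.cat([g.reshape(-1) for g in grads])

def stiffness(model, loss_fn, x1, y1, x2, y2):
    g1 = per_example_grad(model, loss_fn, x1, y1)
    g2 = per_example_grad(model, loss_fn, x2, y2)
    # Positive stiffness: a gradient step that helps example 1 also helps example 2.
    return F.cosine_similarity(g1, g2, dim=0).item()
```

Averaging this quantity over pairs grouped by class (or by input-space distance) gives the class-wise stiffness matrices and the distance profiles the abstract refers to.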
Probabilistic Neural-symbolic Models for Interpretable Visual Question Answering
Title | Probabilistic Neural-symbolic Models for Interpretable Visual Question Answering |
Authors | Ramakrishna Vedantam, Karan Desai, Stefan Lee, Marcus Rohrbach, Dhruv Batra, Devi Parikh |
Abstract | We propose a new class of probabilistic neural-symbolic models that have symbolic functional programs as a latent, stochastic variable. Instantiated in the context of visual question answering, our probabilistic formulation offers two key conceptual advantages over prior neural-symbolic models for VQA. Firstly, the programs generated by our model are more understandable while requiring fewer teaching examples. Secondly, we show that one can pose counterfactual scenarios to the model, to probe its beliefs on the programs that could lead to a specified answer given an image. Our results on the CLEVR and SHAPES datasets verify our hypotheses, showing that the model achieves better program (and answer) prediction accuracy even in the low-data regime, and allows one to probe the coherence and consistency of the reasoning performed. |
Tasks | Question Answering, Visual Question Answering |
Published | 2019-02-21 |
URL | https://arxiv.org/abs/1902.07864v2 |
https://arxiv.org/pdf/1902.07864v2.pdf | |
PWC | https://paperswithcode.com/paper/probabilistic-neural-symbolic-models-for |
Repo | |
Framework | |
Merging-ISP: Multi-Exposure High Dynamic Range Image Signal Processing
Title | Merging-ISP: Multi-Exposure High Dynamic Range Image Signal Processing |
Authors | Prashant Chaudhari, Franziska Schirrmacher, Andreas Maier, Christian Riess, Thomas Köhler |
Abstract | The image signal processing pipeline (ISP) is a core element of digital cameras to capture high-quality displayable images from raw data. In high dynamic range (HDR) imaging, ISPs include steps like demosaicing of raw color filter array (CFA) data at different exposure times, alignment of the exposures, conversion to HDR domain, and exposure merging into an HDR image. Traditionally, such pipelines are built by cascading algorithms addressing the individual subtasks. However, cascaded designs suffer from error propagation, since simply combining multiple processing steps is not necessarily optimal for the entire imaging task. This paper proposes a multi-exposure high dynamic range image signal processing pipeline (Merging-ISP) to jointly solve all subtasks for HDR imaging. Our pipeline is modeled by a deep neural network architecture. As such, it is end-to-end trainable, circumvents the use of complex, hand-crafted algorithms in its core, and mitigates error propagation. Merging-ISP enables direct reconstruction of HDR images from multiple differently exposed raw CFA images captured from dynamic scenes. We compare Merging-ISP against several alternative cascaded pipelines. End-to-end learning leads to HDR reconstructions of high perceptual quality and quantitatively outperforms competing ISPs by more than 1 dB in terms of PSNR. |
Tasks | Demosaicking |
Published | 2019-11-12 |
URL | https://arxiv.org/abs/1911.04762v1 |
https://arxiv.org/pdf/1911.04762v1.pdf | |
PWC | https://paperswithcode.com/paper/merging-isp-multi-exposure-high-dynamic-range |
Repo | |
Framework | |
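The joint formulation can be prototyped as a single network that maps a stack of differently exposed CFA frames directly to an HDR image. The PyTorch sketch below is a deliberately simplified stand-in to show the end-to-end framing; it is not the Merging-ISP architecture from the paper, and all names and layer choices are ours.

```python
# Sketch: a toy end-to-end multi-exposure raw-to-HDR network (illustrative only;
# the actual Merging-ISP architecture is more elaborate).
import torch
import torch.nn as nn

class TinyMergingISP(nn.Module):
    def __init__(self, num_exposures=3, features=64):
        super().__init__()
        # Input: the raw CFA frames stacked along the channel axis
        # (one single-channel Bayer mosaic per exposure).
        self.net = nn.Sequential(
            nn.Conv2d(num_exposures, features, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(features, features, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(features, features, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(features, 3, 3, padding=1),  # linear HDR RGB output
        )

    def forward(self, cfa_stack):      # cfa_stack: (N, num_exposures, H, W)
        return self.net(cfa_stack)     # (N, 3, H, W)
```

Training such a model against ground-truth HDR images (e.g. with an L1 loss in a tone-mapped domain) is what lets demosaicing, alignment, and merging be learned jointly instead of cascaded.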
Sex Trafficking Detection with Ordinal Regression Neural Networks
Title | Sex Trafficking Detection with Ordinal Regression Neural Networks |
Authors | Longshaokan Wang, Eric Laber, Yeng Saanchi, Sherrie Caltagirone |
Abstract | Sex trafficking is a global epidemic. Escort websites are a primary vehicle for selling the services of such trafficking victims and thus a major driver of trafficker revenue. Many law enforcement agencies do not have the resources to manually identify leads from the millions of escort ads posted across dozens of public websites. We propose an ordinal regression neural network to identify escort ads that are likely linked to sex trafficking. Our model uses a modified cost function to mitigate inconsistencies in predictions often associated with nonparametric ordinal regression and leverages recent advancements in deep learning to improve prediction accuracy. The proposed method significantly improves on the previous state-of-the-art on Trafficking-10K, an expert-annotated dataset of escort ads. Additionally, because traffickers use acronyms, deliberate typographical errors, and emojis to replace explicit keywords, we demonstrate how to expand the lexicon of trafficking flags through word embeddings and t-SNE. |
Tasks | Word Embeddings |
Published | 2019-08-15 |
URL | https://arxiv.org/abs/1908.05434v2 |
https://arxiv.org/pdf/1908.05434v2.pdf | |
PWC | https://paperswithcode.com/paper/sex-trafficking-detection-with-ordinal |
Repo | |
Framework | |
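A standard way to keep ordinal predictions consistent is the cumulative (extended-binary) formulation, in which the network emits K-1 threshold probabilities with a shared slope. The sketch below illustrates that generic idea; it is not the paper's modified cost function, and the class and function names are ours.

```python
# Sketch: an ordinal regression head with K-1 cumulative binary outputs.
# Sharing a single slope across thresholds makes the ordering of the K-1
# probabilities identical for every input, avoiding the example-dependent
# inconsistencies that independent binary classifiers can produce.
import torch
import torch.nn as nn

class OrdinalHead(nn.Module):
    def __init__(self, in_features, num_classes):
        super().__init__()
        self.score = nn.Linear(in_features, 1)                 # shared slope
        self.biases = nn.Parameter(torch.zeros(num_classes - 1))

    def forward(self, h):                       # h: (N, in_features)
        return self.score(h) + self.biases      # logits for P(y > k), (N, K-1)

def ordinal_loss(logits, targets):
    # targets in {0, ..., K-1}; binary target for threshold k is 1 iff y > k.
    k = torch.arange(logits.size(1), device=targets.device)
    bin_targets = (targets.unsqueeze(1) > k).float()
    return nn.functional.binary_cross_entropy_with_logits(logits, bin_targets)
```

The predicted ordinal label is then the number of thresholds whose probability exceeds 0.5, which maps naturally to graded trafficking-risk levels such as those in Trafficking-10K.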
Joint Demosaicing and Super-Resolution (JDSR): Network Design and Perceptual Optimization
Title | Joint Demosaicing and Super-Resolution (JDSR): Network Design and Perceptual Optimization |
Authors | Xuan Xu, Yanfang Ye, Xin Li |
Abstract | Image demosaicing and super-resolution are two important tasks in the color imaging pipeline. So far they have mostly been studied independently in the open deep learning literature; little is known about the potential benefit of formulating a joint demosaicing and super-resolution (JDSR) problem. In this paper, we propose an end-to-end optimization solution to the JDSR problem and demonstrate its practical significance in computational imaging. Our technical contributions are mainly two-fold. On network design, we have developed a Densely-connected Squeeze-and-Excitation Residual Network (DSERN) for JDSR. For the first time, we address the issue of spatio-spectral attention for color images and discuss how to achieve better information flow by smooth activation for JDSR. Experimental results show that a moderate PSNR/SSIM gain can be achieved by DSERN over previous naive network architectures. On perceptual optimization, we propose to leverage the latest ideas, including a relativistic discriminator and a pre-excitation perceptual loss function, to further improve the visual quality of reconstructed images. Our extensive experimental results show that the Texture-enhanced Relativistic average Generative Adversarial Network (TRaGAN) can produce both subjectively more pleasant images and objectively lower perceptual distortion scores than a standard GAN for JDSR. We have verified the benefit of JDSR for high-quality image reconstruction from real-world Bayer-pattern data collected by the NASA Mars Curiosity rover. |
Tasks | Demosaicking, Image Reconstruction, Super-Resolution |
Published | 2019-11-08 |
URL | https://arxiv.org/abs/1911.03558v1 |
https://arxiv.org/pdf/1911.03558v1.pdf | |
PWC | https://paperswithcode.com/paper/joint-demosaicing-and-super-resolution-jdsr |
Repo | |
Framework | |
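The squeeze-and-excitation residual block that the DSERN design builds on can be written in a few lines. The snippet below shows only the generic block; the dense connectivity, smooth activations, and the TRaGAN loss from the paper are not reproduced, and the class name is ours.

```python
# Sketch: a generic squeeze-and-excitation residual block of the kind DSERN
# stacks (illustrative; dense connections and the TRaGAN loss are omitted).
import torch
import torch.nn as nn

class SEResBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        # "Squeeze": global average pooling; "excitation": per-channel gates.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        y = self.body(x)
        return x + y * self.gate(y)   # channel-wise reweighting + residual skip
```

The channel gates are what give the network the spatio-spectral attention over color features that the abstract highlights.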
Deep Camera: A Fully Convolutional Neural Network for Image Signal Processing
Title | Deep Camera: A Fully Convolutional Neural Network for Image Signal Processing |
Authors | Sivalogeswaran Ratnasingam |
Abstract | A conventional camera performs various signal processing steps sequentially to reconstruct an image from a raw Bayer image. When this processing is performed in multiple stages, the residual error from each stage accumulates in the image and degrades the quality of the final reconstructed image. In this paper, we present a fully convolutional neural network (CNN) that performs defect pixel correction, denoising, white balancing, exposure correction, demosaicing, color transform, and gamma encoding. To our knowledge, this is the first CNN trained end-to-end to perform the entire image signal processing pipeline in a camera. The neural network was trained using a large database of raw Bayer images. Through extensive experiments, we show that the proposed CNN-based image signal processing system performs better than conventional signal processing pipelines that perform the processing sequentially. |
Tasks | Demosaicking, Denoising |
Published | 2019-08-24 |
URL | https://arxiv.org/abs/1908.09191v1 |
https://arxiv.org/pdf/1908.09191v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-camera-a-fully-convolutional-neural |
Repo | |
Framework | |
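A common way to present raw Bayer data to a fully convolutional network is to pack the RGGB mosaic into four half-resolution channels before the convolutions. The sketch below shows that preprocessing plus a toy FCN head; it is illustrative only, assumes an RGGB pattern with R at (0,0), and is not the trained model from the paper.

```python
# Sketch: packing an RGGB Bayer mosaic into 4 channels before a small FCN.
# Assumptions: RGGB pattern; the network is a toy stand-in for the full
# learned ISP described in the paper.
import torch
import torch.nn as nn

def pack_bayer(raw):                        # raw: (N, 1, H, W), RGGB
    r  = raw[:, :, 0::2, 0::2]
    g1 = raw[:, :, 0::2, 1::2]
    g2 = raw[:, :, 1::2, 0::2]
    b  = raw[:, :, 1::2, 1::2]
    return torch.cat([r, g1, g2, b], dim=1)   # (N, 4, H/2, W/2)

class TinyISP(nn.Module):
    def __init__(self, features=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, features, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(features, features, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(features, 12, 3, padding=1),   # 3 colors x (2x2) sub-pixels
            nn.PixelShuffle(2),                      # back to (N, 3, H, W)
        )

    def forward(self, raw):
        return self.net(pack_bayer(raw))     # display-referred RGB output
```

Supervising such a network directly with processed reference images is what replaces the sequential pipeline stages (denoising, white balance, demosaicing, and so on) with a single learned mapping.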
Efficient Multi-Domain Network Learning by Covariance Normalization
Title | Efficient Multi-Domain Network Learning by Covariance Normalization |
Authors | Yunsheng Li, Nuno Vasconcelos |
Abstract | The problem of multi-domain learning of deep networks is considered. An adaptive layer is induced per target domain, and a novel procedure, denoted covariance normalization (CovNorm), is proposed to reduce its parameters. CovNorm is a data-driven method of fairly simple implementation, requiring two principal component analyses (PCA) and fine-tuning of a mini-adaptation layer. Nevertheless, it is shown, both theoretically and experimentally, to have several advantages over previous approaches, such as batch normalization or geometric matrix approximations. Furthermore, CovNorm can be deployed whether target datasets are available sequentially or simultaneously. Experiments show that, in both cases, it achieves performance comparable to a fully fine-tuned network, using as few as 0.13% of the corresponding parameters per target domain. |
Tasks | |
Published | 2019-06-24 |
URL | https://arxiv.org/abs/1906.10267v1 |
https://arxiv.org/pdf/1906.10267v1.pdf | |
PWC | https://paperswithcode.com/paper/efficient-multi-domain-network-learning-by |
Repo | |
Framework | |
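At a high level, CovNorm compresses a per-domain adaptation layer using two PCAs of the surrounding activations. The NumPy sketch below follows that description loosely (projecting the layer onto leading input/output principal directions); it should be read as an approximation of the idea under our own assumptions, not the exact CovNorm algorithm, and it omits the normalization details and the fine-tuning of the mini-adaptation layer.

```python
# Sketch: PCA-based compression of a (linear) per-domain adaptation layer,
# loosely following the CovNorm idea. Whitening, exact normalization, and
# fine-tuning of the mini-adaptation layer are omitted.
import numpy as np

def compress_adaptation_layer(W, X_in, X_out, k_in, k_out):
    """W: (d_out, d_in) adaptation weights; X_in: (n, d_in) and X_out: (n, d_out)
    activation samples; returns the mini layer and the low-rank replacement."""
    # Leading principal directions of the input and output activations.
    _, P_in = np.linalg.eigh(np.cov(X_in, rowvar=False))
    _, P_out = np.linalg.eigh(np.cov(X_out, rowvar=False))
    P_in, P_out = P_in[:, -k_in:], P_out[:, -k_out:]
    # Mini-adaptation layer expressed in the reduced bases (would be fine-tuned).
    M = P_out.T @ W @ P_in            # (k_out, k_in)
    W_approx = P_out @ M @ P_in.T     # low-rank replacement for W
    return M, W_approx
```

The per-domain parameter cost then drops from d_out*d_in to roughly k_out*k_in plus the shared projections, which is how savings on the order of a fraction of a percent of the original parameters become possible.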
Building a mixed-lingual neural TTS system with only monolingual data
Title | Building a mixed-lingual neural TTS system with only monolingual data |
Authors | Liumeng Xue, Wei Song, Guanghui Xu, Lei Xie, Zhizheng Wu |
Abstract | When deploying a Chinese neural text-to-speech (TTS) synthesis system, one of the challenges is to synthesize Chinese utterances with English phrases or words embedded. This paper looks into the problem in the encoder-decoder framework when only monolingual data from a target speaker is available. Specifically, we view the problem from two aspects: speaker consistency within an utterance and naturalness. We start the investigation with an Average Voice Model which is built from multi-speaker monolingual data, i.e. Mandarin and English data. On the basis of that, we look into speaker embedding for speaker consistency within an utterance and phoneme embedding for naturalness and intelligibility and study the choice of data for model training. We report the findings and discuss the challenges to build a mixed-lingual TTS system with only monolingual data. |
Tasks | |
Published | 2019-04-12 |
URL | https://arxiv.org/abs/1904.06063v2 |
https://arxiv.org/pdf/1904.06063v2.pdf | |
PWC | https://paperswithcode.com/paper/building-a-mixed-lingual-neural-tts-system |
Repo | |
Framework | |
Visualized Insights into the Optimization Landscape of Fully Convolutional Networks
Title | Visualized Insights into the Optimization Landscape of Fully Convolutional Networks |
Authors | Jianjie Lu, Kai-yu Tong |
Abstract | Many image processing tasks involve image-to-image mapping, which can be addressed well by fully convolutional networks (FCN) without any heavy preprocessing. Although empirically designing and training FCNs can achieve satisfactory results, the reasons for the improvement in performance remain somewhat unclear. Our study aims to make progress in understanding their generalization abilities by visualizing the optimization landscape. The visualization of objective functions is obtained by choosing a solution and projecting its vicinity onto a 3D space. We compare three FCN-based networks (two existing models and a new one proposed in this paper for comparison) on multiple datasets. It has been observed in practice that connections from the pre-pooled feature maps to the post-upsampled ones can achieve better results. We investigate the cause and provide experiments showing that the skip-layer connections in FCNs promote a flat optimization landscape, which is well known to generalize better. Additionally, we explore the relationship between a model's generalization ability and its loss surface under different batch sizes. Results show that large-batch training makes the model converge to sharp minimizers with chaotic vicinities, while small-batch training leads the model to flat minimizers with smooth and nearly convex regions. Our work may contribute insights and analysis for designing and training FCNs. |
Tasks | |
Published | 2019-01-20 |
URL | http://arxiv.org/abs/1901.08556v1 |
http://arxiv.org/pdf/1901.08556v1.pdf | |
PWC | https://paperswithcode.com/paper/visualized-insights-into-the-optimization |
Repo | |
Framework | |
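Projecting the vicinity of a trained solution onto a low-dimensional slice is commonly done by evaluating the loss on a grid spanned by two random parameter directions, with the loss values forming the third axis of the 3D plot. The sketch below shows that generic recipe only; it is not the paper's exact protocol (e.g. it omits filter normalization), and the function name is ours.

```python
# Sketch: a 2-D slice of the loss landscape around trained parameters theta,
# evaluated along two random directions; plot the returned grid for the 3-D view.
import torch

@torch.no_grad()
def loss_surface(model, loss_fn, data, targets, span=1.0, steps=21):
    params = list(model.parameters())
    theta = [p.detach().clone() for p in params]      # trained solution
    d1 = [torch.randn_like(p) for p in params]        # two random directions
    d2 = [torch.randn_like(p) for p in params]
    alphas = torch.linspace(-span, span, steps)
    surface = torch.zeros(steps, steps)
    for i, a in enumerate(alphas):
        for j, b in enumerate(alphas):
            for p, t, u, v in zip(params, theta, d1, d2):
                p.copy_(t + a * u + b * v)             # theta + a*d1 + b*d2
            surface[i, j] = loss_fn(model(data), targets).item()
    for p, t in zip(params, theta):                    # restore trained weights
        p.copy_(t)
    return surface                                     # e.g. matplotlib plot_surface
```

Sharp minimizers show up as narrow spikes in such a grid, while flat minimizers produce wide, smooth basins, which is the visual signature the abstract connects to batch size.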
“Machine LLRning”: Learning to Softly Demodulate
Title | “Machine LLRning”: Learning to Softly Demodulate |
Authors | Ori Shental, Jakob Hoydis |
Abstract | Soft demodulation, or demapping, of received symbols back into their conveyed soft bits, or bit log-likelihood ratios (LLRs), is at the very heart of any modern receiver. In this paper, a trainable universal neural network-based demodulator architecture, dubbed “LLRnet”, is introduced. LLRnet facilitates improved performance with significantly reduced overall computational complexity. For instance, for the commonly used quadrature amplitude modulation (QAM), LLRnet provides LLR estimates approaching the optimal log maximum a-posteriori inference with an order of magnitude fewer operations than the straightforward exact implementation. Link-level simulation examples for the application of LLRnet to 5G-NR and DVB-S.2 are provided. LLRnet is yet another powerful example of the usefulness of applying machine learning to physical layer design. |
Tasks | |
Published | 2019-07-02 |
URL | https://arxiv.org/abs/1907.01512v3 |
https://arxiv.org/pdf/1907.01512v3.pdf | |
PWC | https://paperswithcode.com/paper/machine-llrning-learning-to-softly-demodulate |
Repo | |
Framework | |
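The classical demapper that a learned model like LLRnet approximates is easy to write down. The NumPy sketch below computes max-log LLRs for an arbitrary constellation under an AWGN assumption; the sign convention LLR = log p(b=0)/p(b=1), the function name, and the argument layout are our own choices, not taken from the paper.

```python
# Sketch: max-log LLR demapping for an arbitrary constellation over AWGN.
# This is the classical reference computation that a learned demapper
# approximates at lower complexity; convention: LLR = log p(b=0)/p(b=1).
import numpy as np

def maxlog_llrs(y, constellation, bit_labels, noise_var):
    """y: received complex symbols (N,); constellation: complex points (M,);
    bit_labels: (M, num_bits) 0/1 labels per point; returns LLRs of shape (N, num_bits)."""
    # Scaled squared distances between every received symbol and every point.
    d2 = np.abs(y[:, None] - constellation[None, :]) ** 2 / noise_var   # (N, M)
    llrs = np.empty((len(y), bit_labels.shape[1]))
    for k in range(bit_labels.shape[1]):
        d0 = d2[:, bit_labels[:, k] == 0].min(axis=1)   # best point with bit k = 0
        d1 = d2[:, bit_labels[:, k] == 1].min(axis=1)   # best point with bit k = 1
        llrs[:, k] = d1 - d0                            # max-log approximation
    return llrs
```

For dense QAM constellations this brute-force search over all M points per symbol is exactly the cost that a small trained network can undercut by an order of magnitude.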
An Integrated Image Filter for Enhancing Change Detection Results
Title | An Integrated Image Filter for Enhancing Change Detection Results |
Authors | Dawei Li, Siyuan Yan, Xin Cai, Yan Cao, Sifan Wang |
Abstract | Change detection is a fundamental task in computer vision. Although significant advances have been made, most change detection methods fail to work well in challenging scenes due to ubiquitous noise and interference. Nowadays, post-processing methods (e.g. MRF and CRF) that aim to enhance binary change detection results still fall short of the requirements on universality for distinctive scenes, applicability to different types of detection methods, accuracy, and real-time performance. Inspired by the nature of image filtering, which separates noise from pixel observations and recovers the real structure of patches, we consider utilizing image filters to enhance the detection masks. In this paper, we present an integrated filter which comprises a weighted local guided image filter and a weighted spatiotemporal tree filter. The spatiotemporal tree filter leverages the global spatiotemporal information of adjacent video frames, while the guided filter carries out local window filtering of pixels, for enhancing the coarse change detection masks. The main contributions are threefold: (i) the proposed filter can make full use of the information of the same object in consecutive frames to improve its current detection mask by computations on a spatiotemporal minimum spanning tree; (ii) the integrated filter possesses the advantages of both local filtering and global filtering; it not only has a good edge-preserving property but can also handle heavily textured and colorful foreground regions; and (iii) unlike some popular enhancement methods (MRF and CRF) that require either a priori background probabilities or a posteriori foreground probabilities for every pixel to improve the coarse detection masks, our method is a versatile enhancement filter that can be applied after many different types of change detection methods, and is particularly suitable for video sequences. |
Tasks | |
Published | 2019-07-02 |
URL | https://arxiv.org/abs/1907.01301v1 |
https://arxiv.org/pdf/1907.01301v1.pdf | |
PWC | https://paperswithcode.com/paper/an-integrated-image-filter-for-enhancing |
Repo | |
Framework | |
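The local half of the proposed filter builds on the guided image filter. The sketch below implements only the textbook, unweighted guided filter applied to a coarse change mask with the current frame as guidance; the weighting scheme and the spatiotemporal tree filter from the paper are not reproduced, and the parameter choices are ours.

```python
# Sketch: plain grayscale guided filter, the textbook building block behind the
# paper's weighted local filter; the spatiotemporal tree filter is omitted.
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(guide, mask, radius=8, eps=1e-3):
    """guide: grayscale frame in [0,1]; mask: coarse change mask in [0,1]."""
    size = 2 * radius + 1
    mean_I = uniform_filter(guide, size)
    mean_p = uniform_filter(mask, size)
    corr_Ip = uniform_filter(guide * mask, size)
    corr_II = uniform_filter(guide * guide, size)
    cov_Ip = corr_Ip - mean_I * mean_p
    var_I = corr_II - mean_I * mean_I
    a = cov_Ip / (var_I + eps)          # local linear model: q = a*I + b
    b = mean_p - a * mean_I
    mean_a = uniform_filter(a, size)
    mean_b = uniform_filter(b, size)
    return mean_a * guide + mean_b      # edge-preserving refined mask
```

Because the refinement only needs the guidance frame and the coarse mask, it can be dropped in after any change detection method, which is the versatility the abstract emphasizes.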
Neural Learning of Online Consumer Credit Risk
Title | Neural Learning of Online Consumer Credit Risk |
Authors | Di Wang, Qi Wu, Wen Zhang |
Abstract | This paper takes a deep learning approach to understanding consumer credit risk when e-commerce platforms issue unsecured credit to finance customers’ purchases. The “NeuCredit” model captures serial dependences in multi-dimensional time series data even when event frequencies differ across dimensions. It also captures nonlinear cross-sectional interactions among different time-evolving features. Moreover, the predicted default probability is designed to be interpretable, so that risk can be decomposed into three components: the subjective risk indicating the consumers’ willingness to repay, the objective risk indicating their ability to repay, and the behavioral risk indicating consumers’ behavioral differences. Using a unique dataset from one of the largest global e-commerce platforms, we show that including shopping behavioral data, besides conventional payment records, requires a deep learning approach to extract the information content of these data, and significantly enhances forecasting performance compared with traditional machine learning methods. |
Tasks | Time Series |
Published | 2019-06-05 |
URL | https://arxiv.org/abs/1906.01923v1 |
https://arxiv.org/pdf/1906.01923v1.pdf | |
PWC | https://paperswithcode.com/paper/neural-learning-of-online-consumer-credit |
Repo | |
Framework | |
An Experimental Comparison of Old and New Decision Tree Algorithms
Title | An Experimental Comparison of Old and New Decision Tree Algorithms |
Authors | Arman Zharmagambetov, Suryabhan Singh Hada, Miguel Á. Carreira-Perpiñán, Magzhan Gabidolla |
Abstract | This paper presents a detailed comparison of a recently proposed algorithm for optimizing decision trees, tree alternating optimization (TAO), with other popular, established algorithms. We compare their performance on a number of classification and regression datasets of varying complexity, size, and dimensionality, across different performance factors: accuracy and tree size (in terms of the number of leaves or the depth of the tree). We find that TAO achieves higher accuracy in nearly all datasets, often by a large margin. |
Tasks | |
Published | 2019-11-08 |
URL | https://arxiv.org/abs/1911.03054v2 |
https://arxiv.org/pdf/1911.03054v2.pdf | |
PWC | https://paperswithcode.com/paper/an-experimental-comparison-of-old-and-new |
Repo | |
Framework | |