January 31, 2020

Paper Group ANR 36

Outbound Translation User Interface Ptakopet: A Pilot Study


Title	Outbound Translation User Interface Ptakopet: A Pilot Study
Authors	Vilém Zouhar, Ondřej Bojar
Abstract	It is not uncommon for Internet users to have to produce a text in a foreign language they have very little knowledge of and are unable to verify the translation quality. We call the task “outbound translation” and explore it by introducing an open-source modular system Ptakopět. Its main purpose is to inspect human interaction with MT systems enhanced with additional subsystems, such as backward translation and quality estimation. We follow up with an experiment on (Czech) human annotators tasked to produce questions in a language they do not speak (German), with the help of Ptakopět. We focus on three real-world use cases (communication with IT support, describing administrative issues and asking encyclopedic questions) from which we gain insight into different strategies users take when faced with outbound translation tasks. Round trip translation is known to be unreliable for evaluating MT systems but our experimental evaluation documents that it works very well for users, at least on MT systems of mid-range quality.
Tasks
Published	2019-11-25
URL	https://arxiv.org/abs/1911.10835v2
PDF	https://arxiv.org/pdf/1911.10835v2.pdf
PWC	https://paperswithcode.com/paper/outbound-translation-user-interface-ptakopet
Repo
Framework

DeepCABAC: A Universal Compression Algorithm for Deep Neural Networks


Title	DeepCABAC: A Universal Compression Algorithm for Deep Neural Networks
Authors	Simon Wiedemann, Heiner Kirchoffer, Stefan Matlage, Paul Haase, Arturo Marban, Talmaj Marinc, David Neumann, Tung Nguyen, Ahmed Osman, Detlev Marpe, Heiko Schwarz, Thomas Wiegand, Wojciech Samek
Abstract	The field of video compression has developed some of the most sophisticated and efficient compression algorithms known in the literature, enabling very high compressibility for little loss of information. Whilst some of these techniques are domain specific, many of their underlying principles are universal in that they can be adapted and applied for compressing different types of data. In this work we present DeepCABAC, a compression algorithm for deep neural networks that is based on one of the state-of-the-art video coding techniques. Concretely, it applies a Context-based Adaptive Binary Arithmetic Coder (CABAC) to the network’s parameters, which was originally designed for the H.264/AVC video coding standard and became the state-of-the-art for lossless compression. Moreover, DeepCABAC employs a novel quantization scheme that minimizes the rate-distortion function while simultaneously taking into account the impact of quantization on the accuracy of the network. Experimental results show that DeepCABAC consistently attains higher compression rates than previously proposed coding techniques for neural network compression. For instance, it is able to compress the VGG16 ImageNet model by a factor of 63.6 with no loss of accuracy, thus being able to represent the entire network with merely 8.7MB. The source code for encoding and decoding can be found at https://github.com/fraunhoferhhi/DeepCABAC.
Tasks	Neural Network Compression, Quantization, Video Compression
Published	2019-07-27
URL	https://arxiv.org/abs/1907.11900v1
PDF	https://arxiv.org/pdf/1907.11900v1.pdf
PWC	https://paperswithcode.com/paper/deepcabac-a-universal-compression-algorithm
Repo
Framework
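
The rate-distortion quantization idea in the DeepCABAC abstract can be illustrated with a small sketch. This is not the DeepCABAC implementation: the function name `rd_quantize`, the step size, the multiplier `lam`, and especially the fixed bit-cost table are assumptions made for illustration (in DeepCABAC the rate would come from the CABAC coder, and the impact on network accuracy is also taken into account, which this sketch omits). Each weight is mapped to the integer level that minimizes squared error plus `lam` times an assumed code length.

```python
def rate_bits(level):
    """Hypothetical code length in bits: cheap for zero, growing with |level|.

    This table is an assumption for illustration, not the CABAC rate model.
    """
    return 1 if level == 0 else 2 * abs(level) + 1


def rd_quantize(weights, step, lam, max_level=4):
    """Pick, per weight, the integer level minimizing distortion + lam * rate."""
    levels = range(-max_level, max_level + 1)
    quantized = []
    for w in weights:
        # Distortion is the squared error to the reconstruction level * step.
        best = min(levels, key=lambda q: (w - q * step) ** 2 + lam * rate_bits(q))
        quantized.append(best)
    return quantized


weights = [0.26, 0.9, -0.04]
print(rd_quantize(weights, step=0.5, lam=0.0))   # plain nearest-level rounding
print(rd_quantize(weights, step=0.5, lam=0.05))  # rate term pulls 0.26 to level 0
```

With `lam = 0` the sketch reduces to nearest-level rounding; a positive `lam` trades a little extra distortion for levels that are cheaper to code under the assumed table.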

Geometrization of deep networks for the interpretability of deep learning systems


Title	Geometrization of deep networks for the interpretability of deep learning systems
Authors	Xiao Dong, Ling Zhou
Abstract	How to understand deep learning systems remains an open problem. In this paper we propose that the answer may lie in the geometrization of deep networks. Geometrization is a bridge to connect physics, geometry, deep network and quantum computation and this may result in a new scheme to reveal the rule of the physical world. By comparing the geometry of image matching and deep networks, we show that geometrization of deep networks can be used to understand existing deep learning systems and it may also help to solve the interpretability problem of deep learning systems.
Tasks
Published	2019-01-06
URL	http://arxiv.org/abs/1901.02354v2
PDF	http://arxiv.org/pdf/1901.02354v2.pdf
PWC	https://paperswithcode.com/paper/geometrization-of-deep-networks-for-the
Repo
Framework

Learn to Explain Efficiently via Neural Logic Inductive Learning


Title	Learn to Explain Efficiently via Neural Logic Inductive Learning
Authors	Yuan Yang, Le Song
Abstract	The capability of making interpretable and self-explanatory decisions is essential for developing responsible machine learning systems. In this work, we study the learning to explain problem in the scope of inductive logic programming (ILP). We propose Neural Logic Inductive Learning (NLIL), an efficient differentiable ILP framework that learns first-order logic rules that can explain the patterns in the data. In experiments, compared with the state-of-the-art methods, we find NLIL can search for rules that are 10 times longer while remaining 3 times faster. We also show that NLIL can scale to large image datasets, i.e. Visual Genome, with 1M entities.
Tasks
Published	2019-10-06
URL	https://arxiv.org/abs/1910.02481v3
PDF	https://arxiv.org/pdf/1910.02481v3.pdf
PWC	https://paperswithcode.com/paper/learn-to-explain-efficiently-via-neural-logic
Repo
Framework

Robust Subspace Recovery with Adversarial Outliers


Title	Robust Subspace Recovery with Adversarial Outliers
Authors	Tyler Maunu, Gilad Lerman
Abstract	We study the problem of robust subspace recovery (RSR) in the presence of adversarial outliers. That is, we seek a subspace that contains a large portion of a dataset when some fraction of the data points are arbitrarily corrupted. We first examine a theoretical estimator that is intractable to calculate and use it to derive information-theoretic bounds of exact recovery. We then propose two tractable estimators: a variant of RANSAC and a simple relaxation of the theoretical estimator. The two estimators are fast to compute and achieve state-of-the-art theoretical performance in a noiseless RSR setting with adversarial outliers. The former estimator achieves better theoretical guarantees in the noiseless case, while the latter estimator is robust to small noise, and its guarantees significantly improve with non-adversarial models of outliers. We give a complete comparison of guarantees for the adversarial RSR problem, as well as a short discussion on the estimation of affine subspaces.
Tasks
Published	2019-04-05
URL	http://arxiv.org/abs/1904.03275v1
PDF	http://arxiv.org/pdf/1904.03275v1.pdf
PWC	https://paperswithcode.com/paper/robust-subspace-recovery-with-adversarial
Repo
Framework
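
To make the RANSAC idea mentioned in the robust subspace recovery abstract concrete, here is a minimal sketch of plain RANSAC for the simplest noiseless case: a one-dimensional subspace (a line through the origin). It is not the variant proposed in the paper; the function name, the tolerance, the iteration count, and the fixed seed are all assumptions chosen for illustration.

```python
import math
import random


def ransac_line_through_origin(points, iterations=50, tol=1e-9, seed=0):
    """Return (unit direction, inlier count) of the best sampled 1-D subspace."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    best_dir, best_count = None, -1
    for _ in range(iterations):
        p = rng.choice(points)
        norm = math.sqrt(sum(c * c for c in p))
        if norm == 0.0:
            continue  # the origin spans no direction
        u = [c / norm for c in p]
        count = 0
        for x in points:
            # Residual of x after projecting onto span(u).
            proj = sum(a * b for a, b in zip(x, u))
            resid = math.sqrt(sum((a - proj * b) ** 2 for a, b in zip(x, u)))
            if resid <= tol:
                count += 1
        if count > best_count:
            best_dir, best_count = u, count
    return best_dir, best_count


inliers = [(1.0, 2.0), (2.0, 4.0), (-1.0, -2.0), (3.0, 6.0), (0.5, 1.0)]
outliers = [(1.0, 0.0), (0.0, 1.0), (-3.0, 1.0)]
direction, count = ransac_line_through_origin(inliers + outliers)
print(direction, count)
```

For a subspace of dimension d one would sample d points and project onto their span; the one-dimensional case keeps the sketch free of linear-algebra dependencies.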

Classifying Diagrams and Their Parts using Graph Neural Networks: A Comparison of Crowd-Sourced and Expert Annotations


Title	Classifying Diagrams and Their Parts using Graph Neural Networks: A Comparison of Crowd-Sourced and Expert Annotations
Authors	Tuomo Hiippala
Abstract	This article compares two multimodal resources that consist of diagrams which describe topics in elementary school natural sciences. Both resources contain the same diagrams and represent their structure using graphs, but differ in terms of their annotation schema and how the annotations have been created - depending on the resource in question - either by crowd-sourced workers or trained experts. This article reports on two experiments that evaluate how effectively crowd-sourced and expert-annotated graphs can represent the multimodal structure of diagrams for representation learning using various graph neural networks. The results show that the identity of diagram elements can be learned from their layout features, while the expert annotations provide better representations of diagram types.
Tasks	Representation Learning
Published	2019-12-05
URL	https://arxiv.org/abs/1912.02866v1
PDF	https://arxiv.org/pdf/1912.02866v1.pdf
PWC	https://paperswithcode.com/paper/classifying-diagrams-and-their-parts-using
Repo
Framework

Beyond Visual Semantics: Exploring the Role of Scene Text in Image Understanding


Title	Beyond Visual Semantics: Exploring the Role of Scene Text in Image Understanding
Authors	Arka Ujjal Dey, Suman Kumar Ghosh, Ernest Valveny, Gaurav Harit
Abstract	Images with visual and scene text content are ubiquitous in everyday life. However, current image interpretation systems are mostly limited to using only the visual features, neglecting to leverage the scene text content. In this paper, we propose to jointly use scene text and visual channels for robust semantic interpretation of images. We do not only extract and encode visual and scene text cues, but also model their interplay to generate a contextual joint embedding with richer semantics. The contextual embedding thus generated is applied to retrieval and classification tasks on multimedia images, with scene text content, to demonstrate its effectiveness. In the retrieval framework, we augment our learned text-visual semantic representation with scene text cues, to mitigate vocabulary misses that may have occurred during the semantic embedding. To deal with irrelevant or erroneous recognition of scene text, we also apply query-based attention to our text channel. We show how the multi-channel approach, involving visual semantics and scene text, improves upon state of the art.
Tasks
Published	2019-05-25
URL	https://arxiv.org/abs/1905.10622v3
PDF	https://arxiv.org/pdf/1905.10622v3.pdf
PWC	https://paperswithcode.com/paper/beyond-visual-semantics-exploring-the-role-of
Repo
Framework

Bayesian Learning of Conditional Kernel Mean Embeddings for Automatic Likelihood-Free Inference


Title	Bayesian Learning of Conditional Kernel Mean Embeddings for Automatic Likelihood-Free Inference
Authors	Kelvin Hsu, Fabio Ramos
Abstract	In likelihood-free settings where likelihood evaluations are intractable, approximate Bayesian computation (ABC) addresses the formidable inference task to discover plausible parameters of simulation programs that explain the observations. However, ABC methods demand large quantities of simulation calls. Critically, hyperparameters that determine measures of simulation discrepancy balance inference accuracy and sample efficiency, yet are difficult to tune. In this paper, we present kernel embedding likelihood-free inference (KELFI), a holistic framework that automatically learns model hyperparameters to improve inference accuracy given limited simulation budget. By leveraging likelihood smoothness with conditional mean embeddings, we nonparametrically approximate likelihoods and posteriors as surrogate densities and sample from closed-form posterior mean embeddings, whose hyperparameters are learned under its approximate marginal likelihood. Our modular framework demonstrates improved accuracy and efficiency on challenging inference problems in ecology.
Tasks
Published	2019-03-03
URL	http://arxiv.org/abs/1903.00863v1
PDF	http://arxiv.org/pdf/1903.00863v1.pdf
PWC	https://paperswithcode.com/paper/bayesian-learning-of-conditional-kernel-mean
Repo
Framework

3D Guided Fine-Grained Face Manipulation


Title	3D Guided Fine-Grained Face Manipulation
Authors	Zhenglin Geng, Chen Cao, Sergey Tulyakov
Abstract	We present a method for fine-grained face manipulation. Given a face image with an arbitrary expression, our method can synthesize another arbitrary expression by the same person. This is achieved by first fitting a 3D face model and then disentangling the face into a texture and a shape. We then learn different networks in these two spaces. In the texture space, we use a conditional generative network to change the appearance, and carefully design input formats and loss functions to achieve the best results. In the shape space, we use a fully connected network to predict the accurate shapes and use the available depth data for supervision. Both networks are conditioned on expression coefficients rather than discrete labels, allowing us to generate an unlimited amount of expressions. We show the superiority of this disentangling approach through both quantitative and qualitative studies. In a user study, our method is preferred in 85% of cases when compared to the most recent work. When compared to the ground truth, annotators cannot reliably distinguish between our synthesized images and real images, preferring our method in 53% of the cases.
Tasks
Published	2019-02-24
URL	http://arxiv.org/abs/1902.08900v1
PDF	http://arxiv.org/pdf/1902.08900v1.pdf
PWC	https://paperswithcode.com/paper/3d-guided-fine-grained-face-manipulation
Repo
Framework

Adaptive Rates for Image Denoising


Title	Adaptive Rates for Image Denoising
Authors	Francesco Ortelli, Sara van de Geer
Abstract	We study the theoretical properties of image denoising via total variation penalized least-squares. We define the total variation in terms of the two-dimensional total discrete derivative of the image and show that it gives rise to denoised images which are piecewise constant on rectangular sets. We prove that, if the true image is piecewise constant on just a few rectangular sets, the denoised image converges to the true image at a parametric rate, up to a log term. More generally, we show that the denoised image enjoys oracle properties, that is, it is almost as good as if some aspects of the true image were known. In other words, image denoising with total variation regularization leads to an adaptive reconstruction.
Tasks	Denoising, Image Denoising
Published	2019-11-17
URL	https://arxiv.org/abs/1911.07231v3
PDF	https://arxiv.org/pdf/1911.07231v3.pdf
PWC	https://paperswithcode.com/paper/oracle-inequalities-for-image-denoising-with
Repo
Framework
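
The penalized least-squares objective described in the image denoising abstract can be sketched as follows. The sketch assumes that the "two-dimensional total discrete derivative" is the mixed difference f[i][j] - f[i-1][j] - f[i][j-1] + f[i-1][j-1], and that the fit term is the mean squared error; both choices, the function names, and the weight `lam` are assumptions for illustration, not the paper's exact definitions. Only the objective is evaluated here; no minimization is performed.

```python
def total_variation_2d(image):
    """Sum of absolute mixed differences over all 2x2 neighbourhoods (assumed form)."""
    rows, cols = len(image), len(image[0])
    return sum(
        abs(image[i][j] - image[i - 1][j] - image[i][j - 1] + image[i - 1][j - 1])
        for i in range(1, rows)
        for j in range(1, cols)
    )


def penalized_objective(noisy, candidate, lam):
    """Mean squared error to the noisy image plus lam times the penalty."""
    rows, cols = len(noisy), len(noisy[0])
    fit = sum(
        (noisy[i][j] - candidate[i][j]) ** 2
        for i in range(rows)
        for j in range(cols)
    ) / (rows * cols)
    return fit + lam * total_variation_2d(candidate)


# An image that is constant on one rectangle: only the rectangle's corner
# contributes to the penalty.
rect = [
    [0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0],
    [0.0, 1.0, 1.0],
]
print(total_variation_2d(rect))              # 1.0
print(penalized_objective(rect, rect, 0.5))  # 0.5
```

Under this assumed penalty, an image that is constant on a few rectangles has a small penalty, which matches the abstract's statement that the denoised images are piecewise constant on rectangular sets.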

Self-Supervised Learning via Conditional Motion Propagation


Title	Self-Supervised Learning via Conditional Motion Propagation
Authors	Xiaohang Zhan, Xingang Pan, Ziwei Liu, Dahua Lin, Chen Change Loy
Abstract	Intelligent agents naturally learn from motion. Various self-supervised algorithms have leveraged motion cues to learn effective visual representations. The hurdle here is that motion is both ambiguous and complex, so previous works either suffer from degraded learning efficacy or resort to strong assumptions on object motions. In this work, we design a new learning-from-motion paradigm to bridge these gaps. Instead of explicitly modeling the motion probabilities, we design the pretext task as a conditional motion propagation problem. Given an input image and several sparse flow guidance vectors on it, our framework seeks to recover the full-image motion. Compared to other alternatives, our framework has several appealing properties: (1) Using sparse flow guidance during training resolves the inherent motion ambiguity, thus easing feature learning. (2) Solving the pretext task of conditional motion propagation encourages the emergence of kinematically-sound representations that possess greater expressive power. Extensive experiments demonstrate that our framework learns structural and coherent features, and achieves state-of-the-art self-supervision performance on several downstream tasks including semantic segmentation, instance segmentation, and human parsing. Furthermore, our framework is successfully extended to several useful applications such as semi-automatic pixel-level annotation. Project page: http://mmlab.ie.cuhk.edu.hk/projects/CMP/.
Tasks	Human Parsing, Instance Segmentation, Semantic Segmentation
Published	2019-03-27
URL	http://arxiv.org/abs/1903.11412v3
PDF	http://arxiv.org/pdf/1903.11412v3.pdf
PWC	https://paperswithcode.com/paper/self-supervised-learning-via-conditional
Repo
Framework

A Push-Pull Layer Improves Robustness of Convolutional Neural Networks


Title	A Push-Pull Layer Improves Robustness of Convolutional Neural Networks
Authors	Nicola Strisciuglio, Manuel Lopez-Antequera, Nicolai Petkov
Abstract	We propose a new layer in Convolutional Neural Networks (CNNs) to increase their robustness to several types of noise perturbations of the input images. We call this a push-pull layer and compute its response as the combination of two half-wave rectified convolutions, with kernels of opposite polarity. It is based on a biologically-motivated non-linear model of certain neurons in the visual system that exhibit a response suppression phenomenon, known as push-pull inhibition. We validate our method by substituting the first convolutional layer of the LeNet-5 and WideResNet architectures with our push-pull layer. We train the networks on nonperturbed training images from the MNIST, CIFAR-10 and CIFAR-100 data sets, and test on images perturbed by noise that is unseen by the training process. We demonstrate that our push-pull layers contribute to a considerable improvement in robustness of classification of images perturbed by noise, while maintaining state-of-the-art performance on the original image classification task.
Tasks	Image Classification
Published	2019-01-29
URL	http://arxiv.org/abs/1901.10208v1
PDF	http://arxiv.org/pdf/1901.10208v1.pdf
PWC	https://paperswithcode.com/paper/a-push-pull-layer-improves-robustness-of
Repo
Framework
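
The push-pull response described in the abstract (two half-wave rectified convolutions with kernels of opposite polarity) can be sketched in one dimension. The abstract does not say how the two responses are combined; this sketch assumes the pull response is subtracted from the push response with a hypothetical inhibition strength `alpha`, and it uses the negated push kernel as the pull kernel. All function names are illustrative.

```python
def relu(values):
    """Half-wave rectification: keep positive values, zero out the rest."""
    return [max(0.0, v) for v in values]


def correlate1d(signal, kernel):
    """Valid-mode 1-D cross-correlation (the 'convolution' used in CNNs)."""
    n, k = len(signal), len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(n - k + 1)
    ]


def push_pull_response(signal, kernel, alpha=1.0):
    """Push response minus alpha times the pull (opposite-polarity) response."""
    push = relu(correlate1d(signal, kernel))
    pull = relu(correlate1d(signal, [-w for w in kernel]))
    return [p - alpha * q for p, q in zip(push, pull)]


step_signal = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]
edge_kernel = [-1.0, 1.0]  # responds positively to a rising edge
print(push_pull_response(step_signal, edge_kernel, alpha=0.5))
# [0.0, 1.0, 0.0, -0.5, 0.0]
```

In this simplified form, `alpha = 0` gives an ordinary rectified convolution, while `alpha = 1` with an identical-size pull kernel reduces to a purely linear filter, so intermediate values or a differently shaped pull kernel are where the inhibition has a distinct effect.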

Quantifying Infra-Marginality and Its Trade-off with Group Fairness


Title	Quantifying Infra-Marginality and Its Trade-off with Group Fairness
Authors	Arpita Biswas, Siddharth Barman, Amit Deshpande, Amit Sharma
Abstract	In critical decision-making scenarios, optimizing accuracy can lead to a biased classifier, hence past work recommends enforcing group-based fairness metrics in addition to maximizing accuracy. However, doing so exposes the classifier to another kind of bias called infra-marginality. This refers to individual-level bias where some individuals/subgroups can be worse off than under simply optimizing for accuracy. For instance, a classifier implementing race-based parity may significantly disadvantage women of the advantaged race. To quantify this bias, we propose a general notion of $\eta$-infra-marginality that can be used to evaluate the extent of this bias. We prove theoretically that, unlike other fairness metrics, infra-marginality does not have a trade-off with accuracy: high accuracy directly leads to low infra-marginality. This observation is confirmed through empirical analysis on multiple simulated and real-world datasets. Further, we find that maximizing group fairness often increases infra-marginality, suggesting the consideration of both group-level fairness and individual-level infra-marginality. However, measuring infra-marginality requires correct and explicit knowledge of the true distribution of individual-level outcomes. We propose a practical method to measure infra-marginality, and a simple algorithm to maximize group-wise accuracy and avoid infra-marginality.
Tasks	Decision Making
Published	2019-09-03
URL	https://arxiv.org/abs/1909.00982v1
PDF	https://arxiv.org/pdf/1909.00982v1.pdf
PWC	https://paperswithcode.com/paper/quantifying-infra-marginality-and-its-trade
Repo
Framework

A Discriminative Learned CNN Embedding for Remote Sensing Image Scene Classification


Title	A Discriminative Learned CNN Embedding for Remote Sensing Image Scene Classification
Authors	Wen Wang, Lijun Du, Yinxing Gao, Yanzhou Su, Feng Wang, Jian Cheng
Abstract	In this work, a discriminatively learned CNN embedding is proposed for remote sensing image scene classification. Our proposed siamese network simultaneously computes the classification loss function and the metric learning loss function of the two input images. Specifically, for the classification loss, we use the standard cross-entropy loss function to predict the classes of the images. For the metric learning loss, our siamese network learns to map the intra-class and inter-class input pairs to a feature space where intra-class inputs are close and inter-class inputs are separated by a margin. Concretely, for remote sensing image scene classification, we would like to map images from the same scene to feature vectors that are close, and map images from different scenes to feature vectors that are widely separated. Experiments are conducted on three different remote sensing image datasets to evaluate the effectiveness of our proposed approach. The results demonstrate that the proposed method achieves an excellent classification performance.
Tasks	Metric Learning, Scene Classification
Published	2019-11-28
URL	https://arxiv.org/abs/1911.12517v2
PDF	https://arxiv.org/pdf/1911.12517v2.pdf
PWC	https://paperswithcode.com/paper/a-discriminative-learned-cnn-embedding-for
Repo
Framework
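
The two-part loss described in the remote sensing abstract (a cross-entropy term per image plus a metric learning term on the pair) can be sketched as follows. The abstract does not give the exact metric loss; this sketch assumes the standard contrastive loss with a margin, and the weight between the two terms is a hypothetical parameter. Function and argument names are illustrative.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for one example, computed stably via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def contrastive_loss(emb_a, emb_b, same_class, margin=1.0):
    """Pull same-class pairs together; push other pairs at least `margin` apart."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(emb_a, emb_b)))
    if same_class:
        return dist ** 2
    return max(0.0, margin - dist) ** 2


def siamese_loss(logits_a, label_a, logits_b, label_b, emb_a, emb_b,
                 margin=1.0, weight=1.0):
    """Classification loss on both images plus a weighted metric loss on the pair."""
    classification = cross_entropy(logits_a, label_a) + cross_entropy(logits_b, label_b)
    metric = contrastive_loss(emb_a, emb_b, label_a == label_b, margin)
    return classification + weight * metric


# Two images of different scenes whose embeddings are already far apart:
# the metric term vanishes and only the classification terms remain.
loss = siamese_loss(
    logits_a=[2.0, 0.0], label_a=0,
    logits_b=[0.0, 2.0], label_b=1,
    emb_a=[0.0, 0.0], emb_b=[3.0, 4.0],
)
print(loss)
```

Sharing one network for both inputs means the embedding and the classifier are trained jointly, which is the point of computing both losses on the same pair.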

Towards Multi-pose Guided Virtual Try-on Network


Title	Towards Multi-pose Guided Virtual Try-on Network
Authors	Haoye Dong, Xiaodan Liang, Bochao Wang, Hanjiang Lai, Jia Zhu, Jian Yin
Abstract	Virtual try-on systems that work under arbitrary human poses have huge application potential, yet raise many challenges, e.g. self-occlusions, heavy misalignment among diverse poses, and diverse clothes textures. Existing methods that aim at fitting new clothes onto a person can only transfer clothes under a fixed human pose, and still show unsatisfactory performance: they often fail to preserve the identity, lose the texture details, and decrease the diversity of poses. In this paper, we make the first attempt towards a multi-pose guided virtual try-on system, which enables transferring clothes onto a person image under diverse poses. Given an input person image, a desired clothes image, and a desired pose, the proposed Multi-pose Guided Virtual Try-on Network (MG-VTON) can generate a new person image after fitting the desired clothes into the input image and manipulating human poses. Our MG-VTON is constructed in three stages: 1) a desired human parsing map of the target image is synthesized to match both the desired pose and the desired clothes shape; 2) a deep Warping Generative Adversarial Network (Warp-GAN) warps the desired clothes appearance into the synthesized human parsing map and alleviates the misalignment problem between the input human pose and desired human pose; 3) a refinement render utilizing multi-pose composition masks recovers the texture details of clothes and removes some artifacts. Extensive experiments on well-known datasets and our newly collected largest virtual try-on benchmark demonstrate that our MG-VTON significantly outperforms all state-of-the-art methods both qualitatively and quantitatively with promising multi-pose virtual try-on performances.
Tasks	Human Parsing
Published	2019-02-28
URL	http://arxiv.org/abs/1902.11026v1
PDF	http://arxiv.org/pdf/1902.11026v1.pdf
PWC	https://paperswithcode.com/paper/towards-multi-pose-guided-virtual-try-on
Repo
Framework