October 20, 2019

3342 words 16 mins read

Paper Group AWR 234

Auto-Encoding Variational Neural Machine Translation. Unsupervised Discovery of Object Landmarks as Structural Representations. Efficient Formal Safety Analysis of Neural Networks. Extended Isolation Forest. Self-supervised learning of a facial attribute embedding from video. Improved Fusion of Visual and Language Representations by Dense Symmetric …

Auto-Encoding Variational Neural Machine Translation

Title Auto-Encoding Variational Neural Machine Translation
Authors Bryan Eikema, Wilker Aziz
Abstract We present a deep generative model of bilingual sentence pairs for machine translation. The model generates source and target sentences jointly from a shared latent representation and is parameterised by neural networks. We perform efficient training using amortised variational inference and reparameterised gradients. Additionally, we discuss the statistical implications of joint modelling and propose an efficient approximation to maximum a posteriori decoding for fast test-time predictions. We demonstrate the effectiveness of our model in three machine translation scenarios: in-domain training, mixed-domain training, and learning from a mix of gold-standard and synthetic data. Our experiments show consistently that our joint formulation outperforms conditional modelling (i.e. standard neural machine translation) in all such scenarios.
Tasks Machine Translation
Published 2018-07-27
URL https://arxiv.org/abs/1807.10564v4
PDF https://arxiv.org/pdf/1807.10564v4.pdf
PWC https://paperswithcode.com/paper/auto-encoding-variational-neural-machine
Repo https://github.com/Roxot/AEVNMT
Framework tf
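
A minimal sketch of the joint objective, using bag-of-words stand-ins for the paper's sequence encoders and decoders (module names and layer sizes are illustrative, not the AEVNMT architecture): a shared latent z generates both source and target, trained with the reparameterised ELBO.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointELBO(nn.Module):
    """Toy joint model p(x, y | z) p(z) with amortised inference q(z | x)."""
    def __init__(self, vocab_x, vocab_y, z_dim=32, h=128):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(vocab_x, h), nn.Tanh())
        self.mu, self.logvar = nn.Linear(h, z_dim), nn.Linear(h, z_dim)
        self.dec_x = nn.Linear(z_dim, vocab_x)  # source "language model"
        self.dec_y = nn.Linear(z_dim, vocab_y)  # target "translation model"

    def forward(self, x_bow, y_bow):
        hid = self.enc(x_bow)
        mu, logvar = self.mu(hid), self.logvar(hid)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterisation
        ll_x = (x_bow * F.log_softmax(self.dec_x(z), dim=-1)).sum(-1)
        ll_y = (y_bow * F.log_softmax(self.dec_y(z), dim=-1)).sum(-1)
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1)
        return -(ll_x + ll_y - kl).mean()  # negative ELBO, to be minimised
```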

Unsupervised Discovery of Object Landmarks as Structural Representations

Title Unsupervised Discovery of Object Landmarks as Structural Representations
Authors Yuting Zhang, Yijie Guo, Yixin Jin, Yijun Luo, Zhiyuan He, Honglak Lee
Abstract Deep neural networks can model images with rich latent representations, but they cannot naturally conceptualize structures of object categories in a human-perceptible way. This paper addresses the problem of learning object structures in an image modeling process without supervision. We propose an autoencoding formulation to discover landmarks as explicit structural representations. The encoding module outputs landmark coordinates, whose validity is ensured by constraints that reflect the necessary properties for landmarks. The decoding module takes the landmarks as a part of the learnable input representations in an end-to-end differentiable framework. Our discovered landmarks are semantically meaningful and more predictive of manually annotated landmarks than those discovered by previous methods. The coordinates of our landmarks are also complementary features to pretrained deep-neural-network representations in recognizing visual attributes. In addition, the proposed method naturally creates an unsupervised, perceptible interface to manipulate object shapes and decode images with controllable structures. The project webpage is at http://ytzhang.net/projects/lmdis-rep
Tasks Unsupervised Facial Landmark Detection
Published 2018-04-12
URL http://arxiv.org/abs/1804.04412v1
PDF http://arxiv.org/pdf/1804.04412v1.pdf
PWC https://paperswithcode.com/paper/unsupervised-discovery-of-object-landmarks-as
Repo https://github.com/YutingZhang/lmdis-rep
Framework tf
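
For the encoder to output landmark coordinates inside an end-to-end differentiable framework, a standard device is a spatial soft-argmax over per-landmark heatmaps; a sketch of that step (a common construction consistent with the abstract, not necessarily the paper's exact detector):

```python
import torch
import torch.nn.functional as F

def soft_argmax_2d(heatmaps):
    """Differentiable landmark coordinates from per-landmark heatmaps.
    heatmaps: (B, K, H, W) raw detector scores, one channel per landmark.
    Returns (B, K, 2) coordinates in [0, 1] as spatial expectations."""
    B, K, H, W = heatmaps.shape
    probs = F.softmax(heatmaps.view(B, K, -1), dim=-1).view(B, K, H, W)
    ys = torch.linspace(0, 1, H, device=heatmaps.device)
    xs = torch.linspace(0, 1, W, device=heatmaps.device)
    y = (probs.sum(dim=3) * ys).sum(dim=2)  # expectation over rows
    x = (probs.sum(dim=2) * xs).sum(dim=2)  # expectation over columns
    return torch.stack([x, y], dim=-1)
```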

Efficient Formal Safety Analysis of Neural Networks

Title Efficient Formal Safety Analysis of Neural Networks
Authors Shiqi Wang, Kexin Pei, Justin Whitehouse, Junfeng Yang, Suman Jana
Abstract Neural networks are increasingly deployed in real-world safety-critical domains such as autonomous driving, aircraft collision avoidance, and malware detection. However, these networks have been shown to often mispredict on inputs with minor adversarial or even accidental perturbations. Consequences of such errors can be disastrous and even potentially fatal as shown by the recent Tesla autopilot crash. Thus, there is an urgent need for formal analysis systems that can rigorously check neural networks for violations of different safety properties such as robustness against adversarial perturbations within a certain $L$-norm of a given image. An effective safety analysis system for a neural network must be able to either ensure that a safety property is satisfied by the network or find a counterexample, i.e., an input for which the network will violate the property. Unfortunately, most existing techniques for performing such analysis struggle to scale beyond very small networks, and the ones that can scale to larger networks suffer from high false positive rates and cannot produce concrete counterexamples in case of a property violation. In this paper, we present a new efficient approach for rigorously checking different safety properties of neural networks that significantly outperforms existing approaches by multiple orders of magnitude. Our approach can check different safety properties and find concrete counterexamples for networks that are 10$\times$ larger than the ones supported by existing analysis techniques. We believe that our approach to estimating tight output bounds of a network for a given input range can also help improve the explainability of neural networks and guide the training process of more robust neural networks.
Tasks Adversarial Attack, Adversarial Defense, Autonomous Driving, Malware Detection
Published 2018-09-19
URL http://arxiv.org/abs/1809.08098v3
PDF http://arxiv.org/pdf/1809.08098v3.pdf
PWC https://paperswithcode.com/paper/efficient-formal-safety-analysis-of-neural
Repo https://github.com/tcwangshiqi-columbia/Interval-Attack
Framework tf
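
The primitive underneath such an analysis is a sound over-approximation of the network's outputs over a whole input region. Naive interval propagation, sketched below, conveys the idea; the paper's symbolic interval analysis computes far tighter bounds:

```python
import numpy as np

def interval_bounds(layers, lo, up):
    """Sound (but loose) output bounds of an affine+ReLU network over the
    input box [lo, up]. layers is a list of (W, b) pairs; ReLU is applied
    after every layer except the last."""
    for i, (W, b) in enumerate(layers):
        Wp, Wn = np.maximum(W, 0), np.minimum(W, 0)
        lo, up = Wp @ lo + Wn @ up + b, Wp @ up + Wn @ lo + b
        if i < len(layers) - 1:
            lo, up = np.maximum(lo, 0), np.maximum(up, 0)
    return lo, up
```

A property such as "output 0 outscores output 1 everywhere in the box" is verified if the returned lower bound of unit 0 exceeds the upper bound of unit 1; when the check is inconclusive, tighter relaxations or a concrete counterexample search are needed, which is where the paper's contribution lies.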

Extended Isolation Forest

Title Extended Isolation Forest
Authors Sahand Hariri, Matias Carrasco Kind, Robert J. Brunner
Abstract We present an extension to the model-free anomaly detection algorithm, Isolation Forest. This extension, named Extended Isolation Forest (EIF), resolves issues with the assignment of anomaly scores to given data points. We motivate the problem using heat maps of anomaly scores, which suffer from artifacts generated by the criterion for the branching operations of the binary trees. We explain this problem in detail and visually demonstrate the mechanism by which it occurs. We then propose two different approaches for improving the situation. First, we propose transforming the data randomly before the creation of each tree, which averages out the bias. Second, and our preferred approach, we allow the data to be sliced using hyperplanes with random slopes, which remedies the artifacts seen in the anomaly score heat maps. We show that this method substantially improves the robustness of the algorithm by looking at the variance of scores of data points distributed along constant level sets. We report AUROC and AUPRC for our synthetic datasets, along with real-world benchmark datasets. We find no appreciable difference in the rate of convergence nor in computation time between the standard Isolation Forest and EIF.
Tasks Anomaly Detection
Published 2018-11-06
URL https://arxiv.org/abs/1811.02141v2
PDF https://arxiv.org/pdf/1811.02141v2.pdf
PWC https://paperswithcode.com/paper/extended-isolation-forest
Repo https://github.com/fyumoto/EIF
Framework none
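
The preferred fix replaces axis-aligned cuts with randomly sloped hyperplanes. One EIF branching step, following the paper's criterion (x - p) . n <= 0 with a random normal vector n and a random intercept p drawn inside the data range:

```python
import numpy as np

rng = np.random.default_rng(0)

def split(X):
    """One Extended Isolation Forest branching step: cut the data with a
    random hyperplane (random slope and intercept) rather than a single axis."""
    n = rng.normal(size=X.shape[1])                # random normal vector (slope)
    p = rng.uniform(X.min(axis=0), X.max(axis=0))  # intercept inside the data range
    mask = (X - p) @ n <= 0
    return X[mask], X[~mask]
```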

Self-supervised learning of a facial attribute embedding from video

Title Self-supervised learning of a facial attribute embedding from video
Authors Olivia Wiles, A. Sophia Koepke, Andrew Zisserman
Abstract We propose a self-supervised framework for learning facial attributes by simply watching videos of a human face speaking, laughing, and moving over time. To perform this task, we introduce a network, Facial Attributes-Net (FAb-Net), that is trained to embed multiple frames from the same video face-track into a common low-dimensional space. With this approach, we make three contributions: first, we show that the network can leverage information from multiple source frames by predicting confidence/attention masks for each frame; second, we demonstrate that using a curriculum learning regime improves the learned embedding; finally, we demonstrate that the network learns a meaningful face embedding that encodes information about head pose, facial landmarks and facial expression, i.e. facial attributes, without having been supervised with any labelled data. Our approach is comparable or superior to state-of-the-art self-supervised methods on these tasks and approaches the performance of supervised methods.
Tasks Unsupervised Facial Landmark Detection
Published 2018-08-21
URL http://arxiv.org/abs/1808.06882v1
PDF http://arxiv.org/pdf/1808.06882v1.pdf
PWC https://paperswithcode.com/paper/self-supervised-learning-of-a-facial
Repo https://github.com/oawiles/FAb-Net
Framework pytorch
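
A sketch of the confidence-weighted idea: each source frame's embedding receives a predicted confidence, and the frames are fused by those weights. This collapses the paper's per-frame confidence/attention masks to a single scalar per frame, so it illustrates the mechanism rather than FAb-Net's architecture:

```python
import torch
import torch.nn as nn

class ConfidenceFusion(nn.Module):
    """Combine several source-frame embeddings using predicted confidences.
    Layer sizes are illustrative."""
    def __init__(self, d=256):
        super().__init__()
        self.conf = nn.Linear(d, 1)

    def forward(self, embs):                          # embs: (B, N_frames, d)
        w = torch.softmax(self.conf(embs), dim=1)     # per-frame confidence weights
        return (w * embs).sum(dim=1)                  # fused embedding: (B, d)
```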

Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering

Title Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering
Authors Duy-Kien Nguyen, Takayuki Okatani
Abstract A key to visual question answering (VQA) lies in how to fuse visual and language features extracted from an input image and question. We show that an attention mechanism that enables dense, bi-directional interactions between the two modalities helps boost the accuracy of answer prediction. Specifically, we present a simple architecture that is fully symmetric between visual and language representations, in which each question word attends to image regions and each image region attends to question words. It can be stacked to form a hierarchy for multi-step interactions between an image-question pair. We show through experiments that the proposed architecture achieves a new state of the art on VQA and VQA 2.0 despite its small size. We also present a qualitative evaluation demonstrating how the proposed attention mechanism generates reasonable attention maps on images and questions, which lead to correct answer prediction.
Tasks Visual Question Answering
Published 2018-04-03
URL http://arxiv.org/abs/1804.00775v2
PDF http://arxiv.org/pdf/1804.00775v2.pdf
PWC https://paperswithcode.com/paper/improved-fusion-of-visual-and-language
Repo https://github.com/cvlab-tohoku/Dense-CoAttention-Network
Framework pytorch
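
A single unweighted layer of the symmetric mechanism: one affinity matrix drives attention in both directions, words over regions and regions over words. The paper adds learned projections and stacks several such layers; this sketch shows only the symmetry:

```python
import torch
import torch.nn.functional as F

def dense_coattention(V, Q):
    """V: (B, R, d) image-region features; Q: (B, T, d) question-word features."""
    A = torch.bmm(V, Q.transpose(1, 2))                                # (B, R, T) affinities
    img_for_words = torch.bmm(F.softmax(A, dim=1).transpose(1, 2), V)  # each word attends to regions: (B, T, d)
    words_for_regions = torch.bmm(F.softmax(A, dim=2), Q)              # each region attends to words: (B, R, d)
    return img_for_words, words_for_regions
```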

A First Look at Deep Learning Apps on Smartphones

Title A First Look at Deep Learning Apps on Smartphones
Authors Mengwei Xu, Jiawei Liu, Yuanqiang Liu, Felix Xiaozhu Lin, Yunxin Liu, Xuanzhe Liu
Abstract We are at the dawn of the deep learning explosion on smartphones. To bridge the gap between research and practice, we present the first empirical study of the 16,500 most popular Android apps, demystifying how smartphone apps exploit deep learning in the wild. To this end, we build a new static tool that dissects apps and analyzes their deep learning functions. Our study answers three questions: which apps are the early adopters of deep learning, what do they use deep learning for, and what do their deep learning models look like. Our study has strong implications for app developers, smartphone vendors, and deep learning R&D. On the one hand, our findings paint a promising picture of deep learning for smartphones, showing the prosperity of mobile deep learning frameworks as well as the prosperity of apps building their cores atop deep learning. On the other hand, our findings urge optimizations for deep learning models deployed on smartphones, the protection of these models, and validation of research ideas on these models.
Tasks
Published 2018-11-08
URL https://arxiv.org/abs/1812.05448v3
PDF https://arxiv.org/pdf/1812.05448v3.pdf
PWC https://paperswithcode.com/paper/a-first-look-at-deep-learning-apps-on
Repo https://github.com/xumengwei/MobileDL
Framework tf
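
At its core, such a static tool needs to spot framework traces inside an APK (which is a zip archive). A hypothetical sketch; the signature strings below are illustrative guesses, not the signature set the paper's tool uses:

```python
import zipfile

# Hypothetical framework signatures; a real tool would inspect native
# libraries, assets, and dex code far more carefully.
SIGNATURES = {
    "TensorFlow": ["libtensorflow", "tflite"],
    "Caffe": ["libcaffe"],
    "NCNN": ["libncnn"],
}

def detect_frameworks(apk_path):
    """Return DL frameworks whose signatures appear in the APK's file list."""
    with zipfile.ZipFile(apk_path) as apk:
        names = " ".join(apk.namelist()).lower()
    return [fw for fw, sigs in SIGNATURES.items()
            if any(s in names for s in sigs)]
```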

Recovering Realistic Texture in Image Super-resolution by Deep Spatial Feature Transform

Title Recovering Realistic Texture in Image Super-resolution by Deep Spatial Feature Transform
Authors Xintao Wang, Ke Yu, Chao Dong, Chen Change Loy
Abstract Although convolutional neural networks (CNNs) have recently demonstrated high-quality reconstruction for single-image super-resolution (SR), recovering natural and realistic texture remains a challenging problem. In this paper, we show that it is possible to recover textures faithful to semantic classes. In particular, we only need to modulate features of a few intermediate layers in a single network, conditioned on semantic segmentation probability maps. This is made possible through a novel Spatial Feature Transform (SFT) layer that generates affine transformation parameters for spatial-wise feature modulation. SFT layers can be trained end-to-end together with the SR network using the same loss function. At test time, the network accepts an input image of arbitrary size and generates a high-resolution image with just a single forward pass, conditioned on the categorical priors. Our final results show that an SR network equipped with SFT can generate more realistic and visually pleasing textures than the state-of-the-art SRGAN and EnhanceNet.
Tasks Image Super-Resolution, Semantic Segmentation, Super-Resolution
Published 2018-04-09
URL http://arxiv.org/abs/1804.02815v1
PDF http://arxiv.org/pdf/1804.02815v1.pdf
PWC https://paperswithcode.com/paper/recovering-realistic-texture-in-image-super
Repo https://github.com/xinntao/BasicSR
Framework pytorch
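
The SFT layer itself is small: convolutions map the segmentation probability maps to per-pixel scale and shift parameters that modulate the SR features as gamma * F + beta. A sketch with illustrative channel counts and 1x1 convolutions (the paper's condition network is deeper):

```python
import torch
import torch.nn as nn

class SFTLayer(nn.Module):
    """Spatial Feature Transform: per-pixel affine modulation of SR features,
    conditioned on segmentation probability maps."""
    def __init__(self, cond_ch=8, feat_ch=64):
        super().__init__()
        self.gamma = nn.Conv2d(cond_ch, feat_ch, 1)
        self.beta = nn.Conv2d(cond_ch, feat_ch, 1)

    def forward(self, feat, seg_probs):   # (B, feat_ch, H, W), (B, cond_ch, H, W)
        return self.gamma(seg_probs) * feat + self.beta(seg_probs)
```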

From Recognition to Cognition: Visual Commonsense Reasoning

Title From Recognition to Cognition: Visual Commonsense Reasoning
Authors Rowan Zellers, Yonatan Bisk, Ali Farhadi, Yejin Choi
Abstract Visual understanding goes well beyond object recognition. With one glance at an image, we can effortlessly imagine the world beyond the pixels: for instance, we can infer people’s actions, goals, and mental states. While this task is easy for humans, it is tremendously difficult for today’s vision systems, requiring higher-order cognition and commonsense reasoning about the world. We formalize this task as Visual Commonsense Reasoning. Given a challenging question about an image, a machine must answer correctly and then provide a rationale justifying its answer. Next, we introduce a new dataset, VCR, consisting of 290k multiple choice QA problems derived from 110k movie scenes. The key recipe for generating non-trivial and high-quality problems at scale is Adversarial Matching, a new approach to transform rich annotations into multiple choice questions with minimal bias. Experimental results show that while humans find VCR easy (over 90% accuracy), state-of-the-art vision models struggle (~45%). To move towards cognition-level understanding, we present a new reasoning engine, Recognition to Cognition Networks (R2C), that models the necessary layered inferences for grounding, contextualization, and reasoning. R2C helps narrow the gap between humans and machines (~65%); still, the challenge is far from solved, and we provide analysis that suggests avenues for future work.
Tasks Visual Commonsense Reasoning
Published 2018-11-27
URL http://arxiv.org/abs/1811.10830v2
PDF http://arxiv.org/pdf/1811.10830v2.pdf
PWC https://paperswithcode.com/paper/from-recognition-to-cognition-visual
Repo https://github.com/TheShadow29/visual-commonsense-pytorch
Framework pytorch
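
Adversarial Matching can be read as a maximum-weight assignment: each question is paired with a distractor answer recycled from another question, trading relevance to the query against estimated correctness. A sketch under that reading; the score matrices, the trade-off weight lam, and the solver are illustrative stand-ins for the paper's learned models:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def adversarial_matching(relevance, correctness, lam=0.5):
    """Pick a distractor for each question: relevant to the question (high
    relevance[i, j]) but unlikely to be a correct answer (low correctness[i, j]).
    Both matrices are (n_questions, n_questions), with answer j originating
    from question j; lam balances the two terms."""
    score = relevance - lam * correctness
    np.fill_diagonal(score, -1e9)          # an answer cannot distract its own question
    _, cols = linear_sum_assignment(score, maximize=True)
    return cols                            # cols[i] = distractor chosen for question i
```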

Understand Functionality and Dimensionality of Vector Embeddings: the Distributional Hypothesis, the Pairwise Inner Product Loss and Its Bias-Variance Trade-off

Title Understand Functionality and Dimensionality of Vector Embeddings: the Distributional Hypothesis, the Pairwise Inner Product Loss and Its Bias-Variance Trade-off
Authors Zi Yin
Abstract Vector embedding is a foundational building block of many deep learning models, especially in natural language processing. In this paper, we present a theoretical framework for understanding the effect of dimensionality on vector embeddings. We observe that the distributional hypothesis, a governing principle of statistical semantics, requires a natural unitary invariance for vector embeddings. Motivated by this observation, we propose the Pairwise Inner Product (PIP) loss, a unitary-invariant metric on the similarity between two embeddings. We demonstrate that the PIP loss captures the difference in functionality between embeddings, and that it is tightly connected to two basic properties of vector embeddings, namely similarity and compositionality. By formulating the embedding training process as matrix factorization with noise, we reveal a fundamental bias-variance trade-off between the signal spectrum and noise power in the dimensionality selection process. This trade-off sheds light on many empirical observations that have not been thoroughly explained, for example the existence of an optimal dimensionality. Moreover, we discover two new results about vector embeddings, namely their robustness against over-parametrization and their forward stability. The bias-variance trade-off of the PIP loss explicitly answers the fundamental open problem of dimensionality selection for vector embeddings.
Tasks
Published 2018-03-01
URL http://arxiv.org/abs/1803.00502v4
PDF http://arxiv.org/pdf/1803.00502v4.pdf
PWC https://paperswithcode.com/paper/understand-functionality-and-dimensionality
Repo https://github.com/aaaasssddf/PIP-experiments
Framework tf
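
The PIP loss fits in one line: compare two embeddings through their pairwise-inner-product matrices under the Frobenius norm, which makes the metric invariant to orthogonal transforms of the embedding space, exactly the unitary invariance the abstract motivates. A NumPy rendering with a quick invariance check:

```python
import numpy as np

def pip_loss(E1, E2):
    """PIP loss between two embedding matrices (rows = tokens):
    || E1 E1^T - E2 E2^T ||_F."""
    return np.linalg.norm(E1 @ E1.T - E2 @ E2.T, ord="fro")

rng = np.random.default_rng(0)
E = rng.normal(size=(100, 20))
U, _ = np.linalg.qr(rng.normal(size=(20, 20)))  # random orthogonal matrix
assert np.isclose(pip_loss(E, E @ U), 0.0)      # unitary invariance holds
```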

Sparsely Grouped Multi-task Generative Adversarial Networks for Facial Attribute Manipulation

Title Sparsely Grouped Multi-task Generative Adversarial Networks for Facial Attribute Manipulation
Authors Jichao Zhang, Yezhi Shu, Songhua Xu, Gongze Cao, Fan Zhong, Xueying Qin
Abstract Recently, Image-to-Image Translation (IIT) has achieved great progress in image style transfer and semantic context manipulation. However, existing approaches require exhaustively labelled training data, which is labor-intensive, difficult to scale up, and hard to adapt to a new domain. To overcome this key limitation, we propose Sparsely Grouped Generative Adversarial Networks (SG-GAN), a novel approach that can translate images in sparsely grouped datasets where only a few training samples are labelled. Using a one-input multi-output architecture, SG-GAN is well suited to tackling multi-task learning and sparsely grouped learning tasks, and can translate images among multiple groups with a single trained model. To experimentally validate its advantages, we apply the proposed method to a series of attribute manipulation tasks for facial images as a case study. Experimental results show that SG-GAN achieves results comparable with state-of-the-art methods on adequately labelled datasets while attaining superior image translation quality on sparsely grouped datasets.
Tasks Image-to-Image Translation, Multi-Task Learning, Style Transfer
Published 2018-05-19
URL http://arxiv.org/abs/1805.07509v6
PDF http://arxiv.org/pdf/1805.07509v6.pdf
PWC https://paperswithcode.com/paper/sparsely-grouped-multi-task-generative
Repo https://github.com/zhangqianhui/SGGAN-tensorflow
Framework tf
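
A sketch of where the sparse grouping enters training, assuming a standard GAN discriminator with an auxiliary group classifier (the loss forms are illustrative, not SG-GAN's exact objective): the adversarial term uses every sample, while the classification term is computed only on the few labelled ones.

```python
import torch
import torch.nn.functional as F

def discriminator_loss(d_real, d_fake, cls_logits, labels, labelled_mask):
    """Adversarial term on all samples; group-classification term restricted
    to labelled samples, as indicated by the boolean labelled_mask."""
    adv = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
           + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    if labelled_mask.any():
        cls = F.cross_entropy(cls_logits[labelled_mask], labels[labelled_mask])
    else:
        cls = torch.zeros((), device=d_real.device)
    return adv + cls
```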

Bi-Real Net: Binarizing Deep Network Towards Real-Network Performance

Title Bi-Real Net: Binarizing Deep Network Towards Real-Network Performance
Authors Zechun Liu, Wenhan Luo, Baoyuan Wu, Xin Yang, Wei Liu, Kwang-Ting Cheng
Abstract In this paper, we study 1-bit convolutional neural networks (CNNs), in which both the weights and activations are binary. While efficient, the lack of representational capability and the difficulty of training impede 1-bit CNNs from performing as well as real-valued networks. We propose Bi-Real net, with a novel training algorithm, to tackle these two challenges. To enhance the representational capability, we propagate the real-valued activations generated by each 1-bit convolution via a parameter-free shortcut. To address the training difficulty, we propose a training algorithm using a tighter approximation to the derivative of the sign function, a magnitude-aware gradient for weight updating, a better initialization method, and a two-step scheme for training a deep network. Experiments on ImageNet show that an 18-layer Bi-Real net with the proposed training algorithm achieves 56.4% top-1 classification accuracy, which is 10% higher than the state of the art (e.g., XNOR-Net), with greater memory savings and lower computational cost. Bi-Real net is also the first to scale 1-bit CNNs up to an ultra-deep network with 152 layers, achieving 64.5% top-1 accuracy on ImageNet. A 50-layer Bi-Real net shows performance comparable to a real-valued network on the depth estimation task, with only a 0.3% accuracy gap.
Tasks Depth Estimation
Published 2018-11-04
URL https://arxiv.org/abs/1811.01335v2
PDF https://arxiv.org/pdf/1811.01335v2.pdf
PWC https://paperswithcode.com/paper/bi-real-net-binarizing-deep-network-towards
Repo https://github.com/liuzechun/Bi-Real-net
Framework pytorch
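
The "tighter approximation to the derivative of the sign function" can be sketched as a custom autograd function: the forward pass is the exact sign, while the backward pass uses a piecewise-polynomial surrogate whose derivative is 2 - 2|x| on [-1, 1] and 0 elsewhere. This is a sketch of that one ingredient; the magnitude-aware weight gradient and two-step training are separate pieces.

```python
import torch

class ApproxSign(torch.autograd.Function):
    """Binarise in the forward pass; back-propagate through a polynomial
    surrogate of sign, tighter than the usual clipped-identity estimator."""
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        grad = torch.clamp(2 - 2 * x.abs(), min=0)  # 2 - 2|x| on [-1, 1], else 0
        return grad_out * grad
```

Used as y = ApproxSign.apply(x) in place of a hard binariser inside each 1-bit convolution block.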

Analysis of Minimax Error Rate for Crowdsourcing and Its Application to Worker Clustering Model

Title Analysis of Minimax Error Rate for Crowdsourcing and Its Application to Worker Clustering Model
Authors Hideaki Imamura, Issei Sato, Masashi Sugiyama
Abstract While crowdsourcing has become an important means to label data, there is great interest in estimating the ground truth from unreliable labels produced by crowdworkers. The Dawid and Skene (DS) model is one of the most well-known models in the study of crowdsourcing. Despite its practical popularity, theoretical error analysis for the DS model has been conducted only under restrictive assumptions on class priors, confusion matrices, or the number of labels each worker provides. In this paper, we derive a minimax error rate under a more practical setting for a broader class of crowdsourcing models that includes the DS model as a special case. We further propose the worker clustering model, which is more practical than the DS model in real crowdsourcing settings. The wide applicability of our theoretical analysis allows us to immediately investigate the behavior of this proposed model, which cannot be analyzed by existing studies. Experimental results show a strong similarity between the lower bound of the minimax error rate derived by our theoretical analysis and the empirical error of the estimated value.
Tasks
Published 2018-02-13
URL http://arxiv.org/abs/1802.04551v2
PDF http://arxiv.org/pdf/1802.04551v2.pdf
PWC https://paperswithcode.com/paper/analysis-of-minimax-error-rate-for
Repo https://github.com/HideakiImamura/MinimaxErrorRate
Framework none
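
For reference, a minimal EM loop for the Dawid-Skene special case that the paper generalises; plain NumPy, with no attempt at the proposed worker clustering model:

```python
import numpy as np

def dawid_skene(labels, n_classes, iters=50):
    """labels[i, j] = class that worker j gave item i, or -1 if unlabelled.
    Returns estimated true classes and per-worker confusion matrices."""
    n_items, n_workers = labels.shape
    post = np.ones((n_items, n_classes))          # initialise with vote counts
    for i in range(n_items):
        for j in range(n_workers):
            if labels[i, j] >= 0:
                post[i, labels[i, j]] += 1
    post /= post.sum(1, keepdims=True)
    for _ in range(iters):
        prior = post.mean(0)                      # M-step: class priors ...
        conf = np.full((n_workers, n_classes, n_classes), 1e-6)
        for i in range(n_items):                  # ... and confusion matrices
            for j in range(n_workers):
                if labels[i, j] >= 0:
                    conf[j, :, labels[i, j]] += post[i]
        conf /= conf.sum(2, keepdims=True)
        log_post = np.tile(np.log(prior), (n_items, 1))
        for i in range(n_items):                  # E-step: posterior over truth
            for j in range(n_workers):
                if labels[i, j] >= 0:
                    log_post[i] += np.log(conf[j, :, labels[i, j]])
        post = np.exp(log_post - log_post.max(1, keepdims=True))
        post /= post.sum(1, keepdims=True)
    return post.argmax(1), conf
```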

Learning unknown ODE models with Gaussian processes

Title Learning unknown ODE models with Gaussian processes
Authors Markus Heinonen, Cagatay Yildiz, Henrik Mannerström, Jukka Intosalmi, Harri Lähdesmäki
Abstract In conventional ODE modelling, the coefficients of an equation driving the system state forward in time are estimated. However, for many complex systems it is practically impossible to determine the equations or interactions governing the underlying dynamics. In these settings, a parametric ODE model cannot be formulated. Here, we overcome this issue by introducing a novel paradigm of nonparametric ODE modelling that can learn the underlying dynamics of arbitrary continuous-time systems without prior knowledge. We propose to learn non-linear, unknown differential functions from state observations using Gaussian process vector fields within the exact ODE formalism. We demonstrate the model's capabilities to infer dynamics from sparse data and to simulate the system forward into the future.
Tasks Gaussian Processes
Published 2018-03-12
URL http://arxiv.org/abs/1803.04303v1
PDF http://arxiv.org/pdf/1803.04303v1.pdf
PWC https://paperswithcode.com/paper/learning-unknown-ode-models-with-gaussian
Repo https://github.com/cagatayyildiz/npde
Framework tf
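
A stripped-down version of the idea, under strong simplifying assumptions: estimate derivative targets by finite differences, fit a Gaussian process vector field to them, and roll the posterior mean forward with Euler steps. The paper instead works within the exact ODE formalism with inducing points and sensitivity-based gradients; this sketch only conveys the nonparametric-vector-field picture.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def learn_and_simulate(states, dt, x0, steps):
    """states: (T, d) observed trajectory sampled at interval dt.
    Fits f(x) ~ dx/dt with a GP, then simulates forward from x0."""
    dx = (states[1:] - states[:-1]) / dt      # crude finite-difference targets
    gp = GaussianProcessRegressor(kernel=RBF(), alpha=1e-3).fit(states[:-1], dx)
    traj = [np.asarray(x0, dtype=float)]
    for _ in range(steps):                    # Euler steps on the GP mean field
        traj.append(traj[-1] + dt * gp.predict(traj[-1][None, :])[0])
    return np.stack(traj)
```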

Deep k-Nearest Neighbors: Towards Confident, Interpretable and Robust Deep Learning

Title Deep k-Nearest Neighbors: Towards Confident, Interpretable and Robust Deep Learning
Authors Nicolas Papernot, Patrick McDaniel
Abstract Deep neural networks (DNNs) enable innovative applications of machine learning like image recognition, machine translation, or malware detection. However, deep learning is often criticized for its lack of robustness in adversarial settings (e.g., vulnerability to adversarial inputs) and its general inability to rationalize its predictions. In this work, we exploit the structure of deep learning to enable new learning-based inference and decision strategies that achieve desirable properties such as robustness and interpretability. We take a first step in this direction and introduce the Deep k-Nearest Neighbors (DkNN). This hybrid classifier combines the k-nearest neighbors algorithm with representations of the data learned by each layer of the DNN: a test input is compared to its neighboring training points according to the distance that separates them in the representations. We show that the labels of these neighboring points afford confidence estimates for inputs outside the model's training manifold, including malicious inputs like adversarial examples, and thereby provide protection against inputs that are outside the model's understanding. This is because the nearest neighbors can be used to estimate the nonconformity of, i.e., the lack of support for, a prediction in the training data. The neighbors also constitute human-interpretable explanations of predictions. We evaluate the DkNN algorithm on several datasets, and show that the confidence estimates accurately identify inputs outside the model, and that the explanations provided by nearest neighbors are intuitive and useful in understanding model failures.
Tasks Machine Translation, Malware Detection
Published 2018-03-13
URL http://arxiv.org/abs/1803.04765v1
PDF http://arxiv.org/pdf/1803.04765v1.pdf
PWC https://paperswithcode.com/paper/deep-k-nearest-neighbors-towards-confident
Repo https://github.com/rodgzilla/machine_learning_deep_knn
Framework none
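
The central quantity is nonconformity: across the DNN's layers, how many of a test input's nearest training neighbours disagree with a candidate label. A sketch of that computation (the conformal calibration that turns it into p-values is omitted):

```python
import numpy as np

def dknn_nonconformity(layer_reps, train_labels, test_reps, label, k=5):
    """layer_reps[l]: (n_train, d_l) training features at layer l;
    test_reps[l]: (d_l,) test-input features at the same layer.
    Returns the count of disagreeing neighbours, summed over layers."""
    nonconformity = 0
    for R, r in zip(layer_reps, test_reps):
        dists = np.linalg.norm(R - r, axis=1)            # distances at this layer
        neighbours = train_labels[np.argsort(dists)[:k]]  # k nearest training points
        nonconformity += int((neighbours != label).sum())
    return nonconformity
```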