Paper Group AWR 234
Auto-Encoding Variational Neural Machine Translation. Unsupervised Discovery of Object Landmarks as Structural Representations. Efficient Formal Safety Analysis of Neural Networks. Extended Isolation Forest. Self-supervised learning of a facial attribute embedding from video. Improved Fusion of Visual and Language Representations by Dense Symmetric …
Auto-Encoding Variational Neural Machine Translation
Title | Auto-Encoding Variational Neural Machine Translation |
Authors | Bryan Eikema, Wilker Aziz |
Abstract | We present a deep generative model of bilingual sentence pairs for machine translation. The model generates source and target sentences jointly from a shared latent representation and is parameterised by neural networks. We perform efficient training using amortised variational inference and reparameterised gradients. Additionally, we discuss the statistical implications of joint modelling and propose an efficient approximation to maximum a posteriori decoding for fast test-time predictions. We demonstrate the effectiveness of our model in three machine translation scenarios: in-domain training, mixed-domain training, and learning from a mix of gold-standard and synthetic data. Our experiments show consistently that our joint formulation outperforms conditional modelling (i.e. standard neural machine translation) in all such scenarios. |
Tasks | Machine Translation |
Published | 2018-07-27 |
URL | https://arxiv.org/abs/1807.10564v4 |
https://arxiv.org/pdf/1807.10564v4.pdf | |
PWC | https://paperswithcode.com/paper/auto-encoding-variational-neural-machine |
Repo | https://github.com/Roxot/AEVNMT |
Framework | tf |
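Below is a minimal sketch of the joint training objective the abstract describes: a shared latent variable generates both source and target, and the ELBO combines both reconstruction terms with a KL penalty, optimised with reparameterised gradients. The module names (`inference_net`, `src_decoder`, `tgt_decoder`) are placeholders, not the AEVNMT API.

```python
import torch

def elbo(inference_net, src_decoder, tgt_decoder, x, y):
    mu, log_var = inference_net(x)                 # q(z|x) parameters
    std = torch.exp(0.5 * log_var)
    z = mu + std * torch.randn_like(std)           # reparameterised sample
    log_px = src_decoder.log_prob(x, z)            # log p(x|z), source reconstruction
    log_py = tgt_decoder.log_prob(y, x, z)         # log p(y|x,z), translation term
    kl = 0.5 * torch.sum(mu ** 2 + std ** 2 - 1.0 - log_var, dim=-1)
    return (log_px + log_py - kl).mean()           # maximise this (or minimise its negation)
```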
Unsupervised Discovery of Object Landmarks as Structural Representations
Title | Unsupervised Discovery of Object Landmarks as Structural Representations |
Authors | Yuting Zhang, Yijie Guo, Yixin Jin, Yijun Luo, Zhiyuan He, Honglak Lee |
Abstract | Deep neural networks can model images with rich latent representations, but they cannot naturally conceptualize structures of object categories in a human-perceptible way. This paper addresses the problem of learning object structures in an image modeling process without supervision. We propose an autoencoding formulation to discover landmarks as explicit structural representations. The encoding module outputs landmark coordinates, whose validity is ensured by constraints that reflect the necessary properties for landmarks. The decoding module takes the landmarks as a part of the learnable input representations in an end-to-end differentiable framework. Our discovered landmarks are semantically meaningful and more predictive of manually annotated landmarks than those discovered by previous methods. The coordinates of our landmarks are also complementary features to pretrained deep-neural-network representations in recognizing visual attributes. In addition, the proposed method naturally creates an unsupervised, perceptible interface to manipulate object shapes and decode images with controllable structures. The project webpage is at http://ytzhang.net/projects/lmdis-rep |
Tasks | Unsupervised Facial Landmark Detection |
Published | 2018-04-12 |
URL | http://arxiv.org/abs/1804.04412v1 |
http://arxiv.org/pdf/1804.04412v1.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-discovery-of-object-landmarks-as |
Repo | https://github.com/YutingZhang/lmdis-rep |
Framework | tf |
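A common way to make "the encoder outputs landmark coordinates" differentiable is a soft-argmax over per-landmark heatmaps; the sketch below shows that mechanism under the assumption of one landmark per channel, while the paper's additional validity constraints on the landmarks are omitted.

```python
import torch
import torch.nn.functional as F

def heatmaps_to_landmarks(heatmaps):
    # heatmaps: (batch, num_landmarks, H, W) raw detector scores
    b, k, h, w = heatmaps.shape
    probs = F.softmax(heatmaps.view(b, k, -1), dim=-1).view(b, k, h, w)
    ys = torch.linspace(0, 1, h, device=heatmaps.device)
    xs = torch.linspace(0, 1, w, device=heatmaps.device)
    y = (probs.sum(dim=3) * ys).sum(dim=2)   # expected row coordinate
    x = (probs.sum(dim=2) * xs).sum(dim=2)   # expected column coordinate
    return torch.stack([x, y], dim=-1)       # (batch, num_landmarks, 2)
```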
Efficient Formal Safety Analysis of Neural Networks
Title | Efficient Formal Safety Analysis of Neural Networks |
Authors | Shiqi Wang, Kexin Pei, Justin Whitehouse, Junfeng Yang, Suman Jana |
Abstract | Neural networks are increasingly deployed in real-world safety-critical domains such as autonomous driving, aircraft collision avoidance, and malware detection. However, these networks have been shown to often mispredict on inputs with minor adversarial or even accidental perturbations. Consequences of such errors can be disastrous and even potentially fatal as shown by the recent Tesla autopilot crash. Thus, there is an urgent need for formal analysis systems that can rigorously check neural networks for violations of different safety properties such as robustness against adversarial perturbations within a certain $L$-norm of a given image. An effective safety analysis system for a neural network must be able to either ensure that a safety property is satisfied by the network or find a counterexample, i.e., an input for which the network will violate the property. Unfortunately, most existing techniques for performing such analysis struggle to scale beyond very small networks, and those that can scale to larger networks suffer from high false positive rates and cannot produce concrete counterexamples in the case of a property violation. In this paper, we present a new efficient approach for rigorously checking different safety properties of neural networks that significantly outperforms existing approaches by multiple orders of magnitude. Our approach can check different safety properties and find concrete counterexamples for networks that are 10$\times$ larger than the ones supported by existing analysis techniques. We believe that our approach to estimating tight output bounds of a network for a given input range can also help improve the explainability of neural networks and guide the training process of more robust neural networks. |
Tasks | Adversarial Attack, Adversarial Defense, Autonomous Driving, Malware Detection |
Published | 2018-09-19 |
URL | http://arxiv.org/abs/1809.08098v3 |
http://arxiv.org/pdf/1809.08098v3.pdf | |
PWC | https://paperswithcode.com/paper/efficient-formal-safety-analysis-of-neural |
Repo | https://github.com/tcwangshiqi-columbia/Interval-Attack |
Framework | tf |
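As a rough illustration of "estimating tight output bounds of a network for a given input range", the sketch below propagates interval bounds through affine and ReLU layers. This is plain interval arithmetic, not the paper's symbolic analysis, and will generally give looser bounds.

```python
import numpy as np

def interval_bounds(weights, biases, lo, up):
    """Bound the outputs of a fully connected ReLU network over the input box [lo, up]."""
    n_layers = len(weights)
    for i, (W, b) in enumerate(zip(weights, biases)):
        W_pos, W_neg = np.maximum(W, 0), np.minimum(W, 0)
        lo, up = W_pos @ lo + W_neg @ up + b, W_pos @ up + W_neg @ lo + b
        if i < n_layers - 1:                     # ReLU on hidden layers only
            lo, up = np.maximum(lo, 0), np.maximum(up, 0)
    return lo, up
```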
Extended Isolation Forest
Title | Extended Isolation Forest |
Authors | Sahand Hariri, Matias Carrasco Kind, Robert J. Brunner |
Abstract | We present an extension to the model-free anomaly detection algorithm Isolation Forest. This extension, named Extended Isolation Forest (EIF), resolves issues with the assignment of anomaly scores to given data points. We motivate the problem using heat maps of anomaly scores, which suffer from artifacts generated by the branching criterion of the binary trees. We explain this problem in detail and visually demonstrate the mechanism by which it occurs. We then propose two different approaches for improving the situation. The first is to randomly transform the data before the creation of each tree, which averages out the bias. The second, and preferred, approach is to slice the data with hyperplanes of random slopes. This approach remedies the artifacts seen in the anomaly score heat maps. We show that the robustness of the algorithm is much improved by this method by examining the variance of scores of data points distributed along constant level sets. We report AUROC and AUPRC for our synthetic datasets, along with real-world benchmark datasets. We find no appreciable difference in the rate of convergence or in computation time between the standard Isolation Forest and EIF. |
Tasks | Anomaly Detection |
Published | 2018-11-06 |
URL | https://arxiv.org/abs/1811.02141v2 |
https://arxiv.org/pdf/1811.02141v2.pdf | |
PWC | https://paperswithcode.com/paper/extended-isolation-forest |
Repo | https://github.com/fyumoto/EIF |
Framework | none |
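The "hyperplanes with random slopes" idea can be pictured with a single node-splitting function: draw a random normal vector and a random intercept inside the data range, and route points to the left or right child depending on which side of the hyperplane they fall. The sketch below illustrates that rule; it is not the released EIF implementation.

```python
import numpy as np

def extended_split(X, rng):
    # draw a hyperplane with a random slope and a random intercept inside the data range
    normal = rng.normal(size=X.shape[1])
    intercept = rng.uniform(X.min(axis=0), X.max(axis=0))
    mask = (X - intercept) @ normal <= 0.0
    return X[mask], X[~mask]                       # data routed to the left / right child

rng = np.random.default_rng(0)
left, right = extended_split(rng.normal(size=(100, 2)), rng)
```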
Self-supervised learning of a facial attribute embedding from video
Title | Self-supervised learning of a facial attribute embedding from video |
Authors | Olivia Wiles, A. Sophia Koepke, Andrew Zisserman |
Abstract | We propose a self-supervised framework for learning facial attributes by simply watching videos of a human face speaking, laughing, and moving over time. To perform this task, we introduce a network, Facial Attributes-Net (FAb-Net), that is trained to embed multiple frames from the same video face-track into a common low-dimensional space. With this approach, we make three contributions: first, we show that the network can leverage information from multiple source frames by predicting confidence/attention masks for each frame; second, we demonstrate that using a curriculum learning regime improves the learned embedding; finally, we demonstrate that the network learns a meaningful face embedding that encodes information about head pose, facial landmarks and facial expression, i.e. facial attributes, without having been supervised with any labelled data. We are comparable or superior to state-of-the-art self-supervised methods on these tasks and approach the performance of supervised methods. |
Tasks | Unsupervised Facial Landmark Detection |
Published | 2018-08-21 |
URL | http://arxiv.org/abs/1808.06882v1 |
http://arxiv.org/pdf/1808.06882v1.pdf | |
PWC | https://paperswithcode.com/paper/self-supervised-learning-of-a-facial |
Repo | https://github.com/oawiles/FAb-Net |
Framework | pytorch |
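The confidence/attention-mask idea from the first contribution can be sketched as a softmax-weighted combination of per-frame embeddings. The sketch below is a loose reading of the abstract, with `embed` and `confidence` as placeholder modules rather than FAb-Net's actual layers.

```python
import torch

def combine_source_frames(embed, confidence, source_frames):
    # source_frames: (batch, num_frames, C, H, W)
    b, n = source_frames.shape[:2]
    frames = source_frames.flatten(0, 1)
    codes = embed(frames).view(b, n, -1)           # per-frame embeddings
    conf = confidence(frames).view(b, n, 1)        # per-frame confidence scores
    weights = torch.softmax(conf, dim=1)
    return (weights * codes).sum(dim=1)            # confidence-weighted face embedding
```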
Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering
Title | Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering |
Authors | Duy-Kien Nguyen, Takayuki Okatani |
Abstract | A key to visual question answering (VQA) lies in how to fuse the visual and language features extracted from an input image and question. We show that an attention mechanism that enables dense, bi-directional interactions between the two modalities helps boost the accuracy of answer prediction. Specifically, we present a simple architecture that is fully symmetric between visual and language representations, in which each question word attends to image regions and each image region attends to question words. It can be stacked to form a hierarchy for multi-step interactions between an image-question pair. We show through experiments that the proposed architecture achieves a new state of the art on VQA and VQA 2.0 despite its small size. We also present a qualitative evaluation, demonstrating how the proposed attention mechanism generates reasonable attention maps on images and questions, which lead to correct answer prediction. |
Tasks | Visual Question Answering |
Published | 2018-04-03 |
URL | http://arxiv.org/abs/1804.00775v2 |
http://arxiv.org/pdf/1804.00775v2.pdf | |
PWC | https://paperswithcode.com/paper/improved-fusion-of-visual-and-language |
Repo | https://github.com/cvlab-tohoku/Dense-CoAttention-Network |
Framework | pytorch |
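A bare-bones rendering of the dense, symmetric attention described in the abstract: one affinity matrix between question words and image regions, normalised in both directions so that each word attends to regions and each region attends to words. This is a paraphrase of the idea, not the authors' exact layer.

```python
import torch

def dense_co_attention(V, Q):
    # V: (batch, num_regions, d) image features; Q: (batch, num_words, d) word features
    d = V.shape[-1]
    affinity = Q @ V.transpose(1, 2) / d ** 0.5                # (batch, num_words, num_regions)
    attended_regions = torch.softmax(affinity, dim=2) @ V                 # image context per word
    attended_words = torch.softmax(affinity, dim=1).transpose(1, 2) @ Q   # word context per region
    return attended_words, attended_regions
```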
A First Look at Deep Learning Apps on Smartphones
Title | A First Look at Deep Learning Apps on Smartphones |
Authors | Mengwei Xu, Jiawei Liu, Yuanqiang Liu, Felix Xiaozhu Lin, Yunxin Liu, Xuanzhe Liu |
Abstract | We are at the dawn of a deep learning explosion for smartphones. To bridge the gap between research and practice, we present the first empirical study of the 16,500 most popular Android apps, demystifying how smartphone apps exploit deep learning in the wild. To this end, we build a new static tool that dissects apps and analyzes their deep learning functions. Our study answers three questions: which apps are the early adopters of deep learning, what do they use deep learning for, and what do their deep learning models look like. Our study has strong implications for app developers, smartphone vendors, and deep learning R&D. On the one hand, our findings paint a promising picture of deep learning for smartphones, showing the prosperity of mobile deep learning frameworks as well as of apps building their cores atop deep learning. On the other hand, our findings call for optimization of the deep learning models deployed on smartphones, protection of these models, and validation of research ideas on these models. |
Tasks | |
Published | 2018-11-08 |
URL | https://arxiv.org/abs/1812.05448v3 |
https://arxiv.org/pdf/1812.05448v3.pdf | |
PWC | https://paperswithcode.com/paper/a-first-look-at-deep-learning-apps-on |
Repo | https://github.com/xumengwei/MobileDL |
Framework | tf |
Recovering Realistic Texture in Image Super-resolution by Deep Spatial Feature Transform
Title | Recovering Realistic Texture in Image Super-resolution by Deep Spatial Feature Transform |
Authors | Xintao Wang, Ke Yu, Chao Dong, Chen Change Loy |
Abstract | Although convolutional neural networks (CNNs) have recently demonstrated high-quality reconstruction for single-image super-resolution (SR), recovering natural and realistic texture remains a challenging problem. In this paper, we show that it is possible to recover textures faithful to semantic classes. In particular, we only need to modulate features of a few intermediate layers in a single network conditioned on semantic segmentation probability maps. This is made possible through a novel Spatial Feature Transform (SFT) layer that generates affine transformation parameters for spatially varying feature modulation. SFT layers can be trained end-to-end together with the SR network using the same loss function. During testing, the network accepts an input image of arbitrary size and generates a high-resolution image with just a single forward pass conditioned on the categorical priors. Our final results show that an SR network equipped with SFT can generate more realistic and visually pleasing textures in comparison to state-of-the-art SRGAN and EnhanceNet. |
Tasks | Image Super-Resolution, Semantic Segmentation, Super-Resolution |
Published | 2018-04-09 |
URL | http://arxiv.org/abs/1804.02815v1 |
http://arxiv.org/pdf/1804.02815v1.pdf | |
PWC | https://paperswithcode.com/paper/recovering-realistic-texture-in-image-super |
Repo | https://github.com/xinntao/BasicSR |
Framework | pytorch |
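The SFT layer itself is easy to picture from the abstract: a small conditioning network maps segmentation probability maps to per-position scale and shift parameters that modulate intermediate features. The sketch below uses illustrative layer sizes, not the paper's configuration.

```python
import torch
import torch.nn as nn

class SFTLayer(nn.Module):
    def __init__(self, cond_channels, feat_channels, hidden=32):
        super().__init__()
        self.shared = nn.Sequential(nn.Conv2d(cond_channels, hidden, 1), nn.ReLU())
        self.gamma = nn.Conv2d(hidden, feat_channels, 1)   # per-position scale
        self.beta = nn.Conv2d(hidden, feat_channels, 1)    # per-position shift

    def forward(self, features, seg_probs):
        h = self.shared(seg_probs)
        return features * self.gamma(h) + self.beta(h)     # spatial affine modulation
```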
From Recognition to Cognition: Visual Commonsense Reasoning
Title | From Recognition to Cognition: Visual Commonsense Reasoning |
Authors | Rowan Zellers, Yonatan Bisk, Ali Farhadi, Yejin Choi |
Abstract | Visual understanding goes well beyond object recognition. With one glance at an image, we can effortlessly imagine the world beyond the pixels: for instance, we can infer people’s actions, goals, and mental states. While this task is easy for humans, it is tremendously difficult for today’s vision systems, requiring higher-order cognition and commonsense reasoning about the world. We formalize this task as Visual Commonsense Reasoning. Given a challenging question about an image, a machine must answer correctly and then provide a rationale justifying its answer. Next, we introduce a new dataset, VCR, consisting of 290k multiple choice QA problems derived from 110k movie scenes. The key recipe for generating non-trivial and high-quality problems at scale is Adversarial Matching, a new approach to transform rich annotations into multiple choice questions with minimal bias. Experimental results show that while humans find VCR easy (over 90% accuracy), state-of-the-art vision models struggle (~45%). To move towards cognition-level understanding, we present a new reasoning engine, Recognition to Cognition Networks (R2C), that models the necessary layered inferences for grounding, contextualization, and reasoning. R2C helps narrow the gap between humans and machines (~65%); still, the challenge is far from solved, and we provide analysis that suggests avenues for future work. |
Tasks | Visual Commonsense Reasoning |
Published | 2018-11-27 |
URL | http://arxiv.org/abs/1811.10830v2 |
http://arxiv.org/pdf/1811.10830v2.pdf | |
PWC | https://paperswithcode.com/paper/from-recognition-to-cognition-visual |
Repo | https://github.com/TheShadow29/visual-commonsense-pytorch |
Framework | pytorch |
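The Adversarial Matching step can be viewed as an assignment problem: reuse other questions' correct answers as distractors, preferring candidates that a relevance model scores high for the query but a similarity model scores low against the true answer. The sketch below shows that framing with placeholder `relevance` and `similarity` matrices; it is a reading of the abstract, not the paper's exact procedure.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def adversarial_match(relevance, similarity, lam=1.0):
    # relevance[i, j]: how well answer j fits question i (higher = more relevant);
    # similarity[i, j]: how close answer j is to question i's correct answer.
    cost = -(relevance - lam * similarity)
    np.fill_diagonal(cost, 1e9)                    # never reuse a question's own answer
    _, cols = linear_sum_assignment(cost)
    return cols                                    # cols[i] = distractor chosen for question i
```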
Understand Functionality and Dimensionality of Vector Embeddings: the Distributional Hypothesis, the Pairwise Inner Product Loss and Its Bias-Variance Trade-off
Title | Understand Functionality and Dimensionality of Vector Embeddings: the Distributional Hypothesis, the Pairwise Inner Product Loss and Its Bias-Variance Trade-off |
Authors | Zi Yin |
Abstract | Vector embedding is a foundational building block of many deep learning models, especially in natural language processing. In this paper, we present a theoretical framework for understanding the effect of dimensionality on vector embeddings. We observe that the distributional hypothesis, a governing principle of statistical semantics, requires a natural unitary-invariance for vector embeddings. Motivated by the unitary-invariance observation, we propose the Pairwise Inner Product (PIP) loss, a unitary-invariant metric on the similarity between two embeddings. We demonstrate that the PIP loss captures the difference in functionality between embeddings, and that the PIP loss is tightly connected with two basic properties of vector embeddings, namely similarity and compositionality. By formulating the embedding training process as matrix factorization with noise, we reveal a fundamental bias-variance trade-off between the signal spectrum and noise power in the dimensionality selection process. This bias-variance trade-off sheds light on many empirical observations which have not been thoroughly explained, for example the existence of an optimal dimensionality. Moreover, we discover two new results about vector embeddings, namely their robustness against over-parametrization and their forward stability. The bias-variance trade-off of the PIP loss explicitly answers the fundamental open problem of dimensionality selection for vector embeddings. |
Tasks | |
Published | 2018-03-01 |
URL | http://arxiv.org/abs/1803.00502v4 |
http://arxiv.org/pdf/1803.00502v4.pdf | |
PWC | https://paperswithcode.com/paper/understand-functionality-and-dimensionality |
Repo | https://github.com/aaaasssddf/PIP-experiments |
Framework | tf |
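The PIP loss between two embedding matrices E1 and E2 (rows indexed by tokens) is usually written as the Frobenius distance between their pairwise-inner-product matrices, which makes it invariant to unitary transformations of either embedding. A one-line sketch:

```python
import numpy as np

def pip_loss(E1, E2):
    # E1, E2: (vocab, d1) and (vocab, d2) embedding matrices over the same vocabulary
    return np.linalg.norm(E1 @ E1.T - E2 @ E2.T, ord="fro")
```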
Sparsely Grouped Multi-task Generative Adversarial Networks for Facial Attribute Manipulation
Title | Sparsely Grouped Multi-task Generative Adversarial Networks for Facial Attribute Manipulation |
Authors | Jichao Zhang, Yezhi Shu, Songhua Xu, Gongze Cao, Fan Zhong, Xueying Qin |
Abstract | Recently, Image-to-Image Translation (IIT) has achieved great progress in image style transfer and semantic context manipulation for images. However, existing approaches require exhaustively labelled training data, which is labor-intensive, difficult to scale up, and hard to adapt to a new domain. To overcome this key limitation, we propose Sparsely Grouped Generative Adversarial Networks (SG-GAN) as a novel approach that can translate images in sparsely grouped datasets where only a few training samples are labelled. Using a one-input multi-output architecture, SG-GAN is well-suited for tackling multi-task learning and sparsely grouped learning tasks. The new model is able to translate images among multiple groups using only a single trained model. To experimentally validate the advantages of the new model, we apply the proposed method to tackle a series of attribute manipulation tasks for facial images as a case study. Experimental results show that SG-GAN can achieve results comparable to state-of-the-art methods on adequately labelled datasets while attaining superior image translation quality on sparsely grouped datasets. |
Tasks | Image-to-Image Translation, Multi-Task Learning, Style Transfer |
Published | 2018-05-19 |
URL | http://arxiv.org/abs/1805.07509v6 |
http://arxiv.org/pdf/1805.07509v6.pdf | |
PWC | https://paperswithcode.com/paper/sparsely-grouped-multi-task-generative |
Repo | https://github.com/zhangqianhui/SGGAN-tensorflow |
Framework | tf |
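One way to picture "sparsely grouped" training is that the group/attribute classification loss only sees the few labelled samples, while every sample contributes to the adversarial realism term. The sketch below encodes that reading of the abstract and is not the SG-GAN objective verbatim.

```python
import torch
import torch.nn.functional as F

def sparsely_grouped_loss(real_logits, cls_logits, labels, is_labelled):
    # adversarial term uses every (translated) sample
    adv = F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
    # classification term only uses the few labelled samples
    cls = F.cross_entropy(cls_logits, labels, reduction="none")
    cls = (cls * is_labelled.float()).sum() / is_labelled.float().sum().clamp(min=1)
    return adv + cls
```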
Bi-Real Net: Binarizing Deep Network Towards Real-Network Performance
Title | Bi-Real Net: Binarizing Deep Network Towards Real-Network Performance |
Authors | Zechun Liu, Wenhan Luo, Baoyuan Wu, Xin Yang, Wei Liu, Kwang-Ting Cheng |
Abstract | In this paper, we study 1-bit convolutional neural networks (CNNs), in which both the weights and activations are binary. While efficient, the lack of representational capability and the training difficulty impede 1-bit CNNs from performing as well as real-valued networks. We propose Bi-Real net with a novel training algorithm to tackle these two challenges. To enhance the representational capability, we propagate the real-valued activations generated by each 1-bit convolution via a parameter-free shortcut. To address the training difficulty, we propose a training algorithm using a tighter approximation to the derivative of the sign function, a magnitude-aware gradient for weight updating, a better initialization method, and a two-step scheme for training a deep network. Experiments on ImageNet show that an 18-layer Bi-Real net with the proposed training algorithm achieves 56.4% top-1 classification accuracy, which is 10% higher than the state of the art (e.g., XNOR-Net) with greater memory saving and lower computational cost. Bi-Real net is also the first to scale up 1-bit CNNs to an ultra-deep network with 152 layers, and achieves 64.5% top-1 accuracy on ImageNet. A 50-layer Bi-Real net shows comparable performance to a real-valued network on the depth estimation task with only a 0.3% accuracy gap. |
Tasks | Depth Estimation |
Published | 2018-11-04 |
URL | https://arxiv.org/abs/1811.01335v2 |
https://arxiv.org/pdf/1811.01335v2.pdf | |
PWC | https://paperswithcode.com/paper/bi-real-net-binarizing-deep-network-towards |
Repo | https://github.com/liuzechun/Bi-Real-net |
Framework | pytorch |
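The "tighter approximation to the derivative of the sign function" can be sketched as a custom autograd function: the forward pass binarises with sign, and the backward pass uses the derivative of a clipped quadratic instead of the identity. The constants below follow the commonly cited ApproxSign form and may differ from the released code.

```python
import torch

class ApproxSign(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        grad = torch.where(x < 0, 2 + 2 * x, 2 - 2 * x)   # derivative of the quadratic approximation
        grad = grad * (x.abs() <= 1).float()              # zero gradient outside [-1, 1]
        return grad_output * grad
```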
Analysis of Minimax Error Rate for Crowdsourcing and Its Application to Worker Clustering Model
Title | Analysis of Minimax Error Rate for Crowdsourcing and Its Application to Worker Clustering Model |
Authors | Hideaki Imamura, Issei Sato, Masashi Sugiyama |
Abstract | While crowdsourcing has become an important means to label data, there is great interest in estimating the ground truth from unreliable labels produced by crowdworkers. The Dawid and Skene (DS) model is one of the most well-known models in the study of crowdsourcing. Despite its practical popularity, theoretical error analysis for the DS model has been conducted only under restrictive assumptions on class priors, confusion matrices, or the number of labels each worker provides. In this paper, we derive a minimax error rate under a more practical setting for a broader class of crowdsourcing models including the DS model as a special case. We further propose the worker clustering model, which is more practical than the DS model under real crowdsourcing settings. The wide applicability of our theoretical analysis allows us to immediately investigate the behavior of this proposed model, which cannot be analyzed by existing studies. Experimental results showed that there is a strong similarity between the lower bound of the minimax error rate derived by our theoretical analysis and the empirical error of the estimated value. |
Tasks | |
Published | 2018-02-13 |
URL | http://arxiv.org/abs/1802.04551v2 |
http://arxiv.org/pdf/1802.04551v2.pdf | |
PWC | https://paperswithcode.com/paper/analysis-of-minimax-error-rate-for |
Repo | https://github.com/HideakiImamura/MinimaxErrorRate |
Framework | none |
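For context, here is a compact EM sketch of the classical Dawid-Skene model that the analysis covers as a special case (the proposed worker-clustering model is not shown). `labels[i, j]` is worker j's label for item i, with -1 marking a missing label.

```python
import numpy as np

def dawid_skene(labels, n_classes, n_iter=50):
    n_items, n_workers = labels.shape
    # initialise the posterior over true labels with a smoothed majority vote
    post = np.zeros((n_items, n_classes))
    for k in range(n_classes):
        post[:, k] = (labels == k).sum(axis=1)
    post += 1e-6
    post /= post.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # M-step: class prior and per-worker confusion matrices
        prior = post.mean(axis=0)
        conf = np.full((n_workers, n_classes, n_classes), 1e-6)
        for j in range(n_workers):
            seen = labels[:, j] >= 0
            for k in range(n_classes):
                conf[j, :, k] += post[seen][labels[seen, j] == k].sum(axis=0)
        conf /= conf.sum(axis=2, keepdims=True)
        # E-step: posterior over true labels given prior and confusion matrices
        log_post = np.log(prior)[None, :].repeat(n_items, axis=0)
        for j in range(n_workers):
            seen = labels[:, j] >= 0
            log_post[seen] += np.log(conf[j, :, labels[seen, j]]).T
        post = np.exp(log_post - log_post.max(axis=1, keepdims=True))
        post /= post.sum(axis=1, keepdims=True)
    return post  # post[i, k] = estimated P(true label of item i = k)
```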
Learning unknown ODE models with Gaussian processes
Title | Learning unknown ODE models with Gaussian processes |
Authors | Markus Heinonen, Cagatay Yildiz, Henrik Mannerström, Jukka Intosalmi, Harri Lähdesmäki |
Abstract | In conventional ODE modelling, the coefficients of an equation driving the system state forward in time are estimated. However, for many complex systems it is practically impossible to determine the equations or interactions governing the underlying dynamics. In these settings, a parametric ODE model cannot be formulated. Here, we overcome this issue by introducing a novel paradigm of nonparametric ODE modelling that can learn the underlying dynamics of arbitrary continuous-time systems without prior knowledge. We propose to learn non-linear, unknown differential functions from state observations using Gaussian process vector fields within the exact ODE formalism. We demonstrate the model’s capabilities to infer dynamics from sparse data and to simulate the system forward into the future. |
Tasks | Gaussian Processes |
Published | 2018-03-12 |
URL | http://arxiv.org/abs/1803.04303v1 |
http://arxiv.org/pdf/1803.04303v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-unknown-ode-models-with-gaussian |
Repo | https://github.com/cagatayyildiz/npde |
Framework | tf |
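To illustrate the "Gaussian process as vector field" idea, the sketch below fits a GP to crude finite-difference derivative estimates and then integrates the learned field forward. This gradient-matching shortcut stands in for the paper's exact ODE treatment and is only meant to convey the modelling idea.

```python
import numpy as np
from scipy.integrate import solve_ivp
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def fit_gp_vector_field(t, X):
    # t: (N,) observation times, X: (N, D) observed states
    dXdt = np.gradient(X, t, axis=0)              # crude derivative estimates
    gps = [GaussianProcessRegressor(kernel=RBF()).fit(X, dXdt[:, d])
           for d in range(X.shape[1])]
    def f(_, x):                                  # learned ODE right-hand side
        return np.array([gp.predict(x[None, :])[0] for gp in gps])
    return f

# illustrative usage: simulate forward from the first observed state
# f = fit_gp_vector_field(t, X); sol = solve_ivp(f, (t[0], t[-1]), X[0])
```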
Deep k-Nearest Neighbors: Towards Confident, Interpretable and Robust Deep Learning
Title | Deep k-Nearest Neighbors: Towards Confident, Interpretable and Robust Deep Learning |
Authors | Nicolas Papernot, Patrick McDaniel |
Abstract | Deep neural networks (DNNs) enable innovative applications of machine learning like image recognition, machine translation, or malware detection. However, deep learning is often criticized for its lack of robustness in adversarial settings (e.g., vulnerability to adversarial inputs) and general inability to rationalize its predictions. In this work, we exploit the structure of deep learning to enable new learning-based inference and decision strategies that achieve desirable properties such as robustness and interpretability. We take a first step in this direction and introduce the Deep k-Nearest Neighbors (DkNN). This hybrid classifier combines the k-nearest neighbors algorithm with representations of the data learned by each layer of the DNN: a test input is compared to its neighboring training points according to the distance that separates them in the representations. We show that the labels of these neighboring points afford confidence estimates for inputs outside the model’s training manifold, including malicious inputs like adversarial examples, and thereby provide protection against inputs that are outside the model’s understanding. This is because the nearest neighbors can be used to estimate the nonconformity of, i.e., the lack of support for, a prediction in the training data. The neighbors also constitute human-interpretable explanations of predictions. We evaluate the DkNN algorithm on several datasets, and show that the confidence estimates accurately identify inputs outside the model’s understanding, and that the explanations provided by nearest neighbors are intuitive and useful in understanding model failures. |
Tasks | Machine Translation, Malware Detection |
Published | 2018-03-13 |
URL | http://arxiv.org/abs/1803.04765v1 |
http://arxiv.org/pdf/1803.04765v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-k-nearest-neighbors-towards-confident |
Repo | https://github.com/rodgzilla/machine_learning_deep_knn |
Framework | none |
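The DkNN inference loop described in the abstract can be simplified to: collect the labels of the k nearest training points in every layer's representation space, score a candidate class by how many neighbours disagree with it, and calibrate that nonconformity into a p-value against held-out calibration scores. The sketch below is that simplified rendering, not the authors' implementation.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def fit_layer_indices(layer_reprs, k=75):
    # layer_reprs: list of (n_train, d_layer) arrays, one per DNN layer
    return [NearestNeighbors(n_neighbors=k).fit(R) for R in layer_reprs]

def dknn_p_values(indices, train_labels, test_layer_reprs, calib_scores, n_classes):
    # calib_scores: nonconformity scores of a held-out calibration set (for their true labels)
    neigh_labels = [train_labels[nn.kneighbors(R, return_distance=False)]
                    for nn, R in zip(indices, test_layer_reprs)]
    # nonconformity of candidate class c = neighbours (over all layers) that disagree with c
    scores = np.stack([sum((nl != c).sum(axis=1) for nl in neigh_labels)
                       for c in range(n_classes)], axis=1)        # (n_test, n_classes)
    # empirical p-value: fraction of calibration scores at least as nonconforming
    return (calib_scores[None, None, :] >= scores[:, :, None]).mean(axis=2)
```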