Paper Group AWR 20
BlinkML: Efficient Maximum Likelihood Estimation with Probabilistic Guarantees. To Trust Or Not To Trust A Classifier. AI safety via debate. Neural Persistence: A Complexity Measure for Deep Neural Networks Using Algebraic Topology. SO-Net: Self-Organizing Network for Point Cloud Analysis. A Structured Variational Autoencoder for Contextual Morphological Inflection …
BlinkML: Efficient Maximum Likelihood Estimation with Probabilistic Guarantees
Title | BlinkML: Efficient Maximum Likelihood Estimation with Probabilistic Guarantees |
Authors | Yongjoo Park, Jingyi Qing, Xiaoyang Shen, Barzan Mozafari |
Abstract | The rising volume of datasets has made training machine learning (ML) models a major computational cost in the enterprise. Given the iterative nature of model and parameter tuning, many analysts use a small sample of their entire data during their initial stage of analysis to make quick decisions (e.g., what features or hyperparameters to use) and use the entire dataset only in later stages (i.e., when they have converged to a specific model). This sampling, however, is performed in an ad-hoc fashion. Most practitioners cannot precisely capture the effect of sampling on the quality of their model, and eventually on their decision-making process during the tuning phase. Moreover, without systematic support for sampling operators, many optimizations and reuse opportunities are lost. In this paper, we introduce BlinkML, a system for fast, quality-guaranteed ML training. BlinkML allows users to make error-computation tradeoffs: instead of training a model on their full data (i.e., full model), BlinkML can quickly train an approximate model with quality guarantees using a sample. The quality guarantees ensure that, with high probability, the approximate model makes the same predictions as the full model. BlinkML currently supports any ML model that relies on maximum likelihood estimation (MLE), which includes Generalized Linear Models (e.g., linear regression, logistic regression, max entropy classifier, Poisson regression) as well as PPCA (Probabilistic Principal Component Analysis). Our experiments show that BlinkML can speed up the training of large-scale ML tasks by 6.26x-629x while guaranteeing the same predictions, with 95% probability, as the full model. |
Tasks | Decision Making |
Published | 2018-12-26 |
URL | http://arxiv.org/abs/1812.10564v1 |
PDF | http://arxiv.org/pdf/1812.10564v1.pdf |
PWC | https://paperswithcode.com/paper/blinkml-efficient-maximum-likelihood |
Repo | https://github.com/jinw18/Readings_MLDB |
Framework | none |
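BlinkML derives its guarantee analytically from the asymptotic normality of the MLE; the minimal sketch below only illustrates the sample-versus-full tradeoff the system formalizes, by measuring prediction agreement empirically on synthetic data. All names and sizes are ours, not from the paper's implementation.

```python
# Toy illustration of BlinkML's premise: a model trained on a small uniform
# sample often makes (nearly) the same predictions as the full model.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100_000, n_features=20, random_state=0)

full = LogisticRegression(max_iter=1000).fit(X, y)            # "full model"

rng = np.random.default_rng(0)
idx = rng.choice(len(X), size=5_000, replace=False)           # uniform sample
approx = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])

agreement = np.mean(full.predict(X) == approx.predict(X))
print(f"approximate model matches the full model on {agreement:.1%} of points")
```

BlinkML's contribution is that it bounds this agreement probability before ever training the full model; the empirical comparison above is only for intuition.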
To Trust Or Not To Trust A Classifier
Title | To Trust Or Not To Trust A Classifier |
Authors | Heinrich Jiang, Been Kim, Melody Y. Guan, Maya Gupta |
Abstract | Knowing when a classifier’s prediction can be trusted is useful in many applications and critical for safely using AI. While the bulk of the effort in machine learning research has been towards improving classifier performance, understanding when a classifier’s predictions should and should not be trusted has received far less attention. The standard approach is to use the classifier’s discriminant or confidence score; however, we show there exists an alternative that is more effective in many situations. We propose a new score, called the trust score, which measures the agreement between the classifier and a modified nearest-neighbor classifier on the testing example. We show empirically that high (low) trust scores produce surprisingly high precision at identifying correctly (incorrectly) classified examples, consistently outperforming the classifier’s confidence score as well as many other baselines. Further, under some mild distributional assumptions, we show that if the trust score for an example is high (low), the classifier will likely agree (disagree) with the Bayes-optimal classifier. Our guarantees consist of non-asymptotic rates of statistical consistency under various nonparametric settings and build on recent developments in topological data analysis. |
Tasks | Topological Data Analysis |
Published | 2018-05-30 |
URL | http://arxiv.org/abs/1805.11783v2 |
PDF | http://arxiv.org/pdf/1805.11783v2.pdf |
PWC | https://paperswithcode.com/paper/to-trust-or-not-to-trust-a-classifier |
Repo | https://github.com/SeldonIO/alibi |
Framework | tf |
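A stripped-down trust score is easy to sketch: the distance to the nearest training example of any *other* class divided by the distance to the nearest example of the *predicted* class. The paper additionally filters each class to an alpha-high-density subset first, which this sketch omits.

```python
# Simplified trust score: high ratio => the nearest-neighbor structure
# agrees with the classifier, so the prediction is more trustworthy.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def trust_scores(X_train, y_train, X_test, y_pred):
    # y_pred: integer array of the classifier's predictions on X_test
    classes = np.unique(y_train)
    # one 1-NN index per class
    nns = {c: NearestNeighbors(n_neighbors=1).fit(X_train[y_train == c])
           for c in classes}
    d = np.stack([nns[c].kneighbors(X_test)[0][:, 0] for c in classes], axis=1)
    d_pred = d[np.arange(len(X_test)), np.searchsorted(classes, y_pred)]
    d_other = np.where(classes[None, :] == y_pred[:, None], np.inf, d).min(axis=1)
    return d_other / (d_pred + 1e-12)

# usage: scores = trust_scores(X_tr, y_tr, X_te, clf.predict(X_te))
```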
AI safety via debate
Title | AI safety via debate |
Authors | Geoffrey Irving, Paul Christiano, Dario Amodei |
Abstract | To make AI systems broadly useful for challenging real-world tasks, we need them to learn complex human goals and preferences. One approach to specifying complex goals asks humans to judge during training which agent behaviors are safe and useful, but this approach can fail if the task is too complicated for a human to directly judge. To help address this concern, we propose training agents via self-play on a zero-sum debate game. Given a question or proposed action, two agents take turns making short statements up to a limit, then a human judges which of the agents gave the most true, useful information. In an analogy to complexity theory, debate with optimal play can answer any question in PSPACE given polynomial-time judges (direct judging answers only NP questions). In practice, whether debate works involves empirical questions about humans and the tasks we want AIs to perform, plus theoretical questions about the meaning of AI alignment. We report results on an initial MNIST experiment where agents compete to convince a sparse classifier, boosting the classifier’s accuracy from 59.4% to 88.9% given 6 pixels and from 48.2% to 85.2% given 4 pixels. Finally, we discuss theoretical and practical aspects of the debate model, focusing on potential weaknesses as the model scales up, and we propose future human and computer experiments to test these properties. |
Tasks | |
Published | 2018-05-02 |
URL | http://arxiv.org/abs/1805.00899v2 |
PDF | http://arxiv.org/pdf/1805.00899v2.pdf |
PWC | https://paperswithcode.com/paper/ai-safety-via-debate |
Repo | https://github.com/jvmancuso/safe-debates |
Framework | pytorch |
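The MNIST experiment hinges on a judge that classifies from only a handful of revealed pixels. A hypothetical sketch of such a sparse judge follows, trained on random masks; in the paper, the debating agents (not random sampling) choose which pixels to reveal at evaluation time.

```python
# Sketch of a sparse judge: sees pixel values plus a mask channel marking
# which pixels were revealed. Architecture and sizes are placeholders.
import torch
import torch.nn as nn

class SparseJudge(nn.Module):
    def __init__(self, n_pixels=784, n_classes=10):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * n_pixels, 256), nn.ReLU(),
                                 nn.Linear(256, n_classes))

    def forward(self, x, mask):              # x, mask: (batch, 784)
        return self.net(torch.cat([x * mask, mask], dim=1))

def random_mask(batch, n_pixels=784, k=6):   # reveal k pixels per image
    m = torch.zeros(batch, n_pixels)
    for i in range(batch):
        m[i, torch.randperm(n_pixels)[:k]] = 1.0
    return m
```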
Neural Persistence: A Complexity Measure for Deep Neural Networks Using Algebraic Topology
Title | Neural Persistence: A Complexity Measure for Deep Neural Networks Using Algebraic Topology |
Authors | Bastian Rieck, Matteo Togninalli, Christian Bock, Michael Moor, Max Horn, Thomas Gumbsch, Karsten Borgwardt |
Abstract | While many approaches to make neural networks more fathomable have been proposed, they are restricted to interrogating the network with input data. Measures for characterizing and monitoring structural properties, however, have not been developed. In this work, we propose neural persistence, a complexity measure for neural network architectures based on topological data analysis on weighted stratified graphs. To demonstrate the usefulness of our approach, we show that neural persistence reflects best practices developed in the deep learning community such as dropout and batch normalization. Moreover, we derive a neural persistence-based stopping criterion that shortens the training process while achieving comparable accuracies as early stopping based on validation loss. |
Tasks | Topological Data Analysis |
Published | 2018-12-23 |
URL | https://arxiv.org/abs/1812.09764v3 |
PDF | https://arxiv.org/pdf/1812.09764v3.pdf |
PWC | https://paperswithcode.com/paper/neural-persistence-a-complexity-measure-for |
Repo | https://github.com/BorgwardtLab/Neural-Persistence |
Framework | tf |
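Per-layer neural persistence can be sketched roughly as follows: normalize absolute weights to [0, 1], then take the p-norm of the persistences (1 - w) of the edges in the maximum spanning tree of the bipartite layer graph, which is where the zero-dimensional components of the superlevel-set filtration die. This is a simplification of the paper's definition (it ignores essential vertices, for instance), so treat it as a sketch only.

```python
# Rough per-layer neural persistence via a maximum spanning tree.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import minimum_spanning_tree

def neural_persistence(W, p=2):
    n_in, n_out = W.shape
    w = np.abs(W) / (np.abs(W).max() + 1e-12)    # normalized filtration values
    # bipartite adjacency over n_in + n_out vertices;
    # negate so a MIN spanning tree of -w is a MAX spanning tree of w
    adj = np.zeros((n_in + n_out, n_in + n_out))
    adj[:n_in, n_in:] = -w
    mst = minimum_spanning_tree(csr_matrix(adj))
    # (exact-zero weights drop out of the sparse graph; fine for a sketch)
    w_mst = -mst.data                            # max spanning tree edge weights
    return np.linalg.norm(1.0 - w_mst, ord=p)    # p-norm of edge persistences
```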
SO-Net: Self-Organizing Network for Point Cloud Analysis
Title | SO-Net: Self-Organizing Network for Point Cloud Analysis |
Authors | Jiaxin Li, Ben M. Chen, Gim Hee Lee |
Abstract | This paper presents SO-Net, a permutation-invariant architecture for deep learning with orderless point clouds. SO-Net models the spatial distribution of a point cloud by building a Self-Organizing Map (SOM). Based on the SOM, SO-Net performs hierarchical feature extraction on individual points and SOM nodes, and ultimately represents the input point cloud by a single feature vector. The receptive field of the network can be systematically adjusted by conducting point-to-node k-nearest-neighbor search. In recognition tasks such as point cloud reconstruction, classification, object part segmentation and shape retrieval, our proposed network demonstrates performance that is similar to or better than state-of-the-art approaches. In addition, training is significantly faster than for existing point cloud recognition networks because of the parallelizability and simplicity of the proposed architecture. Our code is available at the project website. https://github.com/lijx10/SO-Net |
Tasks | |
Published | 2018-03-12 |
URL | http://arxiv.org/abs/1803.04249v4 |
PDF | http://arxiv.org/pdf/1803.04249v4.pdf |
PWC | https://paperswithcode.com/paper/so-net-self-organizing-network-for-point |
Repo | https://github.com/donnyruixu/pc-elm-ae |
Framework | pytorch |
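SO-Net's first stage, in miniature: fit SOM nodes to the point cloud, then associate every point with its k nearest nodes in node-relative coordinates, which the network pools into node features. The SOM update below is deliberately simplified (no grid topology, plain online updates), so it is a sketch of the idea rather than the paper's training procedure.

```python
# Minimal SOM fit + point-to-node kNN grouping for a point cloud (N, 3).
import numpy as np

def fit_som(points, n_nodes=64, iters=1000, lr=0.5):
    rng = np.random.default_rng(0)
    nodes = points[rng.choice(len(points), n_nodes, replace=False)].copy()
    for t in range(iters):
        p = points[rng.integers(len(points))]
        d = np.linalg.norm(nodes - p, axis=1)
        # shrink neighborhood and learning rate over time (flat topology here)
        influence = np.exp(-d**2 / (2 * (1.0 - t / iters + 1e-3) ** 2))
        nodes += lr * (1 - t / iters) * influence[:, None] * (p - nodes)
    return nodes

def point_to_node_knn(points, nodes, k=3):
    d = np.linalg.norm(points[:, None, :] - nodes[None, :, :], axis=2)
    idx = np.argsort(d, axis=1)[:, :k]       # (N, k) nearest node ids
    rel = points[:, None, :] - nodes[idx]    # node-relative coordinates
    return idx, rel
```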
A Structured Variational Autoencoder for Contextual Morphological Inflection
Title | A Structured Variational Autoencoder for Contextual Morphological Inflection |
Authors | Lawrence Wolf-Sonkin, Jason Naradowsky, Sabrina J. Mielke, Ryan Cotterell |
Abstract | Statistical morphological inflectors are typically trained on fully supervised, type-level data. One remaining open research question is the following: How can we effectively exploit raw, token-level data to improve their performance? To this end, we introduce a novel generative latent-variable model for the semi-supervised learning of inflection generation. To enable posterior inference over the latent variables, we derive an efficient variational inference procedure based on the wake-sleep algorithm. We experiment on 23 languages, using the Universal Dependencies corpora in a simulated low-resource setting, and find improvements of over 10% absolute accuracy in some cases. |
Tasks | Morphological Inflection |
Published | 2018-06-10 |
URL | https://arxiv.org/abs/1806.03746v2 |
PDF | https://arxiv.org/pdf/1806.03746v2.pdf |
PWC | https://paperswithcode.com/paper/a-structured-variational-autoencoder-for |
Repo | https://github.com/LeenaShekhar/NLP-Linguistics-ML-Resources |
Framework | tf |
AMNet: Memorability Estimation with Attention
Title | AMNet: Memorability Estimation with Attention |
Authors | Jiri Fajtl, Vasileios Argyriou, Dorothy Monekosso, Paolo Remagnino |
Abstract | In this paper we present the design and evaluation of an end-to-end trainable, deep neural network with a visual attention mechanism for memorability estimation in still images. We analyze the suitability of transfer learning of deep models from image classification to the memorability task. Further, we study the impact of the attention mechanism on the memorability estimation and evaluate our network on the SUN Memorability and the LaMem datasets. Our network outperforms the existing state-of-the-art models on both datasets in terms of the Spearman’s rank correlation as well as the mean squared error, closely matching human consistency. |
Tasks | Image Classification, Transfer Learning |
Published | 2018-04-09 |
URL | http://arxiv.org/abs/1804.03115v1 |
PDF | http://arxiv.org/pdf/1804.03115v1.pdf |
PWC | https://paperswithcode.com/paper/amnet-memorability-estimation-with-attention |
Repo | https://github.com/ok1zjf/amnet |
Framework | pytorch |
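The core mechanism reduces to a softmax attention map over CNN feature locations that pools features for a scalar regression head. AMNet iterates attention over several LSTM steps; this single-step PyTorch sketch with placeholder dimensions only conveys the idea.

```python
# Attention-pooled regression head for memorability estimation (sketch).
import torch
import torch.nn as nn

class AttentionPoolRegressor(nn.Module):
    def __init__(self, channels=512):
        super().__init__()
        self.att = nn.Conv2d(channels, 1, kernel_size=1)   # per-location score
        self.head = nn.Linear(channels, 1)                 # memorability score

    def forward(self, feats):                 # feats: (B, C, H, W) from a CNN
        B, C, H, W = feats.shape
        a = torch.softmax(self.att(feats).view(B, -1), dim=1)      # (B, H*W)
        pooled = (feats.view(B, C, -1) * a.unsqueeze(1)).sum(-1)   # (B, C)
        return self.head(pooled).squeeze(-1)
```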
Synaptic Plasticity Dynamics for Deep Continuous Local Learning (DECOLLE)
Title | Synaptic Plasticity Dynamics for Deep Continuous Local Learning (DECOLLE) |
Authors | Jacques Kaiser, Hesham Mostafa, Emre Neftci |
Abstract | A growing body of work underlines striking similarities between biological neural networks and recurrent, binary neural networks. A relatively smaller body of work, however, discusses similarities between learning dynamics employed in deep artificial neural networks and synaptic plasticity in spiking neural networks. The challenge preventing this is largely caused by the discrepancy between the dynamical properties of synaptic plasticity and the requirements for gradient backpropagation. Learning algorithms that approximate gradient backpropagation using locally synthesized gradients can overcome this challenge. Here, we show that synthetic gradients enable the derivation of Deep Continuous Local Learning (DECOLLE) in spiking neural networks. DECOLLE is capable of learning deep spatio-temporal representations from spikes relying solely on local information. Synaptic plasticity rules are derived systematically from user-defined cost functions and neural dynamics by leveraging existing autodifferentiation methods of machine learning frameworks. We benchmark our approach on the MNIST and the event-based neuromorphic DvsGesture dataset, on which DECOLLE performs comparably to the state-of-the-art. DECOLLE networks provide continuously learning machines that are relevant to biology and supportive of event-based, low-power computer vision architectures matching the accuracies of conventional computers on tasks where temporal precision and speed are essential. |
Tasks | |
Published | 2018-11-27 |
URL | https://arxiv.org/abs/1811.10766v3 |
PDF | https://arxiv.org/pdf/1811.10766v3.pdf |
PWC | https://paperswithcode.com/paper/synaptic-plasticity-dynamics-for-deep |
Repo | https://github.com/nmi-lab/dcll |
Framework | pytorch |
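The locality structure of DECOLLE can be caricatured in a rate-based PyTorch sketch: each layer feeds a fixed random readout with its own local loss, and activations are detached between layers so no gradient crosses layer boundaries. The real model adds spiking neuron dynamics and surrogate gradients, which this sketch omits entirely.

```python
# Local learning with fixed random readouts (rate-based caricature of DECOLLE).
import torch
import torch.nn as nn

layers = nn.ModuleList([nn.Linear(784, 256), nn.Linear(256, 256)])
readouts = [nn.Linear(256, 10) for _ in layers]
for r in readouts:                  # readouts stay fixed (random), as in DECOLLE
    for p in r.parameters():
        p.requires_grad_(False)

def local_losses(x, target, loss_fn=nn.CrossEntropyLoss()):
    losses, h = [], x
    for layer, readout in zip(layers, readouts):
        h = torch.relu(layer(h))
        losses.append(loss_fn(readout(h), target))  # local error signal
        h = h.detach()              # block gradient flow to earlier layers
    return losses

# losses = local_losses(torch.randn(32, 784), torch.randint(0, 10, (32,)))
# sum(losses).backward()  # each layer gets gradients only from its own readout
```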
Class-Distinct and Class-Mutual Image Generation with GANs
Title | Class-Distinct and Class-Mutual Image Generation with GANs |
Authors | Takuhiro Kaneko, Yoshitaka Ushiku, Tatsuya Harada |
Abstract | Class-conditional extensions of generative adversarial networks (GANs), such as auxiliary classifier GAN (AC-GAN) and conditional GAN (cGAN), have garnered attention owing to their ability to decompose representations into class labels and other factors and to boost the training stability. However, a limitation is that they assume that each class is separable and ignore the relationship between classes even though class overlapping frequently occurs in a real-world scenario when data are collected on the basis of diverse or ambiguous criteria. To overcome this limitation, we address a novel problem called class-distinct and class-mutual image generation, in which the goal is to construct a generator that can capture between-class relationships and generate an image selectively conditioned on the class specificity. To solve this problem without additional supervision, we propose classifier’s posterior GAN (CP-GAN), in which we redesign the generator input and the objective function of AC-GAN for class-overlapping data. More precisely, we incorporate the classifier’s posterior into the generator input and optimize the generator so that the classifier’s posterior of generated data corresponds with that of real data. We demonstrate the effectiveness of CP-GAN using both controlled and real-world class-overlapping data with a model configuration analysis and comparative study. Our code is available at https://github.com/takuhirok/CP-GAN/. |
Tasks | Conditional Image Generation, Image Generation, Image-to-Image Translation |
Published | 2018-11-27 |
URL | https://arxiv.org/abs/1811.11163v2 |
PDF | https://arxiv.org/pdf/1811.11163v2.pdf |
PWC | https://paperswithcode.com/paper/class-distinct-and-class-mutual-image |
Repo | https://github.com/takuhirok/NR-GAN |
Framework | pytorch |
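CP-GAN's key change, in isolation: the generator is conditioned on a full class-posterior vector rather than a one-hot label, so class-overlapping data can be targeted by mixing probabilities. Shapes and layer sizes here are placeholders, not the paper's architecture.

```python
# Generator conditioned on a class-posterior vector (sketch).
import torch
import torch.nn as nn

class PosteriorConditionedG(nn.Module):
    def __init__(self, z_dim=128, n_classes=10, out_dim=784):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(z_dim + n_classes, 256), nn.ReLU(),
                                 nn.Linear(256, out_dim), nn.Tanh())

    def forward(self, z, posterior):   # posterior: (B, n_classes), rows sum to 1
        return self.net(torch.cat([z, posterior], dim=1))

# e.g. generate an image "between" classes 3 and 5:
# G = PosteriorConditionedG()
# p = torch.zeros(1, 10); p[0, 3] = p[0, 5] = 0.5
# x = G(torch.randn(1, 128), p)
```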
Diverse Conditional Image Generation by Stochastic Regression with Latent Drop-Out Codes
Title | Diverse Conditional Image Generation by Stochastic Regression with Latent Drop-Out Codes |
Authors | Yang He, Bernt Schiele, Mario Fritz |
Abstract | Recent advances in Deep Learning and probabilistic modeling have led to strong improvements in generative models for images. On the one hand, Generative Adversarial Networks (GANs) have contributed a highly effective adversarial learning procedure, but still suffer from stability issues. On the other hand, Conditional Variational Auto-Encoders (CVAE) models provide a sound way of conditional modeling but suffer from mode-mixing issues. Therefore, recent work has turned back to simple and stable regression models that are effective at generation but give up on the sampling mechanism and the latent code representation. We propose a novel and efficient stochastic regression approach with latent drop-out codes that combines the merits of both lines of research. In addition, a new training objective increases coverage of the training distribution leading to improvements over the state of the art in terms of accuracy as well as diversity. |
Tasks | Conditional Image Generation, Image Generation |
Published | 2018-08-03 |
URL | http://arxiv.org/abs/1808.01121v1 |
PDF | http://arxiv.org/pdf/1808.01121v1.pdf |
PWC | https://paperswithcode.com/paper/diverse-conditional-image-generation-by |
Repo | https://github.com/SSAW14/Image_Generation_with_Latent_Code |
Framework | caffe2 |
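A sketch of the latent drop-out code idea: diversity comes from sampling which hidden units are active rather than from a continuous latent vector, and a mask can be stored and replayed to reproduce a particular output mode. Layer sizes and names are illustrative, not the paper's.

```python
# Stochastic regression where the latent code is a binary drop-out mask.
import torch
import torch.nn as nn

class DropoutCodeRegressor(nn.Module):
    def __init__(self, in_dim=128, hidden=256, out_dim=784, keep=0.5):
        super().__init__()
        self.keep = keep
        self.fc1 = nn.Linear(in_dim, hidden)
        self.fc2 = nn.Linear(hidden, out_dim)

    def sample_code(self, batch):      # one binary mask per sample = latent code
        return (torch.rand(batch, self.fc1.out_features) < self.keep).float()

    def forward(self, x, code=None):
        if code is None:
            code = self.sample_code(x.shape[0])
        h = torch.relu(self.fc1(x)) * code / self.keep  # inverted-dropout scaling
        return self.fc2(h), code       # return the code so it can be replayed
```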
Unprocessing Images for Learned Raw Denoising
Title | Unprocessing Images for Learned Raw Denoising |
Authors | Tim Brooks, Ben Mildenhall, Tianfan Xue, Jiawen Chen, Dillon Sharlet, Jonathan T. Barron |
Abstract | Machine learning techniques work best when the data used for training resembles the data used for evaluation. This holds true for learned single-image denoising algorithms, which are applied to real raw camera sensor readings but, due to practical constraints, are often trained on synthetic image data. Though it is understood that generalizing from synthetic to real data requires careful consideration of the noise properties of image sensors, the other aspects of a camera’s image processing pipeline (gain, color correction, tone mapping, etc) are often overlooked, despite their significant effect on how raw measurements are transformed into finished images. To address this, we present a technique to “unprocess” images by inverting each step of an image processing pipeline, thereby allowing us to synthesize realistic raw sensor measurements from commonly available internet photos. We additionally model the relevant components of an image processing pipeline when evaluating our loss function, which allows training to be aware of all relevant photometric processing that will occur after denoising. By processing and unprocessing model outputs and training data in this way, we are able to train a simple convolutional neural network that has 14%-38% lower error rates and is 9x-18x faster than the previous state of the art on the Darmstadt Noise Dataset, and generalizes to sensors outside of that dataset as well. |
Tasks | Denoising, Image Denoising |
Published | 2018-11-27 |
URL | http://arxiv.org/abs/1811.11127v1 |
PDF | http://arxiv.org/pdf/1811.11127v1.pdf |
PWC | https://paperswithcode.com/paper/unprocessing-images-for-learned-raw-denoising |
Repo | https://github.com/google-research/google-research/tree/master/unprocessing |
Framework | tf |
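A condensed unprocessing chain in the spirit of the paper: undo the tone curve and sRGB gamma, apply an inverse color-correction matrix, undo gains, then inject shot and read noise. The inverse smoothstep matches the paper's tone-curve inversion; the CCM, gains, and noise levels below are made-up placeholders (the paper samples them from measured camera models).

```python
# Sketch: synthesize raw-like measurements from an sRGB image (H, W, 3) in [0, 1].
import numpy as np

def unprocess(srgb, rng=np.random.default_rng(0)):
    x = np.clip(srgb, 0.0, 1.0)
    x = 0.5 - np.sin(np.arcsin(1.0 - 2.0 * x) / 3.0)     # invert smoothstep tone curve
    x = np.where(x <= 0.04045, x / 12.92,
                 ((x + 0.055) / 1.055) ** 2.4)           # invert sRGB gamma
    ccm_inv = np.linalg.inv(np.array([[1.06, -0.03, -0.03],   # placeholder CCM
                                      [-0.12, 1.20, -0.08],
                                      [-0.02, -0.18, 1.20]]))
    x = x @ ccm_inv.T                                    # sRGB -> camera RGB
    x = x / np.array([2.0, 1.0, 1.7])                    # undo WB/digital gains
    shot, read = 0.01, 0.0005                            # placeholder noise params
    return x + rng.normal(scale=np.sqrt(shot * np.clip(x, 0, None) + read**2))
```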
Joint Deformable Registration of Large EM Image Volumes: A Matrix Solver Approach
Title | Joint Deformable Registration of Large EM Image Volumes: A Matrix Solver Approach |
Authors | Khaled Khairy, Gennady Denisov, Stephan Saalfeld |
Abstract | Large electron microscopy image datasets for connectomics are typically composed of thousands to millions of partially overlapping two-dimensional images (tiles), which must be registered into a coherent volume prior to further analysis. A common registration strategy is to find matching features between neighboring and overlapping image pairs, followed by a numerical estimation of optimal image deformation using a so-called solver program. Existing solvers are inadequate for large data volumes, and inefficient for small-scale image registration. In this work, an efficient and accurate matrix-based solver method is presented. A linear system is constructed that combines minimization of feature-pair square distances with explicit constraints in a regularization term. In absence of reliable priors for regularization, we show how to construct a rigid-model approximation to use as prior. The linear system is solved using available computer programs, whose performance on typical registration tasks we briefly compare, and to which future scale-up is delegated. Our method is applied to the joint alignment of 2.67 million images, with more than 200 million point-pairs and has been used for successfully aligning the first full adult fruit fly brain. |
Tasks | Image Registration |
Published | 2018-04-26 |
URL | http://arxiv.org/abs/1804.10019v1 |
PDF | http://arxiv.org/pdf/1804.10019v1.pdf |
PWC | https://paperswithcode.com/paper/joint-deformable-registration-of-large-em |
Repo | https://github.com/khaledkhairy/EM_aligner |
Framework | none |
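The solver's structure, reduced to a translation-only toy: each tile gets an unknown offset, every matched point pair contributes one row asking the offset difference to cancel the measured residual, and a regularization block pulls offsets toward a prior (zero here; a rigid-model approximation in the paper). The paper uses richer deformation models over millions of tiles, but the sparse least-squares structure is the same.

```python
# Translation-only registration as a sparse least-squares problem (sketch).
import numpy as np
from scipy.sparse import lil_matrix
from scipy.sparse.linalg import lsqr

def solve_offsets(n_tiles, matches, lam=0.1):
    # matches: list of (i, j, residual), residual = position in tile j
    # minus position in tile i (1-D coordinates for brevity)
    A = lil_matrix((len(matches) + n_tiles, n_tiles))
    b = np.zeros(len(matches) + n_tiles)
    for row, (i, j, r) in enumerate(matches):
        A[row, i], A[row, j], b[row] = 1.0, -1.0, r
    for i in range(n_tiles):            # regularization toward the prior (zero)
        A[len(matches) + i, i] = lam
    return lsqr(A.tocsr(), b)[0]

# offsets = solve_offsets(3, [(0, 1, 2.0), (1, 2, -1.0), (0, 2, 1.0)])
```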
Representation Learning with Contrastive Predictive Coding
Title | Representation Learning with Contrastive Predictive Coding |
Authors | Aaron van den Oord, Yazhe Li, Oriol Vinyals |
Abstract | While supervised learning has enabled great progress in many applications, unsupervised learning has not seen such widespread adoption, and remains an important and challenging endeavor for artificial intelligence. In this work, we propose a universal unsupervised learning approach to extract useful representations from high-dimensional data, which we call Contrastive Predictive Coding. The key insight of our model is to learn such representations by predicting the future in latent space by using powerful autoregressive models. We use a probabilistic contrastive loss which induces the latent space to capture information that is maximally useful to predict future samples. It also makes the model tractable by using negative sampling. While most prior work has focused on evaluating representations for a particular modality, we demonstrate that our approach is able to learn useful representations achieving strong performance on four distinct domains: speech, images, text and reinforcement learning in 3D environments. |
Tasks | Representation Learning, Self-Supervised Image Classification, Semi-Supervised Image Classification |
Published | 2018-07-10 |
URL | http://arxiv.org/abs/1807.03748v2 |
PDF | http://arxiv.org/pdf/1807.03748v2.pdf |
PWC | https://paperswithcode.com/paper/representation-learning-with-contrastive |
Repo | https://github.com/jefflai108/Contrastive-Predictive-Coding-PyTorch |
Framework | pytorch |
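The InfoNCE objective at the heart of CPC fits in a few lines of PyTorch: score a context vector against its true future latent and against negatives taken from the rest of the batch, then apply cross-entropy with the positives on the diagonal. Here W_k stands for the log-bilinear predictor for one prediction step; batch-negative sampling is one common choice, not the only one in the paper.

```python
# InfoNCE loss with in-batch negatives (sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F

def info_nce(context, future, W_k):
    # context: (B, C) context vectors; future: (B, Z) true future latents;
    # W_k: nn.Linear(C, Z), the log-bilinear predictor for step k
    pred = W_k(context)                       # (B, Z) predicted future latents
    logits = pred @ future.t()                # (B, B) scores for all pairings
    labels = torch.arange(len(logits), device=logits.device)
    return F.cross_entropy(logits, labels)    # positives on the diagonal

# B, C, Z = 32, 256, 128
# loss = info_nce(torch.randn(B, C), torch.randn(B, Z), nn.Linear(C, Z))
```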
Revisiting Single Image Depth Estimation: Toward Higher Resolution Maps with Accurate Object Boundaries
Title | Revisiting Single Image Depth Estimation: Toward Higher Resolution Maps with Accurate Object Boundaries |
Authors | Junjie Hu, Mete Ozay, Yan Zhang, Takayuki Okatani |
Abstract | This paper considers the problem of single image depth estimation. The employment of convolutional neural networks (CNNs) has recently brought about significant advancements in the research of this problem. However, most existing methods suffer from loss of spatial resolution in the estimated depth maps; a typical symptom is distorted and blurry reconstruction of object boundaries. In this paper, toward more accurate estimation with a focus on depth maps with higher spatial resolution, we propose two improvements to existing approaches. One concerns the strategy of fusing features extracted at different scales, for which we propose an improved network architecture consisting of four modules: an encoder, decoder, multi-scale feature fusion module, and refinement module. The other concerns the loss functions for measuring inference errors used in training. We show that three loss terms, which measure errors in depth, gradients and surface normals, respectively, contribute to improvement of accuracy in a complementary fashion. Experimental results show that these two improvements enable us to attain higher accuracy than the current state of the art, owing to finer-resolution reconstruction, for example of small objects and object boundaries. |
Tasks | Depth Estimation, Monocular Depth Estimation |
Published | 2018-03-23 |
URL | http://arxiv.org/abs/1803.08673v2 |
PDF | http://arxiv.org/pdf/1803.08673v2.pdf |
PWC | https://paperswithcode.com/paper/revisiting-single-image-depth-estimation |
Repo | https://github.com/Xt-Chen/SARPN |
Framework | pytorch |
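The three complementary loss terms can be sketched compactly in PyTorch: a log depth error, a log gradient error, and a surface-normal term computed from depth gradients. Exact weightings, constants, and boundary handling follow the paper, not this sketch.

```python
# Depth, gradient, and surface-normal loss terms for depth maps (B, 1, H, W).
import torch
import torch.nn.functional as F

def image_grads(d):
    gx = d[..., :, 1:] - d[..., :, :-1]       # horizontal differences
    gy = d[..., 1:, :] - d[..., :-1, :]       # vertical differences
    return gx, gy

def depth_losses(pred, gt):
    l_depth = torch.log(torch.abs(pred - gt) + 1.0).mean()
    (pgx, pgy), (ggx, ggy) = image_grads(pred), image_grads(gt)
    l_grad = (torch.log(torch.abs(pgx - ggx) + 1.0).mean()
              + torch.log(torch.abs(pgy - ggy) + 1.0).mean())
    # normals from gradients, n = (-dx, -dy, 1); crop to a common size first
    pgx, pgy = pgx[..., :-1, :], pgy[..., :, :-1]
    ggx, ggy = ggx[..., :-1, :], ggy[..., :, :-1]
    pn = torch.cat([-pgx, -pgy, torch.ones_like(pgx)], dim=1)
    gn = torch.cat([-ggx, -ggy, torch.ones_like(ggx)], dim=1)
    l_normal = (1.0 - F.cosine_similarity(pn, gn, dim=1)).mean()
    return l_depth, l_grad, l_normal
```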
SDNet: Contextualized Attention-based Deep Network for Conversational Question Answering
Title | SDNet: Contextualized Attention-based Deep Network for Conversational Question Answering |
Authors | Chenguang Zhu, Michael Zeng, Xuedong Huang |
Abstract | Conversational question answering (CQA) is a novel QA task that requires understanding of dialogue context. Different from traditional single-turn machine reading comprehension (MRC) tasks, CQA includes passage comprehension, coreference resolution, and contextual understanding. In this paper, we propose an innovative contextualized attention-based deep neural network, SDNet, to fuse context into traditional MRC models. Our model leverages both inter-attention and self-attention to comprehend the conversation context and extract relevant information from the passage. Furthermore, we demonstrate a novel method to integrate the latest BERT contextual model. Empirical results show the effectiveness of our model, which sets a new state-of-the-art result on the CoQA leaderboard, outperforming the previous best model by 1.6% F1. Our ensemble model further improves the result by 2.7% F1. |
Tasks | Coreference Resolution, Machine Reading Comprehension, Question Answering, Reading Comprehension |
Published | 2018-12-10 |
URL | http://arxiv.org/abs/1812.03593v5 |
PDF | http://arxiv.org/pdf/1812.03593v5.pdf |
PWC | https://paperswithcode.com/paper/sdnet-contextualized-attention-based-deep |
Repo | https://github.com/mpandeydev/SDnetmod |
Framework | pytorch |
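One inter-attention layer of the kind SDNet stacks, as a hypothetical PyTorch sketch: passage positions attend over question positions in a learned space, producing question-aware passage states. Dimensions are illustrative; the full model combines several such layers with self-attention and BERT embeddings.

```python
# Word-level inter-attention between passage and question (sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class InterAttention(nn.Module):
    def __init__(self, dim=300, att_dim=128):
        super().__init__()
        self.proj = nn.Linear(dim, att_dim)

    def forward(self, passage, question):     # (B, Lp, D), (B, Lq, D)
        p = torch.relu(self.proj(passage))
        q = torch.relu(self.proj(question))
        scores = p @ q.transpose(1, 2)        # (B, Lp, Lq) attention scores
        alpha = F.softmax(scores, dim=-1)
        return alpha @ question               # (B, Lp, D) question-aware states
```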