Paper Group AWR 298
A Simple Baseline for Bayesian Uncertainty in Deep Learning
Title | A Simple Baseline for Bayesian Uncertainty in Deep Learning |
Authors | Wesley Maddox, Timur Garipov, Pavel Izmailov, Dmitry Vetrov, Andrew Gordon Wilson |
Abstract | We propose SWA-Gaussian (SWAG), a simple, scalable, and general purpose approach for uncertainty representation and calibration in deep learning. Stochastic Weight Averaging (SWA), which computes the first moment of stochastic gradient descent (SGD) iterates with a modified learning rate schedule, has recently been shown to improve generalization in deep learning. With SWAG, we fit a Gaussian using the SWA solution as the first moment and a low rank plus diagonal covariance also derived from the SGD iterates, forming an approximate posterior distribution over neural network weights; we then sample from this Gaussian distribution to perform Bayesian model averaging. We empirically find that SWAG approximates the shape of the true posterior, in accordance with results describing the stationary distribution of SGD iterates. Moreover, we demonstrate that SWAG performs well on a wide variety of tasks, including out of sample detection, calibration, and transfer learning, in comparison to many popular alternatives including MC dropout, KFAC Laplace, SGLD, and temperature scaling. |
Tasks | Bayesian Inference, Calibration, Transfer Learning |
Published | 2019-02-07 |
URL | https://arxiv.org/abs/1902.02476v2 |
PDF | https://arxiv.org/pdf/1902.02476v2.pdf |
PWC | https://paperswithcode.com/paper/a-simple-baseline-for-bayesian-uncertainty-in |
Repo | https://github.com/SamuelGuilluy/Bayesian_ML_SWAG |
Framework | pytorch |
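The abstract above describes fitting a Gaussian whose mean is the SWA solution and whose covariance is estimated from the SGD iterates, then sampling weights for Bayesian model averaging. The following is a minimal sketch of the diagonal part only, assuming weights are plain Python lists; it omits the low-rank term, and the names `fit_swag_diagonal`, `sample_weights`, and `bayesian_model_average` are hypothetical and not taken from the authors' code.

```python
import math
import random


def fit_swag_diagonal(iterates):
    """Estimate a per-weight mean and variance from a list of SGD iterates."""
    n = len(iterates)
    dim = len(iterates[0])
    # First moment: the average of the iterates (the SWA solution).
    mean = [sum(w[i] for w in iterates) / n for i in range(dim)]
    # Second moment, used to form a diagonal covariance estimate.
    second = [sum(w[i] ** 2 for w in iterates) / n for i in range(dim)]
    # Clamp at zero to guard against small negative values from rounding.
    var = [max(s - m * m, 0.0) for s, m in zip(second, mean)]
    return mean, var


def sample_weights(mean, var, rng):
    """Draw one weight vector from the fitted diagonal Gaussian."""
    return [rng.gauss(m, math.sqrt(v)) for m, v in zip(mean, var)]


def bayesian_model_average(predict, mean, var, x, num_samples=30, seed=0):
    """Average predictions over weight samples (assumes scalar predictions)."""
    rng = random.Random(seed)
    preds = [predict(sample_weights(mean, var, rng), x) for _ in range(num_samples)]
    return sum(preds) / num_samples
```

In practice `predict` would wrap a forward pass of the network with the sampled weights loaded; here it is any callable taking `(weights, x)`.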
Causal Reasoning from Meta-reinforcement Learning
Title | Causal Reasoning from Meta-reinforcement Learning |
Authors | Ishita Dasgupta, Jane Wang, Silvia Chiappa, Jovana Mitrovic, Pedro Ortega, David Raposo, Edward Hughes, Peter Battaglia, Matthew Botvinick, Zeb Kurth-Nelson |
Abstract | Discovering and exploiting the causal structure in the environment is a crucial challenge for intelligent agents. Here we explore whether causal reasoning can emerge via meta-reinforcement learning. We train a recurrent network with model-free reinforcement learning to solve a range of problems that each contain causal structure. We find that the trained agent can perform causal reasoning in novel situations in order to obtain rewards. The agent can select informative interventions, draw causal inferences from observational data, and make counterfactual predictions. Although established formal causal reasoning algorithms also exist, in this paper we show that such reasoning can arise from model-free reinforcement learning, and suggest that causal reasoning in complex settings may benefit from the more end-to-end learning-based approaches presented here. This work also offers new strategies for structured exploration in reinforcement learning, by providing agents with the ability to perform – and interpret – experiments. |
Tasks | |
Published | 2019-01-23 |
URL | http://arxiv.org/abs/1901.08162v1 |
PDF | http://arxiv.org/pdf/1901.08162v1.pdf |
PWC | https://paperswithcode.com/paper/causal-reasoning-from-meta-reinforcement |
Repo | https://github.com/kantneel/causal-metarl |
Framework | none |
Fair DARTS: Eliminating Unfair Advantages in Differentiable Architecture Search
Title | Fair DARTS: Eliminating Unfair Advantages in Differentiable Architecture Search |
Authors | Xiangxiang Chu, Tianbao Zhou, Bo Zhang, Jixiang Li |
Abstract | Differentiable Architecture Search (DARTS) is now a widely disseminated weight-sharing neural architecture search method. However, it suffers from well-known performance collapse due to an inevitable aggregation of skip connections. In this paper, we first disclose that its root cause lies in an unfair advantage in exclusive competition. Through experiments, we show that if either of two conditions is broken, the collapse disappears. Thereby, we present a novel approach called Fair DARTS where the exclusive competition is relaxed to be collaborative. Specifically, we let each operation’s architectural weight be independent of others. Yet there is still an important issue of discretization discrepancy. We then propose a zero-one loss to push architectural weights towards zero or one, which approximates an expected multi-hot solution. Our experiments are performed on two mainstream search spaces, and we derive new state-of-the-art results on CIFAR-10 and ImageNet. Our code is available on https://github.com/xiaomi-automl/fairdarts . |
Tasks | AutoML, Neural Architecture Search |
Published | 2019-11-27 |
URL | https://arxiv.org/abs/1911.12126v2 |
PDF | https://arxiv.org/pdf/1911.12126v2.pdf |
PWC | https://paperswithcode.com/paper/fair-darts-eliminating-unfair-advantages-in |
Repo | https://github.com/xiaomi-automl/fairdarts |
Framework | pytorch |
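The abstract above contrasts exclusive competition among operations with independent architectural weights, and mentions a zero-one loss that pushes those weights towards zero or one. The sketch below illustrates the idea under stated assumptions: softmax stands in for the exclusive case, a per-operation sigmoid for the independent case, and the loss form (negative mean squared distance from 0.5) is an assumed formulation rather than one confirmed by the text here. All function names are hypothetical.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def softmax(alphas):
    """Exclusive competition: weights sum to one, so raising one lowers the rest."""
    m = max(alphas)
    exps = [math.exp(a - m) for a in alphas]
    total = sum(exps)
    return [e / total for e in exps]


def independent_weights(alphas):
    """Collaborative relaxation: each weight depends only on its own parameter."""
    return [sigmoid(a) for a in alphas]


def zero_one_loss(alphas):
    """Assumed loss form: lowest when every sigmoid weight is close to 0 or 1."""
    weights = independent_weights(alphas)
    return -sum((w - 0.5) ** 2 for w in weights) / len(weights)
```

With independent weights, several operations on one edge can be close to one at the same time, which is what a multi-hot solution requires.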
A Global-Local Emebdding Module for Fashion Landmark Detection
Title | A Global-Local Emebdding Module for Fashion Landmark Detection |
Authors | Sumin Lee, Sungchan Oh, Chanho Jung, Changick Kim |
Abstract | Detecting fashion landmarks is a fundamental technique for visual clothing analysis. Due to the large variation and non-rigid deformation of clothes, localizing fashion landmarks suffers from large spatial variances across poses, scales, and styles. Therefore, understanding contextual knowledge of clothes is required for accurate landmark detection. To that end, in this paper, we propose a fashion landmark detection network with a global-local embedding module. The global-local embedding module is based on a non-local operation for capturing long-range dependencies and a subsequent convolution operation for adopting local neighborhood relations. With this processing, the network can consider both global and local contextual knowledge for a clothing image. We demonstrate that our proposed method has an excellent ability to learn advanced deep feature representations for fashion landmark detection. Experimental results on two benchmark datasets show that the proposed network outperforms the state-of-the-art methods. Our code is available at https://github.com/shumming/GLE_FLD. |
Tasks | |
Published | 2019-08-28 |
URL | https://arxiv.org/abs/1908.10548v1 |
PDF | https://arxiv.org/pdf/1908.10548v1.pdf |
PWC | https://paperswithcode.com/paper/a-global-local-emebdding-module-for-fashion |
Repo | https://github.com/shumming/GLE_FLD |
Framework | pytorch |
Cross-Lingual Machine Reading Comprehension
Title | Cross-Lingual Machine Reading Comprehension |
Authors | Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, Shijin Wang, Guoping Hu |
Abstract | Though the community has made great progress on the Machine Reading Comprehension (MRC) task, most previous work addresses English-based MRC problems, and there have been few efforts on other languages, mainly due to the lack of large-scale training data. In this paper, we propose the Cross-Lingual Machine Reading Comprehension (CLMRC) task for languages other than English. Firstly, we present several back-translation approaches for the CLMRC task, which are straightforward to adopt. However, accurately aligning the answer in another language is difficult and could introduce additional noise. In this context, we propose a novel model called Dual BERT, which takes advantage of the large-scale training data provided by a rich-resource language (such as English), learns the semantic relations between the passage and question in a bilingual context, and then utilizes the learned knowledge to improve reading comprehension performance in a low-resource language. We conduct experiments on two Chinese machine reading comprehension datasets, CMRC 2018 and DRCD. The results show consistent and significant improvements over various state-of-the-art systems by a large margin, which demonstrates the potential of the CLMRC task. Resources available: https://github.com/ymcui/Cross-Lingual-MRC |
Tasks | Machine Reading Comprehension, Reading Comprehension |
Published | 2019-09-01 |
URL | https://arxiv.org/abs/1909.00361v1 |
PDF | https://arxiv.org/pdf/1909.00361v1.pdf |
PWC | https://paperswithcode.com/paper/cross-lingual-machine-reading-comprehension |
Repo | https://github.com/ymcui/Cross-Lingual-MRC |
Framework | tf |
A Comprehensive guide to Bayesian Convolutional Neural Network with Variational Inference
Title | A Comprehensive guide to Bayesian Convolutional Neural Network with Variational Inference |
Authors | Kumar Shridhar, Felix Laumann, Marcus Liwicki |
Abstract | Artificial Neural Networks are connectionist systems that perform a given task by learning on examples without having prior knowledge about the task. This is done by finding an optimal point estimate for the weights in every node. Generally, networks using point estimates as weights perform well with large datasets, but they fail to express uncertainty in regions with little or no data, leading to overconfident decisions. In this paper, a Bayesian Convolutional Neural Network (BayesCNN) using Variational Inference is proposed, which introduces a probability distribution over the weights. Furthermore, the proposed BayesCNN architecture is applied to tasks like Image Classification, Image Super-Resolution and Generative Adversarial Networks. The results are compared to point-estimate-based architectures on the MNIST, CIFAR-10 and CIFAR-100 datasets for the Image Classification task, on the BSD300 dataset for the Image Super-Resolution task, and on the CIFAR-10 dataset again for the Generative Adversarial Network task. BayesCNN is based on Bayes by Backprop, which derives a variational approximation to the true posterior. We, therefore, introduce the idea of applying two convolutional operations, one for the mean and one for the variance. Our proposed method not only achieves performance equivalent to frequentist inference in identical architectures but also incorporates a measurement for uncertainties and regularisation. It further eliminates the use of dropout in the model. Moreover, we predict how certain the model prediction is based on the epistemic and aleatoric uncertainties and empirically show how the uncertainty can decrease, allowing the decisions made by the network to become more deterministic as the training accuracy increases. Finally, we propose ways to prune the Bayesian architecture and to make it more computationally and time efficient. |
Tasks | Bayesian Inference, Image Classification, Image Super-Resolution, Super-Resolution |
Published | 2019-01-08 |
URL | http://arxiv.org/abs/1901.02731v1 |
PDF | http://arxiv.org/pdf/1901.02731v1.pdf |
PWC | https://paperswithcode.com/paper/a-comprehensive-guide-to-bayesian |
Repo | https://github.com/kumar-shridhar/PyTorch-BayesianCNN |
Framework | pytorch |
Superquadrics Revisited: Learning 3D Shape Parsing beyond Cuboids
Title | Superquadrics Revisited: Learning 3D Shape Parsing beyond Cuboids |
Authors | Despoina Paschalidou, Ali Osman Ulusoy, Andreas Geiger |
Abstract | Abstracting complex 3D shapes with parsimonious part-based representations has been a long-standing goal in computer vision. This paper presents a learning-based solution to this problem which goes beyond the traditional 3D cuboid representation by exploiting superquadrics as atomic elements. We demonstrate that superquadrics lead to more expressive 3D scene parses while being easier to learn than 3D cuboid representations. Moreover, we provide an analytical solution to the Chamfer loss which avoids the need for computationally expensive reinforcement learning or iterative prediction. Our model learns to parse 3D objects into consistent superquadric representations without supervision. Results on various ShapeNet categories as well as the SURREAL human body dataset demonstrate the flexibility of our model in capturing fine details and complex poses that could not have been modelled using cuboids. |
Tasks | |
Published | 2019-04-22 |
URL | http://arxiv.org/abs/1904.09970v1 |
PDF | http://arxiv.org/pdf/1904.09970v1.pdf |
PWC | https://paperswithcode.com/paper/superquadrics-revisited-learning-3d-shape |
Repo | https://github.com/paschalidoud/superquadric_parsing |
Framework | pytorch |
Efficiently Checking Actual Causality with SAT Solving
Title | Efficiently Checking Actual Causality with SAT Solving |
Authors | Amjad Ibrahim, Simon Rehwald, Alexander Pretschner |
Abstract | Recent formal approaches towards causality have made the concept ready for incorporation into the technical world. However, causality reasoning is computationally hard; and no general algorithmic approach exists that efficiently infers the causes for effects. Thus, checking causality in the context of complex, multi-agent, and distributed socio-technical systems is a significant challenge. Therefore, we conceptualize an intelligent and novel algorithmic approach towards checking causality in acyclic causal models with binary variables, utilizing the optimization power in the solvers of the Boolean Satisfiability Problem (SAT). We present two SAT encodings, and an empirical evaluation of their efficiency and scalability. We show that causality is computed efficiently in less than 5 seconds for models that consist of more than 4000 variables. |
Tasks | |
Published | 2019-04-30 |
URL | http://arxiv.org/abs/1904.13101v1 |
PDF | http://arxiv.org/pdf/1904.13101v1.pdf |
PWC | https://paperswithcode.com/paper/efficiently-checking-actual-causality-with |
Repo | https://github.com/amjadKhalifah/HP2SAT1.0 |
Framework | none |
Density Encoding Enables Resource-Efficient Randomly Connected Neural Networks
Title | Density Encoding Enables Resource-Efficient Randomly Connected Neural Networks |
Authors | Denis Kleyko, Mansour Kheffache, E. Paxon Frady, Urban Wiklund, Evgeny Osipov |
Abstract | The deployment of machine learning algorithms on resource-constrained edge devices is an important challenge from both theoretical and applied points of view. In this article, we focus on resource-efficient randomly connected neural networks known as Random Vector Functional Link (RVFL) networks since their simple design and extremely fast training time make them very attractive for solving many applied classification tasks. We propose to represent input features via the density-based encoding known in the area of stochastic computing and use the operations of binding and bundling from the area of hyperdimensional computing for obtaining the activations of the hidden neurons. Using a collection of 121 real-world datasets from the UCI Machine Learning Repository, we empirically show that the proposed approach demonstrates higher average accuracy than the conventional RVFL. We also demonstrate that it is possible to represent the readout matrix using only integers in a limited range with minimal loss in accuracy. In this case, the proposed approach operates only on small n-bit integers, which results in a computationally efficient architecture. Finally, through hardware FPGA implementations, we show that such an approach consumes approximately eleven times less energy than the conventional RVFL. |
Tasks | |
Published | 2019-09-19 |
URL | https://arxiv.org/abs/1909.09153v1 |
PDF | https://arxiv.org/pdf/1909.09153v1.pdf |
PWC | https://paperswithcode.com/paper/density-encoding-enables-resource-efficient |
Repo | https://github.com/sweetwenwen/Stochastic-computing-based-neural-network-accelerator |
Framework | none |
Connecting Vision and Language with Localized Narratives
Title | Connecting Vision and Language with Localized Narratives |
Authors | Jordi Pont-Tuset, Jasper Uijlings, Soravit Changpinyo, Radu Soricut, Vittorio Ferrari |
Abstract | This paper proposes Localized Narratives, a new form of multimodal image annotations connecting vision and language. We ask annotators to describe an image with their voice while simultaneously hovering their mouse over the region they are describing. Since the voice and the mouse pointer are synchronized, we can localize every single word in the description. This dense visual grounding takes the form of a mouse trace segment per word and is unique to our data. We annotate 628k images with Localized Narratives: the whole COCO dataset and 504k images of the Open Images dataset, which we make publicly available. We provide an extensive analysis of these annotations showing they are diverse, accurate, and efficient to produce. We also demonstrate their utility on the application of controlled image captioning. |
Tasks | Image Captioning, Image Generation |
Published | 2019-12-06 |
URL | https://arxiv.org/abs/1912.03098v3 |
PDF | https://arxiv.org/pdf/1912.03098v3.pdf |
PWC | https://paperswithcode.com/paper/connecting-vision-and-language-with-localized |
Repo | https://github.com/google/localized-narratives |
Framework | none |
Improving Quality and Efficiency in Plan-based Neural Data-to-Text Generation
Title | Improving Quality and Efficiency in Plan-based Neural Data-to-Text Generation |
Authors | Amit Moryossef, Ido Dagan, Yoav Goldberg |
Abstract | We follow the step-by-step approach to neural data-to-text generation we proposed in Moryossef et al. (2019), in which the generation process is divided into a text-planning stage followed by a plan-realization stage. We suggest four extensions to that framework: (1) we introduce a trainable neural planning component that can generate effective plans several orders of magnitude faster than the original planner; (2) we incorporate typing hints that improve the model’s ability to deal with unseen relations and entities; (3) we introduce a verification-by-reranking stage that substantially improves the faithfulness of the resulting texts; (4) we incorporate a simple but effective referring expression generation module. These extensions result in a generation process that is faster, more fluent, and more accurate. |
Tasks | Data-to-Text Generation, Text Generation |
Published | 2019-09-22 |
URL | https://arxiv.org/abs/1909.09986v1 |
PDF | https://arxiv.org/pdf/1909.09986v1.pdf |
PWC | https://paperswithcode.com/paper/190909986 |
Repo | https://github.com/AmitMY/chimera |
Framework | none |
Real Image Denoising with Feature Attention
Title | Real Image Denoising with Feature Attention |
Authors | Saeed Anwar, Nick Barnes |
Abstract | Deep convolutional neural networks perform better on images containing spatially invariant noise (synthetic noise); however, their performance is limited on real-noisy photographs and requires multi-stage network modeling. To advance the practicability of denoising algorithms, this paper proposes a novel single-stage blind real image denoising network (RIDNet) by employing a modular architecture. We use a residual-on-the-residual structure to ease the flow of low-frequency information and apply feature attention to exploit the channel dependencies. Furthermore, the evaluation in terms of quantitative metrics and visual quality on three synthetic and four real noisy datasets against 19 state-of-the-art algorithms demonstrates the superiority of our RIDNet. |
Tasks | Denoising, Image Denoising |
Published | 2019-04-16 |
URL | https://arxiv.org/abs/1904.07396v2 |
PDF | https://arxiv.org/pdf/1904.07396v2.pdf |
PWC | https://paperswithcode.com/paper/real-image-denoising-with-feature-attention |
Repo | https://github.com/saeed-anwar/RIDNet |
Framework | pytorch |
AI-IMU Dead-Reckoning
Title | AI-IMU Dead-Reckoning |
Authors | Martin Brossard, Axel Barrau, Silvère Bonnabel |
Abstract | In this paper we propose a novel accurate method for dead-reckoning of wheeled vehicles based only on an Inertial Measurement Unit (IMU). In the context of intelligent vehicles, robust and accurate dead-reckoning based on the IMU may prove useful to correlate feeds from imaging sensors, to safely navigate through obstructions, or for safe emergency stops in the extreme case of exteroceptive sensor failure. The key components of the method are the Kalman filter and the use of deep neural networks to dynamically adapt the noise parameters of the filter. The method is tested on the KITTI odometry dataset, and our dead-reckoning inertial method based only on the IMU accurately estimates the 3D position, velocity, and orientation of the vehicle and self-calibrates the IMU biases. We achieve on average a 1.10% translational error and the algorithm competes with top-ranked methods which, by contrast, use LiDAR or stereo vision. We make our implementation open-source at: https://github.com/mbrossar/ai-imu-dr |
Tasks | Dead-Reckoning Prediction |
Published | 2019-04-12 |
URL | http://arxiv.org/abs/1904.06064v1 |
PDF | http://arxiv.org/pdf/1904.06064v1.pdf |
PWC | https://paperswithcode.com/paper/ai-imu-dead-reckoning |
Repo | https://github.com/mbrossar/ai-imu-dr |
Framework | pytorch |
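The abstract above names two components: a Kalman filter, and a network that adapts the filter's noise parameters. As a rough illustration of how the two can fit together, here is a scalar Kalman filter whose measurement-noise variance is supplied at each step by a callable. The callable `noise_model` is a hypothetical stand-in for a learned model, and the one-dimensional constant-state setup is an assumption for brevity, not the filter used in the paper.

```python
def kalman_update(x, p, z, r):
    """One scalar measurement update: state x, variance p, measurement z, noise r."""
    k = p / (p + r)            # Kalman gain
    x_new = x + k * (z - x)    # correct the state towards the measurement
    p_new = (1.0 - k) * p      # the variance shrinks after the update
    return x_new, p_new


def run_filter(measurements, features, noise_model, x0=0.0, p0=1.0, q=0.01):
    """Filter a sequence, asking noise_model for the noise variance at each step."""
    x, p = x0, p0
    estimates = []
    for z, f in zip(measurements, features):
        p = p + q              # predict step for a constant state with process noise q
        r = noise_model(f)     # adaptive measurement noise (hypothetical learned model)
        x, p = kalman_update(x, p, z, r)
        estimates.append(x)
    return estimates
```

A large predicted variance makes the filter nearly ignore a measurement, while a small one makes the filter trust it almost completely.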
Quadratization in discrete optimization and quantum mechanics
Title | Quadratization in discrete optimization and quantum mechanics |
Authors | Nike Dattani |
Abstract | A book about turning high-degree optimization problems into quadratic optimization problems that maintain the same global minimum (ground state). This book explores quadratizations for pseudo-Boolean optimization, perturbative gadgets used in QMA completeness theorems, and also non-perturbative k-local to 2-local transformations used for quantum mechanics, quantum annealing and universal adiabatic quantum computing. The book contains ~70 different Hamiltonian transformations, each of them on a separate page, where the cost (in number of auxiliary binary variables or auxiliary qubits, or number of sub-modular terms, or in graph connectivity, etc.), pros, cons, examples, and references are given. One can therefore look up a quadratization appropriate for the specific term(s) that need to be quadratized, much like using an integral table to look up the integral that needs to be done. This book is therefore useful for writing compilers to transform general optimization problems, into a form that quantum annealing or universal adiabatic quantum computing hardware requires; or for transforming quantum chemistry problems written in the Jordan-Wigner or Bravyi-Kitaev form, into a form where all multi-qubit interactions become 2-qubit pairwise interactions, without changing the desired ground state. Applications cited include computer vision problems (e.g. image de-noising, un-blurring, etc.), number theory (e.g. integer factoring), graph theory (e.g. Ramsey number determination), and quantum chemistry. The book is open source, and anyone can make modifications here: https://github.com/HPQC-LABS/Book_About_Quadratization. |
Tasks | |
Published | 2019-01-14 |
URL | https://arxiv.org/abs/1901.04405v2 |
PDF | https://arxiv.org/pdf/1901.04405v2.pdf |
PWC | https://paperswithcode.com/paper/quadratization-in-discrete-optimization-and |
Repo | https://github.com/HPQC-LABS/Book_About_Quadratization |
Framework | none |
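To make the notion of a quadratization concrete, the sketch below checks one well-known identity by brute force: a negative cubic monomial over binary variables equals the minimum, over one auxiliary binary variable `a`, of a quadratic expression. The identity is chosen here for illustration and is not quoted from the book; the function names are hypothetical.

```python
from itertools import product


def cubic_term(x1, x2, x3):
    """The degree-3 term to be replaced."""
    return -x1 * x2 * x3


def quadratized(x1, x2, x3, a):
    """Quadratic replacement: expands to 2a - a*x1 - a*x2 - a*x3."""
    return a * (2 - x1 - x2 - x3)


def check_quadratization():
    """Verify that minimizing over the auxiliary bit recovers the cubic term."""
    for x in product((0, 1), repeat=3):
        best = min(quadratized(*x, a) for a in (0, 1))
        if best != cubic_term(*x):
            return False
    return True
```

The cost of this transformation is one auxiliary binary variable per cubic term, which is the kind of bookkeeping the abstract says the book records for each transformation.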
Hierarchical Importance Weighted Autoencoders
Title | Hierarchical Importance Weighted Autoencoders |
Authors | Chin-Wei Huang, Kris Sankaran, Eeshan Dhekane, Alexandre Lacoste, Aaron Courville |
Abstract | Importance weighted variational inference (Burda et al., 2015) uses multiple i.i.d. samples to have a tighter variational lower bound. We believe a joint proposal has the potential of reducing the number of redundant samples, and introduce a hierarchical structure to induce correlation. The hope is that the proposals would coordinate to make up for the error made by one another to reduce the variance of the importance estimator. Theoretically, we analyze the condition under which convergence of the estimator variance can be connected to convergence of the lower bound. Empirically, we confirm that maximization of the lower bound does implicitly minimize variance. Further analysis shows that this is a result of negative correlation induced by the proposed hierarchical meta sampling scheme, and performance of inference also improves when the number of samples increases. |
Tasks | |
Published | 2019-05-13 |
URL | https://arxiv.org/abs/1905.04866v1 |
PDF | https://arxiv.org/pdf/1905.04866v1.pdf |
PWC | https://paperswithcode.com/paper/hierarchical-importance-weighted-autoencoders |
Repo | https://github.com/CW-Huang/HIWAE |
Framework | pytorch |
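For context on the baseline the abstract builds on, the importance weighted bound averages K importance weights inside the logarithm. The sketch below computes a single-batch estimate of that bound from log-densities, assuming independent samples; it does not implement the hierarchical proposal described above, and `log_joint` and `log_proposal` are hypothetical callables supplied by the user.

```python
import math


def log_mean_exp(log_weights):
    """Numerically stable log of the mean of exp(log_weights)."""
    m = max(log_weights)
    total = sum(math.exp(lw - m) for lw in log_weights)
    return m + math.log(total / len(log_weights))


def iw_bound_estimate(log_joint, log_proposal, samples):
    """Estimate log((1/K) * sum_k p(x, z_k) / q(z_k | x)) from K samples z_k."""
    log_w = [log_joint(z) - log_proposal(z) for z in samples]
    return log_mean_exp(log_w)
```

With a single sample this reduces to one draw of the standard variational lower bound; by Jensen's inequality the estimate is never below the plain average of the log-weights.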