Paper Group AWR 338
Does It Make Sense? And Why? A Pilot Study for Sense Making and Explanation
Title | Does It Make Sense? And Why? A Pilot Study for Sense Making and Explanation |
Authors | Cunxiang Wang, Shuailong Liang, Yue Zhang, Xiaonan Li, Tian Gao |
Abstract | Introducing common sense to natural language understanding systems has received increasing research attention. It remains a fundamental question how to evaluate whether a system has the capability of sense making. Existing benchmarks measure commonsense knowledge indirectly and without explanation. In this paper, we release a benchmark to directly test whether a system can differentiate natural language statements that make sense from those that do not. In addition, a system is asked to identify the most crucial reason why a statement does not make sense. We evaluate models trained on large-scale language modeling tasks as well as human performance, showing that there are different challenges for system sense making. |
Tasks | Common Sense Reasoning, Language Modelling |
Published | 2019-06-02 |
URL | https://arxiv.org/abs/1906.00363v1 |
https://arxiv.org/pdf/1906.00363v1.pdf | |
PWC | https://paperswithcode.com/paper/190600363 |
Repo | https://github.com/wangcunxiang/Sen-Making-and-Explanation |
Framework | tf |
Weakly Supervised Energy-Based Learning for Action Segmentation
Title | Weakly Supervised Energy-Based Learning for Action Segmentation |
Authors | Jun Li, Peng Lei, Sinisa Todorovic |
Abstract | This paper is about labeling video frames with action classes under weak supervision in training, where we have access to a temporal ordering of actions, but their start and end frames in training videos are unknown. Following prior work, we use an HMM grounded on a Gated Recurrent Unit (GRU) for frame labeling. Our key contribution is a new constrained discriminative forward loss (CDFL) that we use for training the HMM and GRU under weak supervision. While prior work typically estimates the loss on a single, inferred video segmentation, our CDFL discriminates between the energy of all valid and invalid frame labelings of a training video. A valid frame labeling satisfies the ground-truth temporal ordering of actions, whereas an invalid one violates the ground truth. We specify an efficient recursive algorithm for computing the CDFL in terms of the logadd function of the segmentation energy. Our evaluation on action segmentation and alignment gives superior results to those of the state of the art on the benchmark Breakfast Action, Hollywood Extended, and 50Salads datasets. |
Tasks | action segmentation, Video Semantic Segmentation |
Published | 2019-09-28 |
URL | https://arxiv.org/abs/1909.13155v1 |
https://arxiv.org/pdf/1909.13155v1.pdf | |
PWC | https://paperswithcode.com/paper/weakly-supervised-energy-based-learning-for |
Repo | https://github.com/JunLi-Galios/CDFL |
Framework | pytorch |
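For intuition, here is a brute-force toy of the CDFL idea in Python: contrast the soft (log-sum-exp) energy of all frame labelings that violate the ground-truth action ordering against that of all labelings that satisfy it. The enumeration stands in for the paper's efficient recursion, and the energy form and validity check are simplifying assumptions, not the authors' implementation.

```python
import itertools
import numpy as np

def logadd(energies):
    """log-sum-exp of negative energies: a soft aggregate over a set of labelings."""
    return np.logaddexp.reduce(-np.asarray(energies))

def cdfl_toy(frame_scores, action_order):
    """Toy constrained discriminative forward loss.

    frame_scores: (T, C) per-frame energies (e.g. negative log-likelihoods).
    action_order: ground-truth temporal ordering of action labels.
    Enumerates every labeling of a *tiny* video by brute force; the paper
    replaces this with a recursive algorithm over segmentation energies.
    """
    T, C = frame_scores.shape
    valid, invalid = [], []
    for labeling in itertools.product(range(C), repeat=T):
        energy = sum(frame_scores[t, a] for t, a in enumerate(labeling))
        # a labeling is valid iff its run-length-collapsed sequence equals the order
        collapsed = [a for i, a in enumerate(labeling) if i == 0 or a != labeling[i - 1]]
        (valid if collapsed == list(action_order) else invalid).append(energy)
    # discriminate the soft energy of invalid labelings against valid ones
    return logadd(invalid) - logadd(valid)

scores = np.random.rand(4, 3)      # 4 frames, 3 action classes
print(cdfl_toy(scores, [0, 2]))    # low when valid labelings have low energy
```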
A Biologically Inspired Visual Working Memory for Deep Networks
Title | A Biologically Inspired Visual Working Memory for Deep Networks |
Authors | Ethan Harris, Mahesan Niranjan, Jonathon Hare |
Abstract | The ability to look multiple times through a series of pose-adjusted glimpses is fundamental to human vision. This critical faculty allows us to understand highly complex visual scenes. Short term memory plays an integral role in aggregating the information obtained from these glimpses and informing our interpretation of the scene. Computational models have attempted to address glimpsing and visual attention but have failed to incorporate the notion of memory. We introduce a novel, biologically inspired visual working memory architecture that we term the Hebb-Rosenblatt memory. We subsequently introduce a fully differentiable Short Term Attentive Working Memory model (STAWM) which uses transformational attention to learn a memory over each image it sees. The state of our Hebb-Rosenblatt memory is embedded in STAWM as the weight space of a layer. By projecting different queries through this layer we can obtain goal-oriented latent representations for tasks including classification and visual reconstruction. Our model obtains highly competitive classification performance on MNIST and CIFAR-10. As demonstrated through the CelebA dataset, to perform reconstruction the model learns to make a sequence of updates to a canvas which constitute a parts-based representation. Classification with the self-supervised representation obtained from MNIST is shown to be in line with state-of-the-art models (none of which use a visual attention mechanism). Finally, we show that STAWM can be trained under the dual constraints of classification and reconstruction to provide an interpretable visual sketchpad which helps open the ‘black-box’ of deep learning. |
Tasks | |
Published | 2019-01-09 |
URL | http://arxiv.org/abs/1901.03665v1 |
http://arxiv.org/pdf/1901.03665v1.pdf | |
PWC | https://paperswithcode.com/paper/a-biologically-inspired-visual-working-memory |
Repo | https://github.com/ethanwharris/STAWM |
Framework | pytorch |
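A minimal sketch of the "memory as the weight space of a layer" idea: a matrix is written with a Hebbian outer-product rule as glimpses arrive, and read by projecting a query through it. Dimensions, the learning-rate handling, and the auto-associative write are illustrative assumptions, not the STAWM architecture itself.

```python
import torch

class HebbRosenblattMemory(torch.nn.Module):
    """Working memory whose state is the weight matrix of a linear layer,
    updated with a Hebbian (outer-product) rule per glimpse."""
    def __init__(self, dim, eta=0.1):
        super().__init__()
        self.eta = eta
        self.register_buffer("M", torch.zeros(dim, dim))  # memory = layer weights

    def write(self, pre, post):
        # Hebbian update: co-active pre/post units strengthen their connection
        self.M = self.M + self.eta * post.t() @ pre

    def read(self, query):
        # project a task-specific query through the memory's weight space
        return query @ self.M.t()

mem = HebbRosenblattMemory(dim=64)
for _ in range(4):                    # four glimpses of an image
    g = torch.randn(1, 64)            # glimpse feature (stand-in for a CNN)
    mem.write(pre=g, post=g)          # auto-associative write
print(mem.read(torch.randn(1, 64)).shape)   # goal-oriented readout: (1, 64)
```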
Unifying Variational Inference and PAC-Bayes for Supervised Learning that Scales
Title | Unifying Variational Inference and PAC-Bayes for Supervised Learning that Scales |
Authors | Sanjay Thakur, Herke Van Hoof, Gunshi Gupta, David Meger |
Abstract | Neural network based controllers hold enormous potential to learn complex, high-dimensional functions. However, they are prone to overfitting and unwarranted extrapolations. PAC-Bayes is a generalized framework that is more resistant to overfitting and yields performance bounds that hold with arbitrarily high probability, even on unjustified extrapolations. However, optimizing to learn such a function and a bound is intractable for complex tasks. In this work, we propose a method to simultaneously learn such a function and estimate performance bounds that scale organically to high-dimensional, non-linear environments without making any explicit assumptions about the environment. We build our approach on a parallel that we draw between the formulations called ELBO and PAC-Bayes when the risk metric is negative log likelihood. Through our experiments on multiple high-dimensional MuJoCo locomotion tasks, we validate the correctness of our theory, show its ability to generalize better, and investigate the factors that are important for its learning. The code for all the experiments is available at https://bit.ly/2qv0JjA. |
Tasks | |
Published | 2019-10-23 |
URL | https://arxiv.org/abs/1910.10367v2 |
https://arxiv.org/pdf/1910.10367v2.pdf | |
PWC | https://paperswithcode.com/paper/unifying-variational-inference-and-pac-bayes |
Repo | https://github.com/sanjaythakur/Unifying-VI-and-PAC-Bayes-for-Learning-that-Scales |
Framework | tf |
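The ELBO/PAC-Bayes parallel can be made concrete in a few lines: with negative log-likelihood as the risk, minimizing E_q[NLL] + KL(q || p) / n is (up to scaling) both the negative ELBO and a PAC-Bayes bound surrogate. The sketch below uses a diagonal-Gaussian posterior over "weights" and a quadratic stand-in for the NLL; all names are illustrative, not the paper's API.

```python
import torch

def elbo_pac_bayes_objective(nll_fn, mu, log_sigma, prior_std=1.0,
                             n_data=1000, samples=8):
    """E_q[NLL] + KL(q || p) / n for a diagonal Gaussian q = N(mu, sigma^2)."""
    sigma = log_sigma.exp()
    # Monte Carlo estimate of expected risk under q (reparameterization trick)
    risk = torch.stack([nll_fn(mu + sigma * torch.randn_like(mu))
                        for _ in range(samples)]).mean()
    # closed-form KL between q and a zero-mean isotropic Gaussian prior
    kl = 0.5 * ((sigma**2 + mu**2) / prior_std**2 - 1.0 - 2.0 * log_sigma
                + 2.0 * torch.log(torch.tensor(prior_std))).sum()
    return risk + kl / n_data

# toy usage: the "weights" are a 2-vector, risk is a quadratic stand-in for NLL
mu = torch.zeros(2, requires_grad=True)
log_sigma = torch.zeros(2, requires_grad=True)
loss = elbo_pac_bayes_objective(lambda w: (w**2).sum(), mu, log_sigma)
loss.backward()
```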
Well-calibrated Model Uncertainty with Temperature Scaling for Dropout Variational Inference
Title | Well-calibrated Model Uncertainty with Temperature Scaling for Dropout Variational Inference |
Authors | Max-Heinrich Laves, Sontje Ihler, Karl-Philipp Kortmann, Tobias Ortmaier |
Abstract | Model uncertainty obtained by variational Bayesian inference with Monte Carlo dropout is prone to miscalibration: the uncertainty does not represent the model error well. In this paper, temperature scaling is extended to dropout variational inference to calibrate model uncertainty. Expected uncertainty calibration error (UCE) is presented as a metric to measure miscalibration of uncertainty. The effectiveness of this approach is evaluated on CIFAR-10/100 for recent CNN architectures. Experimental results show that temperature scaling considerably reduces miscalibration in terms of UCE and enables robust rejection of uncertain predictions. The proposed approach can easily be derived from frequentist temperature scaling and yields well-calibrated model uncertainty. It is simple to implement and does not affect the model accuracy. |
Tasks | Bayesian Inference, Calibration |
Published | 2019-09-30 |
URL | https://arxiv.org/abs/1909.13550v3 |
https://arxiv.org/pdf/1909.13550v3.pdf | |
PWC | https://paperswithcode.com/paper/well-calibrated-model-uncertainty-with |
Repo | https://github.com/mlaves/bayesian-temperature-scaling |
Framework | pytorch |
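The UCE metric is easy to sketch: bin predictions by their uncertainty and average the gap between each bin's mean uncertainty and its empirical error rate, weighted by bin size. (Temperature scaling itself just divides the logits by a learned T in each MC dropout pass.) The equal-width binning below is an assumption about details the abstract does not spell out.

```python
import numpy as np

def expected_uncertainty_calibration_error(uncertainty, errors, n_bins=10):
    """UCE sketch: sum over bins of |error rate - mean uncertainty|, weighted
    by the fraction of samples in each bin.

    uncertainty: (N,) values in [0, 1]; errors: (N,) 0/1 misclassification flags."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    uce, n = 0.0, len(uncertainty)
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (uncertainty > lo) & (uncertainty <= hi)
        if mask.any():
            uce += mask.sum() / n * abs(errors[mask].mean() - uncertainty[mask].mean())
    return uce

# toy usage: errors drawn so that uncertainty is well calibrated by construction
u = np.random.rand(1000)
err = (np.random.rand(1000) < u).astype(float)
print(expected_uncertainty_calibration_error(u, err))  # close to 0
```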
Particle Flow Bayes’ Rule
Title | Particle Flow Bayes’ Rule |
Authors | Xinshi Chen, Hanjun Dai, Le Song |
Abstract | We present a particle flow realization of Bayes’ rule, where an ODE-based neural operator is used to transport particles from a prior to its posterior after a new observation. We prove that such an ODE operator exists. Its neural parameterization can be trained in a meta-learning framework, allowing this operator to reason about the effect of an individual observation on the posterior, and thus generalize across different priors and observations and to sequential Bayesian inference. We demonstrate the generalization ability of our particle flow Bayes operator in several canonical and high-dimensional examples. |
Tasks | Bayesian Inference, Meta-Learning |
Published | 2019-02-02 |
URL | https://arxiv.org/abs/1902.00640v3 |
https://arxiv.org/pdf/1902.00640v3.pdf | |
PWC | https://paperswithcode.com/paper/meta-particle-flow-for-sequential-bayesian |
Repo | https://github.com/xinshi-chen/ParticleFlowBayesRule |
Framework | pytorch |
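The transport idea can be sketched with a learned velocity field integrated by Euler steps: a network predicts a velocity for each particle, conditioned on the new observation and the flow time, and integration moves prior particles toward the posterior. Architecture, conditioning, and the fixed-step Euler solver are illustrative assumptions.

```python
import torch

class FlowOperator(torch.nn.Module):
    """ODE-based Bayes operator sketch: v = net(particle, observation, time),
    integrated with Euler steps to transport prior samples."""
    def __init__(self, dim, obs_dim, hidden=64):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim + obs_dim + 1, hidden), torch.nn.Tanh(),
            torch.nn.Linear(hidden, dim))

    def forward(self, particles, obs, steps=10, dt=0.1):
        for k in range(steps):
            t = torch.full((particles.shape[0], 1), k * dt)
            inp = torch.cat([particles, obs.expand(particles.shape[0], -1), t], dim=1)
            particles = particles + dt * self.net(inp)   # one Euler step of the ODE
        return particles

flow = FlowOperator(dim=2, obs_dim=1)
prior_particles = torch.randn(128, 2)              # samples from the prior
posterior_particles = flow(prior_particles, obs=torch.tensor([[0.7]]))
print(posterior_particles.shape)                   # (128, 2)
```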
Learning Bayesian posteriors with neural networks for gravitational-wave inference
Title | Learning Bayesian posteriors with neural networks for gravitational-wave inference |
Authors | Alvin J. K. Chua, Michele Vallisneri |
Abstract | We seek to achieve the Holy Grail of Bayesian inference for gravitational-wave astronomy: using deep-learning techniques to instantly produce the posterior $p(\theta \mid D)$ for the source parameters $\theta$, given the detector data $D$. To do so, we train a deep neural network to take as input a signal + noise data set (drawn from the astrophysical source-parameter prior and the sampling distribution of detector noise), and to output a parametrized approximation of the corresponding posterior. We rely on a compact representation of the data based on reduced-order modeling, which we generate efficiently using a separate neural-network waveform interpolant [A. J. K. Chua, C. R. Galley & M. Vallisneri, Phys. Rev. Lett. 122, 211101 (2019)]. Our scheme has broad relevance to gravitational-wave applications such as low-latency parameter estimation and characterizing the science returns of future experiments. Source code and trained networks are available online at https://github.com/vallis/truebayes. |
Tasks | Bayesian Inference |
Published | 2019-09-12 |
URL | https://arxiv.org/abs/1909.05966v3 |
https://arxiv.org/pdf/1909.05966v3.pdf | |
PWC | https://paperswithcode.com/paper/learning-bayes-theorem-with-a-neural-network |
Repo | https://github.com/vallis/truebayes |
Framework | pytorch |
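A condensed sketch of the training idea: a network maps simulated detector data to the parameters of an approximate posterior (here a diagonal Gaussian over source parameters), trained by maximizing the likelihood of the true parameters that generated each data set. The sizes, the Gaussian family, and the toy "waveform" are assumptions; the paper uses a reduced-order data representation and richer posterior parameterizations.

```python
import torch

data_dim, theta_dim = 100, 2
net = torch.nn.Sequential(torch.nn.Linear(data_dim, 128), torch.nn.ReLU(),
                          torch.nn.Linear(128, 2 * theta_dim))

def posterior_nll(D, theta_true):
    """Negative log of a diagonal-Gaussian q(theta | D), constants dropped."""
    out = net(D)
    mu, log_var = out[:, :theta_dim], out[:, theta_dim:]
    return (0.5 * ((theta_true - mu)**2 / log_var.exp() + log_var)).sum(dim=1).mean()

theta = torch.randn(32, theta_dim)                 # source parameters from the prior
signal = theta.repeat(1, data_dim // theta_dim)    # stand-in for a waveform model
D = signal + 0.1 * torch.randn(32, data_dim)       # add detector noise
posterior_nll(D, theta).backward()                 # one training step's gradient
```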
Boundary Loss for Remote Sensing Imagery Semantic Segmentation
Title | Boundary Loss for Remote Sensing Imagery Semantic Segmentation |
Authors | Alexey Bokhovkin, Evgeny Burnaev |
Abstract | In response to the growing importance of geospatial data, its analysis, including semantic segmentation, has become an increasingly popular task in computer vision. Convolutional neural networks are powerful visual models that yield hierarchies of features, and practitioners widely use them to process remote sensing data. In remote sensing image segmentation, multiple instances of one class with precisely defined boundaries are common, and it is crucial to extract those boundaries accurately: the accuracy of segment boundary delineation directly influences the quality of the segmented areas. However, widely used segmentation loss functions such as BCE, IoU loss, or Dice loss do not penalize misalignment of boundaries sufficiently. In this paper, we propose a novel loss function, namely a differentiable surrogate of a metric accounting for the accuracy of boundary detection. The loss function can be used with any neural network for binary segmentation. We validated our loss function with various modifications of UNet on a synthetic dataset, as well as on real-world data (ISPRS Potsdam, INRIA AIL). Trained with the proposed loss function, models outperform baseline methods in terms of IoU score. |
Tasks | Boundary Detection, Semantic Segmentation |
Published | 2019-05-20 |
URL | https://arxiv.org/abs/1905.07852v1 |
https://arxiv.org/pdf/1905.07852v1.pdf | |
PWC | https://paperswithcode.com/paper/boundary-loss-for-remote-sensing-imagery |
Repo | https://github.com/yiskw713/boundary_loss_for_remote_sensing |
Framework | pytorch |
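A sketch of one common way to build such a differentiable boundary surrogate: extract soft boundaries with max-pooling, dilate them to tolerate small misalignments, and turn boundary precision/recall into a BF1-style score to minimize as 1 - BF1. Kernel sizes are illustrative hyperparameters, and this is a reading of the general recipe rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def boundary_loss(pred, gt, theta0=3, theta=5):
    """pred: (B, 1, H, W) sigmoid probabilities; gt: (B, 1, H, W) binary mask."""
    def soft_boundary(x, k):
        # boundary = dilated background minus background (edge of the foreground)
        return F.max_pool2d(1 - x, k, stride=1, padding=k // 2) - (1 - x)
    pred_b, gt_b = soft_boundary(pred, theta0), soft_boundary(gt, theta0)
    # dilated boundaries tolerate misalignments up to ~theta pixels
    pred_b_ext = F.max_pool2d(pred_b, theta, stride=1, padding=theta // 2)
    gt_b_ext = F.max_pool2d(gt_b, theta, stride=1, padding=theta // 2)
    eps = 1e-7
    precision = (pred_b * gt_b_ext).sum() / (pred_b.sum() + eps)
    recall = (gt_b * pred_b_ext).sum() / (gt_b.sum() + eps)
    bf1 = 2 * precision * recall / (precision + recall + eps)
    return 1 - bf1          # differentiable surrogate of the boundary F1 metric

pred = torch.rand(2, 1, 64, 64, requires_grad=True)
gt = (torch.rand(2, 1, 64, 64) > 0.5).float()
boundary_loss(pred, gt).backward()
```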
LGM-Net: Learning to Generate Matching Networks for Few-Shot Learning
Title | LGM-Net: Learning to Generate Matching Networks for Few-Shot Learning |
Authors | Huaiyu Li, Weiming Dong, Xing Mei, Chongyang Ma, Feiyue Huang, Bao-Gang Hu |
Abstract | In this work, we propose a novel meta-learning approach for few-shot classification, which learns transferable prior knowledge across tasks and directly produces network parameters for similar unseen tasks with training samples. Our approach, called LGM-Net, includes two key modules, namely, TargetNet and MetaNet. The TargetNet module is a neural network for solving a specific task and the MetaNet module aims at learning to generate functional weights for TargetNet by observing training samples. We also present an intertask normalization strategy for the training process to leverage common information shared across different tasks. The experimental results on Omniglot and miniImageNet datasets demonstrate that LGM-Net can effectively adapt to similar unseen tasks and achieve competitive performance, and the results on synthetic datasets show that transferable prior knowledge is learned by the MetaNet module via mapping training data to functional weights. LGM-Net enables fast learning and adaptation since no further tuning steps are required compared to other meta-learning approaches. |
Tasks | Few-Shot Learning, Meta-Learning, Omniglot |
Published | 2019-05-15 |
URL | https://arxiv.org/abs/1905.06331v1 |
https://arxiv.org/pdf/1905.06331v1.pdf | |
PWC | https://paperswithcode.com/paper/lgm-net-learning-to-generate-matching |
Repo | https://github.com/likesiwell/LGM-Net |
Framework | tf |
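The two-module split is the key mechanism: a MetaNet observes the task's support samples and emits weights that a TargetNet applies functionally, so no gradient steps are needed at test time. The sketch below uses a single generated linear classifier over precomputed features; pooling and layer sizes are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

class MetaNet(torch.nn.Module):
    """Observes support features and emits functional weights for TargetNet."""
    def __init__(self, feat_dim, n_way):
        super().__init__()
        self.n_way = n_way
        self.gen = torch.nn.Linear(feat_dim, n_way * feat_dim)

    def forward(self, support_feats):
        task_embedding = support_feats.mean(dim=0)   # summarize the task
        return self.gen(task_embedding).view(self.n_way, -1)

def target_net(query_feats, W):
    # TargetNet is applied *functionally* with the generated weights
    return F.linear(query_feats, W)

meta = MetaNet(feat_dim=64, n_way=5)
support = torch.randn(25, 64)          # 5-way 5-shot support features
W = meta(support)                      # generated classifier weights
logits = target_net(torch.randn(15, 64), W)
print(logits.shape)                    # (15, 5): no fine-tuning steps required
```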
Image Outpainting and Harmonization using Generative Adversarial Networks
Title | Image Outpainting and Harmonization using Generative Adversarial Networks |
Authors | Basile Van Hoorick |
Abstract | Although the inherently ambiguous task of predicting what resides beyond all four edges of an image has rarely been explored before, we demonstrate that GANs hold powerful potential in producing reasonable extrapolations. Two outpainting methods are proposed that aim to instigate this line of research: the first approach uses a context encoder inspired by common inpainting architectures and paradigms, while the second approach adds an extra post-processing step using a single-image generative model. This way, the hallucinated details are integrated with the style of the original image, in an attempt to further boost the quality of the result and possibly allow for arbitrary output resolutions to be supported. |
Tasks | Conditional Image Generation, Image Outpainting |
Published | 2019-12-23 |
URL | https://arxiv.org/abs/1912.10960v2 |
https://arxiv.org/pdf/1912.10960v2.pdf | |
PWC | https://paperswithcode.com/paper/image-outpainting-and-harmonization-using |
Repo | https://github.com/basilevh/image-outpainting |
Framework | pytorch |
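The outpainting training setup reduces to a masking scheme: the target is the full image, and the generator input is the same image with all four borders blanked out plus a mask channel marking the region to hallucinate. The border width below is an illustrative choice, not the paper's expansion factor.

```python
import torch

def make_outpainting_batch(img, border=16):
    """img: (B, C, H, W) tensor in [0, 1]; returns (generator input, target)."""
    mask = torch.ones_like(img[:, :1])   # 1 = known pixels, 0 = to hallucinate
    mask[:, :, :border, :] = 0           # top strip
    mask[:, :, -border:, :] = 0          # bottom strip
    mask[:, :, :, :border] = 0           # left strip
    mask[:, :, :, -border:] = 0          # right strip
    masked = img * mask                  # blank the four edge strips
    return torch.cat([masked, mask], dim=1), img

inp, target = make_outpainting_batch(torch.rand(4, 3, 128, 128))
print(inp.shape)   # (4, 4, 128, 128): RGB + mask channel for the generator
```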
RecurSIA-RRT: Recursive translatable point-set pattern discovery with removal of redundant translators
Title | RecurSIA-RRT: Recursive translatable point-set pattern discovery with removal of redundant translators |
Authors | David Meredith |
Abstract | We introduce two algorithms, RECURSIA and RRT, designed to increase the compression factor achievable using point-set cover algorithms based on the SIA and SIATEC pattern discovery algorithms. SIA computes the maximal translatable patterns (MTPs) in a point set, while SIATEC computes the translational equivalence class (TEC) of every MTP in a point set, where the TEC of an MTP is the set of translationally invariant occurrences of that MTP in the point set. In its output, SIATEC encodes each MTP TEC as a pair, <P,V>, where P is the first occurrence of the MTP and V is the set of non-zero vectors that map P onto its other occurrences. RECURSIA recursively applies a TEC cover algorithm to the pattern P, in each TEC, <P,V>, that it discovers. RRT attempts to remove translators from V in each TEC without reducing the total set of points covered by the TEC. When evaluated with COSIATEC, SIATECCompress and Forth’s algorithm on the JKU Patterns Development Database, using RECURSIA with or without RRT increased compression factor and recall but reduced precision. Using RRT alone increased compression factor and reduced recall and precision, but had a smaller effect than RECURSIA. |
Tasks | |
Published | 2019-06-28 |
URL | https://arxiv.org/abs/1906.12286v2 |
https://arxiv.org/pdf/1906.12286v2.pdf | |
PWC | https://paperswithcode.com/paper/recursia-rrt-recursive-translatable-point-set |
Repo | https://github.com/chromamorph/omnisia-recursia-rrt-mml-2019 |
Framework | none |
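The MTP computation at the core of SIA fits in a few lines of plain Python: for every ordered pair of points, record the translation vector between them, then group starting points by vector; each group is the maximal translatable pattern (MTP) for that vector. This is only the SIA step; RECURSIA's recursion over TEC patterns and RRT's translator removal are omitted.

```python
from collections import defaultdict

def sia(points):
    """Group points by the vectors that translate them onto other points."""
    points = sorted(points)
    mtps = defaultdict(list)
    for i, p in enumerate(points):
        for q in points[i + 1:]:                 # only lexicographically later points
            vector = (q[0] - p[0], q[1] - p[1])
            mtps[vector].append(p)               # p is translatable by this vector
    return dict(mtps)

# toy point set (onset, pitch): a three-note pattern and its repeat shifted by (4, 0)
pattern = [(0, 60), (1, 62), (2, 64)]
points = pattern + [(x + 4, y) for x, y in pattern]
for vec, mtp in sia(points).items():
    if len(mtp) >= 3:
        print(vec, mtp)   # (4, 0) maps the whole pattern onto its repetition
```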
Intent term selection and refinement in e-commerce queries
Title | Intent term selection and refinement in e-commerce queries |
Authors | Saurav Manchanda, Mohit Sharma, George Karypis |
Abstract | In e-commerce, a user tends to search for the desired product by issuing a query to the search engine and examining the retrieved results. If the search engine correctly understands the user’s query, it returns results corresponding to products whose attributes match the terms in the query that are representative of the query’s product intent. However, the search engine may fail to retrieve results that satisfy the query’s product intent and thus degrade the user experience, due to different issues in query processing: (i) when multiple terms are present in a query, it may fail to determine the relevant terms that are representative of the query’s product intent, and (ii) it may suffer from a vocabulary gap between the terms in the query and the product’s description, i.e., terms used in the query are semantically similar to but different from the terms in the product description. Hence, identifying the terms that describe the query’s product intent, and predicting additional terms that describe it better than the existing query terms, are essential tasks in e-commerce search. In this paper, we leverage the historical query reformulation logs of a major e-commerce retailer to develop distant-supervised approaches to both problems. Our approaches exploit the fact that the significance of a term depends on the context (other terms in its neighborhood) in which it is used, in order to learn the importance of the term towards the query’s product intent. We show that identifying and emphasizing the terms that define the query’s product intent leads to a 3% improvement in ranking. Moreover, for the tasks of identifying the important terms in a query and predicting the additional terms that represent product intent, experiments illustrate that our approaches outperform the non-contextual baselines. |
Tasks | |
Published | 2019-08-22 |
URL | https://arxiv.org/abs/1908.08564v1 |
https://arxiv.org/pdf/1908.08564v1.pdf | |
PWC | https://paperswithcode.com/paper/intent-term-selection-and-refinement-in-e |
Repo | https://github.com/gurdaspuriya/query_intent |
Framework | pytorch |
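A sketch of the distant-supervision signal from reformulation logs, under one plausible reading of the abstract: terms the user kept when reformulating are treated as representative of product intent, and terms the user added become targets for intent-term prediction. This labeling heuristic is an illustrative assumption, not the paper's exact procedure.

```python
def distant_labels(query, reformulation):
    """Derive per-term importance labels and prediction targets from one
    (query, reformulation) pair in the logs."""
    q, r = set(query.split()), set(reformulation.split())
    kept = [(t, 1 if t in r else 0) for t in query.split()]   # retained => important
    added = sorted(r - q)                                     # terms to predict
    return kept, added

kept, added = distant_labels("cheap red running shoes", "red nike running shoes")
print(kept)    # cheap -> 0 (dropped), red/running/shoes -> 1 (retained)
print(added)   # ['nike']
```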
Towards Learning Universal, Regional, and Local Hydrological Behaviors via Machine-Learning Applied to Large-Sample Datasets
Title | Towards Learning Universal, Regional, and Local Hydrological Behaviors via Machine-Learning Applied to Large-Sample Datasets |
Authors | Frederik Kratzert, Daniel Klotz, Guy Shalev, Günter Klambauer, Sepp Hochreiter, Grey Nearing |
Abstract | Regional rainfall-runoff modeling is an old but still mostly outstanding problem in the hydrological sciences. The problem is that traditional hydrological models degrade significantly in performance when calibrated for multiple basins together instead of for a single basin alone. In this paper, we propose a novel, data-driven approach using Long Short-Term Memory networks (LSTMs) and demonstrate that under a ‘big data’ paradigm, this is not necessarily the case. By training a single LSTM model on 531 basins from the CAMELS data set, using meteorological time series data and static catchment attributes, we were able to significantly improve performance compared to several different hydrological benchmark models. Our proposed approach not only significantly outperforms hydrological models that were calibrated regionally but also achieves better performance than hydrological models that were calibrated for each basin individually. Furthermore, we propose an adaptation of the standard LSTM architecture, which we call an Entity-Aware LSTM (EA-LSTM), that allows for learning catchment similarities and embedding them as a feature layer in a deep learning model. We show that this learned catchment similarity corresponds well with what we would expect from prior hydrological understanding. |
Tasks | Time Series |
Published | 2019-07-19 |
URL | https://arxiv.org/abs/1907.08456v2 |
https://arxiv.org/pdf/1907.08456v2.pdf | |
PWC | https://paperswithcode.com/paper/benchmarking-a-catchment-aware-long-short |
Repo | https://github.com/kratzert/ealstm_regional_modeling |
Framework | pytorch |
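The EA-LSTM modification is small and concrete: the input gate is computed once from the static catchment attributes (so it acts as a learned, entity-specific feature selector), while the remaining gates see the dynamic meteorological inputs. The cell below is a condensed sketch of that idea; layer sizes are illustrative.

```python
import torch

class EALSTMCell(torch.nn.Module):
    """Entity-Aware LSTM cell: input gate from static features only."""
    def __init__(self, dyn_dim, stat_dim, hidden):
        super().__init__()
        self.static_gate = torch.nn.Linear(stat_dim, hidden)           # input gate i
        self.dynamic = torch.nn.Linear(dyn_dim + hidden, 3 * hidden)   # gates f, o, g

    def forward(self, x_dyn, x_stat, h, c):
        i = torch.sigmoid(self.static_gate(x_stat))    # entity-aware input gate
        f, o, g = self.dynamic(torch.cat([x_dyn, h], dim=1)).chunk(3, dim=1)
        c = torch.sigmoid(f) * c + i * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

cell = EALSTMCell(dyn_dim=5, stat_dim=27, hidden=32)
h = c = torch.zeros(1, 32)
for t in range(10):   # unroll over a meteorological forcing series
    h, c = cell(torch.randn(1, 5), torch.randn(1, 27), h, c)
```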
Generative Models for Effective ML on Private, Decentralized Datasets
Title | Generative Models for Effective ML on Private, Decentralized Datasets |
Authors | Sean Augenstein, H. Brendan McMahan, Daniel Ramage, Swaroop Ramaswamy, Peter Kairouz, Mingqing Chen, Rajiv Mathews, Blaise Aguera y Arcas |
Abstract | To improve real-world applications of machine learning, experienced modelers develop intuition about their datasets, their models, and how the two interact. Manual inspection of raw data - of representative samples, of outliers, of misclassifications - is an essential tool in a) identifying and fixing problems in the data, b) generating new modeling hypotheses, and c) assigning or refining human-provided labels. However, manual data inspection is problematic for privacy sensitive datasets, such as those representing the behavior of real-world individuals. Furthermore, manual data inspection is impossible in the increasingly important setting of federated learning, where raw examples are stored at the edge and the modeler may only access aggregated outputs such as metrics or model parameters. This paper demonstrates that generative models - trained using federated methods and with formal differential privacy guarantees - can be used effectively to debug many commonly occurring data issues even when the data cannot be directly inspected. We explore these methods in applications to text with differentially private federated RNNs and to images using a novel algorithm for differentially private federated GANs. |
Tasks | |
Published | 2019-11-15 |
URL | https://arxiv.org/abs/1911.06679v2 |
https://arxiv.org/pdf/1911.06679v2.pdf | |
PWC | https://paperswithcode.com/paper/generative-models-for-effective-ml-on-private-1 |
Repo | https://github.com/tensorflow/gan |
Framework | tf |
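The "federated methods with formal differential privacy guarantees" part follows the DP federated averaging recipe: clip each client's model update to a norm bound, average, and add Gaussian noise calibrated to that bound. A minimal sketch of one round, with illustrative constants and a flat parameter vector standing in for real model weights:

```python
import torch

def dp_federated_round(global_params, client_updates, clip_norm=1.0, noise_mult=1.1):
    """One DP-FedAvg-style round over a list of per-client update vectors."""
    clipped = []
    for delta in client_updates:
        norm = delta.norm()
        clipped.append(delta * min(1.0, clip_norm / (norm + 1e-12)))  # clip to bound
    avg = torch.stack(clipped).mean(dim=0)
    # Gaussian noise scaled to the clipping bound and the number of clients
    noise = torch.randn_like(avg) * noise_mult * clip_norm / len(client_updates)
    return global_params + avg + noise

params = torch.zeros(10)
updates = [torch.randn(10) for _ in range(100)]   # one update per client
params = dp_federated_round(params, updates)
```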
Confident Learning: Estimating Uncertainty in Dataset Labels
Title | Confident Learning: Estimating Uncertainty in Dataset Labels |
Authors | Curtis G. Northcutt, Lu Jiang, Isaac L. Chuang |
Abstract | Learning exists in the context of data, yet notions of \emph{confidence} typically focus on model predictions, not label quality. Confident learning (CL) has emerged as an approach for characterizing, identifying, and learning with noisy labels in datasets, based on the principles of pruning noisy data, counting to estimate noise, and ranking examples to train with confidence. Here, we generalize CL, building on the assumption of a classification noise process, to directly estimate the joint distribution between noisy (given) labels and uncorrupted (unknown) labels. This generalized CL, open-sourced as \texttt{cleanlab}, is provably consistent across reasonable conditions, and experimentally performant on ImageNet and CIFAR, outperforming seven recent approaches when label noise is non-uniform. \texttt{cleanlab} also quantifies ontological class overlap, and can increase model accuracy (e.g. ResNet) by providing clean data for training. |
Tasks | |
Published | 2019-10-31 |
URL | https://arxiv.org/abs/1911.00068v2 |
https://arxiv.org/pdf/1911.00068v2.pdf | |
PWC | https://paperswithcode.com/paper/confident-learning-estimating-uncertainty-in |
Repo | https://github.com/cgnorthcutt/cleanlab |
Framework | pytorch |
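The counting step at the heart of confident learning can be sketched directly: an example is counted into entry (i, j) of the "confident joint" when its given label is i but its predicted probability for class j exceeds that class's average self-confidence threshold; the off-diagonal mass then estimates label errors. The normalization and calibration details from the paper (and from `cleanlab`) are omitted here.

```python
import numpy as np

def confident_joint(probs, noisy_labels):
    """probs: (N, C) out-of-sample predicted probabilities; noisy_labels: (N,)."""
    n, C = probs.shape
    # per-class threshold: mean predicted probability over examples labeled j
    thresholds = np.array([probs[noisy_labels == j, j].mean() for j in range(C)])
    cj = np.zeros((C, C))
    for x in range(n):
        above = [j for j in range(C) if probs[x, j] >= thresholds[j]]
        if above:
            j = max(above, key=lambda k: probs[x, k])   # most confident class
            cj[noisy_labels[x], j] += 1
    return cj   # off-diagonal entries estimate label errors

probs = np.random.dirichlet(np.ones(3), size=500)   # stand-in for model outputs
labels = probs.argmax(axis=1)                       # labels agree with predictions
print(confident_joint(probs, labels))               # mass is mostly diagonal
```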