Paper Group ANR 999
Increasing the Generalisation Capacity of Conditional VAEs. Unsupervised Question Answering by Cloze Translation. A Comparative Analysis of the Optimization and Generalization Property of Two-layer Neural Network and Random Feature Models Under Gradient Descent Dynamics. Cross-Modal Data Programming Enables Rapid Medical Machine Learning. Plain Eng …
Increasing the Generalisation Capacity of Conditional VAEs
Title | Increasing the Generalisation Capacity of Conditional VAEs |
Authors | Alexej Klushyn, Nutan Chen, Botond Cseke, Justin Bayer, Patrick van der Smagt |
Abstract | We address the problem of one-to-many mappings in supervised learning, where a single instance has many different solutions of possibly equal cost. The framework of conditional variational autoencoders describes a class of methods to tackle such structured-prediction tasks by means of latent variables. We propose to incentivise informative latent representations for increasing the generalisation capacity of conditional variational autoencoders. To this end, we modify the latent variable model by defining the likelihood as a function of the latent variable only and introduce an expressive multimodal prior to enable the model to capture semantically meaningful features of the data. To validate our approach, we train our model on the Cornell Robot Grasping dataset and on modified versions of MNIST and Fashion-MNIST, obtaining results that show a significantly higher generalisation capability. |
Tasks | Structured Prediction |
Published | 2019-08-23 |
URL | https://arxiv.org/abs/1908.08750v2 |
PDF | https://arxiv.org/pdf/1908.08750v2.pdf |
PWC | https://paperswithcode.com/paper/increasing-the-generalisaton-capacity-of |
Repo | |
Framework | |
Unsupervised Question Answering by Cloze Translation
Title | Unsupervised Question Answering by Cloze Translation |
Authors | Patrick Lewis, Ludovic Denoyer, Sebastian Riedel |
Abstract | Obtaining training data for Question Answering (QA) is time-consuming and resource-intensive, and existing QA datasets are only available for limited domains and languages. In this work, we explore to what extent high-quality training data is actually required for Extractive QA, and investigate the possibility of unsupervised Extractive QA. We approach this problem by first learning to generate context, question and answer triples in an unsupervised manner, which we then use to synthesize Extractive QA training data automatically. To generate such triples, we first sample random context paragraphs from a large corpus of documents and then random noun phrases or named entity mentions from these paragraphs as answers. Next, we convert answers in context to “fill-in-the-blank” cloze questions and finally translate them into natural questions. We propose and compare various unsupervised ways to perform cloze-to-natural question translation, including training an unsupervised NMT model using non-aligned corpora of natural questions and cloze questions as well as a rule-based approach. We find that modern QA models can learn to answer human questions surprisingly well using only synthetic training data. We demonstrate that, without using the SQuAD training data at all, our approach achieves 56.4 F1 on SQuAD v1 (64.5 F1 when the answer is a named entity mention), outperforming early supervised models. |
Tasks | Question Answering |
Published | 2019-06-12 |
URL | https://arxiv.org/abs/1906.04980v2 |
PDF | https://arxiv.org/pdf/1906.04980v2.pdf |
PWC | https://paperswithcode.com/paper/unsupervised-question-answering-by-cloze |
Repo | |
Framework | |
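The cloze step described in the abstract above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the `[MASK]` token, the answer-type labels and the wh-word table are all hypothetical, and the "translation" here is a naive in-place substitution rather than the unsupervised NMT model or the rule-based system the paper compares.

```python
# Hypothetical mapping from a coarse answer type to a wh-word.
WH_WORDS = {"PERSON": "who", "DATE": "when", "PLACE": "where"}


def make_cloze(sentence: str, answer: str, mask: str = "[MASK]") -> str:
    """Replace the first occurrence of the answer span with a mask token."""
    if answer not in sentence:
        raise ValueError("answer span not found in sentence")
    return sentence.replace(answer, mask, 1)


def cloze_to_question(cloze: str, answer_type: str, mask: str = "[MASK]") -> str:
    """Naive stand-in for cloze translation: swap the mask for a wh-word."""
    wh = WH_WORDS.get(answer_type, "what")
    return cloze.replace(mask, wh, 1).rstrip(".") + "?"


sentence = "The Eiffel Tower was completed in 1889."
cloze = make_cloze(sentence, "1889")
question = cloze_to_question(cloze, "DATE")
print(cloze)     # The Eiffel Tower was completed in [MASK].
print(question)  # The Eiffel Tower was completed in when?
```

The output question is deliberately unnatural; the gap between it and a fluent question is what a learned cloze-to-question translation is meant to close.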
A Comparative Analysis of the Optimization and Generalization Property of Two-layer Neural Network and Random Feature Models Under Gradient Descent Dynamics
Title | A Comparative Analysis of the Optimization and Generalization Property of Two-layer Neural Network and Random Feature Models Under Gradient Descent Dynamics |
Authors | Weinan E, Chao Ma, Lei Wu |
Abstract | A fairly comprehensive analysis is presented for the gradient descent dynamics for training two-layer neural network models in the situation when the parameters in both layers are updated. General initialization schemes as well as general regimes for the network width and training data size are considered. In the over-parametrized regime, it is shown that gradient descent dynamics can achieve zero training loss exponentially fast regardless of the quality of the labels. In addition, it is proved that throughout the training process the functions represented by the neural network model are uniformly close to those of a kernel method. For general values of the network width and training data size, sharp estimates of the generalization error are established for target functions in the appropriate reproducing kernel Hilbert space. |
Tasks | |
Published | 2019-04-08 |
URL | https://arxiv.org/abs/1904.04326v2 |
PDF | https://arxiv.org/pdf/1904.04326v2.pdf |
PWC | https://paperswithcode.com/paper/a-comparative-analysis-of-the-optimization |
Repo | |
Framework | |
Cross-Modal Data Programming Enables Rapid Medical Machine Learning
Title | Cross-Modal Data Programming Enables Rapid Medical Machine Learning |
Authors | Jared Dunnmon, Alexander Ratner, Nishith Khandwala, Khaled Saab, Matthew Markert, Hersh Sagreiya, Roger Goldman, Christopher Lee-Messer, Matthew Lungren, Daniel Rubin, Christopher Ré |
Abstract | Labeling training datasets has become a key barrier to building medical machine learning models. One strategy is to generate training labels programmatically, for example by applying natural language processing pipelines to text reports associated with imaging studies. We propose cross-modal data programming, which generalizes this intuitive strategy in a theoretically-grounded way that enables simpler, clinician-driven input, reduces required labeling time, and improves with additional unlabeled data. In this approach, clinicians generate training labels for models defined over a target modality (e.g. images or time series) by writing rules over an auxiliary modality (e.g. text reports). The resulting technical challenge consists of estimating the accuracies and correlations of these rules; we extend a recent unsupervised generative modeling technique to handle this cross-modal setting in a provably consistent way. Across four applications in radiography, computed tomography, and electroencephalography, and using only several hours of clinician time, our approach matches or exceeds the efficacy of physician-months of hand-labeling with statistical significance, demonstrating a fundamentally faster and more flexible way of building machine learning models in medicine. |
Tasks | Time Series |
Published | 2019-03-26 |
URL | http://arxiv.org/abs/1903.11101v1 |
PDF | http://arxiv.org/pdf/1903.11101v1.pdf |
PWC | https://paperswithcode.com/paper/cross-modal-data-programming-enables-rapid |
Repo | |
Framework | |
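The idea of writing rules over an auxiliary modality can be sketched as follows. This is only an illustration under stated assumptions: the three labeling functions and their keywords are invented for the example, and the votes are combined by a plain majority vote, which is a simpler substitute for the generative model that the abstract describes for estimating rule accuracies and correlations.

```python
from collections import Counter

ABSTAIN, NORMAL, ABNORMAL = -1, 0, 1


# Hypothetical labeling functions written over the text report (auxiliary modality).
def lf_fracture(report: str) -> int:
    return ABNORMAL if "fracture" in report.lower() else ABSTAIN


def lf_no_acute(report: str) -> int:
    return NORMAL if "no acute" in report.lower() else ABSTAIN


def lf_unremarkable(report: str) -> int:
    return NORMAL if "unremarkable" in report.lower() else ABSTAIN


LABELING_FUNCTIONS = [lf_fracture, lf_no_acute, lf_unremarkable]


def majority_label(report: str, lfs=LABELING_FUNCTIONS) -> int:
    """Combine non-abstaining votes; abstain on no votes or on a tie."""
    votes = [v for v in (lf(report) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN
    return counts[0][0]


# The resulting label would be attached to the paired image (target modality).
print(majority_label("No acute abnormality. Lungs are unremarkable."))  # 0
print(majority_label("No acute fracture."))                             # -1 (conflict)
```

The second report shows why a simple vote is fragile: two rules disagree, and without estimated rule accuracies there is no principled way to break the tie.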
Plain English Summarization of Contracts
Title | Plain English Summarization of Contracts |
Authors | Laura Manor, Junyi Jessy Li |
Abstract | Unilateral contracts, such as terms of service, play a substantial role in modern digital life. However, few users read these documents before accepting the terms within, as they are too long and the language too complicated. We propose the task of summarizing such legal documents in plain English, which would enable users to have a better understanding of the terms they are accepting. We propose an initial dataset of legal text snippets paired with summaries written in plain English. We verify the quality of these summaries manually and show that they involve heavy abstraction, compression, and simplification. Initial experiments show that unsupervised extractive summarization methods do not perform well on this task due to the level of abstraction and style differences. We conclude with a call for resource and technique development for simplification and style transfer for legal language. |
Tasks | Style Transfer |
Published | 2019-06-02 |
URL | https://arxiv.org/abs/1906.00424v1 |
PDF | https://arxiv.org/pdf/1906.00424v1.pdf |
PWC | https://paperswithcode.com/paper/190600424 |
Repo | |
Framework | |
Unsupervised cycle-consistent deformation for shape matching
Title | Unsupervised cycle-consistent deformation for shape matching |
Authors | Thibault Groueix, Matthew Fisher, Vladimir G. Kim, Bryan C. Russell, Mathieu Aubry |
Abstract | We propose a self-supervised approach to deep surface deformation. Given a pair of shapes, our algorithm directly predicts a parametric transformation from one shape to the other respecting correspondences. Our insight is to use cycle-consistency to define a notion of good correspondences in groups of objects and use it as a supervisory signal to train our network. Our method does not rely on a template, assume near isometric deformations or rely on point-correspondence supervision. We demonstrate the efficacy of our approach by using it to transfer segmentation across shapes. We show, on Shapenet, that our approach is competitive with comparable state-of-the-art methods when annotated training data is readily available, but outperforms them by a large margin in the few-shot segmentation scenario. |
Tasks | |
Published | 2019-07-06 |
URL | https://arxiv.org/abs/1907.03165v1 |
PDF | https://arxiv.org/pdf/1907.03165v1.pdf |
PWC | https://paperswithcode.com/paper/unsupervised-cycle-consistent-deformation-for |
Repo | |
Framework | |
TATi-Thermodynamic Analytics ToolkIt: TensorFlow-based software for posterior sampling in machine learning applications
Title | TATi-Thermodynamic Analytics ToolkIt: TensorFlow-based software for posterior sampling in machine learning applications |
Authors | Frederik Heber, Zofia Trstanova, Benedict Leimkuhler |
Abstract | With the advent of GPU-assisted hardware and maturing high-efficiency software platforms such as TensorFlow and PyTorch, Bayesian posterior sampling for neural networks becomes plausible. In this article we discuss Bayesian parametrization in machine learning based on Markov Chain Monte Carlo methods, specifically discretized stochastic differential equations such as Langevin dynamics and extended system methods in which an ensemble of walkers is employed to enhance sampling. We provide a glimpse of the potential of the sampling-intensive approach by studying (and visualizing) the loss landscape of a neural network applied to the MNIST data set. Moreover, we investigate how the sampling efficiency itself can be significantly enhanced through an ensemble quasi-Newton preconditioning method. This article accompanies the release of a new TensorFlow software package, the Thermodynamic Analytics ToolkIt, which is used in the computational experiments. |
Tasks | |
Published | 2019-03-20 |
URL | https://arxiv.org/abs/1903.08640v2 |
PDF | https://arxiv.org/pdf/1903.08640v2.pdf |
PWC | https://paperswithcode.com/paper/tati-thermodynamic-analytics-toolkit |
Repo | |
Framework | |
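The discretized Langevin dynamics mentioned in the abstract above can be sketched in a few lines. This does not use the TATi package or its API; it is a standard-library illustration of the unadjusted (Euler-Maruyama) Langevin update x ← x − h ∇U(x) + sqrt(2h) ξ, with a toy one-dimensional standard normal target chosen as an assumption so that the result is easy to check.

```python
import math
import random


def grad_neg_log_posterior(x: float) -> float:
    # Toy target (an assumption): standard normal, U(x) = x^2 / 2, so grad U = x.
    return x


def langevin_samples(n_steps: int, step_size: float, x0: float = 0.0, seed: int = 0):
    """Unadjusted Langevin dynamics: gradient step plus scaled Gaussian noise."""
    rng = random.Random(seed)
    noise_scale = math.sqrt(2.0 * step_size)
    x = x0
    samples = []
    for _ in range(n_steps):
        x = x - step_size * grad_neg_log_posterior(x) + noise_scale * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples


samples = langevin_samples(n_steps=20000, step_size=0.05)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"mean={mean:.3f} var={var:.3f}")  # expected to be near 0 and near 1
```

Because no Metropolis correction is applied, the chain samples a slightly biased distribution; for this target the stationary variance is 1 / (1 − h/2), which shrinks towards 1 as the step size h decreases.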
Sampling the “Inverse Set” of a Neuron: An Approach to Understanding Neural Nets
Title | Sampling the “Inverse Set” of a Neuron: An Approach to Understanding Neural Nets |
Authors | Suryabhan Singh Hada, Miguel Á. Carreira-Perpiñán |
Abstract | With the recent success of deep neural networks in computer vision, it is important to understand the internal workings of these networks. What does a given neuron represent? The concepts captured by a neuron may be hard to understand or express in simple terms. The approach we propose in this paper is to characterize the region of input space that excites a given neuron to a certain level; we call this the inverse set. This inverse set is a complicated high-dimensional object that we explore by an optimization-based sampling approach. Inspection of samples of this set by a human can reveal regularities that help to understand the neuron. This goes beyond earlier approaches, which were limited either to finding an image that maximally activates the neuron or to using Markov chain Monte Carlo to sample images; the latter is very slow, generates samples with little diversity and lacks control over the activation value of the generated samples. Our approach also allows us to explore the intersection of inverse sets of several neurons and other variations. |
Tasks | |
Published | 2019-09-27 |
URL | https://arxiv.org/abs/1910.04857v1 |
PDF | https://arxiv.org/pdf/1910.04857v1.pdf |
PWC | https://paperswithcode.com/paper/sampling-the-inverse-set-of-a-neuron-an |
Repo | |
Framework | |
A global constraint for the capacitated single-item lot-sizing problem
Title | A global constraint for the capacitated single-item lot-sizing problem |
Authors | Grigori German, Hadrien Cambazard, Jean-Philippe Gayon, Bernard Penz |
Abstract | The goal of this paper is to set up a constraint programming framework to solve lot-sizing problems. More specifically, we consider a single-item lot-sizing problem with time-varying lower and upper bounds for production and inventory. The cost structure includes time-varying holding costs, unitary production costs and setup costs. We establish a new lower bound for this problem by using a subtle time decomposition. We formulate this NP-hard problem as a global constraint and show that bound consistency can be achieved in pseudo-polynomial time and, when the costs are not included, in polynomial time. We develop filtering rules based on existing dynamic programming algorithms, exploiting the above-mentioned time decomposition for difficult instances. In a numerical study, we compare several formulations of the problem: mixed integer linear programming, constraint programming and dynamic programming. We show that our global constraint is able to find solutions, unlike the decomposed constraint programming model, and that constraint programming can be competitive, in particular when adding combinatorial side constraints. |
Tasks | |
Published | 2019-07-04 |
URL | https://arxiv.org/abs/1907.02405v1 |
PDF | https://arxiv.org/pdf/1907.02405v1.pdf |
PWC | https://paperswithcode.com/paper/a-global-constraint-for-the-capacitated |
Repo | |
Framework | |
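The pseudo-polynomial flavour of the problem can be seen in a textbook dynamic program over (period, inventory level). The sketch below is not the paper's global constraint or its filtering rules; it is a plain forward recursion with hypothetical parameter names, and it keeps only upper bounds on production and inventory, omitting the lower bounds that the abstract mentions.

```python
import math


def lot_sizing_dp(demand, cap, setup, unit, hold, max_inv):
    """Minimum total cost of meeting integer demand in every period.

    best[i] is the cheapest cost of ending the current period with inventory i.
    Running time is O(T * max_inv * max(cap)), i.e. pseudo-polynomial.
    """
    best = [math.inf] * (max_inv + 1)
    best[0] = 0  # start with empty inventory
    for t in range(len(demand)):
        nxt = [math.inf] * (max_inv + 1)
        for inv, cost in enumerate(best):
            if cost == math.inf:
                continue
            for x in range(cap[t] + 1):  # production quantity in period t
                new_inv = inv + x - demand[t]
                if new_inv < 0 or new_inv > max_inv:
                    continue
                c = cost + (setup[t] if x > 0 else 0) + unit[t] * x + hold[t] * new_inv
                if c < nxt[new_inv]:
                    nxt[new_inv] = c
        best = nxt
    return min(best)


# Two periods: producing everything in period 1 (setup 10 + 5 units + holding 3)
# beats paying the setup cost twice.
print(lot_sizing_dp([2, 3], [5, 5], [10, 10], [1, 1], [1, 1], max_inv=5))  # 18
```

An infeasible instance (demand exceeding capacity with no stock) returns `math.inf`, which a caller can treat as "no solution".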
Collaborative Unsupervised Domain Adaptation for Medical Image Diagnosis
Title | Collaborative Unsupervised Domain Adaptation for Medical Image Diagnosis |
Authors | Yifan Zhang, Ying Wei, Peilin Zhao, Shuaicheng Niu, Qingyao Wu, Mingkui Tan, Junzhou Huang |
Abstract | Deep learning based medical image diagnosis has shown great potential in clinical medicine. However, it often suffers from two major difficulties in practice: 1) only limited labeled samples are available due to the expensive annotation of medical images; 2) labeled images may contain considerable label noise (e.g., mislabeled samples) due to diagnostic difficulties. In this paper, we seek to exploit rich labeled data from relevant domains to help the learning in the target task with unsupervised domain adaptation (UDA). Unlike most existing UDA methods, which rely on clean labeled data or assume samples are equally transferable, we propose a novel Collaborative Unsupervised Domain Adaptation algorithm to conduct transferability-aware domain adaptation and overcome label noise in a cooperative way. Promising empirical results verify the superiority of the proposed method. |
Tasks | Domain Adaptation, Unsupervised Domain Adaptation |
Published | 2019-11-17 |
URL | https://arxiv.org/abs/1911.07293v1 |
PDF | https://arxiv.org/pdf/1911.07293v1.pdf |
PWC | https://paperswithcode.com/paper/collaborative-unsupervised-domain-adaptation |
Repo | |
Framework | |
Bayesian Anomaly Detection and Classification
Title | Bayesian Anomaly Detection and Classification |
Authors | Ethan Roberts, Bruce A. Bassett, Michelle Lochner |
Abstract | Statistical uncertainties are rarely incorporated in machine learning algorithms, especially for anomaly detection. Here we present the Bayesian Anomaly Detection And Classification (BADAC) formalism, which provides a unified statistical approach to classification and anomaly detection within a hierarchical Bayesian framework. BADAC deals with uncertainties by marginalising over the unknown, true, value of the data. Using simulated data with Gaussian noise, BADAC is shown to be superior to standard algorithms in both classification and anomaly detection performance in the presence of uncertainties, though with significantly increased computational cost. Additionally, BADAC provides well-calibrated classification probabilities, valuable for use in scientific pipelines. We show that BADAC can work in online mode and is fairly robust to model errors, which can be diagnosed through model-selection methods. In addition it can perform unsupervised new class detection and can naturally be extended to search for anomalous subsets of data. BADAC is therefore ideal where computational cost is not a limiting factor and statistical rigour is important. We discuss approximations to speed up BADAC, such as the use of Gaussian processes, and finally introduce a new metric, the Rank-Weighted Score (RWS), that is particularly suited to evaluating the ability of algorithms to detect anomalies. |
Tasks | Anomaly Detection, Gaussian Processes, Model Selection |
Published | 2019-02-22 |
URL | http://arxiv.org/abs/1902.08627v1 |
PDF | http://arxiv.org/pdf/1902.08627v1.pdf |
PWC | https://paperswithcode.com/paper/bayesian-anomaly-detection-and-classification |
Repo | |
Framework | |
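Marginalising over the unknown true value has a closed form in the simplest Gaussian case, which the sketch below uses. It is not the BADAC implementation: the two classes, their parameters and the function names are assumptions, and each class is reduced to a single scalar feature whose true value is Gaussian. Convolving that with Gaussian measurement noise gives a Gaussian whose variance is the sum of the two variances.

```python
import math


def gaussian_pdf(x: float, mean: float, var: float) -> float:
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def class_posteriors(obs, obs_sigma, classes, priors):
    """Posterior class probabilities for one noisy scalar observation.

    classes maps a name to (mean, std) of the unknown true value under that class.
    Integrating out the true value gives N(obs; mean, std^2 + obs_sigma^2).
    """
    evidence = {
        name: priors[name] * gaussian_pdf(obs, mu, tau ** 2 + obs_sigma ** 2)
        for name, (mu, tau) in classes.items()
    }
    total = sum(evidence.values())
    return {name: value / total for name, value in evidence.items()}


classes = {"A": (0.0, 1.0), "B": (3.0, 1.0)}  # hypothetical classes
priors = {"A": 0.5, "B": 0.5}

precise = class_posteriors(0.5, obs_sigma=0.5, classes=classes, priors=priors)
noisy = class_posteriors(0.5, obs_sigma=10.0, classes=classes, priors=priors)
print(precise)  # strongly favours "A"
print(noisy)    # close to the prior: the measurement is too uncertain to decide
```

The second call shows the behaviour the abstract emphasises: as the measurement uncertainty grows, the classification probabilities fall back towards the prior instead of staying overconfident.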
Hallucinating IDT Descriptors and I3D Optical Flow Features for Action Recognition with CNNs
Title | Hallucinating IDT Descriptors and I3D Optical Flow Features for Action Recognition with CNNs |
Authors | Lei Wang, Piotr Koniusz, Du Q. Huynh |
Abstract | In this paper, we revive the use of old-fashioned handcrafted video representations for action recognition and put new life into these techniques via a CNN-based hallucination step. Despite the use of RGB and optical flow frames, the I3D model (amongst others) thrives on combining its output with the Improved Dense Trajectory (IDT) representation, whose low-level video descriptors are encoded via Bag-of-Words (BoW) and Fisher Vectors (FV). Such a fusion of CNNs and handcrafted representations is time-consuming due to pre-processing, descriptor extraction, encoding and parameter tuning. Thus, we propose an end-to-end trainable network with streams which learn the IDT-based BoW/FV representations at the training stage and are simple to integrate with the I3D model. Specifically, each stream takes I3D feature maps ahead of the last 1D conv. layer and learns to ‘translate’ these maps to BoW/FV representations. Thus, our model can hallucinate and use such synthesized BoW/FV representations at the testing stage. We show that even features of the entire I3D optical flow stream can be hallucinated, thus simplifying the pipeline. Our model saves 20-55 h of computation and yields state-of-the-art results on four publicly available datasets. |
Tasks | Action Classification, Action Recognition In Videos, Optical Flow Estimation |
Published | 2019-06-13 |
URL | https://arxiv.org/abs/1906.05910v2 |
PDF | https://arxiv.org/pdf/1906.05910v2.pdf |
PWC | https://paperswithcode.com/paper/hallucinating-bag-of-words-and-fisher-vector |
Repo | |
Framework | |
Modular Deep Reinforcement Learning with Temporal Logic Specifications
Title | Modular Deep Reinforcement Learning with Temporal Logic Specifications |
Authors | Lim Zun Yuan, Mohammadhosein Hasanbeig, Alessandro Abate, Daniel Kroening |
Abstract | We propose an actor-critic, model-free, and online Reinforcement Learning (RL) framework for continuous-state continuous-action Markov Decision Processes (MDPs) when the reward is highly sparse but encompasses a high-level temporal structure. We represent this temporal structure by a finite-state machine and construct an on-the-fly synchronised product of the MDP and the finite-state machine. The temporal structure acts as a guide for the RL agent within the product, where a modular Deep Deterministic Policy Gradient (DDPG) architecture is proposed to generate a low-level control policy. We evaluate our framework in a Mars rover experiment and we present the success rate of the synthesised policy. |
Tasks | |
Published | 2019-09-23 |
URL | https://arxiv.org/abs/1909.11591v2 |
PDF | https://arxiv.org/pdf/1909.11591v2.pdf |
PWC | https://paperswithcode.com/paper/modular-deep-reinforcement-learning-with |
Repo | |
Framework | |
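The on-the-fly product of an MDP with a finite-state machine can be sketched on a toy problem. Everything here is an assumption made for illustration: the task ("reach region a, then region b"), the one-dimensional discrete environment, the labelling function and the reward of 1 on first acceptance. The paper itself works with continuous states and actions and a modular DDPG learner, neither of which appears below.

```python
# Hypothetical task automaton: state 0 --a--> state 1 --b--> state 2 (accepting).
TRANSITIONS = {(0, "a"): 1, (1, "b"): 2}
ACCEPTING = {2}


def automaton_step(q: int, label: str) -> int:
    """Advance the automaton; unlisted (state, label) pairs leave it unchanged."""
    return TRANSITIONS.get((q, label), q)


def env_step(position: int, action: int) -> int:
    """Toy environment: an agent moving left (-1) or right (+1) on a line."""
    return position + action


def labeller(position: int) -> str:
    """Map an environment state to the label read by the automaton."""
    return {2: "a", 0: "b"}.get(position, "")


def product_step(position: int, q: int, action: int):
    """One synchronised step: the pair (position, q) is the product state."""
    next_position = env_step(position, action)
    next_q = automaton_step(q, labeller(next_position))
    reward = 1.0 if next_q in ACCEPTING and q not in ACCEPTING else 0.0
    return next_position, next_q, reward


state = (1, 0)  # start at position 1 with the automaton in its initial state
for action in (+1, -1, -1):
    position, q, reward = product_step(state[0], state[1], action)
    state = (position, q)
    print(state, reward)
# (2, 1) 0.0
# (1, 1) 0.0
# (0, 2) 1.0
```

The product is built one step at a time, so the full product state space is never enumerated; the automaton component tells the learner which stage of the task it is in.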
Cultural association based on machine learning for team formation
Title | Cultural association based on machine learning for team formation |
Authors | Hrishikesh Kulkarni, Bradly Alicea |
Abstract | Culture is core to human civilization and is essential for human intellectual achievement in a social context. Culture also influences how humans work together, perform particular tasks, live their lives and deal with other groups. Thus, culture is concerned with establishing shared ideas, particularly those playing a key role in success. Does it affect how two individuals can work together to achieve certain goals? In this paper, we establish a means to derive cultural association and map it to culturally mediated success. Human interactions with the environment are typically in the form of expressions. Association between culture and behavior produces similar beliefs, which lead to common principles and actions, while cultural similarity appears as a set of common expressions and responses. To measure cultural association among different candidates, we propose the use of a Graphical Association Method (GAM). The behaviors of candidates are captured through a series of expressions and represented in graphical form. The association between corresponding nodes and core nodes is used for this measurement. Our approach provides a number of interesting results and promising avenues for future applications. |
Tasks | |
Published | 2019-08-01 |
URL | https://arxiv.org/abs/1908.00234v1 |
PDF | https://arxiv.org/pdf/1908.00234v1.pdf |
PWC | https://paperswithcode.com/paper/cultural-association-based-on-machine |
Repo | |
Framework | |
A CNN adapted to time series for the classification of Supernovae
Title | A CNN adapted to time series for the classification of Supernovae |
Authors | Anthony Brunel, Johanna Pasquet, Jérôme Pasquet, Nancy Rodriguez, Frédéric Comby, Dominique Fouchez, Marc Chaumont |
Abstract | Cosmologists face the problem of analysing a huge quantity of data when observing the sky. The methods used in cosmology mostly rely on astrophysical models; for classification, they usually use a two-step machine learning approach that consists of, first, extracting features and, second, applying a classifier. In this paper, we specifically study supernovae, and in particular the binary classification “Ia supernovae versus not-Ia supernovae”. We present two Convolutional Neural Networks (CNNs) that outperform the current state of the art. The first one is adapted to time series and thus to the treatment of supernova light curves. The second one is based on a Siamese CNN and is suited to the nature of the data, i.e. their sparsity and their small quantity (small learning database). |
Tasks | Time Series |
Published | 2019-01-02 |
URL | http://arxiv.org/abs/1901.00461v1 |
PDF | http://arxiv.org/pdf/1901.00461v1.pdf |
PWC | https://paperswithcode.com/paper/a-cnn-adapted-to-time-series-for-the |
Repo | |
Framework | |