Paper Group ANR 786
Actively Avoiding Nonsense in Generative Models. Resisting Adversarial Attacks using Gaussian Mixture Variational Autoencoders. Joint Learning of Motion Estimation and Segmentation for Cardiac MR Image Sequences. Inference in Graded Bayesian Networks. An interpretable multiple kernel learning approach for the discovery of integrative cancer subtype …
Actively Avoiding Nonsense in Generative Models
Title | Actively Avoiding Nonsense in Generative Models |
Authors | Steve Hanneke, Adam Kalai, Gautam Kamath, Christos Tzamos |
Abstract | A generative model may generate utter nonsense when it is fit to maximize the likelihood of observed data. This happens due to “model error,” i.e., when the true data-generating distribution does not fit within the class of generative models being learned. To address this, we propose a model of active distribution learning using a binary invalidity oracle that identifies some examples as clearly invalid, together with random positive examples sampled from the true distribution. The goal is to maximize the likelihood of the positive examples subject to the constraint of (almost) never generating examples labeled invalid by the oracle. Guarantees are agnostic, relative to a class of probability distributions. We show that, while proper learning often requires exponentially many queries to the invalidity oracle, improper distribution learning can be done using polynomially many queries. |
Tasks | |
Published | 2018-02-20 |
URL | http://arxiv.org/abs/1802.07229v1 |
http://arxiv.org/pdf/1802.07229v1.pdf | |
PWC | https://paperswithcode.com/paper/actively-avoiding-nonsense-in-generative |
Repo | |
Framework | |
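To make the query model concrete, here is a minimal sketch under toy assumptions: the hypothesis class is a small grid of Gaussians, and `invalidity_oracle` is a hypothetical stand-in for the paper's binary oracle. A candidate is kept only if it (empirically) almost never generates invalid examples, and among the survivors the one maximizing the likelihood of the positive samples wins. This illustrates the protocol, not the paper's actual algorithm or guarantees.

```python
import numpy as np

rng = np.random.default_rng(0)

def invalidity_oracle(x):
    """Hypothetical oracle: points outside [0, 10] are clearly invalid."""
    return x < 0 or x > 10

# Random positive examples drawn from the (unknown) true distribution.
positives = rng.normal(loc=3.0, scale=1.0, size=200)

# Hypothesis class: Gaussians on a fixed grid of means and scales.
candidates = [(mu, sigma) for mu in np.linspace(0, 6, 13) for sigma in (0.5, 1.0, 2.0)]

def log_likelihood(mu, sigma, xs):
    return np.sum(-0.5 * ((xs - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi)))

best, best_ll = None, -np.inf
for mu, sigma in candidates:
    draws = rng.normal(mu, sigma, size=500)
    # Query the oracle on samples the candidate would generate and discard
    # candidates that generate clearly invalid examples too often.
    invalid_rate = np.mean([invalidity_oracle(x) for x in draws])
    if invalid_rate > 0.01:
        continue
    ll = log_likelihood(mu, sigma, positives)
    if ll > best_ll:
        best, best_ll = (mu, sigma), ll

print("selected (mu, sigma):", best)
```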
Resisting Adversarial Attacks using Gaussian Mixture Variational Autoencoders
Title | Resisting Adversarial Attacks using Gaussian Mixture Variational Autoencoders |
Authors | Partha Ghosh, Arpan Losalka, Michael J Black |
Abstract | Susceptibility of deep neural networks to adversarial attacks poses a major theoretical and practical challenge. All efforts to harden classifiers against such attacks have seen limited success. Two distinct categories of samples to which deep networks are vulnerable, “adversarial samples” and “fooling samples”, have been tackled separately so far due to the difficulty posed when considered together. In this work, we show how one can address them both under one unified framework. We tie a discriminative model to a generative model so that the adversarial objective entails a conflict. Our model has the form of a variational autoencoder with a Gaussian mixture prior on the latent vector, where each mixture component corresponds to one of the classes in the data. This enables us to perform selective classification, leading to the rejection of adversarial samples instead of their misclassification. Our method inherently provides a way of learning a selective classifier in a semi-supervised scenario as well, which can resist adversarial attacks. We also show how one can reclassify the rejected adversarial samples. |
Tasks | |
Published | 2018-05-31 |
URL | http://arxiv.org/abs/1806.00081v2 |
http://arxiv.org/pdf/1806.00081v2.pdf | |
PWC | https://paperswithcode.com/paper/resisting-adversarial-attacks-using-gaussian |
Repo | |
Framework | |
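A minimal PyTorch sketch (not the authors' code) of the core mechanism: a VAE encoder whose latent prior is a mixture of per-class Gaussians, with classification by nearest mixture mean and rejection when that distance exceeds a threshold. Layer sizes, the threshold, and the fixed random class means are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GMVAE(nn.Module):
    def __init__(self, in_dim=784, latent_dim=8, n_classes=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.mu_head = nn.Linear(256, latent_dim)
        self.logvar_head = nn.Linear(256, latent_dim)
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, in_dim), nn.Sigmoid())
        # One prior mean per class, kept well separated in latent space.
        self.class_means = nn.Parameter(torch.randn(n_classes, latent_dim) * 3)

    def encode(self, x):
        h = self.encoder(x)
        return self.mu_head(h), self.logvar_head(h)

    def selective_classify(self, x, reject_threshold=4.0):
        mu, _ = self.encode(x)
        d = torch.cdist(mu, self.class_means)   # distance to each class mean
        dist, label = d.min(dim=1)
        label[dist > reject_threshold] = -1     # -1 means "reject the sample"
        return label

model = GMVAE()
x = torch.rand(5, 784)
print(model.selective_classify(x))   # untrained, so most inputs are rejected
```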
Joint Learning of Motion Estimation and Segmentation for Cardiac MR Image Sequences
Title | Joint Learning of Motion Estimation and Segmentation for Cardiac MR Image Sequences |
Authors | Chen Qin, Wenjia Bai, Jo Schlemper, Steffen E. Petersen, Stefan K. Piechnik, Stefan Neubauer, Daniel Rueckert |
Abstract | Cardiac motion estimation and segmentation play important roles in quantitatively assessing cardiac function and diagnosing cardiovascular diseases. In this paper, we propose a novel deep learning method for joint estimation of motion and segmentation from cardiac MR image sequences. The proposed network consists of two branches: a cardiac motion estimation branch built on a novel unsupervised Siamese-style recurrent spatial transformer network, and a cardiac segmentation branch based on a fully convolutional network. In particular, a joint multi-scale feature encoder is learned by optimizing the segmentation branch and the motion estimation branch simultaneously. This enables weakly-supervised segmentation by taking advantage of features learned without supervision in the motion estimation branch from a large amount of unannotated data. Experimental results using cardiac MR images from 220 subjects show that the joint learning of both tasks is complementary and that the proposed models significantly outperform the competing methods in terms of accuracy and speed. |
Tasks | Cardiac Segmentation, Motion Estimation |
Published | 2018-06-11 |
URL | http://arxiv.org/abs/1806.04066v1 |
http://arxiv.org/pdf/1806.04066v1.pdf | |
PWC | https://paperswithcode.com/paper/joint-learning-of-motion-estimation-and |
Repo | |
Framework | |
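A schematic PyTorch sketch of the joint design described above: a shared feature encoder feeding two branches, one predicting a dense motion field between a pair of frames and one predicting a segmentation map. Layer sizes and the two-head layout are illustrative, not the paper's exact Siamese recurrent spatial-transformer architecture.

```python
import torch
import torch.nn as nn

class JointNet(nn.Module):
    def __init__(self, n_classes=4):
        super().__init__()
        self.encoder = nn.Sequential(            # shared feature encoder
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
        self.motion_head = nn.Conv2d(128, 2, 3, padding=1)  # 2-channel flow (dx, dy)
        self.seg_head = nn.Conv2d(64, n_classes, 3, padding=1)

    def forward(self, frame_t, frame_t1):
        f_t, f_t1 = self.encoder(frame_t), self.encoder(frame_t1)
        flow = self.motion_head(torch.cat([f_t, f_t1], dim=1))  # motion branch sees both frames
        seg = self.seg_head(f_t)                                # segmentation branch
        return flow, seg

net = JointNet()
a, b = torch.rand(2, 1, 64, 64), torch.rand(2, 1, 64, 64)
flow, seg = net(a, b)
print(flow.shape, seg.shape)  # (2, 2, 64, 64) (2, 4, 64, 64)
```

Training both heads against one summed loss is what lets the unannotated frame pairs shape the features the segmentation branch reuses.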
Inference in Graded Bayesian Networks
Title | Inference in Graded Bayesian Networks |
Authors | Robert Leppert, Karl-Heinz Zimmermann |
Abstract | Machine learning provides algorithms that can learn from data and make inferences or predictions on data. Bayesian networks are a class of graphical models that represent a collection of random variables and their conditional dependencies by directed acyclic graphs. In this paper, an inference algorithm for the hidden random variables of a Bayesian network is given by using the tropicalization of the marginal distribution of the observed variables. By restricting the topological structure to graded networks, we establish an inference algorithm that evaluates the hidden random variables rank by rank and in this way yields the most probable states of the hidden variables. This algorithm can be viewed as a generalized version of the Viterbi algorithm for graded Bayesian networks. |
Tasks | |
Published | 2018-12-23 |
URL | http://arxiv.org/abs/1901.01837v1 |
http://arxiv.org/pdf/1901.01837v1.pdf | |
PWC | https://paperswithcode.com/paper/inference-in-graded-bayesian-networks |
Repo | |
Framework | |
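The tropicalization mentioned in the abstract replaces (sum, product) over probabilities with (max, +) over log-probabilities. As a concrete special case, the sketch below runs Viterbi on a chain-structured (HMM-like) network in this tropical semiring; the paper's algorithm generalizes the same rank-by-rank evaluation to graded DAGs. All probabilities here are illustrative.

```python
import numpy as np

log_init = np.log(np.array([0.6, 0.4]))            # P(h_0)
log_trans = np.log(np.array([[0.7, 0.3],           # P(h_t | h_{t-1})
                             [0.4, 0.6]]))
log_emit = np.log(np.array([[0.9, 0.1],            # P(obs | h)
                            [0.2, 0.8]]))
obs = [0, 1, 1, 0]

# Tropical "marginalization": max replaces sum, + replaces *.
score = log_init + log_emit[:, obs[0]]
back = []
for o in obs[1:]:
    cand = score[:, None] + log_trans    # cand[i, j]: come from state i, go to j
    back.append(cand.argmax(axis=0))     # best predecessor for each state j
    score = cand.max(axis=0) + log_emit[:, o]

# Backtrack the most probable hidden states.
state = int(score.argmax())
path = [state]
for ptr in reversed(back):
    state = int(ptr[state])
    path.append(state)
print("most probable hidden states:", path[::-1])
```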
An interpretable multiple kernel learning approach for the discovery of integrative cancer subtypes
Title | An interpretable multiple kernel learning approach for the discovery of integrative cancer subtypes |
Authors | Nora K. Speicher, Nico Pfeifer |
Abstract | Due to the complexity of cancer, clustering algorithms have been used to disentangle the observed heterogeneity and identify cancer subtypes that can be treated specifically. While kernel-based clustering approaches allow the use of more than one input matrix, which is an important factor when considering a multidimensional disease like cancer, the clustering results remain hard to evaluate and, in many cases, it is unclear which piece of information had which impact on the final result. In this paper, we propose an extension of multiple kernel learning clustering that enables the characterization of each identified patient cluster based on the features that had the highest impact on the result. To this end, we combine feature clustering with multiple kernel dimensionality reduction and introduce FIPPA, a score which measures the feature cluster impact on a patient cluster. Results: We applied the approach to different cancer types described by four different data types with the aim of identifying integrative patient subtypes and understanding which features were most important for their identification. Our results show that our method not only has state-of-the-art performance according to standard measures (e.g., survival analysis) but, based on the high-impact features, also produces meaningful explanations for the molecular bases of the subtypes. This could provide an important step in the validation of potential cancer subtypes and enable the formulation of new hypotheses concerning individual patient groups. Similar analyses are possible for other disease phenotypes. |
Tasks | Dimensionality Reduction, Discovery Of Integrative Cancer Subtypes, Survival Analysis |
Published | 2018-11-20 |
URL | http://arxiv.org/abs/1811.08102v1 |
http://arxiv.org/pdf/1811.08102v1.pdf | |
PWC | https://paperswithcode.com/paper/an-interpretable-multiple-kernel-learning |
Repo | |
Framework | |
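A sketch of the multiple-kernel clustering backbone such an approach builds on: several data views are turned into kernels, combined with (here, uniform) weights, embedded via kernel PCA, and clustered. The FIPPA score itself is the paper's contribution and is not reproduced; the random data, uniform weights, and parameters below are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
views = [rng.normal(size=(100, 20)) for _ in range(3)]  # e.g. expression, methylation, ...
kernels = [rbf_kernel(v) for v in views]
weights = np.ones(len(kernels)) / len(kernels)          # uniform here; learned in MKL
K = sum(w * k for w, k in zip(weights, kernels))        # combined kernel

# Kernel PCA: project onto top eigenvectors of the centered kernel matrix.
n = K.shape[0]
H = np.eye(n) - np.ones((n, n)) / n
eigvals, eigvecs = np.linalg.eigh(H @ K @ H)            # ascending eigenvalues
embedding = eigvecs[:, -5:] * np.sqrt(np.maximum(eigvals[-5:], 0))

labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(embedding)
print(np.bincount(labels))   # patient cluster sizes
```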
Recurrent Attention Unit
Title | Recurrent Attention Unit |
Authors | Guoqiang Zhong, Guohua Yue, Xiao Ling |
Abstract | Recurrent Neural Networks (RNNs) have been successfully applied to many sequence learning problems, such as handwriting recognition, image description, natural language processing, and video motion analysis. After years of development, researchers have improved the internal structure of the RNN and introduced many variants. Among these, the Gated Recurrent Unit (GRU) is one of the most widely used RNN models. However, the GRU lacks the capability of adaptively paying attention to certain regions or locations, which may cause information redundancy or loss during learning. In this paper, we propose an RNN model, called the Recurrent Attention Unit (RAU), which seamlessly integrates the attention mechanism into the interior of the GRU by adding an attention gate. The attention gate enhances the GRU’s ability to remember long-term dependencies and helps memory cells quickly discard unimportant content. RAU extracts information from sequential data by adaptively selecting a sequence of regions or locations and paying more attention to the selected regions during learning. Extensive experiments on image classification, sentiment classification and language modeling show that RAU consistently outperforms the GRU and other baseline methods. |
Tasks | Image Classification, Language Modelling, Sentiment Analysis |
Published | 2018-10-30 |
URL | http://arxiv.org/abs/1810.12754v1 |
http://arxiv.org/pdf/1810.12754v1.pdf | |
PWC | https://paperswithcode.com/paper/recurrent-attention-unit |
Repo | |
Framework | |
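A schematic PyTorch cell in the spirit of the description above: a standard GRU cell augmented with an extra sigmoid "attention gate", computed from the input and the previous hidden state, that rescales how much of the candidate update is kept. The exact RAU equations are in the paper; this is an illustrative variant.

```python
import torch
import torch.nn as nn

class AttentionGRUCell(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.gru = nn.GRUCell(input_size, hidden_size)
        self.attn_gate = nn.Linear(input_size + hidden_size, hidden_size)

    def forward(self, x, h):
        # Attention gate: element-wise weights in (0, 1) from input + state.
        a = torch.sigmoid(self.attn_gate(torch.cat([x, h], dim=1)))
        h_new = self.gru(x, h)
        # The gate decides how much of the GRU update to keep per dimension.
        return a * h_new + (1 - a) * h

cell = AttentionGRUCell(16, 32)
h = torch.zeros(4, 32)
for t in range(10):               # unroll over a toy sequence
    h = cell(torch.rand(4, 16), h)
print(h.shape)
```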
Iterative Classroom Teaching
Title | Iterative Classroom Teaching |
Authors | Teresa Yeo, Parameswaran Kamalaruban, Adish Singla, Arpit Merchant, Thibault Asselborn, Louis Faucon, Pierre Dillenbourg, Volkan Cevher |
Abstract | We consider the machine teaching problem in a classroom-like setting wherein the teacher has to deliver the same examples to a diverse group of students. Their diversity stems from differences in their initial internal states as well as their learning rates. We prove that a teacher with full knowledge about the learning dynamics of the students can teach a target concept to the entire classroom using O(min{d,N} log(1/eps)) examples, where d is the ambient dimension of the problem, N is the number of learners, and eps is the accuracy parameter. We show the robustness of our teaching strategy when the teacher has limited knowledge of the learners’ internal dynamics as provided by a noisy oracle. Further, we study the trade-off between the learners’ workload and the teacher’s cost in teaching the target concept. Our experiments validate our theoretical results and suggest that appropriately partitioning the classroom into homogeneous groups provides a balance between these two objectives. |
Tasks | |
Published | 2018-11-08 |
URL | http://arxiv.org/abs/1811.03537v2 |
http://arxiv.org/pdf/1811.03537v2.pdf | |
PWC | https://paperswithcode.com/paper/iterative-classroom-teaching |
Repo | |
Framework | |
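A toy sketch of classroom teaching for linear learners under the full-knowledge setting: the teacher tracks every learner's weight vector and learning rate and, each round, greedily picks the candidate example whose gradient step most reduces the classroom's average distance to the target concept. The pool, rates, and dimensions are illustrative, and greedy selection stands in for the paper's strategy.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_learners = 5, 8
w_star = rng.normal(size=d)                      # target concept
W = rng.normal(size=(n_learners, d))             # diverse initial states
rates = rng.uniform(0.05, 0.3, size=n_learners)  # diverse learning rates
pool = rng.normal(size=(200, d))                 # candidate teaching examples

def step(W, x, y):
    """Every learner takes its own squared-loss gradient step on (x, y)."""
    preds = W @ x
    return W - rates[:, None] * (preds - y)[:, None] * x

for _ in range(50):
    # Greedy teacher: simulate each candidate, keep the best one.
    best_x, best_err = None, np.inf
    for x in pool[rng.choice(len(pool), 20, replace=False)]:
        W_next = step(W, x, w_star @ x)          # label comes from the target concept
        err = np.mean(np.sum((W_next - w_star) ** 2, axis=1))
        if err < best_err:
            best_x, best_err = x, err
    W = step(W, best_x, w_star @ best_x)

print("avg distance to target:", np.mean(np.linalg.norm(W - w_star, axis=1)))
```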
Online ICA: Understanding Global Dynamics of Nonconvex Optimization via Diffusion Processes
Title | Online ICA: Understanding Global Dynamics of Nonconvex Optimization via Diffusion Processes |
Authors | Chris Junchi Li, Zhaoran Wang, Han Liu |
Abstract | Solving statistical learning problems often involves nonconvex optimization. Despite the empirical success of nonconvex statistical optimization methods, their global dynamics, especially convergence to the desirable local minima, remain less well understood in theory. In this paper, we propose a new analytic paradigm based on diffusion processes to characterize the global dynamics of nonconvex statistical optimization. As a concrete example, we study stochastic gradient descent (SGD) for the tensor decomposition formulation of independent component analysis. In particular, we cast different phases of SGD into diffusion processes, i.e., solutions to stochastic differential equations. Initialized from an unstable equilibrium, the global dynamics of SGD transition through three consecutive phases: (i) an unstable Ornstein-Uhlenbeck process slowly departing from the initialization, (ii) the solution to an ordinary differential equation, which quickly evolves towards the desirable local minimum, and (iii) a stable Ornstein-Uhlenbeck process oscillating around the desirable local minimum. Our proof techniques are based upon Stroock and Varadhan’s weak convergence of Markov chains to diffusion processes, which are of independent interest. |
Tasks | |
Published | 2018-08-29 |
URL | http://arxiv.org/abs/1808.09642v1 |
http://arxiv.org/pdf/1808.09642v1.pdf | |
PWC | https://paperswithcode.com/paper/online-ica-understanding-global-dynamics-of |
Repo | |
Framework | |
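An illustrative version of the kind of nonconvex iteration the paper analyzes: online SGD maximizing the fourth moment of a projection over the unit sphere, which recovers one independent component for heavy-tailed sources. The mixing setup and step size are toy assumptions, not the paper's exact tensor-decomposition formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
A = np.linalg.qr(rng.normal(size=(d, d)))[0]   # orthogonal mixing matrix

def sample_x():
    s = rng.laplace(size=d)                    # heavy-tailed independent sources
    return A @ s

u = rng.normal(size=d)
u /= np.linalg.norm(u)                         # random start, near an unstable equilibrium
eta = 1e-3
for t in range(100_000):
    x = sample_x()
    g = 4 * (u @ x) ** 3 * x                   # stochastic gradient of E[(u^T x)^4]
    u = u + eta * g
    u /= np.linalg.norm(u)                     # project back to the unit sphere

# u should align with one column of A, i.e., one independent component.
print(np.round(np.abs(A.T @ u), 2))
```

The three phases described in the abstract are visible empirically: slow escape from the random start, rapid deterministic-looking progress, then noisy oscillation around the recovered component.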
Trading algorithms with learning in latent alpha models
Title | Trading algorithms with learning in latent alpha models |
Authors | Philippe Casgrain, Sebastian Jaimungal |
Abstract | Alpha signals for statistical arbitrage strategies are often driven by latent factors. This paper analyses how to optimally trade with latent factors that cause prices to jump and diffuse. Moreover, we account for the effect of the trader’s actions on quoted prices and the prices they receive from trading. Under fairly general assumptions, we demonstrate how the trader can learn the posterior distribution over the latent states, and explicitly solve the latent optimal trading problem. We provide a verification theorem, and a methodology for calibrating the model by deriving a variation of the expectation-maximization algorithm. To illustrate the efficacy of the optimal strategy, we demonstrate its performance through simulations and compare it to strategies which ignore learning in the latent factors. We also provide calibration results for a particular model using Intel Corporation stock as an example. |
Tasks | Calibration |
Published | 2018-06-12 |
URL | http://arxiv.org/abs/1806.04472v1 |
http://arxiv.org/pdf/1806.04472v1.pdf | |
PWC | https://paperswithcode.com/paper/trading-algorithms-with-learning-in-latent |
Repo | |
Framework | |
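A minimal sketch of the learning component: a Bayes filter over a discrete latent alpha regime, updated from observed price innovations, with a crude position signal that leans on the posterior-weighted drift. This illustrates filtering the latent states, not the paper's optimal trading strategy; all dynamics below are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
drifts = np.array([-0.05, 0.0, 0.05])   # per-regime price drift
P = np.array([[0.95, 0.04, 0.01],       # regime transition matrix
              [0.02, 0.96, 0.02],
              [0.01, 0.04, 0.95]])
sigma = 0.1
belief = np.ones(3) / 3                 # prior over latent regimes

true_state = 2
for t in range(200):
    true_state = rng.choice(3, p=P[true_state])
    dS = drifts[true_state] + sigma * rng.normal()   # observed price change
    # Filtering: predict with P, then correct with the Gaussian likelihood of dS.
    belief = belief @ P
    belief *= np.exp(-0.5 * ((dS - drifts) / sigma) ** 2)
    belief /= belief.sum()

position = belief @ drifts              # lean with the posterior-expected alpha
print("posterior:", np.round(belief, 3), "position signal:", round(position, 4))
```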
A Dynamical Systems Perspective on Nonsmooth Constrained Optimization
Title | A Dynamical Systems Perspective on Nonsmooth Constrained Optimization |
Authors | Guilherme França, Daniel P. Robinson, René Vidal |
Abstract | The acceleration technique introduced by Nesterov for gradient descent is widely used in machine learning, but its principles are not yet fully understood. Recently, significant progress has been made to close this understanding gap through a continuous-time dynamical systems perspective associated with gradient methods for smooth and unconstrained problems. Here we extend this perspective to nonsmooth and linearly constrained problems by deriving nonsmooth dynamical systems related to variants of the relaxed and accelerated alternating direction method of multipliers (ADMM). We introduce two new ADMM variants, one based on Nesterov’s acceleration and the other inspired by Polyak’s heavy ball method, and derive differential inclusions modelling these algorithms in the continuous-time limit. Using a nonsmooth Lyapunov analysis, we obtain rate-of-convergence results for these dynamical systems in the convex and strongly convex setting that illustrate an interesting tradeoff between Nesterov and heavy ball acceleration. |
Tasks | |
Published | 2018-08-13 |
URL | http://arxiv.org/abs/1808.04048v2 |
http://arxiv.org/pdf/1808.04048v2.pdf | |
PWC | https://paperswithcode.com/paper/a-dynamical-systems-perspective-on-nonsmooth |
Repo | |
Framework | |
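To make the discrete-time side of this discussion concrete, here is a runnable toy of accelerated ADMM on the lasso: standard ADMM updates plus a Nesterov-style momentum step on the auxiliary and dual variables, in the style of "fast ADMM" schemes. The paper's two variants and their differential inclusions are more general; the problem data here are random.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, lam, rho = 30, 50, 0.1, 1.0
A, b = rng.normal(size=(m, n)), rng.normal(size=m)
L = np.linalg.cholesky(A.T @ A + rho * np.eye(n))   # factor once, reuse each iteration

def soft(v, t):
    """Soft-thresholding: the proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0)

x = z = u = z_hat = u_hat = np.zeros(n)
a = 1.0
for k in range(300):
    rhs = A.T @ b + rho * (z_hat - u_hat)
    x = np.linalg.solve(L.T, np.linalg.solve(L, rhs))  # x-minimization
    z_new = soft(x + u_hat, lam / rho)                 # z-minimization
    u_new = u_hat + x - z_new                          # dual update
    a_new = (1 + np.sqrt(1 + 4 * a * a)) / 2           # Nesterov momentum schedule
    z_hat = z_new + (a - 1) / a_new * (z_new - z)      # momentum on z and u
    u_hat = u_new + (a - 1) / a_new * (u_new - u)
    z, u, a = z_new, u_new, a_new

print("nonzeros in solution:", int(np.sum(np.abs(z) > 1e-6)))
```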
Retrieve-and-Read: Multi-task Learning of Information Retrieval and Reading Comprehension
Title | Retrieve-and-Read: Multi-task Learning of Information Retrieval and Reading Comprehension |
Authors | Kyosuke Nishida, Itsumi Saito, Atsushi Otsuka, Hisako Asano, Junji Tomita |
Abstract | This study considers the task of machine reading at scale (MRS) wherein, given a question, a system first performs the information retrieval (IR) task of finding relevant passages in a knowledge source and then carries out the reading comprehension (RC) task of extracting an answer span from the passages. Previous MRS studies, in which the IR component was trained without considering answer spans, struggled to accurately find a small number of relevant passages from a large set of passages. In this paper, we propose a simple and effective approach that incorporates the IR and RC tasks by using supervised multi-task learning in order that the IR component can be trained by considering answer spans. Experimental results on the standard benchmark, answering SQuAD questions using the full Wikipedia as the knowledge source, showed that our model achieved state-of-the-art performance. Moreover, we thoroughly evaluated the individual contributions of our model components with our new Japanese dataset and SQuAD. The results showed significant improvements in the IR task and provided a new perspective on IR for RC: it is effective to teach which part of the passage answers the question rather than to give only a relevance score to the whole passage. |
Tasks | Information Retrieval, Multi-Task Learning, Reading Comprehension |
Published | 2018-08-31 |
URL | http://arxiv.org/abs/1808.10628v1 |
http://arxiv.org/pdf/1808.10628v1.pdf | |
PWC | https://paperswithcode.com/paper/retrieve-and-read-multi-task-learning-of |
Repo | |
Framework | |
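A schematic PyTorch sketch of the multi-task idea: a shared passage encoder feeding (i) a relevance head for retrieval and (ii) start/end span heads for reading comprehension, trained with a summed loss so the IR component is also supervised by answer spans. The encoder, sizes, and losses are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

class RetrieveAndRead(nn.Module):
    def __init__(self, vocab=10000, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True, bidirectional=True)
        self.relevance = nn.Linear(2 * dim, 1)   # IR head (pooled over tokens)
        self.span = nn.Linear(2 * dim, 2)        # RC head: start/end logits per token

    def forward(self, tokens):
        h, _ = self.encoder(self.embed(tokens))          # (B, T, 2*dim)
        rel = self.relevance(h.mean(dim=1)).squeeze(-1)  # (B,) passage relevance
        start, end = self.span(h).unbind(dim=-1)         # (B, T) each
        return rel, start, end

model = RetrieveAndRead()
tokens = torch.randint(0, 10000, (4, 50))
rel, start, end = model(tokens)
labels_rel = torch.tensor([1., 0., 0., 1.])
gold_start, gold_end = torch.tensor([3, 0, 0, 7]), torch.tensor([5, 0, 0, 9])
loss = (nn.functional.binary_cross_entropy_with_logits(rel, labels_rel)
        + nn.functional.cross_entropy(start, gold_start)
        + nn.functional.cross_entropy(end, gold_end))    # joint multi-task loss
loss.backward()
print(float(loss))
```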
Active and Adaptive Sequential learning
Title | Active and Adaptive Sequential learning |
Authors | Yuheng Bu, Jiaxun Lu, Venugopal V. Veeravalli |
Abstract | A framework is introduced for actively and adaptively solving a sequence of machine learning problems that change in a bounded manner from one time step to the next. An algorithm is developed that actively queries the labels of the most informative samples from an unlabeled data pool, and that adapts to the change by utilizing the information acquired in the previous steps. Our analysis shows that the proposed active learning algorithm based on stochastic gradient descent achieves a near-optimal excess risk performance for maximum likelihood estimation. Furthermore, an estimator of the change in the learning problems is constructed using the active learning samples, which provides an adaptive sample size selection rule that guarantees the excess risk is bounded for a sufficiently large number of time steps. Experiments with synthetic and real data are presented to validate our algorithm and theoretical results. |
Tasks | Active Learning |
Published | 2018-05-29 |
URL | http://arxiv.org/abs/1805.11710v1 |
http://arxiv.org/pdf/1805.11710v1.pdf | |
PWC | https://paperswithcode.com/paper/active-and-adaptive-sequential-learning |
Repo | |
Framework | |
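A toy sketch of the active component: at each step, query the label of the most uncertain sample in the pool and take an SGD step on the logistic loss. The paper's adaptive sample-size selection rule is not reproduced; the data, step size, and uncertainty criterion are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10
w_true = rng.normal(size=d)
pool = rng.normal(size=(2000, d))

def query_label(x):
    """Oracle: noisy label from a logistic model."""
    return 1 if rng.random() < 1 / (1 + np.exp(-x @ w_true)) else 0

w, eta, queried = np.zeros(d), 0.5, set()
for t in range(300):
    p = 1 / (1 + np.exp(-pool @ w))
    scores = np.abs(p - 0.5)            # low score = most uncertain sample
    scores[list(queried)] = np.inf      # never re-query the same point
    i = int(scores.argmin())
    queried.add(i)
    y = query_label(pool[i])
    w += eta * (y - p[i]) * pool[i]     # SGD step on the logistic loss

cos = w @ w_true / (np.linalg.norm(w) * np.linalg.norm(w_true))
print("alignment with true w:", round(cos, 3))
```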
A new multilayer optical film optimal method based on deep q-learning
Title | A new multilayer optical film optimal method based on deep q-learning |
Authors | Anqing Jiang, Osamu Yoshie, LiangYao Chen |
Abstract | Multi-layer optical films have important applications in optical communication, optical absorbers, optical filters, and beyond. Different algorithms for multi-layer optical film design have been developed, such as the simplex method, colony algorithms, and genetic algorithms, and these have greatly accelerated the design and manufacture of multi-layer films. However, traditional numerical algorithms converge to local optima, which means they cannot offer material researchers a globally optimal solution. In recent years, owing to the rapid development of artificial intelligence, optimizing optical film structures with AI algorithms has become possible. In this paper, we introduce a new optical film design algorithm based on deep Q-learning. This model can converge to the global optimum of the optical thin-film structure, which greatly improves the design efficiency of multi-layer films. |
Tasks | Q-Learning |
Published | 2018-12-07 |
URL | http://arxiv.org/abs/1812.02873v1 |
http://arxiv.org/pdf/1812.02873v1.pdf | |
PWC | https://paperswithcode.com/paper/a-new-multilayer-optical-film-optimal-method |
Repo | |
Framework | |
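A toy sketch of the idea: treat film design as sequential decisions (pick the next layer's thickness) and learn action values with Q-learning. Two simplifications are assumed here: the layer-wise merit function is a stand-in for a real transfer-matrix optical model, and a tabular learner replaces the paper's deep network.

```python
import numpy as np

rng = np.random.default_rng(0)
thicknesses = [50, 100, 150, 200]        # nm, the discrete action set
target = np.array([100, 150, 100, 200])  # stack the toy merit function prefers
n_layers, n_actions = len(target), len(thicknesses)

def reward(layer, a):
    """Stand-in merit function; a real design would score optical response."""
    return -((thicknesses[a] - target[layer]) ** 2) / 1e4

Q = np.zeros((n_layers, n_actions))      # Q[layer_index, action]
eps, alpha, gamma = 0.2, 0.1, 1.0
for episode in range(3000):
    for layer in range(n_layers):        # build the film one layer at a time
        # Epsilon-greedy action selection.
        a = rng.integers(n_actions) if rng.random() < eps else int(Q[layer].argmax())
        nxt = Q[layer + 1].max() if layer + 1 < n_layers else 0.0
        # Standard Q-learning update with bootstrapped next-layer value.
        Q[layer, a] += alpha * (reward(layer, a) + gamma * nxt - Q[layer, a])

print("learned design:", [thicknesses[int(Q[l].argmax())] for l in range(n_layers)])
```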
Characterisation of (Sub)sequential Rational Functions over a General Class of Monoids
Title | Characterisation of (Sub)sequential Rational Functions over a General Class of Monoids |
Authors | Stefan Gerdjikov |
Abstract | In this technical report we describe a general class of monoids for which (sub)sequential rational functions can be characterised in terms of a congruence relation in the flavour of the Myhill-Nerode relation. The class of monoids that we consider can be described in terms of natural algebraic axioms; it contains the free monoids, groups, and the tropical monoid, and is closed under Cartesian products. |
Tasks | |
Published | 2018-01-28 |
URL | http://arxiv.org/abs/1801.10063v1 |
http://arxiv.org/pdf/1801.10063v1.pdf | |
PWC | https://paperswithcode.com/paper/characterisation-of-subsequential-rational |
Repo | |
Framework | |
Projective Splitting with Forward Steps only Requires Continuity
Title | Projective Splitting with Forward Steps only Requires Continuity |
Authors | Patrick R. Johnstone, Jonathan Eckstein |
Abstract | A recent innovation in projective splitting algorithms for monotone operator inclusions has been the development of a procedure using two forward steps instead of the customary proximal steps for operators that are Lipschitz continuous. This paper shows that the Lipschitz assumption is unnecessary when the forward steps are performed in finite-dimensional spaces: a backtracking linesearch yields a convergent algorithm for operators that are merely continuous with full domain. |
Tasks | |
Published | 2018-09-17 |
URL | http://arxiv.org/abs/1809.07180v1 |
http://arxiv.org/pdf/1809.07180v1.pdf | |
PWC | https://paperswithcode.com/paper/projective-splitting-with-forward-steps-only |
Repo | |
Framework | |