Paper Group ANR 1082
Multimodal Sparse Bayesian Dictionary Learning
Title | Multimodal Sparse Bayesian Dictionary Learning |
Authors | Igor Fedorov, Bhaskar D. Rao |
Abstract | This paper addresses the problem of learning dictionaries for multimodal datasets, i.e. datasets collected from multiple data sources. We present an algorithm called multimodal sparse Bayesian dictionary learning (MSBDL). MSBDL leverages information from all available data modalities through a joint sparsity constraint. The underlying framework offers a considerable amount of flexibility to practitioners and addresses many of the shortcomings of existing multimodal dictionary learning approaches. In particular, the procedure includes the automatic tuning of hyperparameters and is unique in that it allows the dictionaries for each data modality to have different cardinality, a significant feature in cases when the dimensionality of data differs across modalities. MSBDL is scalable and can be used in supervised learning settings. Theoretical results relating to the convergence of MSBDL are presented and the numerical results provide evidence of the superior performance of MSBDL on synthetic and real datasets compared to existing methods. |
Tasks | Dictionary Learning |
Published | 2018-04-10 |
URL | https://arxiv.org/abs/1804.03740v3 |
https://arxiv.org/pdf/1804.03740v3.pdf | |
PWC | https://paperswithcode.com/paper/multimodal-sparse-bayesian-dictionary |
Repo | |
Framework | |
Understanding Batch Normalization
Title | Understanding Batch Normalization |
Authors | Johan Bjorck, Carla Gomes, Bart Selman, Kilian Q. Weinberger |
Abstract | Batch normalization (BN) is a technique to normalize activations in intermediate layers of deep neural networks. Its tendency to improve accuracy and speed up training has established BN as a favorite technique in deep learning. Yet, despite its enormous success, there remains little consensus on the exact reason and mechanism behind these improvements. In this paper we take a step towards a better understanding of BN, following an empirical approach. We conduct several experiments, and show that BN primarily enables training with larger learning rates, which is the cause for faster convergence and better generalization. For networks without BN we demonstrate how large gradient updates can result in diverging loss and activations growing uncontrollably with network depth, which limits possible learning rates. BN avoids this problem by constantly correcting activations to be zero-mean and of unit standard deviation, which enables larger gradient steps, yields faster convergence and may help bypass sharp local minima. We further show various ways in which gradients and activations of deep unnormalized networks are ill-behaved. We contrast our results against recent findings in random matrix theory, shedding new light on classical initialization schemes and their consequences. |
Tasks | |
Published | 2018-06-01 |
URL | http://arxiv.org/abs/1806.02375v4 |
http://arxiv.org/pdf/1806.02375v4.pdf | |
PWC | https://paperswithcode.com/paper/understanding-batch-normalization |
Repo | |
Framework | |
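The normalization step the abstract describes (correcting activations to zero mean and unit standard deviation over a batch) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `batch_normalize`, the `eps` default, and the toy activations are assumptions, and the learnable scale and shift that BN layers normally apply afterwards are omitted.

```python
import math


def batch_normalize(batch, eps=1e-5):
    """Normalize each feature column of `batch` to zero mean and unit variance.

    `batch` is a list of rows, one row per example. `eps` is an assumed
    small constant that guards against division by zero.
    """
    n = len(batch)
    num_features = len(batch[0])
    normalized = [[0.0] * num_features for _ in range(n)]
    for j in range(num_features):
        column = [row[j] for row in batch]
        mean = sum(column) / n
        # Biased (population) variance over the batch dimension.
        var = sum((x - mean) ** 2 for x in column) / n
        scale = math.sqrt(var + eps)
        for i in range(n):
            normalized[i][j] = (column[i] - mean) / scale
    return normalized


# Two features on very different scales end up on a comparable scale.
activations = [[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]]
normalized = batch_normalize(activations)
```

After this step both columns have mean close to zero and variance close to one, regardless of the scale of the incoming activations.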
Fast Single Image Rain Removal via a Deep Decomposition-Composition Network
Title | Fast Single Image Rain Removal via a Deep Decomposition-Composition Network |
Authors | Siyuan LI, Wenqi Ren, Jiawan Zhang, Jinke Yu, Xiaojie Guo |
Abstract | Rain effect in images typically is annoying for many multimedia and computer vision tasks. For removing rain effect from a single image, deep learning techniques have been attracting considerable attention. This paper designs a novel multi-task learning architecture in an end-to-end manner to reduce the mapping range from input to output and boost the performance. Concretely, a decomposition net is built to split rain images into clean background and rain layers. Different from previous architectures, our model consists of, besides a component representing the desired clean image, an extra component for the rain layer. During the training phase, we further employ a composition structure to reproduce the input by the separated clean image and rain information for improving the quality of decomposition. Experimental results on both synthetic and real images are conducted to reveal the high-quality recovery by our design, and show its superiority over other state-of-the-art methods. Furthermore, our design is also applicable to other layer decomposition tasks like dust removal. More importantly, our method only requires about 50ms, significantly faster than the competitors, to process a testing image in VGA resolution on a GTX 1080 GPU, making it attractive for practical use. |
Tasks | Rain Removal |
Published | 2018-04-08 |
URL | http://arxiv.org/abs/1804.02688v1 |
http://arxiv.org/pdf/1804.02688v1.pdf | |
PWC | https://paperswithcode.com/paper/fast-single-image-rain-removal-via-a-deep |
Repo | |
Framework | |
Multimodal Explanations: Justifying Decisions and Pointing to the Evidence
Title | Multimodal Explanations: Justifying Decisions and Pointing to the Evidence |
Authors | Dong Huk Park, Lisa Anne Hendricks, Zeynep Akata, Anna Rohrbach, Bernt Schiele, Trevor Darrell, Marcus Rohrbach |
Abstract | Deep models that are both effective and explainable are desirable in many settings; prior explainable models have been unimodal, offering either image-based visualization of attention weights or text-based generation of post-hoc justifications. We propose a multimodal approach to explanation, and argue that the two modalities provide complementary explanatory strengths. We collect two new datasets to define and evaluate this task, and propose a novel model which can provide joint textual rationale generation and attention visualization. Our datasets define visual and textual justifications of a classification decision for activity recognition tasks (ACT-X) and for visual question answering tasks (VQA-X). We quantitatively show that training with the textual explanations not only yields better textual justification models, but also better localizes the evidence that supports the decision. We also qualitatively show cases where visual explanation is more insightful than textual explanation, and vice versa, supporting our thesis that multimodal explanation models offer significant benefits over unimodal approaches. |
Tasks | Activity Recognition, Question Answering, Visual Question Answering |
Published | 2018-02-15 |
URL | http://arxiv.org/abs/1802.08129v1 |
http://arxiv.org/pdf/1802.08129v1.pdf | |
PWC | https://paperswithcode.com/paper/multimodal-explanations-justifying-decisions |
Repo | |
Framework | |
An Off-policy Policy Gradient Theorem Using Emphatic Weightings
Title | An Off-policy Policy Gradient Theorem Using Emphatic Weightings |
Authors | Ehsan Imani, Eric Graves, Martha White |
Abstract | Policy gradient methods are widely used for control in reinforcement learning, particularly for the continuous action setting. There have been a host of theoretically sound algorithms proposed for the on-policy setting, due to the existence of the policy gradient theorem which provides a simplified form for the gradient. In off-policy learning, however, where the behaviour policy is not necessarily attempting to learn and follow the optimal policy for the given task, the existence of such a theorem has been elusive. In this work, we solve this open problem by providing the first off-policy policy gradient theorem. The key to the derivation is the use of emphatic weightings. We develop a new actor-critic algorithm, called Actor Critic with Emphatic weightings (ACE), that approximates the simplified gradients provided by the theorem. We demonstrate in a simple counterexample that previous off-policy policy gradient methods, particularly OffPAC and DPG, converge to the wrong solution whereas ACE finds the optimal solution. |
Tasks | Policy Gradient Methods |
Published | 2018-11-22 |
URL | https://arxiv.org/abs/1811.09013v2 |
https://arxiv.org/pdf/1811.09013v2.pdf | |
PWC | https://paperswithcode.com/paper/an-off-policy-policy-gradient-theorem-using |
Repo | |
Framework | |
Image Super-Resolution via RL-CSC: When Residual Learning Meets Convolutional Sparse Coding
Title | Image Super-Resolution via RL-CSC: When Residual Learning Meets Convolutional Sparse Coding |
Authors | Menglei Zhang, Zhou Liu, Lei Yu |
Abstract | We propose a simple yet effective model for Single Image Super-Resolution (SISR), by combining the merits of Residual Learning and Convolutional Sparse Coding (RL-CSC). Our model is inspired by the Learned Iterative Shrinkage-Threshold Algorithm (LISTA). We extend LISTA to its convolutional version and build the main part of our model by strictly following the convolutional form, which improves the network’s interpretability. Specifically, the convolutional sparse codings of input feature maps are learned in a recursive manner, and high-frequency information can be recovered from these CSCs. More importantly, residual learning is applied to alleviate the training difficulty when the network goes deeper. Extensive experiments on benchmark datasets demonstrate the effectiveness of our method. RL-CSC (30 layers) outperforms several recent state-of-the-art methods, e.g., DRRN (52 layers) and MemNet (80 layers), in both accuracy and visual quality. Code and more results are available at https://github.com/axzml/RL-CSC. |
Tasks | Image Super-Resolution, Super-Resolution |
Published | 2018-12-31 |
URL | http://arxiv.org/abs/1812.11950v1 |
http://arxiv.org/pdf/1812.11950v1.pdf | |
PWC | https://paperswithcode.com/paper/image-super-resolution-via-rl-csc-when |
Repo | |
Framework | |
Adversarial Scene Editing: Automatic Object Removal from Weak Supervision
Title | Adversarial Scene Editing: Automatic Object Removal from Weak Supervision |
Authors | Rakshith Shetty, Mario Fritz, Bernt Schiele |
Abstract | While great progress has been made recently in automatic image manipulation, it has been limited to object-centric images like faces or structured scene datasets. In this work, we take a step towards general scene-level image editing by developing an automatic interaction-free object removal model. Our model learns to find and remove objects from general scene images using image-level labels and unpaired data in a generative adversarial network (GAN) framework. We achieve this with two key contributions: a two-stage editor architecture consisting of a mask generator and image in-painter that co-operate to remove objects, and a novel GAN-based prior for the mask generator that allows us to flexibly incorporate knowledge about object shapes. We experimentally show on two datasets that our method effectively removes a wide variety of objects using weak supervision only. |
Tasks | |
Published | 2018-06-05 |
URL | http://arxiv.org/abs/1806.01911v1 |
http://arxiv.org/pdf/1806.01911v1.pdf | |
PWC | https://paperswithcode.com/paper/adversarial-scene-editing-automatic-object |
Repo | |
Framework | |
Policy Optimization with Model-based Explorations
Title | Policy Optimization with Model-based Explorations |
Authors | Feiyang Pan, Qingpeng Cai, An-Xiang Zeng, Chun-Xiang Pan, Qing Da, Hualin He, Qing He, Pingzhong Tang |
Abstract | Model-free reinforcement learning methods such as the Proximal Policy Optimization algorithm (PPO) have been successfully applied in complex decision-making problems such as Atari games. However, these methods suffer from high variance and high sample complexity. On the other hand, model-based reinforcement learning methods that learn the transition dynamics are more sample efficient, but they often suffer from the bias of the transition estimation. How to make use of both model-based and model-free learning is a central problem in reinforcement learning. In this paper, we present a new technique to address the trade-off between exploration and exploitation, which regards the difference between model-free and model-based estimations as a measure of exploration value. We apply this new technique to the PPO algorithm and arrive at a new policy optimization method, named Policy Optimization with Model-based Explorations (POME). POME uses two components to predict the actions’ target values: a model-free one estimated by Monte-Carlo sampling and a model-based one which learns a transition model and predicts the value of the next state. POME adds the error between these two target estimations as the additional exploration value for each state-action pair, i.e., it encourages the algorithm to explore the states with larger target errors, which are hard to estimate. We compare POME with PPO on Atari 2600 games, and the results show that POME outperforms PPO on 33 out of 49 games. |
Tasks | Atari Games, Decision Making |
Published | 2018-11-18 |
URL | http://arxiv.org/abs/1811.07350v1 |
http://arxiv.org/pdf/1811.07350v1.pdf | |
PWC | https://paperswithcode.com/paper/policy-optimization-with-model-based |
Repo | |
Framework | |
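The exploration value described in the abstract, the discrepancy between a model-free (Monte-Carlo) target and a model-based target, can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function names, the absolute-difference form of the error, and the weight `alpha` are hypothetical, and the paper's actual formulation may differ (for example in how the bonus is normalized or combined with the PPO objective).

```python
def discounted_return(rewards, gamma):
    """Monte-Carlo estimate: discounted sum of the observed rewards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def exploration_target(rewards, predicted_reward, predicted_next_value,
                       gamma=0.99, alpha=0.1):
    """Return (target, bonus) for one state-action pair.

    `rewards` are the rewards observed after taking the action.
    `predicted_reward` and `predicted_next_value` stand in for the output
    of a learned transition model and value function (hypothetical inputs).
    `alpha` is an assumed weight on the exploration bonus.
    """
    model_free = discounted_return(rewards, gamma)
    model_based = predicted_reward + gamma * predicted_next_value
    # A large disagreement marks a state-action pair that is hard to estimate.
    bonus = abs(model_free - model_based)
    return model_free + alpha * bonus, bonus


target, bonus = exploration_target([1.0, 1.0], predicted_reward=1.0,
                                   predicted_next_value=2.0, gamma=0.5)
```

In this toy call the two estimates disagree by 0.5, so the target is raised slightly above the plain Monte-Carlo return.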
Representation Learning and Recovery in the ReLU Model
Title | Representation Learning and Recovery in the ReLU Model |
Authors | Arya Mazumdar, Ankit Singh Rawat |
Abstract | Rectified linear units, or ReLUs, have become the preferred activation function for artificial neural networks. In this paper we consider two basic learning problems assuming that the underlying data follow a generative model based on a ReLU-network – a neural network with ReLU activations. As a primarily theoretical study, we limit ourselves to a single-layer network. The first problem we study corresponds to dictionary-learning in the presence of nonlinearity (modeled by the ReLU functions). Given a set of observation vectors $\mathbf{y}^i \in \mathbb{R}^d, i =1, 2, \dots , n$, we aim to recover the $d\times k$ matrix $A$ and the latent vectors $\{\mathbf{c}^i\} \subset \mathbb{R}^k$ under the model $\mathbf{y}^i = \mathrm{ReLU}(A\mathbf{c}^i +\mathbf{b})$, where $\mathbf{b}\in \mathbb{R}^d$ is a random bias. We show that it is possible to recover the column space of $A$ within an error of $O(d)$ (in Frobenius norm) under certain conditions on the probability distribution of $\mathbf{b}$. The second problem we consider is that of robust recovery of the signal in the presence of outliers, i.e., large but sparse noise. In this setting we are interested in recovering the latent vector $\mathbf{c}$ from its noisy nonlinear sketches of the form $\mathbf{v} = \mathrm{ReLU}(A\mathbf{c}) + \mathbf{e}+\mathbf{w}$, where $\mathbf{e} \in \mathbb{R}^d$ denotes the outliers with sparsity $s$ and $\mathbf{w} \in \mathbb{R}^d$ denotes the dense but small noise. This line of work has recently been studied (Soltanolkotabi, 2017) without the presence of outliers. For this problem, we show that a generalized LASSO algorithm is able to recover the signal $\mathbf{c} \in \mathbb{R}^k$ within an $\ell_2$ error of $O(\sqrt{\frac{(k+s)\log d}{d}})$ when $A$ is a random Gaussian matrix. |
Tasks | Dictionary Learning, Representation Learning |
Published | 2018-03-12 |
URL | http://arxiv.org/abs/1803.04304v1 |
http://arxiv.org/pdf/1803.04304v1.pdf | |
PWC | https://paperswithcode.com/paper/representation-learning-and-recovery-in-the |
Repo | |
Framework | |
A Graph Transduction Game for Multi-target Tracking
Title | A Graph Transduction Game for Multi-target Tracking |
Authors | Tewodros Mulugeta Dagnew, Dalia Coppi, Marcello Pelillo, Rita Cucchiara |
Abstract | Semi-supervised learning is a popular class of techniques to learn from labeled and unlabeled data. The paper applies a recently proposed graph transduction approach, which exploits game-theoretic notions, to the problem of multiple people tracking. Within the proposed framework, targets are considered as players of a multi-player non-cooperative game. The equilibrium of the game is considered as a consistent labeling solution and thus an estimation of the target association in the sequence of frames. Patches of persons are extracted from the video frames using a HOG-based detector and their similarity is modeled using distances among their covariance matrices. The solution we propose achieves satisfactory results on video surveillance datasets. The experiments show the robustness of the method even with a heavy imbalance between the number of labeled and unlabeled input patches. |
Tasks | Multiple People Tracking |
Published | 2018-06-12 |
URL | http://arxiv.org/abs/1806.07227v2 |
http://arxiv.org/pdf/1806.07227v2.pdf | |
PWC | https://paperswithcode.com/paper/a-graph-transduction-game-for-multi-target |
Repo | |
Framework | |
Learning Equations for Extrapolation and Control
Title | Learning Equations for Extrapolation and Control |
Authors | Subham S. Sahoo, Christoph H. Lampert, Georg Martius |
Abstract | We present an approach to identify concise equations from data using a shallow neural network approach. In contrast to ordinary black-box regression, this approach allows understanding functional relations and generalizing them from observed data to unseen parts of the parameter space. We show how to extend the class of learnable equations for a recently proposed equation learning network to include divisions, and we improve the learning and model selection strategy to be useful for challenging real-world data. For systems governed by analytical expressions, our method can in many cases identify the true underlying equation and extrapolate to unseen domains. We demonstrate its effectiveness by experiments on a cart-pendulum system, where only 2 random rollouts are required to learn the forward dynamics and successfully achieve the swing-up task. |
Tasks | Model Selection |
Published | 2018-06-19 |
URL | http://arxiv.org/abs/1806.07259v1 |
http://arxiv.org/pdf/1806.07259v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-equations-for-extrapolation-and |
Repo | |
Framework | |
“Read My Lips”: Using Automatic Text Analysis to Classify Politicians by Party and Ideology
Title | “Read My Lips”: Using Automatic Text Analysis to Classify Politicians by Party and Ideology |
Authors | Eitan Sapiro-Gheiler |
Abstract | The increasing digitization of political speech has opened the door to studying a new dimension of political behavior using text analysis. This work investigates the value of word-level statistical data from the US Congressional Record, which contains the full text of all speeches made in the US Congress, for studying the ideological positions and behavior of senators. Applying machine learning techniques, we use this data to automatically classify senators according to party, obtaining accuracy in the 70-95% range depending on the specific method used. We also show that using text to predict DW-NOMINATE scores, a common proxy for ideology, does not improve upon these already-successful results. This classification deteriorates when applied to text from sessions of Congress that are four or more years removed from the training set, pointing to a need on the part of voters to dynamically update the heuristics they use to evaluate party based on political speech. Text-based predictions are less accurate than those based on voting behavior, supporting the theory that roll-call votes represent greater commitment on the part of politicians and are thus a more accurate reflection of their ideological preferences. However, the overall success of the machine learning approaches studied here demonstrates that political speeches are highly predictive of partisan affiliation. In addition to these findings, this work also introduces the computational tools and methods relevant to the use of political speech data. |
Tasks | |
Published | 2018-09-03 |
URL | http://arxiv.org/abs/1809.00741v1 |
http://arxiv.org/pdf/1809.00741v1.pdf | |
PWC | https://paperswithcode.com/paper/read-my-lips-using-automatic-text-analysis-to |
Repo | |
Framework | |
Prediction of New Onset Diabetes after Liver Transplant
Title | Prediction of New Onset Diabetes after Liver Transplant |
Authors | Angeline Yasodhara, Mamatha Bhat, Anna Goldenberg |
Abstract | 25% of people who received a liver transplant will go on to develop diabetes within the next 5 years. These thousands of individuals are at 2-fold higher risk of cardiovascular events, graft loss, infections, as well as lower long-term survival. This is partly due to the medication used during and/or after transplant that significantly impacts metabolic balance. To assess which medication best suits the patient’s condition, clinicians need an accurate estimate of diabetes risk. Both the patient’s historical data and observations at the current visit are informative in predicting whether the patient will develop diabetes within the following year. In this work we compared a variety of time-to-event prediction models as well as classifiers predicting the likelihood of the event within a year from the current checkup. We are particularly interested in comparing two types of models: 1) standard time-to-event predictors where the historical measurements are merely concatenated, 2) models incorporating a Deep Markov Model to first obtain a low-dimensional embedding of historical data and then using this embedding as an additional input into the model. We compared a variety of algorithms including standard and regularized Cox proportional-hazards model (CPH), mixed effect random forests, survival-forests and Weibull Time-To-Event Recurrent Neural Network (WTTE-RNN). The results show that although all methods’ performances varied from year to year and there was no clear winner across all the time points, the regularized CPH model that used 1 to 3 years of historical visits data on average achieved a high, clinically relevant Concordance Index of .863. We thus recommend this model for further prospective clinical validation and hopefully, an eventual use in the clinic to improve clinicians’ ability to personalize post-operative care and reduce the incidence of new-onset diabetes post liver transplant. |
Tasks | Time-to-Event Prediction |
Published | 2018-12-03 |
URL | https://arxiv.org/abs/1812.00506v2 |
https://arxiv.org/pdf/1812.00506v2.pdf | |
PWC | https://paperswithcode.com/paper/prediction-of-new-onset-diabetes-after-liver |
Repo | |
Framework | |
Mean Field Network based Graph Refinement with application to Airway Tree Extraction
Title | Mean Field Network based Graph Refinement with application to Airway Tree Extraction |
Authors | Raghavendra Selvan, Max Welling, Jesper H. Pedersen, Jens Petersen, Marleen de Bruijne |
Abstract | We present tree extraction in 3D images as a graph refinement task of obtaining a subgraph from an over-complete input graph. To this end, we formulate an approximate Bayesian inference framework on undirected graphs using mean field approximation (MFA). Mean field networks are used for inference based on the interpretation that iterations of MFA can be seen as feed-forward operations in a neural network. This allows us to learn the model parameters from training data using the back-propagation algorithm. We demonstrate the usefulness of the model by extracting airway trees from 3D chest CT data. We first obtain probability images using a voxel classifier that distinguishes airways from background and use Bayesian smoothing to model individual airway branches. This yields joint Gaussian density estimates of position, orientation and scale as node features of the input graph. Performance of the method is compared with two methods: the first uses probability images from a trained voxel classifier with region growing, which is similar to one of the best performing methods at the EXACT’09 airway challenge, and the second method is based on Bayesian smoothing on these probability images. Using centerline distance as the error measure, the presented method shows significant improvement compared to these two methods. |
Tasks | Bayesian Inference |
Published | 2018-04-10 |
URL | http://arxiv.org/abs/1804.03348v1 |
http://arxiv.org/pdf/1804.03348v1.pdf | |
PWC | https://paperswithcode.com/paper/mean-field-network-based-graph-refinement |
Repo | |
Framework | |
Selecting the Best in GANs Family: a Post Selection Inference Framework
Title | Selecting the Best in GANs Family: a Post Selection Inference Framework |
Authors | Yao-Hung Hubert Tsai, Makoto Yamada, Denny Wu, Ruslan Salakhutdinov, Ichiro Takeuchi, Kenji Fukumizu |
Abstract | “Which Generative Adversarial Networks (GANs) generates the most plausible images?” has been a frequently asked question among researchers. To address this problem, we first propose an incomplete U-statistics estimate of maximum mean discrepancy $\mathrm{MMD}_{inc}$ to measure the distribution discrepancy between generated and real images. $\mathrm{MMD}_{inc}$ enjoys the advantages of asymptotic normality, computation efficiency, and model agnosticity. We then propose a GANs analysis framework to select and test the “best” member in GANs family using the Post Selection Inference (PSI) with $\mathrm{MMD}_{inc}$. In the experiments, we adopt the proposed framework on 7 GANs variants and compare their $\mathrm{MMD}_{inc}$ scores. |
Tasks | |
Published | 2018-02-15 |
URL | http://arxiv.org/abs/1802.05411v2 |
http://arxiv.org/pdf/1802.05411v2.pdf | |
PWC | https://paperswithcode.com/paper/selecting-the-best-in-gans-family-a-post |
Repo | |
Framework | |
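An incomplete U-statistic estimate of the squared maximum mean discrepancy averages the usual pairwise kernel term over a subset of index pairs instead of all of them, which is where the computational saving comes from. The sketch below is one way this could look and is not the authors' implementation: the Gaussian kernel, the bandwidth `sigma`, the uniform random choice of pairs (drawn with replacement), and all function names are assumptions.

```python
import math
import random


def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((u - v) ** 2 for u, v in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd_incomplete(xs, ys, num_pairs, rng, sigma=1.0):
    """Estimate squared MMD from `num_pairs` randomly drawn index pairs.

    `xs` and `ys` are lists of vectors (e.g. features of real and generated
    images). Each drawn pair (i, j) with i != j contributes
    k(x_i, x_j) + k(y_i, y_j) - k(x_i, y_j) - k(x_j, y_i).
    """
    n = min(len(xs), len(ys))
    total = 0.0
    for _ in range(num_pairs):
        i, j = rng.sample(range(n), 2)  # two distinct indices
        total += (gaussian_kernel(xs[i], xs[j], sigma)
                  + gaussian_kernel(ys[i], ys[j], sigma)
                  - gaussian_kernel(xs[i], ys[j], sigma)
                  - gaussian_kernel(xs[j], ys[i], sigma))
    return total / num_pairs


rng = random.Random(0)
real = [[0.0], [0.1], [0.2], [0.3]]
fake = [[5.0], [5.1], [5.2], [5.3]]
score = mmd_incomplete(real, fake, num_pairs=8, rng=rng)
```

A larger score indicates a larger discrepancy between the two samples; identical samples give a score of zero under this estimator.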