Paper Group ANR 994
Optimal Rates of Sketched-regularized Algorithms for Least-Squares Regression over Hilbert Spaces. nocaps: novel object captioning at scale. Learning and Interpreting Multi-Multi-Instance Learning Networks. Attend More Times for Image Captioning. Deconfounding age effects with fair representation learning when assessing dementia. BCSAT : A Benchmar …
Optimal Rates of Sketched-regularized Algorithms for Least-Squares Regression over Hilbert Spaces
Title | Optimal Rates of Sketched-regularized Algorithms for Least-Squares Regression over Hilbert Spaces |
Authors | Junhong Lin, Volkan Cevher |
Abstract | We investigate regularized algorithms combined with projection for the least-squares regression problem over a Hilbert space, covering nonparametric regression over a reproducing kernel Hilbert space. We prove convergence results with respect to variants of norms, under a capacity assumption on the hypothesis space and a regularity condition on the target function. As a result, we obtain optimal rates for regularized algorithms with randomized sketches, provided that the sketch dimension is proportional to the effective dimension up to a logarithmic factor. As a byproduct, we obtain similar results for Nyström regularized algorithms. Our results are the first with optimal, distribution-dependent rates that do not exhibit any saturation effect for sketched/Nyström regularized algorithms, considering both the attainable and non-attainable cases. |
Tasks | |
Published | 2018-03-12 |
URL | http://arxiv.org/abs/1803.04371v2 |
http://arxiv.org/pdf/1803.04371v2.pdf | |
PWC | https://paperswithcode.com/paper/optimal-rates-of-sketched-regularized |
Repo | |
Framework | |
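As a rough illustration of the sketched-regularization idea in the abstract above, here is a hypothetical NumPy sketch of Gaussian-sketched kernel ridge regression; the sketch matrix, regularizer and solver are illustrative assumptions, not the paper's exact algorithm or sampling scheme.

```python
import numpy as np

def sketched_krr(K, y, lam, m, rng=np.random.default_rng(0)):
    """Gaussian-sketched kernel ridge regression (illustrative only).

    K   : (n, n) kernel Gram matrix
    y   : (n,) targets
    lam : ridge regularization parameter
    m   : sketch dimension (ideally ~ effective dimension, up to log factors)
    Returns coefficients alpha such that predictions are K @ alpha.
    """
    n = K.shape[0]
    S = rng.standard_normal((m, n)) / np.sqrt(m)   # random sketch matrix
    KS = K @ S.T                                   # (n, m) sketched features
    # Solve the m-dimensional problem instead of the full n-dimensional one:
    #   beta = argmin ||KS beta - y||^2 + lam * beta^T (S K S^T) beta
    A = KS.T @ KS + lam * (S @ K @ S.T)
    beta = np.linalg.solve(A, KS.T @ y)
    return S.T @ beta                              # map back: alpha = S^T beta

# toy usage on a synthetic RBF kernel
X = np.linspace(0, 1, 200)[:, None]
K = np.exp(-0.5 * (X - X.T) ** 2 / 0.1 ** 2)
y = np.sin(6 * X[:, 0]) + 0.1 * np.random.default_rng(1).standard_normal(200)
alpha = sketched_krr(K, y, lam=1e-3, m=30)
print(np.mean((K @ alpha - y) ** 2))
```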
nocaps: novel object captioning at scale
Title | nocaps: novel object captioning at scale |
Authors | Harsh Agrawal, Karan Desai, Yufei Wang, Xinlei Chen, Rishabh Jain, Mark Johnson, Dhruv Batra, Devi Parikh, Stefan Lee, Peter Anderson |
Abstract | Image captioning models have achieved impressive results on datasets containing limited visual concepts and large amounts of paired image-caption training data. However, if these models are to ever function in the wild, a much larger variety of visual concepts must be learned, ideally from less supervision. To encourage the development of image captioning models that can learn visual concepts from alternative data sources, such as object detection datasets, we present the first large-scale benchmark for this task. Dubbed ‘nocaps’, for novel object captioning at scale, our benchmark consists of 166,100 human-generated captions describing 15,100 images from the OpenImages validation and test sets. The associated training data consists of COCO image-caption pairs, plus OpenImages image-level labels and object bounding boxes. Since OpenImages contains many more classes than COCO, nearly 400 object classes seen in test images have no or very few associated training captions (hence, nocaps). We extend existing novel object captioning models to establish strong baselines for this benchmark and provide analysis to guide future work on this task. |
Tasks | Image Captioning, Object Detection |
Published | 2018-12-20 |
URL | https://arxiv.org/abs/1812.08658v3 |
https://arxiv.org/pdf/1812.08658v3.pdf | |
PWC | https://paperswithcode.com/paper/nocaps-novel-object-captioning-at-scale |
Repo | |
Framework | |
Learning and Interpreting Multi-Multi-Instance Learning Networks
Title | Learning and Interpreting Multi-Multi-Instance Learning Networks |
Authors | Alessandro Tibo, Manfred Jaeger, Paolo Frasconi |
Abstract | We introduce an extension of the multi-instance learning problem where examples are organized as nested bags of instances (e.g., a document could be represented as a bag of sentences, which in turn are bags of words). This framework can be useful in various scenarios, such as text and image classification, but also supervised learning over graphs. As a further advantage, multi-multi instance learning enables a particular way of interpreting predictions and the decision function. Our approach is based on a special neural network layer, called bag-layer, whose units aggregate bags of inputs of arbitrary size. We prove theoretically that the associated class of functions contains all Boolean functions over sets of sets of instances, and we provide empirical evidence that functions of this kind can actually be learned on semi-synthetic datasets. We finally present experiments on text classification, on citation graphs, and on social graph data, which show that our model obtains competitive accuracy compared to other approaches such as convolutional networks on graphs, while at the same time supporting a general approach to interpreting the learnt model and explaining individual predictions. |
Tasks | Image Classification, Text Classification |
Published | 2018-10-26 |
URL | http://arxiv.org/abs/1810.11514v2 |
http://arxiv.org/pdf/1810.11514v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-and-interpreting-multi-multi |
Repo | |
Framework | |
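A minimal NumPy sketch of the bag-layer idea described above: a unit-wise transform followed by an order-invariant aggregation over a bag of instances, stacked twice for bags of bags. The layer sizes, ReLU activation and max aggregator are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def bag_layer(bag, W, b, agg=np.max):
    """Aggregate a bag of instance vectors into one representation.

    bag : (k, d) array, k instances of dimension d (k may vary per bag)
    W   : (d, h) weight matrix, b : (h,) bias
    agg : order-invariant aggregator applied over the bag axis (max here)
    """
    return agg(relu(bag @ W + b), axis=0)

def multi_multi_instance_forward(bags_of_bags, W1, b1, W2, b2):
    """Two stacked bag-layers: instances -> inner bags -> outer bag,
    e.g. words -> sentences -> document in the nested-bags example."""
    inner = np.stack([bag_layer(bag, W1, b1) for bag in bags_of_bags])
    return bag_layer(inner, W2, b2)

# toy usage: a "document" of 3 "sentences" with varying numbers of "words"
rng = np.random.default_rng(0)
doc = [rng.standard_normal((k, 8)) for k in (5, 3, 7)]
W1, b1 = rng.standard_normal((8, 16)), np.zeros(16)
W2, b2 = rng.standard_normal((16, 4)), np.zeros(4)
print(multi_multi_instance_forward(doc, W1, b1, W2, b2).shape)  # (4,)
```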
Attend More Times for Image Captioning
Title | Attend More Times for Image Captioning |
Authors | Jiajun Du, Yu Qin, Hongtao Lu, Yonghua Zhang |
Abstract | Most attention-based image captioning models attend to the image once per word. However, attending only once per word is rigid and can easily miss information. Attending more times allows the model to adjust the attention position, recover the missing information, and avoid generating the wrong word. In this paper, we show that attending more times per word yields improvements in the image captioning task, without increasing the number of parameters. We propose a flexible two-LSTM merge model that makes it convenient to encode more attention steps than words. Our captioning model uses two LSTMs to encode the word sequence and the attention sequence respectively. The information of the two LSTMs and the image feature are combined to predict the next word. Experiments on the MSCOCO caption dataset show that our method outperforms the state of the art. Using bottom-up features and self-critical training, our method achieves BLEU-4, METEOR, ROUGE-L, CIDEr and SPICE scores of 0.381, 0.283, 0.580, 1.261 and 0.220 on the Karpathy test split. |
Tasks | Image Captioning |
Published | 2018-12-08 |
URL | http://arxiv.org/abs/1812.03283v2 |
http://arxiv.org/pdf/1812.03283v2.pdf | |
PWC | https://paperswithcode.com/paper/attend-more-times-for-image-captioning |
Repo | |
Framework | |
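A hypothetical PyTorch sketch of the two-LSTM merge structure described above: one LSTM over the word sequence, one over the attention sequence, merged with a global image feature to predict the next word. Keeping the two recurrences separate is what allows more attention steps than words; the attention computation itself, the layer sizes and the merge layer are assumptions, not the paper's exact model.

```python
import torch
import torch.nn as nn

class TwoLSTMMerge(nn.Module):
    """Illustrative two-LSTM merge decoder for captioning (one step shown)."""

    def __init__(self, vocab, emb=256, feat=512, hid=512):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.word_lstm = nn.LSTMCell(emb, hid)    # encodes the word sequence
        self.attn_lstm = nn.LSTMCell(feat, hid)   # encodes the attention sequence
        self.out = nn.Linear(hid + hid + feat, vocab)

    def step(self, word_id, attended_feat, global_feat, wstate, astate):
        # The two LSTMs advance independently, so the attention LSTM could
        # take several steps between word steps.
        wstate = self.word_lstm(self.embed(word_id), wstate)
        astate = self.attn_lstm(attended_feat, astate)
        logits = self.out(torch.cat([wstate[0], astate[0], global_feat], dim=-1))
        return logits, wstate, astate

# toy usage for one decoding step (batch of 2)
model = TwoLSTMMerge(vocab=1000)
w = torch.tensor([4, 7])
att, img = torch.randn(2, 512), torch.randn(2, 512)
h0 = (torch.zeros(2, 512), torch.zeros(2, 512))
logits, ws, asx = model.step(w, att, img, h0, h0)
print(logits.shape)  # torch.Size([2, 1000])
```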
Deconfounding age effects with fair representation learning when assessing dementia
Title | Deconfounding age effects with fair representation learning when assessing dementia |
Authors | Zining Zhu, Jekaterina Novikova, Frank Rudzicz |
Abstract | One of the most prevalent symptoms among the elderly population, dementia, can be detected by classifiers trained on linguistic features extracted from narrative transcripts. However, these linguistic features are affected in a similar but distinct fashion by the normal aging process. Aging is therefore a confounding factor whose effects have been hard for machine learning classifiers (especially deep neural network based models) to ignore. We show that DNN models are capable of estimating age from linguistic features. Predicting dementia based on this aging bias could lead to non-generalizable accuracies on clinical datasets, if not properly deconfounded. In this paper, we propose to address this deconfounding problem with fair representation learning. We build neural network classifiers that learn low-dimensional representations reflecting the impacts of dementia yet discarding the effects of age. To evaluate these classifiers, we specify a model-agnostic score $\Delta_{eo}^{(N)}$ measuring how well classifier results are deconfounded from age. Our best models compromise accuracy by only 2.56% and 1.54% on two clinical datasets compared to DNNs, and their $\Delta_{eo}^{(2)}$ scores are better than those of statistical (residualization and inverse probability weighting) adjustments. |
Tasks | Representation Learning |
Published | 2018-07-19 |
URL | https://arxiv.org/abs/1807.07217v4 |
https://arxiv.org/pdf/1807.07217v4.pdf | |
PWC | https://paperswithcode.com/paper/isolating-effects-of-age-with-fair |
Repo | |
Framework | |
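The exact definition of the paper's $\Delta_{eo}^{(N)}$ score is given in the paper; the sketch below is only a hypothetical two-group illustration of the underlying idea, an equalized-odds-style gap that is zero when predictions are independent of age group given the true label.

```python
import numpy as np

def equalized_odds_gap(y_true, y_pred, group):
    """Largest difference in positive-prediction rates between groups,
    conditioned on the true label (0 = deconfounded, larger = confounded)."""
    gaps = []
    for label in (0, 1):                      # condition on the true label
        rates = []
        for g in np.unique(group):
            mask = (y_true == label) & (group == g)
            rates.append(np.mean(y_pred[mask] == 1))
        gaps.append(max(rates) - min(rates))
    return max(gaps)

# toy usage: predictions that partly depend on age group look confounded
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 1000)
group = rng.integers(0, 2, 1000)              # e.g. younger vs. older
y_pred = y_true | (group & rng.integers(0, 2, 1000))   # biased toward group 1
print(round(equalized_odds_gap(y_true, y_pred, group), 3))
```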
BCSAT : A Benchmark Corpus for Sentiment Analysis in Telugu Using Word-level Annotations
Title | BCSAT : A Benchmark Corpus for Sentiment Analysis in Telugu Using Word-level Annotations |
Authors | Sreekavitha Parupalli, Vijjini Anvesh Rao, Radhika Mamidi |
Abstract | The presented work aims at generating a systematically annotated corpus that can support the enhancement of sentiment analysis tasks in Telugu using word-level sentiment annotations. From OntoSenseNet, we extracted 11,000 adjectives, 253 adverbs and 8,483 verbs, and sentiment annotation was carried out by language experts. We discuss the methodology followed for the polarity annotations and validate the developed resource. This work aims at developing a benchmark corpus, as an extension to SentiWordNet, and a baseline accuracy for a model where lexeme annotations are applied for sentiment prediction. The fundamental aim of this paper is to validate and study the possibility of utilizing machine learning algorithms and word-level sentiment annotations in the task of automated sentiment identification. Furthermore, accuracy is improved by annotating the bi-grams extracted from the target corpus. |
Tasks | Sentiment Analysis |
Published | 2018-07-04 |
URL | http://arxiv.org/abs/1807.01679v1 |
http://arxiv.org/pdf/1807.01679v1.pdf | |
PWC | https://paperswithcode.com/paper/bcsat-a-benchmark-corpus-for-sentiment |
Repo | |
Framework | |
Scalable Lévy Process Priors for Spectral Kernel Learning
Title | Scalable Lévy Process Priors for Spectral Kernel Learning |
Authors | Phillip A. Jang, Andrew E. Loeb, Matthew B. Davidow, Andrew Gordon Wilson |
Abstract | Gaussian processes are rich distributions over functions, with generalization properties determined by a kernel function. When used for long-range extrapolation, predictions are particularly sensitive to the choice of kernel parameters. It is therefore critical to account for kernel uncertainty in our predictive distributions. We propose a distribution over kernels formed by modelling a spectral mixture density with a Lévy process. The resulting distribution has support for all stationary covariances, including the popular RBF, periodic, and Matérn kernels, combined with inductive biases which enable automatic and data-efficient learning, long-range extrapolation, and state-of-the-art predictive performance. The proposed model also presents an approach to spectral regularization, as the Lévy process introduces a sparsity-inducing prior over mixture components, allowing automatic selection of model order and pruning of extraneous components. We exploit the algebraic structure of the proposed process for $\mathcal{O}(n)$ training and $\mathcal{O}(1)$ predictions. We perform extrapolations with reasonable uncertainty estimates on several benchmarks, and show that the proposed model can recover flexible ground-truth covariances and is robust to errors in initialization. |
Tasks | Gaussian Processes |
Published | 2018-02-02 |
URL | http://arxiv.org/abs/1802.00530v1 |
http://arxiv.org/pdf/1802.00530v1.pdf | |
PWC | https://paperswithcode.com/paper/scalable-levy-process-priors-for-spectral |
Repo | |
Framework | |
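The kernel distribution above is built on spectral mixture kernels, whose covariance is the Fourier dual of a mixture over frequencies; the Lévy-process prior and the sparsity-inducing selection are the paper's contribution and are not reproduced here. A minimal NumPy sketch of evaluating a one-dimensional spectral mixture kernel (the standard Wilson & Adams, 2013 form):

```python
import numpy as np

def spectral_mixture_kernel(x1, x2, weights, means, scales):
    """k(tau) = sum_q w_q * exp(-2 pi^2 tau^2 s_q^2) * cos(2 pi tau mu_q).

    weights, means, scales : arrays of length Q (mixture size)
    Returns the (len(x1), len(x2)) covariance matrix.
    """
    tau = x1[:, None] - x2[None, :]
    k = np.zeros_like(tau)
    for w, mu, s in zip(weights, means, scales):
        k += w * np.exp(-2 * np.pi**2 * tau**2 * s**2) * np.cos(2 * np.pi * tau * mu)
    return k

# toy usage: a quasi-periodic covariance from two spectral components
x = np.linspace(0, 5, 100)
K = spectral_mixture_kernel(x, x, weights=[1.0, 0.5], means=[0.5, 2.0], scales=[0.1, 0.3])
print(K.shape, np.allclose(K, K.T))  # (100, 100) True
```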
New models for symbolic data analysis
Title | New models for symbolic data analysis |
Authors | Boris Beranger, Huan Lin, Scott A. Sisson |
Abstract | Symbolic data analysis (SDA) is an emerging area of statistics based on aggregating individual-level data into group-based distributional summaries (symbols), and then developing statistical methods to analyse them. It is ideal for analysing large and complex datasets, and has immense potential to become a standard inferential technique in the near future. However, existing SDA techniques are either non-inferential, do not easily permit meaningful statistical models, are unable to distinguish between competing models, or are based on simplifying assumptions that are known to be false. Further, the procedure for constructing symbols from the underlying data is erroneously not considered relevant to the resulting statistical analysis. In this paper we introduce a new general method for constructing likelihood functions for symbolic data based on a desired probability model for the underlying classical data, while only observing the distributional summaries. This approach resolves many of the conceptual and practical issues with current SDA methods, opens the door to new classes of symbol design and construction, and develops SDA into a viable tool to enable and improve upon classical data analyses, particularly for very large and complex datasets. This work creates a new direction for SDA research, which we illustrate through several real and simulated data analyses. |
Tasks | |
Published | 2018-09-11 |
URL | http://arxiv.org/abs/1809.03659v1 |
http://arxiv.org/pdf/1809.03659v1.pdf | |
PWC | https://paperswithcode.com/paper/new-models-for-symbolic-data-analysis |
Repo | |
Framework | |
Deep Information Networks
Title | Deep Information Networks |
Authors | Giulio Franzese, Monica Visintin |
Abstract | We describe a novel classifier with a tree structure, designed using information theory concepts. This Information Network is made of information nodes, which compress the input data, and multiplexers, which connect two or more input nodes to an output node. Each information node is trained, independently of the others, to minimize a local cost function: the mutual information between its input and output, under the constraint of keeping a given mutual information between its output and the target (the information bottleneck). We show that the system is able to provide good results in terms of accuracy, while offering many advantages in terms of modularity and reduced complexity. |
Tasks | |
Published | 2018-03-06 |
URL | http://arxiv.org/abs/1803.02251v1 |
http://arxiv.org/pdf/1803.02251v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-information-networks |
Repo | |
Framework | |
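The per-node training criterion described above is the information bottleneck. A standard Lagrangian form of that objective (the paper's exact local cost may be parametrized differently) is

$$ \min_{p(t \mid x)} \; I(X;T) \quad \text{s.t.} \quad I(T;Y) \ge \gamma \;\;\Longleftrightarrow\;\; \min_{p(t \mid x)} \; I(X;T) - \beta\, I(T;Y), $$

where $X$ is the node's input, $T$ its compressed output, $Y$ the target, and $\beta$ (equivalently $\gamma$) controls how much information about the target must be retained.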
Sampling-free Uncertainty Estimation in Gated Recurrent Units with Exponential Families
Title | Sampling-free Uncertainty Estimation in Gated Recurrent Units with Exponential Families |
Authors | Seong Jae Hwang, Ronak Mehta, Hyunwoo J. Kim, Vikas Singh |
Abstract | There has recently been a concerted effort to derive mechanisms in vision and machine learning systems to offer uncertainty estimates of the predictions they make. Clearly, there are enormous benefits to a system that is not only accurate but also has a sense for when it is not sure. Existing proposals center around Bayesian interpretations of modern deep architectures – these are effective but can often be computationally demanding. We show how classical ideas in the literature on exponential families in probabilistic networks provide an excellent starting point to derive uncertainty estimates in Gated Recurrent Units (GRUs). Our proposal directly quantifies uncertainty deterministically, without the need for costly sampling-based estimation. We demonstrate how our model can be used to quantitatively and qualitatively measure uncertainty in unsupervised image sequence prediction. To our knowledge, this is the first result describing sampling-free uncertainty estimation for powerful sequential models such as GRUs. |
Tasks | |
Published | 2018-04-19 |
URL | http://arxiv.org/abs/1804.07351v2 |
http://arxiv.org/pdf/1804.07351v2.pdf | |
PWC | https://paperswithcode.com/paper/sampling-free-uncertainty-estimation-in-gated |
Repo | |
Framework | |
Abstracting Causal Models
Title | Abstracting Causal Models |
Authors | Sander Beckers, Joseph Y. Halpern |
Abstract | We consider a sequence of successively more restrictive definitions of abstraction for causal models, starting with a notion introduced by Rubenstein et al. (2017) called exact transformation that applies to probabilistic causal models, moving to a notion of uniform transformation that applies to deterministic causal models and does not allow differences to be hidden by the “right” choice of distribution, and then to abstraction, where the interventions of interest are determined by the map from low-level states to high-level states, and strong abstraction, which takes more seriously all potential interventions in a model, not just the allowed interventions. We show that procedures for combining micro-variables into macro-variables are instances of our notion of strong abstraction, as are all the examples considered by Rubenstein et al. |
Tasks | |
Published | 2018-12-10 |
URL | https://arxiv.org/abs/1812.03789v4 |
https://arxiv.org/pdf/1812.03789v4.pdf | |
PWC | https://paperswithcode.com/paper/abstracting-causal-models |
Repo | |
Framework | |
Convolutional Invasion and Expansion Networks for Tumor Growth Prediction
Title | Convolutional Invasion and Expansion Networks for Tumor Growth Prediction |
Authors | Ling Zhang, Le Lu, Ronald M. Summers, Electron Kebebew, Jianhua Yao |
Abstract | Tumor growth is associated with cell invasion and mass-effect, which are traditionally formulated by mathematical models, namely reaction-diffusion equations and biomechanics. Such models can be personalized based on clinical measurements to build predictive models for tumor growth. In this paper, we investigate the possibility of using deep convolutional neural networks (ConvNets) to directly represent and learn the cell invasion and mass-effect, and to predict the subsequent involvement regions of a tumor. The invasion network learns the cell invasion from information related to metabolic rate, cell density and tumor boundary derived from multimodal imaging data. The expansion network models the mass-effect from the growing motion of tumor mass. We also study different architectures that fuse the invasion and expansion networks, in order to exploit the inherent correlations among them. Our network can easily be trained on population data and personalized to a target patient, unlike most previous mathematical modeling methods that fail to incorporate population data. Quantitative experiments on a pancreatic tumor data set show that the proposed method substantially outperforms a state-of-the-art mathematical model-based approach in both accuracy and efficiency, and that the information captured by each of the two subnetworks is complementary. |
Tasks | |
Published | 2018-01-25 |
URL | http://arxiv.org/abs/1801.08468v1 |
http://arxiv.org/pdf/1801.08468v1.pdf | |
PWC | https://paperswithcode.com/paper/convolutional-invasion-and-expansion-networks |
Repo | |
Framework | |
Enlarging Context with Low Cost: Efficient Arithmetic Coding with Trimmed Convolution
Title | Enlarging Context with Low Cost: Efficient Arithmetic Coding with Trimmed Convolution |
Authors | Mu Li, Shuhang Gu, David Zhang, Wangmeng Zuo |
Abstract | Arithmetic coding is an essential class of coding techniques. One key issue of arithmetic encoding methods is to predict the probability of the current coding symbol from its context, i.e., the preceding encoded symbols, which is usually done by building a look-up table (LUT). However, the complexity of the LUT increases exponentially with the length of the context. Thus, such solutions are limited in the context size they can model, which inevitably restricts compression performance. Several recent deep neural network-based solutions have been developed to account for large contexts, but are still costly in computation. The inefficiency of existing methods is mainly due to probability prediction being performed independently for neighboring symbols, when it could be conducted efficiently via shared computation. To this end, we propose a trimmed convolutional network for arithmetic encoding (TCAE) to model large contexts while maintaining computational efficiency. In trimmed convolution, the convolutional kernels are specially trimmed to respect the compression order and context dependency of the input symbols. Benefiting from trimmed convolution, the probability prediction of all symbols can be performed efficiently in a single forward pass through a fully convolutional network. Furthermore, to speed up the decoding process, a slope TCAE model is presented that divides the codes from a 3D code map into several blocks and removes the dependency between the codes within each block for parallel decoding, which speeds up the decoding process by 60x. Experiments show that our TCAE and slope TCAE attain better compression ratios in lossless gray image compression, and can be adopted in CNN-based lossy image compression to achieve state-of-the-art rate-distortion performance with real-time encoding speed. |
Tasks | Image Compression |
Published | 2018-01-15 |
URL | http://arxiv.org/abs/1801.04662v2 |
http://arxiv.org/pdf/1801.04662v2.pdf | |
PWC | https://paperswithcode.com/paper/enlarging-context-with-low-cost-efficient |
Repo | |
Framework | |
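A minimal NumPy/SciPy sketch of the trimmed-convolution idea above: the kernel is masked so each output position only sees symbols that precede it in the coding (raster-scan) order, so context features for all positions come out of one shared pass. The kernel size and the masking details are illustrative assumptions, not TCAE's actual architecture.

```python
import numpy as np
from scipy.ndimage import correlate

def trimmed_kernel(size=5):
    """Zero out kernel taps at and after the centre in raster-scan order,
    so the output at (i, j) depends only on previously coded symbols."""
    k = np.ones((size, size))
    c = size // 2
    k[c, c:] = 0.0          # centre row: current and later columns
    k[c + 1:, :] = 0.0      # all rows below the centre
    return k

def context_features(symbols, kernel):
    """One shared correlation pass produces, for every position, a feature
    of its causal context (a masked sum here, standing in for a CNN layer)."""
    return correlate(symbols, kernel, mode="constant", cval=0.0)

# toy usage on an 8x8 grid of binary symbols
rng = np.random.default_rng(0)
x = rng.integers(0, 2, (8, 8)).astype(float)
ctx = context_features(x, trimmed_kernel(5))
print(ctx.shape)  # (8, 8): one causal context feature per symbol
```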
Evaluating Reinforcement Learning Algorithms in Observational Health Settings
Title | Evaluating Reinforcement Learning Algorithms in Observational Health Settings |
Authors | Omer Gottesman, Fredrik Johansson, Joshua Meier, Jack Dent, Donghun Lee, Srivatsan Srinivasan, Linying Zhang, Yi Ding, David Wihl, Xuefeng Peng, Jiayu Yao, Isaac Lage, Christopher Mosch, Li-wei H. Lehman, Matthieu Komorowski, Matthieu Komorowski, Aldo Faisal, Leo Anthony Celi, David Sontag, Finale Doshi-Velez |
Abstract | Much attention has been devoted recently to the development of machine learning algorithms with the goal of improving treatment policies in healthcare. Reinforcement learning (RL) is a sub-field within machine learning that is concerned with learning how to make sequences of decisions so as to optimize long-term effects. Already, RL algorithms have been proposed to identify decision-making strategies for mechanical ventilation, sepsis management and treatment of schizophrenia. However, before implementing treatment policies learned by black-box algorithms in high-stakes clinical decision problems, special care must be taken in the evaluation of these policies. In this document, our goal is to expose some of the subtleties associated with evaluating RL algorithms in healthcare. We aim to provide a conceptual starting point for clinical and computational researchers to ask the right questions when designing and evaluating algorithms for new ways of treating patients. In the following, we describe how choices about how to summarize a history, variance of statistical estimators, and confounders in more ad-hoc measures can result in unreliable, even misleading estimates of the quality of a treatment policy. We also provide suggestions for mitigating these effects—for while there is much promise for mining observational health data to uncover better treatment policies, evaluation must be performed thoughtfully. |
Tasks | Decision Making |
Published | 2018-05-31 |
URL | http://arxiv.org/abs/1805.12298v1 |
http://arxiv.org/pdf/1805.12298v1.pdf | |
PWC | https://paperswithcode.com/paper/evaluating-reinforcement-learning-algorithms |
Repo | |
Framework | |
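The abstract warns in particular about the variance of statistical estimators used to evaluate treatment policies from observational data. As a toy illustration (not a method from the paper), the NumPy sketch below shows ordinary importance-sampling off-policy evaluation, where small per-step probability mismatches compound into a few dominating trajectory weights:

```python
import numpy as np

def per_trajectory_is(behavior_probs, target_probs, returns):
    """Ordinary importance-sampling estimate of a target policy's value
    from trajectories collected under a behavior policy.

    behavior_probs, target_probs : (n_traj, horizon) action probabilities
    returns : (n_traj,) observed trajectory returns
    """
    weights = np.prod(target_probs / behavior_probs, axis=1)
    return np.mean(weights * returns), weights

# small per-step mismatches (0.4 vs 0.5) compound over a 20-step horizon
rng = np.random.default_rng(0)
horizon, n = 20, 500
behavior = np.full((n, horizon), 0.5)
target = rng.choice([0.4, 0.6], size=(n, horizon))
returns = rng.normal(1.0, 0.5, size=n)
est, w = per_trajectory_is(behavior, target, returns)
print(round(est, 3), round(w.max() / w.mean(), 1))  # largest weights dominate
```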
V-FCNN: Volumetric Fully Convolution Neural Network For Automatic Atrial Segmentation
Title | V-FCNN: Volumetric Fully Convolution Neural Network For Automatic Atrial Segmentation |
Authors | Nicoló Savioli, Giovanni Montana, Pablo Lamata |
Abstract | Atrial Fibrillation (AF) is a common electro-physiological cardiac disorder that causes changes in the anatomy of the atria. A better characterization of these changes is desirable for the definition of clinical biomarkers, and there is therefore a need for fully automatic segmentation of the atria from clinical images. In this work, we present an architecture based on 3D-convolution kernels, a Volumetric Fully Convolution Neural Network (V-FCNN), able to segment the entire volume in one shot and consequently integrate the implicit spatial redundancy present in high-resolution images. A loss function based on a mixture of Mean Square Error (MSE) and Dice Loss (DL) is used, in an attempt to combine the ability to capture the bulk shape with a reduction of the local errors produced by over-segmentation. Results demonstrate reasonable performance in the middle region of the atria, along with the challenges of capturing the variability of the pulmonary veins and identifying the valve plane that separates the atria from the ventricles. A final Dice of $92.5\%$ on $54$ patients ($4752$ atria test slices in total) is reported. |
Tasks | |
Published | 2018-08-06 |
URL | http://arxiv.org/abs/1808.01944v2 |
http://arxiv.org/pdf/1808.01944v2.pdf | |
PWC | https://paperswithcode.com/paper/v-fcnn-volumetric-fully-convolution-neural |
Repo | |
Framework | |
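A minimal NumPy sketch of the mixed MSE + Dice loss described above; the mixing weight and smoothing constant are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss: 1 - 2|P∩T| / (|P|+|T|), computed on probabilities."""
    inter = np.sum(pred * target)
    return 1.0 - (2.0 * inter + eps) / (np.sum(pred) + np.sum(target) + eps)

def mse_dice_loss(pred, target, alpha=0.5):
    """Mixture of Mean Square Error and soft Dice terms, as in the abstract."""
    mse = np.mean((pred - target) ** 2)
    return alpha * mse + (1.0 - alpha) * dice_loss(pred, target)

# toy usage on a small predicted vs. ground-truth 3-D atrium mask
rng = np.random.default_rng(0)
target = (rng.random((32, 32, 8)) > 0.7).astype(float)   # binary 3-D mask
pred = np.clip(target + 0.2 * rng.standard_normal(target.shape), 0, 1)
print(round(mse_dice_loss(pred, target), 4))
```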