Paper Group ANR 831
3D Pick & Mix: Object Part Blending in Joint Shape and Image Manifolds. Foreground Clustering for Joint Segmentation and Localization in Videos and Images. Countdown Regression: Sharp and Calibrated Survival Predictions. Representation Learning with Autoencoders for Electronic Health Records: A Comparative Study. Bayesian estimation for large scale …
3D Pick & Mix: Object Part Blending in Joint Shape and Image Manifolds
Title | 3D Pick & Mix: Object Part Blending in Joint Shape and Image Manifolds |
Authors | Adrian Penate-Sanchez, Lourdes Agapito |
Abstract | We present 3D Pick & Mix, a new 3D shape retrieval system that provides users with a new level of freedom to explore 3D shape and Internet image collections by introducing the ability to reason about objects at the level of their constituent parts. While classic retrieval systems can only formulate simple searches such as “find the 3D model that is most similar to the input image” our new approach can formulate advanced and semantically meaningful search queries such as: “find me the 3D model that best combines the design of the legs of the chair in image 1 but with no armrests, like the chair in image 2”. Many applications could benefit from such rich queries, users could browse through catalogues of furniture and pick and mix parts, combining for example the legs of a chair from one shop and the armrests from another shop. |
Tasks | 3D Shape Retrieval |
Published | 2018-11-02 |
URL | http://arxiv.org/abs/1811.01068v1 |
http://arxiv.org/pdf/1811.01068v1.pdf | |
PWC | https://paperswithcode.com/paper/3d-pick-mix-object-part-blending-in-joint |
Repo | |
Framework | |
Foreground Clustering for Joint Segmentation and Localization in Videos and Images
Title | Foreground Clustering for Joint Segmentation and Localization in Videos and Images |
Authors | Abhishek Sharma |
Abstract | This paper presents a novel framework in which video/image segmentation and localization are cast into a single optimization problem that integrates information from low level appearance cues with that of high level localization cues in a very weakly supervised manner. The proposed framework leverages two representations at different levels, exploits the spatial relationship between bounding boxes and superpixels as linear constraints and simultaneously discriminates between foreground and background at bounding box and superpixel level. Different from previous approaches that mainly rely on discriminative clustering, we incorporate a foreground model that minimizes the histogram difference of an object across all image frames. Exploiting the geometric relation between the superpixels and bounding boxes enables the transfer of segmentation cues to improve localization output and vice-versa. Inclusion of the foreground model generalizes our discriminative framework to video data where the background tends to be similar and thus, not discriminative. We demonstrate the effectiveness of our unified framework on the YouTube Object video dataset, Internet Object Discovery dataset and Pascal VOC 2007. |
Tasks | Semantic Segmentation |
Published | 2018-11-26 |
URL | http://arxiv.org/abs/1811.10121v1 |
http://arxiv.org/pdf/1811.10121v1.pdf | |
PWC | https://paperswithcode.com/paper/foreground-clustering-for-joint-segmentation |
Repo | |
Framework | |
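The foreground model mentioned in the abstract above penalises colour-histogram differences of the putative object across frames. Below is a minimal numpy sketch of that histogram term, assuming foreground pixels have already been extracted per frame; it illustrates the idea only and is not the authors' optimization.

```python
# Hypothetical sketch of the foreground-model term described above: the colour
# histogram of the putative foreground region should stay consistent across frames.
import numpy as np

def color_histogram(pixels, bins=8):
    """Normalised joint RGB histogram of an (N, 3) array of foreground pixels."""
    hist, _ = np.histogramdd(pixels, bins=(bins, bins, bins), range=[(0, 256)] * 3)
    return hist.ravel() / max(hist.sum(), 1)

def foreground_histogram_cost(foreground_pixels_per_frame):
    """Sum of squared differences between each frame's histogram and the mean."""
    hists = np.stack([color_histogram(p) for p in foreground_pixels_per_frame])
    return float(((hists - hists.mean(axis=0)) ** 2).sum())

# Toy usage: two frames whose foreground pixels come from the same colour model.
rng = np.random.default_rng(0)
frames = [rng.integers(0, 256, size=(500, 3)) for _ in range(2)]
print(foreground_histogram_cost(frames))
```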
Countdown Regression: Sharp and Calibrated Survival Predictions
Title | Countdown Regression: Sharp and Calibrated Survival Predictions |
Authors | Anand Avati, Tony Duan, Sharon Zhou, Kenneth Jung, Nigam H. Shah, Andrew Ng |
Abstract | Probabilistic survival predictions from models trained with Maximum Likelihood Estimation (MLE) can have high, and sometimes unacceptably high variance. The field of meteorology, where the paradigm of maximizing sharpness subject to calibration is popular, has addressed this problem by using scoring rules beyond MLE, such as the Continuous Ranked Probability Score (CRPS). In this paper we present the *Survival-CRPS*, a generalization of the CRPS to the survival prediction setting, with right-censored and interval-censored variants. We evaluate our ideas on the mortality prediction task using two different Electronic Health Record (EHR) data sets (STARR and MIMIC-III) covering millions of patients, with suitable deep neural network architectures: a Recurrent Neural Network (RNN) for STARR and a Fully Connected Network (FCN) for MIMIC-III. We compare results between the two scoring rules while keeping the network architecture and data fixed, and show that models trained with Survival-CRPS result in sharper predictive distributions compared to those trained by MLE, while still maintaining calibration. |
Tasks | Calibration, Decision Making, Mortality Prediction |
Published | 2018-06-21 |
URL | https://arxiv.org/abs/1806.08324v2 |
https://arxiv.org/pdf/1806.08324v2.pdf | |
PWC | https://paperswithcode.com/paper/countdown-regression-sharp-and-calibrated |
Repo | |
Framework | |
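For readers unfamiliar with the scoring rule above, here is a minimal numerical sketch of the right-censored Survival-CRPS idea, assuming a predictive CDF evaluated on a uniform time grid: an observed death uses the usual CRPS integrand, while a right-censored record only penalises probability mass placed before the censoring time. It is a sketch of the idea, not the authors' implementation.

```python
# Minimal sketch of the right-censored Survival-CRPS idea (illustration only).
import numpy as np

def survival_crps(grid, cdf, time, censored):
    """grid: uniform time grid; cdf: predicted CDF values on that grid."""
    cdf = np.asarray(cdf, dtype=float)
    if censored:
        # Event time known only to exceed `time`: penalise mass predicted before it.
        integrand = np.where(grid < time, cdf ** 2, 0.0)
    else:
        # Observed event at `time`: standard CRPS integrand (F(z) - 1{z >= t})^2.
        integrand = (cdf - (grid >= time).astype(float)) ** 2
    return float(integrand.sum() * (grid[1] - grid[0]))   # uniform-grid quadrature

# Toy usage with an exponential predictive distribution (rate 0.1).
grid = np.linspace(0.0, 200.0, 2001)
cdf = 1.0 - np.exp(-0.1 * grid)
print(survival_crps(grid, cdf, time=30.0, censored=False))
print(survival_crps(grid, cdf, time=30.0, censored=True))
```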
Representation Learning with Autoencoders for Electronic Health Records: A Comparative Study
Title | Representation Learning with Autoencoders for Electronic Health Records: A Comparative Study |
Authors | Najibesadat Sadati, Milad Zafar Nezhad, Ratna Babu Chinnam, Dongxiao Zhu |
Abstract | Increasing volume of Electronic Health Records (EHR) in recent years provides great opportunities for data scientists to collaborate on different aspects of healthcare research by applying advanced analytics to these EHR clinical data. A key requirement however is obtaining meaningful insights from high dimensional, sparse and complex clinical data. Data science approaches typically address this challenge by performing feature learning in order to build more reliable and informative feature representations from clinical data followed by supervised learning. In this paper, we propose a predictive modeling approach based on deep learning based feature representations and word embedding techniques. Our method uses different deep architectures (stacked sparse autoencoders, deep belief network, adversarial autoencoders and variational autoencoders) for feature representation in higher-level abstraction to obtain effective and robust features from EHRs, and then build prediction models on top of them. Our approach is particularly useful when the unlabeled data is abundant whereas labeled data is scarce. We investigate the performance of representation learning through a supervised learning approach. Our focus is to present a comparative study to evaluate the performance of different deep architectures through supervised learning and provide insights in the choice of deep feature representation techniques. Our experiments demonstrate that for small data sets, stacked sparse autoencoder demonstrates a superior generality performance in prediction due to sparsity regularization whereas variational autoencoders outperform the competing approaches for large data sets due to its capability of learning the representation distribution. |
Tasks | Representation Learning |
Published | 2018-01-06 |
URL | https://arxiv.org/abs/1801.02961v2 |
https://arxiv.org/pdf/1801.02961v2.pdf | |
PWC | https://paperswithcode.com/paper/a-predictive-approach-using-deep-feature |
Repo | |
Framework | |
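A minimal PyTorch sketch of one architecture from the comparison above, a sparse autoencoder whose L1 activity penalty encourages sparse codes that can later feed a supervised model. Layer sizes and the penalty weight are illustrative assumptions, not the paper's settings.

```python
# Sparse autoencoder sketch for unsupervised EHR feature learning (illustrative sizes).
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, n_features, n_hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, n_hidden), nn.ReLU())
        self.decoder = nn.Linear(n_hidden, n_features)

    def forward(self, x):
        code = self.encoder(x)
        return self.decoder(code), code

model = SparseAutoencoder(n_features=200)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(32, 200)          # stand-in for a batch of EHR feature vectors
for _ in range(10):
    recon, code = model(x)
    loss = nn.functional.mse_loss(recon, x) + 1e-3 * code.abs().mean()  # sparsity penalty
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
# model.encoder(x) now yields the learned representation for a downstream classifier.
```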
Bayesian estimation for large scale multivariate Ornstein-Uhlenbeck model of brain connectivity
Title | Bayesian estimation for large scale multivariate Ornstein-Uhlenbeck model of brain connectivity |
Authors | Andrea Insabato, John P. Cunningham, Matthieu Gilson |
Abstract | Estimation of reliable whole-brain connectivity is a crucial step towards the use of connectivity information in quantitative approaches to the study of neuropsychiatric disorders. When estimating brain connectivity a challenge is imposed by the paucity of time samples and the large dimensionality of the measurements. Bayesian estimation methods for network models offer a number of advantages in this context but are not commonly employed. Here we compare three different estimation methods for the multivariate Ornstein-Uhlenbeck model, that has recently gained some popularity for characterizing whole-brain connectivity. We first show that a Bayesian estimation of model parameters assuming uniform priors is equivalent to an application of the method of moments. Then, using synthetic data, we show that the Bayesian estimate scales poorly with number of nodes in the network as compared to an iterative Lyapunov optimization. In particular when the network size is in the order of that used for whole-brain studies (about 100 nodes) the Bayesian method needs about eight times more time samples than Lyapunov method in order to achieve similar estimation accuracy. We also show that the higher estimation accuracy of Lyapunov method is reflected in a much better classification of individuals based on the estimated connectivity from a real dataset of BOLD fMRI. Finally we show that the poor accuracy of Bayesian method is due to numerical errors, when the imaginary part of the connectivity estimate gets large compared to its real part. |
Tasks | |
Published | 2018-05-25 |
URL | http://arxiv.org/abs/1805.10050v1 |
http://arxiv.org/pdf/1805.10050v1.pdf | |
PWC | https://paperswithcode.com/paper/bayesian-estimation-for-large-scale |
Repo | |
Framework | |
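The abstract notes that the uniform-prior Bayesian estimate coincides with the method of moments. The sketch below shows a moments-style drift estimate, assuming the standard multivariate OU relation between the equal-time and lagged covariances, Q(tau) = expm(J tau) Q(0); it is an illustration, not the paper's estimators. The imaginary part returned by the matrix logarithm is the kind of numerical error the abstract attributes the poor large-network accuracy to.

```python
# Moments-style drift estimate for a multivariate OU process (illustration only).
import numpy as np
from scipy.linalg import logm

def estimate_ou_drift(x, dt, lag=1):
    """x: (T, n) time series sampled every dt. Returns drift estimate and a diagnostic."""
    x = x - x.mean(axis=0)
    q0 = x[:-lag].T @ x[:-lag] / (len(x) - lag)
    qlag = x[lag:].T @ x[:-lag] / (len(x) - lag)
    j_hat = logm(qlag @ np.linalg.inv(q0)) / (lag * dt)
    return np.real(j_hat), float(np.abs(np.imag(j_hat)).max())

# Toy usage: simulate a 5-node OU process with Euler-Maruyama and recover its drift.
rng = np.random.default_rng(0)
n, dt, steps = 5, 0.1, 20000
j_true = -np.eye(n) + 0.2 * rng.standard_normal((n, n))
x = np.zeros((steps, n))
for t in range(1, steps):
    x[t] = x[t - 1] + dt * (j_true @ x[t - 1]) + np.sqrt(dt) * rng.standard_normal(n)
j_est, imag_err = estimate_ou_drift(x, dt)
print(np.abs(j_est - j_true).max(), imag_err)
```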
Evaluating historical text normalization systems: How well do they generalize?
Title | Evaluating historical text normalization systems: How well do they generalize? |
Authors | Alexander Robertson, Sharon Goldwater |
Abstract | We highlight several issues in the evaluation of historical text normalization systems that make it hard to tell how well these systems would actually work in practice—i.e., for new datasets or languages; in comparison to more naïve systems; or as a preprocessing step for downstream NLP tools. We illustrate these issues and exemplify our proposed evaluation practices by comparing two neural models against a naïve baseline system. We show that the neural models generalize well to unseen words in tests on five languages; nevertheless, they provide no clear benefit over the naïve baseline for downstream POS tagging of an English historical collection. We conclude that future work should include more rigorous evaluation, including both intrinsic and extrinsic measures where possible. |
Tasks | |
Published | 2018-04-07 |
URL | http://arxiv.org/abs/1804.02545v2 |
http://arxiv.org/pdf/1804.02545v2.pdf | |
PWC | https://paperswithcode.com/paper/evaluating-historical-text-normalization |
Repo | |
Framework | |
Crowd-Labeling Fashion Reviews with Quality Control
Title | Crowd-Labeling Fashion Reviews with Quality Control |
Authors | Iurii Chernushenko, Felix A. Gers, Alexander Löser, Alessandro Checco |
Abstract | We present a new methodology for high-quality labeling in the fashion domain with crowd workers instead of experts. We focus on the Aspect-Based Sentiment Analysis task. Our methods filter out inaccurate input from crowd workers but we preserve different worker labeling to capture the inherent high variability of the opinions. We demonstrate the quality of labeled data based on Facebook’s FastText framework as a baseline. |
Tasks | Aspect-Based Sentiment Analysis, Sentiment Analysis |
Published | 2018-04-05 |
URL | http://arxiv.org/abs/1805.09648v1 |
http://arxiv.org/pdf/1805.09648v1.pdf | |
PWC | https://paperswithcode.com/paper/crowd-labeling-fashion-reviews-with-quality |
Repo | |
Framework | |
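A toy sketch of the kind of quality control described above (not the authors' exact method): score each crowd worker against the per-item majority label and drop workers whose agreement falls below a threshold, while keeping all remaining labels so that legitimate disagreement between workers is preserved. Worker and item identifiers are placeholders.

```python
# Toy crowd-worker quality filter based on agreement with the per-item majority vote.
from collections import Counter, defaultdict

def filter_workers(labels, min_agreement=0.5):
    """labels: list of (item_id, worker_id, label). Returns labels from reliable workers."""
    by_item = defaultdict(list)
    for item, _, label in labels:
        by_item[item].append(label)
    majority = {item: Counter(votes).most_common(1)[0][0] for item, votes in by_item.items()}

    agreement = defaultdict(lambda: [0, 0])          # worker -> [matches, total]
    for item, worker, label in labels:
        agreement[worker][0] += int(label == majority[item])
        agreement[worker][1] += 1
    reliable = {w for w, (m, t) in agreement.items() if m / t >= min_agreement}
    return [row for row in labels if row[1] in reliable]

example = [(1, "w1", "positive"), (1, "w2", "positive"), (1, "w3", "negative"),
           (2, "w1", "negative"), (2, "w2", "negative"), (2, "w3", "positive")]
print(filter_workers(example))   # labels from w3 are filtered out
```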
Short-segment heart sound classification using an ensemble of deep convolutional neural networks
Title | Short-segment heart sound classification using an ensemble of deep convolutional neural networks |
Authors | Fuad Noman, Chee-Ming Ting, Sh-Hussain Salleh, Hernando Ombao |
Abstract | This paper proposes a framework based on deep convolutional neural networks (CNNs) for automatic heart sound classification using short-segments of individual heart beats. We design a 1D-CNN that directly learns features from raw heart-sound signals, and a 2D-CNN that takes inputs of two-dimensional time-frequency feature maps based on Mel-frequency cepstral coefficients (MFCC). We further develop a time-frequency CNN ensemble (TF-ECNN) combining the 1D-CNN and 2D-CNN based on score-level fusion of the class probabilities. On the large PhysioNet CinC challenge 2016 database, the proposed CNN models outperformed traditional classifiers based on support vector machine and hidden Markov models with various hand-crafted time- and frequency-domain features. Best classification scores with 89.22% accuracy and 89.94% sensitivity were achieved by the ECNN, and 91.55% specificity and 88.82% modified accuracy by the 2D-CNN alone on the test set. |
Tasks | |
Published | 2018-10-27 |
URL | http://arxiv.org/abs/1810.11573v1 |
http://arxiv.org/pdf/1810.11573v1.pdf | |
PWC | https://paperswithcode.com/paper/short-segment-heart-sound-classification |
Repo | |
Framework | |
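The score-level fusion step mentioned above amounts to combining the class-probability outputs of the two CNNs before taking the argmax. A minimal sketch follows; the equal weighting is an assumption for illustration.

```python
# Score-level fusion of 1D-CNN and 2D-CNN class probabilities (illustrative weights).
import numpy as np

def fuse_scores(probs_1d, probs_2d, w=0.5):
    """probs_*: (n_segments, n_classes) softmax outputs of the two CNNs."""
    fused = w * probs_1d + (1.0 - w) * probs_2d
    return fused.argmax(axis=1)

probs_1d = np.array([[0.7, 0.3], [0.4, 0.6]])
probs_2d = np.array([[0.6, 0.4], [0.2, 0.8]])
print(fuse_scores(probs_1d, probs_2d))   # -> [0 1]
```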
Learning Front-end Filter-bank Parameters using Convolutional Neural Networks for Abnormal Heart Sound Detection
Title | Learning Front-end Filter-bank Parameters using Convolutional Neural Networks for Abnormal Heart Sound Detection |
Authors | Ahmed Imtiaz Humayun, Shabnam Ghaffarzadegan, Zhe Feng, Taufiq Hasan |
Abstract | Automatic heart sound abnormality detection can play a vital role in the early diagnosis of heart diseases, particularly in low-resource settings. The state-of-the-art algorithms for this task utilize a set of Finite Impulse Response (FIR) band-pass filters as a front-end followed by a Convolutional Neural Network (CNN) model. In this work, we propound a novel CNN architecture that integrates the front-end bandpass filters within the network using time-convolution (tConv) layers, which enables the FIR filter-bank parameters to become learnable. Different initialization strategies for the learnable filters, including random parameters and a set of predefined FIR filter-bank coefficients, are examined. Using the proposed tConv layers, we add constraints to the learnable FIR filters to ensure linear and zero phase responses. Experimental evaluations are performed on a balanced 4-fold cross-validation task prepared using the PhysioNet/CinC 2016 dataset. Results demonstrate that the proposed models yield superior performance compared to the state-of-the-art system, while the linear phase FIR filterbank method provides an absolute improvement of 9.54% over the baseline in terms of an overall accuracy metric. |
Tasks | Anomaly Detection |
Published | 2018-06-15 |
URL | http://arxiv.org/abs/1806.05892v1 |
http://arxiv.org/pdf/1806.05892v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-front-end-filter-bank-parameters |
Repo | |
Framework | |
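Below is a PyTorch sketch of a learnable time-convolution (tConv) front end with a symmetric kernel, one standard way to enforce a linear-phase FIR response; it can be initialised randomly or from predefined FIR coefficients, as the abstract describes. This is an illustration of the mechanism, not the authors' code.

```python
# Learnable FIR front end as a time-convolution (tConv) layer with a symmetric,
# and therefore linear-phase, kernel (illustration only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearPhaseTConv(nn.Module):
    def __init__(self, kernel_size=61, init_coeffs=None):
        super().__init__()
        assert kernel_size % 2 == 1
        half = (kernel_size + 1) // 2
        if init_coeffs is not None:                    # e.g. predefined band-pass FIR taps
            init = torch.as_tensor(init_coeffs[:half], dtype=torch.float32)
        else:
            init = torch.randn(half) * 0.01            # random initialisation
        self.half_weight = nn.Parameter(init)
        self.pad = kernel_size // 2

    def forward(self, x):                              # x: (batch, 1, time)
        # Mirror the learnable half so the full kernel stays symmetric during training.
        kernel = torch.cat([self.half_weight, self.half_weight.flip(0)[1:]])
        return F.conv1d(x, kernel.view(1, 1, -1), padding=self.pad)

layer = LinearPhaseTConv()
print(layer(torch.randn(4, 1, 1000)).shape)            # -> torch.Size([4, 1, 1000])
```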
A kernel-based approach to molecular conformation analysis
Title | A kernel-based approach to molecular conformation analysis |
Authors | Stefan Klus, Andreas Bittracher, Ingmar Schuster, Christof Schütte |
Abstract | We present a novel machine learning approach to understanding conformation dynamics of biomolecules. The approach combines kernel-based techniques that are popular in the machine learning community with transfer operator theory for analyzing dynamical systems in order to identify conformation dynamics based on molecular dynamics simulation data. We show that many of the prominent methods like Markov State Models, EDMD, and TICA can be regarded as special cases of this approach and that new efficient algorithms can be constructed based on this derivation. The results of these new powerful methods will be illustrated with several examples, in particular the alanine dipeptide and the protein NTL9. |
Tasks | |
Published | 2018-09-28 |
URL | http://arxiv.org/abs/1809.11092v2 |
http://arxiv.org/pdf/1809.11092v2.pdf | |
PWC | https://paperswithcode.com/paper/a-kernel-based-approach-to-molecular |
Repo | |
Framework | |
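A compact numpy sketch of the kernel transfer-operator idea referenced above (kernel EDMD): build Gaussian-kernel Gram matrices between snapshots and their time-lagged counterparts, solve a regularised linear system, and read slow processes off the leading eigenvalues. The bandwidth and regularisation values are illustrative assumptions.

```python
# Kernel EDMD sketch: regularised transfer-operator estimate from Gram matrices.
import numpy as np

def gaussian_gram(a, b, bandwidth=1.0):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def kernel_edmd(x, y, bandwidth=1.0, reg=1e-6):
    """x: (m, d) snapshots, y: (m, d) time-lagged snapshots."""
    g_xx = gaussian_gram(x, x, bandwidth)
    g_xy = gaussian_gram(x, y, bandwidth)
    k = np.linalg.solve(g_xx + reg * np.eye(len(x)), g_xy)
    eigvals, eigvecs = np.linalg.eig(k)
    order = np.argsort(-np.abs(eigvals))
    return eigvals[order], eigvecs[:, order]

# Smoke test on a short toy trajectory (stand-in for molecular dynamics data).
rng = np.random.default_rng(0)
traj = np.cumsum(rng.standard_normal((501, 2)), axis=0) * 0.05
eigvals, _ = kernel_edmd(traj[:-1], traj[1:])
print(np.round(eigvals[:3], 3))
```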
An Ensemble of Transfer, Semi-supervised and Supervised Learning Methods for Pathological Heart Sound Classification
Title | An Ensemble of Transfer, Semi-supervised and Supervised Learning Methods for Pathological Heart Sound Classification |
Authors | Ahmed Imtiaz Humayun, Md. Tauhiduzzaman Khan, Shabnam Ghaffarzadegan, Zhe Feng, Taufiq Hasan |
Abstract | In this work, we propose an ensemble of classifiers to distinguish between various degrees of abnormalities of the heart using Phonocardiogram (PCG) signals acquired using digital stethoscopes in a clinical setting, for the INTERSPEECH 2018 Computational Paralinguistics (ComParE) Heart Beats SubChallenge. Our primary classification framework constitutes a convolutional neural network with 1D-CNN time-convolution (tConv) layers, which uses features transferred from a model trained on the 2016 Physionet Heart Sound Database. We also employ a Representation Learning (RL) approach to generate features in an unsupervised manner using Deep Recurrent Autoencoders and use Support Vector Machine (SVM) and Linear Discriminant Analysis (LDA) classifiers. Finally, we utilize an SVM classifier on a high-dimensional segment-level feature extracted using various functionals on short-term acoustic features, i.e., Low-Level Descriptors (LLD). An ensemble of the three different approaches provides a relative improvement of 11.13% compared to our best single sub-system in terms of the Unweighted Average Recall (UAR) performance metric on the evaluation dataset. |
Tasks | Representation Learning |
Published | 2018-06-18 |
URL | http://arxiv.org/abs/1806.06506v2 |
http://arxiv.org/pdf/1806.06506v2.pdf | |
PWC | https://paperswithcode.com/paper/an-ensemble-of-transfer-semi-supervised-and |
Repo | |
Framework | |
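The challenge metric quoted above, Unweighted Average Recall (UAR), is the mean of the per-class recalls, i.e. macro-averaged recall, and the quoted gain is relative improvement over the best single sub-system. A quick sketch with toy labels:

```python
# UAR (macro-averaged recall) and relative improvement on toy predictions.
from sklearn.metrics import recall_score

y_true = [0, 0, 1, 1, 2, 2]
y_single = [0, 1, 1, 1, 2, 0]           # hypothetical best single sub-system
y_ensemble = [0, 0, 1, 1, 2, 0]         # hypothetical ensemble output

uar_single = recall_score(y_true, y_single, average="macro")
uar_ensemble = recall_score(y_true, y_ensemble, average="macro")
print(uar_single, uar_ensemble)
print("relative improvement:", (uar_ensemble - uar_single) / uar_single)
```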
Event-triggered Learning for Resource-efficient Networked Control
Title | Event-triggered Learning for Resource-efficient Networked Control |
Authors | Friedrich Solowjow, Dominik Baumann, Jochen Garcke, Sebastian Trimpe |
Abstract | Common event-triggered state estimation (ETSE) algorithms save communication in networked control systems by predicting agents’ behavior, and transmitting updates only when the predictions deviate significantly. The effectiveness in reducing communication thus heavily depends on the quality of the dynamics models used to predict the agents’ states or measurements. Event-triggered learning is proposed herein as a novel concept to further reduce communication: whenever poor communication performance is detected, an identification experiment is triggered and an improved prediction model learned from data. Effective learning triggers are obtained by comparing the actual communication rate with the one that is expected based on the current model. By analyzing statistical properties of the inter-communication times and leveraging powerful convergence results, the proposed trigger is proven to limit learning experiments to the necessary instants. Numerical and physical experiments demonstrate that event-triggered learning improves robustness toward changing environments and yields lower communication rates than common ETSE. |
Tasks | |
Published | 2018-03-05 |
URL | http://arxiv.org/abs/1803.01802v2 |
http://arxiv.org/pdf/1803.01802v2.pdf | |
PWC | https://paperswithcode.com/paper/event-triggered-learning-for-resource |
Repo | |
Framework | |
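A toy sketch of the learning trigger described above: compare the communication rate observed over a window with the rate expected under the current prediction model, and trigger a new identification experiment when the gap exceeds a confidence bound. The Hoeffding-style threshold is an assumption made for illustration, not the paper's exact test statistic.

```python
# Event-triggered learning: fire when observed and model-expected communication
# rates diverge beyond a simple confidence bound (illustration only).
import math

def learning_trigger(comm_events, expected_rate, delta=0.01):
    """comm_events: list of 0/1 flags (1 = an update was transmitted that step)."""
    n = len(comm_events)
    observed_rate = sum(comm_events) / n
    bound = math.sqrt(math.log(2.0 / delta) / (2.0 * n))   # Hoeffding-style deviation bound
    return abs(observed_rate - expected_rate) > bound

# If the model predicts updates on 10% of steps but 30% were actually sent, re-learn.
print(learning_trigger([1] * 30 + [0] * 70, expected_rate=0.1))   # -> True
print(learning_trigger([1] * 12 + [0] * 88, expected_rate=0.1))   # -> False
```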
Sarcasm Analysis using Conversation Context
Title | Sarcasm Analysis using Conversation Context |
Authors | Debanjan Ghosh, Alexander R. Fabbri, Smaranda Muresan |
Abstract | Computational models for sarcasm detection have often relied on the content of utterances in isolation. However, the speaker’s sarcastic intent is not always apparent without additional context. Focusing on social media discussions, we investigate three issues: (1) does modeling conversation context help in sarcasm detection; (2) can we identify what part of conversation context triggered the sarcastic reply; and (3) given a sarcastic post that contains multiple sentences, can we identify the specific sentence that is sarcastic. To address the first issue, we investigate several types of Long Short-Term Memory (LSTM) networks that can model both the conversation context and the current turn. We show that LSTM networks with sentence-level attention on context and current turn, as well as the conditional LSTM network (Rocktaschel et al. 2016), outperform the LSTM model that reads only the current turn. As conversation context, we consider the prior turn, the succeeding turn or both. Our computational models are tested on two types of social media platforms: Twitter and discussion forums. We discuss several differences between these datasets ranging from their size to the nature of the gold-label annotations. To address the last two issues, we present a qualitative analysis of attention weights produced by the LSTM models (with attention) and discuss the results compared with human performance on the two tasks. |
Tasks | Sarcasm Detection |
Published | 2018-08-22 |
URL | http://arxiv.org/abs/1808.07531v2 |
http://arxiv.org/pdf/1808.07531v2.pdf | |
PWC | https://paperswithcode.com/paper/sarcasm-analysis-using-conversation-context |
Repo | |
Framework | |
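A compact PyTorch sketch of the kind of model compared above: one LSTM encodes the current turn, attention over the context LSTM states summarises the conversation context (word-level here, as a simplification of the paper's sentence-level attention), and both summaries feed a sarcastic/not-sarcastic classifier. Dimensions are illustrative; this is not the authors' implementation.

```python
# Context-plus-current-turn LSTM with attention for sarcasm detection (sketch).
import torch
import torch.nn as nn

class ContextAttentionSarcasm(nn.Module):
    def __init__(self, vocab_size=10000, emb=100, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.context_lstm = nn.LSTM(emb, hidden, batch_first=True)
        self.turn_lstm = nn.LSTM(emb, hidden, batch_first=True)
        self.attn = nn.Linear(hidden, 1)
        self.out = nn.Linear(2 * hidden, 2)

    def forward(self, context_tokens, turn_tokens):
        ctx, _ = self.context_lstm(self.embed(context_tokens))      # (B, Tc, H)
        weights = torch.softmax(self.attn(ctx).squeeze(-1), dim=1)  # attention over context
        ctx_summary = (weights.unsqueeze(-1) * ctx).sum(dim=1)      # (B, H)
        _, (turn_summary, _) = self.turn_lstm(self.embed(turn_tokens))
        features = torch.cat([ctx_summary, turn_summary[-1]], dim=-1)
        return self.out(features)

model = ContextAttentionSarcasm()
logits = model(torch.randint(0, 10000, (4, 20)), torch.randint(0, 10000, (4, 15)))
print(logits.shape)   # -> torch.Size([4, 2])
```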
The meaning of “most” for visual question answering models
Title | The meaning of “most” for visual question answering models |
Authors | Alexander Kuhnle, Ann Copestake |
Abstract | The correct interpretation of quantifier statements in the context of a visual scene requires non-trivial inference mechanisms. For the example of “most”, we discuss two strategies which rely on fundamentally different cognitive concepts. Our aim is to identify what strategy deep learning models for visual question answering learn when trained on such questions. To this end, we carefully design data to replicate experiments from psycholinguistics where the same question was investigated for humans. Focusing on the FiLM visual question answering model, our experiments indicate that a form of approximate number system emerges whose performance declines with more difficult scenes as predicted by Weber’s law. Moreover, we identify confounding factors, like spatial arrangement of the scene, which impede the effectiveness of this system. |
Tasks | Question Answering, Visual Question Answering |
Published | 2018-12-31 |
URL | https://arxiv.org/abs/1812.11737v2 |
https://arxiv.org/pdf/1812.11737v2.pdf | |
PWC | https://paperswithcode.com/paper/the-meaning-of-most-for-visual-question |
Repo | |
Framework | |
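A small simulation of the approximate number system the abstract refers to: each count is represented with Gaussian noise proportional to its magnitude (scalar variability), so accuracy on a "most" judgement depends on the ratio of the two set sizes, as Weber's law predicts. The Weber fraction used here is an illustrative assumption, not a fitted value from the paper.

```python
# Approximate-number-system simulation: accuracy on "most" degrades as the ratio
# of the two counts approaches 1 (scalar variability / Weber's law).
import numpy as np

def most_accuracy(n_target, n_other, weber=0.2, trials=100_000, seed=0):
    rng = np.random.default_rng(seed)
    est_target = rng.normal(n_target, weber * n_target, trials)
    est_other = rng.normal(n_other, weber * n_other, trials)
    return float((est_target > est_other).mean())

for n_other in (8, 10, 12, 14):
    print(f"16 vs {n_other}: accuracy {most_accuracy(16, n_other):.3f}")
```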
Composable Planning with Attributes
Title | Composable Planning with Attributes |
Authors | Amy Zhang, Adam Lerer, Sainbayar Sukhbaatar, Rob Fergus, Arthur Szlam |
Abstract | The tasks that an agent will need to solve often are not known during training. However, if the agent knows which properties of the environment are important then, after learning how its actions affect those properties, it may be able to use this knowledge to solve complex tasks without training specifically for them. Towards this end, we consider a setup in which an environment is augmented with a set of user defined attributes that parameterize the features of interest. We propose a method that learns a policy for transitioning between “nearby” sets of attributes, and maintains a graph of possible transitions. Given a task at test time that can be expressed in terms of a target set of attributes, and a current state, our model infers the attributes of the current state and searches over paths through attribute space to get a high level plan, and then uses its low level policy to execute the plan. We show in 3D block stacking, grid-world games, and StarCraft that our model is able to generalize to longer, more complex tasks at test time by composing simpler learned policies. |
Tasks | Starcraft |
Published | 2018-03-01 |
URL | http://arxiv.org/abs/1803.00512v2 |
http://arxiv.org/pdf/1803.00512v2.pdf | |
PWC | https://paperswithcode.com/paper/composable-planning-with-attributes |
Repo | |
Framework | |
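A toy sketch of the planning loop described above: breadth-first search over a graph of attribute sets yields a high-level plan, and each edge of the plan is handed to a learned low-level policy for execution. The graph and the block-stacking attributes here are placeholders, not the paper's environments.

```python
# High-level planning over an attribute-transition graph; each edge would be executed
# by a learned low-level policy (placeholder graph and attributes).
from collections import deque

def plan(graph, start, goal):
    """Shortest attribute-space path via BFS; graph maps attribute set -> neighbours."""
    queue, parents = deque([start]), {start: None}
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None

graph = {
    frozenset({"red_on_table"}): [frozenset({"red_on_blue"})],
    frozenset({"red_on_blue"}): [frozenset({"red_on_blue", "green_on_red"})],
}
path = plan(graph, frozenset({"red_on_table"}), frozenset({"red_on_blue", "green_on_red"}))
for a, b in zip(path, path[1:]):
    print("execute low-level policy for transition:", sorted(a), "->", sorted(b))
```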