Paper Group ANR 768
Optimizing Human Learning
Title | Optimizing Human Learning |
Authors | Behzad Tabibian, Utkarsh Upadhyay, Abir De, Ali Zarezade, Bernhard Schoelkopf, Manuel Gomez-Rodriguez |
Abstract | Spaced repetition is a technique for efficient memorization which uses repeated, spaced review of content to improve long-term retention. Can we find the optimal reviewing schedule to maximize the benefits of spaced repetition? In this paper, we introduce a novel, flexible representation of spaced repetition using the framework of marked temporal point processes and then address the above question as an optimal control problem for stochastic differential equations with jumps. For two well-known human memory models, we show that the optimal reviewing schedule is given by the recall probability of the content to be learned. As a result, we can then develop a simple, scalable online algorithm, Memorize, to sample the optimal reviewing times. Experiments on both synthetic and real data gathered from Duolingo, a popular language-learning online platform, show that our algorithm may be able to help learners memorize more effectively than alternatives. |
Tasks | Point Processes |
Published | 2017-12-05 |
URL | http://arxiv.org/abs/1712.01856v2 |
http://arxiv.org/pdf/1712.01856v2.pdf | |
PWC | https://paperswithcode.com/paper/optimizing-human-learning |
Repo | |
Framework | |
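The control result above has an intuitive reading: review an item with intensity proportional to how likely it is to have been forgotten. Below is a minimal Python sketch of that sampling rule under the exponential-forgetting memory model, using standard thinning for point processes; the `1/sqrt(q)` scaling mirrors the paper's cost trade-off, but the parameter names and horizon are illustrative assumptions, not taken from the paper's code.

```python
import math
import random

def sample_review_time(n, t_last, q=1.0, t_max=30.0):
    """Sample the next reviewing time for one item under the intensity
    u(t) = (1/sqrt(q)) * (1 - m(t)), with recall probability
    m(t) = exp(-n * (t - t_last)), via thinning (rejection sampling).
    n: current forgetting rate; q: review-cost trade-off (assumed names)."""
    lam_max = 1.0 / math.sqrt(q)          # constant upper bound on u(t)
    t = t_last
    while t < t_max:
        t += random.expovariate(lam_max)  # candidate from the bounding process
        recall_prob = math.exp(-n * (t - t_last))
        if random.random() < 1.0 - recall_prob:   # accept w.p. u(t)/lam_max
            return t
    return None  # recall stayed high enough that no review fell in the horizon
```

Right after a review the recall probability is close to 1, so the intensity is near zero and reviews are automatically spaced out; as memory decays, review times arrive faster.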
Modeling Multimodal Clues in a Hybrid Deep Learning Framework for Video Classification
Title | Modeling Multimodal Clues in a Hybrid Deep Learning Framework for Video Classification |
Authors | Yu-Gang Jiang, Zuxuan Wu, Jinhui Tang, Zechao Li, Xiangyang Xue, Shih-Fu Chang |
Abstract | Videos are inherently multimodal. This paper studies the problem of how to fully exploit the abundant multimodal clues for improved video categorization. We introduce a hybrid deep learning framework that integrates useful clues from multiple modalities, including static spatial appearance information, motion patterns within a short time window, audio information as well as long-range temporal dynamics. More specifically, we utilize three Convolutional Neural Networks (CNNs) operating on appearance, motion and audio signals to extract their corresponding features. We then employ a feature fusion network to derive a unified representation with an aim to capture the relationships among features. Furthermore, to exploit the long-range temporal dynamics in videos, we apply two Long Short-Term Memory networks with extracted appearance and motion features as inputs. Finally, we also propose to refine the prediction scores by leveraging contextual relationships among video semantics. The hybrid deep learning framework is able to exploit a comprehensive set of multimodal features for video classification. Through an extensive set of experiments, we demonstrate that (1) LSTM networks which model sequences in an explicitly recurrent manner are highly complementary with CNN models; (2) the feature fusion network which produces a fused representation through modeling feature relationships outperforms alternative fusion strategies; (3) the semantic context of video classes can help further refine the predictions for improved performance. Experimental results on two challenging benchmarks, the UCF-101 and the Columbia Consumer Videos (CCV), provide strong quantitative evidence that our framework achieves promising results: 93.1% on the UCF-101 and 84.5% on the CCV, outperforming competing methods with clear margins. |
Tasks | Video Classification |
Published | 2017-06-14 |
URL | http://arxiv.org/abs/1706.04508v1 |
http://arxiv.org/pdf/1706.04508v1.pdf | |
PWC | https://paperswithcode.com/paper/modeling-multimodal-clues-in-a-hybrid-deep |
Repo | |
Framework | |
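To make the fusion stage concrete, here is a toy PyTorch sketch of a feature fusion network over the three CNN feature streams; the layer sizes, depth, and class count are placeholders rather than the paper's architecture, and the LSTM branches and contextual refinement are omitted.

```python
import torch
import torch.nn as nn

class FusionNet(nn.Module):
    """Illustrative fusion network: maps concatenated appearance,
    motion, and audio features to a joint representation, then to
    class scores.  All dimensions are assumptions."""
    def __init__(self, dims=(2048, 2048, 128), hidden=1024, n_classes=101):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(sum(dims), hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.classify = nn.Linear(hidden, n_classes)

    def forward(self, appearance, motion, audio):
        # model cross-feature relationships in a shared hidden space
        joint = self.fuse(torch.cat([appearance, motion, audio], dim=-1))
        return self.classify(joint)
```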
Fast Predictive Simple Geodesic Regression
Title | Fast Predictive Simple Geodesic Regression |
Authors | Zhipeng Ding, Greg Fleishman, Xiao Yang, Paul Thompson, Roland Kwitt, Marc Niethammer |
Abstract | Deformable image registration and regression are important tasks in medical image analysis. However, they are computationally expensive, especially when analyzing large-scale datasets that contain thousands of images. Hence, cluster computing is typically used, making the approaches dependent on such computational infrastructure. Even larger computational resources are required as study sizes increase. This limits the use of deformable image registration and regression for clinical applications and as component algorithms for other image analysis approaches. We therefore propose using a fast predictive approach to perform image registrations. In particular, we employ these fast registration predictions to approximate a simplified geodesic regression model to capture longitudinal brain changes. The resulting method is orders of magnitude faster than the standard optimization-based regression model and hence facilitates large-scale analysis on a single graphics processing unit (GPU). We evaluate our results on 3D brain magnetic resonance images (MRI) from the ADNI datasets. |
Tasks | Image Registration |
Published | 2017-11-15 |
URL | http://arxiv.org/abs/1711.05766v1 |
http://arxiv.org/pdf/1711.05766v1.pdf | |
PWC | https://paperswithcode.com/paper/fast-predictive-simple-geodesic-regression |
Repo | |
Framework | |
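A sketch of the "simple" regression step: given fast-predicted initial momenta m_i that register the baseline image to each follow-up scan at time t_i, the geodesic slope admits a closed-form least-squares estimate through the baseline. This is an illustrative rendering under that reading of the method; variable names are assumptions.

```python
import numpy as np

def simple_geodesic_slope(momenta, times, t0):
    """Estimate the geodesic regression slope from per-timepoint
    initial momenta via 1-D least squares through the baseline:
    slope = sum_i (t_i - t0) m_i / sum_i (t_i - t0)^2."""
    dt = np.asarray(times, dtype=float) - t0
    M = np.stack([np.ravel(m) for m in momenta])   # (n_scans, n_voxels)
    w = dt / np.sum(dt ** 2)                       # closed-form LS weights
    return w @ M                                   # weighted sum of momenta
```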
Improving LBP and its variants using anisotropic diffusion
Title | Improving LBP and its variants using anisotropic diffusion |
Authors | Mariane B. Neiva, Patrick Guidotti, Odemir M. Bruno |
Abstract | The main purpose of this paper is to propose a new preprocessing step in order to improve local feature descriptors and texture classification. Preprocessing is implemented by using transformations which help highlight salient features that play a significant role in texture recognition. We evaluate and compare four competing methods: three anisotropic diffusion methods (the classical Perona-Malik diffusion and two subsequent regularizations of it) and the application of a Gaussian kernel, the classical multiscale approach in texture analysis. The combination of the transformed images and the original ones is analyzed. The results show that the use of the preprocessing step does lead to improved texture recognition. |
Tasks | Texture Classification |
Published | 2017-03-13 |
URL | http://arxiv.org/abs/1703.04418v1 |
http://arxiv.org/pdf/1703.04418v1.pdf | |
PWC | https://paperswithcode.com/paper/improving-lbp-and-its-variants-using |
Repo | |
Framework | |
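For concreteness, a minimal explicit scheme for the classical Perona-Malik diffusion, one of the preprocessing transforms compared; the step size, iteration count, and exponential edge-stopping function are common defaults, not necessarily the paper's settings.

```python
import numpy as np

def perona_malik(img, n_iter=10, kappa=15.0, dt=0.15):
    """Classical Perona-Malik anisotropic diffusion (explicit
    4-neighbour scheme) as an LBP preprocessing transform."""
    def g(d):                              # exponential edge-stopping function
        return np.exp(-(d / kappa) ** 2)
    u = img.astype(float)
    for _ in range(n_iter):
        dn = np.roll(u, -1, axis=0) - u    # differences to the 4 neighbours
        ds = np.roll(u, 1, axis=0) - u
        de = np.roll(u, -1, axis=1) - u
        dw = np.roll(u, 1, axis=1) - u
        u = u + dt * (g(dn) * dn + g(ds) * ds + g(de) * de + g(dw) * dw)
    return u
```

The diffused image (and, as the paper suggests, its combination with the original) is then passed to LBP or one of its variants for feature extraction.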
Visual Explanations from Hadamard Product in Multimodal Deep Networks
Title | Visual Explanations from Hadamard Product in Multimodal Deep Networks |
Authors | Jin-Hwa Kim, Byoung-Tak Zhang |
Abstract | The visual explanation of a model's learned representation helps to understand the fundamentals of learning. Previous works on attentional models visualized the attended regions over an image or text using learned weights, to confirm the intended mechanism. Kim et al. (2016) show that the Hadamard product in multimodal deep networks, which is well-known as a joint function for visual question answering tasks, implicitly performs an attentional mechanism for visual inputs. In this work, we extend their work to show, using a proposed gradient-based visualization technique, that the Hadamard product in multimodal deep networks performs an attentional mechanism not only for visual inputs but also, simultaneously, for textual inputs. The attentional effect of the Hadamard product is visualized for both visual and textual inputs by analyzing the two inputs and the output of the Hadamard product with the proposed method, and is compared with the learned attentional weights of a visual question answering model. |
Tasks | Question Answering, Visual Question Answering |
Published | 2017-12-18 |
URL | http://arxiv.org/abs/1712.06228v1 |
http://arxiv.org/pdf/1712.06228v1.pdf | |
PWC | https://paperswithcode.com/paper/visual-explanations-from-hadamard-product-in |
Repo | |
Framework | |
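A small PyTorch sketch of the gradient-based idea: score each position of the visual and textual inputs by how strongly it moves the Hadamard-fused representation. The squared-norm backward target used here is a stand-in for the paper's actual visualization target, so treat this as an assumption-laden illustration.

```python
import torch

def hadamard_saliency(v_feat, q_feat):
    """Gradient-based saliency for a Hadamard fusion J = v * q.
    Assumes v_feat and q_feat were already projected to the same
    shape (positions, dim); the squared-norm target is an assumption
    standing in for the paper's visualization target."""
    v = v_feat.detach().clone().requires_grad_(True)
    q = q_feat.detach().clone().requires_grad_(True)
    joint = v * q                       # elementwise (Hadamard) fusion
    joint.pow(2).sum().backward()       # scalar target -> input gradients
    # per-position attention-like scores for each modality
    return v.grad.abs().sum(-1), q.grad.abs().sum(-1)
```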
KNN Ensembles for Tweedie Regression: The Power of Multiscale Neighborhoods
Title | KNN Ensembles for Tweedie Regression: The Power of Multiscale Neighborhoods |
Authors | Colleen M. Farrelly |
Abstract | Very few K-nearest-neighbor (KNN) ensembles exist, despite the efficacy of this approach in regression, classification, and outlier detection. Those that do exist focus on bagging features, rather than varying k or bagging observations; it is unknown whether varying k or bagging observations can improve prediction. Given recent studies from topological data analysis, varying k may function like multiscale topological methods, providing stability and better prediction, as well as increased ensemble diversity. This paper explores 7 KNN ensemble algorithms combining bagged features, bagged observations, and varied k to understand how each of these contribute to model fit. Specifically, these algorithms are tested on Tweedie regression problems through simulations and 6 real datasets; results are compared to state-of-the-art machine learning models including extreme learning machines, random forest, boosted regression, and Morse-Smale regression. Results on simulations suggest gains from varying k above and beyond bagging features or samples, as well as the robustness of KNN ensembles to the curse of dimensionality. KNN regression ensembles perform favorably against state-of-the-art algorithms and dramatically improve performance over KNN regression. Further, real dataset results suggest varying k is a good strategy in general (particularly for difficult Tweedie regression problems) and that KNN regression ensembles often outperform state-of-the-art methods. These results for k-varying ensembles echo recent theoretical results in topological data analysis, where multidimensional filter functions and multiscale coverings provide stability and performance gains over single-dimensional filters and single-scale covering. This opens up the possibility of leveraging multiscale neighborhoods and multiple measures of local geometry in ensemble methods. |
Tasks | Outlier Detection, Topological Data Analysis |
Published | 2017-07-29 |
URL | http://arxiv.org/abs/1708.02122v1 |
http://arxiv.org/pdf/1708.02122v1.pdf | |
PWC | https://paperswithcode.com/paper/knn-ensembles-for-tweedie-regression-the |
Repo | |
Framework | |
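A compact scikit-learn sketch of one of the explored combinations, bagged observations with varied k, averaged across members; the k grid and bag count are arbitrary choices here, and the paper's seven variants differ in which of the three ingredients (bagged features, bagged observations, varied k) they mix.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.utils import resample

def multiscale_knn_ensemble(X, y, X_test, ks=(1, 3, 5, 9, 15), n_bags=10, seed=0):
    """Average KNN predictions over bootstrap samples and multiple
    neighborhood scales k (a 'multiscale neighborhoods' ensemble)."""
    rng = np.random.RandomState(seed)
    preds = []
    for _ in range(n_bags):
        Xb, yb = resample(X, y, random_state=rng)   # bag observations
        for k in ks:                                # vary the scale k
            model = KNeighborsRegressor(n_neighbors=k).fit(Xb, yb)
            preds.append(model.predict(X_test))
    return np.mean(preds, axis=0)
```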
EEG Representation Using Multi-instance Framework on The Manifold of Symmetric Positive Definite Matrices for EEG-based Computer Aided Diagnosis
Title | EEG Representation Using Multi-instance Framework on The Manifold of Symmetric Positive Definite Matrices for EEG-based Computer Aided Diagnosis |
Authors | Khadijeh Sadatnejad, Saeed S. Ghidary, Reza Rostami, Reza Kazemi |
Abstract | The generalization and robustness of an electroencephalogram (EEG)-based computer aided diagnostic system are crucial requirements in actual clinical practice. To reach these goals, we propose a new EEG representation that provides a more realistic view of brain functionality by applying a multi-instance (MI) framework to account for the non-stationarity of the EEG signal. The non-stationary characteristic of EEG is handled by describing the signal as a bag of relevant and irrelevant concepts. The concepts are provided by a robust representation of homogeneous segments of the EEG signal using spatial covariance matrices. Due to the nonlinear geometry of the space of covariance matrices, we determine the boundaries of the homogeneous segments based on adaptive segmentation of the signal in a Riemannian framework. Each subject is described as a bag of covariance matrices of homogeneous segments, and the bag-level discriminative information is used for classification. To evaluate the performance of the proposed approach, we examine it in attention deficit hyperactivity disorder/bipolar mood disorder detection and depression/normal diagnosis applications. Experimental results confirm the superiority of the proposed approach, which is gained due to the robustness of the covariance descriptor, the effectiveness of Riemannian geometry, and the benefits of considering the inherent non-stationary nature of the brain. |
Tasks | EEG |
Published | 2017-02-08 |
URL | http://arxiv.org/abs/1702.02655v1 |
http://arxiv.org/pdf/1702.02655v1.pdf | |
PWC | https://paperswithcode.com/paper/eeg-representation-using-multi-instance |
Repo | |
Framework | |
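The Riemannian ingredient can be made concrete: each homogeneous segment is summarized by its spatial covariance matrix, and segments are compared with the affine-invariant Riemannian metric on SPD matrices. A minimal sketch of those two pieces (the adaptive segmentation and the MI bag classifier are omitted):

```python
import numpy as np
from scipy.linalg import eigvalsh

def spatial_covariance(segment):
    """segment: (n_channels, n_samples) EEG window -> SPD covariance."""
    return np.cov(segment)

def airm_distance(C1, C2):
    """Affine-invariant Riemannian distance between SPD matrices:
    d(C1, C2) = sqrt(sum_k log(lambda_k)^2), where lambda_k are the
    generalized eigenvalues of (C2, C1), i.e. eig(C1^{-1} C2)."""
    lam = eigvalsh(C2, C1)
    return np.sqrt(np.sum(np.log(lam) ** 2))
```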
Variational Reflectance Estimation from Multi-view Images
Title | Variational Reflectance Estimation from Multi-view Images |
Authors | Jean Mélou, Yvain Quéau, Jean-Denis Durou, Fabien Castan, Daniel Cremers |
Abstract | We tackle the problem of reflectance estimation from a set of multi-view images, assuming known geometry. The approach we put forward turns the input images into reflectance maps, through a robust variational method. The variational model comprises an image-driven fidelity term and a term which enforces consistency of the reflectance estimates with respect to each view. If illumination is fixed across the views, then reflectance estimation remains under-constrained: a regularization term, which ensures piecewise-smoothness of the reflectance, is thus used. Reflectance is parameterized in the image domain, rather than on the surface, which makes the numerical solution much easier, by resorting to an alternating majorization-minimization approach. Experiments on both synthetic and real datasets are carried out to validate the proposed strategy. |
Tasks | |
Published | 2017-09-25 |
URL | http://arxiv.org/abs/1709.08378v2 |
http://arxiv.org/pdf/1709.08378v2.pdf | |
PWC | https://paperswithcode.com/paper/variational-reflectance-estimation-from-multi |
Repo | |
Framework | |
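In schematic form, the energy minimized combines a robust multi-view fidelity term with a total-variation-style smoothness term over the reflectance map; the notation below is illustrative rather than copied from the paper:

```latex
\min_{\rho}\; \sum_{i=1}^{n} \int_{\Omega} \phi\!\left(\rho(p) - \widetilde{\rho}_i(p)\right) \mathrm{d}p
\;+\; \mu \int_{\Omega} \left\lVert \nabla \rho(p) \right\rVert \mathrm{d}p
```

Here \rho is the reflectance parameterized in the image domain \Omega, \widetilde{\rho}_i is the reflectance estimate induced by view i, \phi is a robust penalty enforcing consistency across views, and \mu weights the piecewise-smoothness regularizer needed when illumination is fixed across views; the alternating majorization-minimization scheme handles the non-smooth terms.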
Boosting with Structural Sparsity: A Differential Inclusion Approach
Title | Boosting with Structural Sparsity: A Differential Inclusion Approach |
Authors | Chendi Huang, Xinwei Sun, Jiechao Xiong, Yuan Yao |
Abstract | Boosting, viewed as a gradient descent algorithm, is a popular method in machine learning. In this paper a novel Boosting-type algorithm is proposed based on restricted gradient descent with structural sparsity control, whose underlying dynamics are governed by differential inclusions. In particular, we present an iterative regularization path with structural sparsity where the parameter is sparse under some linear transforms, based on variable splitting and the Linearized Bregman Iteration. Hence it is called Split LBI. Despite its simplicity, Split LBI outperforms the popular generalized Lasso in both theory and experiments. A theory of path consistency is presented, showing that, equipped with proper early stopping, Split LBI may achieve model selection consistency under a family of Irrepresentable Conditions which can be weaker than the necessary and sufficient condition for generalized Lasso. Furthermore, some $\ell_2$ error bounds are also given at the minimax optimal rates. The utility and benefit of the algorithm are illustrated by several applications, including image denoising, partial order ranking of sports teams, and world university grouping with crowdsourced ranking data. |
Tasks | Denoising, Image Denoising, Model Selection |
Published | 2017-04-16 |
URL | http://arxiv.org/abs/1704.04833v1 |
http://arxiv.org/pdf/1704.04833v1.pdf | |
PWC | https://paperswithcode.com/paper/boosting-with-structural-sparsity-a |
Repo | |
Framework | |
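A compact numpy sketch of the Split LBI iterations for a least-squares loss, following the variable-splitting description above: minimize L(theta) + (1/2nu)||D theta - gamma||^2 and run the Linearized Bregman Iteration on gamma. Step sizes and the specific loss are illustrative assumptions.

```python
import numpy as np

def split_lbi(X, y, D, alpha=1e-3, kappa=10.0, nu=1.0, n_iter=2000):
    """Sketch of Split LBI: gradient descent on theta, LBI dynamics on
    the split variable gamma that carries the structural sparsity."""
    n, p = X.shape
    theta = np.zeros(p)
    gamma = np.zeros(D.shape[0])
    z = np.zeros(D.shape[0])
    shrink = lambda v: np.sign(v) * np.maximum(np.abs(v) - 1.0, 0.0)
    path = []
    for _ in range(n_iter):
        grad_theta = X.T @ (X @ theta - y) / n + D.T @ (D @ theta - gamma) / nu
        grad_gamma = (gamma - D @ theta) / nu
        theta -= kappa * alpha * grad_theta
        z -= alpha * grad_gamma          # Bregman (dual) variable update
        gamma = kappa * shrink(z)        # soft-thresholding keeps gamma sparse
        path.append(theta.copy())
    return np.array(path)                # regularization path over iterations
```

Model selection then amounts to early stopping along the returned path.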
Modeling Epistemological Principles for Bias Mitigation in AI Systems: An Illustration in Hiring Decisions
Title | Modeling Epistemological Principles for Bias Mitigation in AI Systems: An Illustration in Hiring Decisions |
Authors | Marisa Vasconcelos, Carlos Cardonha, Bernardo Gonçalves |
Abstract | Artificial Intelligence (AI) has been used extensively in automatic decision making in a broad variety of scenarios, ranging from credit ratings for loans to recommendations of movies. Traditional design guidelines for AI models focus essentially on accuracy maximization, but recent work has shown that economically irrational and socially unacceptable scenarios of discrimination and unfairness are likely to arise unless these issues are explicitly addressed. This undesirable behavior has several possible sources, such as biased training datasets whose effects may go undetected in black-box models. After pointing out connections between such AI bias and the problem of induction, we focus on Popper's contributions after Hume, which offer a logical theory of preferences. An AI model can be preferred over others on purely rational grounds after one or more attempts at refutation based on accuracy and fairness. Inspired by such epistemological principles, this paper proposes a structured approach to mitigate discrimination and unfairness caused by bias in AI systems. In the proposed computational framework, models are selected and enhanced after attempts at refutation. To illustrate our discussion, we focus on hiring decision scenarios where an AI system screens which job applicants should go to the interview phase. |
Tasks | Decision Making |
Published | 2017-11-20 |
URL | http://arxiv.org/abs/1711.07111v1 |
http://arxiv.org/pdf/1711.07111v1.pdf | |
PWC | https://paperswithcode.com/paper/modeling-epistemological-principles-for-bias |
Repo | |
Framework | |
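A toy rendering of the refutation loop, just to fix ideas: a model is preferred only if it survives every attempted refutation, here supplied as accuracy and fairness predicates. All names are hypothetical.

```python
def select_by_refutation(candidates, refutation_tests):
    """Popperian selection sketch: keep only the models that survive
    every attempted refutation.  `candidates` maps model name to a
    fitted model; each test is a callable returning True if the model
    withstands that refutation attempt (e.g. an accuracy floor or a
    fairness check on hiring outcomes)."""
    survivors = {name: model for name, model in candidates.items()
                 if all(test(model) for test in refutation_tests)}
    return survivors   # enhanced/retrained models would re-enter the loop
```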
DeepSolarEye: Power Loss Prediction and Weakly Supervised Soiling Localization via Fully Convolutional Networks for Solar Panels
Title | DeepSolarEye: Power Loss Prediction and Weakly Supervised Soiling Localization via Fully Convolutional Networks for Solar Panels |
Authors | Sachin Mehta, Amar P. Azad, Saneem A. Chemmengath, Vikas Raykar, Shivkumar Kalyanaraman |
Abstract | The impact of soiling on solar panels is an important and well-studied problem in the renewable energy sector. In this paper, we present the first convolutional neural network (CNN) based approach for solar panel soiling and defect analysis. Our approach takes an RGB image of a solar panel and environmental factors as inputs to predict power loss, soiling localization, and soiling type. In computer vision, localization is a complex task which typically requires manually labeled training data such as bounding boxes or segmentation masks. Our proposed approach consists of four specialized stages which completely avoid localization ground truth and only need panel images with power loss labels for training. The regions of impact obtained from the predicted localization masks are classified into soiling types using webly supervised learning. For improving the localization capabilities of CNNs, we introduce a novel bi-directional input-aware fusion (BiDIAF) block that reinforces the input at different levels of the CNN to learn input-specific feature maps. Our empirical study shows that BiDIAF improves power loss prediction accuracy by about 3% and localization accuracy by about 4%. Our end-to-end model yields a further improvement of about 24% on localization when learned in a weakly supervised manner. Our approach is generalizable and showed promising results on web-crawled solar panel images. Our system has a frame rate of 22 fps (including all steps) on an NVIDIA TitanX GPU. Additionally, we collected a first-of-its-kind dataset for solar panel image analysis consisting of 45,000+ images. |
Tasks | |
Published | 2017-10-10 |
URL | http://arxiv.org/abs/1710.03811v2 |
http://arxiv.org/pdf/1710.03811v2.pdf | |
PWC | https://paperswithcode.com/paper/deepsolareye-power-loss-prediction-and-weakly |
Repo | |
Framework | |
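A loose, one-directional PyTorch sketch of the input-reinforcement idea behind BiDIAF: re-inject a resized copy of the input image at an intermediate CNN stage so the feature maps stay input-aware. The actual block is bi-directional and more elaborate, so this wiring is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InputAwareFusion(nn.Module):
    """Fuse a resized copy of the network input into intermediate
    feature maps, keeping them input-specific (BiDIAF-inspired)."""
    def __init__(self, in_ch, img_ch=3):
        super().__init__()
        self.project = nn.Conv2d(img_ch, in_ch, kernel_size=1)

    def forward(self, feats, image):
        img = F.interpolate(image, size=feats.shape[-2:],
                            mode='bilinear', align_corners=False)
        return feats + self.project(img)   # reinforce features with the input
```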
Less Is More: A Comprehensive Framework for the Number of Components of Ensemble Classifiers
Title | Less Is More: A Comprehensive Framework for the Number of Components of Ensemble Classifiers |
Authors | Hamed Bonab, Fazli Can |
Abstract | The number of component classifiers chosen for an ensemble greatly impacts its prediction ability. In this paper, we use a geometric framework for a priori determining the ensemble size, which is applicable to most existing batch and online ensemble classifiers. There are only a limited number of studies on ensemble size examining Majority Voting (MV) and Weighted Majority Voting (WMV). Almost all of them are designed for batch mode, hardly addressing online environments. Big data dimensions and resource limitations, in terms of time and memory, make determining the ensemble size crucial, especially for online environments. For the MV aggregation rule, our framework proves that the more strong components we add to the ensemble, the more accurate the predictions we can achieve. For the WMV aggregation rule, our framework proves the existence of an ideal number of components, which is equal to the number of class labels, under the premise that the components are completely independent of each other and strong enough. While giving an exact definition of a strong and independent classifier in the context of an ensemble is a challenging task, our proposed geometric framework provides a theoretical explanation of diversity and its impact on the accuracy of predictions. We conduct a series of experimental evaluations to show the practical value of our theorems and the existing challenges. |
Tasks | |
Published | 2017-09-09 |
URL | http://arxiv.org/abs/1709.02925v2 |
http://arxiv.org/pdf/1709.02925v2.pdf | |
PWC | https://paperswithcode.com/paper/less-is-more-a-comprehensive-framework-for |
Repo | |
Framework | |
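The MV claim is easy to check empirically with a quick Monte-Carlo simulation; this only illustrates the theorem's statement for the binary case and is not the paper's geometric proof.

```python
import numpy as np

def mv_accuracy(n_components, p_correct=0.6, n_trials=10000, seed=0):
    """With independent components each correct w.p. p > 0.5, adding
    more of them increases majority-vote accuracy (binary labels)."""
    rng = np.random.default_rng(seed)
    votes = rng.random((n_trials, n_components)) < p_correct
    return (votes.sum(axis=1) > n_components / 2).mean()

for n in (1, 5, 15, 45):
    print(n, mv_accuracy(n))   # accuracy climbs toward 1 as n grows
```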
SCAN: Structure Correcting Adversarial Network for Organ Segmentation in Chest X-rays
Title | SCAN: Structure Correcting Adversarial Network for Organ Segmentation in Chest X-rays |
Authors | Wei Dai, Joseph Doyle, Xiaodan Liang, Hao Zhang, Nanqing Dong, Yuan Li, Eric P. Xing |
Abstract | Chest X-ray (CXR) is one of the most commonly prescribed medical imaging procedures, often with 2-10x more scans than other imaging modalities such as MRI, CT, and PET. These voluminous CXR scans place significant workloads on radiologists and medical practitioners. Organ segmentation is a crucial step toward effective computer-aided detection on CXR. In this work, we propose the Structure Correcting Adversarial Network (SCAN) to segment lung fields and the heart in CXR images. SCAN incorporates a critic network to impose on the convolutional segmentation network the structural regularities that emerge from human physiology. During training, the critic network learns to discriminate the ground truth organ annotations from the masks synthesized by the segmentation network. Through this adversarial process, the critic network learns higher-order structures and guides the segmentation model to achieve realistic segmentation outcomes. Extensive experiments show that our method produces highly accurate and natural segmentations. Using only the very limited training data available, our model reaches human-level performance without relying on any existing trained model or dataset. Our method also generalizes well to CXR images from a different patient population and disease profiles, surpassing the current state of the art. |
Tasks | |
Published | 2017-03-26 |
URL | http://arxiv.org/abs/1703.08770v2 |
http://arxiv.org/pdf/1703.08770v2.pdf | |
PWC | https://paperswithcode.com/paper/scan-structure-correcting-adversarial-network |
Repo | |
Framework | |
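The adversarial training signal can be sketched as follows in PyTorch; `seg_net` and `critic` are placeholder networks (the segmenter ending in a sigmoid, the critic outputting a probability), and the 0.01 adversarial weight is an arbitrary choice, not the paper's.

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()

def adversarial_seg_losses(seg_net, critic, image, gt_mask):
    """SCAN-style losses: the critic learns to tell ground-truth masks
    from predicted ones; the segmenter is trained on pixel-wise BCE
    plus a term that rewards fooling the critic."""
    pred = seg_net(image)                         # soft organ masks in (0,1)
    d_real = critic(image, gt_mask)
    d_fake = critic(image, pred.detach())         # no grad into the segmenter
    critic_loss = (bce(d_real, torch.ones_like(d_real)) +
                   bce(d_fake, torch.zeros_like(d_fake)))
    d_fool = critic(image, pred)
    seg_loss = bce(pred, gt_mask) + 0.01 * bce(d_fool, torch.ones_like(d_fool))
    return seg_loss, critic_loss
```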
Dependency Parsing with Dilated Iterated Graph CNNs
Title | Dependency Parsing with Dilated Iterated Graph CNNs |
Authors | Emma Strubell, Andrew McCallum |
Abstract | Dependency parses are an effective way to inject linguistic knowledge into many downstream tasks, and many practitioners wish to efficiently parse sentences at scale. While recent advances in GPU hardware have enabled neural networks to achieve significant gains over the previous best models, these models still fail to leverage GPUs' capability for massive parallelism due to their requirement of sequential processing of the sentence. In response, we propose Dilated Iterated Graph Convolutional Neural Networks (DIG-CNNs) for graph-based dependency parsing, a graph convolutional architecture that allows for efficient end-to-end GPU parsing. In experiments on the English Penn TreeBank benchmark, we show that DIG-CNNs perform on par with some of the best neural network parsers. |
Tasks | Dependency Parsing |
Published | 2017-05-01 |
URL | http://arxiv.org/abs/1705.00403v2 |
http://arxiv.org/pdf/1705.00403v2.pdf | |
PWC | https://paperswithcode.com/paper/dependency-parsing-with-dilated-iterated |
Repo | |
Framework | |
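A minimal PyTorch sketch of the dilated-convolution encoder at the heart of DIG-CNNs: stacked dilated 1-D convolutions let every token see the whole sentence in a few fully parallel layers, with no recurrence. Widths, dilation rates, and the residual connections are illustrative, and the head-modifier edge-scoring layer is omitted.

```python
import torch
import torch.nn as nn

class DilatedTokenEncoder(nn.Module):
    """Stack of dilated 1-D convolutions over token embeddings; with
    kernel 3 and dilations (1, 2, 4, 8) the receptive field covers
    long sentences in four layers."""
    def __init__(self, dim=256, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(dim, dim, kernel_size=3, dilation=d, padding=d)
            for d in dilations)

    def forward(self, x):                    # x: (batch, dim, seq_len)
        for conv in self.convs:
            x = torch.relu(conv(x)) + x      # residual keeps length and dim
        return x
```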
A simple efficient density estimator that enables fast systematic search
Title | A simple efficient density estimator that enables fast systematic search |
Authors | Jonathan R. Wells, Kai Ming Ting |
Abstract | This paper introduces a simple and efficient density estimator that enables fast systematic search. To show its advantage over the commonly used kernel density estimator, we apply it to outlying aspects mining. Outlying aspects mining discovers feature subsets (or subspaces) that describe how a query stands out from a given dataset. The task demands a systematic search of subspaces. We identify that existing outlying aspects miners are restricted to datasets with small data size and dimensionality because they employ the kernel density estimator, which is computationally expensive, for subspace assessments. We show that a recent outlying aspects miner can run orders of magnitude faster by simply replacing its density estimator with the proposed one, enabling it to deal with large datasets with thousands of dimensions that would otherwise be impossible. |
Tasks | |
Published | 2017-07-03 |
URL | http://arxiv.org/abs/1707.00783v2 |
http://arxiv.org/pdf/1707.00783v2.pdf | |
PWC | https://paperswithcode.com/paper/a-simple-efficient-density-estimator-that |
Repo | |
Framework | |
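To see why the estimator choice matters, compare query costs: a kernel density estimate costs O(n) per query, while a precomputed histogram-style estimator answers in O(1) after an O(n) build. The stand-in below only illustrates that cost argument for the low-dimensional subspaces assessed in outlying aspects mining; the paper's actual estimator differs.

```python
import numpy as np

def grid_density(X, query, n_bins=10):
    """Histogram density over a low-dimensional subspace: O(n) to
    build, O(1) per query lookup (vs. O(n) per query for KDE).
    X: (n_points, d) subspace data; query: length-d point."""
    hist, edges = np.histogramdd(X, bins=n_bins, density=True)
    idx = tuple(
        np.clip(np.searchsorted(e, q, side='right') - 1, 0, n_bins - 1)
        for e, q in zip(edges, query))
    return hist[idx]
```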