Paper Group ANR 58
Hyperparameters Optimization in Deep Convolutional Neural Network / Bayesian Approach with Gaussian Process Prior
Title | Hyperparameters Optimization in Deep Convolutional Neural Network / Bayesian Approach with Gaussian Process Prior |
Authors | Pushparaja Murugan |
Abstract | Convolutional Neural Networks (ConvNets) have been extensively used in many complex machine learning tasks. However, hyperparameter optimization is a crucial step in developing ConvNet architectures, since accuracy and performance depend on the hyperparameters. This multilayered architecture is parameterized by a set of hyperparameters such as the number of convolutional layers, the number of fully connected dense layers and neurons, the dropout probability, and the learning rate. Hence, searching the hyperparameter space for such a complex hierarchical architecture is highly difficult. Many methods have been proposed over the past decade to explore the hyperparameter space and find the optimum set of hyperparameter values. Reportedly, grid search and random search are inefficient and extremely expensive because of the large number of hyperparameters in the architecture. Sequential model-based Bayesian Optimization is therefore a promising alternative technique for finding the extremum of an unknown cost function. A recent study on Bayesian Optimization by Snoek, covering nine convolutional network parameters, achieved the lowest reported error on the CIFAR-10 benchmark. This article is intended to provide an overview of the mathematical concepts behind Bayesian Optimization over a Gaussian prior. |
Tasks | |
Published | 2017-12-19 |
URL | http://arxiv.org/abs/1712.07233v1 |
PDF | http://arxiv.org/pdf/1712.07233v1.pdf |
PWC | https://paperswithcode.com/paper/hyperparameters-optimization-in-deep |
Repo | |
Framework | |
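The sequential model-based loop described in the abstract above can be sketched in a few lines. This is only an illustrative sketch, not the author's setup: it assumes a single hyperparameter normalized to [0, 1], a zero-mean Gaussian process with a squared-exponential kernel, and the expected-improvement acquisition function. The names `rbf_kernel`, `gp_posterior`, `expected_improvement`, and `bayes_opt`, and the toy objective in the usage note, are all hypothetical.

```python
import math
import numpy as np

def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel between two 1-D arrays of points."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean, unit-variance GP."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_query)
    solved = np.linalg.solve(K, K_s)            # K^{-1} K_s
    mean = solved.T @ y_train
    var = 1.0 - np.sum(K_s * solved, axis=0)    # prior variance is 1
    return mean, np.sqrt(np.clip(var, 1e-12, None))

def expected_improvement(mean, std, best):
    """Expected improvement over the current best value (minimization)."""
    z = (best - mean) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + std * pdf

def bayes_opt(objective, n_init=3, n_iter=12, seed=0):
    """Minimize `objective` on [0, 1] with a GP surrogate and EI."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 201)           # candidate hyperparameter values
    x = rng.uniform(0.0, 1.0, size=n_init)      # random initial design
    y = np.array([objective(v) for v in x])
    for _ in range(n_iter):
        offset = y.mean()                       # center targets for the zero-mean GP
        mean, std = gp_posterior(x, y - offset, grid)
        ei = expected_improvement(mean + offset, std, y.min())
        x_next = grid[np.argmax(ei)]            # most promising candidate
        x = np.append(x, x_next)
        y = np.append(y, objective(x_next))
    return x[np.argmin(y)], y.min()
```

As a usage example, `bayes_opt(lambda v: (v - 0.3) ** 2)` treats a simple quadratic as a stand-in for validation loss. In a real ConvNet search each call to `objective` would be a full training run, which is why the surrogate model is used to decide where to evaluate next.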
Solving the Resource Constrained Project Scheduling Problem Using the Parallel Tabu Search Designed for the CUDA Platform
Title | Solving the Resource Constrained Project Scheduling Problem Using the Parallel Tabu Search Designed for the CUDA Platform |
Authors | Libor Bukata, Premysl Sucha, Zdenek Hanzalek |
Abstract | In the paper, a parallel Tabu Search algorithm for the Resource Constrained Project Scheduling Problem is proposed. To deal with this NP-hard combinatorial problem many optimizations have been performed. For example, a resource evaluation algorithm is selected by a heuristic and an effective Tabu List was designed. In addition to that, a capacity-indexed resource evaluation algorithm was proposed and the GPU (Graphics Processing Unit) version uses a homogeneous model to reduce the required communication bandwidth. According to the experiments, the GPU version outperforms the optimized parallel CPU version with respect to the computational time and the quality of solutions. In comparison with other existing heuristics, the proposed solution often gives better quality solutions. |
Tasks | |
Published | 2017-11-13 |
URL | http://arxiv.org/abs/1711.04556v1 |
PDF | http://arxiv.org/pdf/1711.04556v1.pdf |
PWC | https://paperswithcode.com/paper/solving-the-resource-constrained-project |
Repo | |
Framework | |
Generative Adversarial Networks for Electronic Health Records: A Framework for Exploring and Evaluating Methods for Predicting Drug-Induced Laboratory Test Trajectories
Title | Generative Adversarial Networks for Electronic Health Records: A Framework for Exploring and Evaluating Methods for Predicting Drug-Induced Laboratory Test Trajectories |
Authors | Alexandre Yahi, Rami Vanguri, Noémie Elhadad, Nicholas P. Tatonetti |
Abstract | Generative Adversarial Networks (GANs) represent a promising class of generative networks that combine neural networks with game theory. From generating realistic images and videos to assisting musical creation, GANs are transforming many fields of arts and sciences. However, their application to healthcare has not been fully realized, more specifically in generating electronic health records (EHR) data. In this paper, we propose a framework for exploring the value of GANs in the context of continuous laboratory time series data. We devise an unsupervised evaluation method that measures the predictive power of synthetic laboratory test time series. Further, we show that when it comes to predicting the impact of drug exposure on laboratory test data, incorporating representation learning of the training cohorts prior to training GAN models is beneficial. |
Tasks | Representation Learning, Time Series |
Published | 2017-12-01 |
URL | http://arxiv.org/abs/1712.00164v1 |
PDF | http://arxiv.org/pdf/1712.00164v1.pdf |
PWC | https://paperswithcode.com/paper/generative-adversarial-networks-for-3 |
Repo | |
Framework | |
Towards Scalable Spectral Clustering via Spectrum-Preserving Sparsification
Title | Towards Scalable Spectral Clustering via Spectrum-Preserving Sparsification |
Authors | Yongyu Wang, Zhuo Feng |
Abstract | The eigendecomposition of nearest-neighbor (NN) graph Laplacian matrices is the main computational bottleneck in spectral clustering. In this work, we introduce a highly scalable, spectrum-preserving graph sparsification algorithm that enables building ultra-sparse NN (u-NN) graphs with guaranteed preservation of the original graph spectra, such as the first few eigenvectors of the original graph Laplacian. Our approach can immediately lead to scalable spectral clustering of large data networks without sacrificing solution quality. The proposed method starts by constructing low-stretch spanning trees (LSSTs) from the original graphs, and then iteratively recovers small portions of “spectrally critical” off-tree edges to the LSSTs by leveraging a spectral off-tree embedding scheme. To determine the suitable number of off-tree edges to be recovered to the LSSTs, an eigenvalue stability checking scheme is proposed, which robustly preserves the first few Laplacian eigenvectors within the sparsified graph. Additionally, an incremental graph densification scheme is proposed for identifying extra edges that are missing in the original NN graphs but can still play important roles in spectral clustering tasks. Our experimental results for a variety of well-known data sets show that the proposed method can dramatically reduce the complexity of NN graphs, leading to significant speedups in spectral clustering. |
Tasks | |
Published | 2017-10-12 |
URL | http://arxiv.org/abs/1710.04584v4 |
PDF | http://arxiv.org/pdf/1710.04584v4.pdf |
PWC | https://paperswithcode.com/paper/towards-scalable-spectral-clustering-via |
Repo | |
Framework | |
Intrinsic Grassmann Averages for Online Linear, Robust and Nonlinear Subspace Learning
Title | Intrinsic Grassmann Averages for Online Linear, Robust and Nonlinear Subspace Learning |
Authors | Rudrasis Chakraborty, Søren Hauberg, Baba C. Vemuri |
Abstract | Principal Component Analysis (PCA) and Kernel Principal Component Analysis (KPCA) are fundamental methods in machine learning for dimensionality reduction. The former finds a low-dimensional approximation in finite dimensions, while the latter often operates in an infinite-dimensional Reproducing Kernel Hilbert Space (RKHS). In this paper, we present a geometric framework for computing the principal linear subspaces in both situations, as well as for the robust PCA case, that amounts to computing the intrinsic average on the space of all subspaces: the Grassmann manifold. Points on this manifold are defined as the subspaces spanned by $K$-tuples of observations. The intrinsic Grassmann average of these subspaces is shown to coincide with the principal components of the observations when they are drawn from a Gaussian distribution. We show similar results in the RKHS case and provide an efficient algorithm for computing the projection onto this average subspace. The result is a method akin to KPCA which is substantially faster. Further, we present a novel online version of KPCA using our geometric framework. Competitive performance of all our algorithms is demonstrated on a variety of real and synthetic data sets. |
Tasks | Dimensionality Reduction |
Published | 2017-02-03 |
URL | http://arxiv.org/abs/1702.01005v2 |
PDF | http://arxiv.org/pdf/1702.01005v2.pdf |
PWC | https://paperswithcode.com/paper/intrinsic-grassmann-averages-for-online |
Repo | |
Framework | |
Generalized notions of sparsity and restricted isometry property. Part II: Applications
Title | Generalized notions of sparsity and restricted isometry property. Part II: Applications |
Authors | Marius Junge, Kiryung Lee |
Abstract | The restricted isometry property (RIP) is a universal tool for data recovery. We explore the implication of the RIP in the framework of generalized sparsity and group measurements introduced in the Part I paper. It turns out that for a given measurement instrument the number of measurements for RIP can be improved by optimizing over families of Banach spaces. Second, we investigate the preservation of difference of two sparse vectors, which is not trivial in generalized models. Third, we extend the RIP of partial Fourier measurements at optimal scaling of number of measurements with random sign to far more general group structured measurements. Lastly, we also obtain RIP in infinite dimension in the context of Fourier measurement concepts with sparsity naturally replaced by smoothness assumptions. |
Tasks | |
Published | 2017-06-28 |
URL | http://arxiv.org/abs/1706.09411v2 |
PDF | http://arxiv.org/pdf/1706.09411v2.pdf |
PWC | https://paperswithcode.com/paper/generalized-notions-of-sparsity-and |
Repo | |
Framework | |
A giant with feet of clay: on the validity of the data that feed machine learning in medicine
Title | A giant with feet of clay: on the validity of the data that feed machine learning in medicine |
Authors | Federico Cabitza, Davide Ciucci, Raffaele Rasoini |
Abstract | This paper considers the use of Machine Learning (ML) in medicine by focusing on the main problem that this computational approach has been aimed at solving or at least minimizing: uncertainty. To this aim, we point out how uncertainty is so ingrained in medicine that it also biases the representation of clinical phenomena, that is, the very input of ML models, thus undermining the clinical significance of their output. Recognizing this can motivate both medical doctors, in taking more responsibility in the development and use of these decision aids, and researchers, in pursuing different ways to assess the value of these systems. In so doing, both designers and users could take this intrinsic characteristic of medicine more seriously and consider alternative approaches that do not “sweep uncertainty under the rug” within an objectivist fiction that everyone can end up believing to be true. |
Tasks | |
Published | 2017-06-21 |
URL | http://arxiv.org/abs/1706.06838v3 |
PDF | http://arxiv.org/pdf/1706.06838v3.pdf |
PWC | https://paperswithcode.com/paper/a-giant-with-feet-of-clay-on-the-validity-of |
Repo | |
Framework | |
Hierarchical Deep Recurrent Architecture for Video Understanding
Title | Hierarchical Deep Recurrent Architecture for Video Understanding |
Authors | Luming Tang, Boyang Deng, Haiyu Zhao, Shuai Yi |
Abstract | This paper introduces the system we developed for the YouTube-8M Video Understanding Challenge, in which a large-scale benchmark dataset was used for multi-label video classification. The proposed framework contains a hierarchical deep architecture, including the frame-level sequence modeling part and the video-level classification part. In the frame-level sequence modeling part, we explore a set of methods including Pooling-LSTM (PLSTM), Hierarchical-LSTM (HLSTM), and Random-LSTM (RLSTM) in order to address the large number of frames in a video. We also introduce two attention pooling methods, single attention pooling (ATT) and multiply attention pooling (Multi-ATT), so that we can pay more attention to the informative frames in a video and ignore the useless frames. In the video-level classification part, two methods are proposed to increase the classification performance, i.e. Hierarchical-Mixture-of-Experts (HMoE) and Classifier Chains (CC). Our final submission is an ensemble consisting of 18 sub-models. In terms of the official evaluation metric Global Average Precision (GAP) at 20, our best submission achieves 0.84346 on the public 50% of test data and 0.84333 on the private 50% of test data. |
Tasks | Video Classification, Video Understanding |
Published | 2017-07-11 |
URL | http://arxiv.org/abs/1707.03296v1 |
PDF | http://arxiv.org/pdf/1707.03296v1.pdf |
PWC | https://paperswithcode.com/paper/hierarchical-deep-recurrent-architecture-for |
Repo | |
Framework | |
Land Cover Classification via Multi-temporal Spatial Data by Recurrent Neural Networks
Title | Land Cover Classification via Multi-temporal Spatial Data by Recurrent Neural Networks |
Authors | Dino Ienco, Raffaele Gaetano, Claire Dupaquier, Pierre Maurel |
Abstract | Nowadays, modern earth observation programs produce huge volumes of satellite image time series (SITS) that can be useful to monitor geographical areas through time. How to efficiently analyze this kind of information is still an open question in the remote sensing field. Recently, deep learning methods proved suitable to deal with remote sensing data, mainly for scene classification (i.e. Convolutional Neural Networks - CNNs - on single images), while only very few studies exist involving temporal deep learning approaches (i.e. Recurrent Neural Networks - RNNs) to deal with remote sensing time series. In this letter we evaluate the ability of Recurrent Neural Networks, in particular the Long-Short Term Memory (LSTM) model, to perform land cover classification considering multi-temporal spatial data derived from a time series of satellite images. We carried out experiments on two different datasets considering both pixel-based and object-based classification. The obtained results show that Recurrent Neural Networks are competitive compared to state-of-the-art classifiers, and may outperform classical approaches in the presence of under-represented and/or highly mixed classes. We also show that using the alternative feature representation generated by the LSTM can improve the performance of standard classifiers. |
Tasks | Scene Classification, Time Series |
Published | 2017-04-13 |
URL | http://arxiv.org/abs/1704.04055v1 |
PDF | http://arxiv.org/pdf/1704.04055v1.pdf |
PWC | https://paperswithcode.com/paper/land-cover-classification-via-multi-temporal |
Repo | |
Framework | |
Aggregating Frame-level Features for Large-Scale Video Classification
Title | Aggregating Frame-level Features for Large-Scale Video Classification |
Authors | Shaoxiang Chen, Xi Wang, Yongyi Tang, Xinpeng Chen, Zuxuan Wu, Yu-Gang Jiang |
Abstract | This paper introduces the system we developed for the Google Cloud & YouTube-8M Video Understanding Challenge, which can be considered as a multi-label classification problem defined on top of the large scale YouTube-8M Dataset. We employ a large set of techniques to aggregate the provided frame-level feature representations and generate video-level predictions, including several variants of recurrent neural networks (RNN) and generalized VLAD. We also adopt several fusion strategies to explore the complementarity among the models. In terms of the official metric GAP@20 (global average precision at 20), our best fusion model attains 0.84198 on the public 50% of test data and 0.84193 on the private 50% of test data, ranking 4th out of 650 teams worldwide in the competition. |
Tasks | Multi-Label Classification, Video Classification, Video Understanding |
Published | 2017-07-04 |
URL | http://arxiv.org/abs/1707.00803v1 |
PDF | http://arxiv.org/pdf/1707.00803v1.pdf |
PWC | https://paperswithcode.com/paper/aggregating-frame-level-features-for-large |
Repo | |
Framework | |
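The GAP@20 metric quoted in the abstract above can be illustrated with a short sketch. It assumes the commonly described definition for the YouTube-8M challenge: keep the top 20 predictions per video, pool them across all videos, sort by confidence, and compute average precision over the pooled list. Normalizing by the total number of ground-truth labels is an assumption of this sketch, and `gap_at_k` is a hypothetical helper, not the official evaluation code.

```python
import numpy as np

def gap_at_k(scores, labels, k=20):
    """Global average precision over the top-k predictions of each video.

    scores: (num_videos, num_classes) array of confidences.
    labels: (num_videos, num_classes) binary ground-truth array.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    total_positives = labels.sum()              # assumed recall denominator
    if total_positives == 0:
        return 0.0
    k = min(k, scores.shape[1])
    # Keep the k highest-scoring classes for every video.
    top = np.argsort(-scores, axis=1)[:, :k]
    top_scores = np.take_along_axis(scores, top, axis=1).ravel()
    top_labels = np.take_along_axis(labels, top, axis=1).ravel()
    # Pool all videos and rank the predictions by confidence.
    order = np.argsort(-top_scores, kind="stable")
    hits = top_labels[order]
    precision = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    # Sum precision at each correct prediction, weighted by the recall step.
    return float(np.sum(precision * hits) / total_positives)
```

For instance, with two videos scored `[[0.9, 0.1], [0.8, 0.2]]` against labels `[[1, 0], [0, 1]]`, the pooled ranking has hits at positions 1 and 3, so `gap_at_k(..., k=2)` evaluates to (1 + 2/3) / 2.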
Linear Time Clustering for High Dimensional Mixtures of Gaussian Clouds
Title | Linear Time Clustering for High Dimensional Mixtures of Gaussian Clouds |
Authors | Dan Kushnir, Shirin Jalali, Iraj Saniee |
Abstract | Clustering mixtures of Gaussian distributions is a fundamental and challenging problem that is ubiquitous in various high-dimensional data processing tasks. While state-of-the-art work on learning Gaussian mixture models has focused primarily on improving separation bounds and their generalization to arbitrary classes of mixture models, less emphasis has been paid to practical computational efficiency of the proposed solutions. In this paper, we propose a novel and highly efficient clustering algorithm for $n$ points drawn from a mixture of two arbitrary Gaussian distributions in $\mathbb{R}^p$. The algorithm involves performing random 1-dimensional projections until a direction is found that yields a user-specified clustering error $e$. For a 1-dimensional separation parameter $\gamma$ satisfying $\gamma=Q^{-1}(e)$, the expected number of such projections is shown to be bounded by $o(\ln p)$, when $\gamma$ satisfies $\gamma\leq c\sqrt{\ln{\ln{p}}}$, with $c$ as the separability parameter of the two Gaussians in $\mathbb{R}^p$. Consequently, the expected overall running time of the algorithm is linear in $n$ and quasi-linear in $p$ at $o(\ln{p})O(np)$, and the sample complexity is independent of $p$. This result stands in contrast to prior works which provide polynomial, with at-best quadratic, running time in $p$ and $n$. We show that our bound on the expected number of 1-dimensional projections extends to the case of three or more Gaussian components, and we present a generalization of our results to mixture distributions beyond the Gaussian model. |
Tasks | |
Published | 2017-12-19 |
URL | http://arxiv.org/abs/1712.07242v3 |
PDF | http://arxiv.org/pdf/1712.07242v3.pdf |
PWC | https://paperswithcode.com/paper/linear-time-clustering-for-high-dimensional |
Repo | |
Framework | |
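The random-projection idea in the abstract above can be sketched as follows. This is a loose illustration, not the authors' algorithm: the 1-D two-means split, the separation score, and the stopping threshold `gamma` are assumptions standing in for the paper's error criterion, and `split_1d` and `cluster_by_random_projection` are hypothetical names. Each projection costs O(np), which matches the per-projection cost the abstract refers to.

```python
import numpy as np

def split_1d(z, n_iter=20):
    """Two-means on a 1-D array; returns boolean labels and a separation score."""
    t = z.mean()
    upper = z > t
    if upper.all() or not upper.any():          # degenerate: all values equal
        return upper, 0.0
    for _ in range(n_iter):
        t_new = 0.5 * (z[upper].mean() + z[~upper].mean())
        if t_new == t:
            break
        t = t_new
        upper = z > t                           # midpoint keeps both sides non-empty
    gap = z[upper].mean() - z[~upper].mean()
    spread = z[upper].std() + z[~upper].std()
    return upper, gap / (spread + 1e-12)

def cluster_by_random_projection(X, gamma=2.0, max_tries=200, seed=0):
    """Project onto random unit directions until the 1-D split looks separated."""
    rng = np.random.default_rng(seed)
    best_labels, best_score = None, -1.0
    tries = 0
    for tries in range(1, max_tries + 1):
        w = rng.standard_normal(X.shape[1])
        w /= np.linalg.norm(w)                  # random direction on the unit sphere
        labels, score = split_1d(X @ w)         # O(n p) projection, O(n) split
        if score > best_score:
            best_labels, best_score = labels, score
        if score >= gamma:                      # assumed stopping rule
            break
    return best_labels.astype(int), best_score, tries
```

A quick way to try it is to draw two spherical Gaussian clouds in, say, 10 dimensions with well-separated means and compare the returned labels with the true component of each point, up to swapping the two label names.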
Sparse Diffusion-Convolutional Neural Networks
Title | Sparse Diffusion-Convolutional Neural Networks |
Authors | James Atwood, Siddharth Pal, Don Towsley, Ananthram Swami |
Abstract | The predictive power and overall computational efficiency of diffusion-convolutional neural networks (DCNNs) make them an attractive choice for node classification tasks. However, a naive dense-tensor-based implementation of DCNNs leads to $\mathcal{O}(N^2)$ memory complexity, which is prohibitive for large graphs. In this paper, we introduce a simple method for thresholding input graphs that provably reduces memory requirements of DCNNs to $\mathcal{O}(N)$ (i.e. linear in the number of nodes in the input) without significantly affecting predictive performance. |
Tasks | Node Classification |
Published | 2017-10-26 |
URL | http://arxiv.org/abs/1710.09813v1 |
PDF | http://arxiv.org/pdf/1710.09813v1.pdf |
PWC | https://paperswithcode.com/paper/sparse-diffusion-convolutional-neural |
Repo | |
Framework | |
Neural Cross-Lingual Entity Linking
Title | Neural Cross-Lingual Entity Linking |
Authors | Avirup Sil, Gourab Kundu, Radu Florian, Wael Hamza |
Abstract | A major challenge in Entity Linking (EL) is making effective use of contextual information to disambiguate mentions to Wikipedia that might refer to different entities in different contexts. The problem is exacerbated in cross-lingual EL, which involves linking mentions written in non-English documents to entries in the English Wikipedia: to compare textual clues across languages we need to compute similarity between textual fragments across languages. In this paper, we propose a neural EL model that trains fine-grained similarities and dissimilarities between the query and candidate document from multiple perspectives, combined with convolution and tensor networks. Further, we show that this English-trained system can be applied, in zero-shot learning, to other languages by making surprisingly effective use of multi-lingual embeddings. The proposed system shows strong empirical performance, yielding state-of-the-art results on English as well as cross-lingual (Spanish and Chinese) TAC 2015 datasets. |
Tasks | Cross-Lingual Entity Linking, Entity Linking, Tensor Networks, Zero-Shot Learning |
Published | 2017-12-05 |
URL | http://arxiv.org/abs/1712.01813v1 |
PDF | http://arxiv.org/pdf/1712.01813v1.pdf |
PWC | https://paperswithcode.com/paper/neural-cross-lingual-entity-linking |
Repo | |
Framework | |
On the Long-Term Memory of Deep Recurrent Networks
Title | On the Long-Term Memory of Deep Recurrent Networks |
Authors | Yoav Levine, Or Sharir, Alon Ziv, Amnon Shashua |
Abstract | A key attribute that drives the unprecedented success of modern Recurrent Neural Networks (RNNs) on learning tasks which involve sequential data is their ability to model intricate long-term temporal dependencies. However, a well-established measure of RNNs' long-term memory capacity is lacking, and thus formal understanding of the effect of depth on their ability to correlate data throughout time is limited. Specifically, existing depth efficiency results on convolutional networks do not suffice in order to account for the success of deep RNNs on data of varying lengths. In order to address this, we introduce a measure of the network’s ability to support information flow across time, referred to as the Start-End separation rank, which reflects the distance of the function realized by the recurrent network from modeling no dependency between the beginning and end of the input sequence. We prove that deep recurrent networks support Start-End separation ranks which are combinatorially higher than those supported by their shallow counterparts. Thus, we establish that depth brings forth an overwhelming advantage in the ability of recurrent networks to model long-term dependencies, and provide an exemplar of quantifying this key attribute which may be readily extended to other RNN architectures of interest, e.g. variants of LSTM networks. We obtain our results by considering a class of recurrent networks referred to as Recurrent Arithmetic Circuits, which merge the hidden state with the input via the Multiplicative Integration operation, and empirically demonstrate the discussed phenomena on common RNNs. Finally, we employ the tool of quantum Tensor Networks to gain additional graphic insight regarding the complexity brought forth by depth in recurrent networks. |
Tasks | Tensor Networks |
Published | 2017-10-25 |
URL | http://arxiv.org/abs/1710.09431v2 |
PDF | http://arxiv.org/pdf/1710.09431v2.pdf |
PWC | https://paperswithcode.com/paper/on-the-long-term-memory-of-deep-recurrent |
Repo | |
Framework | |
An Inversion-Based Learning Approach for Improving Impromptu Trajectory Tracking of Robots with Non-Minimum Phase Dynamics
Title | An Inversion-Based Learning Approach for Improving Impromptu Trajectory Tracking of Robots with Non-Minimum Phase Dynamics |
Authors | Siqi Zhou, Mohamed K. Helwa, Angela P. Schoellig |
Abstract | This paper presents a learning-based approach for impromptu trajectory tracking for non-minimum phase systems, i.e., systems with unstable inverse dynamics. Inversion-based feedforward approaches are commonly used for improving tracking performance; however, these approaches are not directly applicable to non-minimum phase systems due to their inherent instability. In order to resolve the instability issue, existing methods have assumed that the system model is known and used pre-actuation or inverse approximation techniques. In this work, we propose an approach for learning a stable, approximate inverse of a non-minimum phase baseline system directly from its input-output data. Through theoretical discussions, simulations, and experiments on two different platforms, we show the stability of our proposed approach and its effectiveness for high-accuracy, impromptu tracking. Our approach also shows that including more information in the training, as is commonly assumed to be useful, does not lead to better performance but may trigger instability and impact the effectiveness of the overall approach. |
Tasks | |
Published | 2017-09-13 |
URL | http://arxiv.org/abs/1709.04407v2 |
PDF | http://arxiv.org/pdf/1709.04407v2.pdf |
PWC | https://paperswithcode.com/paper/an-inversion-based-learning-approach-for |
Repo | |
Framework | |