Paper Group ANR 321
Skin lesion classification with ensemble of squeeze-and-excitation networks and semi-supervised learning
Title | Skin lesion classification with ensemble of squeeze-and-excitation networks and semi-supervised learning |
Authors | Shunsuke Kitada, Hitoshi Iyatomi |
Abstract | In this report, we introduce the outline of our system in Task 3: Disease Classification of ISIC 2018: Skin Lesion Analysis Towards Melanoma Detection. We fine-tuned multiple pre-trained neural network models based on Squeeze-and-Excitation Networks (SENet) which achieved state-of-the-art results in the field of image recognition. In addition, we used the mean teachers as a semi-supervised learning framework and introduced some specially designed data augmentation strategies for skin lesion analysis. We confirmed our data augmentation strategy improved classification performance and demonstrated 87.2% in balanced accuracy on the official ISIC2018 validation dataset. |
Tasks | Data Augmentation, Skin Lesion Classification |
Published | 2018-09-07 |
URL | http://arxiv.org/abs/1809.02568v1 |
http://arxiv.org/pdf/1809.02568v1.pdf | |
PWC | https://paperswithcode.com/paper/skin-lesion-classification-with-ensemble-of |
Repo | |
Framework | |
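The 87.2% figure above is reported as balanced accuracy. As a minimal sketch, assuming balanced accuracy here means the unweighted mean of per-class recall, the metric can be computed as follows. The function name and the toy labels are illustrative only and are not taken from the report.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")
    correct = defaultdict(int)  # per-class count of correct predictions
    total = defaultdict(int)    # per-class count of true samples
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Toy, imbalanced labels (hypothetical data, for illustration only).
y_true = ["MEL", "MEL", "NV", "NV", "NV", "NV", "BCC", "BCC"]
y_pred = ["MEL", "NV", "NV", "NV", "NV", "NV", "BCC", "MEL"]

score = balanced_accuracy(y_true, y_pred)
plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"balanced accuracy: {score:.4f}, plain accuracy: {plain:.4f}")
```

On imbalanced data the two numbers differ because every class contributes equally to the balanced score, regardless of how many samples it has.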
SuperNeurons: Dynamic GPU Memory Management for Training Deep Neural Networks
Title | SuperNeurons: Dynamic GPU Memory Management for Training Deep Neural Networks |
Authors | Linnan Wang, Jinmian Ye, Yiyang Zhao, Wei Wu, Ang Li, Shuaiwen Leon Song, Zenglin Xu, Tim Kraska |
Abstract | Going deeper and wider in neural architectures improves the accuracy, while the limited GPU DRAM places an undesired restriction on the network design domain. Deep Learning (DL) practitioners either need to change to less desirable network architectures or nontrivially dissect a network across multiple GPUs. These distract DL practitioners from concentrating on their original machine learning tasks. We present SuperNeurons: a dynamic GPU memory scheduling runtime to enable the network training far beyond the GPU DRAM capacity. SuperNeurons features 3 memory optimizations: Liveness Analysis, Unified Tensor Pool, and Cost-Aware Recomputation. Together they effectively reduce the network-wide peak memory usage down to the maximal memory usage among layers. We also address the performance issues in those memory saving techniques. Given the limited GPU DRAM, SuperNeurons not only provisions the necessary memory for the training, but also dynamically allocates the memory for convolution workspaces to achieve the high performance. Evaluations against Caffe, Torch, MXNet and TensorFlow have demonstrated that SuperNeurons trains at least 3.2432 deeper network than current ones with the leading performance. Particularly, SuperNeurons can train ResNet2500 that has $10^4$ basic network layers on a 12GB K40c. |
Tasks | |
Published | 2018-01-13 |
URL | http://arxiv.org/abs/1801.04380v1 |
http://arxiv.org/pdf/1801.04380v1.pdf | |
PWC | https://paperswithcode.com/paper/superneurons-dynamic-gpu-memory-management |
Repo | |
Framework | |
Data-driven Localization and Estimation of Disturbance in the Interconnected Power System
Title | Data-driven Localization and Estimation of Disturbance in the Interconnected Power System |
Authors | Hyang-Won Lee, Jianan Zhang, Eytan Modiano |
Abstract | Identifying the location of a disturbance and its magnitude is an important component for stable operation of power systems. We study the problem of localizing and estimating a disturbance in the interconnected power system. We take a model-free approach to this problem by using frequency data from generators. Specifically, we develop a logistic regression based method for localization and a linear regression based method for estimation of the magnitude of disturbance. Our model-free approach does not require the knowledge of system parameters such as inertia constants and topology, and is shown to achieve highly accurate localization and estimation performance even in the presence of measurement noise and missing data. |
Tasks | |
Published | 2018-06-04 |
URL | http://arxiv.org/abs/1806.01318v1 |
http://arxiv.org/pdf/1806.01318v1.pdf | |
PWC | https://paperswithcode.com/paper/data-driven-localization-and-estimation-of |
Repo | |
Framework | |
Adaptivity for Regularized Kernel Methods by Lepskii’s Principle
Title | Adaptivity for Regularized Kernel Methods by Lepskii’s Principle |
Authors | Nicole Mücke |
Abstract | We address the problem of adaptivity in the framework of reproducing kernel Hilbert space (RKHS) regression. More precisely, we analyze estimators arising from a linear regularization scheme $g_\lambda$. In practical applications, an important task is to choose the regularization parameter $\lambda$ appropriately, i.e. based only on the given data and independently of unknown structural assumptions on the regression function. An attractive approach avoiding data-splitting is the Lepskii Principle (LP), also known as the Balancing Principle in this setting. We show that a modified parameter choice based on (LP) is minimax optimal adaptive, up to $\log\log(n)$. A convenient result is the fact that balancing in the $L^2(\nu)$-norm, which is easiest, automatically gives optimal balancing in all stronger norms, interpolating between $L^2(\nu)$ and the RKHS. An analogous result is open for other classical approaches to data dependent choices of the regularization parameter, e.g. for Hold-Out. |
Tasks | |
Published | 2018-04-15 |
URL | http://arxiv.org/abs/1804.05433v1 |
http://arxiv.org/pdf/1804.05433v1.pdf | |
PWC | https://paperswithcode.com/paper/adaptivity-for-regularized-kernel-methods-by |
Repo | |
Framework | |
CaloriNet: From silhouettes to calorie estimation in private environments
Title | CaloriNet: From silhouettes to calorie estimation in private environments |
Authors | Alessandro Masullo, Tilo Burghardt, Dima Damen, Sion Hannuna, Victor Ponce-López, Majid Mirmehdi |
Abstract | We propose a novel deep fusion architecture, CaloriNet, for the online estimation of energy expenditure for free living monitoring in private environments, where RGB data is discarded and replaced by silhouettes. Our fused convolutional neural network architecture is trainable end-to-end, to estimate calorie expenditure, using temporal foreground silhouettes alongside accelerometer data. The network is trained and cross-validated on a publicly available dataset, SPHERE_RGBD + Inertial_calorie. Results show state-of-the-art minimum error on the estimation of energy expenditure (calories per minute), outperforming alternative, standard and single-modal techniques. |
Tasks | |
Published | 2018-06-21 |
URL | http://arxiv.org/abs/1806.08152v1 |
http://arxiv.org/pdf/1806.08152v1.pdf | |
PWC | https://paperswithcode.com/paper/calorinet-from-silhouettes-to-calorie |
Repo | |
Framework | |
Improved Knowledge Graph Embedding using Background Taxonomic Information
Title | Improved Knowledge Graph Embedding using Background Taxonomic Information |
Authors | Bahare Fatemi, Siamak Ravanbakhsh, David Poole |
Abstract | Knowledge graphs are used to represent relational information in terms of triples. To enable learning about domains, embedding models, such as tensor factorization models, can be used to make predictions of new triples. Often there is background taxonomic information (in terms of subclasses and subproperties) that should also be taken into account. We show that existing fully expressive (a.k.a. universal) models cannot provably respect subclass and subproperty information. We show that minimal modifications to an existing knowledge graph completion method enable injection of taxonomic information. Moreover, we prove that our model is fully expressive, assuming a lower-bound on the size of the embeddings. Experimental results on public knowledge graphs show that despite its simplicity our approach is surprisingly effective. |
Tasks | Graph Embedding, Knowledge Graph Completion, Knowledge Graph Embedding, Knowledge Graphs |
Published | 2018-12-07 |
URL | http://arxiv.org/abs/1812.03235v1 |
http://arxiv.org/pdf/1812.03235v1.pdf | |
PWC | https://paperswithcode.com/paper/improved-knowledge-graph-embedding-using |
Repo | |
Framework | |
Neural Transition-based Syntactic Linearization
Title | Neural Transition-based Syntactic Linearization |
Authors | Linfeng Song, Yue Zhang, Daniel Gildea |
Abstract | The task of linearization is to find a grammatical order given a set of words. Traditional models use statistical methods. Syntactic linearization systems, which generate a sentence along with its syntactic tree, have shown state-of-the-art performance. Recent work shows that a multi-layer LSTM language model outperforms competitive statistical syntactic linearization systems without using syntax. In this paper, we study neural syntactic linearization, building a transition-based syntactic linearizer leveraging a feed-forward neural network, observing significantly better results compared to LSTM language models on this task. |
Tasks | Language Modelling |
Published | 2018-10-23 |
URL | http://arxiv.org/abs/1810.09609v1 |
http://arxiv.org/pdf/1810.09609v1.pdf | |
PWC | https://paperswithcode.com/paper/neural-transition-based-syntactic |
Repo | |
Framework | |
Logic Attention Based Neighborhood Aggregation for Inductive Knowledge Graph Embedding
Title | Logic Attention Based Neighborhood Aggregation for Inductive Knowledge Graph Embedding |
Authors | Peifeng Wang, Jialong Han, Chenliang Li, Rong Pan |
Abstract | Knowledge graph embedding aims at modeling entities and relations with low-dimensional vectors. Most previous methods require that all entities should be seen during training, which is unpractical for real-world knowledge graphs with new entities emerging on a daily basis. Recent efforts on this issue suggest training a neighborhood aggregator in conjunction with the conventional entity and relation embeddings, which may help embed new entities inductively via their existing neighbors. However, their neighborhood aggregators neglect the unordered and unequal natures of an entity’s neighbors. To this end, we summarize the desired properties that may lead to effective neighborhood aggregators. We also introduce a novel aggregator, namely, Logic Attention Network (LAN), which addresses the properties by aggregating neighbors with both rules- and network-based attention weights. By comparing with conventional aggregators on two knowledge graph completion tasks, we experimentally validate LAN’s superiority in terms of the desired properties. |
Tasks | Graph Embedding, Knowledge Graph Completion, Knowledge Graph Embedding, Knowledge Graphs |
Published | 2018-11-04 |
URL | http://arxiv.org/abs/1811.01399v1 |
http://arxiv.org/pdf/1811.01399v1.pdf | |
PWC | https://paperswithcode.com/paper/logic-attention-based-neighborhood |
Repo | |
Framework | |
A bagging and importance sampling approach to Support Vector Machines
Title | A bagging and importance sampling approach to Support Vector Machines |
Authors | R. Bárcenas, M. D. Gónzalez–Lima, A. J. Quiroz |
Abstract | An importance sampling and bagging approach to solving the support vector machine (SVM) problem in the context of large databases is presented and evaluated. Our algorithm builds on the nearest neighbors ideas presented in Camelo et al. (2015). As in that reference, the goal of the present proposal is to achieve a faster solution of the SVM problem without a significant loss in the prediction error. The performance of the methodology is evaluated in benchmark examples and theoretical aspects of subsample methods are discussed. |
Tasks | |
Published | 2018-08-17 |
URL | http://arxiv.org/abs/1808.05917v1 |
http://arxiv.org/pdf/1808.05917v1.pdf | |
PWC | https://paperswithcode.com/paper/a-bagging-and-importance-sampling-approach-to |
Repo | |
Framework | |
MOHONE: Modeling Higher Order Network Effects in KnowledgeGraphs via Network Infused Embeddings
Title | MOHONE: Modeling Higher Order Network Effects in KnowledgeGraphs via Network Infused Embeddings |
Authors | Hao Yu, Vivek Kulkarni, William Wang |
Abstract | Many knowledge graph embedding methods operate on triples and are therefore implicitly limited by a very local view of the entire knowledge graph. We present a new framework MOHONE to effectively model higher order network effects in knowledge-graphs, thus enabling one to capture varying degrees of network connectivity (from the local to the global). Our framework is generic, explicitly models the network scale, and captures two different aspects of similarity in networks: (a) shared local neighborhood and (b) structural role-based similarity. First, we introduce methods that learn network representations of entities in the knowledge graph capturing these varied aspects of similarity. We then propose a fast, efficient method to incorporate the information captured by these network representations into existing knowledge graph embeddings. We show that our method consistently and significantly improves the performance on link prediction of several different knowledge-graph embedding methods including TRANSE, TRANSD, DISTMULT, and COMPLEX (by at least 4 points or 17% in some cases). |
Tasks | Graph Embedding, Knowledge Graph Embedding, Knowledge Graph Embeddings, Knowledge Graphs, Link Prediction |
Published | 2018-11-01 |
URL | http://arxiv.org/abs/1811.00198v1 |
http://arxiv.org/pdf/1811.00198v1.pdf | |
PWC | https://paperswithcode.com/paper/mohone-modeling-higher-order-network-effects |
Repo | |
Framework | |
Coupling and Convergence for Hamiltonian Monte Carlo
Title | Coupling and Convergence for Hamiltonian Monte Carlo |
Authors | Nawaf Bou-Rabee, Andreas Eberle, Raphael Zimmer |
Abstract | Based on a new coupling approach, we prove that the transition step of the Hamiltonian Monte Carlo algorithm is contractive w.r.t. a carefully designed Kantorovich (L1 Wasserstein) distance. The lower bound for the contraction rate is explicit. Global convexity of the potential is not required, and thus multimodal target distributions are included. Explicit quantitative bounds for the number of steps required to approximate the stationary distribution up to a given error are a direct consequence of contractivity. These bounds show that HMC can overcome diffusive behaviour if the duration of the Hamiltonian dynamics is adjusted appropriately. |
Tasks | |
Published | 2018-05-01 |
URL | https://arxiv.org/abs/1805.00452v2 |
https://arxiv.org/pdf/1805.00452v2.pdf | |
PWC | https://paperswithcode.com/paper/coupling-and-convergence-for-hamiltonian |
Repo | |
Framework | |
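For orientation, one transition step of Hamiltonian Monte Carlo can be sketched as below. This is the standard textbook variant (leapfrog integration plus a Metropolis accept/reject correction) in one dimension with unit mass. It is not the coupling construction from the paper, and the abstract does not say whether the analysed variant includes the accept/reject step. The double-well potential, step size, and number of leapfrog steps are illustrative assumptions, chosen only because the target is multimodal and not globally convex.

```python
import math
import random


def leapfrog(q, p, grad_u, step_size, n_steps):
    """Approximate Hamiltonian dynamics for H(q, p) = U(q) + p**2 / 2."""
    p = p - 0.5 * step_size * grad_u(q)      # initial half step in momentum
    for i in range(n_steps):
        q = q + step_size * p                # full step in position
        if i < n_steps - 1:
            p = p - step_size * grad_u(q)    # full step in momentum
    p = p - 0.5 * step_size * grad_u(q)      # final half step in momentum
    return q, p


def hmc_step(q, u, grad_u, step_size, n_steps, rng):
    """One HMC transition: resample momentum, integrate, accept or reject."""
    p0 = rng.gauss(0.0, 1.0)
    q_new, p_new = leapfrog(q, p0, grad_u, step_size, n_steps)
    h_old = u(q) + 0.5 * p0 * p0
    h_new = u(q_new) + 0.5 * p_new * p_new
    # Metropolis correction for the discretisation error of the integrator.
    if rng.random() < math.exp(min(0.0, h_old - h_new)):
        return q_new
    return q


def double_well(q):
    """Illustrative non-convex potential with modes near q = -1 and q = +1."""
    return (q * q - 1.0) ** 2


def double_well_grad(q):
    return 4.0 * q * (q * q - 1.0)


rng = random.Random(0)
q = 0.0
samples = []
for _ in range(2000):
    q = hmc_step(q, double_well, double_well_grad, 0.1, 10, rng)
    samples.append(q)
```

The product `step_size * n_steps` plays the role of the duration of the Hamiltonian dynamics that the abstract says must be adjusted appropriately.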
Do Language Models Understand Anything? On the Ability of LSTMs to Understand Negative Polarity Items
Title | Do Language Models Understand Anything? On the Ability of LSTMs to Understand Negative Polarity Items |
Authors | Jaap Jumelet, Dieuwke Hupkes |
Abstract | In this paper, we attempt to link the inner workings of a neural language model to linguistic theory, focusing on a complex phenomenon well discussed in formal linguistics: (negative) polarity items. We briefly discuss the leading hypotheses about the licensing contexts that allow negative polarity items and evaluate to what extent a neural language model has the ability to correctly process a subset of such constructions. We show that the model finds a relation between the licensing context and the negative polarity item and appears to be aware of the scope of this context, which we extract from a parse tree of the sentence. With this research, we hope to pave the way for other studies linking formal linguistics to deep learning. |
Tasks | Language Modelling |
Published | 2018-08-31 |
URL | http://arxiv.org/abs/1808.10627v1 |
http://arxiv.org/pdf/1808.10627v1.pdf | |
PWC | https://paperswithcode.com/paper/do-language-models-understand-anything-on-the |
Repo | |
Framework | |
Consistency of Interpolation with Laplace Kernels is a High-Dimensional Phenomenon
Title | Consistency of Interpolation with Laplace Kernels is a High-Dimensional Phenomenon |
Authors | Alexander Rakhlin, Xiyu Zhai |
Abstract | We show that minimum-norm interpolation in the Reproducing Kernel Hilbert Space corresponding to the Laplace kernel is not consistent if input dimension is constant. The lower bound holds for any choice of kernel bandwidth, even if selected based on data. The result supports the empirical observation that minimum-norm interpolation (that is, exact fit to training data) in RKHS generalizes well for some high-dimensional datasets, but not for low-dimensional ones. |
Tasks | |
Published | 2018-12-28 |
URL | http://arxiv.org/abs/1812.11167v1 |
http://arxiv.org/pdf/1812.11167v1.pdf | |
PWC | https://paperswithcode.com/paper/consistency-of-interpolation-with-laplace |
Repo | |
Framework | |
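To make the object of study concrete: given training points $(x_i, y_i)$ and kernel matrix $K_{ij} = k(x_i, x_j)$, the minimum-norm interpolant in the RKHS is $f(x) = \sum_i \alpha_i k(x, x_i)$ with $K\alpha = y$. The sketch below builds it for scalar inputs in plain Python. The parameterisation $k(x, z) = \exp(-|x - z| / \sigma)$ of the Laplace kernel and the toy data are assumptions made for illustration, and the code only demonstrates the exact fit, not the inconsistency result.

```python
import math


def laplace_kernel(x, z, bandwidth):
    """Laplace kernel on scalars: exp(-|x - z| / bandwidth)."""
    return math.exp(-abs(x - z) / bandwidth)


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_interpolant(xs, ys, bandwidth):
    """Return f with f(x_i) = y_i and minimal RKHS norm (distinct xs assumed)."""
    gram = [[laplace_kernel(a, b, bandwidth) for b in xs] for a in xs]
    alpha = solve(gram, ys)

    def f(x):
        return sum(a * laplace_kernel(x, xi, bandwidth) for a, xi in zip(alpha, xs))

    return f


# Toy one-dimensional data (hypothetical).
xs = [0.0, 0.5, 1.0, 1.5]
ys = [0.0, 1.0, 0.0, -1.0]
f = fit_interpolant(xs, ys, bandwidth=1.0)
residuals = [abs(f(x) - y) for x, y in zip(xs, ys)]
```

The residuals at the training points are zero up to floating-point error, which is what "exact fit to training data" means in the abstract.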
Deep Learning with a Rethinking Structure for Multi-label Classification
Title | Deep Learning with a Rethinking Structure for Multi-label Classification |
Authors | Yao-Yuan Yang, Yi-An Lin, Hong-Min Chu, Hsuan-Tien Lin |
Abstract | Multi-label classification (MLC) is an important class of machine learning problems that come with a wide spectrum of applications, each demanding a possibly different evaluation criterion. When solving the MLC problems, we generally expect the learning algorithm to take the hidden correlation of the labels into account to improve the prediction performance. Extracting the hidden correlation is generally a challenging task. In this work, we propose a novel deep learning framework to better extract the hidden correlation with the help of the memory structure within recurrent neural networks. The memory stores the temporary guesses on the labels and effectively allows the framework to rethink about the goodness and correlation of the guesses before making the final prediction. Furthermore, the rethinking process makes it easy to adapt to different evaluation criteria to match real-world application needs. In particular, the framework can be trained in an end-to-end style with respect to any given MLC evaluation criteria. The end-to-end design can be seamlessly combined with other deep learning techniques to conquer challenging MLC problems like image tagging. Experimental results across many real-world data sets justify that the rethinking framework indeed improves MLC performance across different evaluation criteria and leads to superior performance over state-of-the-art MLC algorithms. |
Tasks | Multi-Label Classification |
Published | 2018-02-05 |
URL | https://arxiv.org/abs/1802.01697v2 |
https://arxiv.org/pdf/1802.01697v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-with-a-rethinking-structure-for |
Repo | |
Framework | |
The Generic Holdout: Preventing False-Discoveries in Adaptive Data Science
Title | The Generic Holdout: Preventing False-Discoveries in Adaptive Data Science |
Authors | Preetum Nakkiran, Jarosław Błasiok |
Abstract | Adaptive data analysis has posed a challenge to science due to its ability to generate false hypotheses on moderately large data sets. In general, with non-adaptive data analyses (where queries to the data are generated without being influenced by answers to previous queries) a data set containing $n$ samples may support exponentially many queries in $n$. This number reduces to linearly many under naive adaptive data analysis, and even sophisticated remedies such as the Reusable Holdout (Dwork et al. 2015) only allow quadratically many queries in $n$. In this work, we propose a new framework for adaptive science which exponentially improves on this number of queries under a restricted yet scientifically relevant setting, where the goal of the scientist is to find a single (or a few) true hypotheses about the universe based on the samples. Such a setting may describe the search for predictive factors of some disease based on medical data, where the analyst may wish to try a number of predictive models until a satisfactory one is found. Our solution, the Generic Holdout, involves two simple ingredients: (1) a partitioning of the data into an exploration set and a holdout set and (2) a limited exposure strategy for the holdout set. An analyst is free to use the exploration set arbitrarily, but when testing hypotheses against the holdout set, the analyst only learns the answer to the question: “Is the given hypothesis true (empirically) on the holdout set?” – and no more information, such as “how well” the hypothesis fits the holdout set. The resulting scheme is immediate to analyze, but despite its simplicity we do not believe our method is obvious, as evidenced by the many violations in practice. Our proposal can be seen as an alternative to pre-registration, and allows researchers to get the benefits of adaptive data analysis without the problems of adaptivity. |
Tasks | |
Published | 2018-09-14 |
URL | http://arxiv.org/abs/1809.05596v1 |
http://arxiv.org/pdf/1809.05596v1.pdf | |
PWC | https://paperswithcode.com/paper/the-generic-holdout-preventing-false |
Repo | |
Framework | |
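The two ingredients described in the abstract above, a data partition and a holdout that answers only yes or no, can be sketched as follows. This is a rough illustration rather than the authors' implementation. The class and function names are hypothetical, and "true (empirically)" is modelled here as "accuracy on the holdout set is at least a fixed threshold", which is an assumption. Any further rules of the actual scheme are not specified in the abstract and are not modelled.

```python
import random


def split(data, holdout_fraction, rng):
    """Partition the data into an exploration set and a holdout set."""
    shuffled = list(data)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - holdout_fraction))
    return shuffled[:cut], shuffled[cut:]


class LimitedExposureHoldout:
    """Holdout set that reveals a single bit per tested hypothesis."""

    def __init__(self, holdout, threshold):
        if not holdout:
            raise ValueError("holdout set must not be empty")
        self._holdout = list(holdout)
        self._threshold = threshold  # assumed criterion for "empirically true"

    def is_true(self, hypothesis):
        """Return only True or False, never the underlying accuracy."""
        hits = sum(1 for x, y in self._holdout if hypothesis(x) == y)
        return hits / len(self._holdout) >= self._threshold


# Toy data (hypothetical): the label is True exactly when x >= 50.
rng = random.Random(0)
data = [(x, x >= 50) for x in range(100)]
exploration, holdout = split(data, holdout_fraction=0.5, rng=rng)

oracle = LimitedExposureHoldout(holdout, threshold=0.9)
result_good = oracle.is_true(lambda x: x >= 50)  # matches the labels
result_bad = oracle.is_true(lambda x: x < 50)    # always disagrees with the labels
```

The analyst may inspect `exploration` freely. The only channel to the holdout set is `is_true`, so the "how well" information stays hidden.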