October 19, 2019

Paper Group ANR 321

Skin lesion classification with ensemble of squeeze-and-excitation networks and semi-supervised learning


Title	Skin lesion classification with ensemble of squeeze-and-excitation networks and semi-supervised learning
Authors	Shunsuke Kitada, Hitoshi Iyatomi
Abstract	In this report, we outline our system for Task 3 (Disease Classification) of ISIC 2018: Skin Lesion Analysis Towards Melanoma Detection. We fine-tuned multiple pre-trained neural network models based on Squeeze-and-Excitation Networks (SENet), which achieved state-of-the-art results in image recognition. In addition, we used mean teachers as a semi-supervised learning framework and introduced data augmentation strategies designed specifically for skin lesion analysis. We confirmed that our data augmentation strategy improved classification performance and achieved a balanced accuracy of 87.2% on the official ISIC2018 validation dataset.
Tasks	Data Augmentation, Skin Lesion Classification
Published	2018-09-07
URL	http://arxiv.org/abs/1809.02568v1
PDF	http://arxiv.org/pdf/1809.02568v1.pdf
PWC	https://paperswithcode.com/paper/skin-lesion-classification-with-ensemble-of
Repo
Framework
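
The abstract above names mean teachers as the semi-supervised framework. As a minimal sketch of the general mean-teacher idea (not the authors' code), the teacher's weights track an exponential moving average of the student's weights, and a consistency term penalizes disagreement between the two models' predictions on the same input. The function names, the flat weight lists, and the `alpha` value are illustrative assumptions.

```python
def ema_update(teacher, student, alpha=0.99):
    """Return new teacher weights: alpha * teacher + (1 - alpha) * student.

    `alpha` is an assumed smoothing factor, not a value from the report.
    """
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


def consistency_loss(student_probs, teacher_probs):
    """Mean squared difference between student and teacher class probabilities."""
    total = sum((s - t) ** 2 for s, t in zip(student_probs, teacher_probs))
    return total / len(student_probs)


# Toy usage: after each student optimizer step, refresh the teacher.
teacher_weights = [0.0, 2.0]
student_weights = [1.0, 0.0]
teacher_weights = ema_update(teacher_weights, student_weights, alpha=0.5)
```

The consistency term needs no labels, which is what lets unlabeled images contribute to training.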

SuperNeurons: Dynamic GPU Memory Management for Training Deep Neural Networks


Title	SuperNeurons: Dynamic GPU Memory Management for Training Deep Neural Networks
Authors	Linnan Wang, Jinmian Ye, Yiyang Zhao, Wei Wu, Ang Li, Shuaiwen Leon Song, Zenglin Xu, Tim Kraska
Abstract	Going deeper and wider in neural architectures improves accuracy, while the limited GPU DRAM places an undesired restriction on the network design domain. Deep Learning (DL) practitioners either need to change to less desired network architectures, or nontrivially dissect a network across multiple GPUs. These distract DL practitioners from concentrating on their original machine learning tasks. We present SuperNeurons: a dynamic GPU memory scheduling runtime to enable network training far beyond the GPU DRAM capacity. SuperNeurons features 3 memory optimizations, Liveness Analysis, Unified Tensor Pool, and Cost-Aware Recomputation; together they effectively reduce the network-wide peak memory usage down to the maximal memory usage among layers. We also address the performance issues in those memory saving techniques. Given the limited GPU DRAM, SuperNeurons not only provisions the necessary memory for the training, but also dynamically allocates the memory for convolution workspaces to achieve high performance. Evaluations against Caffe, Torch, MXNet and TensorFlow have demonstrated that SuperNeurons trains networks at least 3.2432 times deeper than current ones with leading performance. In particular, SuperNeurons can train ResNet2500, which has $10^4$ basic network layers, on a 12GB K40c.
Tasks
Published	2018-01-13
URL	http://arxiv.org/abs/1801.04380v1
PDF	http://arxiv.org/pdf/1801.04380v1.pdf
PWC	https://paperswithcode.com/paper/superneurons-dynamic-gpu-memory-management
Repo
Framework
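
Liveness analysis, the first optimization named in the abstract above, can be illustrated with a toy model: a tensor is freed right after the last step that reads it, so peak memory is set by what is simultaneously live rather than by the sum of all tensors. This is a simplified sketch under stated assumptions, not the SuperNeurons runtime; the step format, tensor names, and sizes are hypothetical.

```python
def peak_memory(steps, sizes, use_liveness=True):
    """Peak memory for a schedule of steps.

    steps: list of (output_tensor, [input_tensors]) in execution order;
           each output name is assumed to be produced exactly once.
    sizes: dict mapping tensor name to its size (arbitrary units).
    """
    # Record the last step index at which each tensor is produced or read.
    last_use = {}
    for i, (out, ins) in enumerate(steps):
        last_use[out] = i
        for name in ins:
            last_use[name] = i

    live, current, peak = set(), 0, 0
    for i, (out, _ins) in enumerate(steps):
        live.add(out)
        current += sizes[out]
        peak = max(peak, current)
        if use_liveness:
            # Free every tensor whose last use is this step.
            for name in [n for n in live if last_use[n] == i]:
                live.remove(name)
                current -= sizes[name]
    return peak


# Hypothetical four-step chain: a -> b -> c -> d.
steps = [("a", []), ("b", ["a"]), ("c", ["b"]), ("d", ["c"])]
sizes = {"a": 4, "b": 2, "c": 8, "d": 1}
print(peak_memory(steps, sizes, use_liveness=False))  # keeps every tensor
print(peak_memory(steps, sizes, use_liveness=True))   # frees dead tensors
```

In real training, forward activations are reused in the backward pass, which is why the abstract pairs liveness analysis with a tensor pool and recomputation.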

Data-driven Localization and Estimation of Disturbance in the Interconnected Power System


Title	Data-driven Localization and Estimation of Disturbance in the Interconnected Power System
Authors	Hyang-Won Lee, Jianan Zhang, Eytan Modiano
Abstract	Identifying the location of a disturbance and its magnitude is an important component for stable operation of power systems. We study the problem of localizing and estimating a disturbance in the interconnected power system. We take a model-free approach to this problem by using frequency data from generators. Specifically, we develop a logistic regression based method for localization and a linear regression based method for estimation of the magnitude of disturbance. Our model-free approach does not require the knowledge of system parameters such as inertia constants and topology, and is shown to achieve highly accurate localization and estimation performance even in the presence of measurement noise and missing data.
Tasks
Published	2018-06-04
URL	http://arxiv.org/abs/1806.01318v1
PDF	http://arxiv.org/pdf/1806.01318v1.pdf
PWC	https://paperswithcode.com/paper/data-driven-localization-and-estimation-of
Repo
Framework

Adaptivity for Regularized Kernel Methods by Lepskii’s Principle


Title	Adaptivity for Regularized Kernel Methods by Lepskii’s Principle
Authors	Nicole Mücke
Abstract	We address the problem of adaptivity in the framework of reproducing kernel Hilbert space (RKHS) regression. More precisely, we analyze estimators arising from a linear regularization scheme $g_\lambda$. In practical applications, an important task is to choose the regularization parameter $\lambda$ appropriately, i.e. based only on the given data and independently of unknown structural assumptions on the regression function. An attractive approach avoiding data-splitting is the Lepskii Principle (LP), also known as the Balancing Principle in this setting. We show that a modified parameter choice based on (LP) is minimax optimal adaptive, up to $\log\log(n)$. A convenient result is the fact that balancing in the $L^2(\nu)$-norm, which is easiest, automatically gives optimal balancing in all stronger norms, interpolating between $L^2(\nu)$ and the RKHS. An analogous result is open for other classical approaches to data-dependent choices of the regularization parameter, e.g. for Hold-Out.
Tasks
Published	2018-04-15
URL	http://arxiv.org/abs/1804.05433v1
PDF	http://arxiv.org/pdf/1804.05433v1.pdf
PWC	https://paperswithcode.com/paper/adaptivity-for-regularized-kernel-methods-by
Repo
Framework
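
The balancing idea in the abstract above can be sketched in a few lines. This follows the textbook form of the Lepskii (balancing) principle, not the modified rule analyzed in the paper: walk up a grid of regularization parameters and keep the largest one whose estimator stays within a noise-level tolerance of all estimators with smaller parameters. The constant `c=4.0`, the user-supplied `noise_bound` function, and the empirical norm are assumptions made for illustration.

```python
import math


def l2_distance(f, g):
    """Empirical L2 distance between two estimators evaluated on the same points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f, g)) / len(f))


def balancing_choice(lambdas, estimates, noise_bound, c=4.0):
    """Pick a regularization parameter by a balancing rule.

    lambdas:     grid sorted in ascending order.
    estimates:   estimates[i] holds the values of the estimator for lambdas[i].
    noise_bound: assumed known, decreasing bound on the noise term at each lambda.
    """
    chosen = 0
    for j in range(len(lambdas)):
        within_tolerance = all(
            l2_distance(estimates[j], estimates[i]) <= c * noise_bound(lambdas[i])
            for i in range(j)
        )
        if not within_tolerance:
            break  # stop at the first violation
        chosen = j
    return lambdas[chosen]


# Hypothetical grid and estimator values at two sample points.
grid = [0.01, 0.1, 1.0]
values = [[1.0, 1.0], [1.1, 0.9], [4.0, 4.0]]
picked = balancing_choice(grid, values, lambda lam: 0.05 / math.sqrt(lam))
```

No data are held out here: the rule only compares estimators computed on the full sample, which is the property the abstract highlights.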

CaloriNet: From silhouettes to calorie estimation in private environments


Title	CaloriNet: From silhouettes to calorie estimation in private environments
Authors	Alessandro Masullo, Tilo Burghardt, Dima Damen, Sion Hannuna, Victor Ponce-López, Majid Mirmehdi
Abstract	We propose a novel deep fusion architecture, CaloriNet, for the online estimation of energy expenditure for free living monitoring in private environments, where RGB data is discarded and replaced by silhouettes. Our fused convolutional neural network architecture is trainable end-to-end, to estimate calorie expenditure, using temporal foreground silhouettes alongside accelerometer data. The network is trained and cross-validated on a publicly available dataset, SPHERE_RGBD + Inertial_calorie. Results show state-of-the-art minimum error on the estimation of energy expenditure (calories per minute), outperforming alternative, standard and single-modal techniques.
Tasks
Published	2018-06-21
URL	http://arxiv.org/abs/1806.08152v1
PDF	http://arxiv.org/pdf/1806.08152v1.pdf
PWC	https://paperswithcode.com/paper/calorinet-from-silhouettes-to-calorie
Repo
Framework

Improved Knowledge Graph Embedding using Background Taxonomic Information


Title	Improved Knowledge Graph Embedding using Background Taxonomic Information
Authors	Bahare Fatemi, Siamak Ravanbakhsh, David Poole
Abstract	Knowledge graphs are used to represent relational information in terms of triples. To enable learning about domains, embedding models, such as tensor factorization models, can be used to make predictions of new triples. Often there is background taxonomic information (in terms of subclasses and subproperties) that should also be taken into account. We show that existing fully expressive (a.k.a. universal) models cannot provably respect subclass and subproperty information. We show that minimal modifications to an existing knowledge graph completion method enable injection of taxonomic information. Moreover, we prove that our model is fully expressive, assuming a lower bound on the size of the embeddings. Experimental results on public knowledge graphs show that despite its simplicity our approach is surprisingly effective.
Tasks	Graph Embedding, Knowledge Graph Completion, Knowledge Graph Embedding, Knowledge Graphs
Published	2018-12-07
URL	http://arxiv.org/abs/1812.03235v1
PDF	http://arxiv.org/pdf/1812.03235v1.pdf
PWC	https://paperswithcode.com/paper/improved-knowledge-graph-embedding-using
Repo
Framework

Neural Transition-based Syntactic Linearization


Title	Neural Transition-based Syntactic Linearization
Authors	Linfeng Song, Yue Zhang, Daniel Gildea
Abstract	The task of linearization is to find a grammatical order given a set of words. Traditional models use statistical methods. Syntactic linearization systems, which generate a sentence along with its syntactic tree, have shown state-of-the-art performance. Recent work shows that a multi-layer LSTM language model outperforms competitive statistical syntactic linearization systems without using syntax. In this paper, we study neural syntactic linearization, building a transition-based syntactic linearizer leveraging a feed-forward neural network, observing significantly better results compared to LSTM language models on this task.
Tasks	Language Modelling
Published	2018-10-23
URL	http://arxiv.org/abs/1810.09609v1
PDF	http://arxiv.org/pdf/1810.09609v1.pdf
PWC	https://paperswithcode.com/paper/neural-transition-based-syntactic
Repo
Framework

Logic Attention Based Neighborhood Aggregation for Inductive Knowledge Graph Embedding


Title	Logic Attention Based Neighborhood Aggregation for Inductive Knowledge Graph Embedding
Authors	Peifeng Wang, Jialong Han, Chenliang Li, Rong Pan
Abstract	Knowledge graph embedding aims at modeling entities and relations with low-dimensional vectors. Most previous methods require that all entities be seen during training, which is impractical for real-world knowledge graphs with new entities emerging on a daily basis. Recent efforts on this issue suggest training a neighborhood aggregator in conjunction with the conventional entity and relation embeddings, which may help embed new entities inductively via their existing neighbors. However, their neighborhood aggregators neglect the unordered and unequal natures of an entity’s neighbors. To this end, we summarize the desired properties that may lead to effective neighborhood aggregators. We also introduce a novel aggregator, namely, Logic Attention Network (LAN), which addresses the properties by aggregating neighbors with both rules- and network-based attention weights. By comparing with conventional aggregators on two knowledge graph completion tasks, we experimentally validate LAN’s superiority in terms of the desired properties.
Tasks	Graph Embedding, Knowledge Graph Completion, Knowledge Graph Embedding, Knowledge Graphs
Published	2018-11-04
URL	http://arxiv.org/abs/1811.01399v1
PDF	http://arxiv.org/pdf/1811.01399v1.pdf
PWC	https://paperswithcode.com/paper/logic-attention-based-neighborhood
Repo
Framework

A bagging and importance sampling approach to Support Vector Machines


Title	A bagging and importance sampling approach to Support Vector Machines
Authors	R. Bárcenas, M. D. Gónzalez–Lima, A. J. Quiroz
Abstract	An importance sampling and bagging approach to solving the support vector machine (SVM) problem in the context of large databases is presented and evaluated. Our algorithm builds on the nearest neighbors ideas presented in Camelo et al. (2015). As in that reference, the goal of the present proposal is to achieve a faster solution of the SVM problem without a significant loss in the prediction error. The performance of the methodology is evaluated in benchmark examples and theoretical aspects of subsample methods are discussed.
Tasks
Published	2018-08-17
URL	http://arxiv.org/abs/1808.05917v1
PDF	http://arxiv.org/pdf/1808.05917v1.pdf
PWC	https://paperswithcode.com/paper/a-bagging-and-importance-sampling-approach-to
Repo
Framework

MOHONE: Modeling Higher Order Network Effects in KnowledgeGraphs via Network Infused Embeddings


Title	MOHONE: Modeling Higher Order Network Effects in KnowledgeGraphs via Network Infused Embeddings
Authors	Hao Yu, Vivek Kulkarni, William Wang
Abstract	Many knowledge graph embedding methods operate on triples and are therefore implicitly limited by a very local view of the entire knowledge graph. We present a new framework, MOHONE, to effectively model higher order network effects in knowledge graphs, thus enabling one to capture varying degrees of network connectivity (from the local to the global). Our framework is generic, explicitly models the network scale, and captures two different aspects of similarity in networks: (a) shared local neighborhood and (b) structural role-based similarity. First, we introduce methods that learn network representations of entities in the knowledge graph capturing these varied aspects of similarity. We then propose a fast, efficient method to incorporate the information captured by these network representations into existing knowledge graph embeddings. We show that our method consistently and significantly improves the performance on link prediction of several different knowledge graph embedding methods including TRANSE, TRANSD, DISTMULT, and COMPLEX (by at least 4 points or 17% in some cases).
Tasks	Graph Embedding, Knowledge Graph Embedding, Knowledge Graph Embeddings, Knowledge Graphs, Link Prediction
Published	2018-11-01
URL	http://arxiv.org/abs/1811.00198v1
PDF	http://arxiv.org/pdf/1811.00198v1.pdf
PWC	https://paperswithcode.com/paper/mohone-modeling-higher-order-network-effects
Repo
Framework

Coupling and Convergence for Hamiltonian Monte Carlo


Title	Coupling and Convergence for Hamiltonian Monte Carlo
Authors	Nawaf Bou-Rabee, Andreas Eberle, Raphael Zimmer
Abstract	Based on a new coupling approach, we prove that the transition step of the Hamiltonian Monte Carlo algorithm is contractive w.r.t. a carefully designed Kantorovich (L1 Wasserstein) distance. The lower bound for the contraction rate is explicit. Global convexity of the potential is not required, and thus multimodal target distributions are included. Explicit quantitative bounds for the number of steps required to approximate the stationary distribution up to a given error are a direct consequence of contractivity. These bounds show that HMC can overcome diffusive behaviour if the duration of the Hamiltonian dynamics is adjusted appropriately.
Tasks
Published	2018-05-01
URL	https://arxiv.org/abs/1805.00452v2
PDF	https://arxiv.org/pdf/1805.00452v2.pdf
PWC	https://paperswithcode.com/paper/coupling-and-convergence-for-hamiltonian
Repo
Framework
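
For readers unfamiliar with the transition step discussed in the abstract above, here is a textbook one-dimensional Hamiltonian Monte Carlo step using the leapfrog integrator with a Metropolis accept/reject correction. It is a generic sketch, not the coupling construction from the paper, and the paper's analysis may concern a different variant; the function names and the standard Gaussian target are assumptions. The product `step * n_steps` plays the role of the duration of the Hamiltonian dynamics mentioned in the abstract.

```python
import math
import random


def leapfrog(x, p, grad_u, step, n_steps):
    """Integrate Hamiltonian dynamics for H(x, p) = U(x) + p^2 / 2."""
    p -= 0.5 * step * grad_u(x)          # initial half step for momentum
    for i in range(n_steps):
        x += step * p                    # full step for position
        if i < n_steps - 1:
            p -= step * grad_u(x)        # full step for momentum
    p -= 0.5 * step * grad_u(x)          # final half step for momentum
    return x, p


def hmc_step(x, u, grad_u, step, n_steps, rng):
    """One HMC transition: resample momentum, integrate, accept or reject."""
    p0 = rng.gauss(0.0, 1.0)
    x_new, p_new = leapfrog(x, p0, grad_u, step, n_steps)
    h_old = u(x) + 0.5 * p0 * p0
    h_new = u(x_new) + 0.5 * p_new * p_new
    if rng.random() < math.exp(min(0.0, h_old - h_new)):
        return x_new
    return x


# Assumed target: standard Gaussian, U(x) = x^2 / 2.
def potential(x):
    return 0.5 * x * x


def grad_potential(x):
    return x


rng = random.Random(0)
state = 3.0
for _ in range(100):
    state = hmc_step(state, potential, grad_potential, step=0.1, n_steps=10, rng=rng)
```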

Do Language Models Understand Anything? On the Ability of LSTMs to Understand Negative Polarity Items


Title	Do Language Models Understand Anything? On the Ability of LSTMs to Understand Negative Polarity Items
Authors	Jaap Jumelet, Dieuwke Hupkes
Abstract	In this paper, we attempt to link the inner workings of a neural language model to linguistic theory, focusing on a complex phenomenon well discussed in formal linguistics: (negative) polarity items. We briefly discuss the leading hypotheses about the licensing contexts that allow negative polarity items and evaluate to what extent a neural language model has the ability to correctly process a subset of such constructions. We show that the model finds a relation between the licensing context and the negative polarity item and appears to be aware of the scope of this context, which we extract from a parse tree of the sentence. With this research, we hope to pave the way for other studies linking formal linguistics to deep learning.
Tasks	Language Modelling
Published	2018-08-31
URL	http://arxiv.org/abs/1808.10627v1
PDF	http://arxiv.org/pdf/1808.10627v1.pdf
PWC	https://paperswithcode.com/paper/do-language-models-understand-anything-on-the
Repo
Framework

Consistency of Interpolation with Laplace Kernels is a High-Dimensional Phenomenon


Title	Consistency of Interpolation with Laplace Kernels is a High-Dimensional Phenomenon
Authors	Alexander Rakhlin, Xiyu Zhai
Abstract	We show that minimum-norm interpolation in the Reproducing Kernel Hilbert Space corresponding to the Laplace kernel is not consistent if input dimension is constant. The lower bound holds for any choice of kernel bandwidth, even if selected based on data. The result supports the empirical observation that minimum-norm interpolation (that is, exact fit to training data) in RKHS generalizes well for some high-dimensional datasets, but not for low-dimensional ones.
Tasks
Published	2018-12-28
URL	http://arxiv.org/abs/1812.11167v1
PDF	http://arxiv.org/pdf/1812.11167v1.pdf
PWC	https://paperswithcode.com/paper/consistency-of-interpolation-with-laplace
Repo
Framework
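
Minimum-norm interpolation in an RKHS, the estimator studied in the abstract above, has the standard closed form $f(x) = k(x, X) K^{-1} y$, where $K$ is the kernel matrix on the training inputs. The sketch below builds this interpolant for the Laplace kernel $k(a, b) = \exp(-\|a - b\| / \sigma)$ using only the standard library. The helper names, the bandwidth value, and the toy data are illustrative assumptions.

```python
import math


def laplace_kernel(a, b, bandwidth):
    """Laplace kernel exp(-||a - b|| / bandwidth) with the Euclidean norm."""
    dist = math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))
    return math.exp(-dist / bandwidth)


def solve(matrix, rhs):
    """Solve a linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_interpolant(xs, ys, bandwidth):
    """Return the minimum-norm interpolant f with f(xs[i]) == ys[i]."""
    gram = [[laplace_kernel(a, b, bandwidth) for b in xs] for a in xs]
    coeffs = solve(gram, ys)
    return lambda x: sum(
        c * laplace_kernel(x, xi, bandwidth) for c, xi in zip(coeffs, xs)
    )


# Hypothetical one-dimensional training set.
train_x = [[0.0], [1.0], [2.5]]
train_y = [1.0, -1.0, 0.5]
f = fit_interpolant(train_x, train_y, bandwidth=1.0)
```

The interpolant reproduces the training labels exactly, noise included, which is why its consistency is a non-trivial question.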

Deep Learning with a Rethinking Structure for Multi-label Classification


Title	Deep Learning with a Rethinking Structure for Multi-label Classification
Authors	Yao-Yuan Yang, Yi-An Lin, Hong-Min Chu, Hsuan-Tien Lin
Abstract	Multi-label classification (MLC) is an important class of machine learning problems that come with a wide spectrum of applications, each demanding a possibly different evaluation criterion. When solving the MLC problems, we generally expect the learning algorithm to take the hidden correlation of the labels into account to improve the prediction performance. Extracting the hidden correlation is generally a challenging task. In this work, we propose a novel deep learning framework to better extract the hidden correlation with the help of the memory structure within recurrent neural networks. The memory stores the temporary guesses on the labels and effectively allows the framework to rethink about the goodness and correlation of the guesses before making the final prediction. Furthermore, the rethinking process makes it easy to adapt to different evaluation criteria to match real-world application needs. In particular, the framework can be trained in an end-to-end style with respect to any given MLC evaluation criteria. The end-to-end design can be seamlessly combined with other deep learning techniques to conquer challenging MLC problems like image tagging. Experimental results across many real-world data sets justify that the rethinking framework indeed improves MLC performance across different evaluation criteria and leads to superior performance over state-of-the-art MLC algorithms.
Tasks	Multi-Label Classification
Published	2018-02-05
URL	https://arxiv.org/abs/1802.01697v2
PDF	https://arxiv.org/pdf/1802.01697v2.pdf
PWC	https://paperswithcode.com/paper/deep-learning-with-a-rethinking-structure-for
Repo
Framework

The Generic Holdout: Preventing False-Discoveries in Adaptive Data Science


Title	The Generic Holdout: Preventing False-Discoveries in Adaptive Data Science
Authors	Preetum Nakkiran, Jarosław Błasiok
Abstract	Adaptive data analysis has posed a challenge to science due to its ability to generate false hypotheses on moderately large data sets. In general, with non-adaptive data analyses (where queries to the data are generated without being influenced by answers to previous queries) a data set containing $n$ samples may support exponentially many queries in $n$. This number reduces to linearly many under naive adaptive data analysis, and even sophisticated remedies such as the Reusable Holdout (Dwork et al. 2015) only allow quadratically many queries in $n$. In this work, we propose a new framework for adaptive science which exponentially improves on this number of queries under a restricted yet scientifically relevant setting, where the goal of the scientist is to find a single (or a few) true hypotheses about the universe based on the samples. Such a setting may describe the search for predictive factors of some disease based on medical data, where the analyst may wish to try a number of predictive models until a satisfactory one is found. Our solution, the Generic Holdout, involves two simple ingredients: (1) a partitioning of the data into an exploration set and a holdout set and (2) a limited exposure strategy for the holdout set. An analyst is free to use the exploration set arbitrarily, but when testing hypotheses against the holdout set, the analyst only learns the answer to the question: “Is the given hypothesis true (empirically) on the holdout set?” – and no more information, such as “how well” the hypothesis fits the holdout set. The resulting scheme is immediate to analyze, but despite its simplicity we do not believe our method is obvious, as evidenced by the many violations in practice. Our proposal can be seen as an alternative to pre-registration, and allows researchers to get the benefits of adaptive data analysis without the problems of adaptivity.
Tasks
Published	2018-09-14
URL	http://arxiv.org/abs/1809.05596v1
PDF	http://arxiv.org/pdf/1809.05596v1.pdf
PWC	https://paperswithcode.com/paper/the-generic-holdout-preventing-false
Repo
Framework
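
The two ingredients described in the abstract above, a data partition and a limited exposure strategy, can be sketched as follows. This is an illustrative reading, not the authors' implementation: the class and function names are hypothetical, and defining "true (empirically)" as accuracy at or above a fixed threshold is an assumption made for the example.

```python
import random


def split(data, holdout_fraction, rng):
    """Partition the data into an exploration set and a holdout set."""
    shuffled = list(data)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1.0 - holdout_fraction))
    return shuffled[:cut], shuffled[cut:]


class LimitedExposureHoldout:
    """Holdout set that reveals a single bit per tested hypothesis."""

    def __init__(self, holdout, threshold):
        self._holdout = list(holdout)
        self._threshold = threshold  # assumed criterion for "empirically true"

    def test(self, hypothesis):
        """Return only True or False, never the underlying score."""
        correct = sum(1 for x, y in self._holdout if hypothesis(x) == y)
        return correct / len(self._holdout) >= self._threshold


# Hypothetical labeled data: the label is the parity of the input.
data = [(i, i % 2) for i in range(100)]
exploration, holdout = split(data, holdout_fraction=0.5, rng=random.Random(0))
gate = LimitedExposureHoldout(holdout, threshold=0.9)

print(gate.test(lambda x: x % 2))      # a hypothesis that matches every label
print(gate.test(lambda x: 1 - x % 2))  # a hypothesis that matches none
```

Because the analyst never sees how well a hypothesis fits, repeated queries leak far less about the holdout set than reporting the raw accuracy would.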