January 30, 2020


Paper Group ANR 325



Exascale Deep Learning to Accelerate Cancer Research

Title Exascale Deep Learning to Accelerate Cancer Research
Authors Robert M. Patton, J. Travis Johnston, Steven R. Young, Catherine D. Schuman, Thomas E. Potok, Derek C. Rose, Seung-Hwan Lim, Junghoon Chae, Le Hou, Shahira Abousamra, Dimitris Samaras, Joel Saltz
Abstract Deep learning, through the use of neural networks, has demonstrated remarkable ability to automate many routine tasks when presented with sufficient data for training. The neural network architecture (e.g. number of layers, types of layers, connections between layers, etc.) plays a critical role in determining what, if anything, the neural network is able to learn from the training data. The trend for neural network architectures, especially those trained on ImageNet, has been to grow ever deeper and more complex. The result has been ever increasing accuracy on benchmark datasets with the cost of increased computational demands. In this paper we demonstrate that neural network architectures can be automatically generated, tailored for a specific application, with dual objectives: accuracy of prediction and speed of prediction. Using MENNDL–an HPC-enabled software stack for neural architecture search–we generate a neural network with comparable accuracy to state-of-the-art networks on a cancer pathology dataset that is also $16\times$ faster at inference. The speedup in inference is necessary because of the volume and velocity of cancer pathology data; specifically, the previous state-of-the-art networks are too slow for individual researchers without access to HPC systems to keep pace with the rate of data generation. Our new model enables researchers with modest computational resources to analyze newly generated data faster than it is collected.
Tasks Neural Architecture Search
Published 2019-09-26
URL https://arxiv.org/abs/1909.12291v1
PDF https://arxiv.org/pdf/1909.12291v1.pdf
PWC https://paperswithcode.com/paper/exascale-deep-learning-to-accelerate-cancer
Repo
Framework
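MENNDL searches for architectures under two objectives at once: prediction accuracy and inference speed. A minimal sketch of that idea is below; the scalarized score, the random-search loop, and the `depth`/`width` search space are all illustrative stand-ins, not MENNDL's actual evolutionary method.

```python
import random

def score(candidate, w_speed=0.5):
    """Combine accuracy and inference speed into one fitness value.

    `candidate` holds a measured 'accuracy' (0-1, higher is better) and
    'latency_ms' (lower is better). The linear weighting is a stand-in
    for whatever multi-objective strategy MENNDL actually uses.
    """
    speed = 1.0 / candidate["latency_ms"]  # higher means faster inference
    return (1 - w_speed) * candidate["accuracy"] + w_speed * speed

def random_search(evaluate, n_trials=50, seed=0):
    """Toy architecture search: sample depths/widths, keep the best score."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        arch = {"depth": rng.randint(2, 20), "width": rng.choice([16, 32, 64])}
        cand = evaluate(arch)  # train/measure the candidate (user-supplied)
        if best is None or score(cand) > score(best):
            best = {**arch, **cand}
    return best
```

The point of the dual objective is visible in `score`: a slightly less accurate network can win if it is much faster, which is exactly the trade the paper exploits to get a 16x inference speedup.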

Learning Neurosymbolic Generative Models via Program Synthesis

Title Learning Neurosymbolic Generative Models via Program Synthesis
Authors Halley Young, Osbert Bastani, Mayur Naik
Abstract Significant strides have been made toward designing better generative models in recent years. Despite this progress, however, state-of-the-art approaches are still largely unable to capture complex global structure in data. For example, images of buildings typically contain spatial patterns such as windows repeating at regular intervals; state-of-the-art generative methods can’t easily reproduce these structures. We propose to address this problem by incorporating programs representing global structure into the generative model—e.g., a 2D for-loop may represent a configuration of windows. Furthermore, we propose a framework for learning these models by leveraging program synthesis to generate training data. On both synthetic and real-world data, we demonstrate that our approach is substantially better than the state-of-the-art at both generating and completing images that contain global structure.
Tasks Program Synthesis
Published 2019-01-24
URL http://arxiv.org/abs/1901.08565v1
PDF http://arxiv.org/pdf/1901.08565v1.pdf
PWC https://paperswithcode.com/paper/learning-neurosymbolic-generative-models-via
Repo
Framework

Noisy, Greedy and Not So Greedy k-means++

Title Noisy, Greedy and Not So Greedy k-means++
Authors Anup Bhattacharya, Jan Eube, Heiko Röglin, Melanie Schmidt
Abstract The k-means++ algorithm due to Arthur and Vassilvitskii has become the most popular seeding method for Lloyd’s algorithm. It samples the first center uniformly at random from the data set and the other $k-1$ centers iteratively according to $D^2$-sampling where the probability that a data point becomes the next center is proportional to its squared distance to the closest center chosen so far. k-means++ is known to achieve an approximation factor of $O(\log k)$ in expectation. Already in the original paper on k-means++, Arthur and Vassilvitskii suggested a variation called the greedy k-means++ algorithm, in which in each iteration multiple possible centers are sampled according to $D^2$-sampling and only the one that decreases the objective the most is chosen as a center for that iteration. It is stated as an open question whether this also leads to an $O(\log k)$-approximation (or even better). We show that this is not the case by presenting a family of instances on which greedy k-means++ yields only an $\Omega(\ell\cdot \log k)$-approximation in expectation where $\ell$ is the number of possible centers that are sampled in each iteration. We also study a variation, which we call the noisy k-means++ algorithm. In this variation only one center is sampled in every iteration but not exactly by $D^2$-sampling anymore. Instead, in each iteration an adversary is allowed to change the probabilities arising from $D^2$-sampling individually for each point by a factor between $1-\epsilon_1$ and $1+\epsilon_2$ for parameters $\epsilon_1 \in [0,1)$ and $\epsilon_2 \ge 0$. We prove that noisy k-means++ computes an $O(\log^2 k)$-approximation in expectation. We also discuss some applications of this result.
Tasks
Published 2019-12-02
URL https://arxiv.org/abs/1912.00653v1
PDF https://arxiv.org/pdf/1912.00653v1.pdf
PWC https://paperswithcode.com/paper/noisy-greedy-and-not-so-greedy-k-means
Repo
Framework
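The $D^2$-sampling and greedy-selection steps described in the abstract can be sketched directly. This is a minimal illustration of the two routines, not the authors' experimental code; the quadratic cost recomputation is kept naive for clarity.

```python
import random

def d2_sample(points, centers, rng):
    """Sample one point with probability proportional to its squared
    Euclidean distance to the nearest chosen center (D^2-sampling)."""
    d2 = [min(sum((p - c) ** 2 for p, c in zip(x, ctr)) for ctr in centers)
          for x in points]
    r = rng.uniform(0, sum(d2))
    acc = 0.0
    for x, w in zip(points, d2):
        acc += w
        if acc >= r:
            return x
    return points[-1]

def greedy_kmeanspp(points, k, ell, seed=0):
    """Greedy k-means++: per iteration draw `ell` candidate centers by
    D^2-sampling and keep the one minimizing the k-means objective."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]  # first center: uniform at random
    for _ in range(k - 1):
        cands = [d2_sample(points, centers, rng) for _ in range(ell)]
        def cost(ctrs):
            return sum(min(sum((p - c) ** 2 for p, c in zip(x, ctr))
                           for ctr in ctrs) for x in points)
        centers.append(min(cands, key=lambda c: cost(centers + [c])))
    return centers
```

Setting `ell=1` recovers plain k-means++; the paper's lower bound says the greedy variant with larger `ell` can be worse in expectation, despite each step being locally optimal.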

Community detection in node-attributed social networks: a survey

Title Community detection in node-attributed social networks: a survey
Authors Petr Chunaev
Abstract Community detection is a fundamental problem in social network analysis consisting, roughly speaking, in dividing social actors (modelled as nodes in a social graph) with certain social connections (modelled as edges in the social graph) into densely knitted and highly related groups with each group well separated from the others. Classical approaches for community detection usually deal only with the structure of the network and ignore features of the nodes, although major real-world networks provide additional actors’ information such as age, gender, interests, etc., traditionally called node attributes. It is known that the attributes may clarify and enrich the knowledge about the actors and give sense to the detected communities. This has led to a relatively novel direction in community detection — constructing algorithms that use both the structure and the attributes of the network (modelled already via a node-attributed graph) to yield more informative and qualitative results. During the last decade many methods based on different ideas and techniques have appeared in this direction. Although there exist some partial overviews of them, a recent survey is a necessity as the growing number of the methods may cause uncertainty in practice. In this paper we aim at clarifying the overall situation by proposing a clear classification of the methods and providing a comprehensive survey of the available results. We not only group and analyse the corresponding methods but also focus on practical aspects, including the information which methods outperform others and which datasets and quality measures are used for evaluation.
Tasks Community Detection
Published 2019-12-20
URL https://arxiv.org/abs/1912.09816v1
PDF https://arxiv.org/pdf/1912.09816v1.pdf
PWC https://paperswithcode.com/paper/community-detection-in-node-attributed-social
Repo
Framework
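The "classical" structure-only baseline the survey contrasts against can be illustrated with a tiny label-propagation routine. This sketch ignores node attributes entirely; the survey's subject is precisely the family of methods that extend baselines like this one to use attributes as well.

```python
import random
from collections import Counter

def label_propagation(adj, n_iter=20, seed=0):
    """Structure-only community detection by label propagation: each
    node repeatedly adopts the most common label among its neighbors,
    with ties broken by the smallest label for determinism. `adj` maps
    each node to a list of its neighbors."""
    rng = random.Random(seed)
    labels = {v: v for v in adj}  # every node starts in its own community
    nodes = list(adj)
    for _ in range(n_iter):
        rng.shuffle(nodes)  # asynchronous updates in random order
        for v in nodes:
            if not adj[v]:
                continue
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            labels[v] = min(l for l, c in counts.items() if c == best)
    return labels
```

An attribute-aware method in the survey's taxonomy would modify either the graph (adding attribute-similarity edges) or the update rule (weighting neighbors by attribute agreement) before running a procedure like this.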

Analysis of Bias in Gathering Information Between User Attributes in News Application

Title Analysis of Bias in Gathering Information Between User Attributes in News Application
Authors Yoshifumi Seki, Mitsuo Yoshida
Abstract In the process of information gathering on the web, confirmation bias is known to exist, exemplified in phenomena such as echo chambers and filter bubbles. Our purpose is to reveal how people consume news and discuss these phenomena. In web services, we are able to use action logs of a service to investigate these phenomena. However, many existing studies about these phenomena are conducted via questionnaires, and there are few studies using action logs. In this paper, we attempt to discover biases of information gathering due to differences in user demographic attributes, such as age and gender, from the behavior log of a news distribution service. First, we summarized the actions in the service for each user attribute and showed the difference in user behavior depending on the attributes. Next, the degree of correlation between the attributes was measured using the correlation coefficient, and a strong correlation was found to exist in the browsing tendency of the news articles between the attributes. Then, the bias of keywords between attributes was examined: keywords with biased behavior among the attributes were found using the parameters of a regression analysis. Since these discovered keywords are almost entirely explainable by big news, our proposed method is effective in detecting biased keywords.
Tasks
Published 2019-09-02
URL https://arxiv.org/abs/1909.00554v1
PDF https://arxiv.org/pdf/1909.00554v1.pdf
PWC https://paperswithcode.com/paper/analysis-of-bias-in-gathering-information
Repo
Framework

SFNet: Learning Object-aware Semantic Correspondence

Title SFNet: Learning Object-aware Semantic Correspondence
Authors Junghyup Lee, Dohyung Kim, Jean Ponce, Bumsub Ham
Abstract We address the problem of semantic correspondence, that is, establishing a dense flow field between images depicting different instances of the same object or scene category. We propose to use images annotated with binary foreground masks and subjected to synthetic geometric deformations to train a convolutional neural network (CNN) for this task. Using these masks as part of the supervisory signal offers a good compromise between semantic flow methods, where the amount of training data is limited by the cost of manually selecting point correspondences, and semantic alignment ones, where the regression of a single global geometric transformation between images may be sensitive to image-specific details such as background clutter. We propose a new CNN architecture, dubbed SFNet, which implements this idea. It leverages a new and differentiable version of the argmax function for end-to-end training, with a loss that combines mask and flow consistency with smoothness terms. Experimental results demonstrate the effectiveness of our approach, which significantly outperforms the state of the art on standard benchmarks.
Tasks
Published 2019-04-03
URL http://arxiv.org/abs/1904.01810v2
PDF http://arxiv.org/pdf/1904.01810v2.pdf
PWC https://paperswithcode.com/paper/sfnet-learning-object-aware-semantic
Repo
Framework
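The "differentiable version of the argmax function" is the key trick that lets SFNet train end-to-end. SFNet applies a kernel soft-argmax over 2-D correlation maps; the 1-D toy below shows only the core idea, with `beta` as an illustrative sharpness parameter.

```python
import math

def soft_argmax(scores, beta=10.0):
    """Differentiable relaxation of argmax: a softmax-weighted average
    of index positions. As beta grows it approaches the hard argmax
    index, but unlike argmax it is smooth in `scores`, so gradients
    can flow through it during training."""
    m = max(scores)  # subtract the max to keep the exponentials stable
    w = [math.exp(beta * (s - m)) for s in scores]
    z = sum(w)
    return sum(i * wi for i, wi in enumerate(w)) / z
```

For flow estimation, the same computation is run over x- and y-coordinates of a correlation map, turning a discrete "best match" lookup into a continuous, trainable prediction.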

Network-Based Delineation of Health Service Areas: A Comparative Analysis of Community Detection Algorithms

Title Network-Based Delineation of Health Service Areas: A Comparative Analysis of Community Detection Algorithms
Authors Diego Pinheiro, Ryan Hartman, Erick Romero, Ronaldo Menezes, Martin Cadeiras
Abstract A Health Service Area (HSA) is a group of geographic regions served by similar health care facilities. The delineation of HSAs plays a pivotal role in the characterization of health care services available in an area, enabling a better planning and regulation of health care services. Though Dartmouth HSAs have been the standard delineation for decades, previous work has recently shown an improved HSA delineation using a network-based approach, in which HSAs are the communities extracted by the Louvain algorithm in hospital-patient discharge networks. Given the existent heterogeneity of communities extracted by different community detection algorithms, a comparative analysis of community detection algorithms for optimal HSA delineation is lacking. In this work, we compared HSA delineations produced by community detection algorithms using a large-scale dataset containing different types of hospital-patient discharges spanning a 7-year period in the US. Our results replicated the heterogeneity among community detection algorithms found in previous works and the improved HSA delineation obtained by a network-based approach, and suggested that Infomap may be a more suitable community detection algorithm for HSA delineation, since it finds a high number of HSAs with a high localization index and a low network conductance.
Tasks Community Detection
Published 2019-12-08
URL https://arxiv.org/abs/1912.08921v1
PDF https://arxiv.org/pdf/1912.08921v1.pdf
PWC https://paperswithcode.com/paper/network-based-delineation-of-health-service
Repo
Framework

Extracting adverse drug reactions and their context using sequence labelling ensembles in TAC2017

Title Extracting adverse drug reactions and their context using sequence labelling ensembles in TAC2017
Authors Maksim Belousov, Nikola Milosevic, William Dixon, Goran Nenadic
Abstract Adverse drug reactions (ADRs) are unwanted or harmful effects experienced after the administration of a certain drug or a combination of drugs, presenting a challenge for drug development and drug administration. In this paper, we present a set of taggers for extracting adverse drug reactions and related entities, including factors, severity, negations, drug class and animal. The systems used a mix of rule-based, machine learning (CRF) and deep learning (BLSTM with word2vec embeddings) methodologies in order to annotate the data. The systems were submitted to the adverse drug reaction shared task, organised during the Text Analytics Conference in 2017 by the National Institute of Standards and Technology, achieving F1-scores of 76.00 and 75.61 respectively.
Tasks
Published 2019-05-28
URL https://arxiv.org/abs/1905.11716v1
PDF https://arxiv.org/pdf/1905.11716v1.pdf
PWC https://paperswithcode.com/paper/extracting-adverse-drug-reactions-and-their
Repo
Framework

Community Detection and Matrix Completion with Two-Sided Graph Side-Information

Title Community Detection and Matrix Completion with Two-Sided Graph Side-Information
Authors Qiaosheng Zhang, Vincent Y. F. Tan, Changho Suh
Abstract We consider the problem of recovering communities of users and communities of items (such as movies) based on a partially observed rating matrix as well as side-information in the form of similarity graphs of the users and items. The user-to-user and item-to-item similarity graphs are generated according to the celebrated stochastic block model (SBM). We develop lower and upper bounds on the minimum expected number of observed ratings (also known as the sample complexity) needed for this recovery task. These bounds are functions of various parameters including the quality of the graph side-information which is manifested in the intra- and inter-cluster probabilities of the SBMs. We show that these bounds match for a wide range of parameters of interest, and match up to a constant factor of two for the remaining parameter regime. Our information-theoretic results quantify the benefits of the two-sided graph side-information for recovery, and further analysis reveals that the two pieces of graph side-information produce an interesting synergistic effect under certain scenarios. This means that if one observes only one of the two graphs, then the required sample complexity worsens to the case in which none of the graphs is observed. Thus both graphs are strictly needed to reduce the sample complexity.
Tasks Community Detection, Matrix Completion
Published 2019-12-06
URL https://arxiv.org/abs/1912.04099v1
PDF https://arxiv.org/pdf/1912.04099v1.pdf
PWC https://paperswithcode.com/paper/community-detection-and-matrix-completion
Repo
Framework
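The side-information graphs in this setting are drawn from a stochastic block model governed by intra- and inter-cluster edge probabilities. A minimal two-parameter SBM sampler looks like this (a generic sketch, not the paper's exact generative setup, which also involves the partially observed rating matrix):

```python
import random

def sbm(labels, p_intra, p_inter, seed=0):
    """Sample an undirected graph from a two-parameter stochastic block
    model: an edge appears with probability p_intra when both endpoints
    share a community label, and p_inter otherwise. Returns the edge set
    as (i, j) pairs with i < j."""
    rng = random.Random(seed)
    n = len(labels)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            p = p_intra if labels[i] == labels[j] else p_inter
            if rng.random() < p:
                edges.add((i, j))
    return edges
```

The gap between `p_intra` and `p_inter` is what the abstract calls the "quality of the graph side-information": the larger the gap, the more each graph reveals about the hidden communities, and the fewer rating observations recovery requires.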

Optimal Laplacian regularization for sparse spectral community detection

Title Optimal Laplacian regularization for sparse spectral community detection
Authors Lorenzo Dall’Amico, Romain Couillet, Nicolas Tremblay
Abstract Regularization of the classical Laplacian matrices was empirically shown to improve spectral clustering in sparse networks. It was observed that small regularizations are preferable, but this point was left as a heuristic argument. In this paper we formally determine a proper regularization which is intimately related to alternative state-of-the-art spectral techniques for sparse graphs.
Tasks Community Detection
Published 2019-12-03
URL https://arxiv.org/abs/1912.01419v1
PDF https://arxiv.org/pdf/1912.01419v1.pdf
PWC https://paperswithcode.com/paper/optimal-laplacian-regularization-for-sparse
Repo
Framework
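A common form of the regularization the abstract refers to replaces node degrees $d_i$ with $d_i + \tau$ before symmetric normalization. The sketch below computes that regularized operator for a dense adjacency matrix; the value of `tau` is left as a free parameter here, whereas choosing it properly is exactly the question the paper studies.

```python
def regularized_laplacian(adj, tau):
    """Symmetrically normalized adjacency with degree regularization:
    D_tau^{-1/2} A D_tau^{-1/2}, where D_tau = D + tau * I. Spectral
    clustering then uses the leading eigenvectors of this matrix.
    `adj` is a dense symmetric 0/1 adjacency matrix (list of lists)."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    s = [(deg[i] + tau) ** -0.5 for i in range(n)]  # D_tau^{-1/2} diagonal
    return [[s[i] * adj[i][j] * s[j] for j in range(n)] for i in range(n)]
```

With `tau = 0` this is the standard normalized operator, which is known to be fragile on sparse graphs because low-degree nodes dominate the spectrum; a small positive `tau` damps their influence.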

A Bayesian Inference Framework for Procedural Material Parameter Estimation

Title A Bayesian Inference Framework for Procedural Material Parameter Estimation
Authors Yu Guo, Milos Hasan, Lingqi Yan, Shuang Zhao
Abstract Procedural material models have been gaining traction in many applications thanks to their flexibility, compactness, and easy editability. In this paper, we explore the inverse rendering problem of procedural material parameter estimation from photographs using a Bayesian framework. We use \emph{summary functions} for comparing unregistered images of a material under known lighting, and we explore both hand-designed and neural summary functions. In addition to estimating the parameters by optimization, we introduce a Bayesian inference approach using Hamiltonian Monte Carlo to sample the space of plausible material parameters, providing additional insight into the structure of the solution space. To demonstrate the effectiveness of our techniques, we fit procedural models of a range of materials—wall plaster, leather, wood, anisotropic brushed metals and metallic paints—to both synthetic and real target images.
Tasks Bayesian Inference
Published 2019-12-02
URL https://arxiv.org/abs/1912.01067v2
PDF https://arxiv.org/pdf/1912.01067v2.pdf
PWC https://paperswithcode.com/paper/a-bayesian-inference-framework-for-procedural
Repo
Framework

ComHapDet: A Spatial Community Detection Algorithm for Haplotype Assembly

Title ComHapDet: A Spatial Community Detection Algorithm for Haplotype Assembly
Authors Abishek Sankararaman, Haris Vikalo, François Baccelli
Abstract Background: Haplotypes, the ordered lists of single nucleotide variations that distinguish chromosomal sequences from their homologous pairs, may reveal an individual’s susceptibility to hereditary and complex diseases and affect how our bodies respond to therapeutic drugs. Reconstructing haplotypes of an individual from short sequencing reads is an NP-hard problem that becomes even more challenging in the case of polyploids. While increasing lengths of sequencing reads and insert sizes helps improve the accuracy of reconstruction, it also exacerbates computational complexity of the haplotype assembly task. This has motivated the pursuit of algorithmic frameworks capable of accurate yet efficient assembly of haplotypes from high-throughput sequencing data. Results: We propose a novel graphical representation of sequencing reads and pose the haplotype assembly problem as an instance of community detection on a spatial random graph. To this end, we construct a graph where each read is a node with an unknown community label associating the read with the haplotype it samples. Haplotype reconstruction can then be thought of as a two-step procedure: first, one recovers the community labels on the nodes (i.e., the reads), and then uses the estimated labels to assemble the haplotypes. Based on this observation, we propose ComHapDet, a novel assembly algorithm for diploid and polyploid haplotypes which allows both biallelic and multi-allelic variants. Conclusions: Performance of the proposed algorithm is benchmarked on simulated as well as experimental data obtained by sequencing Chromosome 5 of tetraploid biallelic Solanum tuberosum (potato). The results demonstrate the efficacy of the proposed method and that it compares favorably with the existing techniques.
Tasks Community Detection
Published 2019-11-27
URL https://arxiv.org/abs/1911.12285v1
PDF https://arxiv.org/pdf/1911.12285v1.pdf
PWC https://paperswithcode.com/paper/comhapdet-a-spatial-community-detection
Repo
Framework

Replica-exchange Nosé-Hoover dynamics for Bayesian learning on large datasets

Title Replica-exchange Nosé-Hoover dynamics for Bayesian learning on large datasets
Authors Rui Luo, Qiang Zhang, Yaodong Yang, Jun Wang
Abstract In this paper, we propose a new sampler for Bayesian learning that can rapidly draw representative samples from complex posterior distributions with multiple isolated modes in the presence of mini-batch noise. This is achieved by simulating a collection of replicas in parallel with different temperatures and periodically swapping them. When evolving the replicas’ states, the Nosé-Hoover dynamics is applied, which adaptively neutralizes the mini-batch noise. To perform proper exchanges, a new protocol is developed with a noise-aware test of acceptance, by which the detailed balance is preserved in an asymptotic way. While its efficacy on complex multimodal posteriors has been illustrated by testing over synthetic distributions, experiments with deep Bayesian neural networks on large-scale datasets have shown its significant improvements over strong baselines.
Tasks Image Classification
Published 2019-05-29
URL https://arxiv.org/abs/1905.12569v3
PDF https://arxiv.org/pdf/1905.12569v3.pdf
PWC https://paperswithcode.com/paper/replica-exchange-nose-hoover-dynamics-for
Repo
Framework
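The replica-exchange skeleton, with chains at several temperatures and periodic state swaps, can be sketched in a few lines. The paper replaces the plain Metropolis moves below with Nosé-Hoover dynamics and replaces the exact swap test with a noise-aware one; this sketch keeps only the generic exchange mechanism.

```python
import math
import random

def replica_exchange(log_p, temps, n_steps=3000, step=1.0, seed=0):
    """Minimal parallel tempering: one Metropolis random-walk chain per
    temperature, with neighbor-state swaps every 10 steps accepted with
    probability min(1, exp((1/T_i - 1/T_j) * (log p(x_j) - log p(x_i)))).
    Returns the samples collected from the coldest chain."""
    rng = random.Random(seed)
    x = [0.0] * len(temps)
    samples = []
    for t in range(n_steps):
        for i, T in enumerate(temps):  # within-chain Metropolis moves
            prop = x[i] + rng.gauss(0, step)
            if math.log(rng.random() + 1e-300) < (log_p(prop) - log_p(x[i])) / T:
                x[i] = prop
        if t % 10 == 0:  # attempt swaps between neighboring temperatures
            for i in range(len(temps) - 1):
                a = (1 / temps[i] - 1 / temps[i + 1]) * (log_p(x[i + 1]) - log_p(x[i]))
                if math.log(rng.random() + 1e-300) < a:
                    x[i], x[i + 1] = x[i + 1], x[i]
        samples.append(x[0])  # record the cold (target-temperature) chain
    return samples
```

Hot chains cross the barriers between isolated modes easily, and the swaps ferry those mode jumps down to the cold chain, which is why this family of samplers handles multimodal posteriors that defeat a single chain.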

Locally Optimized Random Forests

Title Locally Optimized Random Forests
Authors Tim Coleman, Kimberly Kaufeld, Mary Frances Dorn, Lucas Mentch
Abstract Standard supervised learning procedures are validated against a test set that is assumed to have come from the same distribution as the training data. However, in many problems, the test data may have come from a different distribution. We consider the case of having many labeled observations from one distribution, $P_1$, and making predictions at unlabeled points that come from $P_2$. We combine the high predictive accuracy of random forests (Breiman, 2001) with an importance sampling scheme, where the splits and predictions of the base-trees are done in a weighted manner, which we call Locally Optimized Random Forests. These weights correspond to a non-parametric estimate of the likelihood ratio between the training and test distributions. To estimate these ratios with an unlabeled test set, we make the covariate shift assumption, where the differences in distribution are only a function of the training distributions (Shimodaira, 2000). This methodology is motivated by the problem of forecasting power outages during hurricanes. The extreme nature of the most devastating hurricanes means that typical validation setups will overly favor less extreme storms. Our method provides a data-driven means of adapting a machine learning method to deal with extreme events.
Tasks
Published 2019-08-27
URL https://arxiv.org/abs/1908.09967v1
PDF https://arxiv.org/pdf/1908.09967v1.pdf
PWC https://paperswithcode.com/paper/locally-optimized-random-forests
Repo
Framework
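The likelihood-ratio weights at the heart of the method can be estimated without test labels, since under covariate shift only the input densities differ. The sketch below uses a shared 1-D histogram as a crude stand-in for the paper's non-parametric ratio estimate; the binning scheme and Laplace smoothing are illustrative choices.

```python
def shift_weights(x_train, x_test, n_bins=10):
    """Estimate importance weights w(x) ~ p_test(x) / p_train(x) for 1-D
    covariates via a shared histogram. Training points falling in regions
    the test set favors get up-weighted, so a learner trained with these
    weights focuses on the part of the space where predictions are needed."""
    lo = min(x_train + x_test)
    hi = max(x_train + x_test)
    width = (hi - lo) / n_bins or 1.0
    def bin_of(x):
        return min(int((x - lo) / width), n_bins - 1)
    tr = [0] * n_bins
    te = [0] * n_bins
    for x in x_train:
        tr[bin_of(x)] += 1
    for x in x_test:
        te[bin_of(x)] += 1
    # Laplace smoothing keeps empty training bins from yielding infinities
    return [((te[bin_of(x)] + 1) / len(x_test)) /
            ((tr[bin_of(x)] + 1) / len(x_train)) for x in x_train]
```

In the full method these weights enter the forest itself, biasing both the split criteria and the leaf predictions of each base tree toward the test distribution, e.g. toward the extreme storms that ordinary validation under-represents.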

A heuristic approach for lactate threshold estimation for training decision-making: An accessible and easy to use solution for recreational runners

Title A heuristic approach for lactate threshold estimation for training decision-making: An accessible and easy to use solution for recreational runners
Authors U. Etxegarai, E. Portillo, J. Irazusta, L. A. Koefoed, N. Kasabov
Abstract In this work, a heuristic as an operational tool to estimate the lactate threshold and to facilitate its integration into the training process of recreational runners is proposed. To do so, we formalize the principles for the lactate threshold estimation from empirical data and an iterative methodology that enables experience-based learning. This strategy arises as a robust and adaptive approach to solve data analysis problems. We compare the results of the heuristic with the most commonly used protocol by making a first quantitative error analysis to show its reliability. Additionally, we provide a computational algorithm so that this quantitative analysis can be easily performed in other lactate threshold protocols. With this work, we have shown that a heuristic, 60% of the ‘endurance running speed reserve’, serves the same purpose as the most commonly used protocol in recreational runners, while improving on its operational limitations of accessibility and consistent use.
Tasks Decision Making
Published 2019-03-06
URL http://arxiv.org/abs/1903.02318v1
PDF http://arxiv.org/pdf/1903.02318v1.pdf
PWC https://paperswithcode.com/paper/a-heuristic-approach-for-lactate-threshold
Repo
Framework
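The heuristic reduces to simple arithmetic once the speed reserve is defined. In the sketch below the reserve is assumed to be the gap between an easy aerobic speed and the peak test speed; that definition, and the variable names, are guesses for illustration rather than the paper's exact formulation.

```python
def lactate_threshold_speed(v_easy, v_peak, fraction=0.60):
    """Heuristic lactate-threshold estimate: take a fixed fraction of the
    'endurance running speed reserve' above the easy speed. The reserve
    definition used here (v_peak - v_easy) is an assumption, not the
    paper's exact formulation. Speeds can be in any consistent unit."""
    reserve = v_peak - v_easy
    return v_easy + fraction * reserve
```

For example, an easy speed of 10 km/h and a peak of 20 km/h would give an estimated threshold of 16 km/h under these assumptions, with no lactate measurements required, which is the accessibility gain the abstract emphasizes.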