Paper Group ANR 325
Exascale Deep Learning to Accelerate Cancer Research. Learning Neurosymbolic Generative Models via Program Synthesis. Noisy, Greedy and Not So Greedy k-means++. Community detection in node-attributed social networks: a survey. Analysis of Bias in Gathering Information Between User Attributes in News Application. SFNet: Learning Object-aware Semantic Correspondence. Network-Based Delineation of Health Service Areas: A Comparative Analysis of Community Detection Algorithms. Extracting adverse drug reactions and their context using sequence labelling ensembles in TAC2017. Community Detection and Matrix Completion with Two-Sided Graph Side-Information. Optimal Laplacian regularization for sparse spectral community detection. A Bayesian Inference Framework for Procedural Material Parameter Estimation. ComHapDet: A Spatial Community Detection Algorithm for Haplotype Assembly. Replica-exchange Nosé-Hoover dynamics for Bayesian learning on large datasets. Locally Optimized Random Forests. A heuristic approach for lactate threshold estimation for training decision-making: An accessible and easy to use solution for recreational runners.
Exascale Deep Learning to Accelerate Cancer Research
Title | Exascale Deep Learning to Accelerate Cancer Research |
Authors | Robert M. Patton, J. Travis Johnston, Steven R. Young, Catherine D. Schuman, Thomas E. Potok, Derek C. Rose, Seung-Hwan Lim, Junghoon Chae, Le Hou, Shahira Abousamra, Dimitris Samaras, Joel Saltz |
Abstract | Deep learning, through the use of neural networks, has demonstrated remarkable ability to automate many routine tasks when presented with sufficient data for training. The neural network architecture (e.g. number of layers, types of layers, connections between layers, etc.) plays a critical role in determining what, if anything, the neural network is able to learn from the training data. The trend for neural network architectures, especially those trained on ImageNet, has been to grow ever deeper and more complex. The result has been ever increasing accuracy on benchmark datasets with the cost of increased computational demands. In this paper we demonstrate that neural network architectures can be automatically generated, tailored for a specific application, with dual objectives: accuracy of prediction and speed of prediction. Using MENNDL–an HPC-enabled software stack for neural architecture search–we generate a neural network with comparable accuracy to state-of-the-art networks on a cancer pathology dataset that is also $16\times$ faster at inference. The speedup in inference is necessary because of the volume and velocity of cancer pathology data; specifically, the previous state-of-the-art networks are too slow for individual researchers without access to HPC systems to keep pace with the rate of data generation. Our new model enables researchers with modest computational resources to analyze newly generated data faster than it is collected. |
Tasks | Neural Architecture Search |
Published | 2019-09-26 |
URL | https://arxiv.org/abs/1909.12291v1 |
https://arxiv.org/pdf/1909.12291v1.pdf | |
PWC | https://paperswithcode.com/paper/exascale-deep-learning-to-accelerate-cancer |
Repo | |
Framework | |
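MENNDL itself is an HPC-scale evolutionary system and is not reproduced here, but a minimal sketch can illustrate the dual-objective idea the abstract describes: score each candidate architecture on both predicted accuracy and inference latency, and keep the Pareto-optimal trade-offs. The search space, evaluation proxy, and scoring below are all hypothetical stand-ins.

```python
# A minimal sketch of dual-objective architecture search; the search space and
# the evaluate() proxy are hypothetical, not MENNDL's.
import random

SEARCH_SPACE = {
    "num_layers": [2, 4, 6, 8],
    "filters": [16, 32, 64, 128],
    "kernel": [3, 5, 7],
}

def sample_architecture():
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def evaluate(arch):
    """Stand-in for training + benchmarking. Returns (accuracy, latency_ms)."""
    # Hypothetical proxy: bigger nets score higher but run slower.
    size = arch["num_layers"] * arch["filters"]
    accuracy = 0.70 + 0.25 * (size / (8 * 128)) + random.gauss(0, 0.01)
    latency = 0.05 * size * (arch["kernel"] / 3)
    return accuracy, latency

def pareto_front(results):
    """Keep candidates not dominated on (accuracy up, latency down)."""
    front = []
    for a in results:
        if not any(b[1] >= a[1] and b[2] <= a[2] and b != a for b in results):
            front.append(a)
    return front

random.seed(0)
results = []
for _ in range(50):
    arch = sample_architecture()
    acc, lat = evaluate(arch)
    results.append((arch, acc, lat))

for arch, acc, lat in sorted(pareto_front(results), key=lambda r: r[2]):
    print(f"acc={acc:.3f} latency={lat:6.1f}ms {arch}")
```

In the real system, `evaluate` would train the candidate network and benchmark its inference speed on the target hardware, and the search itself is evolutionary rather than random.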
Learning Neurosymbolic Generative Models via Program Synthesis
Title | Learning Neurosymbolic Generative Models via Program Synthesis |
Authors | Halley Young, Osbert Bastani, Mayur Naik |
Abstract | Significant strides have been made toward designing better generative models in recent years. Despite this progress, however, state-of-the-art approaches are still largely unable to capture complex global structure in data. For example, images of buildings typically contain spatial patterns such as windows repeating at regular intervals; state-of-the-art generative methods can’t easily reproduce these structures. We propose to address this problem by incorporating programs representing global structure into the generative model—e.g., a 2D for-loop may represent a configuration of windows. Furthermore, we propose a framework for learning these models by leveraging program synthesis to generate training data. On both synthetic and real-world data, we demonstrate that our approach is substantially better than the state-of-the-art at both generating and completing images that contain global structure. |
Tasks | Program Synthesis |
Published | 2019-01-24 |
URL | http://arxiv.org/abs/1901.08565v1 |
http://arxiv.org/pdf/1901.08565v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-neurosymbolic-generative-models-via |
Repo | |
Framework | |
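As a concrete reading of the motivating example, the sketch below treats a "program" as a 2D for-loop that stamps a repeating window patch onto an image grid; the program encoding (origin, patch, stride, repeats) is our own illustrative choice, not the paper's DSL.

```python
# A toy 2D for-loop "program" for a repeating window pattern; field names
# are hypothetical.
import numpy as np

def render_window_grid(height, width, program):
    """Execute a 2D for-loop program: repeat a patch at fixed strides."""
    img = np.zeros((height, width), dtype=np.uint8)
    y0, x0 = program["origin"]
    ph, pw = program["patch"]      # window size
    sy, sx = program["stride"]     # spacing between windows
    ny, nx = program["repeats"]    # loop bounds
    for i in range(ny):            # outer loop: rows of windows
        for j in range(nx):        # inner loop: columns of windows
            y, x = y0 + i * sy, x0 + j * sx
            img[y:y + ph, x:x + pw] = 1
    return img

program = {"origin": (4, 3), "patch": (6, 4), "stride": (10, 8), "repeats": (3, 5)}
facade = render_window_grid(40, 44, program)
for row in facade[::2]:            # coarse ASCII preview of the "building"
    print("".join("#" if v else "." for v in row))
```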
Noisy, Greedy and Not So Greedy k-means++
Title | Noisy, Greedy and Not So Greedy k-means++ |
Authors | Anup Bhattacharya, Jan Eube, Heiko Röglin, Melanie Schmidt |
Abstract | The k-means++ algorithm due to Arthur and Vassilvitskii has become the most popular seeding method for Lloyd’s algorithm. It samples the first center uniformly at random from the data set and the other $k-1$ centers iteratively according to $D^2$-sampling, where the probability that a data point becomes the next center is proportional to its squared distance to the closest center chosen so far. k-means++ is known to achieve an approximation factor of $O(\log k)$ in expectation. Already in the original paper on k-means++, Arthur and Vassilvitskii suggested a variation called the greedy k-means++ algorithm, in which in each iteration multiple possible centers are sampled according to $D^2$-sampling and only the one that decreases the objective the most is chosen as a center for that iteration. It was stated as an open question whether this also leads to an $O(\log k)$-approximation (or even better). We show that this is not the case by presenting a family of instances on which greedy k-means++ yields only an $\Omega(\ell\cdot \log k)$-approximation in expectation, where $\ell$ is the number of possible centers that are sampled in each iteration. We also study a variation, which we call the noisy k-means++ algorithm. In this variation only one center is sampled in every iteration, but no longer exactly by $D^2$-sampling. Instead, in each iteration an adversary is allowed to change the probabilities arising from $D^2$-sampling individually for each point by a factor between $1-\epsilon_1$ and $1+\epsilon_2$ for parameters $\epsilon_1 \in [0,1)$ and $\epsilon_2 \ge 0$. We prove that noisy k-means++ computes an $O(\log^2 k)$-approximation in expectation. We also discuss some applications of this result. |
Tasks | |
Published | 2019-12-02 |
URL | https://arxiv.org/abs/1912.00653v1 |
https://arxiv.org/pdf/1912.00653v1.pdf | |
PWC | https://paperswithcode.com/paper/noisy-greedy-and-not-so-greedy-k-means |
Repo | |
Framework | |
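The seeding procedures in the abstract are simple enough to sketch directly. Below is a minimal implementation of greedy k-means++: the first center is uniform, and each later iteration draws $\ell$ candidates by $D^2$-sampling and keeps the one that lowers the k-means cost the most; with $\ell = 1$ it reduces to standard k-means++ seeding. The toy data are illustrative.

```python
# A minimal sketch of D^2-sampling and the greedy k-means++ variant.
import numpy as np

def d2_probabilities(X, centers):
    """D^2-sampling weights: squared distance to the nearest chosen center."""
    d2 = np.min(((X[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(-1), axis=1)
    return d2 / d2.sum()

def cost(X, centers):
    """k-means objective: sum of squared distances to the nearest center."""
    return np.min(((X[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(-1), axis=1).sum()

def greedy_kmeanspp(X, k, ell, rng):
    # First center: uniform over the data, as in standard k-means++.
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        probs = d2_probabilities(X, centers)
        candidates = X[rng.choice(len(X), size=ell, p=probs)]
        # Greedy step: keep the candidate with the lowest resulting cost.
        best = min(candidates, key=lambda c: cost(X, centers + [c]))
        centers.append(best)
    return np.array(centers)

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2)) + rng.integers(0, 4, size=(300, 1)) * 5.0
print(greedy_kmeanspp(X, k=4, ell=3, rng=rng))
```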
Community detection in node-attributed social networks: a survey
Title | Community detection in node-attributed social networks: a survey |
Authors | Petr Chunaev |
Abstract | Community detection is a fundamental problem in social network analysis consisting, roughly speaking, in dividing social actors (modelled as nodes in a social graph) with certain social connections (modelled as edges in the social graph) into densely knitted and highly related groups, with each group well separated from the others. Classical approaches for community detection usually deal only with the structure of the network and ignore features of the nodes, although most real-world networks provide additional information about the actors, such as age, gender, interests, etc., traditionally called node attributes. It is known that the attributes may clarify and enrich the knowledge about the actors and give meaning to the detected communities. This has led to a relatively novel direction in community detection — constructing algorithms that use both the structure and the attributes of the network (now modelled via a node-attributed graph) to yield more informative and qualitative results. During the last decade many methods based on different ideas and techniques have appeared in this direction. Although there exist some partial overviews of them, a recent survey is a necessity as the growing number of methods may cause uncertainty in practice. In this paper we aim at clarifying the overall situation by proposing a clear classification of the methods and providing a comprehensive survey of the available results. We not only group and analyse the corresponding methods but also focus on practical aspects, including information on which methods outperform others and which datasets and quality measures are used for evaluation. |
Tasks | Community Detection |
Published | 2019-12-20 |
URL | https://arxiv.org/abs/1912.09816v1 |
https://arxiv.org/pdf/1912.09816v1.pdf | |
PWC | https://paperswithcode.com/paper/community-detection-in-node-attributed-social |
Repo | |
Framework | |
Analysis of Bias in Gathering Information Between User Attributes in News Application
Title | Analysis of Bias in Gathering Information Between User Attributes in News Application |
Authors | Yoshifumi Seki, Mitsuo Yoshida |
Abstract | In the process of information gathering on the web, confirmation bias is known to exist, exemplified in phenomena such as echo chambers and filter bubbles. Our purpose is to reveal how people consume news and to discuss these phenomena. In web services, we are able to use the action logs of a service to investigate these phenomena. However, many existing studies about these phenomena are conducted via questionnaires, and there are few studies using action logs. In this paper, we attempt to discover biases in information gathering due to differences in user demographic attributes, such as age and gender, from the action logs of a news distribution service. First, we summarized the actions in the service for each user attribute and showed the differences in user behavior depending on the attributes. Next, the degree of correlation between the attributes was measured using the correlation coefficient, and a strong correlation was found to exist in the browsing tendencies for news articles between the attributes. Then, the bias of keywords between attributes was examined: keywords with biased behavior among the attributes were identified using the parameters of a regression analysis. Since most of the discovered keywords can be explained by major news events, our proposed method is effective in detecting biased keywords. |
Tasks | |
Published | 2019-09-02 |
URL | https://arxiv.org/abs/1909.00554v1 |
https://arxiv.org/pdf/1909.00554v1.pdf | |
PWC | https://paperswithcode.com/paper/analysis-of-bias-in-gathering-information |
Repo | |
Framework | |
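The correlation step can be sketched once a data layout is assumed. Below, hypothetically, each user-attribute group is represented by its click counts over the same set of articles, and the between-attribute correlation is the Pearson coefficient of those count vectors; the groups and counts are synthetic.

```python
# A minimal sketch of between-attribute correlation on a hypothetical layout:
# per-group click counts over the same set of news articles.
import numpy as np

rng = np.random.default_rng(1)
articles = 200
base = rng.gamma(2.0, 50.0, size=articles)            # shared article popularity

# Hypothetical groups: clicks = shared popularity x group-specific factor.
groups = {
    "male_20s":   base * rng.normal(1.0, 0.2, articles),
    "male_40s":   base * rng.normal(1.0, 0.2, articles),
    "female_20s": base * rng.normal(1.0, 0.2, articles),
}

names = list(groups)
counts = np.array([groups[n] for n in names])
corr = np.corrcoef(counts)                            # Pearson, pairwise by group

print(" " * 11 + "  ".join(f"{n:>10}" for n in names))
for n, row in zip(names, corr):
    print(f"{n:>10} " + "  ".join(f"{v:10.2f}" for v in row))
```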
SFNet: Learning Object-aware Semantic Correspondence
Title | SFNet: Learning Object-aware Semantic Correspondence |
Authors | Junghyup Lee, Dohyung Kim, Jean Ponce, Bumsub Ham |
Abstract | We address the problem of semantic correspondence, that is, establishing a dense flow field between images depicting different instances of the same object or scene category. We propose to use images annotated with binary foreground masks and subjected to synthetic geometric deformations to train a convolutional neural network (CNN) for this task. Using these masks as part of the supervisory signal offers a good compromise between semantic flow methods, where the amount of training data is limited by the cost of manually selecting point correspondences, and semantic alignment ones, where the regression of a single global geometric transformation between images may be sensitive to image-specific details such as background clutter. We propose a new CNN architecture, dubbed SFNet, which implements this idea. It leverages a new and differentiable version of the argmax function for end-to-end training, with a loss that combines mask and flow consistency with smoothness terms. Experimental results demonstrate the effectiveness of our approach, which significantly outperforms the state of the art on standard benchmarks. |
Tasks | |
Published | 2019-04-03 |
URL | http://arxiv.org/abs/1904.01810v2 |
http://arxiv.org/pdf/1904.01810v2.pdf | |
PWC | https://paperswithcode.com/paper/sfnet-learning-object-aware-semantic |
Repo | |
Framework | |
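The differentiable argmax is the most self-contained piece to sketch. A plain soft-argmax replaces the hard maximum over a match-score map with a softmax-weighted average of coordinates, so the output location is a smooth function of the scores; SFNet's actual version is a kernel soft-argmax that additionally masks the softmax with a Gaussian window around the hard argmax, which is omitted here.

```python
# A minimal numpy sketch of a soft (differentiable) argmax over a 2D
# match-score map.
import numpy as np

def soft_argmax_2d(scores, beta=10.0):
    """Return sub-pixel (y, x) as a smooth function of the score map."""
    h, w = scores.shape
    weights = np.exp(beta * (scores - scores.max()))   # numerically stable softmax
    weights /= weights.sum()
    ys, xs = np.mgrid[0:h, 0:w]
    return (weights * ys).sum(), (weights * xs).sum()

scores = np.full((9, 9), -1.0)
scores[3, 5] = 2.0                                     # a strong correspondence
scores[6, 2] = 1.0                                     # a weaker distractor
print(soft_argmax_2d(scores, beta=1.0))   # low beta: blurred toward distractor/background
print(soft_argmax_2d(scores, beta=20.0))  # high beta: ~(3.0, 5.0), near the hard argmax
```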
Network-Based Delineation of Health Service Areas: A Comparative Analysis of Community Detection Algorithms
Title | Network-Based Delineation of Health Service Areas: A Comparative Analysis of Community Detection Algorithms |
Authors | Diego Pinheiro, Ryan Hartman, Erick Romero, Ronaldo Menezes, Martin Cadeiras |
Abstract | A Health Service Area (HSA) is a group of geographic regions served by similar health care facilities. The delineation of HSAs plays a pivotal role in the characterization of health care services available in an area, enabling better planning and regulation of health care services. Though Dartmouth HSAs have been the standard delineation for decades, previous work has recently shown an improved HSA delineation using a network-based approach, in which HSAs are the communities extracted by the Louvain algorithm from hospital-patient discharge networks. Given the existing heterogeneity of communities extracted by different community detection algorithms, a comparative analysis of community detection algorithms for optimal HSA delineation has been lacking. In this work, we compared HSA delineations produced by community detection algorithms using a large-scale dataset containing different types of hospital-patient discharges spanning a 7-year period in the US. Our results replicated the heterogeneity among community detection algorithms found in previous works, confirmed the improved HSA delineation obtained by the network-based approach, and suggested that Infomap may be a more suitable community detection algorithm for HSA delineation, since it finds a high number of HSAs with a high localization index and a low network conductance. |
Tasks | Community Detection |
Published | 2019-12-08 |
URL | https://arxiv.org/abs/1912.08921v1 |
https://arxiv.org/pdf/1912.08921v1.pdf | |
PWC | https://paperswithcode.com/paper/network-based-delineation-of-health-service |
Repo | |
Framework | |
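A minimal sketch of the network-based delineation, on a toy discharge graph and using networkx's Louvain implementation (Infomap would require the separate infomap package): nodes are regions, edge weights are patient-discharge flows, and the detected communities play the role of HSAs, with conductance as one quality measure.

```python
# A toy hospital-patient discharge network; flows and regions are illustrative.
import networkx as nx

G = nx.Graph()
flows = [  # (region, region, discharge volume)
    ("A", "B", 90), ("A", "C", 80), ("B", "C", 70),
    ("D", "E", 85), ("D", "F", 75), ("E", "F", 60),
    ("C", "D", 5),  # weak tie between the two service areas
]
G.add_weighted_edges_from(flows)

hsas = nx.community.louvain_communities(G, weight="weight", seed=0)
for i, hsa in enumerate(hsas):
    # Conductance: flow leaving the HSA relative to flow inside it;
    # low conductance means patients mostly stay within their HSA.
    cond = nx.conductance(G, hsa, weight="weight")
    print(f"HSA {i}: {sorted(hsa)} conductance={cond:.3f}")
```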
Extracting adverse drug reactions and their context using sequence labelling ensembles in TAC2017
Title | Extracting adverse drug reactions and their context using sequence labelling ensembles in TAC2017 |
Authors | Maksim Belousov, Nikola Milosevic, William Dixon, Goran Nenadic |
Abstract | Adverse drug reactions (ADRs) are unwanted or harmful effects experienced after the administration of a certain drug or a combination of drugs, presenting a challenge for drug development and drug administration. In this paper, we present a set of taggers for extracting adverse drug reactions and related entities, including factors, severity, negations, drug class and animal. The systems used a mix of rule-based, machine learning (CRF) and deep learning (BLSTM with word2vec embeddings) methodologies in order to annotate the data. The systems were submitted to the adverse drug reaction shared task organised at the 2017 Text Analytics Conference by the National Institute of Standards and Technology, achieving F1-scores of 76.00 and 75.61, respectively. |
Tasks | |
Published | 2019-05-28 |
URL | https://arxiv.org/abs/1905.11716v1 |
https://arxiv.org/pdf/1905.11716v1.pdf | |
PWC | https://paperswithcode.com/paper/extracting-adverse-drug-reactions-and-their |
Repo | |
Framework | |
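A minimal sketch of the CRF component of such an ensemble, using the sklearn-crfsuite package; the token features, tag scheme, and two-sentence training set below are illustrative, not the authors' configuration.

```python
# A toy CRF sequence labeller for ADR-style tags (pip install sklearn-crfsuite).
import sklearn_crfsuite

def features(sent, i):
    w = sent[i]
    return {
        "lower": w.lower(),
        "is_title": w.istitle(),
        "suffix3": w[-3:],
        "prev": sent[i - 1].lower() if i > 0 else "<s>",
        "next": sent[i + 1].lower() if i < len(sent) - 1 else "</s>",
    }

train = [
    (["The", "patient", "developed", "severe", "nausea", "."],
     ["O", "O", "O", "B-Severity", "B-ADR", "O"]),
    (["Mild", "headache", "was", "reported", "."],
     ["B-Severity", "B-ADR", "O", "O", "O"]),
]
X = [[features(s, i) for i in range(len(s))] for s, _ in train]
y = [labels for _, labels in train]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, y)

test = ["She", "reported", "severe", "dizziness", "."]
X_test = [[features(test, i) for i in range(len(test))]]
print(list(zip(test, crf.predict(X_test)[0])))
```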
Community Detection and Matrix Completion with Two-Sided Graph Side-Information
Title | Community Detection and Matrix Completion with Two-Sided Graph Side-Information |
Authors | Qiaosheng Zhang, Vincent Y. F. Tan, Changho Suh |
Abstract | We consider the problem of recovering communities of users and communities of items (such as movies) based on a partially observed rating matrix as well as side-information in the form of similarity graphs of the users and items. The user-to-user and item-to-item similarity graphs are generated according to the celebrated stochastic block model (SBM). We develop lower and upper bounds on the minimum expected number of observed ratings (also known as the sample complexity) needed for this recovery task. These bounds are functions of various parameters including the quality of the graph side-information which is manifested in the intra- and inter-cluster probabilities of the SBMs. We show that these bounds match for a wide range of parameters of interest, and match up to a constant factor of two for the remaining parameter regime. Our information-theoretic results quantify the benefits of the two-sided graph side-information for recovery, and further analysis reveals that the two pieces of graph side-information produce an interesting synergistic effect under certain scenarios. This means that if one observes only one of the two graphs, then the required sample complexity worsens to the case in which none of the graphs is observed. Thus both graphs are strictly needed to reduce the sample complexity. |
Tasks | Community Detection, Matrix Completion |
Published | 2019-12-06 |
URL | https://arxiv.org/abs/1912.04099v1 |
https://arxiv.org/pdf/1912.04099v1.pdf | |
PWC | https://paperswithcode.com/paper/community-detection-and-matrix-completion |
Repo | |
Framework | |
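The graph side-information model is easy to sketch: both similarity graphs are drawn from stochastic block models whose intra-cluster edge probability exceeds the inter-cluster one. Below is a toy instantiation with networkx; the sizes and probabilities are illustrative.

```python
# Toy user and item similarity graphs drawn from stochastic block models.
import networkx as nx

def sbm_graph(sizes, p_intra, p_inter, seed):
    k = len(sizes)
    probs = [[p_intra if i == j else p_inter for j in range(k)] for i in range(k)]
    return nx.stochastic_block_model(sizes, probs, seed=seed)

# Two user communities and two item communities.
G_users = sbm_graph([50, 50], p_intra=0.30, p_inter=0.02, seed=0)
G_items = sbm_graph([40, 40], p_intra=0.25, p_inter=0.03, seed=1)

# The planted partition is stored on the generated graph.
print("user edges:", G_users.number_of_edges(),
      "item edges:", G_items.number_of_edges())
print("user community sizes:", [len(c) for c in G_users.graph["partition"]])
```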
Optimal Laplacian regularization for sparse spectral community detection
Title | Optimal Laplacian regularization for sparse spectral community detection |
Authors | Lorenzo Dall’Amico, Romain Couillet, Nicolas Tremblay |
Abstract | Regularization of the classical Laplacian matrices was empirically shown to improve spectral clustering in sparse networks. It was observed that small regularizations are preferable, but this point was left as a heuristic argument. In this paper we formally determine a proper regularization which is intimately related to alternative state-of-the-art spectral techniques for sparse graphs. |
Tasks | Community Detection |
Published | 2019-12-03 |
URL | https://arxiv.org/abs/1912.01419v1 |
https://arxiv.org/pdf/1912.01419v1.pdf | |
PWC | https://paperswithcode.com/paper/optimal-laplacian-regularization-for-sparse |
Repo | |
Framework | |
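A minimal sketch of the regularized spectral pipeline on a planted two-community graph: add a regularizer $\tau$ to the degrees, form the symmetrically normalized matrix, and cluster using its leading eigenvectors. The choice $\tau$ = mean degree below is a common heuristic and only a stand-in for the regularization the paper actually derives.

```python
# Regularized spectral clustering on a sparse planted-partition graph.
import numpy as np

def regularized_spectral_embedding(A, k, tau):
    d = A.sum(axis=1)
    d_tau = d + tau                                   # regularized degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d_tau))
    M = D_inv_sqrt @ A @ D_inv_sqrt                   # normalized, regularized
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, -k:]                               # top-k eigenvectors

rng = np.random.default_rng(0)
n = 60                                                # two planted communities
P = np.block([[np.full((30, 30), 0.30), np.full((30, 30), 0.05)],
              [np.full((30, 30), 0.05), np.full((30, 30), 0.30)]])
A = (rng.random((n, n)) < P).astype(float)
A = np.triu(A, 1)
A = A + A.T                                           # symmetric, no self-loops

emb = regularized_spectral_embedding(A, k=2, tau=A.sum(1).mean())
# The second-leading eigenvector separates the communities; threshold it.
labels = (emb[:, 0] > np.median(emb[:, 0])).astype(int)
print(labels)                                         # two nearly pure halves
```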
A Bayesian Inference Framework for Procedural Material Parameter Estimation
Title | A Bayesian Inference Framework for Procedural Material Parameter Estimation |
Authors | Yu Guo, Milos Hasan, Lingqi Yan, Shuang Zhao |
Abstract | Procedural material models have been gaining traction in many applications thanks to their flexibility, compactness, and easy editability. In this paper, we explore the inverse rendering problem of procedural material parameter estimation from photographs using a Bayesian framework. We use \emph{summary functions} for comparing unregistered images of a material under known lighting, and we explore both hand-designed and neural summary functions. In addition to estimating the parameters by optimization, we introduce a Bayesian inference approach using Hamiltonian Monte Carlo to sample the space of plausible material parameters, providing additional insight into the structure of the solution space. To demonstrate the effectiveness of our techniques, we fit procedural models of a range of materials—wall plaster, leather, wood, anisotropic brushed metals and metallic paints—to both synthetic and real target images. |
Tasks | Bayesian Inference |
Published | 2019-12-02 |
URL | https://arxiv.org/abs/1912.01067v2 |
https://arxiv.org/pdf/1912.01067v2.pdf | |
PWC | https://paperswithcode.com/paper/a-bayesian-inference-framework-for-procedural |
Repo | |
Framework | |
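The Hamiltonian Monte Carlo stage can be sketched in isolation. Below is a standard leapfrog HMC step on a toy two-parameter posterior; in the paper's setting the log-density would be defined through the rendering and summary-function pipeline rather than the stand-in Gaussian used here.

```python
# A minimal leapfrog HMC sampler on a toy 2D log-density.
import numpy as np

MU = np.array([1.0, -0.5])

def log_prob(theta):                      # stand-in posterior over 2 "parameters"
    return -0.5 * np.sum((theta - MU) ** 2 / 0.1)

def grad_log_prob(theta):
    return -(theta - MU) / 0.1

def hmc_step(theta, rng, eps=0.05, n_leapfrog=20):
    p = rng.normal(size=theta.shape)
    theta_new = theta.copy()
    p_new = p + 0.5 * eps * grad_log_prob(theta)      # initial half step
    for _ in range(n_leapfrog):                       # leapfrog integration
        theta_new = theta_new + eps * p_new
        p_new = p_new + eps * grad_log_prob(theta_new)
    p_new = p_new - 0.5 * eps * grad_log_prob(theta_new)  # trailing half step
    # Metropolis correction keeps the chain exact despite integration error.
    log_accept = (log_prob(theta_new) - 0.5 * p_new @ p_new
                  - log_prob(theta) + 0.5 * p @ p)
    return theta_new if np.log(rng.random()) < log_accept else theta

rng = np.random.default_rng(0)
theta = np.zeros(2)
samples = []
for _ in range(500):
    theta = hmc_step(theta, rng)
    samples.append(theta)
print(np.mean(samples, axis=0))           # ~ [1.0, -0.5]
```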
ComHapDet: A Spatial Community Detection Algorithm for Haplotype Assembly
Title | ComHapDet: A Spatial Community Detection Algorithm for Haplotype Assembly |
Authors | Abishek Sankararaman, Haris Vikalo, François Baccelli |
Abstract | Background: Haplotypes, the ordered lists of single nucleotide variations that distinguish chromosomal sequences from their homologous pairs, may reveal an individual’s susceptibility to hereditary and complex diseases and affect how our bodies respond to therapeutic drugs. Reconstructing haplotypes of an individual from short sequencing reads is an NP-hard problem that becomes even more challenging in the case of polyploids. While increasing lengths of sequencing reads and insert sizes helps improve the accuracy of reconstruction, it also exacerbates the computational complexity of the haplotype assembly task. This has motivated the pursuit of algorithmic frameworks capable of accurate yet efficient assembly of haplotypes from high-throughput sequencing data. Results: We propose a novel graphical representation of sequencing reads and pose the haplotype assembly problem as an instance of community detection on a spatial random graph. To this end, we construct a graph where each read is a node with an unknown community label associating the read with the haplotype it samples. Haplotype reconstruction can then be thought of as a two-step procedure: first, one recovers the community labels on the nodes (i.e., the reads), and then uses the estimated labels to assemble the haplotypes. Based on this observation, we propose ComHapDet, a novel assembly algorithm for diploid and polyploid haplotypes which allows both biallelic and multi-allelic variants. Conclusions: Performance of the proposed algorithm is benchmarked on simulated as well as experimental data obtained by sequencing Chromosome $5$ of tetraploid biallelic \emph{Solanum tuberosum} (Potato). The results demonstrate the efficacy of the proposed method and that it compares favorably with the existing techniques. |
Tasks | Community Detection |
Published | 2019-11-27 |
URL | https://arxiv.org/abs/1911.12285v1 |
https://arxiv.org/pdf/1911.12285v1.pdf | |
PWC | https://paperswithcode.com/paper/comhapdet-a-spatial-community-detection |
Repo | |
Framework | |
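A minimal sketch of the graph construction described in the Results: reads become nodes, and pairs of reads covering common variant positions are connected with a weight reflecting how often they agree. The agreements-minus-disagreements score and the use of Louvain below are our own illustrative choices, not the paper's exact statistic or inference procedure.

```python
# Toy read graph for haplotype assembly; reads map variant position -> allele.
import itertools
import networkx as nx

reads = [
    {0: "A", 1: "C", 2: "G"},      # likely haplotype 1
    {1: "C", 2: "G", 3: "T"},      # likely haplotype 1
    {0: "T", 1: "G", 2: "C"},      # likely haplotype 2
    {2: "C", 3: "A", 4: "G"},      # likely haplotype 2
]

G = nx.Graph()
G.add_nodes_from(range(len(reads)))
for i, j in itertools.combinations(range(len(reads)), 2):
    shared = reads[i].keys() & reads[j].keys()
    if shared:
        agree = sum(reads[i][p] == reads[j][p] for p in shared)
        score = 2 * agree - len(shared)   # agreements minus disagreements
        if score > 0:                     # keep only supporting evidence
            G.add_edge(i, j, weight=score)

communities = nx.community.louvain_communities(G, weight="weight", seed=0)
print(communities)  # each community gathers reads sampling the same haplotype
```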
Replica-exchange Nosé-Hoover dynamics for Bayesian learning on large datasets
Title | Replica-exchange Nosé-Hoover dynamics for Bayesian learning on large datasets |
Authors | Rui Luo, Qiang Zhang, Yaodong Yang, Jun Wang |
Abstract | In this paper, we propose a new sampler for Bayesian learning that can rapidly draw representative samples from complex posterior distributions with multiple isolated modes in the presence of mini-batch noise. This is achieved by simulating a collection of replicas in parallel with different temperatures and periodically swapping them. When evolving the replicas’ states, Nosé-Hoover dynamics is applied, which adaptively neutralizes the mini-batch noise. To perform proper exchanges, a new protocol is developed with a noise-aware test of acceptance, by which detailed balance is preserved in an asymptotic way. Its efficacy on complex multimodal posteriors is illustrated on synthetic distributions, and experiments with deep Bayesian neural networks on large-scale datasets show significant improvements over strong baselines. |
Tasks | Image Classification |
Published | 2019-05-29 |
URL | https://arxiv.org/abs/1905.12569v3 |
https://arxiv.org/pdf/1905.12569v3.pdf | |
PWC | https://paperswithcode.com/paper/replica-exchange-nose-hoover-dynamics-for |
Repo | |
Framework | |
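The replica-exchange skeleton is sketched below on a toy bimodal target: several chains run at different temperatures, and neighbouring chains periodically propose to swap states under a Metropolis test. The paper's Nosé-Hoover dynamics and noise-aware acceptance test are replaced here by a plain random-walk update and the standard swap rule.

```python
# Minimal replica exchange (parallel tempering) on a toy double-well energy.
import numpy as np

def energy(x):                             # two wells at x = -2 and x = +2
    return min((x - 2.0) ** 2, (x + 2.0) ** 2) * 4.0

rng = np.random.default_rng(0)
temps = [1.0, 2.0, 4.0, 8.0]
xs = [0.0] * len(temps)

samples = []
for step in range(20000):
    # Within-replica update: random-walk Metropolis at each temperature.
    for i, T in enumerate(temps):
        prop = xs[i] + rng.normal(0, 0.5)
        if rng.random() < np.exp((energy(xs[i]) - energy(prop)) / T):
            xs[i] = prop
    # Periodic swap attempt between a random pair of neighbouring temperatures.
    if step % 10 == 0:
        i = rng.integers(len(temps) - 1)
        log_a = (1 / temps[i] - 1 / temps[i + 1]) * (energy(xs[i]) - energy(xs[i + 1]))
        if np.log(rng.random()) < log_a:
            xs[i], xs[i + 1] = xs[i + 1], xs[i]
    samples.append(xs[0])

print(np.mean(np.array(samples) > 0))      # ~0.5: the cold chain visits both modes
```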
Locally Optimized Random Forests
Title | Locally Optimized Random Forests |
Authors | Tim Coleman, Kimberly Kaufeld, Mary Frances Dorn, Lucas Mentch |
Abstract | Standard supervised learning procedures are validated against a test set that is assumed to have come from the same distribution as the training data. However, in many problems, the test data may have come from a different distribution. We consider the case of having many labeled observations from one distribution, $P_1$, and making predictions at unlabeled points that come from $P_2$. We combine the high predictive accuracy of random forests (Breiman, 2001) with an importance sampling scheme, where the splits and predictions of the base trees are done in a weighted manner, which we call Locally Optimized Random Forests. These weights correspond to a non-parametric estimate of the likelihood ratio between the training and test distributions. To estimate these ratios with an unlabeled test set, we make the covariate shift assumption, under which the two distributions differ only in the distribution of the covariates (Shimodaira, 2000). This methodology is motivated by the problem of forecasting power outages during hurricanes. The extreme nature of the most devastating hurricanes means that typical validation setups will overly favor less extreme storms. Our method provides a data-driven means of adapting a machine learning method to deal with extreme events. |
Tasks | |
Published | 2019-08-27 |
URL | https://arxiv.org/abs/1908.09967v1 |
https://arxiv.org/pdf/1908.09967v1.pdf | |
PWC | https://paperswithcode.com/paper/locally-optimized-random-forests |
Repo | |
Framework | |
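The weighting idea can be sketched with a standard density-ratio trick: train a probabilistic classifier to distinguish test covariates from training covariates, and convert its probabilities into likelihood-ratio weights for the forest. This stands in for the authors' non-parametric estimator; the data and the shift below are synthetic.

```python
# Covariate-shift weights via probabilistic classification, then a weighted RF.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(500, 1))          # P1: mild events dominate
y_train = (X_train[:, 0] ** 2) + rng.normal(0, 0.1, 500)
X_test = rng.normal(2.0, 0.7, size=(200, 1))           # P2: shifted toward extremes

# Classifier distinguishing test (1) from train (0) covariates.
Z = np.vstack([X_train, X_test])
z = np.r_[np.zeros(len(X_train)), np.ones(len(X_test))]
clf = LogisticRegression().fit(Z, z)
p = clf.predict_proba(X_train)[:, 1]
weights = (p / (1 - p)) * (len(X_train) / len(X_test)) # ~ p_test / p_train

rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X_train, y_train, sample_weight=weights)        # weighted fitting
print(rf.predict([[2.0]]))                             # ~ 4.0 in this toy setup
```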
A heuristic approach for lactate threshold estimation for training decision-making: An accessible and easy to use solution for recreational runners
Title | A heuristic approach for lactate threshold estimation for training decision-making: An accessible and easy to use solution for recreational runners |
Authors | U. Etxegarai, E. Portillo, J. Irazusta, L. A. Koefoed, N. Kasabov |
Abstract | In this work, a heuristic is proposed as an operational tool to estimate the lactate threshold and to facilitate its integration into the training process of recreational runners. To do so, we formalize the principles for lactate threshold estimation from empirical data and propose an iterative methodology that enables experience-based learning. This strategy arises as a robust and adaptive approach to solving data analysis problems. We compare the results of the heuristic with the most commonly used protocol, providing a first quantitative error analysis to show its reliability. Additionally, we provide a computational algorithm so that this quantitative analysis can be easily performed for other lactate threshold protocols. With this work, we have shown that a heuristic, 60% of the ‘endurance running speed reserve’, serves the same purpose as the most commonly used protocol in recreational runners, while improving on its operational limitations of accessibility and consistent use. |
Tasks | Decision Making |
Published | 2019-03-06 |
URL | http://arxiv.org/abs/1903.02318v1 |
http://arxiv.org/pdf/1903.02318v1.pdf | |
PWC | https://paperswithcode.com/paper/a-heuristic-approach-for-lactate-threshold |
Repo | |
Framework | |
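A minimal sketch of applying such a fixed-fraction heuristic. The abstract gives the fraction (60% of the 'endurance running speed reserve'); the reserve is modelled below, by analogy with heart-rate reserve, as the gap between a runner's minimal endurance speed and maximal speed, which is an assumption on our part since the precise definition is given in the paper.

```python
# A toy fixed-fraction heuristic; the reserve definition is an assumption,
# modelled by analogy with heart-rate reserve.
def lactate_threshold_speed(v_min_kmh, v_max_kmh, fraction=0.60):
    """Estimate lactate-threshold speed as v_min + fraction * reserve."""
    reserve = v_max_kmh - v_min_kmh
    return v_min_kmh + fraction * reserve

# Hypothetical recreational runner: easy pace 9 km/h, max test speed 17 km/h.
print(f"Estimated LT speed: {lactate_threshold_speed(9.0, 17.0):.1f} km/h")
```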