PINSPlus: A tool for tumor subtype discovery in integrated genomic data
Authors: Hung Nguyen, Sangam Shrestha, Tin Nguyen*
Emails: hungnp(at)nevada.unr.edu, sangam(at)nevada.unr.edu, tinn(at)unr.edu
Department of Computer Science and Engineering, University of Nevada, Reno, NV 89557
Introduction
After decades of screening, the chance of a person being diagnosed with prostate or breast cancer has nearly doubled. However, the number of patients with advanced disease has only been marginally reduced, suggesting that current methods of screening result in either false positives or over-diagnosis. At the same time, 30-55% of patients with non-small cell lung cancer develop recurrence and die after curative resection, suggesting that a subset of patients would have benefited from more aggressive treatments at early stages. Although adjuvant and neoadjuvant chemotherapy have been shown to significantly improve the survival of patients with advanced early-stage disease, they are not usually recommended as the initial course of treatment. The ability to accurately diagnose patients would allow for better patient prognoses.
Methodology
The method is based on the observation that small changes in quantitative assays will be inherently present between individuals, even in a homogeneous population. Therefore, if distinct molecular subtypes do exist, they must be stable with respect to small changes in quantitative assays. In order to discover reliable subtypes from molecular data, we estimate how often each pair of patients is grouped together in the following scenarios: i) when the data are perturbed (by adding Gaussian noise), ii) when using different data types, and iii) when using different clustering techniques. We then partition patients into subgroups that are strongly connected in all scenarios. The workflow of PINSPlus is described in the figure above.
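The perturbation idea in scenario (i) can be sketched in a few lines of base R. This is only a simplified illustration under stated assumptions, not the PINSPlus implementation: the toy data, the helper names (connectivity, perturbed_connectivity), the use of k-means, the noise level, and the number of iterations are all hypothetical choices made for the example.

```r
# Simplified illustration of perturbation clustering; NOT the PINSPlus code.
set.seed(1)

# Hypothetical toy data: 40 "patients" x 5 features in two separated groups.
data <- rbind(
  matrix(rnorm(20 * 5, mean = 0), nrow = 20),
  matrix(rnorm(20 * 5, mean = 6), nrow = 20)
)

# Connectivity matrix: entry (i, j) is 1 when patients i and j share a cluster.
connectivity <- function(labels) {
  outer(labels, labels, "==") * 1
}

# Average connectivity over repeated clusterings of noise-perturbed data.
perturbed_connectivity <- function(x, k, n_iter = 50, noise_sd = 1) {
  n <- nrow(x)
  total <- matrix(0, n, n)
  for (i in seq_len(n_iter)) {
    # Perturb the data by adding Gaussian noise, then re-cluster.
    noisy <- x + matrix(rnorm(length(x), sd = noise_sd), nrow = n)
    labels <- kmeans(noisy, centers = k, nstart = 10)$cluster
    total <- total + connectivity(labels)
  }
  total / n_iter
}

original <- connectivity(kmeans(data, centers = 2, nstart = 10)$cluster)
perturbed <- perturbed_connectivity(data, k = 2)

# A small discrepancy suggests the partition is stable under perturbation.
discrepancy <- mean(abs(original - perturbed))
print(discrepancy)
```

Each entry of perturbed estimates how often a pair of patients is grouped together when the data are perturbed. Repeating the comparison for several values of k and preferring the k with the smallest discrepancy is one way such a stability measure could be used.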
Installation
Install PINSPlus
The latest version of the PINSPlus package can be installed from the CRAN repository using the commands below:
> install.packages("PINSPlus")
> library(PINSPlus)
Use the commands below to install PINSPlus version 1.0.2, which we used in our comparison:
> install.packages(c("entropy", "pbmcapply", "doParallel", "foreach")) #install dependencies
> install.packages("https://cran.r-project.org/src/contrib/Archive/PINSPlus/PINSPlus_1.0.2.tar.gz", repos = NULL)
The reference manual and vignettes can be found here.
Install ConsensusClusterPlus
ConsensusClusterPlus can be installed from the Bioconductor repository by following the instructions at https://bioconductor.org/packages/release/bioc/html/ConsensusClusterPlus.html. We used ConsensusClusterPlus version 1.46.0 in our comparison.
Install SNFtool
SNFtool can be installed from the CRAN repository at https://cran.r-project.org/web/packages/SNFtool/index.html. We used SNFtool version 2.3.0 in our comparison.
Install iClusterPlus
iClusterPlus can be installed from the Bioconductor repository by following the instructions at https://bioconductor.org/packages/release/bioc/html/iClusterPlus.html. We used iClusterPlus version 1.18.0 in our comparison.
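To reproduce the comparison, it may help to confirm that the installed packages match the versions listed above. The sketch below uses only base R; the helper name version_status and its three return values are hypothetical and are not part of any of the packages.

```r
# Versions reported in this document for the comparison.
expected_versions <- c(
  PINSPlus = "1.0.2",
  ConsensusClusterPlus = "1.46.0",
  SNFtool = "2.3.0",
  iClusterPlus = "1.18.0"
)

# Hypothetical helper: returns "ok", "mismatch", or "missing" for one package.
version_status <- function(pkg, expected) {
  # packageVersion() raises an error when the package is not installed.
  installed <- tryCatch(utils::packageVersion(pkg), error = function(e) NULL)
  if (is.null(installed)) return("missing")
  if (installed == expected) "ok" else "mismatch"
}

status <- mapply(version_status, names(expected_versions), expected_versions)
print(status)
```

A "mismatch" does not necessarily mean the package will fail, only that results may differ from those obtained with the versions used in our comparison.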
Datasets
Gene expression data
The processed datasets are from Gene Expression Omnibus (GSE10245, GSE19188, GSE43580, GSE15061, and GSE14924) and the Broad Institute (Lung2001, AML2004, and Brain2002).
The Cancer Genome Atlas data
The processed datasets are from The Cancer Genome Atlas (TCGA) website (https://cancergenome.nih.gov) and the Firebrowse website (http://firebrowse.org/). The datasets include Kidney renal clear cell carcinoma (KIRC), Glioblastoma multiforme (GBM), Acute Myeloid Leukemia (LAML), Lung squamous cell carcinoma (LUSC), Bladder Urothelial Carcinoma (BLCA), Head and Neck squamous cell carcinoma (HNSC), Liver hepatocellular carcinoma (LIHC), Stomach adenocarcinoma (STAD), Thymoma (THYM), Glioma (GBMLGG), Brain Lower Grade Glioma (LGG), Pancreatic adenocarcinoma (PAAD), Skin Cutaneous Melanoma (SKCM), Colorectal adenocarcinoma (COADREAD), Uterine Corpus Endometrial Carcinoma (UCEC), Cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC), Colon adenocarcinoma (COAD), Breast invasive carcinoma (BRCA), Stomach and Esophageal carcinoma (STES), Kidney renal papillary cell carcinoma (KIRP), Kidney Chromophobe (KICH), Uveal Melanoma (UVM), Adrenocortical carcinoma (ACC), Sarcoma (SARC), Mesothelioma (MESO), Rectum adenocarcinoma (READ), Uterine Carcinosarcoma (UCS), Ovarian serous cystadenocarcinoma (OV), Esophageal carcinoma (ESCA), Paraganglioma (PCPG), Lung adenocarcinoma (LUAD), Prostate adenocarcinoma (PRAD), Thyroid carcinoma (THCA), and Testicular Germ Cell Tumors (TGCT).
The Molecular Taxonomy of Breast Cancer International Consortium datasets
The processed datasets are from the European Genome-Phenome Archive (https://www.ebi.ac.uk/ega/) and cBioPortal (http://www.cbioportal.org).
Data analysis with PINSPlus
To run PINSPlus on the provided datasets, we provide scripts that reduce the number of processing steps. The scripts can be downloaded here:
Please follow the instructions in scripts/README.txt to run the provided scripts.
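For a quick check that the installation works, independent of the provided scripts, a minimal session might look like the sketch below. It assumes that the installed PINSPlus version ships the AML2004 example dataset and exposes PerturbationClustering with the data, kMax, and verbose arguments, and that the second column of AML2004$Group holds the known class labels. These are assumptions about the package interface, so please confirm them with ?PerturbationClustering and the package vignette. The value kMax = 5 is an arbitrary choice for the example.

```r
# Minimal sketch; this is NOT one of the provided scripts.
# The block is skipped when PINSPlus is not installed.
pinsplus_available <- requireNamespace("PINSPlus", quietly = TRUE)

if (pinsplus_available) {
  library(PINSPlus)

  # Example dataset assumed to ship with the package.
  data(AML2004)
  expr <- as.matrix(AML2004$Gene)

  # Cluster a single data type; kMax = 5 is an arbitrary upper bound here.
  result <- PerturbationClustering(data = expr, kMax = 5, verbose = FALSE)

  # Number of subtypes found and the agreement with the known labels.
  print(result$k)
  print(table(result$cluster, AML2004$Group[, 2]))
}
```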
Citation
If you find our software useful in your work, please cite it using the citation below:
Nguyen, H., Shrestha, S., Draghici, S., & Nguyen, T. (2018). PINSPlus: a tool for tumor subtype discovery in integrated genomic data. Bioinformatics.