Springer Nature
Mathews et. al.xlsx (10.59 kB)

Metadata supporting data files of the related article: Robust and Interpretable PAM50 Reclassification Exhibits Survival Advantage for Myoepithelial and Immune Phenotypes

online resource
posted on 2019-09-09, 10:54 authored by James C. Mathews, Saad Nadeem, Arnold J. Levine, Maryam Pouryahya, Joseph O. Deasy, Allen Tannenbaum

The study introduced a classification of breast tumors into 7 classes that are more clearly defined by interpretable mRNA signatures along the PAM50 gene set than the 5 traditional PAM50 intrinsic subtypes.


The authors reclassified the breast cancer subtypes of the PAM50/Prosigna Risk of Recurrence assay (used for breast cancer prognostication) using topological data analysis, incorporating prior knowledge of biological phenotype (basal/luminal stratification).


Data access:

The datasets generated and analysed during the current study (as listed in the table Mathews et. al.xlsx) and used to derive the figures and tables in the published article are publicly available on GitHub and the GTEx portal, as described in Mathews et. al.xlsx. The study used three publicly available datasets. The raw METABRIC and TCGA data can be accessed from cBioPortal at https://identifiers.org/cbioportal:brca_metabric (METABRIC data) and https://identifiers.org/cbioportal:brca_tcga_pan_can_atlas_2018 (TCGA data), and the raw normal breast data from the GTEx portal at https://gtexportal.org/home/datasets.


Study aims and methodology:

The study aimed to reclassify the PAM50 intrinsic phenotypes into subtypes that are accurately defined by clear patterns of activation and inactivation of gene groups directly interpretable in terms of specific normal mammary cell types, namely basal, luminal/estrogen receptor (ER), myoepithelial, and human epidermal growth factor receptor 2 (Her2)-related gene groups. This is intended to enable more accurate prognostication of breast cancer.

Topological Data Analysis (TDA) was performed on three published datasets: TCGA, METABRIC and GTEx. The mRNA expression Z-score data for 1082 TCGA and 1904 METABRIC samples, along with the PAM50 gene set, were retrieved from cBioPortal (see Mathews et. al.xlsx for links to the datasets). The GTEx normal breast dataset (290 samples) was downloaded from the GTEx portal. Further details of the methodology, including the Mapper-based unsupervised analysis of high-dimensional point clouds, are given in the published article and its supplementary information file.
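
The record refers to Mapper-based analysis but contains no code. As a rough illustration of the general Mapper construction only, the sketch below assumes a one-dimensional filter (lens) function, a cover of equal-length overlapping intervals, and single-linkage clustering with a fixed distance threshold. The function names and the parameters `n_intervals`, `overlap` and `eps` are hypothetical; this is not the authors' implementation or the API of any Mapper library.

```python
import numpy as np


def single_linkage_clusters(X, eps):
    """Hypothetical helper: connected components of the graph that links
    points whose Euclidean distance is at most eps."""
    n = len(X)
    labels = -np.ones(n, dtype=int)
    current = 0
    for seed in range(n):
        if labels[seed] >= 0:
            continue
        labels[seed] = current
        stack = [seed]
        while stack:
            j = stack.pop()
            dist = np.linalg.norm(X - X[j], axis=1)
            for k in np.where((dist <= eps) & (labels < 0))[0]:
                labels[k] = current
                stack.append(k)
        current += 1
    return [np.where(labels == c)[0] for c in range(current)]


def mapper_graph(points, filter_values, n_intervals=4, overlap=0.25, eps=1.0):
    """Minimal Mapper sketch (hypothetical API): cover the filter range with
    overlapping intervals, cluster the points falling in each interval, and
    join clusters that share at least one point."""
    lo, hi = float(filter_values.min()), float(filter_values.max())
    length = (hi - lo) / n_intervals
    nodes = []  # each node is a frozenset of point indices
    for i in range(n_intervals):
        start = lo + i * length - overlap * length
        end = lo + (i + 1) * length + overlap * length
        idx = np.where((filter_values >= start) & (filter_values <= end))[0]
        for members in single_linkage_clusters(points[idx], eps):
            nodes.append(frozenset(idx[members].tolist()))
    edges = {
        (a, b)
        for a in range(len(nodes))
        for b in range(a + 1, len(nodes))
        if nodes[a] & nodes[b]
    }
    return nodes, edges


# Toy example: ten evenly spaced points on a line, filtered by their coordinate.
pts = np.arange(10, dtype=float).reshape(-1, 1)
nodes, edges = mapper_graph(pts, pts[:, 0], n_intervals=3, overlap=0.25, eps=1.0)
print([sorted(n) for n in nodes], sorted(edges))
# prints [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]] [(0, 1), (1, 2)]
```

On the toy line the Mapper graph is a path of three overlapping nodes, recovering the shape of the data. On expression data the points would be samples and the filter some chosen lens; those choices, and the clustering method, are where the published analysis may differ from this sketch.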


Dataset description:

The datasets analysed during the current study and used to derive the figures and tables in the published article are described in Mathews et. al.xlsx. The file lists the names of the data files, their file formats and links to the individual datasets.


Data supporting figure 2 show the Z-scores of the METABRIC dataset organised by PAM50 subtype and by TDA signature class (.csv file), the Mapper-derived classifier along the PAM50 gene set (.png file), and the median mRNA-level Z-scores of the marker genes in each PAM50 subtype (.txt file).
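
The Z-scores in these files come precomputed from cBioPortal. To illustrate the two underlying steps, the sketch below standardises each gene across samples and then takes the median per subtype for each gene. It assumes a genes-by-samples NumPy matrix and the population standard deviation; cBioPortal's own Z-scores may be computed against a different reference population. The function names, subtype labels and toy values are hypothetical.

```python
import numpy as np


def zscore_rows(expr):
    """Standardise each gene (row) across samples: z = (x - mean) / sd.
    Uses the population standard deviation (ddof=0); a gene with zero
    variance would cause a division by zero."""
    mean = expr.mean(axis=1, keepdims=True)
    sd = expr.std(axis=1, keepdims=True)
    return (expr - mean) / sd


def median_z_by_group(z, groups):
    """Median Z-score of every gene within each sample group (e.g. subtype)."""
    groups = np.asarray(groups)
    return {str(g): np.median(z[:, groups == g], axis=1) for g in np.unique(groups)}


# Toy matrix: 2 genes x 4 samples, with hypothetical subtype labels per sample.
expr = np.array([[1.0, 2.0, 3.0, 4.0],
                 [10.0, 10.0, 20.0, 20.0]])
medians = median_z_by_group(zscore_rows(expr), ["LumA", "LumA", "Basal", "Basal"])
print({k: v.round(3).tolist() for k, v in medians.items()})
# prints {'Basal': [0.894, 1.0], 'LumA': [-0.894, -1.0]}
```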


Data supporting figure 3 show the TDA signature class, PAM50 subtype, number of disease-free months and disease-free status in the TCGA dataset (tcga csv file); the TDA signature class, PAM50 subtype, survival months and survival status in the METABRIC dataset (metabric csv file); clinical TCGA patient data (bcr clinical data txt file); and clinical METABRIC patient data (data_clinical_patient.txt file).
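
The clinical .txt files appear to follow cBioPortal's tab-separated clinical format, which typically begins with several metadata lines prefixed by `#` before the row of column names. The standard-library sketch below reads such a file and turns a time column and a status column into (time, event) pairs. The column names (`OS_MONTHS`, `OS_STATUS`), status strings and function names are assumptions for illustration; check the headers of the actual downloaded files, and for the TCGA disease-free data pass the corresponding disease-free columns and event label instead.

```python
import csv
import io


def read_cbioportal_clinical(text):
    """Parse a cBioPortal-style tab-separated clinical file. Lines starting
    with '#' are treated as metadata and skipped; the first remaining line
    holds the column names."""
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    return list(csv.DictReader(io.StringIO("\n".join(lines)), delimiter="\t"))


def survival_pairs(records, time_col, status_col, event_token):
    """Return (time, event) pairs, with event = 1 when the upper-cased status
    contains event_token (pass it in upper case). Rows with missing values
    are dropped."""
    pairs = []
    for rec in records:
        t, s = rec.get(time_col, ""), rec.get(status_col, "")
        if t in ("", "NA") or s in ("", "NA"):
            continue
        pairs.append((float(t), 1 if event_token in s.upper() else 0))
    return pairs


# Toy file in the assumed layout; column names and status values are illustrative.
sample = (
    "#Patient Identifier\tOverall Survival (Months)\tOverall Survival Status\n"
    "#Identifier\tMonths of survival\tVital status\n"
    "#STRING\tNUMBER\tSTRING\n"
    "#1\t1\t1\n"
    "PATIENT_ID\tOS_MONTHS\tOS_STATUS\n"
    "P1\t12.5\t1:DECEASED\n"
    "P2\t30\t0:LIVING\n"
    "P3\tNA\t0:LIVING\n"
)
records = read_cbioportal_clinical(sample)
print(survival_pairs(records, "OS_MONTHS", "OS_STATUS", "DECEASED"))
# prints [(12.5, 1), (30.0, 0)]
```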


Data supporting figure 4 show TDA signature class, PAM50 subtype, survival months and survival status in the METABRIC dataset.
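
Survival months and survival status per TDA signature class are the usual inputs for survival curves such as Kaplan-Meier estimates. The following dependency-free sketch of the Kaplan-Meier product-limit estimator would be applied separately to the patients in each class. It is a generic illustration, not code from the article, which may have used a statistics package instead; the function name and toy data are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) estimate of the survival function.
    events[i] is 1 for an observed event (e.g. death) and 0 for censoring.
    Returns (time, S(time)) at each distinct event time."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects sharing this time; those censored at t are still
        # counted as at risk for the events at t (the usual convention).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


# Toy survival months and event flags (1 = event, 0 = censored).
print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
# approximately [(1, 0.8), (2, 0.6), (3, 0.3)]
```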


Data supporting figure 5 show the Z-scores of the marker genes from the Mapper analysis of the GTEx dataset (gtex csv file), the Z-scores of the expression of the marker genes of Santagata et al. (gtex santagata csv file), and Transcripts Per Million (TPM) by tissue in the GTEx dataset (RNASeQ gct file).
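
For reference, TPM for gene i is 10^6 x (c_i / l_i) divided by the sum over all genes j of (c_j / l_j), where c is the read count and l the gene length: counts are first normalised by length, then rescaled so that every sample sums to one million. The sketch below computes this from raw counts. GTEx produces its TPM values with its own quantification pipeline, which may define gene length differently; the function name and numbers are hypothetical.

```python
def tpm(counts, lengths_bp):
    """Transcripts Per Million from read counts and gene lengths (base pairs):
    divide counts by length in kilobases, then rescale so values sum to 1e6."""
    per_kb = [c / (l / 1000.0) for c, l in zip(counts, lengths_bp)]
    total = sum(per_kb)
    return [x / total * 1e6 for x in per_kb]


# Three hypothetical genes whose counts are proportional to their lengths.
values = tpm([10, 20, 30], [1000, 2000, 3000])
print([round(v, 2) for v in values])  # each gene gets one third of a million
```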


Data supporting figure 6 show TDA signature class, PAM50 subtype, survival months and survival status in the METABRIC dataset.
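
The article's title reports a survival advantage for particular phenotypes. One standard way to compare two survival curves is the log-rank test; the sketch below implements the two-sample version in plain Python, assuming 0/1 group labels (for example, one TDA signature class versus the rest). It is a generic illustration and not necessarily the test or software used in the article; the function name and toy data are hypothetical, and the simple loop is adequate for cohorts of this size but not optimised.

```python
import math


def logrank_two_groups(times, events, groups):
    """Two-sample log-rank test. groups holds 0/1 labels, events holds 1 for
    an observed event and 0 for censoring. Returns (chi-square, p-value)."""
    data = list(zip(times, events, groups))
    event_times = sorted({t for t, e, _ in data if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        at_risk = [g for tt, _, g in data if tt >= t]
        n, n1 = len(at_risk), sum(at_risk)
        d = sum(1 for tt, e, _ in data if tt == t and e)
        d1 = sum(1 for tt, e, g in data if tt == t and e and g == 1)
        # Observed minus expected events in group 1 at this time.
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("no informative event times")
    chi2 = observed_minus_expected ** 2 / variance
    # Upper-tail probability of a chi-square distribution with 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Toy data: group 0 has all its events before group 1.
chi2, p = logrank_two_groups([1, 2, 3, 4, 5, 6], [1] * 6, [0, 0, 0, 1, 1, 1])
print(round(chi2, 3), round(p, 3))
```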


Data supporting figure 7 show the Z-scores of the marker genes from the Mapper analysis of the METABRIC dataset (.csv file) and the Mapper-derived classifier along the PAM50 gene set (.png file).


The dataset supporting table 1 consists of a list of genes used for the reclassification of the PAM50 subtypes of breast tumors.


Funding

This study was supported by an AFOSR grant (FA9550-17-1-0435), an NIA grant (R01-AG048769), the MSK Cancer Center Support Grant/Core Grant (P30 CA008748), and a grant from the Breast Cancer Research Foundation (grant BCRF-17-193).


Research Data Support

Research data support provided by Springer Nature