Springer Nature

Metadata supporting data files of the related article: Metabolomic analysis of serum may refine 21-gene expression assay risk recurrence stratification

dataset
posted on 2019-09-09, 10:18 authored by Amelia McCartney, Alessia Vignoli, Leonardo Tenori, Monica Fornier, Lorenzo Rossi, Emanuela Risi, Claudio Luchinat, Laura Biganzoli, Angelo Di Leo
The study sought to refine the Oncotype DX risk score by metabolomic analysis of serum from patients with early breast cancer (eBC), using Nuclear Magnetic Resonance (NMR) spectroscopy. Serum samples from 87 patients with oestrogen- and progesterone-receptor-positive, Human Epidermal Growth Factor Receptor 2 (HER2)-negative early breast cancer were analysed. Using this clinical dataset, the risk of recurrence was further sub-stratified by metabolomic signature, effectively splitting each Oncotype risk class into two subgroups.

Data access: The analysed metabolomics datasets used to derive figures 1 and 2 of the published article are publicly available in this figshare data record. The raw NMR spectral data (raw metabolomics data) supporting figures 1 and 2 of the published article are available from the corresponding author on reasonable request. Clinical (patient) data supporting supplementary table 1 of the published article are not publicly available but can likewise be obtained from the corresponding author on reasonable request. Corresponding author details: Dr. Angelo Di Leo, “Sandro Pitigliani” Medical Oncology Department, Hospital of Prato, Via Suor Niccolina 20, 59100 Prato, Italy; email address: angelo.dileo@uslcentro.toscana.it

Study design and methodology: The study aimed to combine NMR metabolomic predictions of recurrence with Oncotype DX recurrence scores, in order to test whether metabolomic prediction of recurrence risk could refine the Oncotype DX recurrence score stratification.
The serum samples from the 87 patients with oestrogen- and progesterone-receptor-positive, HER2-negative eBC were collected post-operatively between June 2007 and December 2009 and subsequently analysed via NMR spectroscopy; the mean follow-up from diagnosis was 7 years (range, 1-9 years). The NMR spectra were compared with those of a matched population of 28 patients with metastatic breast cancer (mBC) that had been analysed previously. To build a statistical model predicting recurrence risk in early breast cancer, samples from 26 patients with previous, recurrence-free eBC were compared with samples from all mBC patients using a Random Forest (RF) classifier. The RF classifier uses the samples from patients with mBC as a control group, against which samples from patients with eBC are compared. The “RF risk score” is based on the percentage of trees in the ensemble that misclassify an early-disease sample as belonging to the control group; it therefore expresses the extent to which the serum metabolomic profile of an early-disease sample resembles the profiles of confirmed metastatic controls. The model was then tested on all remaining eBC patients (validation set: 54 relapse-free patients and 7 with relapse).
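The vote-counting idea behind the RF risk score can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code (their analysis was run in R): it assumes the per-tree class predictions for one early-disease sample are already available from a trained forest, and the labels "eBC"/"mBC" and the 20-tree example are hypothetical.

```python
from typing import Sequence

# Hypothetical label for the control (metastatic) class; the real model's labels are not given in this record.
CONTROL_LABEL = "mBC"


def rf_risk_score(tree_votes: Sequence[str], control_label: str = CONTROL_LABEL) -> float:
    """Return the percentage of trees that assign an early-disease sample to the control group.

    tree_votes holds one predicted class label per tree in the ensemble.
    """
    if not tree_votes:
        raise ValueError("tree_votes must contain at least one vote")
    # Count the trees that "misclassify" the eBC sample as metastatic.
    votes_for_control = sum(1 for vote in tree_votes if vote == control_label)
    return 100.0 * votes_for_control / len(tree_votes)


# Hypothetical 20-tree ensemble: 12 trees call the eBC sample "mBC", 8 call it "eBC".
votes = ["mBC"] * 12 + ["eBC"] * 8
print(rf_risk_score(votes))  # 60.0
```

Some libraries report averaged per-tree class probabilities instead of hard vote counts (for example scikit-learn's `RandomForestClassifier.predict_proba`), so their output can differ slightly from a pure vote percentage.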
The metabolomic RF risk score was then combined with the Oncotype DX recurrence score to test whether the metabolomic score could sub-stratify each Oncotype-defined risk class into two subgroups: low risk and high risk.
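The sub-stratification step amounts to cross-tabulating each patient's Oncotype risk class against a dichotomised RF score. The sketch below is illustrative only: it assumes the cut-off of 53 used for the "Rf≥53" column of Figure_2.xlsx, and the function name and example patients are hypothetical, not values from the dataset.

```python
from collections import Counter
from typing import Iterable, Tuple

# Cut-off assumed from the "Rf≥53" column described for Figure_2.xlsx.
RF_THRESHOLD = 53.0


def substratify(patients: Iterable[Tuple[str, float]], threshold: float = RF_THRESHOLD) -> Counter:
    """Count patients in each (Oncotype risk class, metabolomic subgroup) cell.

    Each patient is a (risk_class, rf_score) pair; scores at or above the
    threshold are labelled "high", all others "low".
    """
    cells = Counter()
    for risk_class, rf_score in patients:
        subgroup = "high" if rf_score >= threshold else "low"
        cells[(risk_class, subgroup)] += 1
    return cells


# Hypothetical patients, not values from the dataset.
example = [("low", 41.0), ("low", 60.5), ("intermediate", 53.0),
           ("intermediate", 22.0), ("high", 75.0)]
for cell, n in sorted(substratify(example).items()):
    print(cell, n)
```

Using a `Counter` keyed by (class, subgroup) means empty cells simply read as zero, which is convenient when some Oncotype classes contain few patients.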
Statistical analyses were performed using the open-source software R and are described in detail in the supplementary methods of the published article. For details of the 1H-NMR methodology used to analyse the patient samples, please refer to the supplementary methods of the published article.

Patient consent: All patients provided prospective informed consent for collection of serum and clinical data for the purposes of future studies. Consent was obtained according to a protocol approved by the ethics committee of the Memorial Sloan Kettering Cancer Center.

Dataset descriptions:
The datasets Figure_1.xlsx and Figure_2.xlsx, supporting Figures 1 and 2 of the published article respectively, are partially analysed datasets derived from the raw NMR spectral data (the Random Forest score was generated by analysing the raw spectra).

Data supporting figure 1: Figure_1.xlsx is in .xlsx file format and was used to generate figure 1 in the published article. The file consists of three tables: “Panel A”, “Panel B” and “Panel C”. Each MSKCC_X identifier corresponds to a serum sample from a breast cancer patient.
Panel A shows the data of the metabolomic training set. These data were used to generate the area under the receiver operating characteristic curve (AUC) for the random forest (RF) model, which compares 26 patients with previous early breast cancer (without recurrence) against 28 patients with metastatic disease. The random forest score (RF score) expresses the probability that each early breast cancer sample included in the model has been correctly classified, and not misidentified as a metastatic sample.
Panel B shows the data of the metabolomic validation set. The same technique was employed as in Panel A, this time comparing the 7 patients with early breast cancer who developed recurrence against the 54 early breast cancer patients without recurrence. A high random forest score is taken to indicate a high risk of recurrence, because it corresponds to a metabolomic fingerprint that closely resembles a metastatic fingerprint (as seen in the training set).
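The AUC for Panels A and B can be computed directly from the RF scores of the two groups being compared. The Python sketch below uses the pairwise (Mann-Whitney) formulation of the AUC; it is an illustration, not the R code used in the published analysis, and the example scores are hypothetical rather than taken from Figure_1.xlsx.

```python
from typing import Sequence


def roc_auc(positive_scores: Sequence[float], negative_scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form equals the area under the
    empirical ROC curve (the Mann-Whitney U statistic divided by n_pos * n_neg).
    """
    if not positive_scores or not negative_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical RF scores for relapsed vs relapse-free patients.
relapsed = [72.0, 58.0, 49.0]
relapse_free = [30.0, 49.0, 55.0, 20.0]
print(roc_auc(relapsed, relapse_free))  # 0.875
```

The double loop is quadratic in sample size, which is harmless for groups of a few dozen patients such as those described here.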
Panel C shows the data used to generate the Kaplan-Meier curve in figure 1. All studied patients with early breast cancer were grouped by metabolomic recurrence risk score into "high" and "low" risk of recurrence, and disease-free survival was compared between the two groups. Disease-free survival time is given in years (the "Time" column in the table); patients at "high" risk had significantly shorter disease-free survival.
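A Kaplan-Meier curve of this kind can be estimated separately for each risk group from follow-up times and relapse indicators. The sketch below is a stdlib-only illustration, not the authors' R analysis: it assumes each patient has a time in years (as in the "Time" column) and an event flag coded 1 for relapse and 0 for censored; the name and coding of the event column in Figure_1.xlsx are not described in this record, and the example data are hypothetical.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) steps of the Kaplan-Meier estimator.

    events[i] is 1 if patient i relapsed at times[i], 0 if censored (relapse-free
    at last follow-up). Survival drops only at times with at least one event.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    survival = 1.0
    steps = []
    at_risk = len(times)  # patients still under observation just before time t
    for t in sorted(set(times)):
        relapses = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)  # relapsed or censored at t
        if relapses > 0:
            survival *= 1.0 - relapses / at_risk
            steps.append((t, survival))
        at_risk -= leaving
    return steps


# Hypothetical "high"-risk group: follow-up in years and relapse flags.
print(kaplan_meier([2, 3, 3, 5, 7], [1, 0, 1, 0, 1]))
```

Patients censored at the same time as a relapse are counted as still at risk at that time, which is the usual Kaplan-Meier convention. Testing whether the two curves differ significantly (for example with a log-rank test) is a separate step not shown here.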

Data supporting figure 2: Figure_2.xlsx is in .xlsx file format and contains the metabolomic random forest scores and corresponding genomic assay results used to generate figure 2 in the published article. The data consist of a single table with six columns, one per parameter:
MSKCC_X: identifier corresponding to a serum sample from a breast cancer patient.
OncotypeDx: the Oncotype DX recurrence score. Oncotype DX is a genomic test that analyses the activity of a group of genes that can affect how a breast cancer is likely to behave and respond to treatment.
TaylorDx class: TAILORx-defined recurrence score classification (low/intermediate/high). TAILORx is a phase 3 clinical trial, opened in 2006, that was designed to provide an evidence-based answer to the question of whether hormone therapy alone is non-inferior to hormone therapy plus chemotherapy.
Rf score: NMR metabolomic Random Forest risk score.
Rf≥53 (1 no, 2 yes): 1 was assigned if the Rf score was less than 53; 2 was assigned if the Rf score was greater than or equal to 53.
Relapse (1 no, 2 yes): 1 was assigned to relapse-free patients; 2 was assigned to relapsed patients.
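The categorical columns of Figure_2.xlsx follow simple coding rules, sketched below in Python for illustration. The TAILORx cut-offs used here (recurrence score 0-10 low, 11-25 intermediate, 26-100 high) are those of the TAILORx trial and are assumed, not confirmed by this record, to be the ones applied to the "TaylorDx class" column; the function names are hypothetical.

```python
from typing import Tuple


def tailorx_class(recurrence_score: int) -> str:
    """Map an Oncotype DX recurrence score (0-100) to a TAILORx risk group.

    Assumes the TAILORx cut-offs: 0-10 low, 11-25 intermediate, 26-100 high.
    """
    if not 0 <= recurrence_score <= 100:
        raise ValueError("recurrence score must lie between 0 and 100")
    if recurrence_score <= 10:
        return "low"
    if recurrence_score <= 25:
        return "intermediate"
    return "high"


def encode_flags(rf_score: float, relapsed: bool) -> Tuple[int, int]:
    """Return the (Rf≥53, Relapse) codes, where 1 means no and 2 means yes."""
    rf_code = 2 if rf_score >= 53 else 1
    relapse_code = 2 if relapsed else 1
    return rf_code, relapse_code


print(tailorx_class(18), encode_flags(53.0, False))  # intermediate (2, 1)
```

Note that a score of exactly 53 is coded 2, consistent with the "Rf≥53" column header.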

Data supporting supplementary table 1: Figure_3.xlsx is in .xlsx file format and contains the baseline characteristics, at initial diagnosis, of the patients in the early-disease group of the metabolomic analysis.



Funding

This work was supported by a grant from the Breast Cancer Research Foundation, New York, USA (grant number BCRF 18-054, jointly to LB and MF). AV is supported by an AIRC fellowship for Italy. The authors acknowledge the support and the use of resources of Instruct-ERIC, a landmark ESFRI project, and specifically the CERM/CIRMMP Italy Centre.

Research Data Support

Research data support provided by Springer Nature