Springer Nature
13059_2023_3158_MOESM1_ESM.pdf (18.87 MB)

Additional file 1 of ChIPr: accurate prediction of cohesin-mediated 3D genome organization from 2D chromatin features

journal contribution
posted on 2024-01-13, 04:40 authored by Ahmed Abbas, Khyati Chandratre, Yunpeng Gao, Jiapei Yuan, Michael Q. Zhang, Ram S. Mani
Additional file 1:

Fig. S1. Predicted interactions correlate well with the original ones at the peak-level resolution. (A) Predicted interactions using the three variants of ChIPr for the cell lines K562, H1, and HepG2 correlate significantly better than random interactions with the original ChIA-PET interactions of these three cell lines, according to the Spearman correlation coefficient. The predictions in (A) are obtained using models trained on the GM12878 cell line data. (B) Predicted interactions using the three variants of ChIPr for the cell lines GM12878, H1, and HepG2 correlate significantly better than random interactions with the original ChIA-PET interactions of these three cell lines, according to the Spearman correlation coefficient. The predictions in (B) are obtained using models trained on the K562 cell line data. ****: p-value < 0.0001, Wilcoxon rank sum test.

Fig. S2. Predicted interactions correlate well with the original ones at the 5 Kbp bin resolution. (A and B) Comparison of the correlation coefficients between the original interactions and those predicted by the three variants of ChIPr with the correlation coefficients between the original and randomly generated interactions, for the four cell lines GM12878, K562, H1, and HepG2. The correlation coefficients were calculated as stratum-adjusted correlation coefficients (A) and Pearson correlation coefficients (B). Predictions for the cell lines GM12878, H1, and HepG2 were calculated using the models trained on K562 data. Predictions for the cell line K562 were calculated using the models trained on GM12878 data.

Fig. S3. ChIPr predictions (DNN-ChIPr (A), RF-ChIPr (B), and GB-ChIPr (C)) capture Hi-C-identified loops at a significantly higher percentage than control loops. ****: p-value < 0.0001, Wilcoxon rank sum test.

Fig. S4. The drop in mean absolute error, measured between the predicted interactions and the original ones, when DNN-ChIPr is trained with one of the input features removed at a time. The plot shows that removing H3K27ac and H3K27me3 together causes a larger drop in performance than removing either of them alone.

Fig. S5. Comparison between the genome-level performance of the minimal and full models of DNN-ChIPr (A and B) and GB-ChIPr (C and D). The models in (A and C) were trained on the GM12878 cell line data. The models in (B and D) were trained on the K562 cell line data. The data is split into training data (75%) and test data (25%). In (A and C), the performance for GM12878 is measured on the GM12878 test data. Similarly, in (B and D), the performance for K562 is measured on the K562 test data.

Fig. S6. The majority of RAD21 interactions have CTCF ChIP-seq binding in both anchor peaks of the interactions in the four cell lines GM12878, K562, H1, and HepG2. The portion of interactions that lacks CTCF ChIP-seq binding in the two anchor peaks is mostly enriched with enhancer-enhancer interactions.

Fig. S7. RAD21 interactions without CTCF ChIP-seq binding in both peaks are significantly enriched with enhancer-enhancer interactions. (A and B) Venn diagram showing the intersection between RAD21 interactions with CTCF binding in both peaks and interactions with enhancers in both peaks for the H1 cell line (A); simulations show that the intersection between the two sets of interactions is not significant (B). (C and D) Venn diagram showing the intersection between RAD21 interactions with CTCF binding in only one peak and interactions with enhancers in both peaks for the H1 cell line (C); simulations show that the intersection between the two sets of interactions is statistically significant (D). (E and F) Venn diagram showing the intersection between RAD21 interactions with CTCF binding in neither peak and interactions with enhancers in both peaks for the H1 cell line (E); simulations show that the intersection between the two sets of interactions is statistically significant (F). ****: p-value < 0.0001, empirical test.

Fig. S8. RAD21 interactions without CTCF ChIP-seq binding in both peaks are significantly enriched with enhancer-enhancer interactions. (A and B) Venn diagram showing the intersection between RAD21 interactions with CTCF binding in both peaks and interactions with enhancers in both peaks for the HepG2 cell line (A); simulations show that the intersection between the two sets of interactions is not significant (B). (C and D) Venn diagram showing the intersection between RAD21 interactions with CTCF binding in only one peak and interactions with enhancers in both peaks for the HepG2 cell line (C); simulations show that the intersection between the two sets of interactions is statistically significant (D). (E and F) Venn diagram showing the intersection between RAD21 interactions with CTCF binding in neither peak and interactions with enhancers in both peaks for the HepG2 cell line (E); simulations show that the intersection between the two sets of interactions is statistically significant (F). ****: p-value < 0.0001, empirical test.

Fig. S9. The majority of RAD21 interactions have CTCF ChIP-seq binding in both anchor peaks of the interactions in the three cell lines H9, MCF7, and LNCaP. The portion of interactions that lacks CTCF ChIP-seq binding in the two anchor peaks is mostly enriched with enhancer-enhancer interactions.

Fig. S10. RAD21 interactions without CTCF ChIP-seq binding in both peaks are significantly enriched with enhancer-enhancer interactions. (A and B) Venn diagram showing the intersection between RAD21 interactions with CTCF binding in both peaks and interactions with enhancers in both peaks for the H9 cell line (A); simulations show that the intersection between the two sets of interactions is not significant (B). (C and D) Venn diagram showing the intersection between RAD21 interactions with CTCF binding in only one peak and interactions with enhancers in both peaks for the H9 cell line (C); simulations show that the intersection between the two sets of interactions is statistically significant (D). (E and F) Venn diagram showing the intersection between RAD21 interactions with CTCF binding in neither peak and interactions with enhancers in both peaks for the H9 cell line (E); simulations show that the intersection between the two sets of interactions is statistically significant (F). ****: p-value < 0.0001, empirical test.

Fig. S11. RAD21 interactions without CTCF ChIP-seq binding in both peaks are significantly enriched with enhancer-enhancer interactions. (A and B) Venn diagram showing the intersection between RAD21 interactions with CTCF binding in both peaks and interactions with enhancers in both peaks for the MCF7 cell line (A); simulations show that the intersection between the two sets of interactions is not significant (B). (C and D) Venn diagram showing the intersection between RAD21 interactions with CTCF binding in only one peak and interactions with enhancers in both peaks for the MCF7 cell line (C); simulations show that the intersection between the two sets of interactions is statistically significant (D). (E and F) Venn diagram showing the intersection between RAD21 interactions with CTCF binding in neither peak and interactions with enhancers in both peaks for the MCF7 cell line (E); simulations show that the intersection between the two sets of interactions is statistically significant (F). ****: p-value < 0.0001, empirical test.

Fig. S12. RAD21 interactions without CTCF ChIP-seq binding in both peaks are significantly enriched with enhancer-enhancer interactions. (A and B) Venn diagram showing the intersection between RAD21 interactions with CTCF binding in both peaks and interactions with enhancers in both peaks for the LNCaP cell line (A); simulations show that the intersection between the two sets of interactions is not significant (B). (C and D) Venn diagram showing the intersection between RAD21 interactions with CTCF binding in only one peak and interactions with enhancers in both peaks for the LNCaP cell line (C); simulations show that the intersection between the two sets of interactions is statistically significant (D). (E and F) Venn diagram showing the intersection between RAD21 interactions with CTCF binding in neither peak and interactions with enhancers in both peaks for the LNCaP cell line (E); simulations show that the intersection between the two sets of interactions is statistically significant (F). ****: p-value < 0.0001, empirical test.

Fig. S13. HOMER results for the top enriched motif (BORIS) and its top four best matches with known motifs in the locations of CTCF binding for the GM12878 cell line.

Fig. S14. RAD21 interactions with CTCF ChIP-seq binding in both peaks are significantly stronger than those with CTCF binding in only one peak or in neither peak.
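The captions for Figs. S7 to S12 cite an empirical test based on simulations but do not describe the procedure. As a rough illustration only, and not the authors' method, one common way to assess whether the intersection of two interaction sets is larger than expected by chance is to draw random sets of the same size from all interactions and count how often the random overlap reaches the observed one. Every name and all toy data below (`empirical_overlap_pvalue`, `universe`, `enhancer_both`, `no_ctcf`, `ctcf_both`) are hypothetical.

```python
import random


def empirical_overlap_pvalue(universe, set_a, set_b, n_sim=2000, seed=0):
    """One-sided empirical p-value for the overlap |set_a & set_b|.

    Assumed null model (not taken from the source): set_a is a random
    subset of `universe` with the same size as the observed set_a.
    """
    rng = random.Random(seed)
    pool = sorted(universe)          # fixed order so results are reproducible
    set_a, set_b = set(set_a), set(set_b)
    observed = len(set_a & set_b)
    hits = 0
    for _ in range(n_sim):
        sample = rng.sample(pool, len(set_a))
        overlap = sum(1 for item in sample if item in set_b)
        if overlap >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_sim + 1)


# Toy data: 1000 interaction IDs, 200 of which have enhancers in both anchors.
universe = range(1000)
enhancer_both = set(range(0, 200))

# Hypothetical set sharing 50 of its 100 members with enhancer_both
# (about 20 would be expected by chance).
no_ctcf = set(range(150, 250))
obs_enriched, p_enriched = empirical_overlap_pvalue(universe, no_ctcf, enhancer_both)

# Hypothetical set whose overlap (40 of 200) matches the chance expectation.
ctcf_both = set(range(0, 1000, 5))
obs_null, p_null = empirical_overlap_pvalue(universe, ctcf_both, enhancer_both)

print(obs_enriched, p_enriched)
print(obs_null, p_null)
```

With this construction the smallest reportable p-value is 1 / (`n_sim` + 1), so a threshold such as p < 0.0001 would require at least 10,000 simulations.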

Funding

NIH
Cancer Prevention and Research Institute of Texas
U.S. Department of Defense

History