Additional file 1 of An expansion of rare lineage intestinal microbes characterizes rheumatoid arthritis

Statistical analysis.

Figure S1. Body mass index (BMI) was associated with decreased species richness in the gut microbiota of RA patients. Species richness was measured as the number of observed OTUs, calculated on the rarefied counts. The dashed line shows the fitted linear regression line, with the gray area indicating the 95 % confidence band. The three horizontal lines of each box represent the first, second (median), and third quartiles, respectively, with the whiskers extending to 1.5 times the inter-quartile range (IQR). BMI categories: 1, ≤ 24; 2, ≤ 30; 3, ≤ 35; 4, ≤ 40.

Figure S2. Treatment effects (methotrexate, hydroxychloroquine, or either) on microbiota α-diversity, stratified by prednisone use. N, not treated with the specific drug; Y, treated. The three horizontal lines of each box represent the first, second (median), and third quartiles, respectively, with the whiskers extending to 1.5 times the inter-quartile range. n = 40.

Figure S3. Heat map showing the genus-level profiles of the gut microbiota of RA patients, their first-degree relatives (FDR), and healthy controls (HC). The heat map colors indicate the abundance of the genera.

Figure S4. Boxplots comparing the inter-group UniFrac distances: RA versus first-degree relatives (FDRs), RA versus healthy controls (HCs), and FDRs versus HCs. The distances between relatives and HCs are smaller than those between RA and relatives or HCs, indicating that disease may have stronger effects on the gut microbiota than genetic or environmental factors.

Figure S5. Principal coordinate plots based on unweighted and weighted UniFrac distances. PERMANOVA showed a significant difference between RA patients and controls on unweighted UniFrac (a) but not on weighted UniFrac (b), indicating that the microbiota change in RA occurs mostly in the rare and less abundant lineages. The percentage of variability explained by the corresponding coordinate is indicated on the axis.
Each point represents a sample, with red and blue indicating the RA and control groups, respectively. The lines connecting to the centroid and the ellipses do not represent any statistical significance but rather serve as a visual guide to group differences.

Figure S6. Comparison of the relative abundances of differentially abundant taxa between RA and controls, selected based on a false discovery rate of 15 %. Error bars represent the standard error of the mean. The y-axis is on a square-root scale.

Figure S7. The relative abundance of Prevotella copri does not differ significantly between RA and controls. a The presence of P. copri OTUs was similar between RA and controls. The row names of the heat map depict the P. copri OTUs. Red indicates the presence of the OTUs. b The relative abundance of the P. copri OTUs is similar between RA patients and controls. c The presence and abundance of P. copri do not correlate with the presence or absence of HLA-DR4 in RA patients.

Figure S8. The RA gut microbiota has decreased function in amino acid metabolism. The abundance of the KEGG pathway categories was calculated based on PICRUSt. The three horizontal lines of each box represent the first, second (median), and third quartiles, respectively, with the whiskers extending to 1.5 times the inter-quartile range (IQR).

Figure S9. The predictive power of the gut microbiota profile (species) for RA status, assessed by the machine learning algorithm random forests. Random forests, an ensemble classifier built upon many decision trees, was used to build a prediction model based on the OTU-level relative abundances. a Comparison of the classification error of the random forests-trained model with that of guessing, which always predicts the majority class in the training data set. The box plots are based on the results from 200 bootstrap samples. Random forests achieves significantly lower classification error.
b Predictive power of individual OTUs assessed by the Boruta feature selection algorithm. Deep blue box plots correspond to the maximum importance Z score of shadow genera, which are shuffled versions of real genera introduced to the random forests classifier and provide a benchmark to detect truly predictive genera. Yellow and light blue show the tentative and confirmed genera, as determined by the Boruta selection. c Heat map based on the abundance ranks of the three Boruta-confirmed OTUs. Red and blue indicate high and low abundance, respectively. Hierarchical clustering (Euclidean distance, complete linkage) shows that RA samples tend to cluster together.

Figure S10. Scatterplots showing the correlation of the abundance of differentially abundant metabolites with Collinsella abundance. Significance was assessed by the Spearman rank correlation test. The blue line shows the fitted linear regression, with the gray area indicating the 95 % confidence band.

Figure S11. Collinsella does not colonize the gut. Fecal pellets were collected before mice were gavaged with C. aerofaciens and at various time points (3, 6, 12, 24, and 48 h) after gavage, and PCR was performed to determine the presence of the microbe. DNA from C. aerofaciens was used as a positive control and culture media as a negative control. At 12 h after gavage, a very faint band was observed, and at 24 h C. aerofaciens was not detectable in fecal pellets, suggesting that the microbe did not colonize the intestine.

Figure S12. E. coli does not significantly alter gut permeability. E. coli was used as a control microbe for gut permeability. a Gut permeability in DQ8 mice administered E. coli or media did not show a significant change. Sera of mice were tested for FITC-Dextran before and after treating mice with E. coli for 3 weeks (P = not significant, n = 6 mice/group). b CACO-2 cells cultured with or without E. coli and stained for ZO-1 showed no significant difference in the expression of the tight junction protein. c Quantification of the mean fluorescence intensity of ZO-1 expression in CACO-2 cells cultured in the presence of E. coli or media (P = not significant). (PDF 2660 kb)
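
The richness measure described for Figure S1 (observed OTU numbers calculated on rarefied counts) can be illustrated with a minimal Python sketch. This is not the authors' pipeline. It assumes that rarefaction means subsampling each sample's reads without replacement to a common depth, taken here as the smallest library size. The sample names, OTU labels, counts, and the functions `rarefy` and `observed_richness` are hypothetical and chosen only for illustration.

```python
import random


def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement from one sample's OTU counts.

    Assumption: rarefaction = random subsampling to a fixed depth.
    """
    total = sum(counts.values())
    if depth > total:
        raise ValueError("depth exceeds the sample's total read count")
    # Expand the count table to one entry per read, labeled by its OTU.
    reads = [otu for otu, n in counts.items() for _ in range(n)]
    rarefied = {}
    for otu in rng.sample(reads, depth):
        rarefied[otu] = rarefied.get(otu, 0) + 1
    return rarefied


def observed_richness(counts):
    """Number of OTUs with at least one read (observed OTU number)."""
    return sum(1 for n in counts.values() if n > 0)


# Hypothetical toy data: two samples with unequal sequencing depth.
samples = {
    "sample_A": {"otu1": 50, "otu2": 30, "otu3": 15, "otu4": 5},
    "sample_B": {"otu1": 180, "otu2": 20},
}

# Rarefy every sample to the smallest library size so richness is comparable.
depth = min(sum(c.values()) for c in samples.values())
rng = random.Random(0)  # fixed seed so the subsampling is reproducible
richness = {
    name: observed_richness(rarefy(c, depth, rng))
    for name, c in samples.items()
}
print(richness)
```

Rarefying first matters because a deeper-sequenced sample would otherwise tend to show more OTUs simply because more reads were drawn from it.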