Study design and population
This cross-sectional study used data from the National Health and Nutrition Examination Survey (NHANES), a large-scale survey conducted by the National Center for Health Statistics (NCHS). We analyzed data from the six two-year cycles collected between 2005 and 2016. NHANES uses a complex, multistage probability sampling design to select participants representative of the civilian, non-institutionalized U.S. population. The survey protocol was approved by the Research Ethics Review Board of the NCHS, and all participants provided written informed consent. More information about NHANES can be found at https://www.cdc.gov/nchs/nhanes/index.htm.
Participant selection
From the initial NHANES dataset, we applied several exclusion criteria to define our study population. Participants were excluded if they had missing HPV infection status (n = 49,860), if we were unable to calculate their LE8 Score (n = 3,348), or if they had missing data for other key covariates (n = 955). After applying these criteria, our final analytical sample consisted of 6,773 participants. For more specific details, refer to Fig. 1.
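The exclusion counts can be checked against the final sample size with a few lines of arithmetic. The sketch below back-calculates the starting total, which is not stated in the text, from the three exclusion counts and the final sample; the variable names are illustrative only.

```python
# Sequential exclusions as reported in the text; names are illustrative.
exclusions = [
    ("missing HPV infection status", 49_860),
    ("LE8 score not calculable", 3_348),
    ("missing key covariates", 955),
]
final_sample = 6_773

# Assumption: the starting total is back-calculated, not taken from the text.
initial_total = final_sample + sum(n for _, n in exclusions)

remaining = initial_total
for reason, n in exclusions:
    remaining -= n
    print(f"after excluding {reason}: {remaining:,} remain")
```

The running total after the last exclusion equals the reported analytical sample of 6,773.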
Exposure assessment: LE8 score
The Life's Essential 8 (LE8) score is a validated and reliable instrument developed by the American Heart Association to measure cardiovascular health (CVH) [15]. It comprises 8 CVH indicators: 4 health factors (body mass index (BMI), non-HDL cholesterol, blood pressure, and blood glucose) and 4 health behaviors (physical activity, diet, sleep duration, and nicotine exposure).
The validity of the LE8 has been established through its strong association with cardiovascular disease outcomes and all-cause mortality in multiple large cohort studies, and its reliability has been demonstrated through consistent results across different populations and settings [15, 16]. The detailed algorithm and scoring criteria for calculating the LE8 score have been published and validated. More details can be found in Supplementary Table 1 [15, 17]. Each of the 8 CVH indicators is scored from 0 to 100, and the overall LE8 score is calculated as the arithmetic mean of the 8 indicator scores. Scores of 80 to 100 indicate high CVH, scores of 50 to 79 indicate medium CVH, and scores of 0 to 49 indicate low CVH. In this study, we used the same definitions and cutoff points for classification.
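The averaging and classification step can be sketched as follows. This is a minimal illustration, not the study's code: the function names and the example component scores are hypothetical, and treating non-integer means between 79 and 80 as medium CVH is our assumption.

```python
from statistics import mean

def le8_overall(component_scores):
    """Arithmetic mean of the eight component scores (each 0-100)."""
    if len(component_scores) != 8:
        raise ValueError("LE8 needs exactly eight component scores")
    if any(not 0 <= s <= 100 for s in component_scores.values()):
        raise ValueError("each component score must lie between 0 and 100")
    return mean(component_scores.values())

def cvh_category(score):
    """Classify a 0-100 score; means in (79, 80) count as medium (assumption)."""
    if score >= 80:
        return "high"
    if score >= 50:
        return "medium"
    return "low"

# Hypothetical participant, for illustration only.
example = {
    "diet": 50, "physical_activity": 100, "nicotine": 75, "sleep": 90,
    "bmi": 70, "lipids": 60, "glucose": 100, "blood_pressure": 75,
}
overall = le8_overall(example)
print(overall, cvh_category(overall))
```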
The scoring criteria for each indicator are as follows. Diet quality is scored by percentile of the Healthy Eating Index-2015 (HEI-2015), from optimal (≥ 95th percentile, 100 points) to least optimal (1st–24th percentile, 0 points). Physical activity is scored by weekly duration of moderate or vigorous activity, from no activity (0 points) to ≥ 150 min per week (100 points). Nicotine exposure considers smoking status, time since quitting, and secondhand smoke exposure, from current smokers (0 points) to never smokers (100 points). Sleep health is scored by average nightly sleep duration, with 7–9 h being optimal (100 points). Body mass index (BMI) scores range from < 25 (100 points) to ≥ 40 (0 points). Blood lipids are scored by non-HDL cholesterol level, with < 130 mg/dL being optimal (100 points). Blood glucose scores combine diabetes history and HbA1c level, with no diabetes and normal blood glucose being optimal (100 points). Blood pressure scores range from ideal levels (< 120/< 80 mmHg, 100 points) to hypertension (≥ 160 or ≥ 100 mmHg, 0 points).
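Two of these mappings can be sketched as simple step functions. Only the end points (100 and 0 points) are stated in the text; the intermediate cut points and point values below follow the published AHA scoring as we understand it and should be checked against Supplementary Table 1. The function names are illustrative.

```python
def score_bmi(bmi):
    """Map BMI (kg/m^2) to LE8 points; intermediate bands are an assumption."""
    if bmi < 25:
        return 100
    if bmi < 30:
        return 70   # assumed intermediate band
    if bmi < 35:
        return 30   # assumed intermediate band
    if bmi < 40:
        return 15   # assumed intermediate band
    return 0

def score_diet(hei_percentile):
    """Map an HEI-2015 percentile (1-100) to LE8 points."""
    if hei_percentile >= 95:
        return 100
    if hei_percentile >= 75:
        return 80   # assumed intermediate band
    if hei_percentile >= 50:
        return 50   # assumed intermediate band
    if hei_percentile >= 25:
        return 25   # assumed intermediate band
    return 0

print(score_bmi(27.3), score_diet(60))
```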
Diet was assessed using the HEI-2015, which was constructed with SAS code provided by the U.S. National Cancer Institute and derived by combining dietary intake data from two 24-hour dietary recalls with the United States Department of Agriculture’s food pattern equivalents data [18]. For specific details, see Supplementary Table 2. Data on sleep duration, nicotine exposure, physical activity, medication history, diabetes history, and other information were obtained from self-reported questionnaires; the physical activity items primarily captured the duration and frequency of moderate to vigorous activities over the past 30 days. Blood glucose, blood lipid, and glycated hemoglobin levels were measured in blood samples sent to the central laboratory. Height, weight, blood pressure (average of three consecutive measurements), and other parameters were obtained from physical examination reports.
Outcome assessment: HPV infection status
HPV status was measured as follows. Vaginal cells were collected from participants using a vaginal swab. The swabs were then processed by professionals and sent to the Centers for Disease Control and Prevention in Atlanta, Georgia, for further analysis. HPV detection and genotyping were performed using the Roche Linear Array assay, the Roche prototype line blot assay, and the Digene Hybrid Capture method. More information about HPV measurement can be obtained from the official website at https://wwwn.cdc.gov/Nchs/Nhanes/2005-2006/HPVSWR_D.htm#LBDR31.
Covariates
To assess the relationship between the LE8 score and HPV infection status more accurately, we adjusted for the following potential confounders: age group ([20, 40) and [40, 59] years); race/ethnicity (Mexican American, Non-Hispanic Black, Non-Hispanic White, Other Hispanic, and Other Race – Including Multi-Racial); educational level (below high school, high school, and college or above); poverty income ratio (PIR; < 1, 1–3, and ≥ 3); drinking status; and number of lifetime sexual partners. Drinking status was classified as former drinker (individuals who previously drank alcohol but have now stopped), never drinker (individuals who have never consumed any alcoholic beverage), mild drinker (daily alcohol consumption of 1 to 2 standard drink units), moderate drinker (daily consumption of no more than 4 standard drink units for men or no more than 3 for women), or heavy drinker (daily consumption of more than 4 standard drink units for men or more than 3 for women). Lifetime sexual partners was defined as the number of men with whom the participant had had vaginal intercourse, as measured by the questionnaire item “In your lifetime, with how many men have you had vaginal sex?”. These variables were selected based on their potential association with both the LE8 score and HPV infection status, as suggested by previous literature and available NHANES data.
Statistical analyses
Because NHANES uses a complex probability sampling design, we conducted weighted analyses as recommended for NHANES data, using the two-day dietary interview weights (WTDR2D), primary sampling units (SDMVPSU), and strata (SDMVSTRA). Participants were divided into HPV-infected and HPV-uninfected groups based on HPV infection status. Continuous variables are presented as weighted means (± standard deviation) and were compared between groups using weighted t-tests. Categorical variables are presented as percentages and were compared using weighted chi-square tests. Three logistic regression models were constructed to examine the association between the LE8 score and HPV infection status. Model 1 was unadjusted. Model 2 adjusted for race/ethnicity, educational level, PIR, and drinking status. Model 3 further adjusted for lifetime sexual partners in addition to the variables included in Model 2. The Rao-Scott F test, an adjusted version of the chi-square test designed for complex sampling designs, was used to evaluate the robustness of the logistic regression model by assessing improvements in model fit with the inclusion of additional variables. This test corrects for the violation of the simple random sampling assumption that underlies the standard chi-square test [19]. The variance inflation factor (VIF) was used to assess multicollinearity in the model. The VIF measures how strongly each independent variable is linearly related to the other independent variables. Generally, a VIF below 10 indicates low multicollinearity, while a value above 10 suggests significant multicollinearity [20].
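As a rough illustration of the VIF rule of thumb, the sketch below computes VIFs as the diagonal of the inverse correlation matrix of the predictors, which equals 1/(1 − R²) from regressing each predictor on the others. It is an unweighted, standard-library sketch on made-up data; it ignores the survey design and the handling of categorical covariates, and the function names are illustrative rather than taken from the study.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif(columns):
    """Return {name: VIF} for a dict of predictor columns."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inv = invert(corr)
    return {name: inv[i][i] for i, name in enumerate(names)}

# Made-up predictors with correlation 0.8.
demo = vif({"a": [1, 2, 3, 4, 5], "b": [2, 1, 4, 3, 5]})
print(demo)
```

With two predictors correlated at 0.8, both VIFs are 1/(1 − 0.64), well below the threshold of 10.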
Restricted cubic spline (RCS) models were used to investigate the dose-response relationship between the LE8 score and HPV infection status. RCS models capture nonlinear relationships by fitting piecewise cubic polynomials that join smoothly at a set of boundary points (called “knots”) and are constrained to be linear beyond the outermost knots. This approach models complex dose-response relationships more flexibly than a simple linear model. In applying the RCS models, we chose 3 to 5 knots, optimizing their positions using standard procedures such as minimizing the residual sum of squares [21]. The mathematical form of the RCS model is as follows:
$$y = \beta_0 + \sum_{i=1}^{k} \beta_i f(x_i) + \varepsilon$$
where \(y\) represents HPV infection status (outcome), \(x_i\) represents the LE8 score (predictor), \(f(x_i)\) is the restricted cubic spline function, \(\beta_i\) are the regression coefficients, and \(\varepsilon\) is the error term.
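To make the spline function concrete, the sketch below builds one common form of the restricted cubic spline basis (the truncated power basis with linear tails). It is an illustration under stated assumptions: the knot positions are hypothetical, the function name is ours, and statistical software may rescale these terms, so the values need not match the output of any particular package.

```python
def rcs_basis(x, knots):
    """Return [x, f_1(x), ..., f_{k-2}(x)] for k knots (k >= 3)."""
    t = sorted(knots)
    if len(t) < 3:
        raise ValueError("a restricted cubic spline needs at least 3 knots")

    def cube_plus(u):
        # Truncated cubic: u^3 when u > 0, otherwise 0.
        return max(u, 0.0) ** 3

    last, penult = t[-1], t[-2]
    gap = last - penult
    basis = [float(x)]  # the linear term
    for tj in t[:-2]:
        # The two correction terms cancel the cubic and quadratic parts
        # beyond the last knot, so the curve is linear in the upper tail.
        basis.append(
            cube_plus(x - tj)
            - cube_plus(x - penult) * (last - tj) / gap
            + cube_plus(x - last) * (penult - tj) / gap
        )
    return basis

# Hypothetical knots on the 0-100 LE8 scale.
knots = [40, 60, 70, 85]
for score in (30, 65, 90):
    print(score, rcs_basis(score, knots))
```

With 4 knots the basis has 3 columns: the linear term plus 2 nonlinear terms. The nonlinear terms are zero below the first knot and change linearly above the last knot.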
In addition, to examine the relationship between LE8 scores and HPV infection status across different populations, we conducted subgroup analyses and interaction tests. Age, race/ethnicity, education level, and PIR were selected as primary stratification variables based on their potential influence on HPV infection risk and immune response. Age was categorized into two groups: [20, 40) and [40, 59] years. Race/ethnicity was classified according to NHANES standards, including Mexican American, non-Hispanic Black, non-Hispanic White, and other races (including multiracial). Education level was divided into three categories: high school and below, college, and graduate school and above. PIR was stratified into three ranges: < 1, 1–3, and ≥ 3. Logistic regression models were used to assess the relationship between LE8 scores and HPV infection status within each subgroup. Interaction terms between the LE8 score and each stratification variable were added to the models to test whether the relationship differed across subgroups. An interaction term with a p-value < 0.05 was considered indicative of a significant interaction effect.
Weighted quantile sum (WQS) regression was employed to evaluate the joint effect of the LE8 components on HPV infection risk. The method addresses multicollinearity by combining the components into a single index in which each component is weighted according to its contribution to the outcome, which makes it particularly useful for evaluating the combined effects of correlated variables [22]. The weighted model is as follows:
$$g(\mu) = \beta_0 + \beta_1\left(\sum_{i=1}^{c} \omega_i \varphi_i\right) + z'\Phi$$
$$\mathrm{WQS} = \sum_{i=1}^{c} \bar{\omega}_i \varphi_i$$
Here, \(g(\mu)\) represents any differentiable link function, \(\beta_0\) denotes the intercept, and \(\beta_1\) represents the regression coefficient. \(c\) denotes the number of LE8 indicators included in the model, and \(\omega_i\) represents the weighting index, with \(0 \le \omega_i \le 1\) and the sum of all weighting indices equal to 1. \(z'\) and \(\Phi\) represent the matrix of covariates and the coefficients, respectively. \(\varphi_i\) denotes the quartile of each LE8 indicator, where \(\varphi_i = 0, 1, 2, 3\) corresponds to the 1st, 2nd, 3rd, or 4th quartile, respectively. \(\sum_{i=1}^{c} \bar{\omega}_i \varphi_i\) represents the sum of the weighted quartiles of the \(c\) indicators. Assuming a linear function fitting a Gaussian distribution, the data were randomly divided into a training set (60%) and a validation set (40%), and the weights of the 8 LE8 indicators were estimated in the training set [23].
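The construction of the index can be sketched as below: each indicator is converted to a quartile score of 0 to 3, the scores are combined with weights that sum to 1, and the sample is split 60/40. This is only a sketch of the bookkeeping. In WQS regression the weights are estimated from the training set, whereas here they are supplied by hand; the function names, the tie-handling rule at quartile boundaries, and the example data are all assumptions.

```python
import random
from bisect import bisect_right
from statistics import quantiles

def quartile_scores(values):
    """Map each value to 0, 1, 2 or 3 according to its sample quartile."""
    cuts = quantiles(values, n=4)  # three quartile cut points
    # Assumption: a value equal to a cut point goes to the upper quartile.
    return [bisect_right(cuts, v) for v in values]

def wqs_index(scored, weights):
    """Weighted sum of quartile scores, one value per participant."""
    if any(w < 0 or w > 1 for w in weights.values()):
        raise ValueError("each weight must lie in [0, 1]")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    names = list(weights)
    n = len(scored[names[0]])
    return [sum(weights[c] * scored[c][i] for c in names) for i in range(n)]

def split_60_40(n, seed=0):
    """Randomly split indices 0..n-1 into 60% training and 40% validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(0.6 * n)
    return idx[:cut], idx[cut:]

# Two hypothetical indicators already expressed as quartile scores.
scored = {"sleep": [0, 1, 2, 3], "diet": [3, 2, 1, 0]}
index = wqs_index(scored, {"sleep": 0.25, "diet": 0.75})
train, valid = split_60_40(10)
print(index, len(train), len(valid))
```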
All statistical analyses and visualizations in this study were performed using R (version 4.4.1, https://www.r-project.org/), with two-sided statistical tests, and a P-value < 0.05 was considered statistically significant.