the Journal of Applied Research
in Clinical and Experimental Therapeutics


©2000-2001. All Rights Reserved. Therapeutic Solutions LLC

Evaluation of Models for the Prediction of Breast Cancer Development in Women at High Risk


 

Matthew S. Mayo, PhD

Kansas Cancer Institute and Department of Preventive Medicine

 

Bruce F. Kimler, PhD

Department of Radiation Oncology

 

Carol J. Fabian, MD

Division of Clinical Oncology, and Department of Internal Medicine

 

University of Kansas Medical Center

3901 Rainbow Boulevard

Kansas City, Kansas 66160-7312

 

KEY WORDS: Gail risk, fine needle aspiration, atypia, cytology, logistic regression, proportional hazards regression

 

ABSTRACT

In this manuscript we evaluate models for the prediction of breast cancer in women with major risk factors for the disease utilizing random periareolar fine needle aspiration (FNA) cytology along with the original and modified Gail risk assessment. Utilizing logistic regression, we compare the accuracy in prediction of breast cancer development using the original Gail risk compared to modifications suggested by Gail et al (1989) while also utilizing results from FNAs. The predictive ability of these factors for time to disease onset is compared using Cox's proportional hazards model. Modification of the traditional Gail risk and utilizing cytology obtained by FNAs results in improved logistic and Cox's proportional hazards regression models. Therefore, utilization of random periareolar FNA cytology in conjunction with the modified Gail risk assessment improves the short-term prediction of breast cancer in women at increased risk of the disease.

INTRODUCTION

Currently, it is estimated that the incidence of breast cancer in women is 111 per 100,000 women. Twenty-nine percent of those women diagnosed with breast cancer will succumb to the disease within 5 years.1 Breast cancer was traditionally the leading cause of cancer-related death among all women until it was surpassed by lung cancer in the 1980s. It continues to be the leading cause of cancer-related death among women aged 40 to 55. According to the Surveillance, Epidemiology, and End Results (SEER) registry, the lifetime risks for a breast cancer diagnosis and for death from breast cancer in women are 12.64% and 3.57%, respectively.2 Thus, there is a need for statistical models to accurately assess a woman's risk of breast cancer.

Gail and coworkers3 developed a model to estimate the relative risk of breast cancer in white women undergoing annual screening. They determined the major predictors of risk in this population were a family history of breast cancer in a first-degree relative, previous benign breast biopsies, a late age at first live birth, and early menarche. From these factors they created a model to estimate a woman's risk of breast cancer at 10, 20, and 30 years from her current age. They noted that the data used in the original model included women with and without atypical hyperplasia (AH). In their modified model, women with a prior biopsy showing AH have their relative risk multiplied by 1.82, resulting in a modified Gail risk at 10, 20, and 30 years from her current age. The Gail model does not include risk modification factors for prior breast cancer, age at breast cancer diagnosis, lobular carcinoma in situ, second- or third-degree relative with breast cancer, relatives with ovarian cancer, or hormone replacement therapy history, all factors previously shown to increase breast cancer risk. Thus, women with prior in situ or invasive cancer as their major risk factor, those with a strong paternal family history of breast cancer, or those from a hereditary breast ovarian family may have their risk substantially underestimated. The Gail model also does not take into account lifestyle changes that may be associated with risk reduction such as prophylactic oophorectomy in premenopausal women or prevention treatment with tamoxifen.

Recently, it has been suggested that tissue-based biomarkers are needed to enhance the prediction of short-term risk of breast cancer development.4-6 Candidate markers should be both biologically plausible and statistically associated with cancer or precancerous development.4 Potential surrogate endpoint biomarkers should also be (a) obtained from minimally invasive procedures, (b) easily quantifiable, (c) present at a reasonable rate in at-risk individuals, and (d) reversible with successful interventions.4-7

Nipple aspiration and fine-needle aspiration (FNA) are minimally invasive and inexpensive techniques that can be performed repeatedly with limited morbidity. Atypical cytology from nipple aspiration has been shown to be associated with increased breast cancer risk, although approximately 40% of the aspirates are acellular.8,9 Random FNA is currently being evaluated as a technique for obtaining repeated breast tissue samples in risk prediction and chemoprevention clinical trials.7,10-14

We have demonstrated that random periareolar FNA cytology can be used in conjunction with the modified Gail risk assessment for the short-term prediction of breast cancer in women at high risk of breast cancer.15 In this article, we show that utilizing FNA data does enhance the prediction of breast cancer in women at high risk of breast cancer in comparison to the original and modified Gail risk assessment models. We detail the population in our cohort and then compare demographic and clinical variables between those women who have progressed to breast cancer and those who have not. We also define the models that are compared and discuss the results.

 

Population

Four hundred eighty women at increased risk for breast cancer because of a family history of breast cancer, prior precancerous biopsy, and/or prior invasive cancer were enrolled from August 1989 to January 1999. All women had a mammogram interpreted as not suspicious for breast cancer within 12 months prior to entry. Random periareolar FNAs were performed at study entry, and cells were characterized cytologically as nonproliferative, epithelial hyperplasia, or epithelial hyperplasia with atypia.16 The average follow-up time for these women is 42.5 months, during which time 20 women have been subsequently diagnosed with invasive breast cancer or ductal carcinoma in situ. Detailed methodology regarding subject eligibility, FNA technique, tissue preparation, and cytologic characterization has been previously published.11,17,18

Table 1 details the demographic, family history, and random FNA cytologic characteristics of this population. The average age was 44.31 years, with an average original 10-year Gail risk of 4.56% and an average modified 10-year Gail risk of 5.44%. Of these women, 95.2% were white, 59.6% were premenopausal at entry, and 83.5% were not on hormone replacement therapy at entry. In addition, 75.6% had at least one first-degree or two second-degree relatives with breast cancer, 22.5% had a prior precancerous mastopathy (AH or lobular carcinoma in situ), and 17.1% had prior breast cancer; 14.2% of the women had multiple risk factors. FNA showed that 21.2% of the women had epithelial hyperplasia with atypia, 70.6% had at least one positive biomarker, and 35.8% had evidence of multiple biomarker abnormalities.

{INSERT TABLE 1}

Table 2 compares characteristics between those women in whom breast cancer was subsequently clinically detected and those women who have not been subsequently clinically diagnosed with breast cancer. Continuous measures are compared using the two-sample t-test, and dichotomous variables are compared using Fisher's exact test.19 As can be seen, there was no significant difference in age, length of follow-up, race, menopausal status, hormone replacement therapy, incidence of one first-degree or two second-degree relatives with breast cancer, rate of prior breast cancer, presence of at least one positive biomarker, or multiple biomarker abnormality. However, as noted previously,15 both the original and modified 10-year Gail risks were significantly higher in those women who have subsequently developed breast cancer. Also, women with a prior precancerous mastopathy, with multiple risk factors, or with epithelial hyperplasia with atypia in their random periareolar FNA were more likely to develop breast cancer.

{INSERT TABLE 2}
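As an illustration of the Fisher's exact comparisons summarized in Table 2, the following is a minimal sketch in Python. It is not the software used for the analysis, and the function name `fisher_exact_two_sided` is introduced here only for illustration. It assumes the common two-sided definition that sums the probabilities of all tables no more likely than the observed one, and it uses the counts reported in Table 2.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # The small relative tolerance guards against floating-point ties.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Counts from Table 2: rows = factor present / absent,
# columns = with cancer / without cancer.
p_atypia = fisher_exact_two_sided(12, 90, 8, 370)
p_mastopathy = fisher_exact_two_sided(10, 98, 10, 362)
print(f"Hyperplasia with atypia from FNA: p = {p_atypia:.4f}")
print(f"Prior precancerous mastopathy:    p = {p_mastopathy:.4f}")
```

Other software may define the two-sided P value slightly differently, so small discrepancies from the tabulated values would not be surprising.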

 

Predicting Breast Cancer

In this paper we compare both logistic regression20,21 and Cox proportional hazards regression22,23 models for the prediction of breast cancer in our cohort. Three models will be compared: (1) 10-year Gail risk (Original), (2) 10-year Gail risk (Modified), and (3) the model selected by a stepwise procedure with a 5% significance level to enter and leave the model. Both logistic regression and Cox proportional hazards regression models are fit with SAS software24 using PROC LOGISTIC and PROC PHREG, respectively.25,26 Each of these procedures allows for stepwise selection when given a set of explanatory variables.

{INSERT TABLE 3}

 

Logistic Regression Models

Logistic regression is a statistical modeling procedure that allows for the modeling of a categorical response variable based on a set of explanatory variables. In our circumstance, we used logistic regression to model the dichotomous response variable cancer. The logistic regression model can be written in the following form

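$$\log\left(\frac{\pi}{1-\pi}\right) = \alpha + \beta_1 x_1 + \cdots + \beta_p x_p$$

where

$\pi$ is the probability of breast cancer,

$\alpha$ is the intercept,

$\beta_1, \ldots, \beta_p$ are the p regression parameters,

$x_1, \ldots, x_p$ are the p explanatory variables.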
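To make the model form concrete, the sketch below converts a linear predictor (intercept plus weighted explanatory variables) into a probability of breast cancer by inverting the log-odds. The function name and all parameter values are hypothetical and chosen only for illustration; they are not estimates from this cohort.

```python
import math


def logistic_probability(intercept, coefficients, values):
    """Return the probability whose log-odds equal intercept + sum(beta_j * x_j)."""
    linear_predictor = intercept + sum(
        beta * x for beta, x in zip(coefficients, values)
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical parameter values (NOT fitted values from this study).
intercept = -4.0
betas = [0.09, 1.6]  # [10-year Gail risk in %, atypia indicator (0 or 1)]

p_without_atypia = logistic_probability(intercept, betas, [5.0, 0])
p_with_atypia = logistic_probability(intercept, betas, [5.0, 1])
print(f"without atypia: {p_without_atypia:.3f}, with atypia: {p_with_atypia:.3f}")
```

Because the model is linear on the log-odds scale, a one-unit change in an explanatory variable multiplies the odds, not the probability, by a constant factor.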

Three models, given in Table 3, are compared for their ability to predict breast cancer. Model 1 is a simple logistic regression model using only Gail's original formulation of 10-year risk to predict breast cancer, 10-year Gail risk (Original). Model 2 is also a simple logistic regression model using only Gail's 10-year risk modified for AH to predict breast cancer, 10-year Gail risk (Modified). Model 3 was determined by stepwise logistic regression. The stepwise procedure selected a model with two explanatory variables: epithelial hyperplasia with atypia from FNA (Atypia from FNA) and 10-year Gail risk (Modified).

When comparing logistic regression models, multiple tests and/or statistics can be utilized.20,25 We will look at minimizing the -2 log likelihood, maximizing the concordant percentage, and maximizing the area under the receiver operating characteristics (ROC) curve20 in determining the best model for the prediction of breast cancer. Table 4 details this information for the three models considered.

{INSERT TABLE 4}
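The concordant percentage and the area under the ROC curve can both be obtained by comparing the predicted probability of every woman who developed cancer with that of every woman who did not. The sketch below counts these pairs directly. The function name and the six data points are hypothetical, and statistical packages may group or round predicted probabilities before counting, so their output can differ slightly from an exact count.

```python
def concordance_summary(probabilities, outcomes):
    """Summarize all (event, non-event) pairs of predicted probabilities."""
    events = [p for p, y in zip(probabilities, outcomes) if y == 1]
    non_events = [p for p, y in zip(probabilities, outcomes) if y == 0]

    concordant = discordant = tied = 0
    for p_event in events:
        for p_non_event in non_events:
            if p_event > p_non_event:
                concordant += 1  # the event was given the higher probability
            elif p_event < p_non_event:
                discordant += 1
            else:
                tied += 1

    pairs = len(events) * len(non_events)
    return {
        "percent_concordant": 100.0 * concordant / pairs,
        "percent_discordant": 100.0 * discordant / pairs,
        "percent_tied": 100.0 * tied / pairs,
        # Ties count one half toward the area under the ROC curve.
        "auc": (concordant + 0.5 * tied) / pairs,
    }


# Hypothetical predicted probabilities and outcomes (1 = cancer, 0 = no cancer).
predicted = [0.02, 0.05, 0.05, 0.10, 0.20, 0.30]
observed = [0, 0, 1, 0, 1, 1]
summary = concordance_summary(predicted, observed)
print(summary)
```

With no tied pairs the area under the ROC curve equals the concordant fraction; with ties it is somewhat larger than the concordant fraction.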

As can be seen from Table 4, model 3 outperforms the other models on all three criteria, and the 10-year Gail risk (Modified) outperforms the 10-year Gail risk (Original). Because model 2 is nested within model 3, we can also test whether adding epithelial hyperplasia with atypia from FNA to the logistic regression provides a significant improvement.20,21 Subtracting the -2 log likelihood of model 3 from that of model 2 (156.84 - 145.24 = 11.60) gives a chi-square statistic with one degree of freedom, which shows a significant improvement of model 3 over model 2 (P < .001). The ROC curves in Figure 1 show that model 3 is also clearly the best in terms of this criterion.

{INSERT FIGURE 1}
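The nested-model comparison of model 2 and model 3 can be reproduced from the -2 log likelihood values in Table 4. The sketch below is a minimal illustration. It relies on the fact that a chi-square variable with one degree of freedom is the square of a standard normal variable, so its upper-tail probability can be written with the complementary error function; the function name is introduced here only for illustration.

```python
import math


def chi_square_upper_tail_1df(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    P(chi2 > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2)) for standard normal Z.
    """
    return math.erfc(math.sqrt(statistic / 2.0))


# -2 log likelihood values taken from Table 4.
neg2_log_lik_model2 = 156.84  # 10-year Gail risk (Modified)
neg2_log_lik_model3 = 145.24  # 10-year Gail risk (Modified) + Atypia from FNA

lr_statistic = neg2_log_lik_model2 - neg2_log_lik_model3
p_value = chi_square_upper_tail_1df(lr_statistic)
print(f"likelihood ratio chi-square = {lr_statistic:.2f}, P = {p_value:.4f}")
```

The closed form used here is specific to one degree of freedom, which applies because model 3 adds exactly one explanatory variable to model 2.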

Table 5 gives the odds ratio and its 95% confidence interval for the explanatory variables in each of the three models considered. Table 5 also gives the P value associated with testing whether or not the corresponding parameters equal zero.20,21,25 The extremely high odds ratio associated with atypia from FNA is consistent with it being the first explanatory variable to enter in the stepwise regression procedure. A more than fivefold increase in the odds makes atypia from FNA not only a highly statistically significant predictor of breast cancer but also a clinically significant one.

{INSERT TABLE 5}
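The odds ratios in Table 5 are per one-unit change in each explanatory variable, so they can be combined on the log scale to compare two risk profiles. The sketch below uses the model 3 point estimates from Table 5 and assumes, as the model does, that effects are additive on the log-odds scale with no interaction term. The function name and the example profile are hypothetical.

```python
import math

# Point estimates for model 3, taken from Table 5.
OR_GAIL_PER_POINT = 1.094  # per 1 percentage point of modified 10-year Gail risk
OR_ATYPIA = 5.176          # atypia from FNA present vs absent

# Regression coefficients are the natural logs of the odds ratios.
beta_gail = math.log(OR_GAIL_PER_POINT)
beta_atypia = math.log(OR_ATYPIA)


def combined_odds_ratio(gail_difference, atypia_difference):
    """Odds ratio between two profiles that differ by the given amounts."""
    return math.exp(beta_gail * gail_difference + beta_atypia * atypia_difference)


# Hypothetical comparison: 5 points higher Gail risk, without and with atypia.
or_five_points = combined_odds_ratio(5, 0)
or_five_points_and_atypia = combined_odds_ratio(5, 1)
print(f"+5 points Gail risk:            OR = {or_five_points:.2f}")
print(f"+5 points Gail risk and atypia: OR = {or_five_points_and_atypia:.2f}")
```

These combined values carry the uncertainty of both estimates, so they should be read as illustrations of the scale of the effects and not as precise risk figures.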

Cox Proportional Hazards Regression

Cox developed the proportional hazards regression model to allow for use of explanatory variables in predicting a time-to-event response. The model may be expressed as

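$$h_i(t) = h_0(t)\exp\left(\beta_1 x_{i1} + \cdots + \beta_p x_{ip}\right)$$

where

$h_i(t)$ is the hazard for the ith individual at time t,

$h_0(t)$ is the nonnegative baseline hazard function,

$\beta_1, \ldots, \beta_p$ are the p regression parameters,

$x_{i1}, \ldots, x_{ip}$ are the values of the p explanatory variables for the ith individual.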

We compare the same three models as in the previous section, but now the response variable is time to breast cancer diagnosis. Table 6 gives the -2 log likelihood for each of the three models as well as the P value for the likelihood ratio chi-square test.23,26 As was the case with the logistic regression models, model 3 is the best for the prediction of time to breast cancer diagnosis.

{INSERT TABLE 6}

Table 7 gives the hazard ratio for each of the variables along with the corresponding 95% confidence intervals. Table 7 also includes the P value for testing whether or not the regression parameter associated with each explanatory variable in the models is equal to zero.23,26 These results mirror those of the logistic regression models in the previous section. Again, the extremely high hazard ratio associated with Atypia from FNA is consistent with it being the first variable entered in the stepwise procedure. These results further support the use of FNA to aid in the prediction of breast cancer in women.

{INSERT TABLE 7}
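Under the proportional hazards form, the baseline hazard cancels when two individuals are compared, and the event-free probability of an individual equals the baseline event-free probability raised to the power of her relative hazard. The sketch below illustrates this with the model 3 hazard ratios from Table 7. The baseline event-free probability of 0.98 is a hypothetical value chosen only for illustration, since the baseline function is not reported here, and the function names are likewise illustrative.

```python
import math

# Point estimates for model 3, taken from Table 7.
HR_GAIL_PER_POINT = 1.099  # per 1 percentage point of modified 10-year Gail risk
HR_ATYPIA = 5.087          # atypia from FNA present vs absent


def relative_hazard(gail_difference, atypia):
    """Hazard relative to a reference woman; the baseline hazard cancels out."""
    return math.exp(
        math.log(HR_GAIL_PER_POINT) * gail_difference
        + math.log(HR_ATYPIA) * atypia
    )


def event_free_probability(baseline_event_free, rel_hazard):
    """S(t | x) = S0(t) ** relative hazard, under proportional hazards."""
    return baseline_event_free ** rel_hazard


# Hypothetical baseline: 98% of reference women are cancer-free at some time t.
baseline = 0.98
cancer_free_with_atypia = event_free_probability(baseline, relative_hazard(0, 1))
print(f"relative hazard with atypia: {relative_hazard(0, 1):.3f}")
print(f"cancer-free probability with atypia: {cancer_free_with_atypia:.3f}")
```

The reference woman is one whose explanatory variables equal the reference values, so a relative hazard of 1 returns the baseline probability unchanged.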

 

Conclusion

The utilization of cytologic information from random periareolar FNAs, especially epithelial hyperplasia with atypia, enhances the ability to predict breast cancer in women with major risk factors for breast cancer. In this cohort of women, using Gail's 10-year risk assessment modified for AH along with epithelial hyperplasia with atypia from random periareolar FNA provides the best prediction models for breast cancer development and time to breast cancer development. The model seems robust, since the stepwise procedures for both the logistic and the Cox proportional hazards regression models selected the same explanatory variables.

It should be noted that this is a single cohort of women at a single institution, and multi-institutional studies should be performed. Further follow-up on this cohort, which will reveal additional breast cancer cases, will allow us to re-evaluate these models and determine if other factors may play a role in the prediction of breast cancer.

 

REFERENCES

1.      American Cancer Society: Cancer Facts and Figures-2000. Atlanta, Georgia, American Cancer Society Incorporated, 2000.

2.      Miller BA: Racial/ethnic patterns of cancer in the United States 1988-1992. Surveillance, Epidemiology, and End Results (SEER) Monograph. Bethesda, MD, National Cancer Institute, 1996.

3.      Gail MH, Brinton LA, Byar DP, et al: Projecting individualized probabilities of developing breast cancer for white females who are being examined annually. J Natl Cancer Inst 81(24):1879-1886, 1989.

4.      Freedman LS, Schatzkin A, Shiffman MH: Statistical validation of intermediate markers of precancer for use as endpoints in chemoprevention trials. J Cellular Biochem 16(Supplement G):27-32, 1992.

5.      Kelloff GJ, Boone CW, Steele VE, et al: Mechanistic considerations in chemopreventive drug development. J Cellular Biochem 20(Supplement G):1-24, 1994.

6.      Kelloff GJ, Boone CW, Crowell JA, et al: Risk biomarkers and current strategies for cancer chemoprevention. J Cellular Biochem 25:1-14, 1996.

7.      Fabian CJ, Kimler BF, Elledge RM, et al: Models for early chemoprevention trials in breast cancer. Hematol/Oncol Clin North Am 12:993-1017, 1998.

8.      Wrensch M, Petrakis NL, King EB, et al: Breast cancer risk associated with abnormal cytology in nipple aspirates of breast fluid and prior history of breast biopsy. Am J Epidemiol 137:829-833, 1993.

9.      Sauter ER, Ross E, Daly M, et al: Nipple aspirate fluid: A promising non-invasive method to identify cellular markers of breast cancer risk. Br J Cancer 76(4):494-501, 1997.

10. Fabian CJ, Kamel S, Kimler BF, McKittrick R: Potential use of biomarkers in breast cancer risk assessment and chemoprevention trials. Breast J 1:236-242, 1995.

11. Fabian CJ, Zalles C, Kamel S, et al: Breast cytology and biomarkers obtained by random fine needle aspiration: Use in risk assessment and early chemoprevention trials. J Cellular Biochem Suppl 28-29:101-110, 1997.

12. Khan SA, Masood S, Miller L, Numann P: Occult epithelial proliferation of the breast detected by random FNA. Proc Am Assoc Cancer Res 37:251, 1996.

13. Marshall CJ, Schumann GB, Ward JH, et al: Cytologic identification of clinically occult proliferative breast disease in women with a family history of breast cancer. Am J Clin Pathol 95:157-165, 1991.

14. Martino S, Ensley JF, Weaver D, et al: Cellular DNA content characteristics of needle aspirates from patients at high-risk for developing breast cancer. Proc Am Assoc Cancer Res 30:256, 1989.

15. Fabian CJ, Kimler BF, Zalles CM, et al: Improved prediction of breast cancer risk based on random periareolar fine needle aspiration cytology. J Natl Cancer Inst 92(15):1217-1227, 2000.

16. Zalles C, Kimler BF, Kamel S, et al: Cytologic patterns in random aspirates from women at high and low risk for breast cancer. Breast J 1:343-349, 1995.

17. Fabian CJ, Zalles C, Kamel S, et al: Biomarker and cytologic abnormalities in women at high and low risk for breast cancer. J Cellular Biochem 17(Suppl G):153-160, 1993.

18. Fabian CJ, Zalles C, Kamel S, et al: Prevalence of aneuploidy, overexpressed ER, and overexpressed EGFR in random breast aspirates of women at high risk and low risk for breast cancer. Breast Cancer Res Treatment 30:263-274, 1994.

19. Lehmann EL: Testing Statistical Hypotheses, ed 2. New York, Chapman & Hall, 1994.

20. Agresti A: Categorical Data Analysis. New York, Wiley, 1990.

21. Zelterman D: Models for Discrete Data. Oxford, Clarendon Press, 1999.

22. Cox DR: Regression models for life tables. J Royal Statistical Soc 34:187-220, 1972.

23. Lee ET: Statistical Methods for Survival Data Analysis. New York, Wiley, 1992.

24. SAS: The SAS System for Windows, Release 8.00. Cary, North Carolina, SAS Institute Incorporated, 2000.

25. Stokes ME, Davis CS, Koch GG: Categorical Data Analysis Using the SAS System. Cary, North Carolina, SAS Institute Incorporated, 1995.

26. Allison PD: Survival Analysis Using the SAS System: A Practical Guide. Cary, North Carolina, SAS Institute Incorporated, 1995.

 


Table 1: Demographics of 480 High-Risk Breast Cancer Subjects*

| Characteristic | Category | Mean (SD) or n (%) |
|---|---|---|
| Age (years) | | 44.31 (8.59) |
| 10-Year Gail Risk (Original), % | | 4.56 (3.58) |
| 10-Year Gail Risk (Modified), % | | 5.44 (4.69) |
| Follow-up in Months | | 42.53 (29.68) |
| Race | White (Nonhispanic) | 457 (95.2) |
| | Other | 23 (4.8) |
| Menopausal Status at Entry | Pre | 286 (59.6) |
| | Post | 194 (40.4) |
| On Hormone Replacement Therapy at Entry | No | 401 (83.5) |
| | Yes | 79 (16.5) |
| At Least One First-Degree or Two Second-Degree Relatives with Breast Cancer | No | 117 (24.4) |
| | Yes | 363 (75.6) |
| Prior Precancerous Mastopathy | No | 372 (77.5) |
| | Yes | 108 (22.5) |
| Prior Breast Cancer | No | 398 (82.9) |
| | Yes | 82 (17.1) |
| Multiple Risk Factors | No | 412 (85.8) |
| | Yes | 68 (14.2) |
| Hyperplasia with Atypia from FNA | No | 378 (78.8) |
| | Yes | 102 (21.2) |
| At Least One Positive Biomarker from FNA | No | 141 (29.4) |
| | Yes | 339 (70.6) |
| Evidence of Multiple Biomarker Abnormality from FNA | No | 308 (64.2) |
| | Yes | 172 (35.8) |
| Cancer other than LCIS | No | 460 (95.8) |
| | Yes | 20 (4.2) |

*Data are summarized as mean (standard deviation) for continuous variables and n (%) for dichotomous variables.

FNA = fine needle aspiration; LCIS = lobular carcinoma in situ.


Table 2: Comparison of Characteristics Between Women Who Have Been Subsequently Diagnosed with Breast Cancer (With Cancer) and Those Women Who Have Not (Without Cancer)*

| Variable | Category | With Cancer (n=20) | Without Cancer (n=460) | P Value |
|---|---|---|---|---|
| Age (years) | | 46.35 (7.89) | 44.22 (8.62) | .2521 |
| 10-Year Gail Risk (Original), % | | 6.96 (4.40) | 4.46 (3.51) | .0208 |
| 10-Year Gail Risk (Modified), % | | 9.26 (6.27) | 5.27 (4.54) | .0108 |
| Follow-up in Months | | 43.54 (24.54) | 42.48 (29.90) | .8532 |
| Race | White Non-Hispanic | 19 (95.0) | 438 (95.2) | 1.0000 |
| | Other | 1 (5.0) | 22 (4.8) | |
| Menopausal Status at Entry | Pre | 14 (70.0) | 272 (59.1) | .3638 |
| | Post | 6 (30.0) | 188 (40.9) | |
| On Hormone Replacement Therapy at Entry | No | 16 (80.0) | 385 (83.7) | .7564 |
| | Yes | 4 (20.0) | 75 (16.3) | |
| At Least One First-Degree or Two Second-Degree Relatives with Breast Cancer | No | 4 (20.0) | 113 (24.6) | .7937 |
| | Yes | 16 (80.0) | 347 (75.4) | |
| Prior Precancerous Mastopathy | No | 10 (50.0) | 362 (78.7) | .0054 |
| | Yes | 10 (50.0) | 98 (21.3) | |
| Prior Breast Cancer | No | 19 (95.0) | 379 (82.4) | .2228 |
| | Yes | 1 (5.0) | 81 (17.6) | |
| Multiple Risk Factors | No | 13 (65.0) | 399 (86.7) | .0143 |
| | Yes | 7 (35.0) | 61 (13.3) | |
| Hyperplasia with Atypia from FNA | No | 8 (40.0) | 370 (80.4) | .0001 |
| | Yes | 12 (60.0) | 90 (19.6) | |
| At Least One Positive Biomarker from FNA | No | 3 (15.0) | 138 (30.0) | .2101 |
| | Yes | 17 (85.0) | 322 (70.0) | |
| Evidence of Multiple Biomarker Abnormality from FNA | No | 10 (50.0) | 298 (64.8) | .2328 |
| | Yes | 10 (50.0) | 162 (35.2) | |

*Data are summarized as mean (standard deviation) for continuous variables and n (%) for dichotomous variables. Continuous variables are compared via the two-sample t-test and dichotomous variables are compared by Fisher's exact test.

FNA = fine needle aspiration.


Table 3: Models for Prediction of Breast Cancer Development and Time to Breast Cancer Development

| Model | Variable(s) |
|---|---|
| 1 | 10-Year Gail Risk (Original) |
| 2 | 10-Year Gail Risk (Modified) |
| 3* | 10-Year Gail Risk (Modified) + Atypia from FNA |

*Model 3 was determined to be best by stepwise logistic and stepwise Cox's proportional hazards regression.

FNA = fine needle aspiration.

 

 

Table 4: Performance of Logistic Regression Models for Prediction of Breast Cancer

| Model | -2 Log Likelihood | Likelihood Ratio Chi-Square (P Value) | % Concordant | Area under ROC Curve |
|---|---|---|---|---|
| 1 | 159.38 | 6.90 (.0086) | 65.5 | 0.678 |
| 2 | 156.84 | 9.43 (.0021) | 72.0 | 0.741 |
| 3 | 145.24 | 21.03 (<.0001) | 79.0 | 0.797 |

ROC = receiver operating characteristics.

 

Table 5: Summary of Logistic Regression Models for Prediction of Breast Cancer

| Model | Variable | Odds Ratio (95% CI) | P Value |
|---|---|---|---|
| 1 | 10-Year Gail Risk (Original) | 1.137 (1.036, 1.237) | .0038 |
| 2 | 10-Year Gail Risk (Modified) | 1.114 (1.043, 1.184) | .0006 |
| 3 | 10-Year Gail Risk (Modified) | 1.094 (1.021, 1.167) | .0075 |
| | Atypia from FNA | 5.176 (2.030, 13.788) | .0006 |

CI = confidence interval; FNA = fine needle aspiration.

 

Table 6: Performance of Cox Proportional Hazards Regression Models for Prediction of Time to Breast Cancer Diagnosis

| Model | -2 Log Likelihood | Likelihood Ratio Chi-Square (P Value) |
|---|---|---|
| 1 | 206.19 | 9.47 (.0021) |
| 2 | 204.06 | 11.60 (.0007) |
| 3 | 191.85 | 23.81 (<.0001) |

 

Table 7: Summary of Cox Proportional Hazards Regression Models for Prediction of Time to Breast Cancer Diagnosis

| Model | Variable | Hazard Ratio (95% CI) | P Value |
|---|---|---|---|
| 1 | 10-Year Gail Risk (Original) | 1.157 (1.071, 1.249) | .0002 |
| 2 | 10-Year Gail Risk (Modified) | 1.118 (1.061, 1.178) | <.0001 |
| 3 | 10-Year Gail Risk (Modified) | 1.099 (1.040, 1.162) | .0009 |
| | Atypia from FNA | 5.087 (2.041, 12.679) | .0005 |

CI = confidence interval; FNA = fine needle aspiration.