This summary covers the topic of breast cancer screening and includes information about breast cancer incidence and mortality, risk factors for breast cancer, the process of breast cancer diagnosis, and the benefits and harms of various breast cancer screening modalities. This summary also includes information about screening among special populations.
Mammography is the most widely used screening modality, with solid evidence of benefit for women aged 40 to 74 years. Clinical breast examination and breast self-exam have also been evaluated but are of uncertain benefit. Technologies such as ultrasound, magnetic resonance imaging, tomosynthesis, and molecular breast imaging are being evaluated, usually as adjuncts to mammography.
Based on solid evidence, screening mammography may lead to the following benefit:
Based on solid evidence, screening mammography may lead to the following harms:
For all these potential harms of screening mammography, internal validity, consistency and external validity are good.
Clinical breast examination (CBE) has not been tested independently; it was used in conjunction with mammography in one Canadian trial, and was the comparator modality versus mammography in another trial. Thus, it is not possible to assess the efficacy of CBE as a screening modality when it is used alone versus usual care (no screening activity).
Screening by CBE may lead to the following harms:
Breast self-examination (BSE) has been compared with usual care (no screening activity) and has not been shown to reduce breast cancer mortality.
Based on solid evidence, formal instruction and encouragement to perform BSE leads to more breast biopsies and diagnosis of more benign breast lesions.
Breast cancer is the most common noncutaneous cancer in U.S. women, with an estimated 63,410 cases of in situ disease, 252,710 new cases of invasive disease, and 40,610 deaths expected in 2017.  Thus, fewer than 1 of 6 women diagnosed with breast cancer die of the disease. By comparison, about 71,280 American women are estimated to die of lung cancer in 2017.  Males account for 1% of breast cancer cases and breast cancer deaths (refer to the Special Populations section of this summary for more information).
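The "fewer than 1 of 6" statement can be checked against the 2017 estimates quoted above. The following is a minimal sketch of that arithmetic in Python; the variable names are illustrative. It assumes that the ratio of expected deaths to expected new invasive cases in the same year is an acceptable rough proxy, even though those deaths occur mostly among women diagnosed in earlier years.

```python
# 2017 estimates quoted in the text above.
invasive_cases = 252_710
deaths = 40_610

# Crude ratio of same-year deaths to same-year invasive diagnoses.
# This is only a rough proxy, not a cohort-based case-fatality rate.
death_to_case_ratio = deaths / invasive_cases

print(f"{death_to_case_ratio:.3f}")   # about 0.161
print(death_to_case_ratio < 1 / 6)    # True: fewer than 1 of 6
```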
Widespread adoption of screening increases breast cancer incidence in a given population and changes the characteristics of cancers detected, with increased incidence of lower-risk cancers, premalignant lesions, and ductal carcinoma in situ (DCIS). (Refer to the Ductal Carcinoma In Situ section in the Breast Cancer Diagnosis and Pathology section of this summary for more information.) Ecologic studies from the United States  and the United Kingdom  demonstrate an increase in DCIS and invasive breast cancer incidence since the 1970s, attributable to the widespread adoption of both postmenopausal hormone therapy and screening mammography. In the last decade, women have refrained from using postmenopausal hormones, and breast cancer incidence has declined, but not to the levels seen before the widespread use of screening mammography. 
One might expect that if screening identifies cancers before they cause clinical symptoms, then the period of screening will be followed by a period of compensatory decline in cancer rates, either in annual population incidence rates or in incidence rates in older women. However, no compensatory drop in incidence rates has ever been seen following the adoption of screening, suggesting that screening leads to overdiagnosis—the identification of clinically insignificant cancers (refer to the Overdiagnosis section in the Harms of Screening section of this summary for more information).
Breast cancer incidence and mortality risk also vary according to geography, culture, race, ethnicity, and socioeconomic status (refer to the Special Populations section of this summary for more information).
Women with breast symptoms are not candidates for screening because they require a diagnostic evaluation. During a 10-year period, 16% of 2,400 women aged 40 to 69 years sought medical attention for breast symptoms at their health maintenance organization.  Women younger than 50 years were twice as likely to seek evaluation. Additional testing was performed in 66% of these women, including invasive procedures performed in 27%. Cancer was diagnosed in 6.2% of these women, most often as stage II or stage III. Of the breast symptoms prompting medical attention, a mass was most likely to lead to a cancer diagnosis (10.7%) and pain was least likely (1.8%) to do so.
Breast cancer risk is affected by many factors besides participation in screening activities. Understanding and quantifying these risks is important to a woman, to her physicians, and to public policy makers. (Refer to the PDQ summary on Breast Cancer Prevention for a complete description of factors associated with an increased or decreased risk of breast cancer.)
Breast cancer is most often diagnosed by pathologic review of a fixed specimen of breast tissue. The breast tissue can be obtained from a symptomatic area or from an area identified by an imaging test. A palpable lesion can be biopsied with core needle biopsy or, less often, fine-needle aspiration biopsy or surgical excision; image guidance improves accuracy. Nonpalpable lesions can be sampled by core needle biopsy using stereotactic x-ray or ultrasound guidance or can be surgically excised after image-guided localization. In a retrospective study of 939 patients with 1,042 mammographically detected lesions who underwent core needle biopsy or surgical needle localization under x-ray guidance, sensitivity of core needle biopsy for malignancy was greater than 95% and the specificity was about 90%. Compared with surgical needle localization under x-ray guidance, core needle biopsy resulted in fewer surgical procedures for definitive treatment, with a higher likelihood of clear surgical margins at the initial excision. 
Ductal carcinoma in situ (DCIS) is a noninvasive condition that can evolve to invasive cancer, with variable frequency and time course.  Some authors include DCIS with invasive breast cancer statistics, but others argue that the term be replaced by ductal intraepithelial neoplasia, similar to the terminology used for cervical and prostate precursor lesions, and that breast cancer statistics exclude DCIS.
DCIS is most often diagnosed by mammography. In the United States, only 4,900 women were diagnosed with DCIS in 1983, compared with approximately 63,410 women expected to be diagnosed in 2017, now that mammographic screening has been widely adopted. The Canadian National Breast Screening Study-2 of women aged 50 to 59 years found a fourfold increase in DCIS cases in women screened by clinical breast examination (CBE) plus mammography compared with those screened by CBE alone, with no difference in breast cancer mortality. (Refer to the PDQ summary on Breast Cancer Treatment for more information.)
The natural history of DCIS is poorly understood because nearly all DCIS cases are treated. A single retrospective review of 11,760 breast biopsies performed between 1952 and 1968 identified 28 cases of DCIS,   which were detected by physical examination, biopsied without resection, and then followed for 30 years. Nine women developed invasive breast cancer and four women died of the disease. These findings are interesting but probably not relevant to women with screen-detected DCIS in an era of improved cancer care.
Development of breast cancer after treatment of DCIS depends on the characteristics of the lesion but also on the delivered treatment. One large randomized trial found that 13.4% of women treated by lumpectomy alone developed ipsilateral invasive breast cancer within 90 months, compared with 3.9% of those treated by lumpectomy and radiation.  The best evidence indicates that most DCIS lesions will not evolve to invasive cancer and that those that do can still usually be managed successfully, even after that transition. Thus, the detection and treatment of nonpalpable DCIS often represents overdiagnosis and overtreatment.
Among women diagnosed with (and treated for) DCIS between 1984 and 1989, 1.9% died of breast cancer within 10 years,  which was a lower mortality rate than for the age-matched population at large. This favorable outcome may reflect the benign nature of the condition, the benefits of treatment, or the volunteer effect (women undergoing breast cancer screening are generally healthier than those who do not).
Attempts to define low-risk DCIS cases that can be managed with fewer therapies are important. One such effort monitored a series of 706 patients with DCIS to develop the University of Southern California/Van Nuys Prognostic Scoring Index, which defines the risk of recurrent DCIS and invasive cancer among women with DCIS based on age, margin width, tumor size, and grade. The low-risk group, comprising one-third of the cases, experienced only 1% of DCIS recurrences and no invasive cancers, independent of the use of postoperative radiation therapy. The moderate- and high-risk groups had higher recurrence rates, and they benefited from postlumpectomy radiation therapy. Overall, approximately 1% of patients died of breast cancer. In a separate study, adjuvant tamoxifen therapy was shown to reduce the incidence of invasive breast cancer.
The prevalence of atypia is low, ranging from 4% to 10% of breast biopsies.   However, the large number of breast biopsies performed each year, estimated to be 1.6 million in the United States alone,   translates this outcome to a large number of women. Atypia is a diagnostic classification with considerable variation among practicing pathologists. One study of 115 U.S. pathologists reported that study pathologists agreed with an expert consensus diagnosis of atypia only 48% of the time. 
The pathologist’s diagnosis of breast tissue ranges from benign without atypia, to atypia, to DCIS, to invasive breast cancer. The incidence of atypia and DCIS breast lesions has increased over the past three decades as a result of widespread mammography screening, although atypia is generally mammographically occult. Misclassification of breast lesions may contribute to either overtreatment or undertreatment of lesions identified during breast screening. Studies have demonstrated that pathologists have difficulty agreeing on diagnoses of breast tissue, especially atypia and DCIS.
The largest study on this topic, the B-Path study, included 115 practicing U.S. pathologists who interpreted a single breast biopsy slide per case and compared their interpretations with an expert consensus-derived reference diagnosis. While the overall agreement between the individual pathologists’ interpretations and the expert reference diagnoses was highest for invasive carcinoma, markedly lower levels of concordance were noted for DCIS and atypia. As the B-Path study included higher proportions of cases of atypia and DCIS than typically seen in clinical practice, the authors expanded their work by applying Bayes’ theorem to estimate how diagnostic variability affects accuracy from the perspective of a U.S. woman aged 50 to 59 years having a breast biopsy. At the U.S. population level, it is estimated that 92.3% (confidence interval [CI], 91.4%–93.1%) of breast biopsy diagnoses would be verified by an expert reference consensus diagnosis, with 4.6% (CI, 3.9%–5.3%) of initial breast biopsies estimated to be overinterpreted and 3.2% (CI, 2.7%–3.6%) underinterpreted. Figure 1 shows the predicted outcomes per 100 breast biopsies, overall and by diagnostic category.
Figure 1. Predicted outcomes per 100 breast biopsies, overall and by diagnostic category. From Annals of Internal Medicine, Elmore JG, Nelson HD, Pepe MS, Longton GM, Tosteson AN, Geller B, Onega T, Carney PA, Jackson SL, Allison KH, Weaver DL, Variability in Pathologists' Interpretations of Individual Breast Biopsy Slides: A Population Perspective, Volume 164, Issue 10, Pages 649–55, Copyright © 2016 American College of Physicians. All Rights Reserved. Reprinted with the permission of American College of Physicians, Inc.
To address the high rates of discordance in breast tissue diagnosis, laboratory policies that require second opinions are becoming more common. A national survey of 252 breast pathologists participating in the B-Path study found that 65% of respondents reported having a laboratory policy that requires second opinions for all cases initially diagnosed as invasive disease. Additionally, 56% of respondents reported policies that require second opinions for initial diagnoses of DCIS, while 36% of respondents reported mandatory second opinion policies for cases initially diagnosed as atypical ductal hyperplasia.  In this same survey, pathologists overwhelmingly agreed that second opinions improved diagnostic accuracy (96%).
A simulation study that used B-Path study data evaluated 12 strategies for obtaining second opinions to improve interpretation of breast histopathology.  Accuracy improved significantly with all second opinion strategies, except for the strategy limiting second opinions only to cases of invasive cancer. Accuracy improved regardless of the pathologists’ confidence in their diagnosis or their level of experience. While the second opinions improved accuracy, they did not completely eliminate diagnostic variability, especially in the challenging case of breast atypia.
Numerous uncontrolled trials and retrospective series have documented the ability of mammography to diagnose small, early-stage breast cancers, which have a favorable clinical course.  Although several trials also show better cancer-related survival in screened women versus nonscreened women, a number of important biases may explain that finding:
Because the extent of these biases is never clear in any particular study, the gold standard used by most groups to assess the benefits of screening is the randomized controlled trial (RCT) with cause-specific mortality as the endpoint. RCTs with cause-specific mortality as the endpoint avoid lead-time, length, and overdiagnosis biases. (Refer to the PDQ summary on Cancer Screening Overview for more information.)
Performance benchmarks for screening mammography in the United States are described on the Breast Cancer Surveillance Consortium (BCSC) website.
The sensitivity of mammography is the percentage of breast cancers detected in a given population, when breast cancer is present. Sensitivity depends on tumor size, conspicuity, hormone sensitivity, breast tissue density, patient age, timing within the menstrual cycle, overall image quality, and interpretive skill of the radiologist. Overall sensitivity is approximately 79% but is lower in younger women and in those with dense breast tissue (see the BCSC website). According to the Physician Insurers Association of America (PIAA), delay in diagnosis of breast cancer and false-negative mammogram interpretations are common causes of medical malpractice litigation. PIAA data from 2001 through 2011 note that breast cancer claims had the largest total indemnity payment and that diagnostic errors ranked as the top alleged error associated with breast cancer.
The specificity of mammography is the likelihood of the test being normal when cancer is absent, whereas the false-positive rate is the likelihood of the test being abnormal when cancer is absent. If specificity is low, many false-positive examinations result in unnecessary follow-up examinations and procedures. (Refer to the subsection on Harms in the Screening With Mammography section of the Overview section of this summary for more information.)
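The definitions of sensitivity, specificity, and false-positive rate given above can be expressed directly in terms of a two-by-two table of screening results. The following Python sketch is illustrative only: the function name and the counts are hypothetical and are not taken from any study cited in this summary. Positive predictive value (PPV), which is used later in this summary, is included for completeness.

```python
def screening_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Compute basic test-performance measures from 2x2 counts.

    tp: abnormal test, cancer present    fn: normal test, cancer present
    fp: abnormal test, cancer absent     tn: normal test, cancer absent
    """
    return {
        # Share of women with cancer whose test is abnormal.
        "sensitivity": tp / (tp + fn),
        # Share of women without cancer whose test is normal.
        "specificity": tn / (tn + fp),
        # Share of women without cancer whose test is abnormal.
        "false_positive_rate": fp / (fp + tn),
        # Share of abnormal tests in which cancer is actually present.
        "ppv": tp / (tp + fp),
    }


# Hypothetical counts for 10,000 screening examinations (not study data).
metrics = screening_metrics(tp=40, fn=10, fp=995, tn=8955)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

By construction, specificity and the false-positive rate sum to 1, which is why a low specificity implies many false-positive examinations.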
Interval cancers are cancers that are diagnosed in the interval after a normal screening examination and before the subsequent screen. Some of these cancers were present at the time of mammography (false negatives), and others grew rapidly in the interval between mammography and detection. As a general rule, interval cancers have characteristics of rapid growth   and are frequently of advanced stage at the time of discovery/diagnosis. 
A study that used data from the Nova Scotia Breast Screening Program identified 342 interval breast cancers in the context of 302,234 screening exams. The authors classified the 342 into the categories of missed cancers (false-negative on the previous screen) and true interval cancers (cancers not detectable at the previous screening exam). For women aged 40 to 49 years, the annual rate of missed cancers per 1,000 women screened was 0.45; the rate for true interval cancers was 0.93. For women aged 50 to 69 years, the rate of missed cancers per 1,000 women screened was 0.90; the rate for true interval cancers was 3.15. 
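The age-specific rates quoted above can be used to estimate what share of interval cancers in each age group were true interval cancers rather than missed cancers. The following Python sketch shows that arithmetic; the dictionary layout and function name are illustrative, and the calculation assumes that the missed and true interval rates within an age group share the same denominator.

```python
# Rates per 1,000 women screened, as quoted in the text above.
rates_per_1000 = {
    "40-49": {"missed": 0.45, "true_interval": 0.93},
    "50-69": {"missed": 0.90, "true_interval": 3.15},
}


def true_interval_share(group: dict) -> float:
    """Fraction of all interval cancers that were true interval cancers."""
    total = group["missed"] + group["true_interval"]
    return group["true_interval"] / total


shares = {age: true_interval_share(g) for age, g in rates_per_1000.items()}

for age, share in shares.items():
    # About 67% for ages 40-49 and about 78% for ages 50-69.
    print(f"Ages {age}: {share:.0%} of interval cancers were true interval cancers")
```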
One study of 576 women with interval cancers reported that interval cancers are more prevalent in women aged 40 to 49 years. Interval cancers appearing within 12 months of a negative screening mammogram appear to be related to decreased mammographic sensitivity, attributable to greater breast density in 68% of cases. Those appearing within a 24-month interval appear to be related both to decreased mammographic sensitivity caused by greater breast density in 37.6% of cases and to rapid tumor growth in 30.6% of cases. 
Another study that compared the characteristics of 279 screen-detected cancers with those of 150 interval cancers found that interval cancers were much more likely to occur in women younger than 50 years; to be of mucinous or lobular histology; to have high histologic grade, high proliferative activity, or relatively benign mammographic features; and/or to lack calcifications. Screen-detected cancers were more likely to have tubular histology; to be smaller, low stage, and hormone sensitive; and to have a major component of ductal carcinoma in situ.
Mammography utilizes ionizing radiation to image breast tissue. The examination is performed by compressing the breast firmly between two plates. Such compression spreads out overlapping tissues and reduces the amount of radiation needed to image the breast. For routine screening in the United States, examinations are taken in both mediolateral oblique and craniocaudal projections. Both views should include breast tissue from the nipple to the pectoral muscle. Radiation exposure is 4 to 24 mSv per standard two-view screening examination. Two-view examinations are associated with a lower recall rate than are single-view examinations because they reduce concern about abnormalities due to superimposition of normal breast structures.  Two-view exams are also associated with lower interval cancer rates than are single-view exams. 
Under the Mammography Quality Standards Act (MQSA) enacted by Congress in 1992, all U.S. facilities that perform mammography must be certified by the U.S. Food and Drug Administration (FDA) to ensure the use of standardized training for personnel and a standardized mammography technique utilizing a low radiation dose.  (Refer to the FDA's web page on Mammography Facility Surveys, Mammography Equipment Evaluations, and Medical Physicist Qualification Requirement under MQSA.) The 1998 MQSA Reauthorization Act requires that patients receive a written lay-language summary of mammography results.
The following Breast Imaging Reporting and Data System (BI-RADS) categories are used for reporting mammographic results: 
Most screening mammograms are interpreted as negative or benign (BI-RADS 1 or 2, respectively), with about 10% of women in the United States being asked to return for additional evaluation. The percentage of women asked to return for additional evaluation varies not only by the underlying characteristics of each woman but also by mammography facility and radiologist. Extensive literature shows increasing rates of malignancy with BI-RADS assessment categories, with less than 1% risk for diagnosis of cancer within the next year after a BI-RADS 1 or 2 assessment, 2% risk for diagnosis of cancer within the next year after a BI-RADS 3 assessment, and 95% risk for diagnosis of cancer within the next year after a BI-RADS 5 assessment. A BI-RADS 4 can optionally be subdivided into categories 4a, low suspicion (>2% to 10% risk of malignancy); 4b, moderate suspicion (>10% to 50% risk of malignancy); and 4c, high suspicion (>50% to <95% risk of malignancy).
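The risk ranges for the optional BI-RADS 4 subcategories can be written as a simple lookup. The Python sketch below encodes only the thresholds stated above; the function name is hypothetical. In practice a radiologist assigns the subcategory from imaging findings rather than computing it from a numeric risk, so this is an illustration of the ranges and not a clinical tool.

```python
def birads4_subcategory(risk: float) -> str:
    """Map an estimated malignancy risk (as a fraction, 0 to 1) to a
    BI-RADS 4 subcategory using the ranges quoted in the text:
    4a: >2% to 10%, 4b: >10% to 50%, 4c: >50% to <95%.
    """
    if not 0.02 < risk < 0.95:
        raise ValueError("risk is outside the BI-RADS 4 range (>2% to <95%)")
    if risk <= 0.10:
        return "4a"  # low suspicion
    if risk <= 0.50:
        return "4b"  # moderate suspicion
    return "4c"      # high suspicion


for example_risk in (0.05, 0.30, 0.80):
    print(example_risk, birads4_subcategory(example_risk))
```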
Digital mammography is more expensive than screen-film mammography (SFM) but is more amenable to data storage and sharing. The net impact of screening with digital mammography versus film mammography, in terms of health outcomes and the net difference in rates of overdiagnosis, is unknown. The performance of SFM and digital mammography for measures such as cancer detection rate, sensitivity, specificity, and positive predictive value (PPV) has been compared directly in several trials, and the trials yielded similar results.
A large cohort of women (n = 42,760) who underwent both digital and film mammography was evaluated at 33 U.S. centers in the Digital Mammographic Imaging Screening Trial (DMIST). No differences in breast cancer detection were observed (area under the curve [AUC] of 0.78 +/- 0.02 for digital and AUC of 0.74 +/- 0.02 for film; P = .18). Digital mammography was better at cancer detection in women younger than 50 years (AUC of 0.84 +/- 0.03 for digital; AUC of 0.69 +/- 0.05 for film; P = .002). 
A second DMIST report found that film mammography had a higher AUC in women aged 65 years and older (AUC 0.88 for film; AUC 0.70 for digital; P = .025); however, this finding was not statistically significant when multiple comparisons were considered. 
In a large U.S. cohort study, sensitivity for women younger than 50 years was 75.7% (95% confidence interval [CI], 71.7–79.3) for film mammography and 82.4% (95% CI, 76.3–87.5) for digital mammography; specificity was 89.7% (95% CI, 89.6–89.8) for film mammography and 88.0% (95% CI, 87.8–88.2) for digital mammography. A comparison of the findings from 1.5 million digital mammography screens and 4.5 million screen-film mammogram (SFM) screens that were performed in the Netherlands from 2004 to 2010 indicated higher recall and detection rates for the digital mammography screens. Among radiologists who read both digital and SFM exams (n = 1.5 million), the recall rates were 2.0% for digital mammography (95% CI, 2.0–2.1) versus 1.6% for SFM (95% CI, 1.6–1.6); the detection rates were 5.9 per 1,000 (95% CI, 5.7–6.0) for digital mammography and 5.1 per 1,000 (95% CI, 5.0–5.2) for SFM. The PPV was statistically significantly lower in the digital mammography group (PPV, 31.2%; 95% CI, 30.6–31.7) than in the screen-film group (PPV, 34.4%; 95% CI, 33.8–35.0). For women aged 49 to 54 years, the recall rates for digital screens versus film screens were 2.7% versus 2.0%, respectively; the detection rates were 5.1 versus 4.0 per 1,000 screens, respectively; and the PPV was 21.4% versus 22.1%, respectively. For women aged 55 to 74 years, the recall rates for digital screens versus film screens were 1.7% versus 1.4%, respectively; the detection rates were 6.2 versus 5.6 per 1,000 screens, respectively; and the PPV was 35.7% versus 40.1%, respectively.
A meta-analysis  of 10 studies, including the DMIST   and the aforementioned U.S. cohort study,  compared digital mammography with film mammography in 82,573 women who underwent both types of the exam. In a random-effects model, there was no statistically significant difference in cancer detection between the two types of mammography (AUC of 0.92 for film and AUC of 0.91 for digital). For women younger than 50 years, all studies found that sensitivity was higher for digital mammography but that specificity was either the same or higher for film mammography. The meta-analysis found no other differences based on age.
Computed radiography (CR) utilizes a cassette-based removable detector and external reading device to generate a digital image. A large concurrent cohort study compared 254,758 full-field digital mammography (FFDM) screens with 487,334 SFM screens and 74,190 CR screens.  Again, the cancer detection rate was not different between FFDM (4.9 per 1,000) and SFM (4.8 per 1,000), although the recall rate was higher for FFDM. Importantly, cancer detection was lower for CR at 3.4 per 1,000, adjusted odds ratio (OR) 0.79 (95% CI, 0.68–0.93). Two prior studies of noncontemporaneous cohorts showed no difference between CR and SFM or higher cancer-detection rate from CR.  
Computer-aided detection (CAD) systems are designed to help radiologists read mammograms by highlighting suspicious regions such as clustered microcalcifications and masses. Generally, CAD systems increase sensitivity, decrease specificity, and increase detection of ductal carcinoma in situ (DCIS). Several CAD systems are in use. One large population-based study comparing recall rates and breast cancer detection rates before and after the introduction of CAD systems found no change in either rate. Another large study noted an increase in recall rate and increased DCIS detection but no improvement in invasive cancer detection rate.
A study designed to address the limitation of previous studies by using a large database and digital (rather than film screen) mammography in women aged 40 to 89 years (rather than primarily older women) found no evidence that CAD improves screening mammography performance for four outcomes: sensitivity, specificity, screen-detected cancers (DCIS and invasive), and detection of interval cancers. In this study, CAD did detect higher rates of DCIS. 
Using a Surveillance, Epidemiology, and End Results–Medicare linked database, the use of new screening mammography modalities by more than 270,000 women aged 65 years and older in two time periods, 2001 to 2002 and 2008 to 2009, was examined. Digital mammography increased from 2% to 30%, CAD increased from 3% to 33%, and spending increased from $660 million to $962 million. There was no difference in detection rates of early-stage (DCIS or stage I) or late-stage (stage IV) tumors. 
Tomosynthesis, or 3-dimensional (3-D) mammography, is similar to standard 2-D mammography in how the examination is performed: the breasts are compressed in the same positions as for mammography, and the examination uses x-rays to create the image. In tomosynthesis, multiple short-exposure x-rays are obtained at different angles as the x-ray tube moves over the breast. This process takes a few seconds longer than a standard mammogram. Individual images are then reconstructed into a series of thin slices that can be viewed individually or like a movie. Cancers and other abnormalities are detected because of differences in density and shape compared with surrounding tissue, with some cancers and other findings causing architectural distortion. With tomosynthesis, overlapping tissues can be recognized as normal more easily and accurately, and some cancers are better seen than on standard mammography. In some centers, tomosynthesis-guided biopsy may be available because some cancers seen only on tomosynthesis cannot be found with ultrasound.
The combination of 2-D and 3-D mammography has been reported to be more accurate than 2-D mammography alone, with respect to both improved detection of breast cancer (averaging added yield of 1.3/1,000, similar to CAD) and, importantly, reduction in recall rates. On average, 1.8% fewer women will be recalled for extra testing when tomosynthesis is performed in addition to standard 2-D digital mammography for screening. More than 80% of the cancers detected only with tomosynthesis are invasive and node negative.   In particular, tomosynthesis depicts architectural distortion better than standard digital mammography; in one series of 26 cases of architectural distortion in women who had both 2-D and 3-D mammography,  19 (73%) were seen only on tomosynthesis, and 4 (21%) of those 19 were malignant.
When tomosynthesis is performed in combination with 2-D mammography, the resulting radiation exposure to the patient is essentially doubled. This is expected to result in another 1.3 fatal cancers per 100,000 women screened at age 40 years (fewer with increasing age), compared with another 130 cancers detected per 100,000 women screened (see Table 1).
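The comparison above places two figures on the same denominator. The Python sketch below reproduces that conversion, taking the averaged added detection yield of 1.3 per 1,000 reported earlier in this section and the estimate of 1.3 radiation-induced fatal cancers per 100,000 women screened at age 40 years. The variable names are illustrative. The resulting ratio is simple arithmetic and should not be read as a measure of net benefit, because a cancer detected is not equivalent to a death averted.

```python
# Figures quoted in the text above.
added_detection_per_1000 = 1.3       # extra cancers found with tomosynthesis
induced_fatal_per_100k = 1.3         # extra radiation-induced fatal cancers (age 40)

# Put both figures on a per-100,000 basis.
added_detection_per_100k = added_detection_per_1000 * 100

# Additional cancers detected for each additional radiation-induced fatal cancer.
detections_per_induced_death = added_detection_per_100k / induced_fatal_per_100k

print(f"{added_detection_per_100k:.0f} added detections per 100,000")   # 130
print(f"{detections_per_induced_death:.0f} detections per induced fatal cancer")  # 100
```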
The performance of tomosynthesis in isolation (with synthetic 2-D mammograms created) has not been adequately validated in practice, with only one reader study and one prospective clinical trial undertaken to date.  The effect of annual tomosynthesis on breast cancer mortality has not been tested in a prospective clinical trial.
Tomosynthesis in the diagnostic setting (specifically, evaluation of mammographic abnormalities) has been shown to be at least as effective as spot compression views for workup of noncalcified abnormalities, including asymmetries and distortions.   Tomosynthesis is not worse than standard 2-D mammography at allowing suspicious microcalcifications to be identified,  but magnification views are typically still needed to characterize suspicious calcifications.
The use of tomosynthesis in both screening and diagnosis may decrease the need for ultrasound and other additional testing (see Table 1). At this time, there are no data on the association of tomosynthesis and overall mortality reduction.
|CDR = cancer detection rate; DBT = digital breast tomosynthesis, also known as 3-D mammography; FFDM = full field digital mammography, also known as standard 2-D mammography; no. = number.|
|Study Design||Prospective; each patient had both exams||Prospective; each patient had both exams||Historical control with 2-D only||Historical control with 2-D only||Historical control with 2-D only|
|No. of DBT||12,631||7,292||9,499||173,663||23,149||226,234|
|No. of FFDM||12,631||7,292||13,856||281,187||54,684||365,293|
|CDR FFDM (2-D) Alone||6.1/1,000||5.3/1,000||4.04/1,000||4.2/1,000||4.9/1,000|
|Difference (No. of Women)||+1.9/1,000 (24)||+2.7/1,000 (20)||+1.3/1,000 (12)||+1.2/1,000 (208)||+1.4/1,000 (32)||+1.3/1,000 (296)|
|P-Value (Detection Rate)||.001||< .0001||.18||< .001||.035|
|Absolute Recall Rate Difference||-0.8%||-2.0%||-3.2%||-1.6%||-2.6%||-1.8%|
|P-Value (Recall Rate)||< .001||< .0001||< .001||< .001||< .0001|
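The "Difference" row of Table 1 appears to be internally consistent with the "No. of DBT" row. The Python sketch below checks this under one assumption that is not stated in the table: that each per-1,000 difference equals the parenthetical number of women divided by the number of DBT examinations in the same column. The columns are referred to by position because the study labels are not shown here.

```python
# (additional women with cancer detected, number of DBT examinations),
# taken column by column from Table 1; the final table column is the total.
columns = [
    (24, 12_631),
    (20, 7_292),
    (12, 9_499),
    (208, 173_663),
    (32, 23_149),
]

# Per-study added detection rate per 1,000 examinations.
per_study = [round(added / n * 1000, 1) for added, n in columns]
print(per_study)  # [1.9, 2.7, 1.3, 1.2, 1.4]

# Pooled figures, for comparison with the table's final column.
total_added = sum(added for added, _ in columns)
total_exams = sum(n for _, n in columns)
pooled = round(total_added / total_exams * 1000, 1)
print(total_added, total_exams, pooled)  # 296 226234 1.3
```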
The primary role of ultrasound is the diagnostic evaluation of palpable or mammographically identified masses, rather than serving as a primary screening modality. A review of the literature and expert opinion by the European Group for Breast Cancer Screening concluded that “there is little evidence to support the use of ultrasound in population breast cancer screening at any age.”  In the setting of normal mammography and ultrasonography, less than 3% of women who have a lump will ultimately be found to have breast cancer.    
Breast magnetic resonance imaging (MRI) may be used in women for diagnostic evaluation, including evaluating the integrity of silicone breast implants, assessing palpable masses following surgery or radiation therapy, detecting mammographically and sonographically occult breast cancer in patients with axillary nodal metastasis, and preoperative planning for some patients with known breast cancer. There is no ionizing radiation exposure with this procedure. MRI has been promoted as a screening test for breast cancer among women at elevated risk of breast cancer based on BRCA1/2 mutation carrier status, a strong family history of breast cancer, or one of several genetic syndromes such as Li-Fraumeni syndrome or Cowden disease. Breast MRI is more sensitive but less specific than screening mammography and is more expensive.
Using infrared imaging techniques, thermography of the breast identifies temperature changes in the skin as an indicator of an underlying tumor, displaying these changes in color patterns. Thermographic devices have been approved by the FDA under the 510(k) process, which does not require evidence of clinical effectiveness. There have been no randomized trials of thermography to evaluate the impact on breast cancer mortality or the ability to detect breast cancer. Small cohort studies do not suggest any additional benefit for the use of thermography as an adjunct modality for breast cancer screening.  
Randomized controlled trials (RCTs), with participation by nearly half-a-million women from four countries, examined the breast cancer mortality rates of women who were offered regular screening. One trial, the Canadian National Breast Screening Study (NBSS)-2, compared mammogram plus clinical breast examination (CBE) with CBE alone; the other eight trials compared screening mammogram with or without CBE to a control consisting of usual care.
The trials differed in design, recruitment of participants, interventions (both screening and treatment), management of the control group, compliance with assignment to screening and control groups, and analysis of outcomes. Some trials used individual randomization, while others used cluster randomization in which cohorts were identified and then offered screening; one trial used nonrandomized allocation by day of birth in any given month. Cluster randomization sometimes led to imbalances between the intervention and control groups. Age differences have been identified in several trials, although the differences were probably too small to have a major effect on the trial outcome.  In the Edinburgh Trial, socioeconomic status, which correlates with the risk of breast cancer mortality, differed markedly between the intervention and control groups, so it is difficult, if not impossible, to interpret the results.
Breast cancer mortality is the major outcome parameter for each of these trials, so the methods used to determine cause of death are critically important. Efforts to reduce bias in the attribution of mortality cause have been made, including the use of a blinded monitoring committee (New York) and a linkage to independent data sources, such as national mortality registries (Swedish trials). Unfortunately, these attempts could not ensure a lack of knowledge of women’s assignments to screening or control arms. Evidence of possible misclassification of breast cancer deaths in the Two-County Trial with possible bias in favor of screening has been analyzed. 
There were also differences in the methodology used to analyze the results of these trials. Four of the five Swedish trials were designed to include a single screening mammogram in the control group, timed to correspond with the end of the series of screening mammograms in the study group. The initial analysis of these trials used an evaluation analysis, tallying only the breast cancer deaths that occurred in women whose cancer was discovered at or before the last study mammogram. In some of the trials a delay occurred in the performance of the end-of-study mammogram, resulting in more time for members of the control group to develop or be diagnosed with breast cancer. Other trials used a follow-up analysis, which counts all deaths attributed to breast cancer, regardless of the time of diagnosis. This type of analysis was used in a meta-analysis of four of the five Swedish trials in response to concerns about the evaluation analyses. 
The accessibility of the data for international audits and verification also varies, with formal audit having been undertaken only in the Canadian trials. Other trials have been audited to varying degrees, usually with less rigor. 
All of these studies are designed to study breast cancer mortality rather than all-cause mortality because of the infrequency of breast cancer deaths relative to the total number of deaths. When all-cause mortality in these trials was examined retrospectively, only the Edinburgh Trial showed a significant difference, which could be attributed to socioeconomic differences. The meta-analysis (follow-up methods) of the four Swedish trials also showed a small but significant improvement of all-cause mortality.
Refer to the Appendix of Randomized Controlled Trials section of this summary for a detailed description of the trials.
Screening for breast cancer does not affect overall mortality, and the absolute benefit for breast cancer mortality is small.
A way to view the potential benefit of breast cancer screening is to estimate the number of lives extended because of early breast cancer detection. One author estimated the outcomes of 10,000 women aged 50 to 70 years who undergo a single screen. Mammograms will be normal (true negatives and false negatives) in 9,500 women. Of the 500 abnormal screens, 466 to 479 will be false positives, and 100 to 200 of these women will undergo invasive procedures. The remaining 21 to 34 abnormal screens will be true positives, indicating breast cancer. Some of these women will die of breast cancer in spite of mammographic detection and optimal therapy, and some may live long enough to die of other causes even if the cancer had not been screen detected. The number of extended lives attributable to mammographic detection is between two and six. Another expression of this analysis is that one life may be extended per 1,700 to 5,000 women screened and followed for 15 years. The same analysis for 10,000 women aged 40 to 49 years, assuming the same 500 abnormal examinations, results in an estimate that 488 of these will be false positives, and 12 will be breast cancer. Of these 12, there will probably be only one or two lives extended. Thus, for women aged 40 to 49 years, it is estimated that one life may be extended per 5,000 to 10,000 mammograms.
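The arithmetic behind these estimates can be checked with a short calculation. The following is a minimal sketch in Python that uses only the figures quoted above; the function and variable names are illustrative and do not come from the cited analysis.

```python
def women_screened_per_life_extended(cohort_size, lives_extended):
    """Women screened for each life extended in a cohort of the given size."""
    return cohort_size / lives_extended

COHORT = 10_000          # women screened once (figure from the text)
ABNORMAL_SCREENS = 500   # abnormal results per 10,000 screens (from the text)

# Ages 50 to 70: 466 to 479 false positives leave 21 to 34 true positives.
true_positives_50_70 = (ABNORMAL_SCREENS - 479, ABNORMAL_SCREENS - 466)

# Ages 50 to 70: two to six lives extended per 10,000 women.
per_life_50_70 = (
    women_screened_per_life_extended(COHORT, 6),
    women_screened_per_life_extended(COHORT, 2),
)

# Ages 40 to 49: one or two lives extended per 10,000 women.
per_life_40_49 = (
    women_screened_per_life_extended(COHORT, 2),
    women_screened_per_life_extended(COHORT, 1),
)

print(true_positives_50_70)
print([round(n) for n in per_life_50_70])
print([round(n) for n in per_life_40_49])
```

The range for women aged 50 to 70 years comes out at about 1,667 to 5,000, which the text rounds to 1,700 to 5,000.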
While the numbers discussed above are from a single mammography exam, women undergo screening throughout their lifetimes, which can include 20 to 30 years of screening activity. A meta-analysis of RCTs conducted for the U.S. Preventive Services Task Force in 2009 (including the AGE Trial) found that the number needed to invite to screen for 10 years to avoid or delay one death from breast cancer was 1,904 for women in their 40s, 1,339 for women in their 50s, and 377 for women in their 60s.  A 2009 combined analysis by six Cancer Intervention and Surveillance Modeling Network modeling groups found that screening every 2 years maintained an average of 81% of the benefit of annual screening with almost one-half of the false-positive results. Screening biennially from age 50 to 69 years achieved a median 16.5% reduction in breast cancer deaths versus no screening. Initiating biennial screening at age 40 years (vs. age 50 years) reduced breast cancer mortality by an additional 3%, consumed more resources, and yielded more false-positive results. 
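A number needed to invite (NNI) can be easier to compare across age groups when it is restated as deaths avoided or delayed per 10,000 women invited, which is simply 10,000 divided by the NNI. The sketch below applies that conversion to the three NNI values quoted above; the names are illustrative, and the conversion is a restatement of the same figures, not an additional result from the meta-analysis.

```python
# Number needed to invite (NNI) to screening for 10 years, as quoted in the text.
NNI_BY_DECADE = {"40s": 1904, "50s": 1339, "60s": 377}

def deaths_averted_per_10000_invited(nni):
    """Convert an NNI into deaths avoided or delayed per 10,000 women invited."""
    return 10_000 / nni

averted = {
    decade: deaths_averted_per_10000_invited(nni)
    for decade, nni in NNI_BY_DECADE.items()
}

for decade, value in averted.items():
    print(f"Women in their {decade}: {value:.1f} per 10,000 invited")
```

On these figures, the absolute benefit per 10,000 women invited is roughly five times larger for women in their 60s than for women in their 40s.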
There are several problems with using these RCTs alone to estimate the magnitude of breast cancer mortality reduction from a long-term program of breast cancer screening in the present. These problems include the following:
Although there is no ideal answer to these problems, well-conducted cohort and ecologic studies, in addition to RCTs, help to account for the estimated magnitude of breast cancer mortality reduction resulting from current day screening.  Additionally, the harms of screening, especially false-positive results, are greater with a first screen and decrease with subsequent screens, for which there are previous images for comparison.
Although the RCTs of screening have addressed the issue of screening efficacy (i.e., the extent to which screening reduces breast cancer mortality under the ideal conditions of an RCT), they do not provide information about the effectiveness of screening (i.e., the extent to which screening is reducing breast cancer mortality in the U.S. population). Studies that provide information about this issue include nonrandomized controlled studies of screened versus nonscreened populations, case-control studies of screening in real communities, and modeling studies that examine the impact of screening on large populations. An important issue in all of these studies is the extent to which they can control for additional effects on breast cancer mortality such as improved treatment and heightened awareness of breast cancer in the community.
Three population-based, observational studies from Sweden compared breast cancer mortality in the presence and absence of screening mammography programs. One study compared two adjacent time periods in 7 of the 25 counties in Sweden and reported a statistically significant breast cancer mortality reduction of 18% to 32% attributable to screening. The most important bias in this study is that the advent of screening in these counties occurred during a period in which dramatic improvements in the effectiveness of adjuvant breast cancer therapy were being made, changes that were not addressed by the study authors. The second study considered an 11-year period and compared seven counties that had screening programs with five counties that did not. There was a trend in favor of screening, but again, the authors did not consider the effect of adjuvant therapy or differences in geography (urban vs. rural) that might affect treatment practices.
In part to account for the effects of treatment, the third study undertook a detailed analysis by county and found little impact of screening. These authors made the assumption that the annual decrease in mortality observed during the prescreening period would carry into the postscreening period, and any screening effect would result in an incremental decrease in mortality. Although no such incremental decrease in breast cancer mortality was observed after the introduction of screening, their assumption makes their conclusion weak. Comparisons across counties showed similar reductions in breast cancer mortality regardless of when the counties’ screening programs were initiated; however, the authors carried out no formal cross-county analyses.
The interpretation of this ecologic analysis is limited by the following factors:
In Nijmegen, the Netherlands, where a population-based screening program was undertaken in 1975, a case-cohort study showed that screened women have decreased mortality (OR, 0.48).  However, a subsequent study comparing Nijmegen breast cancer mortality rates with neighboring Arnhem in the Netherlands, which had no screening program, showed no difference in breast cancer mortality. 
A community-based case-control study of screening as practiced in excellent U.S. health care systems between 1983 and 1998 found no association between previous screening and reduced breast cancer mortality. Mammography screening rates, however, were generally low.  The association among women at increased risk because of a family history of breast cancer or a previous breast biopsy (OR, 0.74; 95% CI, 0.50–1.03) was stronger than that among women at average risk (OR, 0.96; 95% CI, 0.80–1.14), but the difference was not statistically significant (P = .17). 
A well-conducted ecologic study compared three pairs of neighboring European countries, matched on similarity in health care systems and population structure, one of which had started a national screening program some years earlier than the others. The investigators found that each country had experienced a reduction in breast cancer mortality, with no difference between matched pairs that could be attributed to screening. The authors suggested that improvements in breast cancer treatment and/or health care organizations were more likely responsible for the reduction in mortality than was screening. 
A systematic review of ecologic and large cohort studies published through March 2011 compared breast cancer mortality in large populations of women aged 50 to 69 years who started breast cancer screening at different times. Seventeen studies met inclusion criteria. All studies had methodological problems, including control group dissimilarities, insufficient adjustment for differences between areas in breast cancer risk and breast cancer treatment, and problems with similar measurement of breast cancer mortality between compared areas. There was great variation in results among the studies, with four studies finding a relative reduction in breast cancer mortality of 33% or more (with wide CIs) and five studies finding no reduction in breast cancer mortality. Because only a part of the overall reduction in breast cancer mortality could possibly be attributed to screening, the review concluded that any relative reduction in breast cancer mortality resulting from screening would likely be no more than 10%, less than predicted by the RCTs. 
A U.S. ecologic analysis conducted between 1976 and 2008 examined the incidence of early-stage versus late-stage breast cancer for women aged 40 years and older. To find a screening effect, the authors compared the magnitude of increase in early-stage cancer with the magnitude of an expected decrease in late-stage cancer. During the study, the absolute increase in the incidence of early-stage cancer was 122 cancers per 100,000 women, while the absolute decrease in late-stage cancers was 8 cases per 100,000 women. After adjusting for changes in incidence resulting from hormone therapy and other undefined causes, the authors concluded that the screening effect on breast cancer mortality reduction (28% during this period) was small and that overdiagnosis of breast cancer was likely between 22% and 31% of all diagnosed breast cancers. Most of the reduction in breast cancer mortality, the authors concluded, was probably because of improved treatment rather than screening. To make these adjustments, the authors made uncertain assumptions about the effects of other factors on incidence, and made no mention of the effects of changing treatment over time. Ecologic studies are difficult to interpret because of this type of potential uncontrolled confounding, as well as these types of unfair comparisons. However, this study largely agrees with some similar analyses from other countries (see studies discussed above).  A major limitation of this and other ecologic studies is the failure to account for actual exposure to screening. Most late-stage breast cancer occurs in women not exposed to screening.
An analytic approach was used to approximate the magnitude of overdiagnosis and the contributions of screening versus treatment to breast cancer mortality reduction. The shift in the size distribution of breast cancers in the United States from 1975 to 2012, an interval that spans the period from before the introduction of mammography to after its widespread dissemination, was investigated using Surveillance, Epidemiology, and End Results (SEER) data in women aged 40 years and older under the assumption that the rate of clinically meaningful breast cancer was stable during this time. There was an indication of the potential for screening to lower mortality reflected in a declining incidence of larger (≥ 2 cm) tumors. However, reduction in breast cancer case fatality was also documented, with the change for large tumors likely primarily caused by improvements in therapy. This decline in size-specific case fatality suggested that improved treatment was responsible for about two-thirds of the reduction in breast cancer mortality.
Figure 2. Screening mammography and increased incidence of invasive breast cancer. Shown are the incidences of overall invasive breast cancer and metastatic breast cancer among women 40 years of age or older at nine sites of the Surveillance, Epidemiology, and End Results (SEER) program, during the period from 1975 through 2012. From New England Journal of Medicine, Welch HG, Prorok PC, O'Malley AJ, Kramer BS, Breast-Cancer Tumor Size, Overdiagnosis, and Mammography Screening Effectiveness, Volume 375, Issue 15, Pages 1438-47, Copyright © 2016 Massachusetts Medical Society. Reprinted with permission from Massachusetts Medical Society.
A prospective cohort study of community-based screening programs in the United States found that annual compared with biennial screening mammography did not reduce the proportion of unfavorable breast cancers detected in women aged 50 to 74 years or in women aged 40 to 49 years who did not have extremely dense breasts. Women aged 40 to 49 years with extremely dense breasts did have a reduction in cancers larger than 2.0 cm (OR for biennial vs. annual screening, 2.39; 95% CI, 1.37–4.18). 
The optimal screening interval has been addressed by modelers. Modeling makes assumptions that may not be correct; however, the credibility of modeling is greater when the model produces overall results that are consistent with randomized trials and when the model is used to interpolate or extrapolate. For example, if a model’s output agrees with RCT outcomes for annual screening, it has greater credibility in comparing the relative effectiveness of biennial versus annual screening.
In 2000, the National Cancer Institute formed a consortium of modeling groups (Cancer Intervention and Surveillance Modeling [CISNET]) to address the relative contribution of screening and adjuvant therapy to the observed decline in breast cancer mortality in the United States.  (Refer to the Randomized controlled trials section of this summary for more information.) These models gave reductions in breast cancer mortality similar to those expected in the circumstances of the RCTs but updated to the use of modern adjuvant therapy. In 2009, CISNET modelers addressed several questions related to the harms and benefits of mammography, including comparing annual versus biennial screening.  The proportion of reduction in breast cancer mortality maintained in moving from annual to biennial screening for women aged 50 to 74 years ranged across the six models from 72% to 95%, with a median of 80%.
Several studies have shown that the method of cancer detection is a powerful predictor of patient outcome,  which is useful for prognostication and treatment decisions. All of the studies accounted for stage, nodal status, and tumor size.
A 10-year follow-up study of 1,983 Finnish women with invasive breast cancer demonstrated that the method of cancer detection is an independent prognostic variable. When controlled for age, nodal status, and tumor size, screen-detected cancers had a lower risk of relapse and better overall survival. For women whose cancers were detected outside screening, the hazard ratio (HR) for death was 1.90 (95% confidence interval [CI], 1.15–3.11), even though they were more likely to receive adjuvant systemic therapy. 
Similarly, an examination of the breast cancers found in three randomized screening trials (Health Insurance Plan, National Breast Screening Study [NBSS]-1, and NBSS-2) accounted for stage, nodal status, and tumor size and determined that patients whose cancer was found via screening have a more favorable prognosis. The relative risks for death were 1.53 (95% CI, 1.17–2.00) for interval and incident cancers, compared with screen-detected cancers; and 1.36 (95% CI, 1.10–1.68) for cancers in the control group, compared with screen-detected cancers. 
A third study compared the outcomes of 5,604 English women with screen-detected cancers to those with symptomatic breast cancers diagnosed between 1998 and 2003. After controlling for tumor size, nodal status, grade, and patient age, researchers found that the women with screen-detected cancers fared better than their symptomatic counterparts. The HR for survival of the symptomatic women was 0.79 (95% CI, 0.63–0.99).  Thus, method of cancer detection is a powerful predictor of patient outcome,  which is useful for prognostication and treatment decisions. The findings of this study are also consistent with the evidence that some screen-detected cancers are low risk and represent overdiagnosis.
Several characteristics of women being screened are associated with the accuracy of mammography, including age, breast density, whether it is the first or a subsequent exam, and time since the last mammogram. Younger women have lower sensitivity and higher false-positive rates on screening mammography than do older women (refer to the Breast Cancer Surveillance Consortium performance measures by age for more information).
For women of all ages, high breast density is associated with 10% to 29% lower sensitivity.  High breast density is an inherent trait, which can be familial   but also may be affected by age, endogenous  and exogenous   hormones,  selective estrogen receptor modulators such as tamoxifen,  and diet.  Hormone therapy is associated with increased breast density and is associated not only with lower sensitivity but also with an increased rate of interval cancers. 
The Million Women Study in the United Kingdom revealed three patient characteristics that were associated with decreased sensitivity and specificity of screening mammograms in women aged 50 to 64 years: use of postmenopausal hormone therapy, prior breast surgery, and body mass index below 25.  In addition, a longer interval since the last mammogram increases sensitivity, recall rate, and cancer detection rate and decreases specificity. 
Strategies have been proposed to improve mammographic sensitivity by altering diet, timing mammograms with menstrual cycles, interrupting hormone therapy before the examination, or using digital mammography machines.  Obese women have more than a 20% increased risk of having false-positive mammography results compared with underweight and normal weight women, although sensitivity is unchanged. 
Some cancers are more easily detected by mammography than other cancers are. In particular, mucinous, lobular, and rapidly growing cancers can be missed because their appearance on x-rays is similar to that of normal breast tissue.  Medullary carcinomas may be similarly missed.  Some cancers, particularly those associated with BRCA1/2 mutations, masquerade as benign tumors.  
Radiologist performance is critical to assessing mammographic interpretive performance, yet there is substantial, well-documented variability among radiologists. Factors that influence radiologists’ performance include their level of experience and the volume of mammograms they interpret.  There is often a trade-off between sensitivity and specificity, such that higher sensitivity may be associated with lower specificity. Radiologists in academic settings have a higher positive predictive value (PPV) of their recommendations to undergo biopsy than do community radiologists.  Fellowship training in breast imaging may lead to improved cancer detection, but it is associated with higher false-positive rates. 
After controlling for patient and radiologist characteristics, screening mammography interpretive performance (specificity, PPV, area under the curve [AUC]) varies by facility and is associated with facility-level characteristics. Higher interpretive accuracy of screening mammography was seen at facilities that offered screening examinations alone, included a breast imaging specialist on staff, did single as opposed to double readings, and reviewed interpretive audits two or more times each year. 
False-positive rates vary significantly between facilities performing diagnostic mammography and are higher at facilities where concern about malpractice is high.  False-positive biopsy rates are also higher at facilities that perform diagnostic mammography and serve vulnerable women (racial or ethnic minority women and women with lower educational attainment, limited household income, or rural residence) than at facilities that perform diagnostic mammography and serve nonvulnerable women.  This may reflect radiologist concerns that vulnerable populations may be less likely to follow up after an abnormality is noted when the follow-up includes a short-interval follow-up or a clinic visit. Thus, radiologists may be more likely to recommend a biopsy as opposed to short-interval follow-up.  This may also represent radiologist concerns that this population may have a slightly higher cancer prevalence.
International comparisons of screening mammography have found higher specificity in countries with more highly centralized screening systems and national quality assurance programs.   For example, one study reported that the recall rate is twice as high in the United States as it is in the United Kingdom, yet there is no difference in the rate of cancers detected. Such comparisons may be confounded by social, cultural, and economic factors. 
The likelihood of diagnosing cancer is highest with the prevalent (first) screening examination, ranging from 9 to 26 cancers per 1,000 screens, depending on the woman’s age. The likelihood decreases for follow-up examinations, ranging from 1 to 3 cancers per 1,000 screens.  The optimal interval between screening mammograms is unknown. In particular, the breast cancer mortality-focused, randomized, controlled trials used single screening intervals with little variability across the trials. A prospective United Kingdom trial randomly assigned women aged 50 to 62 years to receive mammograms annually or at the standard 3-year interval. Although the grade and node status were similar in both groups, more cancers of slightly smaller size were detected in the annual screening group, with a lead time of approximately 7 months in comparison with triennial screening. 
A large observational study found a slightly increased risk of late-stage disease at diagnosis for women in their 40s who were adhering to a 2-year versus a 1-year schedule (28% vs. 21%; odds ratio [OR], 1.35; 95% confidence interval [CI], 1.01–1.81), but no difference was seen for women in their 50s or 60s.  
A Finnish study of 14,765 women aged 40 to 49 years assigned women born in even-numbered years to annual screens and women born in odd-numbered years to triennial screens. The study was small in terms of number of deaths, with low power to discriminate breast cancer mortality between the two groups. There were 18 deaths from breast cancer in 100,738 life-years in the triennial screening group and 18 deaths from breast cancer in 88,780 life-years in the annual screening group (hazard ratio, 0.88; 95% CI, 0.59–1.27). 
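The limited power of this study is easier to see when the death counts are turned into crude mortality rates. The sketch below uses only the deaths and life-years quoted above; the function name is illustrative, and the crude rate ratio is a simple unadjusted calculation, not the study's own hazard-ratio analysis.

```python
def rate_per_100000(deaths, life_years):
    """Crude mortality rate per 100,000 life-years."""
    return deaths / life_years * 100_000

# Deaths and life-years as quoted in the text.
triennial_rate = rate_per_100000(18, 100_738)
annual_rate = rate_per_100000(18, 88_780)

# Unadjusted ratio of the triennial rate to the annual rate.
crude_rate_ratio = triennial_rate / annual_rate

print(f"Triennial: {triennial_rate:.1f} per 100,000 life-years")
print(f"Annual:    {annual_rate:.1f} per 100,000 life-years")
print(f"Crude rate ratio (triennial vs. annual): {crude_rate_ratio:.2f}")
```

With 18 deaths in each group, the crude ratio of about 0.88 reflects only the difference in life-years, and the reported confidence interval (which this sketch does not reproduce) includes 1.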
Mammography screening may be effective in reducing breast cancer mortality in certain populations, but it can pose harm to women who participate. The limitations are best described as false positives (related to the specificity of the test), overdiagnosis (true positives that will not become clinically significant), false negatives (related to the sensitivity of the test), discomfort associated with the test, radiation risk, and anxiety.
|Age, y||No. of Breast Cancer Deaths Averted With Mammography Screening During the Next 15 y||No. (95% CI) With ≥1 False-Positive Result During the 10 y||No. (95% CI) With ≥1 False Positive Resulting in a Biopsy During the 10 y||No. of Breast Cancers or DCIS Diagnosed During the 10 y That Would Never Become Clinically Important (Overdiagnosis)|
|40||1–16||6,130 (5,940–6,310)||700 (610–780)||?–104 (note e)|
|50||3–32||6,130 (5,800–6,470)||940 (740–1,150)||30–137|
|60||5–49||4,970 (4,780–5,150)||980 (840–1,130)||64–194|
|No. = number; CI = confidence interval; DCIS = ductal carcinoma in situ.|
|a: Adapted from Pace and Keating.|
|b: Number of deaths averted are from Welch and Passow. The lower bound represents breast cancer mortality reduction if the breast cancer mortality relative risk were 0.95 (based on minimal benefit from the Canadian trials), and the upper bound represents the breast cancer mortality reduction if the relative risk were 0.64 (based on the Swedish 2-County Trial).|
|c: False positive and biopsy estimates and 95% confidence intervals are 10-year cumulative risks reported in Hubbard et al. and Braithwaite et al.|
|d: Overdiagnosed cases are calculated by Welch and Passow. The lower bound represents overdiagnosis based on results from the Malmö trial, whereas the upper bound represents the estimate from Bleyer and Welch.|
|e: The lower-bound estimate for overdiagnosis reported by Welch and Passow came from the Malmö study. The study did not enroll women younger than 50 years.|
The specificity of mammography (refer to the Breast Cancer Screening Concepts section of this summary for more information) affects the number of additional interventions resulting from false-positive results. Even though breast cancer is the most common noncutaneous cancer in women, fewer than 5 per 1,000 women actually have the disease when they are screened. Therefore, even with a specificity of 90%, most abnormal mammograms are false positives.  Women with abnormal screening mammograms undergo additional mammographic imaging to magnify the area of concern, ultrasound, magnetic resonance imaging, and tissue sampling (by fine-needle aspiration, core biopsy, or excisional biopsy).
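The claim that most abnormal mammograms are false positives follows from Bayes' rule. The sketch below is a minimal illustration using the prevalence (5 per 1,000) and specificity (90%) stated above; the sensitivity of 85% is an assumed value chosen only for illustration, and the function name is hypothetical.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Probability that an abnormal result reflects true disease (Bayes' rule)."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)

PREVALENCE = 5 / 1000   # upper bound from the text: fewer than 5 per 1,000
SPECIFICITY = 0.90      # from the text
SENSITIVITY = 0.85      # ASSUMPTION for illustration; not given in the text

ppv = positive_predictive_value(PREVALENCE, SENSITIVITY, SPECIFICITY)

# Even a hypothetical perfectly sensitive test cannot raise the PPV much
# when the disease is this uncommon among women being screened.
ppv_perfect_sensitivity = positive_predictive_value(PREVALENCE, 1.0, SPECIFICITY)

print(f"PPV with assumed 85% sensitivity: {ppv:.1%}")
print(f"PPV with 100% sensitivity:        {ppv_perfect_sensitivity:.1%}")
```

Under these inputs, fewer than 1 in 20 abnormal results would be a true positive, whatever sensitivity is assumed.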
A study of breast cancer screening in 2,400 women enrolled in a health maintenance organization found that during a 10-year period, 88 cancers were diagnosed, 58 of which were identified by mammography. During that period, one-third of the women had an abnormal mammogram result that required additional testing, including 539 additional mammograms, 186 ultrasound examinations, and 188 biopsies. The cumulative biopsy rate (the rate of true positives) resulting from mammographic findings was approximately 1 in 4 (23.6%). The positive predictive value (PPV) of an abnormal screening mammogram in this population was 6.3% for women aged 40 to 49 years, 6.6% for women aged 50 to 59 years, and 7.8% for women aged 60 to 69 years.  A subsequent analysis and modeling of data from the same cohort of women, all of whom were continuously enrolled in the Harvard Pilgrim Health Care plan from July 1983 through June 1995, estimated that the risk of having at least one false-positive mammogram was 7.4% (95% confidence interval [CI], 6.4%–8.5%) at the first mammogram, 26.0% (95% CI, 24.0%–28.2%) by the fifth mammogram, and 43.1% (95% CI, 36.6%–53.6%) by the ninth mammogram.  Cumulative risk of at least one false-positive result by the ninth mammogram varied from 5% to 100%, depending on four patient variables (younger age, higher number of previous breast biopsies, family history of breast cancer, and current estrogen use) and three radiologic variables (longer time between screenings, failure to compare the current and previous mammograms, and the individual radiologist’s tendency to interpret mammograms as abnormal). Overall, the biggest risk factor for having a false-positive mammogram was the individual radiologist’s tendency to read mammograms as abnormal.
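One way to get a feel for how false-positive risk accumulates is to compare the reported figures with a deliberately naive model. The sketch below assumes that every mammogram carries the same 7.4% false-positive risk as the first one and that screens are independent of each other. Both assumptions are simplifications introduced here for illustration; this is not the method used in the cited analysis.

```python
def naive_cumulative_risk(per_screen_risk, n_screens):
    """Chance of at least one false positive in n independent, identical screens."""
    return 1 - (1 - per_screen_risk) ** n_screens

FIRST_SCREEN_RISK = 0.074           # reported risk at the first mammogram
REPORTED = {5: 0.260, 9: 0.431}     # reported cumulative risks from the text

naive = {n: naive_cumulative_risk(FIRST_SCREEN_RISK, n) for n in REPORTED}

for n, reported in REPORTED.items():
    print(f"By mammogram {n}: naive {naive[n]:.1%}, reported {reported:.1%}")
```

The naive model gives somewhat higher values than the reported estimates at both time points, which would be expected if the per-screen false-positive risk is lower after the first mammogram than at the first mammogram.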
A prospective cohort study of community-based screening found that a greater proportion of women undergoing annual screening had at least one false-positive screen after 10 years than did women undergoing biennial screening, regardless of breast density. For women with scattered fibroglandular densities, the difference was 68.9% (annual) versus 46.3% (biennial) for women in their 40s. For women aged 50 to 74 years, the difference for this density group was 49.8% (annual) versus 30.7% (biennial). 
By reviewing Medicare claims following mammographic screening in 23,172 women older than 65 years, one study  found that, per 1,000 women, 85 had follow-up testing, 23 had biopsies, and 7 had cancer. Thus, the PPV for an abnormal mammogram was 8%. For women older than 70 years, the PPV was 14%.
An audit of mammograms performed in 1998 at a single institution revealed that 14.7% of examinations resulted in a recommendation for additional testing (Breast Imaging Reporting and Data System category 0), 1.8% resulted in a recommendation for biopsy (categories 4 and 5), and 5.7% resulted in a recommendation for short-term interval mammography (category 3). Cancer was diagnosed in 0.5% of the cases referred for additional testing. 
As shown in Table 2, the estimated number of women out of 10,000 who undergo annual screening mammography during a 10-year period with at least one false-positive test result is 6,130 for women aged 40 to 50 years and 4,970 for women aged 60 years. The number of women with a false-positive test that results in a biopsy is estimated to range from 700 to 980, depending on age. 
Overdiagnosis occurs when screening procedures detect cancers that would never become clinically significant. This is important not only for invasive cancers, but especially for ductal carcinoma in situ (DCIS), whose untreated natural history is unknown. Because nearly all cases of breast cancer and DCIS will be treated, women with clinically insignificant cancers will suffer treatment-related side effects unnecessarily.    
One approach to understanding overdiagnosis is to examine the prevalence of occult cancer in women who died of noncancer causes. In an overview of seven autopsy studies, the median prevalence of occult invasive breast cancer was 1.3% (range, 0%–1.8%) and of DCIS was 8.9% (range, 0%–14.7%).  
Overdiagnosis can be indirectly measured by comparing breast cancer incidence in screened populations with breast cancer incidence in unscreened populations. These comparisons can be further complicated by differences in the populations, such as time, geography, health behaviors, and hormone usage. The calculations of overdiagnosis can vary in the adjustment for lead-time bias. An overview of 29 studies found calculated rates of overdiagnosis of 0% to 54%, with rates from randomized studies between 11% and 22%. In Denmark, where screened and unscreened populations existed concurrently, the rate of overdiagnosis of invasive cancer was calculated to be 14% and 39%, using two different methodologies. If DCIS cases were included, the overdiagnosis rates were 24% to 48%.
Theoretically, in a given population, the detection of more breast cancers at an early stage should result in a subsequent reduction in the incidence of advanced-stage cancers. Unfortunately, in the populations that have been studied, this has not occurred. Thus, the detection of more early-stage cancers through screening probably represents overdiagnosis.      
A cohort study in Norway compared the increase in cancer incidence in women eligible for screening based on age after the introduction of screening within the respective counties, with the cancer incidence in younger women not eligible for screening. Eligible women experienced a 60% increase in incidence of localized cancers (relative risk [RR], 1.60; 95% CI, 1.42–1.79), while the incidence of advanced cancers remained similar in the two groups (RR, 1.08; 95% CI, 0.86–1.35). 
A population study that compared different counties in the United States showed that higher rates of screening mammography use were associated with higher rates of breast cancer diagnoses, yet no corresponding decrease in 10-year breast cancer mortality was seen.  The strengths of this study include its very large size (16 million women) and the strength and consistency of correlation observed across counties. The limitations of this study include the inherent limitations of ecological studies (ecological studies relate the frequency with which an exposure or intervention, i.e., screening mammography, and an outcome, i.e., cancer diagnosis or mortality, occur in the same geographic area or care setting) and the three following design features, which could cause measurements to be unreliable: 
An analytic approach was used to approximate the magnitude of overdiagnosis and the contributions of screening versus treatment to breast cancer mortality reduction.  The shift in the size distribution of breast cancers in the United States from 1975 to 2012, an interval that spans the period from before the introduction of mammography to after its widespread dissemination, was investigated using Surveillance, Epidemiology, and End Results (SEER) data in women aged 40 years and older under the assumption that the rate of clinically meaningful breast cancer was stable during this period. When women diagnosed from 1975 to 1979 were compared with women diagnosed from 2008 to 2012, the incidence of large tumors (≥ 2 cm) decreased by 30 cases per 100,000 women, whereas the incidence of invasive tumors smaller than 2 cm and ductal carcinoma in situ (DCIS) increased by 162 cases per 100,000 women, suggesting overdiagnosis of 132 cases per 100,000 women.
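The overdiagnosis estimate above is a simple difference in incidence changes. A brief sketch of the reasoning follows, under the study's stated assumption that the underlying rate of clinically meaningful cancer was stable; the variable names and the final share calculation are illustrative additions, not figures reported in the text.

```python
# Changes in incidence per 100,000 women, 1975-1979 versus 2008-2012 (from the text).
decrease_large_tumors = 30       # tumors 2 cm or larger
increase_small_and_dcis = 162    # invasive tumors smaller than 2 cm plus DCIS


def excess_small_tumors(increase_small: int, decrease_large: int) -> int:
    """Additional small tumors not offset by a matching drop in large tumors."""
    return increase_small - decrease_large


overdiagnosed = excess_small_tumors(increase_small_and_dcis, decrease_large_tumors)

# Derived here for illustration only: share of the added small tumors and DCIS
# that had no corresponding reduction in large tumors.
share_unmatched = overdiagnosed / increase_small_and_dcis

print(f"Excess cases per 100,000 women: {overdiagnosed}")
print(f"Share of added small-tumor diagnoses unmatched: {share_unmatched:.0%}")
```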
Estimates of the extent of overdiagnosis noted in the Canadian National Breast Screening Study, a randomized clinical trial, have been reported. At the end of the five screening rounds, an excess of 142 invasive breast cancer cases was diagnosed in the mammography arm, compared with the control arm.  At 15 years, the excess number of cancer cases in the mammography arm versus the control arm was 106; this represents an overdiagnosis rate of 22% for the 484 screen-detected invasive cancers. 
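The 22% rate can be reproduced from the two counts given above. This is a minimal sketch, assuming the rate is defined as excess cancers at 15 years divided by screen-detected invasive cancers; the function name is hypothetical.

```python
def overdiagnosis_rate(excess_cases: int, screen_detected: int) -> float:
    """Excess cancers in the screened arm as a fraction of screen-detected cancers."""
    if screen_detected <= 0:
        raise ValueError("screen_detected must be positive")
    return excess_cases / screen_detected


# Figures quoted in the text for the Canadian National Breast Screening Study.
excess_at_15_years = 106
screen_detected_invasive = 484

rate = overdiagnosis_rate(excess_at_15_years, screen_detected_invasive)
print(f"Overdiagnosis rate = {rate:.1%}")  # prints "Overdiagnosis rate = 21.9%"
```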
Table 2 shows the estimated number of women with breast cancers or DCIS diagnosed during a 10-year period of screening 10,000 women that would never become clinically important (overdiagnosis). There was no overdiagnosis in the Health Insurance Plan study, which used old-technology mammography and clinical breast examination. Overdiagnosis has become more prominent in the era of improved-technology mammography. However, the benefits of newer-technology mammography compared with older-technology mammography in regard to reduced mortality have not been demonstrated. 
The sensitivity of mammography (refer to the Breast Cancer Screening Concepts section of this summary for more information) ranges from 70% to 90%, depending on a woman’s age and the density of her breasts, which is affected by her genetic predisposition, hormone status, and diet. Assuming an average sensitivity of 80%, mammograms will miss approximately 20% of the breast cancers that are present at the time of screening (false negatives). Many of these missed cancers are high risk, with adverse biologic characteristics (refer to the Interval cancers section in the Breast Cancer Screening Concepts section of this summary for more information). If a normal mammogram dissuades a woman or her doctor from evaluating breast symptoms, or leads them to postpone the evaluation, she may suffer adverse consequences. Thus, a negative mammogram should never prevent a work-up of breast symptoms.
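The relationship between sensitivity and missed cancers can be illustrated across the quoted range. The sketch below assumes a hypothetical group of 100 cancers present at screening, chosen only to make the proportions easy to read.

```python
def expected_missed(cancers_present: int, sensitivity: float) -> float:
    """Expected number of false negatives for a given sensitivity."""
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity must be between 0 and 1")
    return cancers_present * (1.0 - sensitivity)


# Hypothetical cohort size; the sensitivities span the range given in the text.
cancers_present = 100
for sensitivity in (0.70, 0.80, 0.90):
    missed = expected_missed(cancers_present, sensitivity)
    print(f"sensitivity {sensitivity:.0%}: about {missed:.0f} of {cancers_present} cancers missed")
```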
Compression of the breast is important during a mammogram to reduce motion artifact and improve image quality. Positioning of the woman is important. One study that evaluated how often pain and discomfort are felt during mammography reported that 90% of women undergoing mammography had discomfort, and 12% of women rated the sensation as intense or intolerable.  A systematic review of 22 studies investigating pain and discomfort associated with mammography found that the prevalence of pain varied widely. The degree of pain was associated with the stage of menstrual cycle, anxiety, and premammography anticipation of pain. 
The major predictors of radiation risk are young age at exposure and dose. For women older than 40 years, the benefits of annual mammograms probably outweigh the potential risk,  but certain subpopulations of women may have an inherited susceptibility to ionizing radiation damage.   In the United States, the mean glandular dose for screening mammography is 1 mSv to 2 mSv per view or 2 mSv to 4 mSv per standard two-view exam.   The radiation exposure from mammography is to the breasts, with a lower effective dose to the whole body of 0.29 mSv. Thus, it may be estimated that up to one breast cancer may be induced per 1,000 women aged 40 to 80 years undergoing annual mammograms. Based on statistical models, the risk would be doubled in women with large breasts who require increased radiation doses and in women with breast augmentation who require additional views. Radiation-induced breast cancers may be reduced fivefold for women who begin biennial screening at age 50 years rather than annually at age 40 years. 
Because large numbers of women have false-positive tests, the issue of psychological distress—which may be provoked by the additional testing—has been studied. A telephone survey of 308 women performed 3 months after screening mammography revealed that about one-fourth of the 68 women with a suspicious result were still experiencing worry that affected their mood or functioning, even though subsequent testing had ruled out a cancer diagnosis.  Research into whether or not the psychological impact of a false-positive test is long standing has found mixed results. A cohort study in Spain in 2002 found immediate psychological impact of receiving a false-positive mammogram, but these results dissipated within a few months.  A cohort study in Denmark in 2013 measuring the psychological effects of a false positive several years past the event found long-term negative psychological consequences associated with the experience of receiving a false-positive mammogram.  Several studies showed that the anxiety following evaluation of a false-positive test led to increased participation in future screening examinations.    
No randomized trials of clinical breast examination (CBE) as a sole screening modality have yet been reported. The Canadian National Breast Screening Study (NBSS) compared high-quality CBE plus mammography with CBE alone in women aged 50 to 59 years (refer to the Clinical Breast Examination section in the Overview section of this summary for more information). CBE, lasting 5 to 10 minutes per breast, was conducted by trained health professionals, with periodic evaluations of performance quality. The frequency of cancer diagnosis, stage, interval cancers, and breast cancer mortality were similar in the two groups and compared favorably with other trials of mammography alone, perhaps because of the careful training and supervision of the health professionals performing CBE. Breast cancer mortality with follow-up 11 to 16 years after entry (mean = 13 years) was similar in the two screening arms (mortality rate ratio, 1.02; 95% confidence interval [CI], 0.78–1.33). The investigators estimated the operating characteristics for CBE alone; for 19,965 women aged 50 to 59 years, sensitivity was 83%, 71%, 57%, 83%, and 77% for years 1, 2, 3, 4, and 5 of the trial, respectively; specificity ranged between 88% and 96%. Positive predictive value (PPV), which is the proportion of cancers detected per abnormal examination, was estimated to be 3% to 4%. For 25,620 women aged 40 to 49 years who were examined only at entry, the estimated sensitivity was 71%, specificity was 84%, and PPV was 1.5%.
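Sensitivity, specificity, and PPV are linked through disease prevalence, which explains why PPV stays low even when sensitivity is moderate. The sketch below uses the standard Bayes relationship to back-calculate the prevalence implied by the rounded figures for women aged 40 to 49 years. This is an illustration under the assumption that the three published values are mutually consistent; it is not a calculation reported by the investigators.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from test characteristics and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def implied_prevalence(sensitivity: float, specificity: float, ppv_value: float) -> float:
    """Solve the PPV equation for prevalence."""
    false_pos_rate = 1.0 - specificity
    return (ppv_value * false_pos_rate) / (
        sensitivity * (1.0 - ppv_value) + ppv_value * false_pos_rate
    )


# Rounded estimates quoted in the text for women aged 40 to 49 years.
prevalence = implied_prevalence(sensitivity=0.71, specificity=0.84, ppv_value=0.015)
print(f"Implied prevalence at entry: {prevalence:.2%}")
```

Because the inputs are rounded, the implied prevalence is only approximate.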
Among community clinicians, screening CBE has higher specificity (97%–99%)  and lower sensitivity (22%–36%) compared with examiners in clinical trials of breast cancer screening.     A study of screening in women with a positive family history of breast cancer showed that, after a normal initial evaluation, the patient herself or her clinician performing a CBE identified more cancers than did mammography.  Another study examined the usefulness of adding CBE to screening mammography; among 61,688 women older than 40 years and screened by mammography and CBE, sensitivity for mammography was 78%, and combined mammography-CBE sensitivity was 82%. Specificity was lower for women undergoing both screening modalities than it was for women undergoing mammography alone (97% vs. 99%).  Two international trials of CBE are under way in India and Egypt.
Monthly breast self-examination (BSE) has been promoted, but there is no solid evidence that it is effective in reducing breast cancer mortality. The only large, well-conducted, randomized clinical trial of BSE randomly assigned 266,064 women factory workers in Shanghai to receive either BSE instruction with reinforcement and encouragement, or instruction on the prevention of lower back pain. Neither group received any other breast cancer screening. After 10 to 11 years of follow-up, 135 breast cancer deaths occurred in the instruction group, and 131 breast cancer deaths occurred in the control group (relative risk [RR], 1.04; 95% CI, 0.82–1.33). Although the number of invasive breast cancers diagnosed in the two groups was about the same, women in the instruction group had more breast biopsies and more benign lesions diagnosed than did women in the control group.
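A relative risk near 1 with a confidence interval that spans 1 indicates no detectable mortality difference. The sketch below approximates the calculation with the usual log-scale method. The arm sizes are not given in the text, so equal arms are ASSUMED here; the result therefore differs slightly from the published RR of 1.04 (95% CI, 0.82–1.33), which reflects the trial's actual denominators and methods.

```python
import math


def risk_ratio_ci(events_a: int, n_a: int, events_b: int, n_b: int, z: float = 1.96):
    """Risk ratio with an approximate 95% CI computed on the log scale."""
    rr = (events_a / n_a) / (events_b / n_b)
    se_log = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


total_women = 266_064
arm_size = total_women // 2  # ASSUMPTION: equal allocation, not stated in the text

rr, lower, upper = risk_ratio_ci(135, arm_size, 131, arm_size)
print(f"RR = {rr:.2f} (approximate 95% CI, {lower:.2f} to {upper:.2f})")
print("Interval includes 1:", lower < 1.0 < upper)
```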
Other research on BSE is limited. First, Leningrad investigators randomly assigned by cluster more than 100,000 women to BSE training or control. The group that received BSE training had more breast biopsies but no improvements in breast cancer mortality.  Second, in the United Kingdom Trial of Early Detection of Breast Cancer, two districts invited more than 63,500 women aged 45 to 64 years to educational sessions about BSE. After 10 years of follow-up, there was no difference in breast cancer mortality rates compared with those in women from centers without organized BSE education (RR, 1.07; 95% CI, 0.93–1.22).  Third, and last, a case-control study nested within the Canadian NBSS compared self-reported BSE frequency before enrollment with breast cancer mortality. Women who examined their breasts visually, used their finger pads for palpation, and used their three middle fingers had a lower breast cancer mortality rate. 
Various methods to analyze breast tissue for malignancy have been proposed as screening methods for breast cancer.
Random periareolar fine-needle aspirates were performed in 480 women at high risk for breast cancer, and the women were monitored for a median of 45 months.  Twenty women developed breast neoplasms (13 invasive and 7 ductal carcinoma in situ [DCIS]). Using multiple logistic regression and Cox proportional hazards analysis, a diagnosis of hyperplasia with atypia was found to be associated with the subsequent development of DCIS and invasive breast cancer.
Nipple aspirate fluid cytology was studied in 2,701 women who were monitored for subsequent incidence of breast cancer, with an average of 12.7 years of follow-up.  Breast cancer incidence overall was 4.4%, including 11 cases of DCIS and 93 cases of invasive cancer, and was associated with abnormal nipple aspirate fluid cytology. Whereas the breast neoplasm rate was only 2.6% for 352 women in whom no fluid could be aspirated, it was 5.5% for 327 women with epithelial hyperplasia and 10.3% for 58 women with atypical hyperplasia.
One study reported results of nipple aspiration followed by ductal lavage in 507 women at high risk for breast cancer. Nipple aspirate fluid was obtained from 417 women, but only 111 (27%) were adequate samples. A total of 383 ductal lavage samples were evaluated, 299 (78%) of which were adequate for diagnosis. Abnormal cells were found in 92 (24%) ductal lavage samples, including 66 (17%) with mild atypia, 23 (6%) with marked atypia, and 1 (<1%) malignant. The corresponding numbers and percentages for nipple aspirate fluid were 16 (6%), 8 (3%), and 1 (<1%). Discomfort with the ductal lavage procedure was judged by participants to be comparable with mammography. Because ductal lavage screening has not been compared with mammography, and there is no evidence of efficacy or mortality reduction, its use as a screening or diagnostic tool remains investigational.
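The sample adequacy and abnormality percentages can be checked against their stated denominators. This is a small illustrative sketch; the helper name is hypothetical and only counts quoted in the text are used.

```python
def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


# Counts quoted in the text.
naf_adequate = percent(111, 417)     # adequate nipple aspirate fluid samples
lavage_adequate = percent(299, 383)  # adequate ductal lavage samples
lavage_abnormal = percent(92, 383)   # ductal lavage samples with abnormal cells

print(f"Adequate nipple aspirate samples: {naf_adequate:.0f}%")
print(f"Adequate ductal lavage samples:   {lavage_adequate:.0f}%")
print(f"Abnormal ductal lavage samples:   {lavage_abnormal:.0f}%")
```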
Achieving balance between the benefits and harms of screening is especially important for women with a life expectancy of 5 years or less. Such women might have end-stage renal disease, severe dementia, terminal cancer, or severe comorbid disease with functional dependencies in activities of daily living. Early cancer detection and prompt treatment are unlikely to reduce morbidity or mortality within a woman's 5 years of expected survival, but the negative consequences of screening will occur immediately. Abnormal screening may trigger additional testing, with the attendant anxiety. In particular, the detection of a low-risk malignancy would probably result in a recommendation for treatment, which could impair rather than improve quality of life, without improving survival. Despite these considerations, many women with poor life expectancy as a result of age or health status undergo screening mammography. A sizable proportion of patients with advanced cancer continue to undergo cancer screening tests that do not have a meaningful likelihood of providing benefit. For example, at least one screening mammogram was received by 8.9% (95% confidence interval [CI], 8.6%–9.1%) of women with advanced cancer versus 22.0% (95% CI, 21.7%–22.5%) of controls.
Screening mammography in women older than 65 years often results in additional diagnostic testing in 85 per 1,000 women, with cancer diagnosed in nine women. The testing is often accomplished over many months, which may cause anxiety.  While screening mammography may yield cancer diagnoses in approximately 1% of elderly women, many of these cancers are low risk. A study of California Medicare beneficiaries aged 65 to 79 years demonstrated this clearly. The relative risk (RR) of detecting localized breast cancer was 3.3 (95% CI, 3.1–3.5) among screened women. Diagnosis of metastatic cancer was reduced among screened women (RR, 0.57), suggesting a benefit of mammography screening in elderly women, although it comes with an increased risk of overdiagnosis. 
There is no evidence of benefit from screening mammography in average-risk women younger than 40 years.
Approximately 1% of all breast cancers occur in men. Most cases are diagnosed during the evaluation of palpable lesions, which are generally easy to detect. Treatment consists of surgery, radiation, and systemic adjuvant hormone therapy or chemotherapy. Because the disease is rare, screening by any modality is extremely unlikely to be useful.
Screening for breast cancer has been recommended for women exposed to therapeutic radiation to the chest, especially if they were exposed at an early age. One systematic review of observational studies of women exposed to large doses (≥20 Gy) of chest radiation before age 30 years found standardized incidence ratios of 13.3 to 55.5 for breast cancer, with no plateau with increasing age.  Screening mammography and magnetic resonance imaging can identify early-stage cancers in these women, but the benefits and risks have not been clearly defined.
Although age-adjusted breast cancer incidence rates are higher in white women than in black women, mortality rates are higher in black women. Among breast cancer cases diagnosed from 2004 to 2010, 62% of white women and only 52% of black women had localized disease. The 5-year relative survival rate for localized disease was 99.1% for white women and 94.0% for black women; for regional disease, it was 86.0% for white women and 74.6% for black women; and for distant disease, it was 26.2% for white women and 16.4% for black women.  Both breast cancer incidence and mortality are lower among Hispanic and Asian/Pacific Islander women than among white and black women.  Survival in black women may be worse than in white women at least in part because of a higher frequency of adverse histologic features, such as a triple-negative phenotype. 
Several explanations for these findings have been proposed, including lower socioeconomic status, lower level of education, and less access to screening and treatment services. Population-based studies demonstrate that, compared with other groups, Medicaid recipients and uninsured patients of all races have later-stage breast cancer diagnosis, and survival from the time of diagnosis is shorter. These differences are associated with socioeconomic status and may reflect lack of participation in screening activities.   Black women older than 65 years are less likely to undergo mammogram screening. Among regular users of mammography, however, cancer was diagnosed in black and white women at similar stages. 
Similar studies of Hispanic populations have been conducted. Breast cancer stage at diagnosis in San Diego County, California, was more advanced for Hispanic women than for white women, especially for those younger than 50 years. Low-income whites were more likely to have late-stage diagnosis than high-income whites. Among Hispanic women, there was no difference according to income, but all the Hispanic groups were at or below the lowest white income level. In New Mexico, a population-based case-control study examined the reproductive histories of 719 Hispanic and 836 white women, with one-half of each group having breast cancer. The Hispanic women had higher body mass index, higher parity, and earlier pregnancies. Whereas reproductive factors such as age at first full-term birth, parity, and duration of lactation accounted for some of the ethnic differences in breast cancer incidence for postmenopausal women, there was no evidence that these factors played a role in the differences for premenopausal patients. A study of mammography screening in an Albuquerque health maintenance organization found that Hispanic women had consistently lower rates of screening than did whites (50.6% vs. 65.5% in 1989, and 62.7% vs. 71.6% in 1996). Predictors of more advanced stage at diagnosis included Hispanic ethnicity (odds ratio, 2.12) and younger age.
Informed medical decision making is increasingly recommended for individuals who are considering cancer screening. Many different types and formats of decision aids have been studied. (Refer to the PDQ summary on Cancer Screening Overview for more information.)
The study design and conduct make these results difficult to assess or combine with the results of other trials.
The reduction in breast cancer mortality at a median follow-up of 17.7 years corresponds to an absolute risk reduction of 0.1 of 1,000 (or 1 of 10,000) fewer deaths.
The evidence is inadequate to support the conclusion of a clinically significant breast cancer mortality reduction attributable to initiation of screening mammography among women aged 39 to 49 years. The reported mortality reduction is a very small, transient reduction in breast cancer mortality based on a nonstandard imaging schedule, nonstandard imaging protocol, and nonstandard threshold for biopsy; therefore, it is of uncertain relevance to the general population. In absolute terms, it corresponds to an absolute risk reduction of 0.1 of 1,000 (or 1 of 10,000) fewer deaths. Additionally, the mortality reduction is based on a re-analysis of the original data set, which was not statistically significant, and the recalculation of breast cancer mortality in a subgroup restricted to 10 years of follow-up. At 20 years of follow-up, there was no statistically significant decrease in risk of breast cancer or all-cause mortality. 
The evidence is inadequate to make a clear determination of the magnitude of overdiagnosis. Because the evidence is based on subgroup analysis and nonstandard imaging schedule, nonstandard imaging protocol, and a nonstandard threshold for biopsy with uncertain relevance to the general population, it does not support the investigators' conclusion of “at worst a small amount of overdiagnosis.”
The PDQ cancer information summaries are reviewed regularly and updated as new information becomes available. This section describes the latest changes made to this summary as of the date above.
Added text to state that overdiagnosis occurs when screening procedures detect cancers that would never become clinically significant. Also added that because nearly all cases of cancer and ductal carcinoma in situ (DCIS) will be treated, women with clinically insignificant cancers will suffer treatment-related side effects unnecessarily.
Added text to state that one approach to understanding overdiagnosis is to examine the prevalence of occult cancer in women who died of noncancer causes; in an overview of seven autopsy studies, the median prevalence of occult invasive breast cancer was 1.3% and of DCIS was 8.9% (cited Welch et al. as reference 16 and Black et al. as reference 17).
Added text to state that overdiagnosis can be indirectly measured by comparing breast cancer incidence in screened populations with breast cancer incidence in unscreened populations, and these comparisons can be further complicated by differences in the populations, such as time, geography, health behaviors, and hormone usage. Included text to state that calculations of overdiagnosis can vary in the adjustment for lead-time bias (cited Duffy et al. as reference 18 and Gøtzsche et al. as reference 19). Also added text to state that an overview of 29 studies found calculated rates of overdiagnosis of 0% to 54%, with rates from randomized studies between 11% and 22%. Additionally, in Denmark, where screened and unscreened populations existed concurrently, the rate of overdiagnosis of invasive cancer was calculated to be 14% and 39%, using two different methodologies; however, if DCIS cases were included, the overdiagnosis rates were 24% to 48% (cited Nelson et al. as reference 20 and Jørgensen et al as reference 21).
Added text to state that theoretically, in a given population, the detection of more breast cancers at an early stage should result in a subsequent reduction in the incidence of advanced-stage cancers; thus, the detection of more early-stage cancers through screening probably represents overdiagnosis.
This summary is written and maintained by the PDQ Screening and Prevention Editorial Board, which is editorially independent of NCI. The summary reflects an independent review of the literature and does not represent a policy statement of NCI or NIH. More information about summary policies and the role of the PDQ Editorial Boards in maintaining the PDQ summaries can be found on the About This PDQ Summary and PDQ® - NCI's Comprehensive Cancer Database pages.
This PDQ cancer information summary for health professionals provides comprehensive, peer-reviewed, evidence-based information about breast cancer screening. It is intended as a resource to inform and assist clinicians who care for cancer patients. It does not provide formal guidelines or recommendations for making health care decisions.
This summary is reviewed regularly and updated as necessary by the PDQ Screening and Prevention Editorial Board, which is editorially independent of the National Cancer Institute (NCI). The summary reflects an independent review of the literature and does not represent a policy statement of NCI or the National Institutes of Health (NIH).
Board members review recently published articles each month to determine whether an article should:
Changes to the summaries are made through a consensus process in which Board members evaluate the strength of the evidence in the published articles and determine how the article should be included in the summary.
Any comments or questions about the summary content should be submitted to Cancer.gov through the NCI website's Email Us. Do not contact the individual Board Members with questions or comments about the summaries. Board members will not respond to individual inquiries.
Some of the reference citations in this summary are accompanied by a level-of-evidence designation. These designations are intended to help readers assess the strength of the evidence supporting the use of specific interventions or approaches. The PDQ Screening and Prevention Editorial Board uses a formal evidence ranking system in developing its level-of-evidence designations.
PDQ is a registered trademark. Although the content of PDQ documents can be used freely as text, it cannot be identified as an NCI PDQ cancer information summary unless it is presented in its entirety and is regularly updated. However, an author would be permitted to write a sentence such as “NCI’s PDQ cancer information summary about breast cancer prevention states the risks succinctly: [include excerpt from the summary].”
The preferred citation for this PDQ summary is:
PDQ® Screening and Prevention Editorial Board. PDQ Breast Cancer Screening. Bethesda, MD: National Cancer Institute. Updated <MM/DD/YYYY>. Available at: https://www.cancer.gov/types/breast/hp/breast-screening-pdq. Accessed <MM/DD/YYYY>. [PMID: 26389344]
Images in this summary are used with permission of the author(s), artist, and/or publisher for use within the PDQ summaries only. Permission to use images outside the context of PDQ information must be obtained from the owner(s) and cannot be granted by the National Cancer Institute. Information about using the illustrations in this summary, along with many other cancer-related images, is available in Visuals Online, a collection of over 2,000 scientific images.
The information in these summaries should not be used as a basis for insurance reimbursement determinations. More information on insurance coverage is available on Cancer.gov on the Managing Cancer Care page.