The diagnostic accuracy of urine-based tests for bladder cancer varies greatly by patient

Background Spectrum effects refer to the phenomenon that test performance varies across subgroups of a population. When spectrum effects occur during diagnostic testing for cancer, patients may be misdiagnosed. Our objective was to evaluate the effect of test indication, age, gender, race, and smoking status on the performance characteristics of two commonly used diagnostic tests for bladder cancer, urine cytology and fluorescence in situ hybridization (FISH).
Methods We assessed all subjects who underwent cystoscopy, cytology, and FISH at our institution from 2003 to 2012. The standard diagnostic test performance metrics were calculated using marginal models to account for clustered/repeated measures within subjects. We calculated test performance for the overall cohort by test indication as well as by key patient variables: age, gender, race, and smoking status.
Results A total of 4023 cystoscopy-cytology pairs and 1696 FISH-cystoscopy pairs were included in the analysis. In both FISH and cytology, increasing age, male gender, and history of smoking were associated with increased sensitivity and decreased specificity. FISH performance was most impacted by age, with an increase in sensitivity from 17 % at age 40 to 49 % at age 80. The same was true of cytology, with an increase in sensitivity from 50 % at age 40 to 67 % at age 80. Sensitivity of FISH was higher for a previous diagnosis of bladder cancer (46 %) than for hematuria (26 %). Test indication had no impact on the performance of cytology and race had no significant impact on the performance of either test.
Conclusions The diagnostic performance of urine cytology and FISH varies significantly according to the patient population in which they are used. Hence, the reporting of spectrum effects in diagnostic tests should become part of standard practice. Patient-related factors must contextualize the clinicians’ interpretation of test results and their decision-making.


Background
In 2015, bladder cancer (BC) accounted for 4.5 % of all new cancers in the US, with over 74,000 cases, and was the 5th most common cancer [1]. Typically, it presents with hematuria, and 70 % of patients with BC initially have non-muscle invasive bladder cancer (NMIBC). NMIBC has a high chance of recurrence (60-85 %) and requires long term surveillance [2]. Several guidelines exist for the management of non-muscle invasive bladder cancer, and include cystoscopy and urine-based tests for initial screening and recurrence surveillance [3][4][5].
Cystoscopy is the community gold standard for the detection of bladder tumors, and identifies nearly all papillary and sessile tumors [6]. However, it is invasive and a source of distress for patients. It also has a limited ability to detect occult microscopic disease or the presence of tumors in atypical locations. Microscopic disease is of particular importance in BC because of prevalent field effect [7]. While urethral cancer is a rare event [8], upper tract tumors (UTUC) account for 5-10 % of urothelial cell carcinoma and may lead to increased morbidity and mortality if missed [9]. Therefore, guidelines recommend adjunctive tests for detection of BC [3][4][5]. The two most common urine-based tests are voided urine cytology and UroVysion™ (Vysis, Downers Grove, IL) fluorescence in situ hybridization (FISH) assay. Most physicians and their patients will assume that a positive urine test indicates the presence of a tumor, and will aggressively pursue a diagnosis.
The majority of physicians believe that a urine test will perform similarly in all patient populations, but this may be a false assumption. Test performance often varies across patient subgroups, a phenomenon termed spectrum effects [10][11][12]. Although reporting spectrum effects for a given test is endorsed by the STARD initiative, it is uncommon in practice [13]. We are the first to evaluate the existence of spectrum effects in cytology and FISH among patients being screened because of hematuria or undergoing surveillance of NMIBC. Our hypothesis is that test performance varies according to patient characteristics. We analyzed the diagnostic performance by test indication as well as four clinically significant demographic variables: age, gender, race, and smoking status. The objective of this study was to determine the presence and magnitude of spectrum effects occurring in cytology and FISH of a large contemporary cohort undergoing bladder cancer screening.

Subject selection
After approval by the Duke University Health System Institutional Review Board, all subjects who underwent cystoscopy and cytology and/or UroVysion FISH at Duke University Medical Center (DUMC) between 1/2003 and 1/2012 for either hematuria evaluation or surveillance of bladder cancer were identified. As the data for the study were obtained through retrospective chart review, a waiver of informed consent was approved by the IRB. For patients with signs or symptoms of urinary tract infection, the standard practice at our institution was to collect a urine specimen for culture, treat the patient with culture-specific antibiotics, and delay cystoscopy and urine marker testing for 2-4 weeks to avoid confounding the results.

Cystoscopy as the diagnostic gold standard for bladder tumor
White light cystoscopy, the community gold standard in diagnosis of bladder tumors, was used to determine the presence or absence of a bladder tumor [4][14][15][16]. Cystoscopy was chosen over biopsy as the standard against which urine tests were compared because a biopsy is obtained only in subjects with an abnormal cystoscopy or urine test, which would subject the results to considerable verification bias [17]. Cystoscopy results were classified as positive, suspicious, or negative. A positive cystoscopy serves as a surrogate for histopathology, as nearly all visible tumors are malignant [6]. We required that cystoscopy occur within +/− 30 days of the urine-based test to serve as the gold standard.
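The +/− 30 day pairing requirement can be illustrated with a minimal sketch. The function name `pair_with_cystoscopy` and the rule of choosing the nearest cystoscopy when several fall inside the window are assumptions made for illustration; the text above states only the window itself.

```python
from datetime import date

WINDOW_DAYS = 30  # pairing window stated in the text


def pair_with_cystoscopy(urine_test_date, cystoscopy_dates, window_days=WINDOW_DAYS):
    """Return the cystoscopy date paired with a urine test, or None.

    A cystoscopy qualifies if it falls within +/- window_days of the urine test.
    Picking the nearest qualifying date is an illustrative assumption.
    """
    candidates = [
        d for d in cystoscopy_dates
        if abs((d - urine_test_date).days) <= window_days
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda d: abs((d - urine_test_date).days))


# Hypothetical dates, not study data.
paired = pair_with_cystoscopy(
    date(2010, 3, 15),
    [date(2010, 1, 1), date(2010, 3, 1), date(2010, 4, 10)],
)
print(paired)
```

Urine tests with no cystoscopy inside the window return `None` and would be excluded from the paired analysis under this sketch.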

Cytology
Urine samples received in the Cytology Preparatory Laboratory were prepared as ThinPrep slides (Cytyc Corporation, Marlborough, MA). After samples were centrifuged at 2800 rpm for 5 min, the supernatant was removed to produce a cell pellet. Cell pellets were washed with Cytolyt Solution. Two to three drops of each patient sample were transferred into PreservCyt Solution and fixed for 15 min. ThinPrep slides were then produced by loading the samples into the ThinPrep 2000 Processor. The ThinPrep slides were stained with Papanicolaou stain, cover-slipped and then screened by a cytotechnologist before being evaluated by a cytopathologist. More than one cytopathologist was involved in the analysis of the urine specimens during the study interval. After cytological evaluation, the specimens were classified into one of four categories: negative, atypical, suspicious for malignancy, or positive for malignancy.

UroVysion FISH test
Patient samples for UroVysion FISH were prepared according to manufacturer recommendations (Abbott Molecular Inc., Abbott Park, IL). The UroVysion Probe mixture contains chromosome enumeration probes (CEPs) labeled with Spectrum Red for visualization of chromosome 3, Spectrum Green for visualization of chromosome 7 and Spectrum Aqua for visualization of chromosome 17, as well as a locus specific probe for 9p21 labeled with Spectrum Gold. The slides were counterstained with DAPI and visualized with a fluorescence microscope equipped with the appropriate filters for signal enumeration of each fluorophore. A minimum of 25 morphologically abnormal cells per test were analyzed. A positive UroVysion FISH result was defined as meeting one or more of the following criteria: (i) ≥ 4 cells with gains of 2 or more chromosomes 3, 7, and 17 in the same cell, (ii) ≥ 10 cells with tetrasomy of chromosomes 3, 7, and 17, (iii) ≥ 10 cells showing gains of a single chromosome 3, 7, or 17, or (iv) ≥ 12 cells with homozygous loss of 9p21 locus [18].
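The four criteria above amount to a simple any-of rule over per-specimen cell counts. The sketch below encodes the thresholds as listed in the text; the `FishCounts` container and its field names are hypothetical and are not part of the UroVysion kit or any laboratory software.

```python
from dataclasses import dataclass


@dataclass
class FishCounts:
    """Per-specimen cell counts (hypothetical field names)."""
    multi_gain_cells: int            # gains of 2+ of chromosomes 3, 7, 17 in one cell
    tetrasomy_cells: int             # tetrasomy of chromosomes 3, 7, and 17
    single_gain_cells: int           # gain of a single chromosome 3, 7, or 17
    homozygous_9p21_loss_cells: int  # homozygous loss of the 9p21 locus


def fish_positive(counts):
    """Apply criteria (i)-(iv); meeting any one of them is sufficient."""
    return (
        counts.multi_gain_cells >= 4                  # criterion (i)
        or counts.tetrasomy_cells >= 10               # criterion (ii)
        or counts.single_gain_cells >= 10             # criterion (iii)
        or counts.homozygous_9p21_loss_cells >= 12    # criterion (iv)
    )


# Hypothetical specimens, not study data.
print(fish_positive(FishCounts(4, 0, 0, 0)))
print(fish_positive(FishCounts(3, 9, 9, 11)))
```

The second specimen sits one cell below every threshold, so it is the negative case closest to the decision boundary.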

Statistical methods
Diagnostic test performance metrics and 95 % confidence intervals (95 % CI) were calculated using logistic models: (a) a generalized estimating equation (GEE) using an exchangeable (compound symmetry) covariance structure [19], and (b) a generalized linear mixed model (GLMM) [19]. While both models take into account clustered/correlated test results that occur due to repeated testing within subjects, they are different techniques and results are interpreted differently [20]. The GEE is a marginal model that is interpreted as "population-averaged," whereas the GLMM is a conditional model interpreted in a "subject-specific" manner [21]. Sensitivity and specificity were calculated for the overall cohort as well as by indication, age, gender, race, and smoking status subgroups. Age was analyzed as a continuous variable, but the results are presented in age decades for ease of interpretation. Indication, gender, race, and smoking status were analyzed as categorical variables. Subjects were stratified as "Never smokers," "Former smokers," or "Current smokers," as indicated in their electronic medical charts. Smoking status was available on all patients in both the cytology and FISH cohorts. A two-sided p-value of 0.05 was used to define statistical significance. Statistical analyses were conducted using R 3.1.3 with packages lme4, geepack, and BSagri installed.
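To make the clustering issue concrete, the sketch below computes pooled sensitivity and specificity from test/cystoscopy pairs and attaches a confidence interval by resampling whole subjects. This is a subject-level cluster bootstrap, a deliberately simpler stand-in and not the GEE or GLMM used in the study; it shares only the idea that repeated tests within a subject should not be treated as independent. All names and the toy records are hypothetical.

```python
import random
from collections import defaultdict


def sens_spec(pairs):
    """Pooled sensitivity and specificity from (test_positive, reference_positive) pairs."""
    pairs = list(pairs)
    tp = sum(1 for t, r in pairs if t and r)
    fn = sum(1 for t, r in pairs if not t and r)
    tn = sum(1 for t, r in pairs if not t and not r)
    fp = sum(1 for t, r in pairs if t and not r)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


def cluster_bootstrap_ci(records, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CIs obtained by resampling subjects (not individual tests)."""
    by_subject = defaultdict(list)
    for subject_id, test_pos, ref_pos in records:
        by_subject[subject_id].append((test_pos, ref_pos))
    subjects = sorted(by_subject)
    rng = random.Random(seed)
    draws = {"sensitivity": [], "specificity": []}
    for _ in range(n_boot):
        resampled = []
        for sid in rng.choices(subjects, k=len(subjects)):
            resampled.extend(by_subject[sid])
        sens, spec = sens_spec(resampled)
        if sens == sens:  # NaN != NaN, so undefined draws are skipped
            draws["sensitivity"].append(sens)
        if spec == spec:
            draws["specificity"].append(spec)

    def percentile_interval(values):
        values = sorted(values)
        lo = values[int((alpha / 2) * (len(values) - 1))]
        hi = values[int((1 - alpha / 2) * (len(values) - 1))]
        return lo, hi

    return {name: percentile_interval(v) for name, v in draws.items()}


# Hypothetical records: (subject_id, urine_test_positive, cystoscopy_positive).
records = [
    ("A", True, True), ("A", False, True), ("A", False, False),
    ("B", True, False), ("B", False, False),
    ("C", True, True), ("C", True, True),
    ("D", False, False), ("D", False, True),
]
print(sens_spec((t, r) for _, t, r in records))
print(cluster_bootstrap_ci(records, n_boot=500))
```

Resampling at the subject level keeps each subject's repeated tests together, which is the same dependence structure the marginal and conditional models are designed to respect.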

Results
The diagnostic performance of urine cytology is shown in Table 2 and Fig. 1. Increasing age was associated with an increase in sensitivity and decrease in specificity of urine cytology. Sensitivity increased by 17 percentage points, from 50 % in subjects ≤40 years to 67 % in those ≥80 years. In contrast, specificity declined from 53 % in subjects ≤40 years of age to 36 % in subjects ≥80 years of age. Gender had the greatest impact on cytology performance. Subject-specific estimates of sensitivity derived from the GLMM model were dramatically higher in men than women (67 % vs 51 %), though specificity was lower (36 % vs 53 %). In subjects with a history of smoking, cytology was 10 % more sensitive and proportionally less specific compared with subjects who had never smoked. Race and indication did not significantly impact cytology test performance in either of the models.
The diagnostic performance of UroVysion FISH is shown in Table 3 and Fig. 2. Again, increasing subject age was associated with increased sensitivity and decreased specificity. Subject-specific estimates of test sensitivity obtained from the GLMM model nearly tripled from 17 % in subjects ≤40 years of age to 49 % in those ≥80 years of age. Conversely, specificity decreased from 93 % in subjects ≤40 years of age to 74 % in those ≥80 years of age. The UroVysion FISH test was substantially less sensitive [...]
For all the above analyses, suspicious cystoscopies were considered positive since they will generally result in intervention (e.g., bladder biopsy). To determine whether the classification of suspicious cystoscopies dramatically affected our results, we repeated the analyses with suspicious cystoscopies classified as negative and found no significant difference in our results, demonstrating that the performance of cytology and UroVysion FISH is not sensitive to how suspicious cystoscopies are classified. This stands in contrast to the large effect seen in cytology with a similar re-analysis that was mentioned above.
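The re-analysis described above only changes how the three-level cystoscopy result is collapsed to a binary reference before sensitivity and specificity are recomputed. A minimal sketch of that recoding follows; the function names and the toy data are hypothetical, and the pooled proportions here ignore the within-subject clustering that the study's models account for.

```python
def binarize_cystoscopy(result, suspicious_as_positive=True):
    """Collapse 'positive' / 'suspicious' / 'negative' to a boolean reference."""
    if result == "positive":
        return True
    if result == "suspicious":
        return suspicious_as_positive
    if result == "negative":
        return False
    raise ValueError(f"unexpected cystoscopy result: {result!r}")


def pooled_sens_spec(pairs, suspicious_as_positive=True):
    """pairs: list of (urine_test_positive, cystoscopy_result)."""
    tp = fn = tn = fp = 0
    for test_pos, result in pairs:
        ref_pos = binarize_cystoscopy(result, suspicious_as_positive)
        if ref_pos and test_pos:
            tp += 1
        elif ref_pos:
            fn += 1
        elif test_pos:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical pairs, not study data.
pairs = [
    (True, "positive"), (False, "positive"),
    (True, "suspicious"), (False, "suspicious"),
    (False, "negative"), (True, "negative"), (False, "negative"),
]
print(pooled_sens_spec(pairs, suspicious_as_positive=True))
print(pooled_sens_spec(pairs, suspicious_as_positive=False))
```

Comparing the two outputs shows how much of any difference is driven purely by the reference-standard coding rather than by the urine test itself.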

Discussion
Spectrum effects were first described by Mulherin et al. as inherent variations in diagnostic test performance among different subgroup populations [12]. We have shown that urine-based tests for bladder cancer (a) have poor diagnostic performance and (b) vary substantially in accuracy in different patient populations. However, the recognition of spectrum effects allows for a strategy that should result in a clinically important gain for the patient.
We stratified our cohort into four clinically relevant subgroups and found that age, male gender, and a history of smoking were all associated with increased sensitivity in both cytology and UroVysion. Smoking and aging are associated with altered cellular biology which might lead to changes detectable by cytology or UroVysion [22]. Epidemiologically, age and cigarette smoking have also been associated with more advanced disease at initial presentation [22,23]. It is possible that the improvement in sensitivity of cytology and UroVysion is due to more advanced disease at presentation in these demographics. Horstmann et al. found that age was associated with higher false positive rates in cytology and the NMP22 assay, which would translate to decreased specificity and is consistent with our results [24]. The analysis by indication also revealed increased sensitivity for UroVysion but not cytology when used for cancer surveillance compared to hematuria. This may also be a reflection of advanced disease in that population. Interestingly, Dimashkieh et al. found that both UroVysion and cytology are slightly more sensitive in the context of cancer surveillance than in hematuria [25].
Disease severity fails to explain why both tests were more sensitive in males than females. While the incidence of bladder cancer is three to four times higher in men, women tend to present with more advanced disease [26,27]. An alternative explanation for the gender disparity we observed is that gender-specific genetic differences are affecting test performance. Recent studies have found gender differences at a cellular level, and postulate that cells have a "sex" [28]. Shen et al. have elucidated gender differences in bladder cancer biology thought to be related to differential expression of sex steroid receptors on urothelial cells [29]. Specifically, the beta subunit of the estrogen receptor is the predominant receptor expressed in the majority of bladder cancers, and a positive correlation exists between degree of estrogen receptor expression and tumor grade and stage [29]. These gender differences in cancer biology may result in differences in cytologic morphology. Distinct patterns of chromosomal abnormalities between the genders have been described in other cancers and it is possible that the specific chromosomal aberrations detected by the UroVysion test result in improved sensitivity in men [30].
Proper stratification into relevant subgroups allows for recognition of important spectrum associations [31]. There is value in discerning between low grade and high grade lesions; high grade should be detected as early as possible, while the likelihood of missing such a tumor should be as low as possible. In high risk populations, sensitivity is more important than specificity because the consequences of a missed malignancy are great. FISH exhibits such properties in the smoking subgroup, whereas cytology does not have similar characteristics in the same population. Therefore, a clinician should give stronger consideration to FISH results than cytology results in smokers. Analogous spectrum effects can be seen for indication and cytology.
There are other patient populations where the risks of a procedure often outweigh the benefit. In these populations, a urine test with high specificity and low sensitivity in low grade disease is preferable, because it reduces the number of unnecessary invasive procedures. Age and cytology illustrate this effect because as the patient age increases, so does the specificity, with a reciprocal decrease in sensitivity. This should spare the elderly patient avoidable cystoscopies. The tradeoff would be that some tumors may be missed for a period of time, but the literature surrounding active surveillance suggests this is safe [32].

Limitations
Our study was retrospective and longitudinal in nature, leaving us unable to control for significant variables, such as the EORTC risk scores, that predict the probability of recurrence and progression of bladder cancer. With 19 % of cystoscopies in the cytology cohort classified as positive, this cohort was at higher risk for bladder cancer than the average US population. Additionally, while the sensitivity could have been improved with narrow band imaging or fluorescent cystoscopy, these technologies were not available at our institution for the entirety of the study period. For the purposes of our analyses, suspicious lesions on cystoscopy were classified as positive. When we correlated this classification with pathology, only 59 % of pathology specimens were found to have cancer, reflecting a limitation of this classification. However, when we performed a sensitivity analysis with suspicious cystoscopies classified as negative, our results were not significantly different, indicating a minimal impact of this limitation on the interpretation of the results. The data were collected over a 10 year time frame, so indications for using the tests have changed over time, as have the techniques for verifying test results. Furthermore, more than one cytopathologist was involved over the period examined and the literature suggests high inter-observer discrepancy, but this reflects the real world. Urine cytology has a low sensitivity and is highly operator-dependent in the setting of low grade disease [33]. In experienced hands, however, specificity is about 90 % [34]. Indeed, our own data support this conclusion, and show an increasing percentage of reported atypical/suspicious cytologies over time (Fig. 3).

Conclusions
We are the first to show that urine-based bladder cancer tests display spectrum effects. The reporting of spectrum effects in diagnostic tests should become part of standard practice. Knowledge of these effects allows the physician to properly interpret the results and has a meaningful impact on a patient's clinical care.