A systematic review of the diagnostic accuracy of prostate specific antigen

Background Prostate cancer is the fourth commonest cancer in the UK, and the incidence is rising. The reference standard for diagnosing this condition is prostate biopsy, an invasive procedure. This study systematically reviews recent literature on tPSA accuracy. The time period was restricted to ensure that the estimates referred to contemporary tPSA tests and prostate cancer reference standards. The focus of this review was restricted to European populations as tPSA levels are known to vary by population group. Methods Medline was searched (from 01/1998 to 01/2008) and Embase (from 01/1998 to 01/2008), which returned 3087 citations. These were assessed by 6 reviewers, who shortlisted 54 of possible relevance. 2 reviewers assessed each using the following inclusion criteria: data collection between 1998-2008; tPSA measurements for all participants; histological confirmation of the diagnosis; samples from a European population and sufficient data to calculate 2 × 2 tables. The final set of 10 included studies represented 5373 participants. Quality of the included studies was assessed in duplicate using criteria suggested by the Cochrane Collaboration. Review Manager 5.0 software was used to analyse the data, including plotting a series of summary receiver operator curve spaces (SROC). Results tPSA sensitivities ranged from 0.78 to 1.00 and specificities from 0.06 to 0.66. Positive likelihood ratios ranged from 0.83 to 2.90 and negative likelihood ratios ranged from 0.00 to 3.75 Conclusion tPSA has a role to play as one of several indicators for prostate biopsy along with abnormal digital rectal examination and urinary symptoms. However, tPSA test has a high false positive and significant false negative rate. It is important that clinicians understand these limitations.


Background
There were 28,886 newly diagnosed cases of prostate cancer in 2005 in England, comprising 24.1% of all cancers in men in that year. It is the fourth commonest cancer in the UK [1]. In 2005 9024 men died of prostate cancer, mostly between the ages of 80-84 [2]. The incidence of prostate cancer in European men rose from 202,100 in 2004 [3], to 301,500 in 2006 [4]. It remained the commonest cause of cancer in European men and the third commonest cause of death. The incidence is rising, partly due to our ageing population [5]. Total prostate specific antigen (tPSA) testing has risen significantly from 1999 to 2002 [6]. It is therefore important that the validity of the tPSA test be fully understood to ensure appropriate testing and referral for further investigations.
However tPSA testing was not intended as a diagnostic test, but for identifying individuals requiring further investigation [7]. NICE currently recommend the Prostate Cancer Risk Management Programme's age specific ranges for tPSA cut off levels (Table 1). NICE describe the test as moderately sensitive and specific [8]. However no evidence is provided for this, despite claiming their judgement is based on secondary research and selected primary research. Patients with a tPSA higher than the given level are recommended to undergo further investigation. However, there is great variation in clinical practice within the UK [5], with some trusts using a single cut off value of 4 ng/mL and some using the age specific ranges as shown in table 1.
tPSA is an inherent part of the prostate cancer diagnosis pathway. This comprises of symptoms, digital rectal examination (DRE), tPSA level and transurethral ultrasound guided biopsy (TRUS) (Figure 1). Although it is not certain what the diagnostic pathway would be like in the absence of tPSA, it seems likely that virtually all patients with suspicious clinical findings would require biopsy.
In the past few years there has been substantial debate regarding the role of tPSA as a diagnostic tool. There is a large quantity of literature available on tPSA and a simple search of the term PSA OR Prostate Specific Antigen in PubMed gives 20,469 hits. Furthermore the introduction of screening in the USA has brought discussion of whether a similar screening programme should be introduced in Europe. At the centre of this debate is the uncertainty of the diagnostic accuracy of the tPSA test. The focus of this review will be the diagnosis of prostate cancer in patients presenting with symptoms. However other forms of PSA testing such as PSA velocity, PSA density and free to total PSA ratios are not assessed. This is a reflection of clinical practice as they are of limited value and not included in the European guidance [9].
In 1999 the World Health Organisation (WHO) established a reference standard for total PSA measurement [10]. Since then, differences have decreased between the results of different assay methods [11]. Also, the Standard for the Reporting of Diagnostic Accuracy Studies (STARD) was formulated in September 2000, to improve the accuracy and completeness for reporting diagnostic accuracy studies. Its aim is to encourage a more standardised and transparent format for diagnostic test studies [12]. This review upholds these standards.

Objectives
In this systematic review we aim to assess the sensitivity and specificity of the PSA test in the diagnosis of prostate cancer.

Participants
Men participating in studies of prostate cancer diagnosis, carried out in Europe. No age restrictions were used. Flow diagram depicting the diagnostic pathway for prostate cancer Figure 1 Flow diagram depicting the diagnostic pathway for prostate cancer.

Index tests
Total prostate specific antigen testing. Only papers in which a full range of tPSA from <4 ng/mL to >10 ng/mL as this reflects the standard European practice.

Target conditions
Prostate cancer, irrespective of Gleason or TNM score.

Reference standards
Histological confirmation of prostate cancer either from a biopsy or resected specimen.

Search methods for identification of studies
Electronic searches of MEDLINE and EMBASE were performed.

Electronic searches
The

Data collection and analysis
Selection of studies A list of articles meeting the inclusion criteria based on abstracts was complied. These studies and those of uncertain relevance were retrieved in full text and split into three groups based on date of publication. Two reviewers independently evaluated each group of studies for inclusion, with any discrepancies being discussed with a third reviewer until a final set of relevant studies was agreed.

Data extraction and management
The following data was extracted from each study: The data was entered into Review Manager 5.0 software.

Assessment of methodological quality
The methodological quality for each paper was assessed by 2 reviewers independently using the QUADAS [13] criteria [see Additional file 1].

Statistical analysis and data synthesis
Sensitivity, specificity, true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN) were taken directly from the source papers. If this was not possible, values were calculated from the data that was provided. Positive and negative likelihood ratios, diagnostic odds ratios, and 95% confidence intervals were calculated. The data was displayed graphically on forest and SROC plots. The SROC curve was fitted using the Littenberg-Moses method.

Investigations of heterogeneity
Heterogeneity between studies was assessed and subgroup analyses were performed using study design as the main variable.

Results
The Espana 1998 performed two tPSA tests on each participant using the second for the main statistical analysis.
Only the first test data fit the criteria for this review, therefore this data was used.
Aragona 2005 involved 16,298 participants. However only 3,171 underwent biopsy and hence only these patients were included in our analysis. Figure 2 shows the overall quality of the 10 included studies. In particular, the explanation of withdrawals and index blinding were poorly reported. It is also unclear whether uninterpretable results were reported. These are potentially important sources of bias.

Methodological quality of included studies
A breakdown of the methodological quality can be found in Figure 3. It can be seen that the Espana 1998 study had low quality index blinding and withdrawal explanation. The Aragona 2005 study had low quality reporting of uninterpretable results; eight of the patients in this study were lost without explanation. This is significant as this study contributed the largest proportion of our total data. The Unal 2000 study had low quality reporting of index blinding. This is perhaps less significant as this study is the smallest and blinding of the index test has a minimal effect on the PSA test and its interpretation. However the small size of this study has meant that findings, such as a false negative rate of 0, need to be interpreted cautiously. Table 2  Positive and negative likelihood ratios and diagnostic odds ratios (DORs) are displayed in Table 3. DORs are displayed in a forest plot in Figure 5. Only 9 of the 10 studies were plotted as the confidence intervals (CIs) of the odds ratio for the Unal 2000 study could not be calculated.

Statistical analysis
All studies have a DOR above 1 except Espana 1998 and Wymenga 2000. Fischer 2005 has an odds ratio of 1.729 but the CI crosses 1. The remaining six studies all have their odds ratios above 1 and CIs that do not include 1, implying that the positive association of tPSA with prostate cancer is not accounted for by chance alone in these Methodological quality graph: review authors' judgments about each methodological quality item presented as per-centages across all included studies studies. Aragona 2005 has a very narrow CI compared to the other studies, which may be attributed to its large size.
Positive likelihood ratios (PLR) were above 1 for all studies except Espana 1998 and Wymenga 2000, indicating that a raised tPSA is associated with prostate cancer. However all PLRs are below 10, the threshold generally accepted for a useful test. The same eight studies have negative likelihood ratios (NLR) less than 1 indicating that a low tPSA is correctly associated with not having the disease. However only one meets the accepted level of less than 0.1.
The SROC curve [ Figure 6] lies to the left of the diagonal signifying that the tPSA test has value. The SROC analysis was further developed by placing the studies into subgroups based up trial design. [ Figure 7].

Subgroup analysis
To explore the heterogeneity, subgroups of cohort and case control, and pre-1999 and post-1999 were created. Summary receiver operating characteristic (SROC) curves were plotted for each subgroup. Figure 7 shows greater test accuracy in the case control subgroup, figure 8

Discussion
Overall tPSA seems to have significance as a diagnostic tool. DORs ranged from 0.221 to 4.16. All but two studies gave DORs greater than 1. The SROC curve [ Figure 6] lies to the left of the diagonal signifying that the PSA test has value. Adequate levels of sensitivity appear to be achieved at the expense of poor specificity, with consequently relatively high numbers of false positive results.
Espana 1998 and Wymenga 2000 had DORs below 1. This suggests that the PSA of more than 4 ng/ml is inversely associated with prostate cancer. Potential sources of bias for Espana 1998 are its age and poor explanation of withdrawn participants. Its small size makes it Forest plot of sensitivity and specificity of tPSA testing Figure 4 Forest plot of sensitivity and specificity of tPSA testing.
especially vulnerable to these factors. Wymenga 2000 was a cohort study which did not exclude borderline cases of raised PSA. This gives the appearance of poorer test accuracy, compared to a case-control study.
Unal 2000's isolated position in the top left on the SROC curve produces the most significant results supporting the use of tPSA as a diagnostic test. Its false negative rate of 0 is likely to be due to the small size of the study and its case control design. Despite its outlying results its high methodological quality warrants its inclusion.
In contrast, Aragona 2005 is the largest included study and has the narrowest CI. Its sensitivity and specificity lies within the main cluster of results close to the SROC curve, adding weight to our findings.

Strengths and weaknesses of the review
A large number of abstracts were reviewed. With regards to study design, suitable publications may have been omitted due to the sole use of electronic searches, reviewer error or limited search terms. Further detail could be added to the searches, including the use of limited text terms. Publication bias may occur but there is no consensus on its importance [14] or how to assess the impact on this on systematic reviews of diagnostic test accuracy [15].
As the current use of tPSA in clinical practice is debated, it is unclear whether publication bias would exclude papers showing a low test accuracy or high test accuracy.
The populations of our studies were limited to men attending urology clinics because of referral due to clinical evidence in primary or secondary care, such as an abnormal DRE or raised PSA. This means that our results cannot be applied to the screening population. In this setting there would be a lower prevalence of prostate cancer so it Forest plot of Diagnostic Odds Ratios is likely that PSA would have less accuracy as a diagnostic test since the specificity has been shown to be low. Also, the populations studied were European men. A more detailed breakdown of the race of the study populations would have provided us with more information on sources of heterogeneity. Overall the results can be applied to symptomatic European men in the primary and secondary healthcare setting.
The subgroup analyses can show valuable results, however there are some limitations. Firstly, there is overlap between case control and cohort studies. It was sometimes difficult to place the studies into these categories. Secondly, for our pre-and post-1999 analysis there was only 1 study in the pre-1999 group.
Pre-1999 many assays were widely used for the detection of total PSA, for example Tandem-E, Tandem-R, Immulite 2000, ADIVA Centaur and Roche 2. There has been significant difference in the results using the various assays [16][17][18]. Also as mentioned earlier, in 1999 the World Health Organisation established a reference standard for total PSA measurement [5]. Since then, differences have decreased between the results of different assay methods [6].
Ultimately, the lack of large studies on Europeans which were suitable for our analysis was the main limitation of this review.

Applicability of findings to clinical practice and policy
PSA testing is clearly a vital part of the diagnostic pathway. We have previously discussed the limitations of the study populations. However, we can apply our results to patients who are referred for a biopsy. This is useful for general practitioners and urologists to reassure patients with a raised PSA.
We have found that the PSA test had a sensitivity ranging from 0.78 to 1.00, which means it potentially fails to diagnose over 20% of prostate cancers. This is important to consider in patients with continuing symptoms or an isolated, abnormal DRE. Good quality counselling and information needs to be given to patients to ensure they present again if symptoms persist or worsen. The DRE needs to remain a key part in the diagnostic pathway.
PSA is known to have low specificity, however our results show an extremely low range of 0.06 to 0.66. All but Unal 2000 showed a specificity of less than 0.40. This is in contrast to a moderate specificity as stated by NICE in the most recent guidelines on referral practice for suspected cancer in adults and children. Such a low specificity means that in practice many patients are undergoing the invasive procedure of biopsy who do not in fact have prostate cancer. However there is currently no alternative that has been recommended by NICE for use in clinical practice. It might be interesting to sub-analyse the data accord-SROC curves comparing the study using the pre 1999 PSA assay and the studies using assays from 1999 and onwards Figure 8 SROC curves comparing the study using the pre 1999 PSA assay and the studies using assays from 1999 and onwards.
Summary ROC Plot of PSA testing taking account of trial design Figure 7 Summary ROC Plot of PSA testing taking account of trial design.
ing to the patients' presenting symptoms, as this would be useful in the assessment of PSA as both a diagnostic test and a screening tool.

Conclusion
Implications for practice PSA testing still has a role to play in the diagnostic pathway and is relatively non-invasive and inexpensive [19]. However it has a low specificity. Other tests, which could improve this, are currently being researched but have not been implemented into clinical practice. We recommend that PSA testing continues to be used in clinical practice as one of the several indicators for biopsy, but it is important that clinicians understand the limitations of the test. It would be interesting to assess the role of a PSA result in GPs' decision making, for example the impact of the high false positive and significant false negative rates.

Implications for research
As our study was unable to assess specificity and sensitivity in the screening setting it would be useful for a review to examine this. Screening is especially relevant as programmes emerge. We would recommend that for further research done into PSA, the STARD statements be implemented to ensure standardisation and transparency. We recommend that further research includes a sub-analysis according to patients' symptoms.