# Our proposal is to calculate P

Our proposal is to calculate P-values exactly, without approximation, using this simulated data. This is in fact an old concept for rank statistics such as the Wilcoxon rank sum statistic, for which published tables have long been available for studies with very small sample sizes [7]. Modern computing power now makes the approach feasible for studies with larger sample sizes and for any statistic. The idea is to enumerate all possible values of the statistic in the setting where cases have biomarker values with the same distribution as controls, and then to calculate the exact P-value as the probability, in that setting, of a statistic at least as extreme as the one observed.
To demonstrate that the method used to calculate P-values in real data analysis can have a substantial effect on the conclusions drawn, we also reanalyzed data from an ER/PR positive breast cancer biomarker discovery study reported in [8]. A detailed description of our simulation studies, analytic approach [4], [5], [6], and the ER/PR positive breast cancer discovery study is included in the Methods section of Supplementary Data.
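The enumeration idea can be illustrated for very small samples with a minimal sketch. It assumes a one-sided test based on the area under the ROC curve, no tied biomarker values, and hypothetical function names (`auc_from_case_ranks`, `exact_auc_p_value`) that are not taken from our analysis code. Under the null, every assignment of ranks to cases is equally likely, so the exact P-value is the fraction of assignments giving an AUC at least as large as the observed one.

```python
from fractions import Fraction
from itertools import combinations


def auc_from_case_ranks(case_ranks, n_cases, n_controls):
    """AUC from the ranks of cases in the pooled sample (Mann-Whitney form)."""
    u = sum(case_ranks) - n_cases * (n_cases + 1) // 2
    return Fraction(u, n_cases * n_controls)


def exact_auc_p_value(cases, controls):
    """One-sided exact P-value, P(AUC >= observed), assuming no ties."""
    cases, controls = list(cases), list(controls)
    n_cases, n_controls = len(cases), len(controls)

    # Rank every observation in the pooled sample (1 = smallest).
    pooled = sorted(cases + controls)
    rank = {value: i + 1 for i, value in enumerate(pooled)}
    observed = auc_from_case_ranks([rank[v] for v in cases], n_cases, n_controls)

    # Enumerate every possible set of case ranks; all are equally likely
    # when cases and controls share the same distribution.
    total = 0
    as_extreme = 0
    for case_ranks in combinations(range(1, n_cases + n_controls + 1), n_cases):
        total += 1
        if auc_from_case_ranks(case_ranks, n_cases, n_controls) >= observed:
            as_extreme += 1
    return observed, Fraction(as_extreme, total)


if __name__ == "__main__":
    auc, p = exact_auc_p_value([2.0, 6.0], [1.0, 3.0, 4.0])
    print(f"AUC = {auc}, exact P = {p}")
```

Full enumeration grows combinatorially with sample size, so for larger studies the same reference distribution would in practice be approximated by repeated random generation of null data rather than listed exhaustively.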

Results

Discussion
Exact P-values, calculated according to the definition of a P-value, provide the true probability of observing a statistic at least as extreme as that observed in the study when case biomarker values are derived from the same distribution as controls. For convenience, approximate P-values are typically used in practice. Our results using one classic simulation scenario show that approximations can be substantially inaccurate, leading to less reliable conclusions. Additional simulations (Table S.3, Supplementary Data) lead to qualitatively similar conclusions when true biomarkers were more diverse than in our classic simulation scenario.
Exact P-values can be calculated for any two-sample test statistic. Our analyses used nonparametric rank statistics, in particular the empirical sensitivity at fixed 90% specificity and the area under the ROC curve. The issues concerning exact versus approximate P-values also apply to parametric non-rank based statistics, including the t-test for example. However, the null reference distribution that is needed to calculate exact P-values requires estimating the control biomarker distribution and repeatedly generating random case and control simulated study data from it. This can be a complicated exercise for parametric non-rank based statistics, particularly in the context of discovery research where automated procedures are needed to deal with diverse data on large numbers of biomarkers. We cannot assess parametric assumptions for each biomarker. Moreover, outliers and non-standard distributions are common in discovery research. Therefore, we prefer rank-based nonparametric statistics, and those were the focus of our study. We note that for rank-based nonparametric statistics, P-values from permutation tests are the same as exact P-values. Therefore another interpretation of our results is that when one is using rank-based statistics, permutation test P-values are preferred over normal approximations. We note, however, that permutation test P-values do not correspond with exact P-values for parametric non-rank based statistics.
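As an illustration of the permutation interpretation, the following sketch estimates a one-sided permutation P-value for the empirical sensitivity at fixed 90% specificity. Several choices here are assumptions made for the example and are not specified in the text: the threshold is taken as the control order statistic at or above the 90th percentile (so the statistic depends on ranks only), the permutation distribution is sampled at random rather than enumerated, and the P-value uses the common (count + 1) / (n_perm + 1) convention. The names `sens_at_spec` and `permutation_p_value` are hypothetical.

```python
import numpy as np


def sens_at_spec(cases, controls, specificity=0.90):
    """Fraction of cases above the control order statistic that gives
    at least the target specificity (a rank-based definition)."""
    controls = np.sort(np.asarray(controls, dtype=float))
    cases = np.asarray(cases, dtype=float)
    # Index of the smallest control value with >= `specificity` of controls
    # at or below it; rounding guards against floating-point noise.
    k = int(np.ceil(round(specificity * len(controls), 10)))
    threshold = controls[k - 1]
    return float(np.mean(cases > threshold))


def permutation_p_value(cases, controls, n_perm=10000, seed=0):
    """One-sided Monte Carlo permutation P-value for sens_at_spec."""
    rng = np.random.default_rng(seed)
    cases = np.asarray(cases, dtype=float)
    controls = np.asarray(controls, dtype=float)
    observed = sens_at_spec(cases, controls)

    pooled = np.concatenate([cases, controls])
    n_cases = len(cases)
    count = 0
    for _ in range(n_perm):
        # Shuffle case/control labels: the null of identical distributions.
        perm = rng.permutation(pooled)
        if sens_at_spec(perm[:n_cases], perm[n_cases:]) >= observed:
            count += 1
    return observed, (count + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    controls = rng.normal(0.0, 1.0, size=50)
    cases = rng.normal(1.0, 1.0, size=50)
    print(permutation_p_value(cases, controls))
```

Because the statistic depends on the data only through ranks, shuffling labels gives the same reference distribution regardless of the underlying biomarker distribution, which is why no parametric model for the controls is needed in this sketch.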
We found two additional advantages from use of exact P-values. First, biomarker performance measures align well with exact P-values, in that markers with the best estimated performances have the smallest P-values. This inverse relationship holds by definition when the same numbers of case and control data points are available for each biomarker. In addition we found the inverse relationship was mostly true in the analysis of the breast cancer data set where data were sporadically missing. However, the analysis that used approximate logit-normal P-values led to some major inconsistencies between estimated performance and P-value. A second and unexpected advantage to use of exact P-values concerns computational effort. When there is no missing biomarker data, the number of case and control data points is the same for each biomarker, and only one reference distribution must be calculated for the entire analysis. For example, exact P-values were calculated for each of the 3030 biomarkers in our simulated study using only Table 1. The computation involved to calculate exact P-values is therefore very fast once the reference table is created. In contrast, the logit-normal approximation P-values required calculation of standard errors for each biomarker separately, a process that was time consuming with use of bootstrap resampling.
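The computational point can be sketched as follows: when every biomarker has the same numbers of cases and controls, a single null reference table is built once and then reused to look up a P-value for each biomarker. This is a minimal illustration under stated assumptions, not the procedure used to produce Table 1: the statistic is the AUC, the table is generated by random draws of case ranks, and the function names `null_auc_table` and `p_values_from_table` are hypothetical.

```python
import numpy as np


def null_auc_table(n_cases, n_controls, n_draws=20000, seed=0):
    """Sorted null AUC values for one (n_cases, n_controls) design."""
    rng = np.random.default_rng(seed)
    ranks = np.arange(1, n_cases + n_controls + 1)
    aucs = np.empty(n_draws)
    for i in range(n_draws):
        # Under the null, case ranks are a random subset of the pooled ranks.
        case_ranks = rng.choice(ranks, size=n_cases, replace=False)
        u = case_ranks.sum() - n_cases * (n_cases + 1) / 2
        aucs[i] = u / (n_cases * n_controls)
    return np.sort(aucs)


def p_values_from_table(observed_aucs, table, tol=1e-12):
    """One-sided P-values: fraction of null AUCs >= each observed AUC."""
    observed_aucs = np.asarray(observed_aucs, dtype=float)
    # Index of the first null value >= observed (tolerance absorbs rounding).
    first = np.searchsorted(table, observed_aucs - tol, side="left")
    return (len(table) - first) / len(table)


if __name__ == "__main__":
    table = null_auc_table(n_cases=20, n_controls=20)  # built once
    observed = np.array([0.55, 0.70, 0.85])            # one AUC per biomarker
    print(p_values_from_table(observed, table))        # cheap lookups
```

Once the table exists, each additional biomarker costs only a binary search, whereas an approximation that needs a bootstrap standard error must resample the data for every biomarker.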