Handling of ROC curves


Receiver Operating Characteristic (ROC) curves are mainly used to judge the usefulness of diagnostic tests in healthcare or, in a wider sense, to objectively quantify decision methods with two outcomes (such as healthy or diseased in the case of a diagnostic tool). A decision procedure can make two kinds of mistakes: it can classify a diseased person as healthy, or a healthy person as diseased.

We have to start with two definitions:

Sensitivity: the probability that a diagnostic test is positive for a disease when the patient really suffers from that specific disease.

Specificity: the probability that a diagnostic test is negative for a disease when a subject really does not have that specific disease.

The ideal diagnostic test would have 100% sensitivity and 100% specificity, but in practice this is rarely achievable: as the decision threshold changes, a gain in one is usually paid for with a loss in the other. The ROC curve displays this trade-off by plotting the sensitivity (on the vertical axis) against the false positive rate (FPR), which is defined as 1-specificity (on the x-axis), for every possible threshold. A ROC curve starts at the (0,0) coordinate, representing the case when all test results are negative, and ends at the (1,1) coordinate, representing the case when all test results are positive. /PIC 1/
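To make the construction of the curve concrete, the following base R sketch computes the empirical ROC coordinates by sweeping a cut-point over the observed values. The data are a small hypothetical example invented for illustration (they are not taken from any package), and a subject is assumed to test positive when the marker is at or above the cut-point.

```r
# Hypothetical toy data: marker values and true status (1 = diseased, 0 = healthy).
score  <- c(12, 25, 31, 38, 44, 47, 52, 58, 63, 70)
status <- c( 0,  0,  0,  1,  0,  0,  1,  1,  1,  1)

# Candidate cut-points: every observed value, plus Inf so that the curve
# starts with "all test results negative".
cuts <- c(Inf, sort(unique(score), decreasing = TRUE))

# Assumed rule: a subject is classified positive when score >= cut-point.
roc_points <- t(sapply(cuts, function(cp) {
  positive <- score >= cp
  c(cutoff      = cp,
    fpr         = sum(positive & status == 0) / sum(status == 0),  # 1 - specificity
    sensitivity = sum(positive & status == 1) / sum(status == 1))
}))

print(roc_points)
```

The first row corresponds to the (0,0) corner and the last row to the (1,1) corner; plotting the `fpr` column against the `sensitivity` column gives the empirical ROC curve.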

What are the statistical challenges in handling ROC curves?

  •  Calculation of the area under the curve (AUC)
  •  Determination of the coordinates of the ROC curve at a specific point
  •  Determination of confidence intervals of specific coordinates or thresholds
  •  Calculation of the confidence intervals of sensitivities at given specificities and vice versa
  •  Testing whether two ROC curves are identical
  •  Determination of covariance of two paired ROC curves
  •  Handling of multi-class cases (i.e. situations where there are more than two outcomes)
  •  Visualisation of ROC curves and of the previously calculated confidence intervals
  •  Sample size determination for ROC curves
  •  Comparison of the AUC of two ROC curves
  •  Smoothing of a ROC curve (sometimes the classification is based on a discrete scale, e.g. for a cancer diagnostic tool: “normal”, “benign”, “probably benign”, “suspicious”, “malignant”; smoothing methods fit a continuous curve based on assumptions about the underlying distribution)
  •  Management of partial AUC (restriction of the AUC to a specific interval of specificity or sensitivity)
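As an illustration of the first item, the AUC of an empirical ROC curve can be obtained with the trapezoidal rule. The sketch below uses base R and a small hypothetical data set (invented for illustration), and it also computes the equivalent pairwise (Mann-Whitney type) estimate as a cross-check. It is a minimal sketch, not the method used by any particular package.

```r
# Hypothetical toy data: marker values and true status (1 = diseased, 0 = healthy).
score  <- c(12, 25, 31, 38, 44, 47, 52, 58, 63, 70)
status <- c( 0,  0,  0,  1,  0,  0,  1,  1,  1,  1)

# Empirical ROC coordinates, from "all negative" (Inf) to "all positive".
cuts <- c(Inf, sort(unique(score), decreasing = TRUE))
fpr  <- sapply(cuts, function(cp) mean(score[status == 0] >= cp))
sens <- sapply(cuts, function(cp) mean(score[status == 1] >= cp))

# Trapezoidal rule: strip widths times average strip heights.
auc_trapezoid <- sum(diff(fpr) * (head(sens, -1) + tail(sens, -1)) / 2)

# Pairwise equivalent: share of (diseased, healthy) pairs in which the
# diseased subject has the higher value; ties count as one half.
diseased  <- score[status == 1]
healthy   <- score[status == 0]
auc_pairs <- mean(outer(diseased, healthy, ">") + 0.5 * outer(diseased, healthy, "=="))

print(c(trapezoid = auc_trapezoid, pairs = auc_pairs))
```

Both numbers should agree, which reflects the usual interpretation of the AUC as the probability that a randomly chosen diseased subject has a higher marker value than a randomly chosen healthy one.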


The OptimalCutpoints package (V1.1-3, Date: 19-Feb-2015)

The following description is quoted from the R package manual.

This package enables users to compute one or more optimal cutpoints for diagnostic tests or continuous markers. Various approaches for selecting optimal cutoffs have been implemented, including methods based on cost-benefit analysis and diagnostic test accuracy measures (Sensitivity/Specificity, Predictive Values and Diagnostic Likelihood Ratios). Numerical and graphical output for all methods is easily obtained.


First, we need a sample data set for demonstration purposes. The ELAS data set is included in the package. It contains 141 observations and 3 variables: ELAS, the observed measure; STATUS, the presence or absence of coronary artery disease (0 = absence, 1 = presence); and GENDER (with male and female levels).

With the help of the available functions (control.cutpoints, optimal.cutpoints, plot.optimal.cutpoints, print.optimal.cutpoints and summary.optimal.cutpoints) we can select the cut-point for which the classification is optimal according to the chosen criterion. In some cases an overall optimum is sought; in other cases the negative or the positive test results have priority in finding the optimal cut-point.

For example, a widely used criterion is the Youden index.

Youden index (J): the optimal cut-point is, by definition, the value c for which sensitivity(c) + specificity(c) - 1 is maximal:
J = maximum over c of (sensitivity(c) + specificity(c) - 1)

In other words, J is the maximum vertical distance between the ROC curve and the diagonal. J = 1 means a perfect test (100% sensitivity and 100% specificity), while J = 0 means that the test performs no better than chance.
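The definition can be checked by brute force: evaluate sensitivity + specificity - 1 at every candidate cut-point and keep the largest value. The sketch below does this in base R on a small hypothetical data set (invented for illustration), assuming that a subject tests positive when the marker is at or above the cut-point. It only illustrates the definition and is not the package's implementation.

```r
# Hypothetical toy data: marker values and true status (1 = diseased, 0 = healthy).
score  <- c(12, 25, 31, 38, 44, 47, 52, 58, 63, 70)
status <- c( 0,  0,  0,  1,  0,  0,  1,  1,  1,  1)

# Every observed value is a candidate cut-point c.
cuts <- sort(unique(score))

# Youden criterion at each candidate: sensitivity(c) + specificity(c) - 1.
youden <- sapply(cuts, function(cp) {
  se <- mean(score[status == 1] >= cp)  # diseased subjects testing positive
  sp <- mean(score[status == 0] <  cp)  # healthy subjects testing negative
  se + sp - 1
})

J        <- max(youden)             # Youden index
best_cut <- cuts[which.max(youden)] # cut-point at which it is attained

print(data.frame(cutoff = cuts, youden = round(youden, 3)))
print(c(J = J, cutoff = best_cut))
```

Note that `which.max` returns only the first maximiser; with real data several cut-points can share the maximum, which is why the package output reports the number of optimal cutoffs.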

Let's see some examples:

# Youden Index Method ("Youden"): Covariate gender
library(OptimalCutpoints)
data(elas)
optimal.cutpoint.Youden <- optimal.cutpoints(X = "elas", status = "status", tag.healthy = 0,
    methods = "Youden", data = elas, pop.prev = NULL, categorical.cov = "gender",
    control = control.cutpoints(), ci.fit = TRUE, conf.level = 0.95, trace = FALSE)
summary(optimal.cutpoint.Youden)

The output (shown here for the female subgroup) looks like this:

Area under the ROC curve (AUC):  0.818 (0.684, 0.952)
Number of optimal cutoffs: 1
cutoff            46.0000000
Se                 0.6666667
Sp                 0.8181818
PPV                0.7142857
NPV                0.7826087
DLR.Positive       3.6666667
DLR.Negative       0.4074074
FP                 4.0000000
FN                 5.0000000
Optimal criterion  0.4848485

How can we obtain these values and how can we interpret them? Considering the female cases, the ROC-curve looks like this:

What do the output and the graph say? Each value is explained in the corresponding numbered paragraph below.

1) Area under the curve

On the graph: the area under the solid line, which is estimated as 0.818 (the 95% confidence interval is given in brackets: [0.684; 0.952]).

Cutoff: this is the result of the procedure (the cut-point at which the Youden index is maximal); the value obtained is 46.

2) Sensitivity and Specificity

SE (sensitivity) = 0.667 and SP (specificity) = 0.818 (at the selected cut-off value). How can we derive these values? Let's introduce a new classifier, CL, based on the derived cut-point. Its value is "0" when ELAS < 46 and "1" when ELAS >= 46.

# CL = 1 when ELAS >= 46, otherwise 0
elas$cl <- ifelse(elas$elas >= 46, 1, 0)
# Cross-tabulate STATUS (rows) against CL (columns) for the female cases
table(elas$status[elas$gender=="Female"], elas$cl[elas$gender=="Female"])

The table of the two classifiers, STATUS (rows) vs. CL (columns):

                   CL negative   CL positive
STATUS negative        18             4
STATUS positive         5            10

Consequently, SP = 18/(18+4) = 0.818 and SE = 10/(10+5) = 0.667. This point is also marked on the graph (see the graph above), but please note that the x-axis is 1-specificity, so the marked point is (0.182; 0.667).

3) Positive and negative predictive values

PPV (positive predictive value): 0.714, because 10/(10+4) = 10/14 = 0.714. The positive predictive value is the probability that a subject with a positive diagnostic result really has the disease.

NPV (negative predictive value): 0.783, because 18/(18+5) = 18/23 = 0.783. The negative predictive value is the probability that a subject with a negative diagnostic result really does not have the disease.
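The four measures derived so far can be reproduced directly from the cell counts of the STATUS vs. CL table. The following base R sketch types the counts in by hand (18, 4, 5 and 10, as reported above), so it does not need the package; the variable names are chosen for illustration only.

```r
# Cell counts from the STATUS vs. CL table (female cases).
TN <- 18  # status negative, classifier negative
FP <- 4   # status negative, classifier positive
FN <- 5   # status positive, classifier negative
TP <- 10  # status positive, classifier positive

se  <- TP / (TP + FN)  # sensitivity
sp  <- TN / (TN + FP)  # specificity
ppv <- TP / (TP + FP)  # positive predictive value
npv <- TN / (TN + FN)  # negative predictive value

print(round(c(Se = se, Sp = sp, PPV = ppv, NPV = npv), 3))
# Se = 0.667, Sp = 0.818, PPV = 0.714, NPV = 0.783
```

Sensitivity and specificity are computed along the rows of the table (conditioning on the true status), whereas the predictive values are computed along the columns (conditioning on the test result).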

4) Likelihood ratios

The output states that DLR.positive = 3.67 and DLR.negative = 0.41. What do these values mean and how can we derive them?

Sensitivity and specificity are probabilities of a test result given the true disease status. We know that, in the presence of the disease, the diagnostic tool gives a positive result with a probability equal to the sensitivity. A meaningful question is how the probability of a positive test result differs (more precisely, what the ratio of the two probabilities is) between a diseased and a healthy individual. We can ask the same question for negative test results. The resulting values are called the positive and negative likelihood ratios.

Using the general conditional probability definitions:

LR+ = probability of an individual with the condition having a positive test / probability of an individual without the condition having a positive test.

LR- = probability of an individual with the condition having a negative test / probability of an individual without the condition having a negative test.

In other words, by definition: LR+ (denoted DLR.positive) = sensitivity / (1-specificity) = 0.667 / (1-0.818) = 3.67, and LR- (denoted DLR.negative) = (1-sensitivity) / specificity = (1-0.667) / 0.818 = 0.41.

Likelihood ratios act multiplicatively on the odds of disease, so their effect on the probability of disease is not linear, and people generally use the Fagan nomogram to interpret them. As a rule of thumb (source: https://en.wikipedia.org/wiki/Likelihood_ratios_in_diagnostic_testing), values between 0 and 1 decrease the probability of the disease, namely 0.1 = -45%, 0.2 = -30%, 0.5 = -15%, while values greater than 1 increase the probability of the disease, namely 2 = +15%, 5 = +30% and 10 = +45%.

In our example, LR+ = 3.67 lies between 2 and 5, so a positive test result raises the probability of disease by roughly 15-30 percentage points; LR- = 0.41 lies between 0.2 and 0.5, so a negative test result lowers it by roughly 15-30 percentage points.
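What the Fagan nomogram does graphically can also be written out as a short calculation: convert the pre-test probability to odds, multiply by the likelihood ratio, and convert back. The base R sketch below uses the cell counts reported in the table above; taking the proportion of diseased women in the sample (15 of 37) as the pre-test probability is an assumption made purely for illustration, and `post_test` is a hypothetical helper, not a package function.

```r
# Cell counts from the STATUS vs. CL table (female cases).
TN <- 18; FP <- 4; FN <- 5; TP <- 10

se <- TP / (TP + FN)
sp <- TN / (TN + FP)

lr_pos <- se / (1 - sp)   # positive likelihood ratio
lr_neg <- (1 - se) / sp   # negative likelihood ratio

# Assumption for illustration: pre-test probability = sample prevalence.
pretest <- (TP + FN) / (TP + FN + TN + FP)

# Hypothetical helper: probability -> odds, multiply by LR, odds -> probability.
post_test <- function(pretest, lr) {
  odds <- pretest / (1 - pretest) * lr
  odds / (1 + odds)
}

post_pos <- post_test(pretest, lr_pos)  # probability of disease after a positive test
post_neg <- post_test(pretest, lr_neg)  # probability of disease after a negative test

print(round(c(LR.pos = lr_pos, LR.neg = lr_neg, pretest = pretest,
              post.pos = post_pos, post.neg = post_neg), 3))
```

With the sample prevalence as the pre-test probability, the post-test probability after a positive result coincides with the PPV, and the one after a negative result with 1 - NPV. A different pre-test probability would give different post-test values, which is exactly what the nomogram is used for.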

5) False positive and false negative counts

According to the output, FP = 4 and FN = 5. These are the numbers of incorrectly classified cases (status not equal to the classifier), and they can be read directly from the table.

6) Optimal criterion

In the output it is 0.485. This is, by definition, the value of the criterion at the optimal point. As we used the Youden index, which is (again, by definition) specificity + sensitivity - 1, we can verify that 0.818 + 0.667 - 1 indeed gives 0.485.




Online materials

Online demonstration of ROC-curves and background distributions: http://www.navan.name/roc/

YouTube videos on:

  •  ROC curve
  •  ROC curve and AUC
  •  Sensitivity and specificity
  •  Illustration of the link between the background distributions, cut-points and AUC