Calibration and discrimination are frequently studied as performance measures, especially for binary outcomes and time-to-outcome data.

Calibration (or ‘reliability’) refers to the agreement between predicted probabilities and observed outcomes: for example, if the predicted risk is 70%, do about 70% of such patients actually have the outcome? Calibration can be studied on average, and more specifically for patients with low and high predictions. For this purpose, regression techniques are useful, in which an intercept and a slope are estimated for the predictions (see next graph).
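As a rough illustration of the regression approach, the sketch below fits a logistic model of the observed outcome on the logit of the predicted risk. It rests on several assumptions that are not in the text above: the function names (`calibration_intercept_slope`, `calibration_in_the_large`) are hypothetical, the fit uses a plain Newton-Raphson loop rather than a statistics library, predicted risks must lie strictly between 0 and 1, and the toy data are invented so that the answer can be checked by hand. A slope near 1 and an intercept near 0 would suggest good calibration; a slope below 1 would suggest predictions that are too extreme.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def expit(x):
    """Inverse of the logit."""
    return 1.0 / (1.0 + math.exp(-x))


def calibration_intercept_slope(risks, outcomes, iterations=25):
    """Fit logit(P(y = 1)) = a + b * logit(risk) by Newton-Raphson; return (a, b)."""
    x = [logit(p) for p in risks]
    a, b = 0.0, 1.0  # start at perfect calibration
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, outcomes):
            mu = expit(a + b * xi)
            w = mu * (1.0 - mu)
            g0 += yi - mu            # score for the intercept
            g1 += (yi - mu) * xi     # score for the slope
            h00 += w                 # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def calibration_in_the_large(risks, outcomes, iterations=25):
    """Intercept with the slope fixed at 1 (the logit of the risk acts as an offset)."""
    x = [logit(p) for p in risks]
    a = 0.0
    for _ in range(iterations):
        g = h = 0.0
        for xi, yi in zip(x, outcomes):
            mu = expit(a + xi)
            g += yi - mu
            h += mu * (1.0 - mu)
        a += g / h
    return a


# Invented toy data: 10 patients predicted at 10% (2 have the outcome)
# and 10 patients predicted at 90% (8 have the outcome).
risks = [0.1] * 10 + [0.9] * 10
outcomes = [1] * 2 + [0] * 8 + [1] * 8 + [0] * 2

intercept, slope = calibration_intercept_slope(risks, outcomes)
mean_shift = calibration_in_the_large(risks, outcomes)
# For this toy data the slope equals ln(4) / ln(9), about 0.63: the predictions are too extreme.
```

In this toy example the average predicted risk (50%) matches the observed event rate (50%), so both intercepts are 0 even though the slope shows miscalibration at the low and high ends. This is why the average and the slope are worth examining separately.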

Discrimination refers to the ability to distinguish low-risk from high-risk patients. For binary outcomes, discrimination can be quantified by the area under the receiver operating characteristic (ROC) curve. This curve is constructed by plotting sensitivity against 1 − specificity for different cut-off points for the predicted risk. The area indicates the probability that a patient with the outcome has a higher predicted probability than a patient without the outcome, for random pairs of patients with and without the outcome.
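The pairwise interpretation of the area can be computed directly, without drawing the curve. The following is a minimal sketch under stated assumptions: the function name `c_statistic` is hypothetical, tied predictions are counted as one half (a common convention, not stated in the text above), and the four-patient data set is invented for illustration.

```python
from itertools import product


def c_statistic(risks, outcomes):
    """Share of (outcome, no-outcome) pairs in which the patient with the
    outcome has the higher predicted risk; ties count as one half."""
    with_outcome = [r for r, y in zip(risks, outcomes) if y == 1]
    without_outcome = [r for r, y in zip(risks, outcomes) if y == 0]
    if not with_outcome or not without_outcome:
        raise ValueError("need at least one patient with and one without the outcome")
    score = 0.0
    for r_event, r_non_event in product(with_outcome, without_outcome):
        if r_event > r_non_event:
            score += 1.0
        elif r_event == r_non_event:
            score += 0.5
    return score / (len(with_outcome) * len(without_outcome))


# Invented toy data: two patients without and two with the outcome.
risks = [0.1, 0.4, 0.35, 0.8]
outcomes = [0, 0, 1, 1]
auc = c_statistic(risks, outcomes)  # 3 of the 4 pairs are ordered correctly: 0.75
```

The double loop compares every pair, so its cost grows with the product of the two group sizes. For large data sets a rank-based calculation would be preferable; the brute-force version is shown here only because it mirrors the definition.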