Calibration & Times Surprised Tables

Calibration Table:  The next part of the Test Net with Cases report is a table titled "Calibration:".  It indicates whether the confidence expressed by the network is appropriate (i.e. "well calibrated").  For instance, if the network were forecasting the weather, you might want to know:  Of all the times it said 30% chance of rain, what percentage of times did it rain?  If there were lots of cases, the answer should be close to 30%.

For each state of the node there are a number of items separated by vertical bars (|).  Each item consists of a probability percentage range R, followed by a colon (:), and then a single percentage X.  It means that of all the times the belief for that state was within the range R, the true value was that state X percent of the time.

For instance:

    rain   0-10:  8.5 |

means that of all the times the belief for rain was between 0 and 10%, it rained 8.5% of those times.  The probability ranges are uneven, and differ from state to state and from run to run, because they are chosen so that the X percentages are reasonably accurate: the bin sizes have to adapt, or there might not be enough cases falling in a bin.  The more cases you process, the finer the probability ranges will be.
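Netica does not publish the exact binning rule it uses, but the following Python sketch (the names calibration_table, beliefs and was_state are purely illustrative) shows the general idea: choose roughly equal-count bins over the predicted probabilities, then report the observed frequency of the state within each bin.

    import numpy as np

    def calibration_table(beliefs, was_state, n_bins=5):
        """For one state of a node: bin the predicted probabilities and report,
        for each bin, the percentage of cases in which the true value was that state.

        beliefs   -- predicted probability of the state for each case (0..1)
        was_state -- for each case, True if the true value was that state
        """
        beliefs = np.asarray(beliefs, dtype=float)
        was_state = np.asarray(was_state, dtype=bool)

        # Quantile-based edges adapt to where the beliefs actually fall, so each
        # bin holds roughly the same number of cases and its percentage is
        # reasonably accurate (the reason the report's ranges are uneven).
        edges = np.unique(np.quantile(beliefs, np.linspace(0.0, 1.0, n_bins + 1)))

        rows = []
        for lo, hi in zip(edges[:-1], edges[1:]):
            last = hi == edges[-1]
            in_bin = (beliefs >= lo) & ((beliefs <= hi) if last else (beliefs < hi))
            if in_bin.any():
                rows.append((100 * lo, 100 * hi, 100 * was_state[in_bin].mean()))
        return rows  # list of (range_low_%, range_high_%, observed_%)

    # Synthetic, well-calibrated data, just to show the output format.
    rng = np.random.default_rng(0)
    predicted = rng.uniform(0, 1, 2000)
    outcome = rng.uniform(0, 1, 2000) < predicted
    for lo, hi, obs in calibration_table(predicted, outcome):
        print(f"{lo:.0f}-{hi:.0f}: {obs:.1f}", end=" | ")
    print()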

Calibration results are often drawn as a graph (known as a "calibration curve") on which ideal calibration is a straight diagonal line.  For more information, see a text that discusses probability "calibration", for example Morgan & Henrion 1990, p. 110.
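Continuing the sketch above, a calibration curve can be drawn by plotting each bin's observed frequency against the midpoint of its probability range (matplotlib is assumed here; perfect calibration lies on the diagonal):

    import matplotlib.pyplot as plt

    rows = calibration_table(predicted, outcome)   # from the sketch above
    mids = [(lo + hi) / 2 for lo, hi, _ in rows]
    observed = [obs for _, _, obs in rows]

    plt.plot([0, 100], [0, 100], "k--", label="ideal calibration")
    plt.plot(mids, observed, "o-", label="network")
    plt.xlabel("Predicted probability of state (%)")
    plt.ylabel("Observed frequency of state (%)")
    plt.legend()
    plt.show()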

Times Surprised Table:  Following the calibration table in the report is the "Times Surprised" table.  It is used to determine how often the network was quite confident in its beliefs, but was wrong.  There are columns for being 90% confident and 99% confident (i.e. beliefs greater than 90% or 99% respectively), and also for being 90% or 99% confident that the value of the node will _not_ be a certain state (i.e. beliefs less than 10% or 1% respectively).

The ratios indicate the number of times it was wrong out of the number of times it made such a confident prediction, and a percentage is also printed.  If the network is performing well these percentages will be low, but keep in mind that it is quite reasonable to be wrong on a particular 10% or 90% prediction 10% of the time, and on a particular 1% or 99% prediction 1% of the time.  If the network rarely makes strong predictions (i.e. its beliefs are rarely close to 0 or 1), then most of these ratios will be 0/0.
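The following Python sketch (again with made-up names; it is not Netica's code) shows how such a table could be tallied for one state of a node:

    import numpy as np

    def times_surprised(beliefs, was_state):
        """Count confident-but-wrong predictions about one state.

        beliefs   -- predicted probability of the state for each case (0..1)
        was_state -- for each case, True if the true value was that state
        Returns {column_label: (times_wrong, times_that_confident)}.
        """
        beliefs = np.asarray(beliefs, dtype=float)
        was_state = np.asarray(was_state, dtype=bool)
        table = {}
        # Confident the value WILL be this state: belief above 90% / 99%.
        for t in (0.90, 0.99):
            confident = beliefs > t
            table[f"> {t:.0%}"] = (int((confident & ~was_state).sum()), int(confident.sum()))
        # Confident the value will NOT be this state: belief below 10% / 1%.
        for t in (0.10, 0.01):
            confident = beliefs < t
            table[f"< {t:.0%}"] = (int((confident & was_state).sum()), int(confident.sum()))
        return table

    # Synthetic, well-calibrated data, as in the calibration sketch above.
    rng = np.random.default_rng(1)
    predicted = rng.uniform(0, 1, 5000)
    outcome = rng.uniform(0, 1, 5000) < predicted
    for label, (wrong, total) in times_surprised(predicted, outcome).items():
        pct = 100 * wrong / total if total else 0.0
        print(f"{label}: {wrong}/{total} ({pct:.1f}%)")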

Other sections of the Test Net with Cases Report:

Confusion Matrix & Errors

Scoring Rule Results

Quality of Test