Uncertain Values in Case Files

Netica

Case Files with Uncertain Values – UVF Format

The case files discussed in previous pages have only had values that were completely certain (or completely missing).  But Netica can also create and read case files having values that are known with limited accuracy, or only known to within some likelihood.  In fact, Netica has a very elegant, practical and powerful way of expressing uncertain findings, called the UVF format.

When Netica reads in a case containing uncertain findings (for example, by choosing Cases Get Case), it will enter them in the Bayes net as likelihood findings, so any probabilistic inference, node absorption, sensitivity analysis, etc. will properly account for them.  Also, the operations on case files, such as learning from cases, test net with cases and process cases, will work properly on case files containing uncertain values.  When learning from such cases, some learning algorithms will work better than others.  For more information on that, and an example of working with case files having uncertain findings, see the learning algorithms page.

Below is a list of the different types of uncertain values, their syntax in the case file, and what they mean.  Each type of uncertain value can appear anywhere in a case file where a regular value normally would.  For example, a case file could be a regular CSV file, or tab delimited text file, but with some of the values replaced with entries having the syntax described below.
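As a rough illustration of what such a file might contain, the following Python sketch builds the text of a small tab delimited case file. The variable names (Temperature, Sex, Color), the values, and the helper to_tab_delimited are all hypothetical, and any further requirements Netica places on case files (header details, ID columns and so on) are not covered here. Tab delimiting is used only so that commas inside entries need no special handling in the sketch.

```python
# Hypothetical variable names and values, for illustration only.
header = ["Temperature", "Sex", "Color"]
cases = [
    # Gaussian, certain value, set of possibilities
    ["37.2+-0.5", "female", "{red, blue}"],
    # interval, likelihood, set of impossibilities
    ["[36, 38]", "{female .8, male .3}", "~{green}"],
    # unbounded interval, complete uncertainty, certain value
    [">39", "*", "red"],
]


def to_tab_delimited(header, cases):
    """Join each row with tabs, one case per line, header line first."""
    lines = ["\t".join(header)]
    lines.extend("\t".join(row) for row in cases)
    return "\n".join(lines) + "\n"


case_file_text = to_tab_delimited(header, cases)
print(case_file_text)
```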

Gaussian

Syntax:

m+-s           m and s are real numbers

Examples:

5+-2    3.27+-0.03     0+-1e-5

This is for a Gaussian (also known as “normal”) likelihood finding, where m is the mean and s is the standard deviation.  Note that there cannot be any space before or after the +-.  The uncertainties in measurements from lab instruments, or polling results, are often expressed with a ± notation, and indicate a Gaussian distribution, so they can now be easily input into Netica (although sometimes they may mean an interval distribution, as described below).
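The m+-s syntax can be sketched in Python as follows. The function names parse_gaussian and gaussian_likelihood are hypothetical and are not part of Netica. The likelihood here is scaled to peak at 1, which is an assumption; this page does not say how Netica scales it, and a likelihood finding is only meaningful up to a constant factor anyway. The sketch also assumes s must be greater than zero.

```python
import math


def parse_gaussian(text):
    """Parse 'm+-s' into (mean, sd). No spaces are allowed anywhere."""
    if any(ch.isspace() for ch in text):
        raise ValueError("spaces are not allowed in an m+-s entry")
    mean_str, sep, sd_str = text.partition("+-")
    if not sep:
        raise ValueError("missing '+-' separator")
    mean, sd = float(mean_str), float(sd_str)
    if sd <= 0:
        # Assumption of this sketch: a standard deviation must be positive.
        raise ValueError("standard deviation must be positive")
    return mean, sd


def gaussian_likelihood(x, mean, sd):
    """Unnormalized Gaussian likelihood of x, equal to 1 at the mean."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2)


mean, sd = parse_gaussian("3.27+-0.03")
print(mean, sd, gaussian_likelihood(3.30, mean, sd))
```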

Interval

Syntax:

[a, b]    a and b are real numbers, state names or indexes preceded by #

Examples:

[0, 10]    [-3, 2.27]    [lo, med]    [#1, #3]

There may be spaces before or after the comma or brackets. Intervals of states include both endpoints, so [lo, med] includes states lo, med and any states between.  Intervals of numbers include the lower endpoint, but not the upper endpoint, so [0, 10] for variable X means 0 <= X < 10.  Likelihood within the interval is one; outside the interval it is zero.
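The two endpoint rules above can be written out as a small Python sketch. The helper names and the three-state list are hypothetical; the state order is assumed to be the order in which the node's states are defined.

```python
def in_numeric_interval(x, a, b):
    """Numeric [a, b]: lower endpoint included, upper endpoint excluded."""
    return a <= x < b


def state_interval_likelihood(states, first, last):
    """State [first, last]: likelihood 1 for both endpoints and every
    state between them, 0 for all other states."""
    i, j = states.index(first), states.index(last)
    return [1.0 if i <= k <= j else 0.0 for k in range(len(states))]


states = ["lo", "med", "hi"]  # hypothetical state list
print(in_numeric_interval(0, 0, 10))    # lower endpoint is inside
print(in_numeric_interval(10, 0, 10))   # upper endpoint is outside
print(state_interval_likelihood(states, "lo", "med"))
```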

Unbounded Interval

Syntax:

>m  or  <m    m is a real number, state name or state index preceded by #

Examples:

>4.75    <-10    <med    >#2

When m is a state, the interval includes the endpoint, and when it is a real number, the interval includes the endpoint only for > intervals (so > is really ≥).  The interval can potentially extend to infinity, but in practice will probably be limited by known maximum or minimum values for the variable.  Likelihood within the interval is one; outside the interval it is zero.
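A minimal Python sketch of these endpoint rules follows. The function names are hypothetical, and it is assumed that > selects states later in the node's state order and < selects earlier ones.

```python
def in_unbounded_numeric(x, op, m):
    """Real-valued m: '>' includes the endpoint, '<' does not."""
    if op == ">":
        return x >= m
    if op == "<":
        return x < m
    raise ValueError("op must be '>' or '<'")


def unbounded_state_likelihood(states, op, m):
    """State-valued m: the endpoint state is included for both operators."""
    k = states.index(m)
    if op == ">":
        return [1.0 if i >= k else 0.0 for i in range(len(states))]
    if op == "<":
        return [1.0 if i <= k else 0.0 for i in range(len(states))]
    raise ValueError("op must be '>' or '<'")


print(in_unbounded_numeric(4.75, ">", 4.75))
print(in_unbounded_numeric(-10, "<", -10))
print(unbounded_state_likelihood(["lo", "med", "hi"], "<", "med"))
```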

Set of Possibilities

Syntax:

{s1, s2, … sn}   each si is a state name, state index preceded by #, interval, unbounded interval, or Gaussian.

Examples:

{lo, med}              {red, blue, green}   

{#1, #5, #7}           {[0,3.5], [4.5,10]}   

{[#35,#122], >#500}

There may be spaces before or after the comma or braces.  The value can be considered to be a disjunction of the elements (e.g. X=red or X=blue or X=green).  The likelihood of elements in the set is one; of those not in the set, it is zero.
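For the simplest case, a set of plain state names, the resulting likelihood vector can be sketched in Python as below. The function possibilities_likelihood is hypothetical and deliberately ignores state indexes, intervals and Gaussians inside the braces.

```python
def possibilities_likelihood(states, entry):
    """Likelihood 1 for each listed state name, 0 for every other state.

    Handles plain state names only (a simplification for this sketch).
    """
    text = entry.strip()
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError("expected an entry of the form {s1, s2, ...}")
    listed = {name.strip() for name in text[1:-1].split(",") if name.strip()}
    unknown = listed - set(states)
    if unknown:
        raise ValueError(f"unknown state names: {sorted(unknown)}")
    return [1.0 if s in listed else 0.0 for s in states]


states = ["red", "green", "blue", "teal"]  # hypothetical state list
print(possibilities_likelihood(states, "{red, blue, green}"))
```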

Set of Impossibilities

Syntax:

~{s1, s2, … sn}  each si is a state name, state index preceded by #, interval or unbounded interval.

Examples:

~{lo}                 ~{red, blue, green}      

~{#1, #5, #7}         ~{[0, 3.5]}

There may be spaces before or after the comma or braces, but not between the tilde (~) and the brace.  This is the same as "Set of Possibilities" except the "possible" states are those that are not listed, rather than those that are listed.  The likelihood of elements in the set is zero; of those not in the set, it is one.

A negative finding can be represented easily by just listing the state(s) eliminated by the observation.
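The complementary rule for a set of impossibilities can be sketched the same way. The function impossibilities_likelihood is hypothetical and handles plain state names only; it rejects a space between the tilde and the brace, as described above.

```python
def impossibilities_likelihood(states, entry):
    """Likelihood 0 for each listed state name, 1 for every other state."""
    text = entry.strip()
    # The tilde must be immediately followed by the opening brace.
    if not (text.startswith("~{") and text.endswith("}")):
        raise ValueError("expected an entry of the form ~{s1, s2, ...}")
    listed = {name.strip() for name in text[2:-1].split(",") if name.strip()}
    unknown = listed - set(states)
    if unknown:
        raise ValueError(f"unknown state names: {sorted(unknown)}")
    return [0.0 if s in listed else 1.0 for s in states]


states = ["lo", "med", "hi"]  # hypothetical state list
print(impossibilities_likelihood(states, "~{lo}"))  # a negative finding for lo
```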

Likelihood

Syntax:

{s1 p1, s2 p2, … sn pn}  each si is a state name, state index preceded by #, interval, unbounded interval, or Gaussian. Each pi is a number between 0 and 1.  Some pi may be absent.

Examples:

{female .8, male .3}       {3+-1 0.2, 7+-2 0.4}   

{[0,1.5] .5, [1.5,5] 0.1, [5,10] 0.02}

This is the same as a set of possibilities, but each possibility is weighted with a likelihood that appears after it (separated by a single space).  The most common kind of likelihood vector is for a discrete variable, where each state is listed, followed by its probability.  Any states that appear without a probability have a likelihood of 1, and any states that don't appear at all have a likelihood of 0.
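For a discrete variable, the defaults described above (1 for a listed state with no number, 0 for an unlisted state) can be sketched in Python. The function likelihood_vector is hypothetical and handles plain state names only.

```python
def likelihood_vector(states, entry):
    """Build a likelihood vector from an entry such as {female .8, male .3}."""
    text = entry.strip()
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError("expected an entry of the form {s1 p1, s2 p2, ...}")
    weights = {}
    for item in text[1:-1].split(","):
        parts = item.split()
        if not parts:
            continue
        name = parts[0]
        if name not in states:
            raise ValueError(f"unknown state name: {name}")
        # A listed state with no number gets likelihood 1.
        p = float(parts[1]) if len(parts) > 1 else 1.0
        if not 0.0 <= p <= 1.0:
            raise ValueError("each likelihood must be between 0 and 1")
        weights[name] = p
    # States that are not listed at all get likelihood 0.
    return [weights.get(s, 0.0) for s in states]


print(likelihood_vector(["female", "male"], "{female .8, male .3}"))
```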

Arbitrary likelihood distributions for continuous variables can be formed by a series of adjacent intervals, each with its own probability.  Or the elements can overlap, and then their likelihoods are combined.  For example {[0,10] .1, [2,4] .2} would be the combination of a rect function extending from 0 to 10 with height 0.1, and another rect from 2 to 4 with a height of 0.2.
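The adjacent-interval case can be sketched in Python as a simple lookup. The function piecewise_likelihood is hypothetical and assumes the intervals do not overlap; how overlapping elements are combined is not spelled out on this page, so the sketch does not attempt it.

```python
def piecewise_likelihood(x, pieces):
    """Return the likelihood of x under non-overlapping weighted intervals.

    pieces is a list of ((a, b), p); each interval includes a but not b.
    Values outside every interval have likelihood 0.
    """
    for (a, b), p in pieces:
        if a <= x < b:
            return p
    return 0.0


# Corresponds to the entry {[0,1.5] .5, [1.5,5] 0.1, [5,10] 0.02}
pieces = [((0, 1.5), 0.5), ((1.5, 5), 0.1), ((5, 10), 0.02)]
for x in (1.0, 1.5, 7.0, 10.0):
    print(x, piecewise_likelihood(x, pieces))
```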

Another useful distribution that is easy to form is the weighted combination of Gaussians.  For example {3+-1 0.2, 7+-2 0.4} is a bi-modal distribution with peaks at 3 and 7.
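To see the two peaks numerically, the following Python sketch evaluates a weighted sum of normal density functions for {3+-1 0.2, 7+-2 0.4}. Both the use of normalized densities and the use of addition to combine the components are assumptions made for this illustration; the exact rule Netica applies is not stated here.

```python
import math


def normal_pdf(x, mean, sd):
    """Normal probability density at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def gaussian_mixture(x, components):
    """Weighted sum of normal densities; components are (mean, sd, weight)."""
    return sum(w * normal_pdf(x, m, s) for m, s, w in components)


# Corresponds to the entry {3+-1 0.2, 7+-2 0.4}
components = [(3.0, 1.0, 0.2), (7.0, 2.0, 0.4)]
for x in (3.0, 5.0, 7.0):
    print(x, round(gaussian_mixture(x, components), 4))
```

Under these assumptions the value at 5 is lower than the values at 3 and at 7, which is the dip between the two modes.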

It is possible to mix weighted Gaussians, intervals, and discrete states within a single { ... } likelihood vector.

Negative Likelihood

Syntax:

~{s1 p1, s2 p2, … sn pn}  each si is a state name, state index preceded by #, interval, or unbounded interval. Each pi is a positive number.  Some pi may be absent.

Examples:

~{red, green, teal .2, olive .8}   

~{[0,2] .4, [2,6] .2}

This is the same as a set of impossibilities, but each entry is weighted with a likelihood, which appears after it.  If no number appears after an entry, its likelihood is 0.  Entries that have numbers above 1 are indicated to be more probable than those not listed, and entries with numbers below 1 are less probable than the unlisted ones (unlisted entries have a likelihood of 1).
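The defaults for a negative likelihood are the reverse of those for an ordinary likelihood vector, which the following Python sketch makes concrete. The function negative_likelihood_vector and the five-state list are hypothetical, and only plain state names are handled.

```python
def negative_likelihood_vector(states, entry):
    """Build a likelihood vector from an entry such as ~{red, teal .2}."""
    text = entry.strip()
    if not (text.startswith("~{") and text.endswith("}")):
        raise ValueError("expected an entry of the form ~{s1 p1, s2 p2, ...}")
    weights = {}
    for item in text[2:-1].split(","):
        parts = item.split()
        if not parts:
            continue
        name = parts[0]
        if name not in states:
            raise ValueError(f"unknown state name: {name}")
        # A listed state with no number gets likelihood 0.
        p = float(parts[1]) if len(parts) > 1 else 0.0
        if p < 0.0:
            raise ValueError("a likelihood cannot be negative")
        weights[name] = p
    # States that are not listed at all get likelihood 1.
    return [weights.get(s, 1.0) for s in states]


states = ["red", "green", "teal", "olive", "navy"]  # hypothetical state list
print(negative_likelihood_vector(states, "~{red, green, teal .2, olive .8}"))
```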

Complete Uncertainty

Syntax:

*    [i.e. the syntax is just an asterisk]

If nothing is known regarding the value of this variable (i.e. missing data), then a question mark ? or an asterisk * should be used to indicate that.  It is equivalent to ~{} which is a likelihood of all ones.
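In code, this amounts to recognizing the two markers and producing a likelihood of 1 for every state. The Python helper names below are hypothetical.

```python
def is_completely_uncertain(entry):
    """True if the entry is one of the two missing-value markers."""
    return entry.strip() in ("?", "*")


def all_ones(states):
    """Likelihood vector for complete uncertainty: 1 for every state."""
    return [1.0 for _ in states]


states = ["lo", "med", "hi"]  # hypothetical state list
for entry in ("*", "?", "med"):
    if is_completely_uncertain(entry):
        print(entry, all_ones(states))
    else:
        print(entry, "is not a missing-value marker")
```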