Case File Format

Netica

Structure:  Case files (single-case or multi-case) are pure ASCII text files.  They may contain “// ~->[CASE-1]->~” or a time-author stamp somewhere in the first 3 lines, but that is not normally present.  Then comes a line consisting of headings for the columns.  Each heading corresponds to one variable of the case, and is the name of the node used to represent that variable (the variables are sometimes called attributes and the column entries values, hence “attribute-value”).  The headings are separated by spaces and/or tabs (it doesn’t matter how many).  There should be no spaces in the names of the nodes.

The case data is next, with one case per line (a single-case file only has one such line).  The values of the variables are in the same order as the heading line, and are separated by spaces or tabs (the columns don’t have to “line up” as they do in the examples below).
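As a rough illustration of the layout described above, the following Python sketch splits case-file text into a heading list and one dictionary per case.  It is not Netica code: the function name parse_case_text and the sample node and state names are made up for this example, and it assumes the text has no stamp line and no comments.

```python
def parse_case_text(text):
    """Split case-file text into a heading list and a list of row dicts."""
    # Ignore blank lines; the first remaining line holds the column headings.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # split() with no argument treats any run of spaces/tabs as one separator,
    # so the columns do not have to line up.
    headings = lines[0].split()
    cases = []
    for ln in lines[1:]:
        values = ln.split()
        if len(values) != len(headings):
            raise ValueError(
                f"expected {len(headings)} values, got {len(values)}: {ln!r}"
            )
        cases.append(dict(zip(headings, values)))
    return headings, cases


# Hypothetical sample: node and state names are invented for illustration.
sample = "Smoking\tCancer   XRay\nsmoker  absent normal\nnonsmoker\tabsent\tabnormal\n"
headings, cases = parse_case_text(sample)
```

Because values are matched to headings by position, a line with too few or too many entries is rejected rather than silently shifted.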

Discrete:  The value of a discrete variable is given by its state name, or by its state number preceded by a ‘#’ character (the first state is #0).  Using state names is preferred, since the order of the states may change later, which would invalidate a file that uses state numbers.  The ‘#’ symbol is recommended, but may be omitted if the node has no discretization or values defined.
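A small Python sketch of how a reader might turn a discrete entry into a 0-based state index.  The helper resolve_state and the state list are hypothetical, and for simplicity the sketch requires the ‘#’ prefix on state numbers (it does not handle the case where ‘#’ is omitted).

```python
def resolve_state(token, states):
    """Return the 0-based index of the state named by a discrete-value token."""
    if token.startswith("#"):
        # '#n' is a state number; the first state is #0.
        index = int(token[1:])
        if not 0 <= index < len(states):
            raise ValueError(f"state number out of range: {token}")
        return index
    # Otherwise the token is taken to be a state name.
    if token not in states:
        raise ValueError(f"unknown state name: {token}")
    return states.index(token)


# Hypothetical node with two states, in this order.
states = ["present", "absent"]
by_number = resolve_state("#0", states)
by_name = resolve_state("absent", states)
```

If the state order were later changed to ["absent", "present"], "#0" would silently point at a different state while "absent" would still resolve correctly, which is why names are the safer choice.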

Continuous:  The value of a continuous variable is given by a number in integer, decimal, or scientific notation (e.g. -3.21e-7).  If it has been discretized, then the value may be given by a state name or state number instead, but the continuous number is preferred if it is available.  That way the case file can be used for different discretizations of that variable in the future.  It is best if the value has the correct number of significant figures, since future versions of Netica may use this information.
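The distinction between a numeric entry and a state entry for a continuous variable could be sketched in Python as follows.  The function parse_continuous and its return convention are assumptions for this example, and the regular expression is only an approximation of "integer, decimal, or scientific notation", not Netica's actual grammar.

```python
import re

# Optional sign, digits with optional fraction (or a bare fraction),
# and an optional exponent, e.g. 42, -0.5, .25, -3.21e-7.
_NUMBER = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?")


def parse_continuous(token):
    """Return ('number', float) for numeric tokens, else ('state', token)."""
    if _NUMBER.fullmatch(token):
        return ("number", float(token))
    # Anything else is assumed to be a state name or '#n' state number
    # of a discretized node.
    return ("state", token)


numeric = parse_continuous("-3.21e-7")
named = parse_continuous("high")      # hypothetical state name
numbered = parse_continuous("#2")
```

Note that converting to float discards the number of significant figures written in the file; a reader that wants to preserve that information would have to keep the original token as well.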

Missing:  If the values of some of the variables are unknown for some of the cases, then an asterisk * is put in the file instead of the value.  This is known as “missing data”.  When reading case files, Netica can also understand a question mark ? used for missing data.
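For example, a reader might map both missing-data symbols to a single "no value" marker.  This Python sketch uses None for that purpose; the names MISSING_TOKENS and to_optional, and the sample row, are made up for illustration.

```python
# '*' is the standard missing-data symbol; '?' is also accepted on input.
MISSING_TOKENS = {"*", "?"}


def to_optional(token):
    """Return None for a missing-data symbol, otherwise the token unchanged."""
    return None if token in MISSING_TOKENS else token


# Hypothetical row with two unknown values.
row = ["smoker", "*", "?", "normal"]
cleaned = [to_optional(t) for t in row]
```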

Uncertain or Negative:  Negative, interval, Gaussian, set, and other such findings can also be entered in a case file using the UVF format.

Comments:  There may be as many spaces or tabs at the end of a line as desired, and there may also be C / C++ / Java style comments (e.g. a double slash “//”, followed by any text).
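A minimal Python sketch of removing a trailing comment before the line is split into values.  It handles only the double-slash form mentioned above; the function name strip_line_comment and the sample lines are assumptions for this example.

```python
def strip_line_comment(line):
    """Drop a '//' comment and any surrounding spaces or tabs."""
    # Everything from the first '//' to the end of the line is comment text.
    return line.split("//", 1)[0].strip()


data = strip_line_comment("smoker absent \t // checked by hand")
empty = strip_line_comment("// this whole line is a comment")
```

A line that consists only of a comment reduces to an empty string, so a reader can skip it the same way it skips blank lines.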

IDnum:  There are two special columns that a file may have which don’t correspond to nodes.  One provides an identification number for each case, which must be an integer between 0 and 2 billion.  The heading for this column is “IDnum”.  Identification numbers do not have to be in order through the file.  The missing data symbol * must not appear in this column.
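The constraints on this column could be checked with something like the following Python sketch.  The function parse_idnum is hypothetical, and the upper bound is taken literally as 2,000,000,000 from the "2 billion" stated above; the exact limit Netica enforces is not given here.

```python
# Assumption: "2 billion" read literally; Netica's exact limit may differ.
MAX_IDNUM = 2_000_000_000


def parse_idnum(token):
    """Validate an IDnum entry and return it as an int."""
    if token in ("*", "?"):
        raise ValueError("missing-data symbol is not allowed in the IDnum column")
    if not (token.isascii() and token.isdigit()):
        raise ValueError(f"IDnum must be a non-negative integer: {token!r}")
    value = int(token)
    if value > MAX_IDNUM:
        raise ValueError(f"IDnum too large: {value}")
    return value


# IDs may appear in any order through the file.
ids = [parse_idnum(t) for t in ["17", "3", "2000000000"]]
```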

NumCases:  The other special column has the heading “NumCases”, and indicates the frequency or multiplicity of the case.  A multiplicity of m indicates m cases with the same variable values.  It is not required to be an integer, so it can be used to represent a frequency of occurrence if desired.  The missing data symbol * must not appear in this column either.
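To illustrate how a multiplicity column might be used, this Python sketch totals the NumCases weights for each value of one node, treating every case as weight 1 when the column is absent.  The function weighted_counts, the node name, and the data rows are all invented for the example.

```python
from collections import defaultdict


def weighted_counts(headings, rows, node):
    """Sum NumCases weights per value of `node` (weight 1 if no such column)."""
    col = headings.index(node)
    weight_col = headings.index("NumCases") if "NumCases" in headings else None
    totals = defaultdict(float)
    for row in rows:
        # NumCases need not be an integer, so parse it as a float.
        weight = float(row[weight_col]) if weight_col is not None else 1.0
        totals[row[col]] += weight
    return dict(totals)


# Hypothetical data: IDnum and NumCases are the two special columns.
headings = ["IDnum", "NumCases", "Smoking"]
rows = [["1", "3", "smoker"], ["2", "1.5", "nonsmoker"], ["7", "2", "smoker"]]
counts = weighted_counts(headings, rows, "Smoking")
```

Here the two "smoker" lines stand for 3 + 2 = 5 cases, and the fractional weight 1.5 is carried through unchanged.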

Examples:  Here is a listing of “Chest Clinic.cases”.  It involves only discrete nodes with state names, and has an IDnum column, but no frequency column.  Here is another example of a case file, this time for cars brought into a garage.  It has discrete and continuous variables, state numbers and state names, and asterisks for missing entries.

Future:  Future versions of Netica will support more advanced operations with cases, including a more efficient file representation, and a way of using Bayes nets as “indexing functions” to do the kind of lookup common in case-based reasoning.  However, the file format described above will always be supported as well.