Data_Preparation (09/08/11)

CRHM Platform

Observation File Preparation.

    Data observation files are sequential ASCII text files with a time field as the first field on every line, followed by the observation data fields.  There is a line (record) in the file for every data interval.  CRHM files are similar, with the addition of a data header which defines the observation names and the order in which the observations occur on every line, and which finally gives help information describing the file and the observations.

Observation file layout.

Header.

  1. The first line of the file can be used to describe the file.  It must be present and cannot be more than one line.
  2. Data definition.  Each line consists of a variable name, dimension of the variable, variable units in the CRHM format and lastly an optional comment.  The delimiter is one or more spaces.
  3. Filter definitions.  These always have a '$' in column one and must follow all data definitions.
  4. "#####" - flag indicating the end of the header.  Only 4 '#' symbols are required.  The remainder of the line is ignored.

    No extra lines are allowed, including null lines.  All lines must start in column one.

    Comment lines are allowed in the header after the first line.  Use "//" or "$$" in columns 1 and 2.  Comment lines are only allowed in the header portion of the file.

Data.

  • The data must always begin one interval into the day and end at midnight of the last day.
  • The time format can either be "yyyy mm dd hh mm" or MS decimal time with at least 5 decimal places, e.g. "27181.04167".
  • The delimiter can be space or tab.  The tab is preferable for data inspection since the columns can be made to line up.
  • No comments or null lines are permitted.
  • Any missing data must be filled with a numeric flag, e.g. "9999", which can be massaged using a filter (an example file is sketched below).
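
    As an illustrative sketch only, a small observation file holding a single half-hourly air temperature could look like the following (the temperature values are invented for the example):

Tutorial temperature file (example only)
t 1 air temperature at 1m.
#####
2003 02 17 00 30   -15.2
2003 02 17 01 00   -15.6
2003 02 17 01 30   -15.9

    The first record falls one interval (00 30) into the first day, and the remaining records continue at half-hour steps to midnight of the last day of the period.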

UNIX files.

    UNIX observation files cause CRHM to fault.  These files should be copied and pasted into a new text file created with TextPad or another PC text editor.

Assembling Data.

    Excel® is a convenient program for assembling the field data files into one file covering the desired period.  It can import and merge data files and cut and paste individual columns, simplifying data editing.  The observations should be assembled in columns.  The first column should be time, beginning at the first interval of the first day and ending at midnight of the last day of the period.

    The time field can be created using =DATEVALUE("02/17/03") + TIMEVALUE("0:30") to assign a time to a cell.  If two cells are defined in succession then the FILL/TREND function can be used to fill a 'blocked' column with successive times for the desired period.  The month/day/year order for entering dates is determined by the regional settings of the PC.  A #VALUE! error or an unexpected date will occur when the incorrect order is used.
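
    For example, with month/day/year regional settings the formulas above evaluate as follows (displayed here as numbers rather than dates):

=DATEVALUE("02/17/03")                          37669
=DATEVALUE("02/17/03") + TIMEVALUE("0:30")      37669.02083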

    It is recommended that at this stage all cells outside the required area of the spreadsheet be deleted; otherwise they will also be exported from the spreadsheet together with the desired data, causing needless problems.

    The time and date column must be formatted to display as decimal time.  Select the date and time column which normally would be column A.   Select Format cells... and then select Number.  Set the number of decimal places to 5.  The spreadsheet should display the date and time as a fractional number in the 37,000 range.

Exporting data from Excel.

    After the data is assembled, save the file as an Excel file for future use.  Select File/Save As... then select the desired folder.  Click in the Save As Type: box and select Text (Tab delimited).  Enter the desired file name and select Save.  At this point it is best to close the file after saving the text file, as the Excel prompts can be misleading.

Handling time in Excel.

    Quite often the format of time downloaded from data loggers is ugly and difficult to put into the proper format.  It is often easier to generate the time column from scratch using Excel functions.  The method described above is best.  Another method is as follows.

  1. Insert the date function in a cell, e.g. =DATE(year, month, day), say in cell $K$4.
  2. In the first row and first column of the datalogger data, insert "=$K$4 + 1.0/F", where $K$4 is the location of the cell holding the date function and F is the daily frequency of the data.  Say this is cell K5.
  3. Repeat for the remainder of the column: "=$K5 + 1.0/F" in cell K6, "=$K6 + 1.0/F" in cell K7, etc.
  4. Format the column as "Number" with 5 "Decimal places".
  5. This column should be exported as the first column in the CRHM observation file (a worked sketch of these steps follows this list).
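
    As a sketch of steps 1 to 4, for half-hourly data (F = 48) beginning on 17 February 2003 the cells could contain:

K4:  =DATE(2003,2,17)
K5:  =$K$4 + 1.0/48
K6:  =$K5 + 1.0/48
K7:  =$K6 + 1.0/48

    Formatted as Number with 5 decimal places, K5 then displays as 37669.02083, i.e. the first half-hour interval of the day.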

Creating CRHM obs file.

    The text editor TextPad is recommended for this step.  NotePad is limited as it is unable to display formatting characters such as tabs.  TextPad is able to do column cuts and pastes.

    Open the *.txt file saved by Excel.  Turn on visible formatting by clicking the icon.  Check that there are no extra fields at the end of lines or at the end of the file.  If the columns are ragged, increase the tab size in the Configure/Preferences.../Tab/Tab size menu.

    At the beginning of the file insert a line giving the file description.  Next insert a line for each data field consisting of the name to be used by CRHM followed by the number of data items in the field, which is usually one.  The remainder of the line can be used as a descriptive comment.  An example line is "t 1 air temperature at 1m.".  Field names cannot have embedded spaces.  One or more spaces separate the fields.  The comment can have embedded spaces.

    Every column of data after the time field has to be accounted for by the data definitions in the header section.  The end of the header is marked by a line containing four or more pound signs, e.g. "#####".

    If observation filters are used, they are inserted between the last data definition and the line containing the pound signs.  Observation filters are modifiers which allow the input observation values to be modified before being used by CRHM.  Examples are changing units, either using a dedicated function ("$Tc FtoC(Tf)") or using one or more arithmetical filters ("$a add(var, c)", "$a sub(var, c)", "$a mul(var, c)" and "$a div(var, c)").  Other uses are changing the reference height of measurements ("$u2 refwind(u, Z2, Zm, Ht)"), replacing error-flagged data with good values ("$d missing(var, c1, c2)", "$d missinginter(var, c1, c2)" and "$d missing0(var, c1, c2)"), and distributing daily precipitation over every interval of the day ("$p smear(p, 0, 0)" or "$var expand(var, c_freq)").
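
    As an illustration only (the variable names are arbitrary), a header for a file holding air temperature in °F and daily precipitation could end with two such filters:

Example met file - air temperature (°F) and daily precipitation (mm)
Tf 1 air temperature at 2m.
p 1 daily precipitation.
$Tc FtoC(Tf)
$p smear(p, 0, 0)
#####

    Here the FtoC filter makes the temperature available in °C as Tc, and the smear filter distributes each daily precipitation total over every interval of the day.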

It is best to test the observation files separately in CRHM before loading them into a project.  A common problem is missing intervals, e.g. if a buffer overflow has occurred with a data logger, a block of data will be missing.  This will be indicated by CRHM detecting a sparse file.

 

Time Simulation (Can only be used in a separate *.obs file containing no real data).

    Sometimes it is convenient to generate synthetic data instead of field observations.  A special filter makes this possible.

$Sim(StartTime, EndTime, Interval)  where

StartTime as mm/dd/yy or mm/dd/yyyy

EndTime as mm/dd/yy or mm/dd/yyyy

Interval in hours. Minimum 0.5

When this filter is put in an observation file, no interval data is ever read, but time periods from 'StartTime' to 'EndTime' are generated at the interval specified.  The time simulation filter requires the addition of filters which generate actual data values, e.g. "const(C)".
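
    As a sketch, assuming the const filter takes the constant value as its single argument, a minimal simulation file could be:

Constant observation test (example)
$Sim(01/01/01, 01/05/2001, 1)
$t const(5.0)
#####

    No data lines follow the '#####' line; the hourly time steps and the constant value of t are generated entirely by the filters.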

Wave Synthesis (Can only be used in a separate *.obs file with $Sim(... ) and not in a regular observation file).

$Fract sine(period, phase, start, end)

$Fract square(period, phase, start, end)

$Fract ramp(period, phase, start, end)

$Value exp(start, end, B)          Value = e^(B*(t - t0))

$Value log(start, end, B)          Value = ln(B*(t - t0))

$Value pow(start, end, B)          Value = (t - t0)^B

$Value poly(start, end, a0, a1, a2, a3, a4)      where X = (t - t0), Value = a0 + a1*X + a2*X^2 + a3*X^3 + a4*X^4

$Fract pulse(start, end), where

period is in days. For less than one day the time format can be used, e.g. 12 hours can be '0.5' or '12:00:00'.

phase is in days. For less than one day the time format can be used, e.g. 12 hours can be '0.5' or '12:00:00'.

start is start date (mm/dd/yy) or (mm/dd/yy_hh:mm:00)

end is end date (mm/dd/yy) or (mm/dd/yy_hh:mm:00)

Fract will be a timeseries varying between -1.0 and 1.0 determined by the function.

Value is the timeseries calculated by the function.

Example of Wave Synthesis.

Simulation test (06/10/02)
$$ comment line - Test pulse generation
$Sim(01/01/01, 01/05/2001, 1) simulation time period
$Fract pulse(01/02/01, 01/03/01) one day pulse
$P pulse(01/01/01_1:0:0, 01/01/01_12:0:0)
$Sine sine(12:0:0, 0, 01/01/01, 01/02/01)
$Ramp ramp(1, 0, 01/01/01, 01/02/01)
$Square square(1, 0, 01/01/01, 01/02/01)
$$ comment line - delayed functions
$P1 pulse(01/03/01, 01/04/01)
$S1 sine(1, 0, 01/03/01, 01/04/01)
$R1 ramp(1, 0, 01/03/01, 01/04/01)
$Q1 square(1, 0, 01/03/01, 01/04/01)
############################################

Using TextPad to create observation files without using Excel.

    Since TextPad is able to cut and paste columns, the only problem is adding the time field for CRHM to the file.  Fortunately, the time field can be generated by using the filter "$Sim(01/01/01, 01/05/2001, 1)" and then merging the resulting time field column into the data file.