Parsing Input Incrementally
Log Parser is often used to parse logs that grow over time.
For example, the IIS logs and the Windows Event Log are continuously updated with new
information, and in some cases we would like to parse these logs periodically and
retrieve only the new records that have been logged since the previous run.
This is especially true for scenarios in which, for example, we use Log Parser to
consolidate logs to a database in an almost real-time fashion, or when we use Log Parser
to build a monitoring system that periodically scans logs for new entries of interest.
For these scenarios, Log Parser offers a feature that allows sequential executions of the
same query to only process new data that has been logged since the last execution.
This feature is enabled through the iCheckPoint parameter, which is supported by several
of the input formats.
The "iCheckPoint" parameter is used to specify the name of a "checkpoint"
file that Log Parser uses to store and retrieve information about the "position" of the last
entry parsed from each of the logs that appear in a command.
When we execute a command with a checkpoint file for the first time (i.e. when the specified
checkpoint file does not exist), Log Parser executes the query normally and processes all the logs
in the command, saving for each the "position" of the last parsed entry to the checkpoint file.
If we later execute the same command specifying the same checkpoint file, Log Parser
will again parse all the logs in the command, but each log will be parsed starting after the
entry last parsed by the previous command, thus producing records for new entries only.
When the new command execution is complete, the information in the checkpoint file is
updated with the new "position" of the last entry in each log.
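The mechanism described above can be modeled with a short sketch. The real ".lpc" checkpoint file format is internal to Log Parser; here we model it, purely hypothetically, as a JSON dictionary mapping each log file to the number of lines already parsed:

```python
import json
import os

def parse_incrementally(log_files, checkpoint_path):
    """Return only the lines added since the last run, then update the
    checkpoint file (a sketch of the -iCheckPoint behavior; the actual
    .lpc format is opaque and is modeled here as JSON)."""
    # Load the previous positions, or start fresh on the first run.
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            positions = json.load(f)
    else:
        positions = {}

    new_lines = []
    for path in log_files:
        with open(path) as f:
            lines = f.readlines()
        last = positions.get(path, 0)   # files never seen before start at line 0
        new_lines.extend(lines[last:])  # only entries after the recorded position
        positions[path] = len(lines)    # record the new "position"

    # The checkpoint is only rewritten after a successful run.
    with open(checkpoint_path, "w") as f:
        json.dump(positions, f)
    return new_lines
```

On the first call the checkpoint file does not exist, so every line of every log is returned; subsequent calls return only the lines appended since the previous call.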
Note: Checkpoint files are updated only when a query executes successfully. If an error causes the execution of a query to abort, the checkpoint file is not updated.
As an example, let's assume that the "MyLogs" folder contains the following
text files:
- Log1.txt, 50 lines
- Log2.txt, 100 lines
- Log3.txt, 20 lines
- Log4.txt, 30 lines
To parse these logs incrementally, we specify the name of a checkpoint file, making sure that the file does not exist prior to the command execution. Our command would look like this:
logparser "SELECT * FROM MyLogs\*.*" -i:TEXTLINE -iCheckPoint:myCheckPoint.lpc
When this command is executed for the first time, Log Parser will return all 200 lines from the four log files, and it will create the "myCheckPoint.lpc" checkpoint file containing the position of the last line in each of the four log files.
Tip: When the checkpoint file is specified without a path, Log Parser will create the checkpoint file in the folder currently set for the %TEMP% environment variable, usually "\Documents and Settings\<user name>\Local Settings\Temp".
Let's now assume that the "Log3.txt" file is updated, and that ten new lines are added to the log file. At this moment, the log files and the information stored in the checkpoint file will look like this:
Log Files | Checkpoint file |
Log1.txt, 50 lines | Log1.txt, line 50 |
Log2.txt, 100 lines | Log2.txt, line 100 |
Log3.txt, 30 lines | Log3.txt, line 20 |
Log4.txt, 30 lines | Log4.txt, line 30 |
If now a new "Log5.txt" file is created containing ten lines, the log files and the information stored in the checkpoint file will look like this:
Log Files | Checkpoint file |
Log1.txt, 50 lines | Log1.txt, line 50 |
Log2.txt, 100 lines | Log2.txt, line 100 |
Log3.txt, 30 lines | Log3.txt, line 30 |
Log4.txt, 30 lines | Log4.txt, line 30 |
Log5.txt, 10 lines | not recorded |
As another example showing how the checkpoint file is updated, let's assume now that the "Log2.txt" file is deleted.
The log files and the information stored in the checkpoint file will now look like this:
Log Files | Checkpoint file |
Log1.txt, 50 lines | Log1.txt, line 50 |
non-existing | Log2.txt, line 100 |
Log3.txt, 30 lines | Log3.txt, line 30 |
Log4.txt, 30 lines | Log4.txt, line 30 |
Log5.txt, 10 lines | Log5.txt, line 10 |
After the command is executed again, the stale entry for the deleted "Log2.txt" file is removed from the checkpoint file, and the log files and the information stored in the checkpoint file will look like this:
Log Files | Checkpoint file |
Log1.txt, 50 lines | Log1.txt, line 50 |
Log3.txt, 30 lines | Log3.txt, line 30 |
Log4.txt, 30 lines | Log4.txt, line 30 |
Log5.txt, 10 lines | Log5.txt, line 10 |
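The cleanup shown above can be pictured as a simple pruning step. This is a hypothetical sketch of the update logic; the actual behavior is internal to Log Parser:

```python
def prune_checkpoint(positions, parsed_files):
    """Keep only the checkpoint entries for files that were actually
    parsed in this run; entries for logs that no longer exist (such as
    the deleted "Log2.txt" above) are dropped when the checkpoint file
    is rewritten."""
    return {path: pos for path, pos in positions.items() if path in parsed_files}
```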
As a last example, let's now assume that the "Log1.txt" file is updated, but this time its size shrinks and it ends up containing ten lines only.
The log files and the information stored in the checkpoint file will now look like this:
Log Files | Checkpoint file |
Log1.txt, 10 lines | Log1.txt, line 50 |
Log3.txt, 30 lines | Log3.txt, line 30 |
Log4.txt, 30 lines | Log4.txt, line 30 |
Log5.txt, 10 lines | Log5.txt, line 10 |
When we execute the command again, Log Parser detects that "Log1.txt" now contains fewer entries than the position recorded in the checkpoint file, and it parses the whole file again, returning all of its ten lines. After the command execution is complete, the "myCheckPoint.lpc" checkpoint file is updated to reflect the new situation, and the log files and the information stored in the checkpoint file will look like this:
Log Files | Checkpoint file |
Log1.txt, 10 lines | Log1.txt, line 10 |
Log3.txt, 30 lines | Log3.txt, line 30 |
Log4.txt, 30 lines | Log4.txt, line 30 |
Log5.txt, 10 lines | Log5.txt, line 10 |
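The behavior in this last example can be summarized as a small decision rule, sketched below under the assumption that a file shorter than its recorded position is treated as rewritten and parsed from the beginning:

```python
def range_to_parse(total_lines, recorded_position):
    """Return the (start, end) line range to parse for one log file.

    If the file now has fewer lines than the recorded position, assume
    it was overwritten and parse it from the beginning; otherwise
    resume right after the last parsed entry.
    """
    if total_lines < recorded_position:
        return (0, total_lines)
    return (recorded_position, total_lines)
```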
Incremental Parsing and Aggregated Data
It's important to note that the checkpoint file only records information about the files being parsed; it does not record information about the query being executed. In other words, when we execute a query multiple times on a set of growing files using a checkpoint file, each time the query results are calculated on the new entries only. This means that queries using aggregated data need to be handled carefully when used with checkpoint files.
As an example, consider again the four text files in the first scenario above, and the following command:
logparser "SELECT COUNT(*) AS Total FROM MyLogs\*.*" -i:TEXTLINE -iCheckPoint:myCheckPoint.lpc
When the command is executed for the first time, the "Total" field in the output record returned by the query will be equal to 200, that is, the total number of lines in the four log files.
As in the first example, let's now assume that the "Log3.txt" file is updated, and that ten new lines are added to the log file.
When we execute the command again, the "Total" field in the output record returned by the query will now be equal to 10, the number of new lines added since the previous execution, and not to 210, the total number of lines in the four log files.
In cases where it is desirable to calculate aggregated data across multiple executions of the same query when using incremental parsing, a possible solution is to save the partial results of each query to temporary files, and then aggregate all the partial results with an additional step.
Using the example above, we could save the result of the first query ("200") to the "FirstResults.csv" file, and the result of the second query ("10") to the "LastResults.csv" file. The two files could then be consolidated into a single file with a command like this:
logparser "SELECT SUM(Total) FROM FirstResults.csv, LastResults.csv" -i:CSV
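The two-step consolidation above can also be sketched outside of Log Parser. The helper below simply sums the partial totals saved by each incremental run (the values mirror the hypothetical example above):

```python
def aggregate_partial_totals(partial_totals):
    """Sum the partial COUNT(*) results saved by each incremental run.

    Each run computes its aggregate over the new entries only, so summing
    the saved partial results reconstructs the overall total.
    """
    return sum(partial_totals)
```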