Eliminating Duplicate Values
When working with information from logs, it is often desired to retrieve a list of some values where each element in the list appears only once, regardless of the number of times the same value appears in the original data.
As an example, consider the following query, which extracts all the domain accounts that
have logged on a computer from the "Security" event log:
SELECT RESOLVE_SID(Sid) AS Account FROM \\TESTMACHINE1\Security WHERE EventID IN (540; 528)The output of this query is a list of all the domain accounts appearing in each "Logon" event:
Account ------------------------------------------------ NT AUTHORITY\LOCAL SERVICE NT AUTHORITY\NETWORK SERVICE NT AUTHORITY\NETWORK SERVICE NT AUTHORITY\NETWORK SERVICE TESTDOMAIN\TESTUSER1 NT AUTHORITY\LOCAL SERVICE NT AUTHORITY\LOCAL SERVICE TESTDOMAIN\TESTUSER1 TESTDOMAIN\TESTUSER2 NT AUTHORITY\LOCAL SERVICE TESTDOMAIN\TESTUSER1If we are interested in retrieving a list in which each account name appears only once, we could use the DISTINCT keyword in the SELECT clause as follows:
SELECT DISTINCT RESOLVE_SID(Sid) AS Account FROM \\TESTMACHINE1\Security WHERE EventID IN (540; 528)And obtain:
Account ------------------------------------------------ NT AUTHORITY\LOCAL SERVICE NT AUTHORITY\NETWORK SERVICE TESTDOMAIN\TESTUSER1 TESTDOMAIN\TESTUSER2
The DISTINCT keyword is used to indicate that the output of a query should consist of unique records; duplicate output records are discarded.
As another example, we might want to retrieve a list of all the browsers used to request
pages from our IIS server, with each browser appearing only once in the list:
SELECT DISTINCT cs(User-Agent) FROM <1>
cs(User-Agent) -------------------------------------------------------------------- Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1) Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+98) Mozilla/4.05+[en] Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.0;+T312461;+Q312461) Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+NT+5.0) Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.0) Microsoft+Data+Access+Internet+Publishing+Provider+Cache+Manager Mozilla/2.0+(compatible;+MS+FrontPage+4.0) MSFrontPage/4.0 Microsoft+Data+Access+Internet+Publishing+Provider+DAV
It is also possible to use the DISTINCT keyword inside the COUNT aggregate function, in order to retrieve the total number of different values appearing in the data.
For example, the following query returns the total number of different browsers and
the total number of different client IP addresses that requested pages from our IIS server:
SELECT COUNT(DISTINCT cs(User-Agent)) AS Browsers, COUNT(DISTINCT c-ip) AS Clients FROM <1>
Browsers Clients -------- ------- 356 3379
Tip: In the Log Parser SQL-Like language, the DISTINCT keyword can be used inside aggregate functions only when the GROUP BY clause is not used.