TSV Input Format
The TSV input format parses tab-separated and space-separated values text files.
TSV text files, usually called "tabular" files, are generic text files containing
values separated by either spaces or tabs.
This is also the output format of many command-line tools. For example, the
output of the "netstat" tool is a series of lines, each line consisting
of values separated by spaces:

Active Connections

  Proto  Local Address            Foreign Address                          State
  TCP    GABRIEGI-M:epmap         GABRIEGI-M.redmond.corp.microsoft.com:0  LISTENING
  TCP    GABRIEGI-M:microsoft-ds  GABRIEGI-M.redmond.corp.microsoft.com:0  LISTENING
  TCP    GABRIEGI-M:1025          GABRIEGI-M.redmond.corp.microsoft.com:0  LISTENING
  TCP    GABRIEGI-M:1036          GABRIEGI-M.redmond.corp.microsoft.com:0  LISTENING
  TCP    GABRIEGI-M:3389          GABRIEGI-M.redmond.corp.microsoft.com:0  LISTENING
  TCP    GABRIEGI-M:5000          GABRIEGI-M.redmond.corp.microsoft.com:0  LISTENING
  TCP    GABRIEGI-M:42510         GABRIEGI-M.redmond.corp.microsoft.com:0  LISTENING
  TCP    GABRIEGI-M:netbios-ssn   GABRIEGI-M.redmond.corp.microsoft.com:0  LISTENING
  UDP    GABRIEGI-M:microsoft-ds  *:*
  UDP    GABRIEGI-M:isakmp        *:*
  UDP    GABRIEGI-M:1026          *:*
  UDP    GABRIEGI-M:1027          *:*
  UDP    GABRIEGI-M:1028          *:*
  UDP    GABRIEGI-M:ntp           *:*
  UDP    GABRIEGI-M:1900          *:*
  UDP    GABRIEGI-M:ntp           *:*
  UDP    GABRIEGI-M:netbios-ns    *:*
  UDP    GABRIEGI-M:netbios-dgm   *:*
  UDP    GABRIEGI-M:1900          *:*
  UDP    GABRIEGI-M:42508         *:*

Depending on the application, the first line in a TSV file might be a "header",
containing the labels of the record fields.
The following example shows a TSV file beginning with a header:

Year	PID	Comment
2004	2956	Application started
2004		Waiting for input
2004	3104	Application started
2004	1048	Application started

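To illustrate how a header row labels the fields of each record, here is a minimal Python sketch. It is not Log Parser's implementation: the function name parse_with_header is hypothetical, and the sample lines assume single tab characters between fields, with two consecutive tabs marking an empty PID value.

```python
def parse_with_header(lines, separator="\t"):
    """Use the first line as field labels and map every later line onto them.

    Illustrative only: assumes exactly one separator character between fields.
    """
    labels = lines[0].split(separator)
    return [dict(zip(labels, line.split(separator))) for line in lines[1:]]


sample = [
    "Year\tPID\tComment",
    "2004\t2956\tApplication started",
    "2004\t\tWaiting for input",  # two consecutive tabs: empty PID field
]

records = parse_with_header(sample)
print(records[0]["Comment"])    # Application started
print(repr(records[1]["PID"]))  # ''
```

Without a header row, an application would have to refer to the fields by position rather than by label.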
Among the parameters supported by the TSV input format, the iSeparator, nSep, and fixedSep parameters are the most important: together they allow the TSV input format to adapt to the layout of the files being parsed.
The iSeparator parameter specifies the character
used as a separator between the fields in the files being parsed.
Some text files, like the previous netstat example, use simple space characters as
separator characters, while other text files, like the second example above, use tab
characters.
The nSep parameter specifies how many separator characters
must appear for the characters to signify a field separator.
In the netstat example above, fields are separated by at least two space characters, while
a single space character is allowed to appear in the value of a field (as is the case
with the "Local Address" field name).
On the other hand, in the previous tab-separated example file, fields are separated by a
single tab character.
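The effect of a minimum separator count can be sketched in Python as follows. This is an illustration of the idea only, not Log Parser code: the function split_min_run and its arguments are hypothetical, the spacing of the sample lines is assumed to resemble typical netstat output, and stripping leading and trailing separators is a simplification made for this sketch.

```python
import re


def split_min_run(line, separator=" ", n_sep=2):
    """Split on runs of at least n_sep consecutive separator characters.

    Shorter runs (for example a single space) stay inside the field value.
    """
    pattern = "(?:%s){%d,}" % (re.escape(separator), n_sep)
    # Simplification: drop separators at the start and end of the line.
    return re.split(pattern, line.strip(separator))


header = "  Proto  Local Address          Foreign Address        State"
print(split_min_run(header))
# ['Proto', 'Local Address', 'Foreign Address', 'State']

row = "  TCP    GABRIEGI-M:epmap       GABRIEGI-M.redmond.corp.microsoft.com:0    LISTENING"
print(split_min_run(row))
# ['TCP', 'GABRIEGI-M:epmap', 'GABRIEGI-M.redmond.corp.microsoft.com:0', 'LISTENING']
```

With n_sep set to 1 instead, "Local Address" would be split into two fields, which is why a count of two suits this kind of output.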
The fixedSep parameter specifies whether or not
the fields in the input files are separated by a fixed number of separator characters.
In the netstat example above, fields are separated by at least two space characters,
but three or more space characters still signify a single field separator.
On the other hand, in the previous tab-separated example file, fields are separated by
exactly a single tab character, and the presence of two consecutive tab characters signifies
an empty field.
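The difference between a fixed and a variable number of separators can be sketched in Python. This is a hedged illustration of the behavior described above, not Log Parser's implementation: the function split_record and its fixed_sep argument are hypothetical stand-ins for the nSep and fixedSep parameters.

```python
import re


def split_record(line, separator="\t", n_sep=1, fixed_sep=True):
    """Split a line into fields.

    fixed_sep=True:  every group of exactly n_sep separators ends a field,
                     so consecutive groups produce empty fields.
    fixed_sep=False: any run of n_sep or more separators counts as one
                     field separator.
    """
    if fixed_sep:
        return line.split(separator * n_sep)
    pattern = "(?:%s){%d,}" % (re.escape(separator), n_sep)
    return re.split(pattern, line)


line = "2004\t\tWaiting for input"
print(split_record(line, fixed_sep=True))   # ['2004', '', 'Waiting for input']
print(split_record(line, fixed_sep=False))  # ['2004', 'Waiting for input']
```

In the fixed case the empty PID value is preserved; in the variable case the two tabs collapse into one separator and the record loses a field.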
From-Entity Syntax
See also:
CSV Input Format
TSV Output Format