The Dirfile Standards

Introduction

The Dirfile Standards describe the dirfile database format, a filesystem-based database for time-ordered binary data. Dirfiles are designed to be a fast, simple format for storing and reading binary time-ordered data. This document provides an unofficial overview of the Dirfile Standards. The official Dirfile Standards are distributed with GetData as three Unix man pages: dirfile(5), dirfile-format(5), and dirfile-encoding(5). Additionally, this document discusses some implementation-dependent behaviour specific to GetData not found in those documents. The latest release of the Dirfile Standards is Standards Version 10 (January 2017).

The dirfile database is centred around one or more time-ordered data streams (time streams). Each time stream is written to disk in a separate file, in its native binary format. The name of each time stream file corresponds to the time stream's field name, a descriptive textual tag.

Two time streams may have different constant sampling frequencies and mechanisms exist within the dirfile format to ensure these time streams remain properly sequenced in time. To do this, the time streams in the dirfile are subdivided into frames. Each frame contains an integer number of samples of each time stream. When synchronous retrieval of data from more than one time stream is required, position in the dirfile can be specified in frames, which will ensure synchronicity.
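As a rough illustration of why addressing by frame keeps streams aligned, the following Python sketch converts a frame range into per-stream sample ranges. The function name, the field names, and the samples-per-frame values are hypothetical, and frame numbering is assumed to start at zero.

```python
def sample_range(first_frame, num_frames, samples_per_frame):
    """Return the sample indices covered by a run of whole frames."""
    start = first_frame * samples_per_frame
    return range(start, start + num_frames * samples_per_frame)


# Hypothetical streams with different (constant) samples per frame.
samples_per_frame = {"slow": 1, "fast": 20}

# The same frame range selects a different, but time-aligned, span
# of samples in each stream.
for name, spf in samples_per_frame.items():
    span = sample_range(first_frame=100, num_frames=5, samples_per_frame=spf)
    print(name, span.start, span.stop)
```

Both spans begin and end on the same frame boundaries, which is what makes the retrieved data synchronous.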

The time stream files are all located in a central directory, known as the dirfile directory. The dirfile as a whole is typically referred to by its dirfile directory. Included in the dirfile along with the time streams is the dirfile format specification, which is an ASCII text file called format located in the dirfile directory.

Version 3 of the Dirfile Standards introduced the large dirfile extension. This extension added the ability to distribute the dirfile format specification among multiple files (called fragments) in addition to the format file, as well as the ability to house portions of the database in subdirfiles. These subdirfiles may be fully-fledged dirfiles in their own right, but may also be contained within a larger, parent dirfile.

In addition to the raw fields on disk, the dirfile format specification may also specify derived fields which are calculated from one or more raw or derived time streams. Derived fields behave identically to raw fields when read via GetData. See below for a complete list of derived field types.

Dirfiles are designed to be written to and read simultaneously. The dirfile specification dictates that one particular raw field (specified either explicitly with the /REFERENCE directive or implicitly by the format file) is to be used as the reference field: all other vector fields are assumed to have at least as many frames as the reference field has, and the size (in frames) of the reference field is used as the size of the dirfile as a whole.

Version 6 of the Dirfile Standards added the ability to encode the binary files on disk. Each fragment may have its own encoding scheme. Notably this can be used to compress these files. See Dirfile Encodings for information on encoding schemes.

Dirfile Example

An example dirfile is presented as Figure 1 below. The dirfile as a whole is referenced by the name of the directory which forms its base, in this case "dirfile". The base directory contains a format file which contains the metadata for the dirfile database. This format file includes other files (called format file fragments) which provide additional database metadata, including a format file in a subdirectory (a subdirfile). A full description of the format file syntax is given below.

Figure 1: Graphical representation of a dirfile

This dirfile contains four time streams (indicated by RAW in the format file, and often called raw fields), one of which is located in the subdirfile. Five derived fields are also defined (indicated by LINCOM, MULTIPLY, and BIT in the format file—other derived field types also exist, see below).

Each time stream has a corresponding file of binary data containing the time stream itself. Each time stream may have a different sample rate, also indicated in the format file. In this example, for every sample of field1, there are four samples of field2, twelve samples of field3, and eight samples of bits. Derived fields inherit the sample rate of their first input field, so, for example, the derived field diff has the same sample rate as field2.

The binary file associated with the time stream bits is located in the subdirfile because it is defined in the format file fragment in that directory. This is a general rule: the binary file associated with a time stream must reside in the directory which contains the fragment that defines the field. (This is true even if the fragment isn't called format, i.e. binary files associated with raw fields defined in extra_format would have to reside in dirfile.) If this rule is not followed, GetData will be unable to locate the time stream on disk.
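The location rule can be sketched in Python as follows. This is only an illustration: the function name, the subdirectory name `subdir`, and the field name `some_field` are hypothetical, and it assumes an unencoded dirfile in which the binary file is named exactly after the field.

```python
from pathlib import Path


def raw_file_path(fragment_path, field_name):
    """Binary file for a RAW field: same directory as the defining fragment."""
    return Path(fragment_path).parent / field_name


# A field defined in the subdirfile's format file lives in the subdirectory.
print(raw_file_path("dirfile/subdir/format", "bits"))

# A field defined in extra_format lives in the base directory.
print(raw_file_path("dirfile/extra_format", "some_field"))
```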

The subdirfile is a fully formed dirfile in its own right, since its metadata is fully specified by its format file, which does not refer to any fields defined by its parent. Note, however, that dirfile is not a complete dirfile without the inclusion of the subdirfile, since the format fragment extra_format refers to a field defined in the subdirfile.

The Format File

The format file is a case-sensitive text file which contains the dirfile database metadata. The explicit text encoding is not specified by the Standards, but it must be 7-bit ASCII compatible. Examples of acceptable character encodings include all the ISO 8859 character sets (i.e. Latin-1 through Latin-10, among others), as well as the UTF-8 encoding of Unicode and UCS.

The format file is composed of directive lines and field specification lines, optionally separated by blank lines, or lines containing only whitespace. Lines are separated by the line-feed character (0x0A). Unless escaped (see below), the hash mark (#) is the comment delimiter; the comment delimiter, and any text following it to the end of the line, is ignored.

Tokens

Both directive lines and field specification lines consist of several tokens separated by whitespace. Whitespace consists of one or more whitespace characters. These are: space (0x20), horizontal tab (0x09), vertical tab (0x0B), form-feed (0x0C), and carriage return (0x0D). The first token of a directive line is always a reserved word, while a field specification line begins with a field name. As a result, no field may have the same name as a reserved word (although, as of Standards Version 8, all reserved words contain a forward slash character, /, which is prohibited in field names in any case).

Since tokens are separated by whitespace, to include a whitespace character in a token, it must either be escaped by preceding it with a backslash character (\) or replaced by a character escape sequence (see Table 1, below), or else the token must be enclosed in quotation marks ("). The quotation marks themselves are stripped from the token. The null-token (that is, the token consisting of zero characters) may be specified by a pair of quotation marks with nothing between them (""). To include a literal quotation mark or backslash character in a token, it must be escaped (\" or \\). Similarly, a hash mark may be included in a token by including it in a quoted token or else by escaping it (\#); otherwise the hash mark will be understood as the comment delimiter.

It is a syntax error to have a line which contains unmatched quotation marks, or in which the last character is an un-escaped backslash (i.e., line continuation is not allowed).

Several characters, when escaped by a preceding backslash character, are interpreted as special characters in tokens. Some of these have already been mentioned. The full list of character escape sequences is presented in Table 1. Any other character which is escaped is interpreted as the character itself (i.e. \c is interpreted as c).

**Table 1:** Character Escape Sequences
Sequence	Interpretation	Byte
\"	a quotation mark character	0x22
\#	a hash mark character	0x23
\a	an alert (bell) character	0x07
\b	a backspace character	0x08
\e	an escape character	0x1B
\f	a form-feed character	0x0C
\n	a line-feed character	0x0A
\r	a carriage return character	0x0D
\t	a horizontal tab character	0x09
\v	a vertical tab character	0x0B
\\	a backslash character	0x5C
\ooo	the single byte given by the octal number ooo. (1 to 3 octal digits)	0ooo
\xhh	the single byte given by the hexadecimal number hh. (1 or 2 hexadecimal digits)	0xhh
\uhhhhhhh	the UTF-8 byte sequence encoding the Unicode code point given by the hexadecimal number hhhhhhh. (1 to 7 hexadecimal digits)

No token may contain the NUL character (0x00). Furthermore, although support is present to create UTF-8 byte sequences, tokens are not required to be valid UTF-8 sequences. Any byte sequence not containing the NUL character forms a valid token. However, there may be further restrictions on allowed characters for a token in a particular situation (for example, when used as a field name).
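The tokenising rules above can be sketched as a small Python function. This is an illustrative reading of the rules, not GetData's parser: the names `tokenise`, `_take`, and `_checked` are hypothetical, a `\x` or `\u` not followed by a hexadecimal digit is assumed to fall back to the literal character, an octal escape larger than one byte is assumed to be an error, and only bytes produced by escape sequences are checked for NUL.

```python
SIMPLE_ESCAPES = {ord("a"): 0x07, ord("b"): 0x08, ord("e"): 0x1B,
                  ord("f"): 0x0C, ord("n"): 0x0A, ord("r"): 0x0D,
                  ord("t"): 0x09, ord("v"): 0x0B}
WHITESPACE = b" \t\v\f\r"
OCT_DIGITS = b"01234567"
HEX_DIGITS = b"0123456789abcdefABCDEF"


def _take(line, start, allowed, limit):
    """Index just past a run of at most `limit` bytes drawn from `allowed`."""
    end = start
    while end < len(line) and end - start < limit and line[end] in allowed:
        end += 1
    return end


def _checked(data):
    """Reject NUL, which no token may contain."""
    if 0 in data:
        raise ValueError("NUL is not allowed in a token")
    return data


def tokenise(line):
    """Split one format-file line (bytes, without its line-feed) into tokens."""
    tokens, current = [], bytearray()
    started = quoted = False   # `started` lets "" produce an empty token
    i = 0
    while i < len(line):
        byte = line[i]
        i += 1
        if byte == ord("\\"):
            if i == len(line):
                raise ValueError("line ends in an unescaped backslash")
            started = True
            esc = line[i]
            is_hex = (esc in (ord("x"), ord("u"))
                      and i + 1 < len(line) and line[i + 1] in HEX_DIGITS)
            if esc in OCT_DIGITS:
                end = _take(line, i, OCT_DIGITS, 3)
                value = int(line[i:end], 8)
                if value > 0xFF:   # assumption: must fit in a single byte
                    raise ValueError("octal escape does not fit in one byte")
                current += _checked(bytes([value]))
                i = end
            elif is_hex and esc == ord("x"):
                end = _take(line, i + 1, HEX_DIGITS, 2)
                current += _checked(bytes([int(line[i + 1:end], 16)]))
                i = end
            elif is_hex:           # \u: emit the UTF-8 encoding of the code point
                end = _take(line, i + 1, HEX_DIGITS, 7)
                code_point = int(line[i + 1:end], 16)
                current += _checked(chr(code_point).encode("utf-8"))
                i = end
            else:                  # named escape, or the character itself
                current.append(SIMPLE_ESCAPES.get(esc, esc))
                i += 1
        elif byte == ord('"'):
            quoted, started = not quoted, True
        elif quoted:
            current.append(byte)   # whitespace and # are literal inside quotes
        elif byte == ord("#"):
            break                  # comment: ignore the rest of the line
        elif byte in WHITESPACE:
            if started:
                tokens.append(bytes(current))
                current, started = bytearray(), False
        else:
            current.append(byte)
            started = True
    if quoted:
        raise ValueError("unmatched quotation mark")
    if started:
        tokens.append(bytes(current))
    return tokens


print(tokenise(rb'my\ field LINTERP input "/path/with space/table.lut" # lookup'))
```

Working on bytes rather than text reflects the rule that a token is any NUL-free byte sequence and need not be valid UTF-8.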

Standards Versions 5 and earlier do not recognise the character escape sequences, nor allow quoting of tokens. As a result, they prohibit both whitespace and the comment delimiter from being used in tokens.

Directives

There are eleven directives, each specified by a different reserved word which cannot be used as field names in the dirfile; all directives are optional. As of Standards Version 8, all reserved words start with an initial forward slash (/), to distinguish them from field names. Standards Versions 5, 6 and 7 permit any reserved word to optionally omit its initial forward slash, without change in meaning. Reserved words in Standards Version 4 and earlier may not have an initial forward slash. Like the rest of the format specification, directives are case sensitive.

A number of the directives have fragment scope. A directive with fragment scope only applies to the fragment in which it is present, plus any sub-fragments indicated by the /INCLUDE directive, but only if those sub-fragments don't have their own corresponding directive. Directives which have fragment scope are: /ENCODING, /ENDIAN, /FRAMEOFFSET, and /PROTECT. Because of these scoping rules, different portions of the dirfile may have different encodings, endiannesses, frame offsets, or protection levels.

If a directive with fragment scope appears more than once in a fragment, only the last such directive is honoured, with the exception that the effect of a directive is not propagated to sub-fragments if the directive line appears after the sub-fragment is included. The scoping rules of the remaining directives are discussed below.
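One way to picture fragment scope is the following Python sketch. The data model (a fragment as a dict holding a list of `(word, argument)` pairs) and the function name `resolve_scopes` are hypothetical. It also assumes that an included fragment inherits whatever values are in force at the point of inclusion, which is one reading of the rule above rather than something the Standards text spells out for every case.

```python
SCOPED = {"/ENCODING", "/ENDIAN", "/FRAMEOFFSET", "/PROTECT"}


def resolve_scopes(fragment, inherited=None, result=None):
    """Map each fragment name to its effective fragment-scope settings."""
    result = {} if result is None else result
    current = dict(inherited or {})
    for word, arg in fragment["lines"]:
        if word in SCOPED:
            current[word] = arg   # a later directive replaces an earlier one
        elif word == "/INCLUDE":
            # The sub-fragment starts from the values seen so far; directives
            # appearing later in the parent never reach it.
            resolve_scopes(arg, current, result)
    result[fragment["name"]] = current
    return result


early = {"name": "early", "lines": []}
own = {"name": "own", "lines": [("/ENDIAN", "little")]}
late = {"name": "late", "lines": []}
root = {"name": "format", "lines": [
    ("/INCLUDE", early),      # included before /ENDIAN: not affected by it
    ("/ENDIAN", "big"),
    ("/INCLUDE", own),        # has its own /ENDIAN, which wins
    ("/INCLUDE", late),       # inherits big from the parent
]}

for name, settings in resolve_scopes(root).items():
    print(name, settings)
```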

/ALIAS: The /ALIAS directive defines an alternate name for a field defined elsewhere in the format specification (called the target). Aliases may not be used as the parent field in a /META directive, but are in most other ways indistinguishable from the target's original, canonical name. Aliases may be chained (that is, the target name appearing in an /ALIAS directive may itself be an alias). In this case, the new alias is another name for the target's own target. Just as there is no requirement that the input fields of a derived field exist, it is not an error for the target of an alias to not exist. Syntax is:
/ALIAS <name> <target>
A metafield alias may be defined using the <parent-field>/<alias-name> syntax for name in the /ALIAS directive. No restriction is placed on target; specifically, a metafield alias may target a top-level field, or a metafield with a different parent; conversely, a top-level alias may target a metafield.
A metafield alias may never appear as the parent part of a metafield field code, even if it refers to a top-level field. That is, given the valid format:
aaaa RAW UINT8 1
aaaa/bbbb CONST FLOAT64 0.0
cccc RAW UINT8 1
/ALIAS cccc/dddd aaaa
the metafield aaaa/bbbb may not be referred to as cccc/dddd/bbbb, even though cccc/dddd is a valid field code referring to aaaa.
This is not true of top-level aliases: if eeee is an alias of ffff, then ffff/gggg, a metafield of ffff, may be referred to as eeee/gggg.
The /ALIAS directive has no scope: it is processed immediately. It appeared in Standards Version 9.
/ENCODING: The /ENCODING directive specifies the encoding scheme used to encode binary files in the dirfile. Syntax is:
/ENCODING <scheme> [enc-datum]

The encoding scheme may be one of the pre-defined names listed below, which are described in more detail on the Dirfile Encodings page, or any other site-specific encoding scheme. The pre-defined schemes are:
- none: The dirfile is unencoded.
- bzip2: The dirfile is bzip2 encoded using the bzip2 compression library.
- flac: The dirfile is flac encoded using the libFLAC compression library.
- gzip: The dirfile is gzip encoded using the zlib compression library.
- lzma: The dirfile is lzma encoded using the liblzma compression library.
- sie: The dirfile is sample-index encoded (a variant of run-length encoding).
- slim: The dirfile is slim encoded using the slimlib compression library.
- text: The dirfile is text encoded.
- zzip: The dirfile is zzip encoded using the zzip compression library.
- zzslim: The dirfile is zzslim encoded using a combination of the slim and zzip compression libraries.
The enc-datum token provides additional data for the zzip and zzslim encoding schemes; see the Dirfile Encodings page for details.
GetData will accept dirfiles it encounters with an encoding scheme not listed here but, in this case, no binary file I/O (affecting gd_getdata(), gd_putdata(), gd_nframes(), &c.) is possible on the dirfile. If no encoding scheme is specified, GetData will try to determine the encoding scheme automatically, or in the case of a newly created dirfile, assume the dirfile is unencoded.
The /ENCODING directive has fragment scope. Introduced in Standards Version 6. The predefined schemes sie, zzip, and zzslim, and the optional enc-datum token, appeared in Standards Version 9; the predefined scheme lzma appeared in Standards Version 7; all other predefined schemes appeared in Standards Version 6.
/ENDIAN: The /ENDIAN directive specifies the endianness of the raw data in the database. Syntax is:
/ENDIAN ( big | little ) [ arm ]
where big and little specify the byte ordering of the data, and arm is specified when double-precision floating point data are in the middle-endian format used by older ARM processors.
GetData will assume that any dirfile that omits this directive is in the native endianness of the architecture on which it is running (i.e. no endianness swapping will take place.) Otherwise, if the endianness of the dirfile is different than the endianness of the architecture, GetData will perform endianness conversion while reading from and writing to raw data on disk. The /ENDIAN directive has fragment scope. Introduced in Standards Version 5; Standards Version 8 added the optional arm token.
/FRAMEOFFSET: The /FRAMEOFFSET directive specifies the frame number of the first frame in the database (the beginning-of-field marker). Syntax is:
/FRAMEOFFSET <integer>
Requests for data before the beginning-of-field marker will result in zero (for integer return types) or IEEE not-a-number (for floating point types). Attempts to write to the data before the beginning-of-field marker will result in an error, and no data written. The /FRAMEOFFSET directive has fragment scope. Introduced in Standards Version 1.
/HIDDEN: The /HIDDEN directive indicates that the specified field name is hidden. The Standards themselves don't specify what this means, but by default GetData suppresses hidden field names in the output of the field counting and field listing functions. Hiddenness is not inherited by metafields of the specified field. Hiddenness applies to the name, not the field itself; it does not hide all aliases of the field name, and if the field name is an alias, the alias is hidden, not its target. Syntax is:
/HIDDEN <fieldname>
A /HIDDEN directive must appear after the specification of fieldname (which occurs in a field specification line, an /ALIAS directive, or a /META directive), in the same fragment.
The /HIDDEN directive has no scope: it is processed immediately. It appeared in Standards Version 9.
/INCLUDE: The /INCLUDE directive specifies another file to parse for additional metadata for the dirfile. The inclusion is processed immediately, before the fragment containing the /INCLUDE directive (the parent fragment) is parsed further. RAW fields specified in the included fragment are located in the directory containing the fragment file, and not in the directory containing the parent fragment, and the binary file encoding may be different for each fragment. The fragment may be specified either with an absolute path, or else a path relative to the directory containing the parent fragment.
The /INCLUDE directive may optionally specify a prefix and/or suffix to apply to field names defined in the included fragment. If present, affixes are applied to all fieldnames (including aliases) defined in the included fragment and any fragments it further includes. Affixes nest, with the affixes of the deepest inclusion innermost. Affixes are not applied to the names of binary files associated with RAW fields. Syntax is:
/INCLUDE <file> [<namespace>.][<prefix>] [<suffix>]

To specify only suffix, the null token ("") may be used as prefix. A namespace may also be specified in an /INCLUDE directive by prepending it to prefix. The namespace and prefix are separated by a dot (.). The dot is required whenever a namespace is specified: if the prefix is empty, the third token should be just the namespace followed by a trailing dot. If a namespace is specified, that namespace, relative to the including fragment's root namespace, becomes the root namespace of the included fragment. If no namespace is specified in the /INCLUDE directive, then the current namespace (specified by a previous /NAMESPACE directive) is used as the root namespace of the included fragment. That is, if the current namespace is current_space, then the statement:
/INCLUDE file newspace.
is equivalent to
/NAMESPACE newspace
/INCLUDE file
/NAMESPACE current_space
As a result, if no namespace is provided, and there has been no previous /NAMESPACE directive, the included fragment will have the same root namespace as the including fragment.
The /INCLUDE directive has no scope: it is processed immediately. It appeared in Standards Version 3. The optional prefix and suffix appeared in Standards Version 9. The optional namespace appeared in Standards Version 10.
/META: The /META directive specifies a metafield attached to a particular parent field. The field metadata may be of any allowed type except RAW. Metafields are retrieved in exactly the same way as regular field data, but the field code specified consists of the parent and metafield names joined with a forward slash:
<parent-field>/<meta-field>
/META field directives may not be specified before the parent field has been. Syntax is:
/META <parent-field> {field specification line}
The parent field code may not be an alias. As an illustration of this concept,
/META parent meta CONST FLOAT64 3.291882
provides a scalar metadatum called meta with value 3.291882 attached to the field parent. This particular metafield may be referred to by the field code "parent/meta". Note that different parent fields may have metafields with the same name, since all references to metafields must include the parent field name. Metafields may not themselves have further sub-metafields. The /META directive has no scope: it is processed immediately and has no long-term effect. /META directives are required to appear after their parent's specification, and in the same fragment.
For simplicity, starting with Standards Version 7, the above metafield can also be specified as:
parent/meta CONST FLOAT64 3.291882
making it look like a regular field specification line. Introduced in Standards Version 6.
/NAMESPACE: The /NAMESPACE directive changes the current namespace for subsequent field specification lines. Syntax is:
/NAMESPACE <subspace>
The subspace specified is relative to the current fragment's root namespace. If subspace is the null-token ("") the current namespace will be set back to the root namespace. Otherwise, the current namespace will be changed to the concatenation of the root namespace with subspace, with the two parts separated by a dot:
rootspace.subspace
If rootspace is empty, the intervening dot is omitted, and the current namespace is simply
subspace
By default, all field codes, both field names for newly specified fields, and field codes used as inputs to fields or targets for aliases, are placed in the current namespace, unless they start with an initial dot, in which case the current namespace is ignored, and they're placed instead in the fragment's root namespace. See the Namespaces section for further details.
The /NAMESPACE directive has no scope: it is processed immediately. For the effects of changing the current namespace on included fragments, see the /INCLUDE directive above. The effects of a /NAMESPACE directive never propagate upwards to parent fragments. It appeared in Standards Version 10.
/PROTECT: The /PROTECT directive specifies the advisory protection level of the current fragment and of the RAW fields defined therein. The protection level indicates whether writing to the format file fragment, or the binary data on disk is permitted. Syntax is:
/PROTECT <level>
Four advisory protection levels are defined:
- none: No protection at all: data and metadata may be freely changed. This is the default, if no /PROTECT directive is present.
- format: The dirfile metadata is protected from change, but RAW data on disk may be modified.
- data: The RAW data on disk is protected from change, but metadata may be modified.
- all: Both metadata and data on disk are protected from change.
The /PROTECT directive has fragment scope. Introduced in Standards Version 6.
/REFERENCE: The /REFERENCE directive specifies the name of the field to use as the dirfile's reference field. If no /REFERENCE directive is specified, the first RAW field encountered is used as the reference field. The /REFERENCE directive must specify a RAW field. Syntax is:
/REFERENCE <field-code>
The /REFERENCE directive has global scope: if multiple /REFERENCE directives appear in the dirfile metadata, only the last such is honoured. Introduced in Standards Version 6.
/VERSION: The /VERSION directive specifies the particular version of the Dirfile Standards to which the dirfile conforms. Syntax is:
/VERSION <integer>
When a /VERSION directive indicates a Standards Version greater than GetData is prepared to deal with, it triggers permissive mode. In permissive mode, unrecognised lines (which it assumes are valid syntax in a newer version of the Standards) are silently ignored.
The /VERSION directive has immediate scope: its effect is immediate, and it applies only to metadata below it, including and propagating downwards to sub-fragments after the directive.
In Standards Version 8 and earlier, its effect also propagates upwards back to the parent fragment, and affects subsequent metadata. Starting with Standards Version 9, this no longer happens. As a result, a /VERSION directive which indicates a version of 9 or later never propagates upwards; additionally, /VERSION directives found in subfragments included in a Version 9 or later fragment aren't propagated upwards into that fragment, regardless of the Version of the subfragments. The /VERSION directive appeared in Standards Version 5.
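The alias rules given under the /ALIAS directive above (chaining, dangling targets, and the different treatment of top-level and metafield aliases as parents) can be sketched in Python. The names `follow` and `lookup` and the dict-and-set data model are hypothetical; returning None for a code that names no field, and raising on an alias loop, are choices of this sketch rather than behaviour stated in the Standards.

```python
def follow(name, aliases):
    """Follow a chain of aliases to the name at its end."""
    seen = set()
    while name in aliases:
        if name in seen:
            raise ValueError(f"alias loop at {name!r}")
        seen.add(name)
        name = aliases[name]
    return name


def lookup(code, aliases, fields):
    """Return the canonical field code for `code`, or None if there is none."""
    parts = code.split("/")
    if len(parts) > 2:
        return None   # a metafield, aliased or not, is never a parent
    if len(parts) == 2 and code not in aliases:
        parent = follow(parts[0], aliases)   # top-level aliases may be parents
        if "/" in parent:
            return None                      # the alias pointed at a metafield
        code = f"{parent}/{parts[1]}"
    code = follow(code, aliases)
    return code if code in fields else None  # a dangling alias is not an error


fields = {"aaaa", "aaaa/bbbb", "cccc", "ffff", "ffff/gggg"}
aliases = {
    "cccc/dddd": "aaaa",   # metafield alias targeting a top-level field
    "eeee": "ffff",        # top-level alias
    "hhhh": "eeee",        # chained alias
    "iiii": "missing",     # alias whose target does not exist
}

for code in ("cccc/dddd", "cccc/dddd/bbbb", "eeee/gggg", "hhhh", "iiii"):
    print(code, "->", lookup(code, aliases, fields))
```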

Field Specifications

Any line which does not start with a reserved word is assumed to be a field specification line. A field specification line consists of at least two tokens. The first token is the field name. The second token is the field type. Subsequent tokens are field parameters. The meaning and number of these parameters depend on the field type specified.

Field Names

A field name consists of one or more characters, excluding both ASCII control characters (bytes 0x00 through 0x1F) and the reserved characters listed in Table 2 according to Standards Version. Furthermore, the field name of a RAW field may only contain characters allowed in filenames^†. Although never allowed in a field name, a forward slash (/) can be used to define metafields; see above under the /META directive. Like the rest of the format file, field names are case sensitive.

**Table 2:** Reserved Characters in Field Names
Version	Reserved Characters
0–4	#^‡ / whitespace^‡
5	#^‡ / & ; < > \| whitespace^‡
6—	/ & ; < > \| .

‡: By virtue of there being no way to include such characters in tokens.

The field name may not be INDEX, which is a special, implicit field which contains the integer frame index. Standards Version 5 and earlier also prohibit FILEFRAM as a field name; it was an alias for INDEX, (which arose in the prehistoric times of ReadData, GetData's spiritual predecessor).

Standards Version 3 and 4 restrict field names to 50 characters. Standards Version 2 and earlier restrict field names to 16 characters. Additionally, the filesystem will put restrictions on the length of a RAW field name, regardless of Standards Version^*.

Starting in Standards Version 7, if the field name beginning a field specification line contains exactly one forward slash character (/), the line is assumed to specify a metafield. See the /META directive above for further details. A field name may not contain more than one forward slash. Starting in Standards Version 10, any field name may be preceded by a namespace tag. The namespace tag and the field name are separated by a dot (.). See the Namespaces section, following, for details.

†: Consult the documentation of the filesystem backing the database for details, although most modern filesystems permit any byte except NUL (0x00) or, failing that, any Unicode character except NUL.

*: Again, consult your filesystem documentation, but most modern filesystems permit filenames of at least 255 bytes.

Namespaces

Beginning with Standards Version 10, every field in a Dirfile is contained in a namespace. Every namespace is identified by a namespace tag which consists of the same restricted set of characters used for field names (see Table 2, above). Namespaces nest arbitrarily deep. Subnamespaces are identified by concatenating all namespace tags, separating tags by dots (.), with the outermost namespace leftmost:

topspace.subspace.subsubspace

Each fragment has an immutable root namespace. The root namespace of the primary format file is the null namespace, identified by the null-token (""). The root namespace of other fragments is specified when they are introduced (see the /INCLUDE directive). Each fragment also has a current namespace which may be changed as often as needed using the /NAMESPACE directive, and defaults to the root namespace. The current namespace is always either the root namespace or else a subspace under the root namespace.

If a field name or field code starts with a leading dot, then that name or code is taken to be relative to the fragment's root space. If it does not start with a dot, it is taken to be relative to the current namespace.

For example, if both the root namespace and current namespace of a fragment start off as rootspace, then:

aaaa RAW UINT8 1
.bbbb RAW UINT8 1
cccc.dddd RAW UINT8 1
.eeee.ffff RAW UINT8 1

/NAMESPACE newspace

gggg RAW UINT8 1
.hhhh RAW UINT8 1
iiii.jjjj RAW UINT8 1
.kkkk.llll RAW UINT8 1

specifies, respectively, the fields:

rootspace.aaaa,
rootspace.bbbb,
rootspace.cccc.dddd,
rootspace.eeee.ffff,
rootspace.newspace.gggg,
rootspace.hhhh,
rootspace.newspace.iiii.jjjj, and
rootspace.kkkk.llll.

Note that a field code may specify deeper subspaces under either the root namespace or the current namespace (meaning it is never necessary to use the /NAMESPACE directive). Note also that there is no way for metadata in a given fragment to refer to fields outside the fragment's root space.

There is one exception to this namespace scoping rule: the implicit INDEX vector is always in the null (top-level) namespace, and namespace tags specified with it, either explicitly or implicitly, even a fragment root namespace, are ignored. So, in a fragment with root namespace rootspace, and current namespace rootspace.subspace,

INDEX,
.INDEX,
namespace.INDEX, and
.namespace.INDEX

all refer to the same INDEX field.
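The namespace rules of this section, including the INDEX exception, can be sketched in Python. The function names `enter_namespace` and `qualify` are hypothetical, and metafield codes (those containing a forward slash) are not handled.

```python
def enter_namespace(root, subspace):
    """Current namespace after `/NAMESPACE <subspace>` in a fragment."""
    if subspace == "":
        return root                  # the null-token resets to the root namespace
    return f"{root}.{subspace}" if root else subspace


def qualify(code, root, current):
    """Fully qualified name for a field code appearing in a fragment."""
    if code.split(".")[-1] == "INDEX":
        return "INDEX"               # INDEX ignores all namespace tags
    if code.startswith("."):
        space, name = root, code[1:]   # leading dot: relative to the root
    else:
        space, name = current, code    # otherwise: relative to the current
    return f"{space}.{name}" if space else name


root = current = "rootspace"
for code in ("aaaa", ".bbbb", "cccc.dddd", ".eeee.ffff"):
    print(qualify(code, root, current))

current = enter_namespace(root, "newspace")
for code in ("gggg", ".hhhh", "iiii.jjjj", ".kkkk.llll"):
    print(qualify(code, root, current))
```

The printed names follow the list of eight fields given in the example above.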

Field Types

There are eighteen field types. Of these, fourteen are of vector type (BIT, DIVIDE, INDIR, LINCOM, LINTERP, MPLEX, MULTIPLY, PHASE, POLYNOM, RAW, RECIP, SBIT, SINDIR, and WINDOW) and four are of scalar type (CARRAY, CONST, SARRAY, and STRING). The thirteen vector field types other than RAW fields are also called derived fields, since they derive their value from one or more input fields.

Five of these derived fields (DIVIDE, LINCOM, MPLEX, MULTIPLY, and WINDOW) may have more than one vector input field. In situations where these input fields have differing sample rates, the sample rate of the derived field is the same as the sample rate of the first (left-most) input field specified. Furthermore, the input fields are synchronised by aligning them on frame boundaries, assuming equally-spaced sampling throughout a frame, and using the last sample of each input field which did not occur after the sample of the derived field being computed. That is, if the first and second input fields have sample rates s₁ and s₂, the derived field also has sample rate s₁ and, for every sample of the derived field, n, the n'th sample of the first field is used (since they have the same sample rate by definition), and the sample number used of the second field, m, is computed as:

$m = \left\lfloor \frac{n \, s_2}{s_1} \right\rfloor.$
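As a small worked example of this rule, the following Python sketch computes which sample of the second input is paired with each sample of the derived field. The function name and the sample rates are hypothetical; integer floor division stands in for the floor of the quotient.

```python
def matching_sample(n, s1, s2):
    """Sample of the second input used for sample n of the derived field."""
    return (n * s2) // s1


# First input at 4 samples per frame, second input at 1 sample per frame:
# each sample of the slower field is reused for a whole frame.
print([matching_sample(n, s1=4, s2=1) for n in range(8)])

# First input at 4 samples per frame, second input at 12 samples per frame:
# only every third sample of the faster field is used.
print([matching_sample(n, s1=4, s2=12) for n in range(4)])
```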

Starting in Standards Version 6, certain scalar field parameters in the field specifications may be specified using CONST or CARRAY fields, instead of literal values. A list of parameters for which this is allowed is given below in the Field Parameters section.

The possible field types are:

BIT: The BIT field type extracts one or more bits out of an input vector field, treating the result as unsigned. Syntax is:
<fieldname> BIT <input> <first-bit> [ <num-bits> ]
which specifies <fieldname> to be the <num-bits>-bit long unsigned integer starting at bit <first-bit> (counting from the least-significant bit, which is numbered zero) of <input> after <input> has been converted from its native type to an endianness-corrected unsigned 64-bit integer. If <num-bits> is omitted, it is assumed to be one. Standards Version 0 doesn't recognise the <num-bits> token.
CARRAY: The CARRAY scalar field type is a list of constants fully specified in the format file metadata. Syntax is:
<fieldname> CARRAY <type> <value₀> <value₁> <value₂> ...
where type may be any supported native data type and value_n is the value of the n^th element of the scalar list, interpreted as indicated by type. GetData is prepared to deal with at least 2²⁴ elements. Note: despite being multivalued, this is not considered a vector field since the elements of the CARRAY are not indexed by frames. A CARRAY with a single element is identical to a CONST field. Introduced in Standards Version 8.
CONST: The CONST scalar field type is a constant fully specified in the format file metadata. Syntax is:
<fieldname> CONST <type> <value>
where type may be any supported native data type and value is the numerical value of the constant interpreted as indicated by type. Introduced in Standards Version 6.
DIVIDE: The DIVIDE field type is the quotient of two vector fields. Syntax is:
<fieldname> DIVIDE <field₁> <field₂>
where <fieldname> is computed as:
fieldname = field₁ / field₂.
Introduced in Standards Version 8.
INDIR: The INDIR vector field type performs an indirect translation of a CARRAY scalar field to a derived vector field based on a vector index field. Syntax is:
<fieldname> INDIR <index> <array>
where index is the vector field, which is converted to an integer type, if necessary, and array is the CARRAY field. The n^th sample of the INDIR field is the value of the m^th element of array (counting from zero), where m is the value of the n^th sample of index. When m is not a valid element number of array, the corresponding value of the INDIR is implementation dependent. INDIR appeared in Standards Version 10.
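The indirection can be sketched in Python as below. The function name indir and the fill parameter are hypothetical; since the out-of-range value is implementation dependent, the NaN default here is only an assumption:

```python
import math

def indir(index, array, fill=math.nan):
    """Look up each index sample in array (elements counted from zero)."""
    out = []
    for sample in index:
        m = int(sample)  # the index vector is converted to an integer type
        # Out-of-range behaviour is implementation dependent; fill is an assumption.
        out.append(array[m] if 0 <= m < len(array) else fill)
    return out
```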
LINCOM: The LINCOM field type is the linear combination of one, two or three input vector fields. Syntax is:
<fieldname> LINCOM [<n>] <field₁> <m₁> <b₁> [ <field₂> <m₂> <b₂> [ <field₃> <m₃> <b₃> ]]
where: <n> indicates the number of input fields (i.e. 1, 2, or 3). In Standards Version 7 and 8 it is optional (unless <field₁> could be mistaken for a number, in which case it is mandatory, to prevent ambiguity); earlier Standards Versions require it. If omitted, the number of input fields is determined by the number of tokens present.
<fieldname> is computed as:
fieldname = (m₁ field₁ + b₁) + (m₂ field₂ + b₂) + (m₃ field₃ + b₃).
with <field₂>, <m₂>, <b₂>, and <field₃>, <m₃>, <b₃> included only if specified.
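A minimal Python sketch of this computation follows. The function name lincom is hypothetical, and the sketch assumes all input fields have the same sample rate and length:

```python
def lincom(terms):
    """Compute a LINCOM from one to three (samples, m, b) tuples.

    Assumes every samples list has the same length (same sample rate).
    """
    if not 1 <= len(terms) <= 3:
        raise ValueError("LINCOM takes one, two or three input fields")
    length = len(terms[0][0])
    # Sum (m_i * field_i[n] + b_i) over the specified inputs only.
    return [sum(m * field[n] + b for field, m, b in terms) for n in range(length)]
```

Because m and b may be complex in Standards Version 7 and later, passing 1j as a scale factor yields a complex-valued result.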
LINTERP: The LINTERP field type specifies a table look up based on another vector field. Syntax is:
<fieldname> LINTERP <input> <table>
where:
- <input> is the input field for the table lookup
- <table> is the path to the lookup table file for the field. The lookup table file is a text file with two whitespace separated columns of x and y values. Values are linearly interpolated between the points specified in the lookup table.
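The table lookup can be sketched in Python as follows. The names parse_table and linterp are hypothetical, and the handling of inputs outside the table (linear extrapolation from the nearest segment) is an assumption, since it is not specified here:

```python
from bisect import bisect_right

def parse_table(text):
    """Parse two whitespace-separated columns of x and y values, sorted by x."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    return sorted((float(x), float(y)) for x, y in rows)

def linterp(x, table):
    """Linearly interpolate y at x from a table of at least two (x, y) points."""
    xs = [point[0] for point in table]
    # Choose the segment containing x; clamp to the end segments (assumption).
    i = min(max(bisect_right(xs, x) - 1, 0), len(table) - 2)
    (x0, y0), (x1, y1) = table[i], table[i + 1]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
```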
MPLEX: The MPLEX vector field type permits the multiplexing of several low sample rate fields into a single data field of higher sample rate. Syntax is:
<fieldname> MPLEX <input> <index> <count> [<period>]
where
- <input> is the input vector containing the multiplexed fields,
- <index> is the vector containing the multiplex index,
- <count> is the value of the multiplex index when the computed field is stored in <input>, and
- <period> is the nominal period (in samples) between successive occurrences of <count> in the index field. If not given, or zero, the period is assumed to be unknown or non-constant. If given, it must be non-negative.
At every sample n, the derived field is computed as:
fieldname[n] = (index[n] == count) ? input[n] : fieldname[n − 1]
where index[n] and input[n] are the n^th samples of <index> and <input>.
The <index> vector is converted to an integer type for comparison. The value of the derived field before the first sample where <index> equals <count> is poorly defined. Typically GetData searches backwards for its value, but see the gd_mplex_lookback() function.
The values of <count> and <period> place no restrictions on values contained in <index>. Specifically, particular values of <index> (including <count>) need not be equally spaced (neither by <period> nor any other spacing); <index> need not ever take on the value <count> (in which case GetData will return 0 or NaN for the entirety of the output vector). MPLEX appeared in Standards Version 9.
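The sample-by-sample rule can be sketched in Python as below. The function name mplex and the initial parameter are hypothetical; the value used before the first match is poorly defined, so the default of 0.0 is only an assumption and does not reproduce GetData's backward search:

```python
def mplex(input_, index, count, initial=0.0):
    """Hold the most recent input sample for which the index equalled count."""
    out = []
    last = initial  # value before the first match: an assumption, see lead-in
    for value, idx in zip(input_, index):
        if int(idx) == count:  # the index vector is compared as an integer
            last = value
        out.append(last)
    return out
```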
MULTIPLY: The MULTIPLY field type is the product of two vector fields. Syntax is:
<fieldname> MULTIPLY <field₁> <field₂>
where <fieldname> is computed as:
fieldname = field₁ × field₂
Introduced in Standards Version 2.
PHASE: The PHASE field type shifts an input vector field by a specified number of samples. Syntax is:
<fieldname> PHASE <input> <shift>
where:
- <input> is the input field
- <shift> is the shift, in samples. A positive shift indicates a shift forward in time (towards larger frame numbers).
Introduced in Standards Version 4.
POLYNOM: The POLYNOM field type specifies a polynomial function of a single input field. Syntax is:
<fieldname> POLYNOM <input> <a₀> <a₁> [ <a₂> [ <a₃> [ <a₄> [ <a₅> ]]]]
where:
- <input> is the input field code
- <fieldname> is computed as:
  fieldname = a₀ + a₁ input + a₂ input² + a₃ input³ + a₄ input⁴ + a₅ input⁵
  with the higher-order terms computed only if the corresponding co-efficients a_i are specified.
Introduced in Standards Version 7.
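As an illustration, the polynomial can be evaluated with Horner's method. This is a sketch only; the function name polynom is hypothetical and nothing here describes how GetData itself evaluates the expression:

```python
def polynom(x, coeffs):
    """Evaluate a0 + a1*x + ... for two to six coefficients [a0, a1, ...]."""
    if not 2 <= len(coeffs) <= 6:
        raise ValueError("POLYNOM takes between two and six coefficients")
    result = 0.0
    # Horner's method: work from the highest-order coefficient down to a0.
    for a in reversed(coeffs):
        result = result * x + a
    return result
```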
RAW: The RAW field type specifies time streams on disk. In this case, the field name must correspond to the name of the file containing the time stream. Syntax is:
<fieldname> RAW <type> <sample-rate>
where:
- <sample-rate> is the integer number of samples per dirfile frame for the time stream, which must be at least one
- <type> is a token specifying the native data type:
  - UINT8: unsigned 8-bit integer
  - INT8: signed 8-bit integer
  - UINT16: unsigned 16-bit integer
  - INT16: signed 16-bit integer
  - UINT32: unsigned 32-bit integer
  - INT32: signed 32-bit integer
  - UINT64: unsigned 64-bit integer
  - INT64: signed 64-bit integer
  - FLOAT32: IEEE-754 standard 32-bit (single precision) floating point number
  - FLOAT64: IEEE-754 standard 64-bit (double precision) floating point number
  - COMPLEX64: C99-compatible single precision complex number (Standards Version 7 and later)
  - COMPLEX128: C99-compatible double precision complex number (Standards Version 7 and later)
  Signed integer data are two's complement. Complex data are stored in the Fortran storage format (which is also specified by C99 § 6.2.5.13): two consecutive floating-point numbers, the first being the real part of the value and the second the imaginary part. Two additional type names exist: FLOAT is equivalent to FLOAT32, and DOUBLE is equivalent to FLOAT64. Standards Version 9 deprecates these two aliases, but still allows them.
  All these type names (except those for complex data, which came later) were introduced in Standards Version 5. Earlier Standards Versions specified data types with single-character type aliases:
  - c: UINT8
  - u: UINT16
  - s: INT16
  - U: UINT32
  - S or i: INT32
  - f: FLOAT32
  - d: FLOAT64
  Types INT8, UINT64, INT64, COMPLEX64, and COMPLEX128 are not supported before Standards Version 5, so no single-character type aliases exist for these types. These single-character type aliases were deprecated in Standards Version 5 and removed in Standards Version 8. (GetData still accepts them, however, because it supports all versions of the Standards.)
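To show how the type and sample rate determine where a frame lives in a time stream, the following Python sketch decodes one frame from raw bytes. The names STRUCT_CODES and decode_frame are hypothetical. The sketch assumes unencoded data, a frame offset of zero, and little-endian byte order by default; complex types are omitted for brevity:

```python
import struct

# Mapping from dirfile type names to Python struct format characters.
STRUCT_CODES = {
    "UINT8": "B", "INT8": "b", "UINT16": "H", "INT16": "h",
    "UINT32": "I", "INT32": "i", "UINT64": "Q", "INT64": "q",
    "FLOAT32": "f", "FLOAT64": "d",
}

def decode_frame(data, type_name, sample_rate, frame, byte_order="<"):
    """Return the sample_rate samples that make up the given frame."""
    code = STRUCT_CODES[type_name]
    size = struct.calcsize(byte_order + code)  # bytes per sample
    start = frame * sample_rate * size         # byte offset of the frame
    return list(struct.unpack_from(f"{byte_order}{sample_rate}{code}", data, start))
```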
RECIP: The RECIP field computes the reciprocal of an input field. Syntax is:
<fieldname> RECIP <input> <dividend>
where:
- <input> is the input field code
- <fieldname> is computed as:
  fieldname = dividend / input
  where <dividend> is a scalar.
Introduced in Standards Version 8.
SARRAY: The SARRAY scalar field type is a list of strings fully specified in the format file metadata. Syntax is:
<fieldname> SARRAY <string0> <string1> <string2> ...
where string_n is the n^th element of the array. Each string is a single token. To include whitespace in a string, enclose it in quotation marks (") or else escape the whitespace with the backslash character (\). GetData supports at least 2²⁴ elements. SARRAY appeared in Standards Version 10.
SBIT: The SBIT field type extracts one or more bits out of an input vector field, treating the result as signed. Syntax is:
<fieldname> SBIT <input> <first-bit> [ <num-bits> ]
which specifies <fieldname> to be the <num-bits>-bit long signed integer starting at bit <first-bit> (counting from the least-significant bit, which is numbered zero) of <input> after <input> has been converted to an endianness-corrected two's-complement signed 64-bit integer. If <num-bits> is omitted, it is assumed to be one. Note: all extracted bits are interpreted as two's complement numbers; so, a single-bit signed integer can take the values zero or negative one. Introduced in Standards Version 7.
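The signed interpretation can be illustrated with a short Python sketch. The function name extract_sbit is hypothetical, and the sample is assumed to be available as a Python integer:

```python
def extract_sbit(sample: int, first_bit: int, num_bits: int = 1) -> int:
    """Return the two's-complement num_bits-wide integer starting at first_bit."""
    mask = (1 << num_bits) - 1
    raw = ((sample & 0xFFFFFFFFFFFFFFFF) >> first_bit) & mask
    sign_bit = 1 << (num_bits - 1)
    # If the top extracted bit is set, the value is negative in two's complement.
    return raw - (1 << num_bits) if raw & sign_bit else raw
```

With num_bits left at one, the only possible results are zero and negative one, as noted above.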
SINDIR: The SINDIR vector field type performs an indirect translation of a SARRAY scalar field to a derived vector field of strings based on a vector index field. Syntax is:
<fieldname> SINDIR <index> <array>
where index is the vector field, which is converted to an integer type, if necessary, and array is the SARRAY field. The n^th sample of the SINDIR field is the value of the m^th element of array (counting from zero), where m is the value of the n^th sample of index. When n is not a valid element number of array, the corresponding value of the SINDIR is implementation dependent. SINDIR appeared in Standards Version 10.
STRING: The STRING scalar field type is a character string fully specified in the format file metadata. Syntax is:
<fieldname> STRING <string>
where <string> is the string value of the field. Note that <string> is a single token. To include whitespace in the string, enclose the string in quotation marks (""), or else escape the whitespace with the backslash character (\). Introduced in Standards Version 6.

WINDOW: The WINDOW vector field type isolates a portion of an input vector based on a comparison. Syntax is:

<fieldname> WINDOW <input> <check> <op> <threshold>

where:
- <input> is the vector containing the data to extract,
- <check> is the vector on which to test the comparison,
- <threshold> is the value against which <check> is compared, and
- <op> is one of the tokens given in Table 3 indicating the particular comparison performed.

Data are extracted when the specified comparison operation is true. Outside the region extracted, GetData returns 0 or NaN. Both <threshold> and <check> are converted to the type given in the table below before comparison. They may not be complex valued.

**Table 3:** WINDOW comparison operations
<op>	Comparison operation	Data Type
EQ	<check> = <threshold>	INT64
NE	<check> ≠ <threshold>	INT64
GE	<check> ≥ <threshold>	FLOAT64
GT	<check> > <threshold>	FLOAT64
LE	<check> ≤ <threshold>	FLOAT64
LT	<check> < <threshold>	FLOAT64
SET	at least one bit set in <check> is also set in <threshold>	UINT64
CLR	at least one bit set in <check> is not set in <threshold>	UINT64

Note: with the EQ operator, this derived field type is very similar to the MPLEX field type. The primary difference is that MPLEX repeats the previous value of the derived field outside the extracted region, while WINDOW just sets it to 0/NaN. WINDOW appeared in Standards Version 9.
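The comparison operations can be sketched in Python as below. The names WINDOW_OPS and window, and the fill parameter, are hypothetical. The SET and CLR tests follow the wording of Table 3, and the NaN default for samples outside the extracted region is one of the two values mentioned above:

```python
import math

# Comparison performed for each <op>, following the wording of Table 3.
WINDOW_OPS = {
    "EQ": lambda c, t: int(c) == int(t),
    "NE": lambda c, t: int(c) != int(t),
    "GE": lambda c, t: float(c) >= float(t),
    "GT": lambda c, t: float(c) > float(t),
    "LE": lambda c, t: float(c) <= float(t),
    "LT": lambda c, t: float(c) < float(t),
    "SET": lambda c, t: (int(c) & int(t)) != 0,
    "CLR": lambda c, t: (int(c) & ~int(t)) != 0,
}

def window(input_, check, op, threshold, fill=math.nan):
    """Keep input samples where the comparison holds; use fill elsewhere."""
    compare = WINDOW_OPS[op]
    return [v if compare(c, threshold) else fill for v, c in zip(input_, check)]
```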

Field Parameters

All input vector field parameters should be field codes. Additionally, in Standards Version 6 and later, some of the numerical field parameters may be either literal numbers or else the field code of a CONST or CARRAY scalar field containing the value. In the case of a CARRAY, the field code may be immediately followed by an integer enclosed in angle brackets (< >) specifying which element (counting from zero) of the CARRAY to use (so: field_code<n>). If this is omitted, the first element is assumed. Parameters for which this is possible are:

RAW: sample-rate
BIT, SBIT: first-bit, num-bits
LINCOM: any of the m_i or b_i
MPLEX: count, period
PHASE: shift
POLYNOM: any of the a_i
RECIP: dividend
WINDOW: threshold

Since it is possible to create a field code which is identical to a literal number, a parameter is assumed to be the field code of a scalar field only if it doesn't look like a number.
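The field_code<n> notation for selecting a CARRAY element can be sketched in Python as follows. The function name split_scalar_parameter is hypothetical, and the sketch handles only the element suffix, not the test for whether a token looks like a number:

```python
import re

# A field code optionally followed by <n>, the zero-based CARRAY element number.
_ELEMENT = re.compile(r"^(?P<code>.+?)<(?P<n>\d+)>$")

def split_scalar_parameter(token):
    """Split a scalar parameter into (field_code, element_number)."""
    match = _ELEMENT.match(token)
    if match:
        return match.group("code"), int(match.group("n"))
    return token, 0  # with no <n>, the first element is assumed
```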

Starting in Standards Version 9, in addition to decimal notation, literal integer parameters may be specified as hexadecimal numbers, by prefixing the number with 0x or 0X, or as octal numbers, by prefixing the number with 0. Both uppercase and lowercase hexadecimal digits may be used.
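These prefix rules can be sketched in Python as below. The function name parse_int_literal is hypothetical, and the sketch assumes an unsigned token:

```python
def parse_int_literal(token: str) -> int:
    """Parse a decimal, hexadecimal (0x/0X) or octal (leading 0) integer literal."""
    if token[:2] in ("0x", "0X"):
        return int(token[2:], 16)  # accepts upper- and lowercase hex digits
    if len(token) > 1 and token[0] == "0":
        return int(token[1:], 8)   # a leading zero marks an octal number
    return int(token, 10)
```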

In Standards Version 7 and later, a literal complex number is specified as two real (floating point) numbers separated by a semicolon (;) with no intervening whitespace. So, for example, the tokens:

1;0 0;1 4;0 0;5 9.313e2;74.1

represent, respectively, the real unit, the imaginary unit, the real number four, the imaginary number 5i, and the complex number 931.3+74.1i. Because the semicolon character cannot be used in field names, a complex valued literal can never be mistaken for a field code.

Complex literals allow, among other things, the composition of complex valued fields from purely real input fields. For example, a complex valued field, z, may be created from a real valued field re, representing the real part of the complex number, and the real valued field im, representing the imaginary part of the complex number, by specifying:

z LINCOM re 1 0 im 0;1 0
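The complex literal notation can be illustrated with a short Python sketch. The function name parse_complex_literal is hypothetical; it treats a token without a semicolon as purely real:

```python
def parse_complex_literal(token: str) -> complex:
    """Parse 'real;imag' (no intervening whitespace) into a complex number."""
    real, sep, imag = token.partition(";")
    # Without a semicolon the token is an ordinary real literal.
    return complex(float(real), float(imag) if sep else 0.0)
```

In the LINCOM example above, the scale factor 0;1 parses to the imaginary unit, so each sample of z is re + i × im.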

Field Codes

Both when specifying the inputs to a field (as a non-literal scalar parameter, or as an input vector field to a field), and when specifying a field to a GetData call, field codes are used. A field code consists of, in order:

(since Standards Version 10:) optionally, a leading dot (.) indicating this field code is relative to the fragment's root namespace. Without the leading dot, the field code is taken to be relative to the current namespace. (See the discussion in the Namespaces section above for details.)
(since Standards Version 10:) optionally, a non-null subnamespace followed by a dot (.) indicating a subspace under the current or root namespace. The subnamespace may be made up of any number of namespace tags separated by dots, to nest deeper in the namespace tree.
(since Standards Version 6:) if the field in question is a metafield (see the /META directive above), the field name of the metafield's parent (which may be an alias) followed by a forward slash (/).
a simple field name, possibly an alias, indicating a vector or scalar field
(since Standards Version 7:) optionally, a dot (.) followed by a representation suffix.

A representation suffix may be used to extract a real number from a complex value. The available suffixes (listed here with their preceding dot) and their meanings are:

.a: the argument of the input, that is, the angle (in radians) between the positive real axis and the input. The argument is in the range [−π, π], and a branch cut exists along the negative real axis. At the branch cut, −π is returned if the imaginary part is −0, and π is returned if the imaginary part is +0. If the input is zero, zero is returned.
.i: the imaginary part of the input, that is, the projection of the input onto the imaginary axis.
.m: the modulus (absolute value) of the input
.r: the real part of the input, that is, the projection of the input onto the real axis.
.z: (since Standards Version 10:) the identity representation: it returns the full complex value, equivalent to simply omitting the suffix completely. It is only needed in certain cases to force the correct interpretation of a field code in the presence of a namespace tag. To wit, the field code
name.r
will be interpreted as the real-part (via the .r representation suffix) of the field called name (if such a field exists). To refer to a field called r in the name namespace, the field code must be written:
name.r.z

NB: The first interpretation only occurs with valid representation suffixes; the field code:
name.q
is interpreted as the field q in the name namespace because .q is not a valid representation suffix. Furthermore, ambiguity arises only if both fields "name" and "name.r" are defined. If the field "name" does not exist, but the field "name.r" does, then the original field code is not ambiguous. The .z suffix is the only representation suffix allowed on SARRAY, SINDIR, and STRING field codes.
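The disambiguation rule can be sketched in Python as below. The function name resolve and the set of defined field names are hypothetical, and the sketch ignores leading dots, metafields and aliases:

```python
SUFFIXES = {"a", "i", "m", "r", "z"}

def resolve(field_code, fields):
    """Return (field, suffix) for a field code, given the defined field names."""
    head, dot, tail = field_code.rpartition(".")
    # The trailing part is a representation suffix only if it is a valid
    # suffix and the remainder names an existing field.
    if dot and tail in SUFFIXES and head in fields:
        return head, "." + tail
    return field_code, None
```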

If the specified field is purely real, the representations are calculated as if the imaginary part was equal to +0. For example, given a complex valued vector, z, a vector containing the real part of z, called re_z, could be produced with:

re_z PHASE z.r 0

and similarly for the complex field's imaginary part, argument, and absolute value. (This simple example is not strictly necessary, since z.r could be used wherever re_z would be.)
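The representations can be sketched with Python's cmath module, whose phase function uses the same branch cut along the negative real axis. The function name represent is hypothetical:

```python
import cmath

def represent(value, suffix=".z"):
    """Apply a representation suffix to a real or complex sample."""
    z = complex(value)  # a purely real input gets an imaginary part of +0
    if suffix == ".a":
        return cmath.phase(z)  # argument in [-pi, pi]; zero for a zero input
    if suffix == ".i":
        return z.imag
    if suffix == ".m":
        return abs(z)
    if suffix == ".r":
        return z.real
    if suffix == ".z":
        return z               # identity representation
    raise ValueError(f"unknown representation suffix: {suffix}")
```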

History

The latest version of the Dirfile Standards is Version 10.

**Table 4:** Dirfile Standards Version history
Version	Release Date	Notes
10	January 2017	Added the INDIR, SARRAY, and SINDIR field types, the /NAMESPACE directive, the optional namespace tag to the /INCLUDE directive and the .z representation suffix.
9	April 2012	Added the MPLEX and WINDOW field types, the /ALIAS and /HIDDEN directives, the affixes to /INCLUDE, and the optional enc-datum token to /ENCODING. It permitted specification of integer literals in octal and hexadecimal. Finally, it deprecated the type aliases FLOAT and DOUBLE.
8	November 2010	Added the DIVIDE, RECIP and CARRAY field types, made the forward slash on reserved words mandatory, and prohibited using the single-character type aliases in the specification of RAW fields. It also introduced the optional second (arm) token to the /ENDIAN directive.
7	October 2009	Added the POLYNOM and SBIT field types, and complex data types COMPLEX64 and COMPLEX128. It also introduced representation suffixes to field codes, made the n_fields parameter to LINCOM optional, and introduced the directive-free method of specifying metafields.
6	October 2008	Added the /ENCODING, /META, /PROTECT, and /REFERENCE directives and the CONST and STRING field types. It permitted whitespace in tokens and introduced the character escape sequences. It allowed CONST fields to be used as parameters in field specification lines. It also removed FILEFRAM as an alias for INDEX, and allowed # and \ in field codes.
5	August 2008	Added VERSION and ENDIAN, and removed the restriction on field name length. It introduced the data types INT8, INT64, and UINT64, the new-style type specifiers, and increased the range of the BIT field type from 32 to 64 bits. It also prohibited the characters #&/;<>\.| in field names.
4	October 2006	Added the PHASE field type.
3	January 2006	(The "Large Dirfile Extension") Added INCLUDE, support for sub-dirfiles, and increased the allowed length of a field name from 16 to 50 characters.
2	September 2005	Added the MULTIPLY field type, and added support for LINCOM fields with inputs of differing sample rates.
1	November 2004	Added FRAMEOFFSET and the optional fourth argument to the BIT field type.
0	before March 2003	Refers to the dirfile standards supported by the GetData library originally introduced into the kst sources, which contained support for all other features covered by this document.
