Introduction
The Dirfile Standards describe the dirfile database format, a filesystem-based database for time-ordered binary data. Dirfiles are designed to be a fast, simple format for storing and reading binary time-ordered data. This document provides an unofficial overview of the Dirfile Standards. The official Dirfile Standards are distributed with GetData as three Unix man pages: dirfile(5), dirfile-format(5), and dirfile-encoding(5). Additionally, this document discusses some implementation-dependent behaviour specific to GetData not found in those documents. The latest release of the Dirfile Standards is Standards Version 10 (January 2017).
The dirfile database is centred around one or more time-ordered data streams (each called a time stream). Each time stream is written to disk in a separate file, in its native binary format. The name of each time stream file corresponds to the time stream's field name, a descriptive textual tag.
Two time streams may have different constant sampling frequencies, and mechanisms exist within the dirfile format to ensure that these time streams remain properly sequenced in time. To do this, the time streams in the dirfile are subdivided into frames. Each frame contains an integer number of samples of each time stream. When synchronous retrieval of data from more than one time stream is required, position in the dirfile can be specified in frames, which ensures synchronicity.
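To make the frame/sample relationship concrete, here is a minimal Python sketch, assuming each time stream has a fixed number of samples per frame (spf). The helper names `frame_to_sample` and `frame_range` are hypothetical and not part of GetData; the sketch only shows why a position given in frames selects time-aligned samples from streams with different rates.

```python
def frame_to_sample(frame: int, spf: int, offset: int = 0) -> int:
    """Index of sample `offset` within `frame` for a stream with `spf` samples per frame."""
    if not 0 <= offset < spf:
        raise ValueError("offset must satisfy 0 <= offset < spf")
    return frame * spf + offset


def frame_range(first_frame: int, num_frames: int, spf: int) -> range:
    """Sample indices spanning whole frames [first_frame, first_frame + num_frames)."""
    start = frame_to_sample(first_frame, spf)
    return range(start, start + num_frames * spf)


# Frames 10 and 11 of a stream with 1 sample per frame and of a stream with
# 4 samples per frame cover the same interval of time.
slow = frame_range(10, 2, spf=1)
fast = frame_range(10, 2, spf=4)
print(list(slow), list(fast))
```

Requesting the same frames from both streams yields 2 and 8 samples respectively, spanning the same interval of time; requesting the same sample numbers from both would not.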
The time stream files are all located in a central directory, known as the dirfile directory. The dirfile as a whole is typically referred to by its dirfile directory. Included in the dirfile along with the time streams is the dirfile format specification, which is an ASCII text file called format located in the dirfile directory.
Version 3 of the Dirfile Standards introduced the large dirfile extension. This extension added the ability to distribute the dirfile format specification among multiple files (called fragments) in addition to the format file, as well as the ability to house portions of the database in subdirfiles. These subdirfiles may be fully-fledged dirfiles in their own right, but may also be contained within a larger, parent dirfile.
In addition to the raw fields on disk, the dirfile format specification may also specify derived fields which are calculated from one or more raw or derived time streams. Derived fields behave identically to raw fields when read via GetData. See below for a complete list of derived field types.
Dirfiles are designed to be written to and read simultaneously. The dirfile specification dictates that one particular raw field (specified either explicitly with the /REFERENCE directive or implicitly by the format file) is to be used as the reference field: all other vector fields are assumed to have at least as many frames as the reference field has, and the size (in frames) of the reference field is used as the size of the dirfile as a whole.
Version 6 of the Dirfile Standards added the ability to encode the binary files on disk. Each fragment may have its own encoding scheme. Notably, this can be used to compress these files. See Dirfile Encodings for information on encoding schemes.
Dirfile Example
An example dirfile is presented as Figure 1 below. The dirfile as a whole is referenced by the name of the directory which forms its base, in this case "dirfile". The base directory contains a format file which contains the metadata for the dirfile database. This format file includes other files (called format file fragments) which provide additional database metadata, including a format file in a subdirectory (a subdirfile). A full description of the format file syntax is given below.
Figure 1: Graphical representation of a dirfile
This dirfile contains four time streams (indicated by RAW in the format file, and often called raw fields), one of which is located in the subdirfile. Five derived fields are also defined (indicated by LINCOM, MULTIPLY, and BIT in the format file—other derived field types also exist, see below).
Each time stream has a corresponding file of binary data containing the time stream itself. Each time stream may have a different sample rate, also indicated in the format file. In this example, for every sample of field1, there are four samples of field2, twelve samples of field3, and eight samples of bits. Derived fields inherit the sample rate of their first input field, so, for example, the derived field diff has the same sample rate as field2.
The binary file associated with the time stream bits is located in the subdirfile because it is defined in the format file fragment in that directory. This is a general rule: the binary file associated with a time stream must reside in the directory which contains the fragment that defines the field. (This is true even if the fragment isn't called format, i.e. binary files associated with raw fields defined in extra_format would have to reside in dirfile.) If this rule is not followed, GetData will be unable to locate the time stream on disk.
The subdirfile is a fully formed dirfile in its own right, since its metadata is fully specified by its format file, which does not refer to any fields defined by its parent. Note, however, that dirfile is not a complete dirfile without the inclusion of the subdirfile, since the format fragment extra_format refers to a field defined in the subdirfile.
The Format File
The format file is a case-sensitive text file which contains the dirfile database metadata. The explicit text encoding is not specified by the Standards, but it must be 7-bit ASCII compatible. Examples of acceptable character encodings include all the ISO 8859 character sets (which include Latin-1 through Latin-10, among others), as well as the UTF-8 encoding of Unicode and UCS.
The format file is composed of directive lines and field specification lines, optionally separated by blank lines or by lines containing only whitespace. Lines are separated by the line-feed character (0x0A). Unless escaped (see below), the hash mark (#) is the comment delimiter; the comment delimiter, and any text following it to the end of the line, is ignored.
Tokens
Both directive lines and field specification lines consist of several tokens separated by whitespace. Whitespace consists of one or more whitespace characters. These are: space (0x20), horizontal tab (0x09), vertical tab (0x0B), form-feed (0x0C), and carriage return (0x0D). The first token of a directive line is always a reserved word, while a field specification line begins with a field name. As a result, no field may have the same name as a reserved word (although, as of Standards Version 8, all reserved words contain a forward slash character, /, which is prohibited in field names in any case).
Since tokens are separated by whitespace, to include a whitespace character in a token, it must either be escaped by preceding it with a backslash character (\) or replaced by a character escape sequence (see Table 1, below), or else the token must be enclosed in quotation marks ("). The quotation marks themselves are stripped from the token. The null-token (that is, the token consisting of zero characters) may be specified by a pair of quotation marks with nothing between them (""). To include a literal quotation mark or backslash character in a token, it must be escaped (\" or \\). Similarly, a hash mark may be included in a token by including it in a quoted token or else by escaping it (\#), otherwise the hash mark will be understood as the comment delimiter.
It is a syntax error to have a line which contains unmatched quotation marks, or in which the last character is an un-escaped backslash (i.e., line continuation is not allowed).
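The quoting, escaping, and comment rules above can be sketched as a small line tokenizer. This is a simplified Python model written for this overview, not GetData's parser (`tokenize` is a hypothetical name): it takes every escaped character literally, so it does not decode the sequences of Table 1, and it lets a quotation mark toggle quoting wherever it appears in a token, which is an assumption beyond the whole-token quoting described above.

```python
WHITESPACE = " \t\v\f\r"  # 0x20, 0x09, 0x0B, 0x0C, 0x0D; 0x0A ends the line


def tokenize(line: str) -> list:
    """Split one format-file line into tokens."""
    tokens = []
    buf = []
    started = False  # True once the current token exists, even if empty ("")
    quoted = False
    i = 0
    while i < len(line):
        c = line[i]
        if c == "\\":
            if i + 1 == len(line):
                raise SyntaxError("unescaped backslash at end of line")
            buf.append(line[i + 1])  # simplification: escaped char taken literally
            started = True
            i += 2
            continue
        if c == '"':
            quoted = not quoted  # the quotation marks themselves are stripped
            started = True
        elif quoted:
            buf.append(c)  # whitespace and '#' are literal inside quotes
        elif c == "#":
            break  # comment delimiter: ignore the rest of the line
        elif c in WHITESPACE:
            if started:
                tokens.append("".join(buf))
                buf, started = [], False
        else:
            buf.append(c)
            started = True
        i += 1
    if quoted:
        raise SyntaxError("unmatched quotation mark")
    if started:
        tokens.append("".join(buf))
    return tokens


print(tokenize('/ENDIAN big # comment'))
print(tokenize('"a b" c\\ d ""'))
```

Tracking whether a token has started, separately from whether it has any characters, is what lets the null-token `""` survive as an empty string.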
Several characters, when escaped by a preceding backslash character, are interpreted as special characters in tokens. Some of these have already been mentioned. The full list of character escape sequences is presented in Table 1. Any other character which is escaped is interpreted as the character itself (e.g. \c is interpreted as c).
Sequence | Interpretation | Byte |
---|---|---|
\" | a quotation mark character | 0x22 |
\# | a hash mark character | 0x23 |
\a | an alert (bell) character | 0x07 |
\b | a backspace character | 0x08 |
\e | an escape character | 0x1B |
\f | a form-feed character | 0x0C |
\n | a line-feed character | 0x0A |
\r | a carriage return character | 0x0D |
\t | a horizontal tab character | 0x09 |
\v | a vertical tab character | 0x0B |
\\ | a backslash character | 0x5C |
\ooo | the single byte given by the octal number ooo. (1 to 3 octal digits) | 0ooo |
\xhh | the single byte given by the hexadecimal number hh. (1 or 2 hexadecimal digits) | 0xhh |
\uhhhhhhh | the UTF-8 byte sequence encoding the Unicode code point given by the hexadecimal number hhhhhhh. (1 to 7 hexadecimal digits) | one or more bytes |
No token may contain the NUL character (0x00). Furthermore, although support is present to create UTF-8 byte sequences, tokens are not required to be valid UTF-8 sequences. Any byte sequence not containing the NUL character forms a valid token. However, there may be further restrictions on the characters allowed in a token in particular situations (for example, when it is used as a field name).
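Decoding the Table 1 sequences into bytes can be sketched as follows, again as an illustrative Python model rather than GetData's implementation (`decode_escapes` is a hypothetical name). Some behaviours are assumptions of this sketch: an octal value above 0xFF, or `\x`/`\u` with no digits, is rejected, and `\u` only accepts code points Python can encode as UTF-8 (up to U+10FFFF, excluding surrogates). It also enforces the rule that a token may not contain NUL.

```python
from __future__ import annotations

SIMPLE = {'"': 0x22, "#": 0x23, "a": 0x07, "b": 0x08, "e": 0x1B, "f": 0x0C,
          "n": 0x0A, "r": 0x0D, "t": 0x09, "v": 0x0B, "\\": 0x5C}
OCT_DIGITS = "01234567"
HEX_DIGITS = "0123456789abcdefABCDEF"


def _digits(s: str, start: int, allowed: str, limit: int) -> tuple[str, int]:
    """Longest run (at most `limit` chars) of `allowed` digits at s[start:]."""
    end = start
    while end < len(s) and end - start < limit and s[end] in allowed:
        end += 1
    return s[start:end], end


def decode_escapes(text: str) -> bytes:
    """Decode the character escape sequences of Table 1 into a byte string."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] != "\\":
            out += text[i].encode("utf-8")
            i += 1
            continue
        if i + 1 == len(text):
            raise SyntaxError("unescaped backslash at end of token")
        e = text[i + 1]
        if e in SIMPLE:
            out.append(SIMPLE[e])
            i += 2
        elif e in OCT_DIGITS:  # \ooo: 1 to 3 octal digits, starting with e
            ds, i = _digits(text, i + 1, OCT_DIGITS, 3)
            value = int(ds, 8)
            if value > 0xFF:
                raise ValueError(f"octal escape \\{ds} does not fit in a byte")
            out.append(value)
        elif e in "xu":  # \xhh (1 or 2 digits) or \uhhhhhhh (1 to 7 digits)
            ds, i = _digits(text, i + 2, HEX_DIGITS, 2 if e == "x" else 7)
            if not ds:
                raise SyntaxError(f"\\{e} without hexadecimal digits")
            if e == "x":
                out.append(int(ds, 16))
            else:
                out += chr(int(ds, 16)).encode("utf-8")  # ValueError if unencodable
        else:
            out += e.encode("utf-8")  # any other escaped character is itself
            i += 2
    if 0 in out:
        raise ValueError("a token may not contain NUL (0x00)")
    return bytes(out)


print(decode_escapes(r"\x41\102\c"))
```

The result is a byte string rather than text, reflecting the statement above that tokens need not be valid UTF-8.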
Standards Versions 5 and earlier do not recognise the character escape sequences, nor allow quoting of tokens. As a result, they prohibit both whitespace and the comment delimiter from being used in tokens.
Directives
There are eleven directives, each specified by a different reserved word which cannot be used as field names in the dirfile; all directives are optional. As of Standards Version 8, all reserved words start with an initial forward slash (/), to distinguish them from field names. Standards Versions 5, 6 and 7 permit any reserved word to optionally omit its initial forward slash, without change in meaning. Reserved words in Standards Version 4 and earlier may not have an initial forward slash. Like the rest of the format specification, directives are case sensitive.
A number of the directives have fragment scope. A directive with fragment scope only applies to the fragment in which it is present, plus any sub-fragments indicated by the /INCLUDE directive, but only if those sub-fragments don't have their own corresponding directive. Directives which have fragment scope are: /ENCODING, /ENDIAN, /FRAMEOFFSET, and /PROTECT. Because of these scoping rules, different portions of the dirfile may have different encodings, endiannesses, frame offsets, or protection levels.
If a directive with fragment scope appears more than once in a fragment, only the last such directive is honoured, with the exception that the effect of a directive is not propagated to sub-fragments if the directive line appears after the sub-fragment is included. The scoping rules of the remaining directives are discussed below.
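These scoping rules can be modelled as a recursive walk over the /INCLUDE tree. In this minimal Python sketch, which tracks a single directive such as /ENDIAN, a `Fragment` (a hypothetical type) lists its directive values and included sub-fragments in file order, and `None` stands for "no directive in effect", in which case GetData would apply its default. Reading the rule as "a sub-fragment inherits the value in effect at the point where it is included" is this sketch's interpretation of the text above.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Fragment:
    name: str
    # In file order: directive values (str) and included sub-fragments.
    items: list = field(default_factory=list)


def resolve(frag: Fragment, inherited: Optional[str] = None,
            out: Optional[dict] = None) -> dict:
    """Map each fragment name to the value of one fragment-scope directive."""
    if out is None:
        out = {}
    in_effect = inherited  # what a sub-fragment included at this point inherits
    for item in frag.items:
        if isinstance(item, Fragment):
            resolve(item, in_effect, out)  # later directives never reach it
        else:
            in_effect = item  # a later directive overrides earlier ones
    out[frag.name] = in_effect  # the last directive in the fragment wins
    return out


a, b, c = Fragment("a"), Fragment("b", ["big"]), Fragment("c")
top = Fragment("format", [a, "little", b, c, "big"])
print(resolve(top))
```

Here fragment a is included before any directive and gets the default, b keeps its own directive, c inherits little, and the top-level fragment itself uses its last directive, big.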
- /ALIAS: The /ALIAS directive defines an alternate name for a
field defined elsewhere in the format specification (called the
target). Aliases may not be used as the parent field in a
/META directive, but are in most other ways indistinguishable
from the target's original, canonical name. Aliases may be chained
(that is, the target name appearing in an /ALIAS directive may itself be
an alias). In this case, the new alias is another name for the target's
own target. Just as there is no requirement that the input fields of a
derived field exist, it is not an error for the target of an alias to
not exist. Syntax is:
/ALIAS <name> <target>
A metafield alias may be defined using the <parent-field>/<alias-name> syntax for name in the /ALIAS directive. No restriction is placed on target; specifically, a metafield alias may target a top-level field, or a metafield with a different parent; conversely, a top-level alias may target a metafield.
A metafield alias may never appear as the parent part of a metafield field code, even if it refers to a top-level field. That is, given the valid format:
aaaa RAW UINT8 1
aaaa/bbbb CONST FLOAT64 0.0
cccc RAW UINT8 1
/ALIAS cccc/dddd aaaa
the metafield aaaa/bbbb may not be referred to as cccc/dddd/bbbb, even though cccc/dddd is a valid field code referring to aaaa. This is not true of top-level aliases: if eeee is an alias of ffff, then ffff/gggg, a metafield of ffff, may be referred to as eeee/gggg.
The /ALIAS directive has no scope: it is processed immediately. It appeared in Standards Version 9.
- /ENCODING: The /ENCODING directive specifies the encoding
scheme used to encode binary files in the dirfile. Syntax is:
/ENCODING <scheme> [enc-datum]
The encoding scheme may be one of the pre-defined names listed below, which are described in more detail on the Dirfile Encodings page, or any other site-specific encoding scheme. The pre-defined schemes are:
- none: The dirfile is unencoded.
- bzip2: The dirfile is bzip2 encoded using the bzip2 compression library.
- flac: The dirfile is flac encoded using the libFLAC compression library.
- gzip: The dirfile is gzip encoded using the zlib compression library.
- lzma: The dirfile is lzma encoded using the liblzma compression library.
- sie: The dirfile is sample-index encoded (a variant of run-length encoding).
- slim: The dirfile is slim encoded using the slimlib compression library.
- text: The dirfile is text encoded.
- zzip: The dirfile is zzip encoded using the zzip compression library.
- zzslim: The dirfile is zzslim encoded using a combination of the slim and zzip compression libraries.
GetData will accept dirfiles it encounters with an encoding scheme not listed here but, in this case, no binary file I/O (affecting gd_getdata(), gd_putdata(), gd_nframes(), &c.) is possible on the dirfile. If no encoding scheme is specified, GetData will try to determine the encoding scheme automatically, or in the case of a newly created dirfile, assume the dirfile is unencoded.
The /ENCODING directive has fragment scope. Introduced in Standards Version 6. The predefined schemes sie, zzip, and zzslim, and the optional enc-datum token, appeared in Standards Version 9; the predefined scheme lzma appeared in Standards Version 7; all other predefined schemes appeared in Standards Version 6.
- /ENDIAN: The /ENDIAN directive specifies the endianness of
the raw data in the database. Syntax is:
/ENDIAN ( big | little ) [ arm ]
where big and little specify the byte ordering of the data, and arm is specified when double-precision floating point data are in the middle-endian format used by older ARM processors.
GetData will assume that any dirfile that omits this directive is in the native endianness of the architecture on which it is running (i.e. no endianness swapping will take place.) Otherwise, if the endianness of the dirfile is different than the endianness of the architecture, GetData will perform endianness conversion while reading from and writing to raw data on disk. The /ENDIAN directive has fragment scope. Introduced in Standards Version 5; Standards Version 8 added the optional arm token.
- /FRAMEOFFSET: The /FRAMEOFFSET directive specifies the frame
number of the first frame in the database (the beginning-of-field
marker). Syntax is:
/FRAMEOFFSET <integer>
Requests for data before the beginning-of-field marker will result in zero (for integer return types) or IEEE not-a-number (for floating point types). Attempts to write data before the beginning-of-field marker will result in an error, and no data will be written. The /FRAMEOFFSET directive has fragment scope. Introduced in Standards Version 1.
- /HIDDEN: The /HIDDEN directive indicates that the specified
field name is hidden. The Standards themselves don't specify
what this means, but by default GetData suppresses hidden field names
in the output of the field counting and
field listing functions. Hiddenness is
not inherited by metafields of the specified field. Hiddenness applies
to the name, not the field itself: hiding a field name does not hide the
field's aliases, and if the hidden name is an alias, the alias is
hidden, not its target. Syntax is:
/HIDDEN <fieldname>
A /HIDDEN directive must appear after the specification of fieldname (which occurs either in a field specification line, an /ALIAS directive, or a /META directive) in the same fragment.
The /HIDDEN directive has no scope: it is processed immediately. It appeared in Standards Version 9.
- /INCLUDE: The /INCLUDE directive specifies another file to
parse for additional metadata for the dirfile. The inclusion is
processed immediately, before the fragment containing the /INCLUDE
directive (the parent fragment) is parsed further.
RAW fields specified in the included fragment are located in
the directory containing the fragment file, and not in the directory
containing the parent fragment, and the binary file encoding may be
different for each fragment. The fragment may be specified either with
an absolute path, or else a path relative to the directory containing
the parent fragment.
The /INCLUDE directive may optionally specify a prefix and/or suffix to apply to field names defined in the included fragment. If present, affixes are applied to all fieldnames (including aliases) defined in the included fragment and any fragments it further includes. Affixes nest, with the affixes of the deepest inclusion innermost. Affixes are not applied to the names of binary files associated with RAW fields. Syntax is:
/INCLUDE <file> [<namespace>.][<prefix>] [<suffix>]
To specify only suffix, the null token ("") may be used as prefix. A namespace may also be specified in an /INCLUDE directive by prepending it to prefix. The namespace and prefix are separated by a dot (.). The dot is required whenever a namespace is specified: if the prefix is empty, the third token should be just the namespace followed by a trailing dot. If a namespace is specified, that namespace, relative to the including fragment's root namespace, becomes the root namespace of the included fragment. If no namespace is specified in the /INCLUDE directive, then the current namespace (specified by a previous /NAMESPACE directive) is used as the root namespace of the included fragment. That is, if the current namespace is current_space, then the statement:
/INCLUDE file newspace.
is equivalent to
/NAMESPACE newspace
/INCLUDE file
/NAMESPACE current_space
As a result, if no namespace is provided, and there has been no previous /NAMESPACE directive, the included fragment will have the same root namespace as the including fragment.
The /INCLUDE directive has no scope: it is processed immediately. It appeared in Standards Version 3. The optional prefix and suffix appeared in Standards Version 9. The optional namespace appeared in Standards Version 10.
- /META: The /META directive specifies a
metafield attached to a particular parent field. A metafield
may be of any field type except RAW. Metafields are retrieved
in exactly the same way as regular field data, but the field code
specified consists of the parent and metafield names joined with a
forward slash:
<parent-field>/<meta-field>
Syntax is:
/META <parent-field> {field specification line}
The parent field code may not be an alias. As an illustration of this concept,
/META parent meta CONST FLOAT64 3.291882
provides a scalar metadatum called meta with value 3.291882 attached to the field parent. This particular metafield may be referred to by the field code "parent/meta". Note that different parent fields may have metafields with the same name, since all references to metafields must include the parent field name. Metafields may not themselves have further sub-metafields. The /META directive has no scope: it is processed immediately and has no long-term effect. /META directives are required to appear after their parent's specification, and in the same fragment.
For simplicity, starting with Standards Version 7, the above metafield can also be specified as:
parent/meta CONST FLOAT64 3.291882
making it look like a regular field specification line. The /META directive itself was introduced in Standards Version 6.
- /NAMESPACE: The /NAMESPACE directive
changes the current namespace for subsequent field specification
lines. Syntax is:
/NAMESPACE <subspace>
The subspace specified is relative to the current fragment's root namespace. If subspace is the null-token (""), the current namespace will be set back to the root namespace. Otherwise, the current namespace will be changed to the concatenation of the root namespace with subspace, with the two parts separated by a dot:
rootspace.subspace
If rootspace is empty, the intervening dot is omitted, and the current namespace is simply
subspace
By default, all field codes (both the names of newly specified fields and the field codes used as inputs to fields or as targets of aliases) are placed in the current namespace, unless they start with an initial dot, in which case the current namespace is ignored and they are placed instead in the fragment's root namespace. See the Namespaces section for further details.
The /NAMESPACE directive has no scope: it is processed immediately. For the effects of changing the current namespace on included fragments, see the /INCLUDE directive above. The effects of a /NAMESPACE directive never propagate upwards to parent fragments. It appeared in Standards Version 10.
- /PROTECT: The /PROTECT directive
specifies the advisory protection level of the current fragment and of
the RAW fields defined therein. The protection level indicates
whether writing to the format file fragment, or to the binary data on
disk, is permitted. Syntax is:
/PROTECT <level>
Four advisory protection levels are defined:
- none: No protection at all: data and metadata may be freely changed. This is the default, if no /PROTECT directive is present.
- format: The dirfile metadata is protected from change, but RAW data on disk may be modified.
- data: The RAW data on disk is protected from change, but metadata may be modified.
- all: Both metadata and data on disk are protected from change.
- /REFERENCE: The /REFERENCE directive specifies the name of
the field to use as the dirfile's reference
field. If no /REFERENCE directive is specified, the first
RAW field encountered is used as the reference field. The
/REFERENCE directive must specify a RAW field. Syntax is:
/REFERENCE <field-code>
The /REFERENCE directive has global scope: if multiple /REFERENCE directives appear in the dirfile metadata, only the last such is honoured. Introduced in Standards Version 6.
- /VERSION: The /VERSION directive specifies the particular
version of the Dirfile Standards to which the dirfile conforms. Syntax
is:
/VERSION <integer>
When a /VERSION directive indicates a Standards Version newer than GetData supports, it triggers permissive mode. In permissive mode, unrecognised lines (which GetData assumes are valid syntax in a newer version of the Standards) are silently ignored.
The /VERSION directive has immediate scope: its effect is immediate, and it applies only to metadata below it, including and propagating downwards to sub-fragments after the directive.
In Standards Version 8 and earlier, its effect also propagates upwards back to the parent fragment, and affects subsequent metadata. Starting with Standards Version 9, this no longer happens. As a result, a /VERSION directive which indicates a version of 9 or later never propagates upwards; additionally, /VERSION directives found in subfragments included in a Version 9 or later fragment aren't propagated upwards into that fragment, regardless of the Version of the subfragments. The /VERSION directive appeared in Standards Version 5.
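The nesting of /INCLUDE affixes can be illustrated with a short Python sketch (`apply_affixes` is a hypothetical name). It assumes the chain of (prefix, suffix) pairs leading to the defining fragment is already known, and it ignores namespaces. Following the /INCLUDE description above, the affixes of the deepest inclusion are applied first, so they end up innermost.

```python
from __future__ import annotations


def apply_affixes(name: str, chain: list[tuple[str, str]]) -> str:
    """Name under which a field defined in a nested fragment is known.

    `chain` holds the (prefix, suffix) of each /INCLUDE leading to the
    defining fragment, outermost first; use "" for a missing affix.
    """
    for prefix, suffix in reversed(chain):  # deepest inclusion first = innermost
        name = prefix + name + suffix
    return name


# Hypothetical layout:
#   format:    /INCLUDE a/format A_ _a
#   a/format:  /INCLUDE b/format B_ _b
#   b/format:  x RAW UINT16 8
print(apply_affixes("x", [("A_", "_a"), ("B_", "_b")]))  # A_B_x_b_a
```

Because affixes are not applied to binary file names, the RAW field in this layout still reads its data from a file named x in the innermost fragment's directory, even though the dirfile knows the field as A_B_x_b_a.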
Field Specifications
Any line which does not start with a reserved word is assumed to be a field specification line. A field specification line consists of at least two tokens. The first token is the field name. The second token is the field type. Subsequent tokens are field parameters. The meaning and number of these parameters depend on the field type specified.
Field Names
A field name consists of one or more characters, excluding both ASCII control characters (bytes 0x00 through 0x1F) and the reserved characters listed in Table 2 according to Standards Version. Furthermore, the field name of a RAW field may only contain characters allowed in filenames†. Although never allowed in a field name, a forward slash (/) can be used to define metafields; see above under the /META directive. Like the rest of the format file, field names are case sensitive.
Version | Reserved Characters |
---|---|
0–4 | #‡ / whitespace‡ |
5 | #‡ / & ; < > \| whitespace‡ |
6 and later | / & ; < > \| . |
The field name may not be INDEX, which is a special, implicit field which contains the integer frame index. Standards Version 5 and earlier also prohibit FILEFRAM as a field name; it was an alias for INDEX, (which arose in the prehistoric times of ReadData, GetData's spiritual predecessor).
Standards Version 3 and 4 restrict field names to 50 characters. Standards Version 2 and earlier restrict field names to 16 characters. Additionally, the filesystem will put restrictions on the length of a RAW field name, regardless of Standards Version*.
Starting in Standards Version 7, if the field name beginning a field
specification line contains exactly one forward slash character
(/), the line is assumed to specify a metafield. See the
/META directive above for further details. A field
name may not contain more than one forward slash.
†: Consult the documentation of the
filesystem backing the database for details, although most modern
filesystems permit any byte except NUL (0x00) or, failing that, any
Unicode character except NUL.
*: Again, consult your filesystem
documentation, but most modern filesystems permit filenames of at
least 255 bytes.
Each fragment has an immutable root namespace. The root namespace
of the primary format file is the null namespace, identified by the
null-token (""). The root namespace of other fragments is specified
when they are introduced (see the /INCLUDE
directive). Each fragment also has a current namespace which
may be changed as often as needed using the
/NAMESPACE directive, and defaults to the root
namespace. The current namespace is always either the root namespace or
else a subspace under the root namespace.
If a field name or field code starts with a leading dot, then that name
or code is taken to be relative to the fragment's root space. If it does
not start with a dot, it is taken to be relative to the current namespace.
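For example, if both the root namespace and current namespace of a
fragment start off as rootspace, then a field named aaaa or .aaaa is rootspace.aaaa, and one named cccc.dddd is rootspace.cccc.dddd; after /NAMESPACE newspace, a field named gggg is rootspace.newspace.gggg, while one named .hhhh is still rootspace.hhhh.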
Note that a field code may specify deeper subspaces under either the
root namespace or the current namespace (meaning it is never necessary to
use the /NAMESPACE directive). Note also that there is no way for metadata
in a given fragment to refer to fields outside the fragment's root space.
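The namespace rules above can be sketched in Python as follows. This is an illustrative model, not GetData code: `set_namespace` and `resolve_code` are hypothetical names, the null namespace is written as an empty string, and representation suffixes and metafield slashes are ignored for brevity. The special handling of INDEX follows the exception described in the next paragraph.

```python
def join(*tags: str) -> str:
    """Concatenate namespace tags with dots, omitting empty (null) tags."""
    return ".".join(t for t in tags if t)


def set_namespace(root: str, subspace: str) -> str:
    """Current namespace after `/NAMESPACE <subspace>` in a fragment rooted at `root`."""
    return root if subspace == "" else join(root, subspace)


def resolve_code(code: str, root: str, current: str) -> str:
    """Fully-qualified form of a field name or field code read in a fragment."""
    if code.startswith("."):
        base, code = root, code[1:]  # leading dot: relative to the root namespace
    else:
        base = current  # otherwise: relative to the current namespace
    if code.split(".")[-1] == "INDEX":
        return "INDEX"  # INDEX ignores all namespace tags
    return join(base, code)


root = "rootspace"
current = set_namespace(root, "newspace")
print(resolve_code("gggg", root, current), resolve_code(".hhhh", root, current))
```

Because `join` skips empty tags, the same code also covers the primary format file, whose root namespace is the null namespace.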
There is one exception to this namespace scoping rule: the implicit
INDEX vector is always in the null (top-level) namespace, and
namespace tags specified with it, either explicitly or implicitly, even a
fragment root namespace, are ignored. So, in a fragment with root
namespace rootspace and current namespace
rootspace.subspace, the field codes INDEX, .INDEX, and subspace.INDEX all refer to the same INDEX field.
There are eighteen field types. Of these, fourteen are of vector type
(BIT, DIVIDE, INDIR, LINCOM, LINTERP,
MPLEX, MULTIPLY, PHASE, POLYNOM, RAW,
RECIP, SBIT, SINDIR, and WINDOW) and four are
of scalar type (CARRAY, CONST, SARRAY, and
STRING). The thirteen vector field types other than RAW
are also called derived fields, since they derive their
value from one or more input fields.
Five of these derived field types (DIVIDE, LINCOM,
MPLEX, MULTIPLY, and WINDOW) may have more than one
vector input field. In situations where these input fields have differing
sample rates, the sample rate of the derived field is the same as the
sample rate of the first (left-most) input field specified. Furthermore,
the input fields are synchronised by aligning them on frame boundaries,
assuming equally-spaced sampling throughout a frame, and using the last
sample of each input field which did not occur after the sample of the
derived field being computed. That is, if the first and second input
fields have sample rates $s_1$ and $s_2$ (in samples per frame),
the derived field also has sample rate $s_1$ and, for every
sample $n$ of the derived field, the $n$th sample of the first
field is used (since they have the same sample rate by definition), and
the sample number $m$ of the second field used is computed as:
$$m = \left\lfloor \frac{n\, s_2}{s_1} \right\rfloor$$
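This alignment rule can be written with integer arithmetic, which avoids floating-point rounding at frame boundaries. Below is a minimal Python sketch, assuming integer samples-per-frame rates `s1` and `s2`. `aligned_sample` computes the floor expression, and `multiply_aligned` is a hypothetical helper showing how a two-input field such as MULTIPLY would combine inputs sampled at different rates. It is not GetData's implementation.

```python
def aligned_sample(n: int, s1: int, s2: int) -> int:
    """Sample of the second input used for sample n of the derived field.

    Sample n of the first input falls at time n / s1 frames; the last
    second-input sample not after it is floor(n * s2 / s1).
    """
    return (n * s2) // s1


def multiply_aligned(a: list, s1: int, b: list, s2: int) -> list:
    """Product of two inputs, sampled at the rate of the first one."""
    out = []
    for n, x in enumerate(a):
        m = aligned_sample(n, s1, s2)
        if m >= len(b):  # the second input has no sample here yet; stop
            break
        out.append(x * b[m])
    return out


print([aligned_sample(n, 4, 1) for n in range(8)])
print(multiply_aligned([1, 2, 3, 4, 5], 4, [10, 100], 1))
```

With four samples of the first input per frame and one of the second, each second-input sample is reused for the four first-input samples of its frame.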
Starting in Standards Version 6, certain scalar field parameters in the
field specifications may be specified using CONST or CARRAY
fields, instead of literal values. A list of parameters for which this is
allowed is given below in the
Field Parameters section.
The possible field types are:
<fieldname> is computed as:
At every sample n, the derived field is computed as:
The <index> vector is converted to
an integer type for comparison. The value of the derived field before
the first sample where <index> equals
<count> is poorly defined. Typically
GetData searches backwards for its value, but see the
gd_mplex_lookback() function.
The values of <count> and <period> place no restrictions on values
contained in <index>. Specifically,
particular values of <index>
(including <count>) need not be
equally spaced (neither by <period>
nor any other spacing); <index> need
not ever take on the value <count> (in
which case GetData will return 0 or NaN for the entirety of the output
vector). MPLEX appeared in Standards Version 9.
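The hold-last-value behaviour of MPLEX can be sketched in a few lines of Python. This is a simplified model, not GetData's implementation: `mplex` is a hypothetical name, the inputs are assumed to have equal sample rates, and the `fill` argument stands in for the value before the first matching sample, which the text above calls poorly defined (GetData instead searches backwards). <period> is omitted, since it places no restriction on the values of <index>.

```python
import math


def mplex(data: list, index: list, count: int, fill: float = math.nan) -> list:
    """Keep data[n] where int(index[n]) == count; otherwise repeat the last kept value."""
    out = []
    held = fill  # value before the first match (see above)
    for d, i in zip(data, index):
        if int(i) == count:  # <index> is converted to an integer for comparison
            held = d
        out.append(held)
    return out


# Two signals interleaved on one stream, tagged 0 and 1 by the index field.
print(mplex([10, 11, 12, 13, 14, 15], [0, 1, 0, 1, 0, 1], 1, fill=0))
```

If <index> never equals <count>, every output sample is `fill`, matching the 0-or-NaN result described above.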
All these type names (except those for complex data, which came
later) were introduced in Standards Version 5. Earlier Standards
Versions specified data types with single-character type aliases:
Types INT8, UINT64, INT64, COMPLEX64,
and COMPLEX128 are not supported before Standards Version 5,
so no single-character type aliases exist for these types. These
single-character type aliases were deprecated in Standards Version 5
and removed in Standards Version 8. (However, you can still use them,
because GetData supports all versions of the Standard).
Note: with the EQ operator, this derived field type is
very similar to the MPLEX field type. The primary difference is that
MPLEX repeats the previous value of the derived field outside the
extracted region, while WINDOW just sets it to 0/NaN. WINDOW appeared
in Standards Version 9.
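For comparison with MPLEX, a WINDOW field using EQ or NE can be sketched the same way. Only these two operators are modelled, since they are the ones shown in this section's table. Both <check> and <threshold> are converted to integers here: INT64 is the type given for EQ, and using it for NE is an assumption. `window` and its `fill` argument are hypothetical.

```python
import math
import operator

# Only the operators shown in this section's table.
_OPS = {"EQ": operator.eq, "NE": operator.ne}


def window(data: list, check: list, op: str, threshold: int,
           fill: float = math.nan) -> list:
    """Pass data[n] through where check[n] <op> threshold holds; else `fill`."""
    compare = _OPS[op]
    t = int(threshold)  # compared as integers (INT64 for EQ)
    return [d if compare(int(c), t) else fill for d, c in zip(data, check)]


print(window([10, 11, 12, 13], [0, 1, 0, 1], "EQ", 1, fill=0))
```

With the same inputs, the EQ window keeps 11 and 13 and puts `fill` between them, whereas MPLEX would repeat 11 until 13 arrives.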
All input vector field parameters should be field codes. Additionally, in Standards
Version 6 and later, some of the numerical field parameters may be
either literal numbers or else the field code of a CONST or
CARRAY scalar field containing the value. In the case of a
CARRAY, the field code may be immediately followed by an integer
enclosed in angle brackets (< >) specifying which element
(counting from zero) of the CARRAY to use (so:
field_code<n>). If this is omitted, the
first element is assumed. Parameters for which this is possible are:
Starting in Standards Version 9, in addition to decimal notation,
literal integer parameters may be specified as hexadecimal numbers, by
prefixing the number with 0x or 0X, or as octal numbers, by
prefixing the number with 0. Both uppercase and lowercase
hexadecimal digits may be used.
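The integer-literal rules, together with the `field_code<n>` syntax for CARRAY elements, can be sketched in Python as below. This is illustrative only: `parse_int_literal` and `parse_scalar_ref` are hypothetical names, the optional leading sign follows C `strtol()` conventions as an assumption, and the CARRAY element index is taken to be a plain decimal number.

```python
import re

_INT_LITERAL = re.compile(r"[+-]?(?:0[xX][0-9a-fA-F]+|0[0-7]*|[1-9][0-9]*)")
_SCALAR_REF = re.compile(r"(?P<code>[^<>]+)(?:<(?P<elem>[0-9]+)>)?")


def parse_int_literal(tok: str) -> int:
    """Decimal, hexadecimal (0x/0X prefix) or octal (leading 0) integer literal."""
    if not _INT_LITERAL.fullmatch(tok):
        raise ValueError(f"not an integer literal: {tok!r}")
    sign = -1 if tok.startswith("-") else 1
    body = tok.lstrip("+-")
    if body[:2] in ("0x", "0X"):
        return sign * int(body[2:], 16)
    if len(body) > 1 and body.startswith("0"):
        return sign * int(body[1:], 8)
    return sign * int(body, 10)


def parse_scalar_ref(tok: str) -> tuple:
    """Split a scalar parameter `field_code<n>` into (field_code, n); n defaults to 0."""
    m = _SCALAR_REF.fullmatch(tok)
    if m is None:
        raise ValueError(f"not a scalar field reference: {tok!r}")
    elem = m.group("elem")
    return m.group("code"), int(elem) if elem is not None else 0


print(parse_int_literal("0x1F"), parse_int_literal("017"), parse_scalar_ref("gain<2>"))
```

Note that Python's `int(tok, 0)` is not a substitute here: it rejects C-style octal literals such as `017`.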
In Standards Version 7 and later, a literal complex number is specified
as two real (floating point) numbers separated by a semicolon (;)
with no intervening whitespace. So, for example, the token 1.5;-2 represents the complex number 1.5 - 2i.
Complex literals allow, among other things, the composition of
complex valued fields from purely real input fields. For example, a
complex valued field, z, may be created from a real valued field
re, representing the real part of the complex number, and the real
valued field im, representing the imaginary part of the complex
number, by specifying:
Both when specifying the inputs to a field (as a non-literal scalar
parameter, or as an input vector field to a field), and when specifying
a field to a GetData call, field codes are used. A field
code consists of, in order:
A representation suffix may be used to extract a real
number from a complex value. The available suffixes (listed here with
their preceding dot) and their meanings are:
NB: The first interpretation only occurs with valid representation
suffixes; the field code:
If the specified field is purely real, the representations are
calculated as if the imaginary part were equal to +0. For example, given a
complex valued vector, z, a vector containing the real part of
z, called re_z, could be produced with:
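Representation handling can be sketched in a few lines of Python. The sketch assumes the four classic suffixes .r (real part), .i (imaginary part), .m (modulus) and .a (argument, in radians). The newer .z suffix is not modelled, and the function names are hypothetical. The last line shows why the +0 matters: promoting -2.0 with an imaginary part of +0 gives an argument of π rather than −π.

```python
import cmath

# Assumed suffix meanings: real part, imaginary part, modulus, argument (radians).
REPRESENTATIONS = {
    "r": lambda z: z.real,
    "i": lambda z: z.imag,
    "m": abs,
    "a": cmath.phase,
}

def split_field_code(code):
    """Split 'name.x' into ('name', 'x') when x is one of the suffixes above;
    any other trailing '.something' stays part of the field name."""
    head, dot, tail = code.rpartition(".")
    if dot and head and tail in REPRESENTATIONS:
        return head, tail
    return code, None

def apply_representation(value, suffix):
    """Apply a representation to one sample. complex() promotes a real
    value to one whose imaginary part is +0.0."""
    return REPRESENTATIONS[suffix](complex(value))

z = [3 + 4j, -2.0]
re_z = [apply_representation(v, "r") for v in z]
print(split_field_code("z.m"))             # ('z', 'm')
print(split_field_code("z.q"))             # ('z.q', None)
print(re_z)                                # [3.0, -2.0]
print(apply_representation(3 + 4j, "m"))  # 5.0
print(apply_representation(-2.0, "a"))    # 3.141592653589793
```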
The latest version of the Dirfile Standards is Version 10.
Namespaces
Beginning with Standards Version 10, every field in a Dirfile is contained
in a namespace. Every namespace is identified by a namespace tag
which consists of the same restricted set of characters used for field
names (see Table 2, above). Namespaces may be nested to arbitrary
depth. Subnamespaces are identified by concatenating all
namespace tags, separating tags by dots (.), with the outermost
namespace leftmost:
.bbbb RAW UINT8 1
cccc.dddd RAW UINT8 1
.eeee.ffff RAW UINT8 1
/NAMESPACE newspace
gggg RAW UINT8 1
.hhhh RAW UINT8 1
iiii.jjjj RAW UINT8 1
.kkkk.llll RAW UINT8 1
all refer to the same INDEX field.
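The dotted naming scheme can be sketched in Python as follows. The helpers are hypothetical. They handle only the plain form outer.inner.name and ignore both representation suffixes and the leading-dot forms used in the example above.

```python
def join_namespace(*parts):
    """Join namespace tags (outermost first) and a field name with dots."""
    for part in parts:
        if not part or "." in part:
            raise ValueError(f"invalid tag or name: {part!r}")
    return ".".join(parts)

def split_namespace(code):
    """Inverse of join_namespace: return (namespace_tags, field_name)."""
    *tags, name = code.split(".")
    return tags, name

print(join_namespace("aaaa", "cccc", "dddd"))  # aaaa.cccc.dddd
print(split_namespace("aaaa.cccc.dddd"))       # (['aaaa', 'cccc'], 'dddd')
print(split_namespace("INDEX"))                # ([], 'INDEX')
```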
Field Types
Introduced in Standards Version 4.
Introduced in Standards Version 7.
Signed integer data are two's complement. Complex data are stored in the Fortran
storage format (which is also specified by
C99 § 6.2.5.13): two consecutive floating-point
numbers, the first being the real part of the value and the second the
imaginary part. Two additional type names exist: FLOAT is
equivalent to FLOAT32, and DOUBLE is equivalent to
FLOAT64. Standards Version 9 deprecates these two aliases,
but still allows them.
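The sketch below illustrates these storage rules by decoding raw bytes with Python's struct module. Two details are assumptions of the sketch: the mapping from dirfile type names to struct codes, and the default little-endian byte order. A real dirfile records its byte order with the /ENDIAN directive. decode_raw is a hypothetical name.

```python
import struct

# Assumed mapping from dirfile type names to struct format codes.
STRUCT_CODES = {
    "UINT8": "B", "INT8": "b", "UINT16": "H", "INT16": "h",
    "UINT32": "I", "INT32": "i", "UINT64": "Q", "INT64": "q",
    "FLOAT32": "f", "FLOAT64": "d",
    "FLOAT": "f", "DOUBLE": "d",          # deprecated aliases
    "COMPLEX64": "f", "COMPLEX128": "d",  # stored as (real, imaginary) pairs
}

def decode_raw(type_name, raw, byte_order="<"):
    """Decode a whole buffer of one data type; '<' little-, '>' big-endian."""
    code = STRUCT_CODES[type_name]
    count = len(raw) // struct.calcsize(byte_order + code)
    values = struct.unpack(f"{byte_order}{count}{code}", raw)
    if type_name.startswith("COMPLEX"):
        # Fortran / C99 layout: real part first, imaginary part second.
        return [complex(values[k], values[k + 1]) for k in range(0, count, 2)]
    return list(values)

print(decode_raw("INT8", b"\xff\x01"))  # [-1, 1]  (two's complement)
print(decode_raw("COMPLEX128", struct.pack("<4d", 1.0, -2.0, 0.5, 3.0)))
# [(1-2j), (0.5+3j)]
```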
Introduced in Standards Version 8.
Data are extracted when the specified comparison operation is true.
Outside the extracted region, GetData returns 0 or NaN. Both <threshold> and <check> are converted to the type
given in the table below before comparison. They may not be complex
valued.
<op>   Data Type   Comparison operation
EQ     INT64       <check> = <threshold>
NE     INT64       <check> ≠ <threshold>
GE     FLOAT64     <check> ≥ <threshold>
GT     FLOAT64     <check> > <threshold>
LE     FLOAT64     <check> ≤ <threshold>
LT     FLOAT64     <check> < <threshold>
SET    UINT64      at least one bit set in <check> is also set in <threshold>
CLR    UINT64      at least one bit set in <check> is not set in <threshold>
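The comparison table can be read as the following Python sketch of WINDOW evaluation. It is illustrative only, and window_like is a hypothetical name. Three behaviours are assumptions of the sketch: conversion to INT64 truncates toward zero (as a C cast does), conversion to UINT64 wraps modulo 2^64, and the caller chooses whether excluded samples become 0 or NaN.

```python
import math

MASK64 = (1 << 64) - 1

def to_int64(x):
    # Assumed: truncation toward zero, as a C cast would do.
    return int(x)

def to_uint64(x):
    # Assumed: wrap modulo 2**64, as C does for negative signed integers.
    return int(x) & MASK64

# op -> (conversion applied to both <check> and <threshold>, predicate)
WINDOW_OPS = {
    "EQ": (to_int64, lambda c, t: c == t),
    "NE": (to_int64, lambda c, t: c != t),
    "GE": (float, lambda c, t: c >= t),
    "GT": (float, lambda c, t: c > t),
    "LE": (float, lambda c, t: c <= t),
    "LT": (float, lambda c, t: c < t),
    "SET": (to_uint64, lambda c, t: (c & t) != 0),   # a bit of c is also in t
    "CLR": (to_uint64, lambda c, t: (c & ~t) != 0),  # a bit of c is not in t
}

def window_like(data, check, op, threshold, fill=math.nan):
    """Keep data samples where the comparison holds; replace the rest
    with `fill` (0 or NaN)."""
    convert, holds = WINDOW_OPS[op]
    t = convert(threshold)
    return [d if holds(convert(c), t) else fill for d, c in zip(data, check)]

data = [1.0, 2.0, 3.0, 4.0]
print(window_like(data, [0b01, 0b10, 0b11, 0b00], "SET", 0b10))  # [nan, 2.0, 3.0, nan]
print(window_like(data, [0.2, 0.8, 1.5, 2.5], "GT", 1, fill=0))  # [0, 0, 3.0, 4.0]
```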
Field Parameters
Since it is possible to create a field code which is identical to a
literal number, a parameter is assumed to be the field code of a
scalar field only if it doesn't look like a number.
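Below is a rough Python sketch of this rule, combined with the optional <n> element index for CARRAY fields. The number grammar covers decimal and hexadecimal integers, decimal floating point, and the re;im complex form. It is an approximation chosen for the sketch, not the exact grammar GetData uses. classify_parameter is a hypothetical name.

```python
import re

# Approximate grammar for "looks like a number" (an assumption of this
# sketch): hex integers, decimal integers/floats with optional exponent,
# optionally followed by ';' and a second real for a complex literal.
_REAL = r"[+-]?(?:0[xX][0-9A-Fa-f]+|(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)"
_NUMBER = re.compile(rf"{_REAL}(?:;{_REAL})?")

# A scalar field code, optionally followed by <n> selecting a CARRAY element.
_SCALAR = re.compile(r"(?P<code>[^<>]+?)(?:<(?P<index>\d+)>)?")

def classify_parameter(token):
    """Return ('literal', token, None) or ('scalar', field_code, element)."""
    if _NUMBER.fullmatch(token):
        return ("literal", token, None)
    m = _SCALAR.fullmatch(token)
    if m is None:
        raise ValueError(f"malformed parameter: {token!r}")
    # When <n> is omitted, the first element (index 0) is used.
    index = int(m.group("index")) if m.group("index") else 0
    return ("scalar", m.group("code"), index)

for tok in ["1.5e3", "0x1F", "-2;0.5", "gain", "cal_array<3>", "1abc"]:
    print(tok, classify_parameter(tok))
```

Note that a token such as 1abc starts with a digit but does not look like a number as a whole, so it is treated as a field code.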
Field Codes
History
Version  Release Date  Notes
10  January 2017  Added the INDIR, SARRAY, and SINDIR field types, the /NAMESPACE directive, the optional namespace tag to the /INCLUDE directive, and the .z representation suffix.
9  April 2012  Added the MPLEX and WINDOW field types, the /ALIAS and /HIDDEN directives, the affixes to /INCLUDE, and the optional enc-datum token to /ENCODING. It permitted specification of integer literals in octal and hexadecimal. Finally, it deprecated the type aliases FLOAT and DOUBLE.
8  November 2010  Added the DIVIDE, RECIP, and CARRAY field types, made the forward slash on reserved words mandatory, and prohibited using the single-character type aliases in the specification of RAW fields. It also introduced the optional second (arm) token to the /ENDIAN directive.
7  October 2009  Added the POLYNOM and SBIT field types, and the complex data types COMPLEX64 and COMPLEX128. It also introduced representation suffixes to field codes, made the n_fields parameter to LINCOM optional, and introduced the directive-free method of specifying metafields.
6  October 2008  Added the /ENCODING, /META, /PROTECT, and /REFERENCE directives and the CONST and STRING field types. It permitted whitespace in tokens and introduced the character escape sequences. It allowed CONST fields to be used as parameters in field specification lines. It also removed FILEFRAM as an alias for INDEX, and allowed # and \ in field codes.
5  August 2008  Added VERSION and ENDIAN, and removed the restriction on field name length. It introduced the data types INT8, INT64, and UINT64, the new-style type specifiers, and increased the range of the BIT field type from 32 to 64 bits. It also prohibited the characters #&/;<>\.| in field names.
4  October 2006  Added the PHASE field type.
3  January 2006  (The "Large Dirfile Extension") Added INCLUDE, support for sub-dirfiles, and increased the allowed length of a field name from 16 to 50 characters.
2  September 2005  Added the MULTIPLY field type, and added support for LINCOM fields with inputs of differing sample rates.
1  November 2004  Added FRAMEOFFSET and the optional fourth argument to the BIT field type.
0  before March 2003  This refers to the dirfile standards supported by the GetData library originally introduced into the kst sources, which contained support for all other features covered by this document.