Introduction
GetData is a C library. The APIs it provides are described below. In addition to this C library, bindings are also provided to simplify using GetData in other programming languages. The library is designed to be robust, but does, at times, sacrifice exactness for speed.
This page provides only a partial overview of the functionality of the modern API. Full API documentation is provided in a collection of UNIX manual pages distributed with GetData, and linked from the function descriptions via the [man] links in the section headings.
Building GetData
GetData uses the standard GNU autoconf, automake, and libtool suites to configure, build, and install the package. For people unfamiliar with these tools, the package contains generic instructions for using these tools in the file INSTALL included in the distribution. Briefly, to build the package, it should first be configured by running:
The GetData Header
The GetData APIs
Due to a desire to provide a flexible and full API on multiple architectures while still providing backwards compatibility, a number of at least partially distinct APIs have appeared. The differences between these APIs are explained further below. GetData-0.10 provides the following APIs:
- The C99 API
The "default" modern API, the C99 API is the one primarily described here. Not surprisingly, it requires a C99-compliant compiler to build. If one is not available, it cannot be built and the C89 API becomes the default API.
- The C89 API
An ANSI-C compliant API for use in situations where the C99 API is inappropriate. If GetData is built without a C99-compliant compiler, this will be the default API, and the C99 API will be absent. But, even when GetData is built with a C99-compliant compiler, this API is still available. The most obvious distinction compared to the C99 API is the lack of native complex data types, but other, subtler changes (described below) are also present, due to differences between ANSI-C and C99. For the most part, it is interchangeable with the C99 API.
- The Explicit 64-bit API
A number of functions in the C99 and C89 APIs use the POSIX standard type off_t when dealing with file offsets and sizes. On some systems, this is a 32-bit type, which has the effect of limiting file sizes to 2^31 bytes. In this situation, the Explicit 64-bit API provides additional function entry points with explicit support for larger files using a special 64-bit gd_off64_t type. In the interests of portability, this API is available even on systems with a natively 64-bit off_t.
- The Legacy API
The oldest available API, the Legacy API is the GetData library API provided before GetData-0.3. This API suffers from a number of issues including a lack of thread safety, and no support for large (>2^31 bytes) files, nor modern Dirfiles (Standards Version 4 is the newest that it fully supports). It has been marked deprecated since the release of GetData-0.3 (August 2008). No new software should be written for it. As of GetData-0.9, it is not included in default builds of the library, but may be enabled at compile time.
This document concerns itself primarily with the modern APIs. Users of pre-0.3 versions of GetData may also be interested in the Legacy API document which explains in more detail the differences between the modern APIs and the legacy API.
C99 versus C89 APIs
The default API defined by getdata.h makes use of the C99 _Complex keyword to define complex valued arguments and structure members. A C89-conformant API is also available, for use with compilers that do not understand C99 complex data types. To use the C89 API, define GD_C89_API before including getdata.h. This has the following effects:
- since anonymous structs and unions are prohibited, many of the member names of the gd_entry_t structure will be changed. For example, the ca member becomes u.polynom.ca.
- complex valued members in the gd_entry_t structure will be replaced by two-element, purely real arrays, with the first element being the real part and the second element the imaginary part of the complex number. That is,

    double complex ca[6]

  will become:

    double ca[6][2]

- pointers to complex valued array data passed to API functions are replaced by pointers to purely real arrays, of twice the length. So the following, which uses the C99 API:

    double complex coeffs[3] = {
      1.1 + _Complex_I * 2.2,
      3.3 + _Complex_I * 4.4,
      5.5 + _Complex_I * 6.6
    };
    gd_add_cpolynom(dirfile, "field", 2, "input", coeffs, 0);

  can be equivalently performed in the C89 API using:

    double coeffs[6] = {
      1.1, 2.2,
      3.3, 4.4,
      5.5, 6.6
    };
    gd_add_cpolynom(dirfile, "field", 2, "input", coeffs, 0);

- similarly, complex data passed by value are replaced by two-element purely real arrays, i.e. the following C99 code:

    double complex dividend = 1.1 + _Complex_I * 2.2;
    gd_add_crecip(dirfile, "field", "input", dividend, 0);

  is equivalent to the C89 code:

    double dividend[2] = {1.1, 2.2};
    gd_add_crecip(dirfile, "field", "input", dividend, 0);

The two APIs use the same library code internally, and differ only in the function entry points and the function prototypes defined by getdata.h.
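The equivalence between the two calling conventions rests on the C99 guarantee that a complex value has the same representation as a two-element array of the corresponding real type, real part first. The following is a minimal sketch of that mapping. It does not call GetData, and the helper name complex_to_pairs is hypothetical, introduced only for illustration.

```c
#include <complex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of GetData): copy n C99 complex values
 * into a purely real array of 2*n doubles laid out as
 * re0, im0, re1, im1, ...
 * C99 specifies that a double complex has the same representation as
 * double[2] (real part first), so a plain memcpy preserves the pairing. */
static void complex_to_pairs(const double complex *in, double *out, size_t n)
{
    memcpy(out, in, n * sizeof *in);
}
```

A caller holding C99 complex coefficients could use such a helper to fill the twice-as-long real array that the C89 prototypes expect, without touching the values themselves.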
If GetData is built without the benefit of a C99 compiler, either because ./configure couldn't find a suitable modern compiler, or else because --enable-ansi-c was passed to ./configure, the C99 API will be completely lacking from the library, and getdata.h will declare only the C89 API. In this case, getdata.h will define GD_NO_C99_API to indicate that the C99 API is missing from the library.
Explicit 64-bit Support
To overcome the file size limit imposed by a 32-bit off_t, GetData provides an optional API for explicit 64-bit support. Defining GD_64BIT_API before including getdata.h will define the 64-bit type gd_off64_t, as well as declare additional functions for this 64-bit type. If the platform provides off64_t, the GetData type will be simply that.
On platforms where off_t is 64-bits wide, this API may still be useful for portable programming; in this case gd_off64_t is simply off_t. On some platforms this API may be automatically enabled; in this case, the symbol GD_64BIT_API is ignored.
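To make the size limit concrete, the sketch below computes the largest byte offset a signed offset type of a given width can represent, and checks whether a raw field of a given shape would fit under it. It is plain arithmetic and does not use GetData. The function names are hypothetical, and the width is assumed to lie between 2 and 64 bits.

```c
#include <stdint.h>

/* Largest byte offset representable by a signed offset type that is
 * `bits` wide (assumed 2 <= bits <= 64). */
static int64_t max_signed_offset(unsigned bits)
{
    return (bits >= 64) ? INT64_MAX : ((int64_t)1 << (bits - 1)) - 1;
}

/* Nonzero if nframes frames of spf samples, each sample_size bytes wide,
 * can be addressed with such an offset type. The division avoids
 * overflowing the intermediate product. */
static int fits_in_offset(uint64_t nframes, uint64_t spf,
                          uint64_t sample_size, unsigned bits)
{
    uint64_t limit = (uint64_t)max_signed_offset(bits);
    uint64_t per_frame = spf * sample_size;

    if (per_frame == 0)
        return 1;
    return nframes <= limit / per_frame;
}
```

With a 32-bit offset, a field of 20 eight-byte samples per frame stops being addressable a little past 13 million frames, which is the kind of case the explicit 64-bit entry points are meant for.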
The explicit 64-bit functions this API declares are:
- gd_alter_frameoffset64
- gd_bof64
- gd_eof64
- gd_framenum_subset64
- gd_frameoffset64
- gd_getdata64
- gd_nframes64
- gd_putdata64
- gd_seek64
- gd_tell64
The Legacy API
The GetData header file, getdata.h, installed in the ${prefix}/include directory, declares the C API. It also includes getdata_legacy.h (also installed) which declares the legacy API. The legacy header should never be included directly. Defining the preprocessor symbol GD_NO_LEGACY_API before including getdata.h will prevent the legacy API from being declared. In cases when the legacy API is declared, getdata.h will define the symbol GD_LEGACY_API, which can be used by callers to determine whether the legacy API is present at compile time.
If the legacy API is not built (which is the default behaviour), getdata_legacy.h will not be installed, and the legacy API will never be declared, regardless of the state of GD_NO_LEGACY_API.
Language Bindings
GetData is written in C and provides a C API to users. For convenience, bindings are available which translate this API into other languages. As of the version 0.9 release, light-weight bindings are available for the following languages:
The GetData Bindings document provides an overview of these bindings. Full documentation is included in the package. The remainder of this document deals with the C API exclusively.
Working with Dirfiles
In the C API, dirfiles are represented by the DIRFILE object. This object is an opaque structure, which callers need only ever reference by pointer. After every call, the library updates the dirfile error status, which indicates whether the call was successful or not. This error status may be retrieved by calling gd_error(). Good programming practice is to check the error status after every call to the GetData library on the DIRFILE.
gd_cbopen() man page
Opening a dirfile creates a DIRFILE object. A dirfile is opened with a call to
Flag | Description |
---|---|
GD_RDONLY | flags should contain exactly one of these, which specifies the dirfile access mode: read-only or read/write. If neither is specified, GD_RDONLY is assumed. |
GD_RDWR | |
GD_CREAT | Create the dirfile if it doesn't already exist. |
GD_EXCL | When specified with GD_CREAT, exclusively create the dirfile (i.e. fail if the dirfile already exists). Ignored if GD_CREAT is not also specified. |
GD_TRUNC | If the dirfile already exists, truncate it before opening it. Truncating a dirfile deletes all files in the specified dirfile directory, so use this flag with caution. |
GD_TRUNCSUB | If truncating a dirfile, also delete subdirectories. Ignored if GD_TRUNC is not also specified. |
GD_ARM_ENDIAN | Specify the endianness (byte sex) of the raw data on disk, if it isn't already specified in the dirfile itself. |
GD_BIG_ENDIAN | |
GD_LITTLE_ENDIAN | |
GD_NOT_ARM_ENDIAN | |
GD_FORCE_ENCODING | Ignore any encoding specified in the dirfile itself: just use the encoding specified by these flags. |
GD_FORCE_ENDIAN | Ignore any endianness specified in the dirfile itself: just use the byte sex specified by these flags. |
GD_PEDANTIC | Reject dirfiles which don't conform to the Dirfile Standards. |
GD_PERMISSIVE | Accept non-compliant syntax, even if the dirfile contains a /VERSION directive. |
GD_VERBOSE | Automatically write error messages to standard error when errors are triggered on this dirfile. |
GD_IGNORE_DUPS | Ignore duplicate field names while parsing the dirfile metadata. |
GD_IGNORE_REFS | Ignore /REFERENCE directives while parsing the dirfile metadata—this flag is only honoured by gd_include(). |
GD_PRETTY_PRINT | Attempt to make a nicer looking format specification (in the human-readable sense) when writing metadata to disk. |
One of the following symbols indicating the default encoding scheme of the dirfile may also be included in flags. Like the endianness flags, the choice of encoding here is ignored if the encoding is specified in the dirfile itself, unless GD_FORCE_ENCODING is also specified. If none of these symbols is present, GD_AUTO_ENCODED is assumed, unless the gd_open call results in creation or truncation of the dirfile. In that case GD_UNENCODED is assumed. See Dirfile Encodings for details on dirfile encoding schemes.
Flag | Description |
---|---|
GD_AUTO_ENCODED | The encoding type is not known in advance, but should be detected by the GetData library. Detection is accomplished by searching for raw data files with extensions appropriate to the encoding scheme. This method will notably fail if the library is called via gd_putdata() to create a previously non-existent raw field unless a read is first successfully performed on the dirfile. Once the library has determined the encoding scheme for the first time, it remembers it for subsequent calls. |
GD_BZIP2_ENCODED | Specifies that raw data files are bzip2 encoded: compressed using the bzip2 library. |
GD_FLAC_ENCODED | Specifies that raw data files are FLAC encoded: compressed using the Free Lossless Audio Codec. |
GD_GZIP_ENCODED | Specifies that raw data files are gzip encoded: compressed using the zlib library, and stored in the gzip container format. |
GD_LZMA_ENCODED | Specifies that raw data files are lzma encoded: compressed using the liblzma library and stored in the .xz container format. |
GD_SLIM_ENCODED | Specifies that raw data files are slimlib encoded: compressed using the slimlib library. |
GD_TEXT_ENCODED | Specifies that raw data files are text encoded: as text files containing one data sample per line. |
GD_UNENCODED | Specifies that raw data files are not encoded, but written as raw binary data. |
If an error occurs while attempting to open the dirfile, a newly created (but invalid) DIRFILE will still be returned, to allow the caller to examine the error status via gd_error(). Once that is done, the invalid DIRFILE should be de-allocated by calling gd_close() or gd_discard().
gd_open() man page
If no parser callback handler is required, the dirfile may be opened with a call to
gd_parser_callback() man page
In addition to being used while the dirfile is first parsed with gd_cbopen(), the registered parser callback function may also be invoked by gd_include(). This parser callback may be changed or removed with a call to
gd_flags() man page
After opening a dirfile, flags which affect long-term operation may be modified by calling
gd_close() man page
Any DIRFILE, valid or not, created by calling gd_open() should be de-allocated with a call to either gd_discard(), or
gd_discard() man page
Any DIRFILE, valid or not, created by calling gd_open() should be de-allocated with a call to either gd_close(), or
gd_flush() man page
To flush and close file descriptors associated with a field, call
gd_sync() man page
To flush file descriptors associated with a field without closing them, call
gd_raw_close() man page
To close file descriptors associated with a field without performing an explicit flush, call
gd_metaflush() man page
If you wish to simply flush the dirfile metadata to disk, without flushing the field data, call
gd_rewrite_fragment() man page
To force GetData to rewrite a format specification fragment, even if it hasn't changed, call
gd_error() man page
The error status from the last call on a given DIRFILE object can be obtained by calling
Error code | Description |
---|---|
GD_E_OK | The call returned successfully. This is guaranteed to evaluate to zero. |
GD_E_ALLOC | An error occurred while attempting to allocate memory. |
GD_E_ACCMODE | An attempt was made to write to a dirfile opened in read-only mode. |
GD_E_ARGUMENT | An invalid parameter value was passed to a library function. |
GD_E_BAD_CODE | An invalid field code was passed to the library. |
GD_E_BAD_DIRFILE | A dirfile flagged as invalid (typically created from a failed gd_open() call) was passed to the library. |
GD_E_BAD_ENTRY | An invalid field parameter was passed to the library. |
GD_E_BAD_FIELD_TYPE | An invalid entry type was passed to the library. |
GD_E_BAD_INDEX | An invalid fragment index number was passed to the library. |
GD_E_BAD_REFERENCE | The reference field specified by the dirfile metadata could not be found, or was not a RAW field. |
GD_E_BAD_SCALAR | A field code used as a field parameter did not specify a valid scalar field. |
GD_E_BAD_TYPE | An invalid data type was passed to the library. |
GD_E_BOUNDS | An attempt was made to read a CARRAY element past the end of the field. |
GD_E_CALLBACK | The return value of the parser callback function was not recognised. |
GD_E_CREAT | An error occurred while attempting to create a new dirfile. |
GD_E_DELETE | An error occurred while attempting to delete a field from the dirfile. |
GD_E_DIMENSION | An attempt was made to use a scalar field where a vector field was required. |
GD_E_DOMAIN | A bad frame range was passed to the library. |
GD_E_DUPLICATE | The name of a new field duplicated that of an existing field. |
GD_E_EXISTS | A request to exclusively create a Dirfile failed. |
GD_E_FORMAT | A syntax error was found in the format specification. |
GD_E_INTERNAL_ERROR | An internal library error occurred. This indicates a bug in the library itself and the nature of the error should be reported to the GetData mailing list (getdata-devel@lists.sourceforge.net). |
GD_E_IO | An I/O error occurred while accessing data on disk. |
GD_E_LINE_TOO_LONG | GetData tried to read or write a line in a format fragment or linterp table longer than it was able to deal with. It should be able to handle lines up to at least 2^31 bytes long, which means this error usually indicates a pathological problem. |
GD_E_LUT | There was a syntax error in a LINTERP table file. |
GD_E_PROTECTED | An operation was prohibited by the current protection level. |
GD_E_RANGE | An offset passed to the library was outside its allowed range. |
GD_E_RECURSE_LEVEL | The library reached its recursion limit while attempting to resolve an input to a non-RAW vector field. This typically indicates a circular dependency. |
GD_E_UNCLEAN_DB | An operation on the dirfile failed to complete, and left the database in an unclean state. |
GD_E_UNKNOWN_ENCODING | An I/O operation could not be accomplished because the encoding scheme specified by the dirfile is not understood by GetData. |
GD_E_UNSUPPORTED | An I/O operation is not supported by the current encoding scheme. |
gd_error_count() man page
A simple count of the number of errors (but not their nature) encountered by the library while working on a DIRFILE object can be obtained by calling
gd_error_string() man page
A descriptive string describing the success or failure of the previous GetData library call may be obtained by calling
If buffer is NULL, a pointer to a newly-allocated buffer containing the entire error string is returned. In this case, buflen is ignored. This string will be allocated on the caller's heap and should be deallocated by the caller when no longer needed.
gd_verbose_prefix() man page
When using GD_VERBOSE to automatically print library errors, the prefix to the output string may be set by calling
gd_invalid_dirfile() man page
A new, invalid DIRFILE object may be created by calling
gd_desync() man page
Desynchronisation of a loaded Dirfile from the metadata stored on disk (due to a third party modifying the Dirfile's metadata after load) may be detected by calling
Symbol | Meaning |
---|---|
GD_DESYNC_PATHCHECK | Ignore cached descriptors; check full paths; will detect symlink changes |
GD_DESYNC_REOPEN | If metadata have changed, re-read the dirfile from disk. |
Reading and Writing Data
Facilities exist in GetData for both reading and writing data. Dirfile offsets and ranges may be specified in frames, samples, or a combination of the two.
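As an illustration of combining the two units, the sketch below converts a position or a length given as frames plus samples into a plain sample count, assuming the combined value is the frame count multiplied by the field's samples-per-frame, plus the extra samples. The function and parameter names are hypothetical and the code does not call GetData.

```c
#include <stdint.h>

/* Absolute sample index of a position given as `frame` whole frames plus
 * `sample` further samples, for a field with spf samples per frame. */
static int64_t absolute_sample(int64_t frame, int64_t sample, unsigned int spf)
{
    return frame * (int64_t)spf + sample;
}

/* Total number of samples in a range given as nframes whole frames plus
 * nsamples further samples. */
static uint64_t total_samples(uint64_t nframes, uint64_t nsamples,
                              unsigned int spf)
{
    return nframes * spf + nsamples;
}
```

For a field with 20 samples per frame, a start of frame 10 plus 3 samples is sample 203, and a length of 2 frames plus 5 samples covers 45 samples.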
Data types are specified by arguments of type gd_type_t, which should be one of the symbols listed in Table 5. Signed integer types refer to two's-complement data, and floating point types refer to IEEE 754-1985 conformant data. Complex valued data consist of two consecutive floating point numbers, the first being the real part of the complex number and the second the imaginary part. This corresponds to the storage convention mandated by C99 which is also the storage convention mandated by FORTRAN-77 for complex valued data.
Symbol | Data type | Symbol | Data type |
---|---|---|---|
GD_UINT8 | Unsigned 8-bit integer | GD_INT8 | Signed 8-bit integer |
GD_UINT16 | Unsigned 16-bit integer | GD_INT16 | Signed 16-bit integer |
GD_UINT32 | Unsigned 32-bit integer | GD_INT32 | Signed 32-bit integer |
GD_UINT64 | Unsigned 64-bit integer | GD_INT64 | Signed 64-bit integer |
GD_FLOAT32 | 32-bit (single precision) floating point number | GD_FLOAT64 | 64-bit (double precision) floating point number |
GD_COMPLEX64 | A 64-bit (single precision) floating point complex number | GD_COMPLEX128 | A 128-bit (double precision) floating point complex number |
GD_STRING | Character string data | GD_NULL | The null type |
GD_UNKNOWN | An unknown type. Passing this type to the GetData library will always result in an error. |
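When sizing a buffer for samples of one of these types, the width follows directly from the table: a complex sample occupies two consecutive floating point numbers. The sketch below uses a local, hypothetical enumeration (sample_kind) in place of the library's gd_type_t symbols, whose numeric values are not given here, so it compiles without GetData.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the numeric data types in the table above. */
typedef enum {
    KIND_UINT8, KIND_INT8, KIND_UINT16, KIND_INT16,
    KIND_UINT32, KIND_INT32, KIND_UINT64, KIND_INT64,
    KIND_FLOAT32, KIND_FLOAT64, KIND_COMPLEX64, KIND_COMPLEX128
} sample_kind;

/* Size in bytes of one sample of the given kind, or 0 if unrecognised. */
static size_t sample_size(sample_kind kind)
{
    switch (kind) {
    case KIND_UINT8:  case KIND_INT8:                        return 1;
    case KIND_UINT16: case KIND_INT16:                       return 2;
    case KIND_UINT32: case KIND_INT32: case KIND_FLOAT32:    return 4;
    case KIND_UINT64: case KIND_INT64: case KIND_FLOAT64:    return 8;
    case KIND_COMPLEX64:  return 8;   /* two 32-bit floats */
    case KIND_COMPLEX128: return 16;  /* two 64-bit floats */
    }
    return 0;
}

/* Bytes needed to hold nsamples samples of the given kind. */
static size_t buffer_bytes(sample_kind kind, size_t nsamples)
{
    return sample_size(kind) * nsamples;
}
```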
gd_getdata() man page
Data may be fetched from a vector field in the dirfile (including metafields) with
gd_mplex_lookback() man page
To change how far GetData searches backwards for the initial value of a field when reading a MPLEX field, call
gd_get_constant() man page
The value of a CONST field (including metafields) may be fetched from the dirfile with
gd_constants() man page
The value of all CONST fields may be fetched from the dirfile with
gd_mconstants() man page
The value of all CONST metafields for a specified parent field may be fetched from the dirfile with
gd_get_carray() man page
A list of the value of all elements in a CARRAY field (including metafields) may be fetched from the dirfile with
gd_get_carray_slice() man page
A list of the value of a portion of a CARRAY field (including metafields) may be fetched from the dirfile with
gd_carrays() man page
The value of all CARRAY fields may be fetched from the dirfile with
    typedef struct {
      size_t n;
      void *d;
    } gd_carray_t;
gd_mcarrays() man page
The value of all CARRAY metafields for a given parent field may be fetched from the dirfile with
gd_get_sarray() man page
A list of the value of all elements in a SARRAY field (including metafields) may be fetched from the dirfile with
gd_get_sarray_slice() man page
A list of the value of a portion of a SARRAY field (including metafields) may be fetched from the dirfile with
gd_sarrays() man page
The value of all SARRAY fields may be fetched from the dirfile with
gd_msarrays() man page
The value of all SARRAY metafields for a given parent field may be fetched from the dirfile with
gd_get_string() man page
The value of a STRING field (including metafields) may be fetched from the dirfile with
On success, this function returns the actual length of the specified STRING field, including the terminating NUL character, regardless of whether the string was truncated when copied to the supplied buffer. On error, zero is returned.
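Because the returned length does not depend on the buffer size, a caller can detect truncation by comparing the two. The sketch below mimics only that return convention with a hypothetical stand-in function, not the GetData call itself. Whether the library NUL-terminates a truncated copy is not stated here, so the termination in this stand-in is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in (not the GetData function): copy at most buflen
 * bytes of `value` into buf and return the full length of `value`
 * including its terminating NUL, whether or not the copy was truncated. */
static size_t copy_with_full_length(const char *value, char *buf,
                                    size_t buflen)
{
    size_t full = strlen(value) + 1;
    size_t n = (full < buflen) ? full : buflen;

    if (n > 0) {
        memcpy(buf, value, n);
        buf[n - 1] = '\0';  /* assumption: keep the copy terminated */
    }
    return full;
}

/* A copy was truncated when the reported length exceeds the buffer. */
static int was_truncated(size_t returned_length, size_t buflen)
{
    return returned_length > buflen;
}
```

A caller would typically treat a zero return as an error first, then compare the returned length with the buffer size and retry with a larger buffer if needed.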
gd_strings() man page
The value of all STRING fields may be fetched from the dirfile with
gd_mstrings() man page
The value of all STRING metafields for a specified parent field may be fetched from the dirfile with
gd_framenum_subset() man page
A reverse look-up may be performed on a portion of a field by calling
gd_framenum() man page
A reverse look-up may be performed on a field by calling
gd_putdata() man page
Data may be stored to a vector field (including metafields) in the dirfile with
gd_put_constant() man page
The value of a CONST field (including metafields) may be stored to the dirfile with
gd_put_carray() man page
An entire CARRAY field (including metafields) may be stored to the dirfile with
gd_put_carray_slice() man page
A portion of a CARRAY field (including metafields) may be stored to the dirfile with
gd_put_sarray() man page
An entire SARRAY field (including metafields) may be stored to the dirfile with
gd_put_sarray_slice() man page
A portion of a SARRAY field (including metafields) may be stored to the dirfile with
gd_put_string() man page
The value of a STRING field (including metafields) may be stored to the dirfile with
gd_seek() man page
The I/O pointer of a field may be repositioned by calling
gd_tell() man page
The current position of the I/O pointer of a field may be obtained by calling
gd_encoding_support() man page
Determining whether a given encoding is supported by the library can be done at runtime by calling
gd_open_limit() man page
Reading or writing to a large number of files can quickly cause the GetData library to run up against the operating system's open file limit. You can tell GetData to automatically manage the number of open files of a DIRFILE by calling:
Reading Metadata
All metadata are parsed when the dirfile is first opened, and retained until the dirfile is closed.
gd_alloc_funcs() man page
An alternate memory manager may be specified for GetData's use by calling
gd_dirfilename() man page
The pathname of the dirfile may be obtained by calling
gd_nframes() man page
A count of the number of frames in the dirfile (i.e. the dirfile's length) may be obtained by calling
gd_nfragments() man page
The number of open format specification fragments may be obtained by calling
gd_encoding() man page
The encoding scheme for a given fragment may be obtained by calling
gd_endianness() man page
The byte sex for a given fragment may be obtained by calling
gd_frameoffset() man page
The frame offset for a given fragment may be obtained by calling
gd_protection() man page
The protection level for a given fragment may be obtained by calling
gd_fragmentname() man page
The pathname of one of the format specification fragments in a dirfile may be obtained by calling
gd_parent_fragment() man page
The index of the format specification fragment which includes a given fragment may be obtained from a call to
gd_fragment_affixes() man page
The field name prefix and suffix associated with a fragment may be obtained by calling
gd_linterp_tablename() man page
The pathname to the look-up table (LUT) associated with a LINTERP field may be obtained by calling
gd_raw_filename() man page
The pathname of the binary file associated with a RAW field may be obtained by calling
gd_spf() man page
The number of samples per frame for a given field may be obtained from a call to
On error, zero is returned.
gd_array_len() man page
The number of elements in a scalar field (CARRAY, CONST, or STRING) may be obtained from a call to
gd_entry() man page
The metadata for a particular field may be obtained from a call to
The gd_entry_t type is a structure whose available members depend on the field type described. A list of public members is presented in Table 6. The gd_entry_t may contain members beyond those listed. All other members are internal and not part of the public API. Internal members may change meaning, name, or availability without notice.
Where two forms of a type or member are given, the first applies to the C99 API and the form marked "C89:" applies to the C89 API.
Type | Member | Field types | Meaning |
---|---|---|---|
const char* | field | All | The name of this field. |
gd_entype_t | field_type | All | The field type. One of the symbols listed below. |
unsigned | flags | All | A bitwise-or'd collection of the flags given in Table 7 below. |
unsigned int | spf (C89: u.raw.spf) | RAW | The samples-per-frame of the binary data on disk. |
gd_type_t | data_type (C89: u.raw.data_type) | RAW | The data type of the binary data on disk. |
const char* | in_fields[n] | LINCOM (n≤3); DIVIDE, MPLEX, MULTIPLY, WINDOW (n=2); BIT, LINTERP, PHASE, POLYNOM, RECIP, SBIT (n=1) | The input field(s). The number of array elements initialised depends on the field type (and, in the case of LINCOM, the number of input fields). |
int | n_fields (C89: u.lincom.n_fields) | LINCOM | The number of input fields, between one and three, inclusive. |
double complex (C89: double) | cm[3] (C89: u.lincom.cm[3][2]) | LINCOM | Scale factors (slopes). The number of array elements initialised equals n_fields. |
double | m[3] (C89: u.lincom.m[3]) | LINCOM | The array m contains only the real part of cm. |
double complex (C89: double) | cb[3] (C89: u.lincom.cb[3][2]) | LINCOM | Offset terms. The number of array elements initialised equals n_fields. |
double | b[3] (C89: u.lincom.b[3]) | LINCOM | The array b contains only the real part of cb. |
int | poly_ord (C89: u.polynom.poly_ord) | POLYNOM | The order of the polynomial, between one and five, inclusive. |
double complex (C89: double) | ca[6] (C89: u.polynom.ca[6][2]) | POLYNOM | Co-efficients. The number of array elements initialised is one more than poly_ord. |
double | a[6] (C89: u.polynom.a[6]) | POLYNOM | The array a contains only the real part of ca. |
const char* | table (C89: u.linterp.table) | LINTERP | Pathname of the look-up table. |
int | bitnum (C89: u.bit.bitnum) | BIT, SBIT | The first bit of the input field (counting from zero). |
int | numbits (C89: u.bit.numbits) | BIT, SBIT | The width of the field (in bits). |
gd_int64_t | shift (C89: u.phase.shift) | PHASE | The phase shift (in samples). gd_int64_t is a 64-bit signed integer type. |
double complex (C89: double) | cdividend (C89: u.recip.cdividend[2]) | RECIP | Dividend (a multiplicative factor). |
double | dividend (C89: u.recip.dividend) | RECIP | The dividend member contains only the real part of cdividend. |
int | count_val (C89: u.mplex.count_val) | MPLEX | The value of the index vector when the input vector contains this field. |
int | period (C89: u.mplex.period) | MPLEX | The nominal number of samples between successive samples of the field. |
gd_windop_t | windop (C89: u.window.windop) | WINDOW | The comparison operator used in the windowing. |
gd_triplet_t | threshold (C89: u.window.threshold) | WINDOW | The value to be compared against. This is a union containing an int64_t, a uint64_t, and a double. The member initialised depends on the value of windop. |
gd_type_t | const_type (C89: u.scalar.const_type) | CONST, CARRAY | The storage type of the field. |
size_t | array_len (C89: u.scalar.array_len) | CARRAY | The number of elements in the field. |
const char* | scalar[n] | BIT, LINCOM, MPLEX, PHASE, POLYNOM, RAW, RECIP, SBIT, WINDOW | The field codes of CONST or CARRAY scalars used in the definition of this field. The number and meaning of initialised elements of this array is outlined in Table 8 below. |
int | scalar_ind[n] | BIT, LINCOM, MPLEX, PHASE, POLYNOM, RAW, RECIP, SBIT, WINDOW | The element indices of CARRAY scalars used in the definition of this field. The number and meaning of initialised elements of this array is outlined in Table 8 below. For CONST scalars, this will be −1. |
All these correspond to fields in the field specification line of the format specification. Strings in the gd_entry_t are allocated on the caller's heap and should be freed either explicitly by the caller or else by passing the gd_entry_t to gd_free_entry_strings(). Variables of type gd_type_t represent data types, and will be one of the symbols listed in Table 5. Members for different field types may occupy the same physical memory.
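To show how the counts n_fields and poly_ord govern which array elements matter, the sketch below evaluates real-valued LINCOM and POLYNOM parameters for a single sample. It assumes the usual Dirfile definitions, which this page does not spell out: a LINCOM is the sum of m[i] times input i plus b[i], and a POLYNOM is a power series in its input with co-efficients a[0] to a[poly_ord]. The function names are hypothetical and nothing here calls GetData.

```c
/* LINCOM (assumed definition): sum over the first n_fields inputs of
 * m[i] * x[i] + b[i]. Elements at or beyond n_fields are never read. */
static double eval_lincom(int n_fields, const double *m, const double *b,
                          const double *x)
{
    double sum = 0.0;
    int i;

    for (i = 0; i < n_fields; i++)
        sum += m[i] * x[i] + b[i];
    return sum;
}

/* POLYNOM (assumed definition): a[0] + a[1]*x + ... + a[poly_ord]*x^poly_ord,
 * evaluated with Horner's rule. Only poly_ord + 1 co-efficients are read. */
static double eval_polynom(int poly_ord, const double *a, double x)
{
    double y = a[poly_ord];
    int i;

    for (i = poly_ord - 1; i >= 0; i--)
        y = y * x + a[i];
    return y;
}
```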
The field type stored in the field_type member of the gd_entry_t will be one of the following symbols:
GD_NO_ENTRY (indicating an invalid field type), GD_BIT_ENTRY, GD_CARRAY_ENTRY, GD_CONST_ENTRY, GD_DIVIDE_ENTRY, GD_LINCOM_ENTRY, GD_LINTERP_ENTRY, GD_MPLEX_ENTRY, GD_MULTIPLY_ENTRY, GD_PHASE_ENTRY, GD_POLYNOM_ENTRY, GD_RAW_ENTRY, GD_RECIP_ENTRY, GD_SBIT_ENTRY, GD_STRING_ENTRY, GD_WINDOW_ENTRY, or GD_INDEX_ENTRY (the field type of the implicit INDEX field).
The flags member is a collection of the flags listed in Table 7 bitwise-or'd together:
Symbol | Meaning |
---|---|
GD_EN_CALC | The non-literal scalar parameter field codes (specified by the scalar member) have been successfully converted into numbers. |
GD_EN_COMPSCAL | At least one non-integer parameter (ca, cb, cm, cdividend) has a non-zero imaginary part. |
GD_EN_HIDDEN | The entry is hidden. |
If a CONST or CARRAY scalar field code is used as a parameter in the specification of a field, its field code will be listed in the scalar member. For CARRAY fields, the element of the field used will be stored in scalar_ind. For CONST scalars, scalar_ind will be −1. The value of this scalar field will be recorded in the corresponding parameter member. If that parameter was specified by a literal number, the corresponding element of the scalar array will be NULL, and the corresponding element of scalar_ind will be uninitialised. The number of initialised elements in the array, and their meanings depend on the type of the field described, as presented in Table 8. (The assignments to scalar_ind are identical.)
Field type | scalar[0] | scalar[1] | scalar[2] | scalar[3] | scalar[4] | scalar[5] |
---|---|---|---|---|---|---|
RAW | spf | — | — | — | — | — |
BIT, SBIT | bitnum | numbits | — | — | — | — |
LINCOM | cm[0] | cm[1] | cm[2] | cb[0] | cb[1] | cb[2] |
MPLEX | count_val | period | — | — | — | — |
PHASE | shift | — | — | — | — | — |
POLYNOM | ca[0] | ca[1] | ca[2] | ca[3] | ca[4] | ca[5] |
RECIP | cdividend | — | — | — | — | — |
WINDOW | threshold | — | — | — | — | — |
Elements marked with a dash are uninitialised, and should not be accessed. In the case of a POLYNOM field, only the first poly_ord + 1 elements are initialised. In the case of a LINCOM field, only those elements of the scalar and scalar_ind arrays corresponding to the first n_fields elements of cm and cb will be initialised. In particular, for n_fields less than three, this means there will be uninitialised elements between the element corresponding to cm[n_fields - 1] and the element corresponding to cb[0].
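The LINCOM layout can be sketched as a small index calculation. The code below assumes, as the paragraph above implies, that the three cm slots come first and the three cb slots follow, so that cb[0] always sits at index 3 regardless of n_fields. The names are hypothetical and the code does not use GetData.

```c
/* Assumption: a LINCOM has at most three inputs, as stated for n_fields. */
#define MAX_LINCOM 3

/* Index into scalar[] (and scalar_ind[]) for the scale factor cm[i]. */
static int lincom_cm_slot(int i)
{
    return i;
}

/* Index into scalar[] (and scalar_ind[]) for the offset term cb[i]. */
static int lincom_cb_slot(int i)
{
    return MAX_LINCOM + i;
}

/* Nonzero if the given slot is initialised for a LINCOM with n_fields
 * inputs; the slots between cm[n_fields - 1] and cb[0] are not. */
static int lincom_slot_used(int slot, int n_fields)
{
    if (slot < 0 || slot >= 2 * MAX_LINCOM)
        return 0;
    return (slot % MAX_LINCOM) < n_fields;
}
```

For a one-input LINCOM, only slots 0 and 3 are meaningful, so a loop over the scalar array should skip slots 1, 2, 4 and 5 rather than test them for NULL.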
For WINDOW fields, windop is one of the symbols listed in Table 9. This value determines which member of the gd_triplet_t union to use for threshold, as also explained in the table.
windop value | Meaning | threshold element |
---|---|---|
GD_WINDOP_EQ | check field equals threshold | int64_t threshold.i |
GD_WINDOP_NE | check field does not equal threshold | int64_t threshold.i |
GD_WINDOP_SET | at least one bit set in threshold is also set in the check field | uint64_t threshold.u |
GD_WINDOP_CLR | at least one bit set in threshold is not set in the check field | uint64_t threshold.u |
GD_WINDOP_GE | the check field is greater than or equal to threshold | double threshold.r |
GD_WINDOP_GT | the check field is strictly greater than threshold | double threshold.r |
GD_WINDOP_LE | the check field is less than or equal to threshold | double threshold.r |
GD_WINDOP_LT | the check field is strictly less than threshold | double threshold.r |
GD_WINDOP_UNK | An invalid value | — |
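The pairing of operator and union member can be illustrated with a short, self-contained sketch. The union and enumeration below are local stand-ins written for this example only; real code would use gd_triplet_t and the GD_WINDOP_ symbols from the GetData header. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for illustration only. */
typedef union {
    int64_t  i;  /* read by the equality operators */
    uint64_t u;  /* read by the bit-test operators */
    double   r;  /* read by the ordering operators */
} sketch_triplet_t;

typedef enum {
    SK_EQ, SK_NE, SK_SET, SK_CLR, SK_GE, SK_GT, SK_LE, SK_LT
} sketch_windop_t;

/* Returns 'i', 'u' or 'r': the union member holding threshold for op. */
char threshold_member(sketch_windop_t op)
{
    switch (op) {
    case SK_EQ:  case SK_NE:  return 'i';
    case SK_SET: case SK_CLR: return 'u';
    default:                  return 'r';
    }
}

/* Evaluates the window condition for one sample of the check field. The
 * sample is passed in the same union so that the matching member is read. */
int window_passes(sketch_windop_t op, sketch_triplet_t check,
                  sketch_triplet_t threshold)
{
    assert(op >= SK_EQ && op <= SK_LT);

    switch (op) {
    case SK_EQ:  return check.i == threshold.i;
    case SK_NE:  return check.i != threshold.i;
    case SK_SET: return (check.u & threshold.u) != 0;
    case SK_CLR: return (~check.u & threshold.u) != 0;
    case SK_GE:  return check.r >= threshold.r;
    case SK_GT:  return check.r >  threshold.r;
    case SK_LE:  return check.r <= threshold.r;
    case SK_LT:  return check.r <  threshold.r;
    }
    return 0;
}
```

Reading any member other than the one matching the operator would reinterpret the stored bits, which is why the operator has to be consulted first.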
gd_free_entry_strings() man page
Strings in gd_entry_t objects may be deallocated by calling
gd_entry_type() man page
The field type of a given field may be obtained from a call to
gd_fragment_index() man page
The index of the fragment which defines a given field or alias may be obtained by calling
gd_native_type() man page
The native data type of a field may be obtained by calling
gd_bof() man page
The location of the beginning-of-field marker for a given field may be obtained from a call to
gd_eof() man page
The location of the end-of-field marker for a given field may be obtained from a call to
gd_validate() man page
Whether a given field code is valid or not may be checked by calling
gd_hidden() man page
Whether a given field code is hidden or not may be checked by calling
gd_aliases() man page
A list of aliases of a given field code may be obtained by calling
gd_naliases() man page
The number of aliases of a given field code may be obtained by calling
gd_strtok() man page
To tokenise a string using the GetData parser, call
Counting Entries
These functions provide counts of fields in the dirfile. Similar list functions exist to get the actual field names. All these functions return zero on error.
gd_nentries() man page
A count of the field entries in the dirfile satisfying various criteria may be obtained by calling
Symbol | Meaning |
---|---|
GD_ALL_ENTRIES | (= 0) Count entries of all types |
GD_ALIAS_ENTRIES | Count only aliases. This is the only way to get a count including aliases which do not point to valid field codes. |
GD_SCALAR_ENTRIES | Count only scalar field types (CONST, CARRAY, STRING) |
GD_VECTOR_ENTRIES | Count only vector field types (all field types except the scalar field types listed above) |
The flags parameter should be zero or more of the flags in Table 11, bitwise or'd together.
Symbol | Meaning |
---|---|
GD_ENTRIES_HIDDEN | Include hidden entries in the count: normally hidden entries are skipped |
GD_ENTRIES_NOALIAS | Exclude aliases from the count |
The other counting functions are merely special cases of this function.
gd_nfields() man page
A count of the number of fields in the dirfile may be obtained by calling
gd_nvectors() man page
A count of the number of vector fields (that is all field types except CONST, CARRAY, and STRING) in the dirfile may be obtained by calling
gd_nfields_by_type() man page
A count of the number of fields of a specified field type in the dirfile may be obtained by calling
gd_nmfields() man page
A count of the number of metafields in the dirfile for a particular parent field may be obtained by calling
gd_nmvectors() man page
A count of the number of vector metafields (that is all field types except CONST, CARRAY, and STRING) for a particular parent field in the dirfile may be obtained by calling
gd_nmfields_by_type() man page
A count of the number of metafields for a particular parent field of a specified field type in the dirfile may be obtained by calling
Listing Entries
These functions provide lists of field names in the dirfile. Similar counting functions exist to get the number of fields. All these functions return a pointer to an array of strings allocated by the library. The list is terminated by a NULL pointer. You should not free the list: it will be freed when gd_close() is called. The pointer returned is guaranteed to be valid only until the same list function is called again; however the list may be out-of-date, if the dirfile metadata has been modified since the call was made. On error these functions return NULL.
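Walking one of these NULL-terminated lists might look like the following sketch. To stay self-contained it operates on any array of strings ending in NULL rather than calling GetData; the function names are hypothetical. Note that nothing is freed, in keeping with the ownership rule above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Prints every name in a NULL-terminated list and returns how many there
 * were. A NULL list (the error return) is treated as empty. */
size_t print_name_list(const char **list)
{
    size_t n = 0;

    if (list == NULL)
        return 0;

    for (; list[n] != NULL; ++n)
        printf("%s\n", list[n]);

    return n;
}

/* Returns 1 if name appears in the NULL-terminated list, otherwise 0. */
int name_list_contains(const char **list, const char *name)
{
    size_t n;

    assert(name != NULL);
    if (list == NULL)
        return 0;

    for (n = 0; list[n] != NULL; ++n)
        if (strcmp(list[n], name) == 0)
            return 1;

    return 0;
}
```

Because the pointer is only guaranteed valid until the same list function is called again, an application that needs the names for longer should copy them.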
gd_match_entries() man page
A list of entries in the dirfile satisfying various criteria may be obtained by calling
Symbol | Meaning |
---|---|
GD_REGEX_PCRE | Use the Perl-Compatible Regular Expression library instead of the POSIX Regex library for regular expression matching. |
GD_REGEX_CASELESS, GD_REGEX_ICASE | Do case-insensitive matching. These two symbols are synonyms. |
GD_REGEX_JAVASCRIPT | (PCRE only:) Use Javascript-compatible regular expression grammar. |
GD_REGEX_UNICODE | (PCRE only:) Use UTF-8. |
On success, the list of matched entries is returned in *entries, the memory for which is managed by GetData. This function returns the number of entries matched, or a negative error code on error.
gd_entry_list() man page
A list of entries in the dirfile satisfying various criteria may be obtained by calling
The remaining list functions are merely special cases of this function.
gd_field_list() man page
A list of fields in the dirfile may be obtained by calling
gd_vector_list() man page
A list of vector fields (that is all field types except CONST, CARRAY, and STRING) in the dirfile may be obtained by calling
gd_field_list_by_type() man page
A list of fields of a specified type in the dirfile may be obtained by calling
gd_mfield_list() man page
A list of metafields for a particular parent field in the dirfile may be obtained by calling
gd_mvector_list() man page
A list of vector metafields (that is all field types except CONST, CARRAY, and STRING) for a particular parent field in the dirfile may be obtained by calling
gd_mfield_list_by_type() man page
A list of metafields of a specified type for a particular parent field in the dirfile may be obtained by calling
Modifying Fragment Metadata
The following functions allow you to modify the fragment metadata of an open dirfile. Changes made to metadata aren't actually written to disk until gd_metaflush() or gd_close() is called.
gd_include_affix() man page
A fragment may be added to an open dirfile by calling
gd_include_ns() man page
A fragment with a namespace but no affixes may be added to an open dirfile by calling
gd_include() man page
If no namespace, prefix or suffix is required, a fragment may be added to an open dirfile by calling
gd_uninclude() man page
A fragment may be removed from an open dirfile by calling
gd_reference() man page
The reference field for the dirfile may be modified by calling
gd_alter_affixes() man page
The field name prefix and/or suffix of a fragment may be modified by calling
gd_fragment_namespace() man page
The root namespace of a fragment may be read and/or modified by calling
gd_alter_encoding() man page
The encoding scheme of RAW fields in a given fragment may be changed by calling
gd_alter_endianness() man page
The byte sex of RAW fields in a given fragment may be changed by calling
If recode is non-zero, the associated binary files will be byte-swapped, if the current encoding scheme requires it. This function returns zero on success or a negative error code on error.
Expression | Meaning |
---|---|
0 (zero) | The byte sex should be the native endianness of the host, whichever that may be. |
GD_BIG_ENDIAN | The byte sex should be big endian. |
GD_LITTLE_ENDIAN | The byte sex should be little endian. |
GD_BIG_ENDIAN \| GD_LITTLE_ENDIAN | The byte sex should be the opposite of the native endianness of the host, whichever that may be. |
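The four cases in the table can be read as a small decision function. The sketch below is illustrative only: the two flag values are placeholders chosen for this example and are not the values defined by the GetData header, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder flag values for this sketch; NOT GetData's definitions. */
#define SK_BIG_ENDIAN    0x1u
#define SK_LITTLE_ENDIAN 0x2u

/* Returns 1 on a big-endian host, 0 on a little-endian host. */
int host_is_big_endian(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 0;
}

/* Returns 1 if the requested byte sex resolves to big endian on a host
 * whose endianness is given by host_big, and 0 if it resolves to little. */
int resolves_to_big_endian(unsigned byte_sex, int host_big)
{
    const unsigned both = SK_BIG_ENDIAN | SK_LITTLE_ENDIAN;

    assert((byte_sex & ~both) == 0);

    if (byte_sex == 0)          /* native endianness */
        return host_big != 0;
    if (byte_sex == both)       /* opposite of native */
        return host_big == 0;
    return byte_sex == SK_BIG_ENDIAN;
}
```

For example, resolves_to_big_endian(0, host_is_big_endian()) reports the native choice on whatever machine runs it.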
gd_alter_frameoffset() man page
The frame offset of RAW fields in a given fragment may be changed by calling
gd_alter_protection() man page
The protection level of a given fragment may be changed by calling
Adding or Deleting Entries
The following functions add to or delete entries from the dirfile. These functions return zero on success, or a negative error code on error.
gd_add() man page
A field may be added to a dirfile by calling
gd_add_spec() man page
A field may be added to a dirfile by calling
gd_add_bit() man page
A BIT field may be added to a dirfile by calling
gd_add_carray() man page
A CARRAY field may be added to a dirfile by calling
gd_add_const() man page
A CONST field may be added to a dirfile by calling
gd_add_divide() man page
A DIVIDE field may be added to a dirfile by calling
gd_add_indir() man page
An INDIR field may be added to a dirfile by calling
gd_add_lincom() man page
A LINCOM field with purely real parameters may be added to a dirfile by calling
gd_add_clincom() man page
A LINCOM field with complex valued parameters may be added to a dirfile by calling
If using the C89 API, purely real double array pointers can be passed to this function instead of double complex arrays, without change in behaviour.
gd_add_linterp() man page
A LINTERP field may be added to a dirfile by calling
gd_add_mplex() man page
A MPLEX field may be added to a dirfile by calling
gd_add_multiply() man page
A MULTIPLY field may be added to a dirfile by calling
gd_add_phase() man page
A PHASE field may be added to a dirfile by calling
gd_add_polynom() man page
A POLYNOM field with purely real parameters may be added to a dirfile by calling
gd_add_cpolynom() man page
A POLYNOM field with complex valued parameters may be added to a dirfile by calling
If using the C89 API, purely real double array pointers can be passed to this function instead of double complex arrays, without change in behaviour.
gd_add_raw() man page
A RAW field may be added to a dirfile by calling
gd_add_recip() man page
A RECIP field with purely real dividend may be added to a dirfile by calling
gd_add_crecip() man page
A RECIP field with complex valued dividend may be added to a dirfile by calling
To add a metafield, either specify its full "parent/child" field code, or else use gd_madd_crecip().
gd_add_sarray() man page
A SARRAY field may be added to a dirfile by calling
gd_add_sbit() man page
An SBIT field may be added to a dirfile by calling
gd_add_sindir() man page
A SINDIR field may be added to a dirfile by calling
gd_add_string() man page
A STRING field may be added to a dirfile by calling
gd_add_window() man page
A WINDOW field may be added to a dirfile by calling
gd_add_alias() man page
An alias may be added to a dirfile by calling
gd_madd() man page
A metafield may be added to a dirfile by calling
gd_madd_spec() man page
A metafield may be added to a dirfile by calling
gd_madd_bit() man page
A BIT metafield may be added to a dirfile by calling
gd_madd_carray() man page
A CARRAY metafield may be added to a dirfile by calling
gd_madd_const() man page
A CONST metafield may be added to a dirfile by calling
gd_madd_divide() man page
A DIVIDE metafield may be added to a dirfile by calling
gd_madd_indir() man page
An INDIR metafield may be added to a dirfile by calling
gd_madd_lincom() man page
A LINCOM metafield with purely real parameters may be added to a dirfile by calling
gd_madd_clincom() man page
A LINCOM metafield with complex valued parameters may be added to a dirfile by calling
If using the C89 API, purely real double array pointers can be passed to this function instead of double complex arrays, without change in behaviour.
gd_madd_linterp() man page
A LINTERP metafield may be added to a dirfile by calling
gd_madd_mplex() man page
A MPLEX metafield may be added to a dirfile by calling
gd_madd_multiply() man page
A MULTIPLY metafield may be added to a dirfile by calling
gd_madd_phase() man page
A PHASE metafield may be added to a dirfile by calling
gd_madd_polynom() man page
A POLYNOM metafield with purely real parameters may be added to a dirfile by calling
gd_madd_cpolynom() man page
A POLYNOM metafield with complex valued parameters may be added to a dirfile by calling
If using the C89 API, purely real double array pointers can be passed to this function instead of double complex arrays, without change in behaviour.
gd_madd_recip() man page
A RECIP metafield with purely real dividend may be added to a dirfile by calling
gd_madd_crecip() man page
A RECIP metafield with complex valued dividend may be added to a dirfile by calling
gd_madd_sarray() man page
A SARRAY metafield may be added to a dirfile by calling
gd_madd_sbit() man page
An SBIT metafield may be added to a dirfile by calling
gd_madd_sindir() man page
A SINDIR metafield may be added to a dirfile by calling
gd_madd_string() man page
A STRING metafield may be added to a dirfile by calling
gd_madd_window() man page
A WINDOW metafield may be added to a dirfile by calling
gd_madd_alias() man page
A metafield alias may be added to a dirfile by calling
gd_delete() man page
An entry may be deleted from a dirfile by calling
Symbol | Meaning |
---|---|
GD_DEL_DATA | Also delete the binary file associated with a RAW field. |
GD_DEL_DEREF | Dereference a CONST field used as a field parameter. |
GD_DEL_FORCE | Delete the field even if it is used as an input to other fields. |
GD_DEL_META | Also delete metafields attached to the field. |
Modifying Field Metadata
The following functions modify the parameters of a field. These functions return zero on success, or a negative error code on error.
gd_alter_entry() man page
A field may be modified by calling
gd_alter_spec() man page
A field may be modified by calling
gd_malter_spec() man page
A metafield may be modified by calling
gd_alter_bit() man page
A BIT field or metafield may be modified by calling
gd_alter_carray() man page
A CARRAY field may be modified by calling
gd_alter_const() man page
A CONST field may be modified by calling
gd_alter_divide() man page
A DIVIDE field may be modified by calling
gd_alter_indir() man page
An INDIR field may be modified by calling
gd_alter_lincom() man page
A LINCOM field may be modified with purely real parameters by calling
gd_alter_clincom() man page
A LINCOM field may be modified with complex valued parameters by calling
If using the C89 API, purely real double array pointers can be passed to this function instead of double complex arrays, without change in behaviour.
gd_alter_linterp() man page
A LINTERP field may be modified by calling
gd_alter_mplex() man page
A MPLEX field may be modified by calling
gd_alter_multiply() man page
A MULTIPLY field may be modified by calling
gd_alter_phase() man page
A PHASE field may be modified by calling
gd_alter_polynom() man page
A POLYNOM field may be modified with purely real parameters by calling
gd_alter_cpolynom() man page
A POLYNOM field may be modified with complex valued parameters by calling
If using the C89 API, purely real double array pointers can be passed to this function instead of double complex arrays, without change in behaviour.
gd_alter_raw() man page
A RAW field may be modified by calling
gd_alter_recip() man page
A RECIP field may be modified with purely real parameters by calling
gd_alter_crecip() man page
A RECIP field may be modified with complex valued parameters by calling
gd_alter_sarray() man page
A SARRAY field may be modified by calling
gd_alter_sbit() man page
An SBIT field or metafield may be modified by calling
gd_alter_sindir() man page
A SINDIR field may be modified by calling
gd_alter_window() man page
A WINDOW field may be modified by calling
gd_move() man page
An entry may be moved from one format specification fragment to another by calling
gd_rename() man page
The name of a field or alias may be changed by calling
Symbol | Meaning |
---|---|
GD_REN_DANGLE | Don't update ALIAS entries, but turn them into dangling aliases |
GD_REN_DATA | if renaming a RAW field, also rename the data file on disk |
GD_REN_FORCE | instead of having the call fail, just skip updating field codes which would become invalid |
GD_REN_UPDB | update references to the renamed field to use its new name |
gd_hide() man page
A field code may be hidden (excluding it from the list and count functions) by calling
gd_unhide() man page
A field code may be unhidden (including it in the list and count functions) by calling
The Parser Callback Function
The response of the parser to syntax errors in the dirfile format specification is typically to abort opening of the dirfile on encountering the first syntax error, setting the error GD_E_FORMAT. This behaviour can be changed by the caller by providing a parser callback function when calling gd_cbopen().
The prototype of this function is:
Type | Member | Meaning |
---|---|---|
const DIRFILE* | dirfile | A pointer to a DIRFILE object suited only for passing to gd_error() or gd_error_string(). This pointer is valid only until the callback function returns. |
int | suberror | A numerical code indicating the type of syntax error encountered. See the gd_cbopen manual page for a full list of possible values. |
const char* | filename | The name of the file in which the syntax error was found. |
int | linenum | The line number on which the syntax error was found. The first line in a fragment is numbered one. |
char* | line | A string buffer of at least GD_MAX_LINE_LENGTH characters containing a NUL-terminated copy of the offending line. The line may be freely modified by the caller, which can then ask the parser to rescan it (see Table 17 below). |
The callback function should return one of the integer symbols listed in Table 17, which tells the parser what to do following the return of the callback.
Symbol | Parser action |
---|---|
GD_SYNTAX_ABORT | The parser will immediately abort, and raise GD_E_FORMAT. This is the default behaviour of the parser if no callback is provided by the caller. |
GD_SYNTAX_CONTINUE | The parser will skip the offending line and continue parsing the fragment. However, when it finishes, it will still cause the operation to fail and set GD_E_FORMAT, even if no further syntax errors are found. (This can be used to collect a list of all syntax errors, without having them corrected.) |
GD_SYNTAX_IGNORE | The parser will ignore the offending line completely, and carry on as if the line didn't exist. If no other syntax errors are detected, the parser will complete successfully (although there may be other problems later, if the line containing the syntax error was important). |
GD_SYNTAX_RESCAN | The parser will replace the offending line with the contents of pdata.line, which it assumes the callback has modified. If a syntax error still exists in the corrected line, the callback will be called again. |
NB: This mechanism handles only syntax errors. Other problems may cause the parser to abort early, without calling the registered callback function.
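A callback that collects every syntax error, as suggested for GD_SYNTAX_CONTINUE, might be sketched as follows. The structure and return symbols here are local stand-ins mirroring the members and symbols in the tables above, so that the example compiles without the GetData header. The two-argument shape, with a second caller-supplied pointer, is an assumption of this sketch; consult the gd_cbopen manual page for the actual prototype.

```c
#include <assert.h>
#include <stdio.h>

/* Local stand-ins for illustration only. */
enum {
    SK_SYNTAX_ABORT,
    SK_SYNTAX_CONTINUE,
    SK_SYNTAX_IGNORE,
    SK_SYNTAX_RESCAN
};

typedef struct {
    const void *dirfile;   /* opaque here; valid only during the callback */
    int suberror;          /* numerical syntax error code */
    const char *filename;  /* file containing the offending line */
    int linenum;           /* line number, counted from one */
    char *line;            /* modifiable copy of the offending line */
} sketch_parser_data_t;

/* Reports the error, adds one to the counter that extra points to, and asks
 * the parser to carry on so that later errors are reported as well. */
int collect_syntax_errors(sketch_parser_data_t *pdata, void *extra)
{
    int *count = extra;

    assert(pdata != NULL && count != NULL);

    fprintf(stderr, "%s:%d: syntax error (code %d): %s\n",
            pdata->filename, pdata->linenum, pdata->suberror, pdata->line);
    ++*count;

    return SK_SYNTAX_CONTINUE;
}
```

Returning the "continue" symbol means the open still fails at the end, but only after every offending line has been reported; a callback that repaired pdata->line in place would return the "rescan" symbol instead.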
Unclean Database Recovery
In certain exceptional circumstances, the functions gd_alter_encoding(), gd_alter_endianness(), and gd_alter_frameoffset() may result in the dreaded GD_E_UNCLEAN_DB library error, indicating that the call has left the open dirfile database in an "unclean" state. This section outlines procedures to recover such an unclean dirfile. It is duplicated by the document unclean_database_recovery.txt included in the GetData distribution and installed by default in the ${prefix}/doc/getdata/ directory.
Preamble
If you are not interested in the mechanism of how an unclean database comes about, you may skip directly to the next section.
The GD_E_UNCLEAN_DB error may be returned by the functions named above, gd_alter_encoding(), gd_alter_endianness(), and gd_alter_frameoffset(), when they are asked to modify binary data. If they aren't asked to modify binary data, GD_E_UNCLEAN_DB will never be returned. In an abstract sense, all these functions modify binary data with the following procedure:
1. Copies are made of all the binary data files, after modification, for the fragment.
2. One of the following then happens:
a. If an error occurred in step 1, the binary file copies are deleted.
b. If no error occurred in step 1, the binary data files are moved into place. In the case of gd_alter_endianness() and gd_alter_frameoffset(), the files are moved over top of the old files. In the case of gd_alter_encoding(), the files have a new name, due to the differing extension, so after moving the new files, the old files are deleted.
Steps 1 and 2a never produce GD_E_UNCLEAN_DB. Step 2b will produce GD_E_UNCLEAN_DB if moving the new files fails, or if the deletion of an old file fails.
A move is accomplished through the rename(2) system call. A delete is accomplished through the unlink(2) system call. See their man pages for reasons they might fail.
An unclean database will suffer from one or more of the following problems:
- Binary data files with incorrect names.
- Both pre- and post-modification copies of the binary data.
Mitigation
Once GD_E_UNCLEAN_DB is encountered by an application, the open DIRFILE object should be closed using gd_close() or gd_discard() since further use of the dirfile may corrupt the database. The GetData library encourages this behaviour by marking the database as invalid, which will cause most calls on the dirfile to fail. If the pathname of the fragment in which the error occurred is not known (possibly because GD_ALL_FRAGMENTS was used), gd_error_string() may be called before closing the dirfile to get the pathname of the affected fragment.
Recovery
The procedure for database recovery should be as follows:
1. Determination of affected fragment
Before recovery can be accomplished, the unclean fragment must be determined. The easiest way to do this is to call gd_error_string() before closing the dirfile. The error string for the GD_E_UNCLEAN_DB error contains the path to the unclean fragment.
If this has not been done, the fragment may be locatable by searching for temporary or duplicate files (see below) which haven't been cleaned up.
2. Fragment preparation
Edit the unclean fragment with a text editor. The error will have caused GetData to not update the fragment with the new encoding/endianness/frame offset. Update this now.
Make a list of RAW fields defined in the unclean fragment.
3. RAW field classification
Now go through the RAW field list. Each RAW field should fall into one of these classes:
- Class A
- A temporary file exists with a name of the form <field_name>_XXXXXX where the XXXXXX represents an arbitrary set of six characters. In this case there should also be a file with the proper binary file name, i.e. <field_name>, possibly appended with an encoding extension.
- Class B
- No temporary file exists; the only file is the one with the proper binary file name (with the new encoding suffix, if appropriate).
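When many RAW fields are involved, the Class A test can be scripted. The sketch below checks whether a file name has the form <field_name>_XXXXXX described above. The function name is hypothetical, and the check is purely a name-pattern heuristic written for this example; it is not taken from GetData.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if filename is field_name followed by an underscore and exactly
 * six further characters, and 0 otherwise. */
int is_temporary_name(const char *field_name, const char *filename)
{
    size_t flen, len;

    assert(field_name != NULL && filename != NULL);

    flen = strlen(field_name);
    len = strlen(filename);

    return len == flen + 7
        && strncmp(filename, field_name, flen) == 0
        && filename[flen] == '_';
}
```

A match is only a hint: a legitimate file whose name happens to end in an underscore and six characters would also match, so each candidate should be confirmed against the list of RAW fields before anything is moved or deleted.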
4. RAW Field Cleaning
Cleaning is a simple procedure: the temporary file contains the newly modified binary data, but it has the wrong name. The correctly named binary file contains the old unmodified data.
If the encoding was not changed (i.e. GD_E_UNCLEAN_DB was returned by a function other than gd_alter_encoding()), the field should be cleaned by simply moving the temporary file over top of the existing binary file with the correct name.
If the encoding was changed (i.e. GD_E_UNCLEAN_DB was returned by gd_alter_encoding()), the field should be cleaned by renaming the temporary file and replacing the _XXXXXX part with the correct encoding extension (see Dirfile Encodings). The old file should then be deleted.
The important point here is: the temporary file contains the correct data. It must be kept. This procedure should be repeated for each RAW field in Class A.
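The two cleaning cases can be sketched with the standard C rename() and remove() functions. This is a minimal illustration with hypothetical function names, and all paths are supplied by the caller; it does not attempt to work out the encoding extension.

```c
#include <assert.h>
#include <stdio.h>

/* Encoding unchanged: move the temporary file over the old binary file.
 * Returns 0 on success, or -1 if rename() failed. */
int clean_same_encoding(const char *temp_path, const char *binary_path)
{
    assert(temp_path != NULL && binary_path != NULL);
    return rename(temp_path, binary_path) == 0 ? 0 : -1;
}

/* Encoding changed: give the temporary file its new name first and only
 * then delete the old file, so the good data is never the file removed.
 * Returns 0 on success, or -1 if either step failed. */
int clean_new_encoding(const char *temp_path, const char *new_path,
                       const char *old_path)
{
    assert(temp_path != NULL && new_path != NULL && old_path != NULL);

    if (rename(temp_path, new_path) != 0)
        return -1;

    return remove(old_path) == 0 ? 0 : -1;
}
```

On POSIX systems rename() replaces an existing destination file, which is what the first case relies on; other platforms may need the old file removed beforehand.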
Checking
Once the dirfile has been cleaned, it should be checked by opening it read-only and attempting to read each of the RAW fields in the (formerly) unclean fragment. If the procedure has been performed correctly, the expected data should be returned.
If the data returned appear corrupted, it is possible that the old data file was not deleted; recheck the dirfile. If an I/O error occurs, it is possible that the replacement file has an incorrect name or permissions; again, recheck the dirfile.