gd_open, gd_cbopen - open or create a Dirfile
#include <getdata.h>
DIRFILE* gd_open(const char *dirfilename, unsigned long flags);
DIRFILE* gd_cbopen(const char *dirfilename, unsigned long flags, gd_parser_callback_t sehandler, void *extra);
The gd_cbopen() function opens or creates the dirfile specified by dirfilename, returning a DIRFILE object associated with it. Opening a dirfile will cause the library to read and parse the dirfile's format specification (see dirfile-format(5)).
If not NULL, sehandler should be a pointer to a function which will be called whenever a syntax error is encountered while parsing the format specification. Specify NULL for this parameter if no callback function is to be used. The caller may use this function to correct the error or to modify the error handling of the format specification parser. See The Callback Function section below for details on this function. The extra argument allows the caller to pass data to the callback function; the pointer will be passed to the callback function verbatim.
The gd_open() function is equivalent to gd_cbopen(), with sehandler and extra set to NULL.
The flags argument should include one of the access modes: GD_RDONLY (read-only) or GD_RDWR (read-write), and may also contain zero or more of the following flags, bitwise-or'd together:
These flags only set the default endianness, and will be overridden when an /ENDIAN directive specifies the byte sex of RAW fields, unless GD_FORCE_ENDIAN is also specified.
On every platform, one of these flags (GD_NOT_ARM_ENDIAN on all but middle-endian ARM systems) indicates the native behaviour of the platform. That symbol will equal zero, and may be omitted.
Unlike the ARM endianness flags above, neither GD_BIG_ENDIAN nor GD_LITTLE_ENDIAN is ever zero. Specifying both flags together will cause the library to assume that the endianness of the data is opposite to that of the native architecture, whatever that might be.
As with the ARM endianness flags, these flags only set the default endianness, and will be overridden when an /ENDIAN directive specifies the byte sex of RAW fields, unless GD_FORCE_ENDIAN is also specified.
A newly created dirfile directory will have mode S_IRWXU | S_IRWXG | S_IRWXO (0777), modified by the caller's umask value (see umask(2)). The format file will have mode S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH (0666), also modified by the caller's umask. The owner of the dirfile directory and format file will be the effective user ID of the caller. Group ownership follows the rules outlined in mkdir(2).
No indication is given of whether a duplicate field has been discarded. If finer-grained control is required, the caller should handle GD_E_FORMAT_DUPLICATE suberrors itself with an appropriate callback function.
Truncation occurs by deleting every regular file and symlink in the specified directory, whether the files were referred to by the dirfile before truncation or not. Accordingly, this flag should be used with caution. Unless GD_TRUNCSUB is also specified, subdirectories are left untouched. Notably, this operation does not consider directories used in /INCLUDE directives. If the dirfile does not exist, this flag is ignored.
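The umask rule for a newly created dirfile is plain POSIX arithmetic rather than anything GetData-specific: the effective permission bits are the requested mode with the umask bits cleared. A minimal sketch, using a hypothetical helper named apply_umask:

```c
#include <sys/types.h>

/* Hypothetical helper: the permission bits a new file or directory
 * receives when created with mode `requested` under umask `mask`
 * (see umask(2)). */
mode_t apply_umask(mode_t requested, mode_t mask)
{
    return requested & ~mask & 0777;
}
```

With the common umask 022, the dirfile directory ends up 0755 and the format file 0644; under umask 077 both are private to the owner (0700 and 0600).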
Those flags which affect the operation of the library beyond this call itself may be modified later using the gd_flags(3) function.
The flags argument may also be bitwise-or'd with one of the following symbols indicating the default encoding scheme of the dirfile. Like the endianness flags, the choice of encoding here is ignored if the encoding is specified in the dirfile itself, unless GD_FORCE_ENCODING is also specified. If none of these symbols is present, GD_AUTO_ENCODED is assumed, unless the gd_cbopen() call results in creation or truncation of the dirfile. In that case, GD_UNENCODED is assumed. See dirfile-encoding(5) for details on dirfile encoding schemes.
The latest Dirfile Standards Version which this release of GetData understands is provided in the preprocessor macro GD_DIRFILE_STANDARDS_VERSION defined in getdata.h. GetData is able to open and parse any dirfile which conforms to this Standards Version, or to any earlier Version. The dirfile-format(5) manual page lists the changes between Standards Versions.
The GetData parser can operate in two modes: a permissive mode, in which much non-Standards-compliant syntax is allowed, and a pedantic mode, in which the parser adheres strictly to the Standards. The mode may change during the parsing of a dirfile. If GD_PEDANTIC is passed to gd_cbopen(), the parser will start parsing the format specification in pedantic mode; otherwise it will start in permissive mode.
Permissive mode is provided primarily to allow GetData to be used on dirfiles which conform to no single Standard, but which were accepted by the GetData parser in previous versions. It is notably lax regarding reserved field names, field name characters, and the mixing of old and new data type specifiers, and generally ignores the presence of /VERSION directives. In read-write mode, permissive mode should be used with caution, as it can cause unintentional corruption of dirfile metadata on write if the heuristics in the parser incorrectly guess the intention of non-compliant syntax. In permissive mode, actual syntax errors are still reported as such.
In pedantic mode, the parser conforms to one specific Standards Version. This target version may change any number of times in the course of scanning a single format specification. If invoked using the GD_PEDANTIC flag, the parser will start in pedantic mode with a target version equal to GD_DIRFILE_STANDARDS_VERSION. Whenever a /VERSION directive is encountered in the format specification, the target version is changed to the Standards Version specified. When encountering a /VERSION directive in permissive mode, the parser will switch to pedantic mode, unless the GD_PERMISSIVE flag was passed to gd_cbopen(), in which case no mode switch will take place.
Independent of the mode of the parser when parsing the format specification, GetData will calculate a list of Standards Versions to which the parsed metadata conform. The gd_dirfile_standards(3) function can report this information, and can also be used to specify the desired Standards Version for writing format metadata back to disk.
The caller-supplied sehandler function is called whenever the format specification parser encounters a syntax error (i.e. whenever it would return the GD_E_FORMAT error). This callback may be used to correct the error, or to tell the parser how to recover from it.
This function should take two pointers as arguments, and return an int:
int sehandler(gd_parser_data_t *pdata, void *extra);
The extra parameter is the pointer supplied to gd_cbopen(), passed verbatim to this function. It can be used to pass caller data to the callback. GetData does not inspect this pointer, not even to check its validity. If the caller needs to pass no data to the callback, it may be NULL.
The gd_parser_data_t type is a structure with at least the following members:
typedef struct {
const DIRFILE* dirfile;
int suberror;
int linenum;
const char* filename;
char* line;
size_t buflen;
...
} gd_parser_data_t;
The pdata->dirfile member will be a pointer to a DIRFILE object suitable only for passing to gd_error_string(). Notably, the caller should not assume this pointer will be the same as the pointer eventually returned by gd_cbopen(), nor that it will be valid after the callback function returns.
The pdata->suberror parameter will be one of the following symbols indicating the type of syntax error encountered:
The pdata->filename and pdata->linenum members contain the pathname of the fragment and the line number at which the syntax error was encountered. The first line in a fragment is line one.
The pdata->line member contains a copy of the line containing the syntax error. This line may be freely modified by the callback function. It will then be reparsed if the callback function returns the symbol GD_SYNTAX_RESCAN (see below). The size of the memory buffer, which may be greater than the length of the actual string, is provided in pdata->buflen, and space is available for at least GD_MAX_LINE_LENGTH bytes.
If the callback function returns GD_SYNTAX_RESCAN, it may instead place the new string in a different, possibly larger, buffer by assigning a pointer to that buffer to pdata->line. The library will deallocate this buffer using the free function specified through gd_alloc_funcs(3), or free(3) by default. Do not deallocate the original buffer passed to the callback through pdata->line: it, too, will be deallocated by the library.
The callback function should return one of the following symbols, which tells the parser how to subsequently handle the error:
Note: the line is not corrected on disk; however, the caller may subsequently correct the fragment on disk by calling gd_rewrite_fragment(3).
The callback function handles only syntax errors: the parser may still abort early if a different kind of library error is encountered. Furthermore, although a line may contain more than one syntax error, the parser will only ever report one syntax error per line, even if the callback function returns GD_SYNTAX_CONTINUE.
A call to gd_cbopen() or gd_open() always returns a pointer to a newly allocated DIRFILE object, except in instances when it is unable to allocate memory for the DIRFILE object itself, in which case it will return NULL. The DIRFILE object is an opaque structure containing the parsed dirfile metadata.
If an error occurred, these functions will store a negative-valued error code in the returned DIRFILE, which may be retrieved by a subsequent call to gd_error(3). Possible error codes are:
A DIRFILE which is returned from a failed open is flagged as invalid, meaning most functions it is passed to will fail with the error GD_E_BAD_DIRFILE. A descriptive error string for the error may be obtained by calling gd_error_string(3).
When no longer needed, the caller should de-allocate any returned DIRFILE object by calling gd_close(3), or gd_discard(3), even if the open failed.
When working with dirfiles conforming to Standards Versions 4 and earlier (before the introduction of the /ENDIAN directive), GetData assumes the dirfile has native byte sex, even though, officially, these early Standards stipulated data to be little-endian. This is necessary because, in the absence of an explicit /VERSION directive, it is often impossible to determine the intended Standards Version of a dirfile, and the current behaviour is to assume native byte sex for modern dirfiles lacking /ENDIAN. To read an old, little-endian dirfile on a big-endian platform, either an /ENDIAN directive should be added to the format specification, or GD_LITTLE_ENDIAN should be specified by the caller.
GetData's parser assumes it is running on an ASCII-compatible platform. Format specification parsing will fail gloriously on an EBCDIC platform.
The dirfile_open() function appeared in GetData-0.3.0. The only supported flags were GD_BIG_ENDIAN, GD_CREAT, GD_EXCL, GD_FORCE_ENDIAN, GD_LITTLE_ENDIAN, GD_PEDANTIC, GD_RDONLY, GD_RDWR, and GD_TRUNC.
The GD_AUTO_ENCODED, GD_FORCE_ENCODING, GD_SLIM_ENCODED, GD_TEXT_ENCODED, GD_UNENCODED, and GD_VERBOSE flags appeared in GetData-0.4.0.
The dirfile_cbopen() function and the GD_BZIP2_ENCODED, GD_GZIP_ENCODED, and GD_IGNORE_DUPS flags appeared in GetData-0.5.0.
The GD_PRETTY_PRINT and GD_LZMA_ENCODED flags appeared in GetData-0.6.0.
In GetData-0.7.0 these functions were renamed to gd_open() and gd_cbopen(). The GD_ARM_ENDIAN, GD_NOT_ARM_ENDIAN, and GD_PERMISSIVE flags also appeared in this release.
The GD_SIE_ENCODED, GD_TRUNCSUB, GD_ZZIP_ENCODED, and GD_ZZSLIM_ENCODED flags appeared in GetData-0.8.0.
The GD_FLAC_ENCODED flag appeared in GetData-0.9.0.
gd_alloc_funcs(3), gd_close(3), gd_dirfile_standards(3), gd_discard(3), gd_error(3), gd_error_string(3), gd_flags(3), gd_getdata(3), gd_include(3), gd_parser_callback(3), gd_verbose_prefix(3), dirfile(5), dirfile-encoding(5), dirfile-format(5)