Introduction
Users of GetData releases older than version 0.3 (August 2008) will notice a fairly substantial change in the API starting with version 0.3.0. The API of those older versions (hereafter referred to as the "legacy API") suffered from thread safety issues and lacked LFS (large file) support. A new API has been created to address these issues.
The legacy API has been re-implemented in the library, and programs that used it should still work without modification after linking to the latest version of GetData, with two small provisos, explained further in the API Comparison section below:
- Internal library data can no longer be obtained by calling GetFormat.
- Error codes have changed.
The New API
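The new API separates the opening of a Dirfile from reading or writing to it. Where the old API took the dirfile path on every call, the new API first opens the dirfile and then passes the resulting DIRFILE object (here D) to each call, for example: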
size_t n_read = gd_getdata(D, "field", 1, 0, 1, 0, GD_INT32, data);
The new API is fully documented in the included man pages and in the Using GetData document. A translation example from the legacy API to the new API is presented at the end of this document.
The DIRFILE Object
In the legacy API, dirfiles were referred to by their path name. In the new API, after the dirfile has been opened, it is referenced instead by passing a pointer to a DIRFILE object. Further, where the legacy API was passed an integer pointer in which to store the error code, the error code is now stored in the DIRFILE object itself and may be accessed at any time by calling gd_error(). As a side effect of this change, the error value and the descriptive error string which the library can generate are now local to a particular instance of a particular dirfile, rather than being global across the library.
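The difference between a library-wide error value and one stored per object can be sketched with a toy handle type. Everything named toy_* below is hypothetical and exists only for illustration; it is not GetData's DIRFILE or any part of its API. The sketch mirrors just one idea from the text: the error state lives inside the handle, so two handles never overwrite each other's errors.

```c
#include <stddef.h>

/* Hypothetical stand-in for a handle type; NOT GetData's DIRFILE. */
enum { TOY_E_OK = 0, TOY_E_OPEN = 1 };

typedef struct {
    const char *path; /* path the handle was opened with */
    int error;        /* last error for THIS handle only; 0 means success */
} toy_handle;

/* "Open" a handle. A NULL or empty path is treated as a failure, but a
 * handle is returned either way so the caller can query its error state. */
toy_handle toy_open(const char *path)
{
    toy_handle h;
    h.path = path;
    h.error = (path == NULL || path[0] == '\0') ? TOY_E_OPEN : TOY_E_OK;
    return h;
}

/* Report the error recorded in one particular handle. */
int toy_error(const toy_handle *h)
{
    return h->error;
}
```

With this layout, a failed toy_open("") leaves the error state of a handle opened earlier untouched, which is the property that makes a per-object error safer than a global one when several dirfiles, or several threads, are in use.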
Once a DIRFILE object has been created by a call to gd_open, all subsequent operations on the dirfile operate on this object. Once the program is finished with the dirfile, the object can be destroyed, and all open file handles closed, by calling gd_close on it.
Data Types
Partly in order to support large files (>2 GB) as defined by the LFS, the new API uses a consistent set of data types:
- database offsets and database sizes are of type off_t
- object sizes and counts of items read are of type size_t
- samples-per-frame are of type gd_spf_t, an unsigned 16-bit integer type
- data type specifiers (formerly of type char) are now of type gd_type_t, which is defined in getdata.h. Any time this type is needed, one of the following symbols, also defined in getdata.h, should be used:
GD_NULL, GD_UINT8, GD_INT8, GD_UINT16, GD_INT16, GD_UINT32, GD_INT32, GD_UINT64, GD_INT64, GD_FLOAT32, GD_FLOAT64, GD_COMPLEX64, GD_COMPLEX128, GD_UNKNOWN.
Using a legacy API single character type specifier where this type is needed will result in the library error GD_E_BAD_TYPE.
The legacy API continues to use int for offsets, sizes, and counts, which prevents it from supporting large files.
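To see why an int count is limiting, consider the number of samples in a long field. The sketch below is illustrative only: the helper names and the frame and sample figures are made up, and it assumes the common case of a 32-bit int. It computes a sample count in 64 bits and checks whether an int could hold it.

```c
#include <limits.h>
#include <stdint.h>

/* Total number of samples in n_frames frames of a field with spf samples
 * per frame, computed in 64 bits so the product cannot wrap. */
uint64_t total_samples(uint64_t n_frames, unsigned int spf)
{
    return n_frames * (uint64_t)spf;
}

/* Nonzero if the count can be represented by a plain int. */
int fits_in_int(uint64_t count)
{
    return count <= (uint64_t)INT_MAX;
}
```

For example, 10,000,000 frames at 400 samples per frame is 4,000,000,000 samples, which exceeds the largest value of a 32-bit int, while a size_t or off_t on a largefile-capable platform can hold it.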
Largefile Support
When built on a platform using the GNU C Library, or another compatible C Library, the new GetData API will respect the feature test macros _LARGEFILE64_SOURCE and _FILE_OFFSET_BITS, which affect largefile (>2 GB) support. If either of these is to be used, it must be defined before including getdata.h or any Standard C Library header file.
The first of these, _LARGEFILE64_SOURCE, if defined before including getdata.h, will enable the obsolete, transitional largefile extensions defined by the LFS. This will enable explicit support for large files by defining the 64-bit explicit type off64_t, and result in GetData defining the explicitly 64-bit interfaces gd_getdata64, gd_putdata64, and gd_nframes64. This macro is largely obsolete, and using _FILE_OFFSET_BITS is preferred, if supported.
The second macro, _FILE_OFFSET_BITS, determines the size of off_t. If not defined, or defined to 32, off_t will be a 32-bit type. If, instead, this macro is defined to 64, off_t will be the largefile supporting 64-bit type, and calls to gd_getdata, gd_putdata, &c. will intrinsically have largefile support. On 64-bit systems this macro has no effect, since a 64-bit off_t is used all the time.
If your system uses the GNU C Library, the feature_test_macros(7) man page will provide further explanation. On systems where these macros are unsupported, the gd_getdata64, &c. interfaces will never be defined, and the size of off_t will be system dependent. In this case, GetData will follow the default largefile behaviour of the underlying platform.
If you build GetData against a C Library that lacks largefile support, the GetData library will not support large files either, no matter what you do with these macros.
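A quick way to check what a given build environment does with _FILE_OFFSET_BITS is to inspect the width of off_t directly. The following is a minimal sketch using only standard headers; the helper names are hypothetical and GetData itself is not involved. In a real program the define would also have to precede the inclusion of getdata.h.

```c
/* Must come before every #include for the macro to take effect. */
#ifndef _FILE_OFFSET_BITS
#define _FILE_OFFSET_BITS 64
#endif

#include <limits.h>
#include <sys/types.h>

/* Width of off_t, in bits, as seen by this translation unit. */
int off_t_bits(void)
{
    return (int)(sizeof(off_t) * CHAR_BIT);
}

/* Nonzero if off_t is wide enough to address files larger than 2 GB. */
int has_largefile_off_t(void)
{
    return off_t_bits() >= 64;
}
```

On a platform that honours the macro, or on a 64-bit system where off_t is always 64 bits wide, off_t_bits() reports 64. A result of 32 would indicate that the C Library does not support the macro, in which case GetData follows the platform's default behaviour as described above.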
API Comparison
The following table lists correspondences between the legacy API and the new API. Legacy API support in this version of GetData is a reimplementation of that API based on the new API. As a result, one function (GetFormat) and several error codes in the legacy API (see below) have changed slightly, but bugs fixed in the internals of the library for the new API will apply to the legacy API as well. Furthermore, the new API has additional functionality not indicated here.
New | Legacy | Notes |
---|---|---|
gd_close | — | Closes a dirfile. The legacy API contained no facility to do this. |
gd_flush | — | Flushes (i.e. syncs and closes binary files associated with) a dirfile field, or the whole dirfile. The legacy API contained no facility to do this; however, several extended versions of GetData, including TK's libdirfile/b2klib, did contain such facilities. |
gd_open | — | Opens or creates a dirfile. A dirfile open happened implicitly in the legacy API, the first time it was accessed. The legacy API had no facility to create new dirfiles. |
gd_getdata | GetData | Fetch data from a dirfile. Behaviour is the same. No facilities exist in the legacy API to retrieve scalar fields. |
gd_error_string | GetDataErrorString | Returns a descriptive error string. Behaviour is the same. |
gd_error_string | GD_ERROR_CODES | In the legacy API, this was a global array of error messages. The new API supports no such array; callers should use the gd_error_string function instead. (This is good advice for users of the legacy API as well.) Some error codes are specific to the new API. These error codes may not have a corresponding entry in this array. |
gd_entry | GetFormat | Returns the metadata for one field. The legacy API returned a structure containing all the dirfile metadata. The legacy API's re-implementation of this function still returns this structure, but only those members corresponding to public members of the gd_entry_t object will be properly initialised. Furthermore, RAW data types INT8, UINT64, INT64, COMPLEX64, and COMPLEX128 cannot be represented by the legacy API. The legacy API will incorrectly report fields of these types to have the NULL ('n') type. Furthermore, since the legacy API does not support POLYNOM, SBIT, DIVIDE, and RECIP fields, these are listed in the legacy API's structure as LINCOM, BIT, MULTIPLY, and LINCOM fields, respectively. Scalar fields are completely ignored by the legacy API. |
gd_field_list | GetFormat | List the fields in a dirfile. The structure returned by the legacy API contained lists of the fields in the dirfile, broken up by field type. |
gd_nfields | GetFormat | Report the number of fields in a dirfile. The legacy API had no corresponding function, but the caller could calculate this from the data obtained from GetFormat. |
gd_nframes | GetNFrames | Report the size of a dirfile. Behaviour is the same. |
gd_spf | GetSamplesPerFrame | Report the sample rate of a dirfile field. Behaviour is the same. |
gd_putdata | PutData | Store data to a dirfile. Behaviour is the same. No facilities exist in the legacy API to modify scalar fields. |
The following table lists changes made to error codes from the legacy API to the current implementation. The re-implementation of the legacy API uses the new error codes. Other than GD_E_OK, callers should not expect error codes to evaluate to the same literal value as previous GetData releases. Error codes returned only by the new API are not listed here.
New | Legacy | Notes |
---|---|---|
GD_E_OK | GD_E_OK | Unchanged. This is guaranteed to evaluate to zero. |
GD_E_OPEN | GD_E_OPEN_FORMAT | Renamed. |
GD_E_FORMAT | GD_E_FORMAT | Unchanged. |
GD_E_BAD_CODE | GD_E_BAD_CODE, PD_E_BAD_CODE | Combined. |
GD_E_BAD_TYPE | GD_E_BAD_RETURN_TYPE | Renamed. |
GD_E_RAW_IO | GD_E_OPEN_RAWFIELD, PD_E_OPEN_RAWFIELD | Combined and renamed. |
GD_E_OPEN_FRAGMENT | GD_E_OPEN_INCLUDE | Renamed. Old name remains as an alias. |
GD_E_INTERNAL_ERROR | GD_E_INTERNAL_ERROR | Unchanged. |
GD_E_ALLOC | GD_E_ALLOC | Unchanged. |
GD_E_RANGE | — | New. |
GD_E_OPEN_LINFILE | GD_E_OPEN_LINFILE | Unchanged. |
GD_E_RECURSE_LEVEL | GD_E_RECURSE_LEVEL | Unchanged. |
GD_E_BAD_DIRFILE | — | New. |
GD_E_BAD_FIELD_TYPE | PD_E_MULT_LINCOM | Renamed. |
GD_E_ACCMODE | — | New. |
GD_E_UNSUPPORTED | — | New. |
GD_E_UNKNOWN_ENCODING | — | New. |
GD_E_DIMENSION | — | New. |
GD_E_BAD_SCALAR | — | New. |
GD_E_BAD_REFERENCE | — | New. |
GD_E_PROTECTED | — | New. |
GD_E_DOMAIN | — | New. |
GD_E_BAD_REPR | — | New. |
— | GD_E_FIELD | Removed. (No longer applicable.) |
— | GD_E_NO_RAW_FIELDS | Removed. (No longer applicable.) |
— | GD_E_SIZE_MISMATCH | Removed. (No longer applicable.) |
— | ENDIAN_ERROR | Removed. (No longer applicable.) |
— | CLOSE_ERROR | Removed. (No longer applicable.) |
— | PD_E_CLOSE_RDONLY | Removed. (Never used.) |
— | PD_E_WRITE_LOCK | Removed. (Never used.) |
— | PD_E_FLOCK_ALLOC | Removed. (Never used.) |
API Translation Example
The following example programs demonstrate how to convert from the legacy to the new API. Since GetData still implements the legacy API, both these programs will run and produce identical results.
    /* Legacy API */
    #include <getdata.h>
    #include <stdlib.h>
    #include <stdio.h>

    int main(void)
    {
      /* dirfile name */
      const char *dirfile_name = "/var/dirfile";

      /* field code */
      const char *field_name = "datafield";

      char error_buffer[1024];
      int error_code; /* not needed in the new API */

      int first_frame = 1000;

      /* Get size of the database -- third argument is ignored */
      int nf = GetNFrames(dirfile_name, &error_code, NULL);

      if (error_code) {
        printf("GetData error: %s\n", GetDataErrorString(error_buffer, 1024));
        exit(1);
      }

      /* Get samples-per-frame */
      int spf = GetSamplesPerFrame(dirfile_name, field_name, &error_code);

      if (error_code) {
        printf("GetData error: %s\n", GetDataErrorString(error_buffer, 1024));
        exit(1);
      }

      /* Allocate a buffer */
      double *data_buffer = malloc(sizeof(double) * spf * (nf - first_frame));

      /* Retrieve all but the first 1000 frames */
      int n_read = GetData(dirfile_name, field_name, first_frame, 0,
          nf - first_frame, 0, 'd', data_buffer, &error_code);

      if (error_code) {
        printf("GetData error: %s\n", GetDataErrorString(error_buffer, 1024));
        exit(1);
      }

      /* Clean up */
      free(data_buffer);

      return 0;
    }
    /* New API -- same header file */
    #include <getdata.h>
    #include <stdlib.h>
    #include <stdio.h>

    int main(void)
    {
      /* dirfile name */
      const char *dirfile_name = "/var/dirfile";

      /* field code */
      const char *field_name = "datafield";

      char error_buffer[1024];

      /* off_t is for dirfile offsets and lengths */
      off_t first_frame = 1000;

      /* Open the dirfile */
      DIRFILE *dirfile = gd_open(dirfile_name, GD_RDONLY);

      if (gd_error(dirfile)) {
        /* gd_open() returns a pointer to a newly allocated DIRFILE object
           even if the open failed. This DIRFILE object should still be
           freed by calling gd_close() after checking the error state */
        printf("GetData error: %s\n", gd_error_string(dirfile, error_buffer, 1024));
        gd_close(dirfile);
        exit(1);
      }

      /* Get size of the database */
      off_t nf = gd_nframes(dirfile); /* again off_t */

      if (gd_error(dirfile)) {
        printf("GetData error: %s\n", gd_error_string(dirfile, error_buffer, 1024));
        exit(1);
      }

      /* Get samples-per-frame */
      unsigned int spf = gd_spf(dirfile, field_name);

      if (gd_error(dirfile)) {
        printf("GetData error: %s\n", gd_error_string(dirfile, error_buffer, 1024));
        exit(1);
      }

      /* Allocate a buffer */
      double *data_buffer = malloc(sizeof(double) * spf * (nf - first_frame));

      /* Retrieve all but the first 1000 frames -- size_t is for counts of
         objects read */
      size_t n_read = gd_getdata(dirfile, field_name, first_frame, 0,
          nf - first_frame, 0, GD_FLOAT64, data_buffer);

      if (gd_error(dirfile)) {
        printf("GetData error: %s\n", gd_error_string(dirfile, error_buffer, 1024));
        exit(1);
      }

      /* Clean up */
      free(data_buffer);
      gd_close(dirfile);

      return 0;
    }