The Legacy API Transition


Introduction

Users of GetData from before version 0.3.0 (August 2008) will notice a fairly substantial change in the API compared with older versions. The API of those older versions (hereafter referred to as the "legacy API") suffered from thread-safety issues and lacked LFS (large file) support. A new API has been created to address these issues.

The legacy API has been re-implemented in the library, and programs that used it should still work without modification after linking to the latest version of GetData, with two small provisos explained further in the comparison section below: the behaviour of one function (GetFormat) and several error codes have changed slightly.

Furthermore, much of the new functionality introduced from version 0.3.0 onwards is not available in the legacy API.

The New API

The new API separates the opening of a Dirfile from reading or writing to it. Where in the old API one would use:

int n_read = GetData("dirfile", "field", 1, 0, 1, 0, 'S', data, &error_code);

the corresponding new API would have:

DIRFILE *D = gd_open("dirfile", GD_RDONLY);
size_t n_read = gd_getdata(D, "field", 1, 0, 1, 0, GD_INT32, data);

Here, D is a pointer to a DIRFILE object. This object is modelled after the FILE object of the C Library's stdio interface.

The new API is fully documented in the included man pages and in the Using GetData document. A translation example from the legacy API to the new API is presented at the end of this document.

The DIRFILE Object

In the legacy API, dirfiles were referred to by their path name. In the new API, after the dirfile has been opened, it is referenced instead by passing a pointer to a DIRFILE object. Further, where the legacy API was passed an integer pointer in which to store the error code, the error code is now stored in the DIRFILE object itself and may be accessed at any time by calling gd_error(). As a side effect of this change, the error value itself, and the descriptive error string which can be generated by the library, are now local to a particular instance of a particular dirfile, rather than being global across the library.
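
The practical effect of keeping the error state inside the handle can be sketched without GetData itself. The snippet below is a minimal illustration of the pattern, not the library's implementation: demo_handle, demo_init, demo_lookup, demo_error and the DEMO_E_* codes are hypothetical names invented here to stand in for DIRFILE, gd_open, an arbitrary library call and gd_error.

```c
#include <stddef.h>

/* Hypothetical error codes; as with GD_E_OK, success is zero. */
enum { DEMO_E_OK = 0, DEMO_E_BAD_CODE = 1 };

/* Hypothetical stand-in for a DIRFILE-like handle (NOT GetData's real type). */
typedef struct {
  const char *path;  /* dirfile path given when the handle was created */
  int error;         /* result of the last call made on this handle */
} demo_handle;

/* Initialise a handle with a clean error state. */
void demo_init(demo_handle *h, const char *path)
{
  h->path = path;
  h->error = DEMO_E_OK;
}

/* Pretend field lookup: fails on a NULL or empty field code and records
 * the outcome on the handle rather than in a global variable. */
int demo_lookup(demo_handle *h, const char *field_code)
{
  if (field_code == NULL || field_code[0] == '\0')
    h->error = DEMO_E_BAD_CODE;
  else
    h->error = DEMO_E_OK;
  return h->error == DEMO_E_OK ? 0 : -1;
}

/* Report the error state of one particular handle. */
int demo_error(const demo_handle *h)
{
  return h->error;
}
```

With two handles, a failed call on one leaves the error state of the other untouched, which a single library-wide error variable cannot guarantee once several dirfiles or threads are in use.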

Once a DIRFILE object has been created by a call to gd_open, all subsequent operations on the dirfile operate on this object. Once the program is finished with the dirfile, the object can be destroyed, and all open file handles closed, with the call:

gd_close(D);

Data Types

Partly in order to fully support large files (>2 GB) as defined by the LFS, the new API uses a consistent set of data types: off_t for dirfile offsets and lengths, and size_t for counts of objects read.

The legacy API continues to use int for offsets, sizes, and counts, which prevents it from supporting large files.
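
To see why int is not enough, consider the byte offset of a sample in a large file. The helpers below are a hypothetical illustration (neither function is part of GetData) and the discussion assumes the common case of a 32-bit int.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: byte offset of sample number `sample` in a file of
 * fixed-size samples, computed in 64 bits. */
int64_t sample_byte_offset(int64_t sample, int64_t sample_size)
{
  return sample * sample_size;
}

/* Hypothetical helper: returns 1 if the offset can be stored in a plain
 * int without overflow, 0 otherwise. */
int offset_fits_in_int(int64_t byte_offset)
{
  return byte_offset >= INT_MIN && byte_offset <= INT_MAX;
}
```

With 8-byte samples, sample 400000000 sits at byte offset 3200000000, which exceeds INT_MAX where int is 32 bits wide, so an interface that passes offsets as int cannot address it.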

Largefile Support

When built on a platform using the GNU C Library, or another compatible C Library, the new GetData API will respect the feature test macros _LARGEFILE64_SOURCE and _FILE_OFFSET_BITS affecting largefile (>2 GB) support. If either of these is to be used, it must be defined before including getdata.h or any Standard C Library header file.

The first of these, _LARGEFILE64_SOURCE, if defined before including getdata.h, will enable the transitional largefile extensions defined by the LFS. This enables explicit support for large files by defining the explicitly 64-bit type off64_t, and results in GetData defining the explicitly 64-bit interfaces gd_getdata64, gd_putdata64, and gd_nframes64. This macro is largely obsolete, and using _FILE_OFFSET_BITS is preferred, if supported.

The second macro, _FILE_OFFSET_BITS, determines the size of off_t. If not defined, or defined to 32, off_t will be a 32-bit type. If, instead, this macro is defined to 64, off_t will be the largefile supporting 64-bit type, and calls to gd_getdata, gd_putdata, &c. will intrinsically have largefile support. On 64-bit systems this macro has no effect, since a 64-bit off_t is used all the time.

If your system uses the GNU C Library, the feature_test_macros(7) man page will provide further explanation. On systems where these macros are unsupported, the gd_getdata64, &c. interfaces will never be defined, and the size of off_t will be system dependent. In this case, GetData will follow the default largefile behaviour of the underlying platform.

If you build GetData against a C Library that lacks largefile support, the GetData library will not support large files either, no matter what you do with these macros.
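
As a quick check of how _FILE_OFFSET_BITS affects a build, the following sketch defines the macro before any header and reports the width of off_t. The function name off_t_bits is a hypothetical helper, and the snippet assumes a POSIX-like system that provides <sys/types.h>; on platforms that ignore the macro, the result is simply the platform default.

```c
/* Feature test macros must be defined before the first system header. */
#define _FILE_OFFSET_BITS 64

#include <limits.h>     /* CHAR_BIT */
#include <stddef.h>     /* size_t */
#include <sys/types.h>  /* off_t */

/* Hypothetical helper: width of off_t, in bits, as seen by this
 * translation unit. */
size_t off_t_bits(void)
{
  return sizeof(off_t) * CHAR_BIT;
}
```

On a system using the GNU C Library this should report 64. In a real program the same definition would also have to precede the inclusion of getdata.h.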

API Comparison

The following table lists correspondences between the legacy API and the new API. Legacy API support in this version of GetData is a reimplementation of that API based on the new API. As a result, one function (GetFormat) and several error codes in the legacy API (see below) have changed slightly, but bugs fixed in the internals of the library for the new API will apply to the legacy API as well. Furthermore, the new API has additional functionality not indicated here.

Table 1: Comparison of functions.

| New | Legacy | Notes |
| --- | --- | --- |
| gd_close | | Closes a dirfile. The legacy API contained no facility to do this. |
| gd_flush | | Flushes (i.e. syncs and closes binary files associated with) a dirfile field, or the whole dirfile. The legacy API contained no facility to do this; however, several extended versions of GetData did contain such facilities. TK's libdirfile/b2klib contained void CloseData(char *dirfilename, char *field_name, int *error_code); and void CloseAll(int *error_code); which behaved similarly (the second of these flushed all open dirfiles). Similarly, JF's GetData had int GetDataClose(const char *dirfilename, int *error_code); The re-implementation of the legacy API contains none of these. |
| gd_open | | Opens or creates a dirfile. A dirfile open happened implicitly in the legacy API, the first time it was accessed. The legacy API had no facility to create new dirfiles. |
| gd_getdata | GetData | Fetch data from a dirfile. Behaviour is the same. No facilities exist in the legacy API to retrieve scalar fields. |
| gd_error_string | GetDataErrorString | Returns a descriptive error string. Behaviour is the same. |
| | GD_ERROR_CODES | In the legacy API, this was a global array of error messages. The new API supports no such array; callers should use the gd_error_string function instead. (This is good advice for users of the legacy API as well.) Some error codes are specific to the new API. These error codes may not have a corresponding entry in this array. |
| gd_entry | GetFormat | Returns the metadata for one field. The legacy API returned a structure containing all the dirfile metadata. The legacy API's re-implementation of this function still returns this structure, but only those members corresponding to public members of the gd_entry_t object will be properly initialised. Furthermore, RAW data types INT8, UINT64, INT64, COMPLEX64, and COMPLEX128 cannot be represented by the legacy API. The legacy API will incorrectly report fields of these types to have the NULL ('n') type. Furthermore, since the legacy API does not support POLYNOM, SBIT, DIVIDE, and RECIP fields, these are listed in the legacy API's structure as LINCOM, BIT, MULTIPLY, and LINCOM fields, respectively. Scalar fields are completely ignored by the legacy API. |
| gd_field_list | | List the fields in a dirfile. The structure returned by the legacy API contained lists of the fields in the dirfile, broken up by field type. |
| gd_nfields | | Report the number of fields in a dirfile. The legacy API had no corresponding function, but the caller could calculate this from the data obtained from GetFormat. |
| gd_nframes | GetNFrames | Report the size of a dirfile. Behaviour is the same. |
| gd_spf | GetSamplesPerFrame | Report the sample rate of a dirfile field. Behaviour is the same. |
| gd_putdata | PutData | Store data to a dirfile. Behaviour is the same. No facilities exist in the legacy API to modify scalar fields. |
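
The field-type substitutions described in the notes on GetFormat can be summarised as a small lookup. This is only an illustrative sketch: legacy_reported_type is a hypothetical helper, not a GetData function, and it encodes just the four substitutions named above. Any other field type is returned unchanged for simplicity, which ignores the separate caveats about scalar fields and RAW data types.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: given a field type name, return the type name
 * under which the legacy GetFormat structure lists such a field. */
const char *legacy_reported_type(const char *field_type)
{
  /* { new field type, type listed by the legacy API } */
  static const char *const map[][2] = {
    { "POLYNOM", "LINCOM"   },
    { "SBIT",    "BIT"      },
    { "DIVIDE",  "MULTIPLY" },
    { "RECIP",   "LINCOM"   },
  };
  size_t i;

  for (i = 0; i < sizeof map / sizeof map[0]; i++) {
    if (strcmp(field_type, map[i][0]) == 0)
      return map[i][1];
  }

  /* Simplification: everything else is passed through as is. */
  return field_type;
}
```

A caller inspecting legacy metadata therefore cannot tell a genuine LINCOM field from a POLYNOM or RECIP field; that distinction is only available through the new API.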

The following table lists changes made to error codes from the legacy API to the current implementation. The re-implementation of the legacy API uses the new error codes. Other than GD_E_OK, callers should not expect error codes to evaluate to the same literal value as previous GetData releases. Error codes returned only by the new API are not listed here.

Table 2: Comparison of error codes.

| New | Legacy | Notes |
| --- | --- | --- |
| GD_E_OK | GD_E_OK | Unchanged. This is guaranteed to evaluate to zero. |
| GD_E_OPEN | GD_E_OPEN_FORMAT | Renamed. |
| GD_E_FORMAT | GD_E_FORMAT | Unchanged. |
| GD_E_BAD_CODE | GD_E_BAD_CODE, PD_E_BAD_CODE | Combined. |
| GD_E_BAD_TYPE | GD_E_BAD_RETURN_TYPE | Renamed. |
| GD_E_RAW_IO | GD_E_OPEN_RAWFIELD, PD_E_OPEN_RAWFIELD | Combined and renamed. |
| GD_E_OPEN_FRAGMENT | GD_E_OPEN_INCLUDE | Renamed. Old name remains as an alias. |
| GD_E_INTERNAL_ERROR, GD_E_ALLOC | GD_E_INTERNAL_ERROR, GD_E_ALLOC | Unchanged. |
| GD_E_RANGE | | New. |
| GD_E_OPEN_LINFILE, GD_E_RECURSE_LEVEL | GD_E_OPEN_LINFILE, GD_E_RECURSE_LEVEL | Unchanged. |
| GD_E_BAD_DIRFILE | | New. |
| GD_E_BAD_FIELD_TYPE | PD_E_MULT_LINCOM | Renamed. |
| GD_E_ACCMODE, GD_E_UNSUPPORTED, GD_E_UNKNOWN_ENCODING, GD_E_DIMENSION, GD_E_BAD_SCALAR, GD_E_BAD_REFERENCE, GD_E_PROTECTED, GD_E_DOMAIN, GD_E_BAD_REPR | | New. |
| | GD_E_FIELD, GD_E_NO_RAW_FIELDS, GD_E_SIZE_MISMATCH, ENDIAN_ERROR, CLOSE_ERROR | Removed. (No longer applicable.) |
| | PD_E_CLOSE_RDONLY, PD_E_WRITE_LOCK, PD_E_FLOCK_ALLOC | Removed. (Never used.) |

API Translation Example

The following example programs demonstrate how to convert from the legacy to the new API. Since GetData still implements the legacy API, both these programs will run and produce identical results.

/* Legacy API */
#include <getdata.h>
#include <stdlib.h>
#include <stdio.h>

int main(void)
{
  /* dirfile name */
  const char *dirfile_name = "/var/dirfile";
  /* field code */
  const char *field_name = "datafield";
  char error_buffer[1024];
  int error_code; /* not needed in the new API */
  int first_frame = 1000;

  /* Get size of the database -- third argument is ignored */
  int nf = GetNFrames(dirfile_name, &error_code, NULL);
  if (error_code) {
    printf("GetData error: %s\n", GetDataErrorString(error_buffer, 1024));
    exit(1);
  }

  /* Get samples-per-frame */
  int spf = GetSamplesPerFrame(dirfile_name, field_name, &error_code);
  if (error_code) {
    printf("GetData error: %s\n", GetDataErrorString(error_buffer, 1024));
    exit(1);
  }

  /* Allocate a buffer */
  double *data_buffer = malloc(sizeof(double) * spf * (nf - first_frame));

  /* Retrieve all but the first 1000 frames */
  int n_read = GetData(dirfile_name, field_name, first_frame, 0,
      nf - first_frame, 0, 'd', data_buffer, &error_code);
  if (error_code) {
    printf("GetData error: %s\n", GetDataErrorString(error_buffer, 1024));
    exit(1);
  }

  /* Clean up */
  free(data_buffer);

  return 0;
}

/* New API -- same header file */
#include <getdata.h>
#include <stdlib.h>
#include <stdio.h>

int main(void)
{
  /* dirfile name */
  const char *dirfile_name = "/var/dirfile";
  /* field code */
  const char *field_name = "datafield";
  char error_buffer[1024];
  /* off_t is for dirfile offsets and lengths */
  off_t first_frame = 1000;

  /* Open the dirfile */
  DIRFILE *dirfile = gd_open(dirfile_name, GD_RDONLY);
  if (gd_error(dirfile)) {
    /* gd_open() returns a pointer to a newly allocated DIRFILE object even
     * if the open failed. This DIRFILE object should still be freed by
     * calling gd_close() after checking the error state */
    printf("GetData error: %s\n", gd_error_string(dirfile, error_buffer, 1024));
    gd_close(dirfile);
    exit(1);
  }

  /* Get size of the database */
  off_t nf = gd_nframes(dirfile); /* again off_t */
  if (gd_error(dirfile)) {
    printf("GetData error: %s\n", gd_error_string(dirfile, error_buffer, 1024));
    exit(1);
  }

  /* Get samples-per-frame */
  unsigned int spf = gd_spf(dirfile, field_name);
  if (gd_error(dirfile)) {
    printf("GetData error: %s\n", gd_error_string(dirfile, error_buffer, 1024));
    exit(1);
  }

  /* Allocate a buffer */
  double *data_buffer = malloc(sizeof(double) * spf * (nf - first_frame));

  /* Retrieve all but the first 1000 frames -- size_t is for counts of
   * objects read */
  size_t n_read = gd_getdata(dirfile, field_name, first_frame, 0,
      nf - first_frame, 0, GD_FLOAT64, data_buffer);
  if (gd_error(dirfile)) {
    printf("GetData error: %s\n", gd_error_string(dirfile, error_buffer, 1024));
    exit(1);
  }

  /* Clean up */
  free(data_buffer);
  gd_close(dirfile);

  return 0;
}

© 2008, 2010 D. V. Wiebe