Libcmpio Reference Manual

for Libcmpio 0.1.0

Table of contents
Overview - What is Libcmpio
Compiling - How to compile and install Libcmpio
API - The application program interface
Basic concepts
Types
Macros
Error reporting
Functions
Utilities
cmpio-rebuild


Overview - What is Libcmpio

Libcmpio is a general-purpose library that makes it easy to manipulate disc files with variable-length records. It provides record data compression, transparent creation and management of data volumes, and data portability among different platforms. It works on many Unix-like platforms. The source code is released under the GNU General Public License Version 2. Code embedded in Libcmpio from other open source projects, such as zlib and libbzip2, is distributed under their respective licenses.

Libcmpio is developed with source code and data portability in mind. Its thread safety has not been tested yet; making it a thread-safe library is planned for the near future.

IMPORTANT NOTICE: This software is still a work in progress. It may contain serious undiscovered bugs that can cause data loss.

Compiling - How to compile and install Libcmpio

Libcmpio does not have a configure script. Running 'make' without a target lists all the options the Makefile supports:

$ make
Available make options:
c89
c89-static
cc
cc-static
gcc-debug
gcc
gcc-static
icc
icc-static

Aliases:
generic for cc
generic-static for cc-static
debug for gcc-debug

===> Please type 'make check' to run the test suite after compilation

You may pass arguments to the compiler using the EXTRADEFS environment variable
Example:
EXTRADEFS='-shared -fPIC' make generic

Install libcmpio using the following command:
PREFIX='/usr' make install


Every make option begins with a compiler name: the compiler the Makefile will use to build the source. Options with the -static suffix produce a static library build. Extra options can be passed to the compiler through the EXTRADEFS environment variable.

After a successful build, run 'make check' to verify that the library works correctly. If any of the checks fails, the build is unreliable and should not be used in production environments.

The 'make install' command will install the library to the directory defined by the PREFIX environment variable.

NOTICE: Only GNU Make 3.79 or later can understand the Makefiles of Libcmpio.

API - The application program interface

Basic concepts
CmpioFile: An aggregation of records, usually stored on a hard disc drive. Records may be of fixed or variable length, and the length of a record may change later without affecting its logical position in the CmpioFile. Transparent record compression and decompression are supported. The CmpioFile itself (not the data its records contain) is portable among platforms regardless of differences in endianness or in the maximum file size the file system supports. The maximum number of records a CmpioFile can hold is determined by cmpio_t (CMPIO_T_MAX, about 4 billion where unsigned int is 32 bits wide).

Data Volume: A disc file, created by Libcmpio, that holds part of a CmpioFile. The user sets its maximum size in bytes when creating the CmpioFile, up to a limit imposed by cmp_off_t (usually 2 gigabytes). A CmpioFile may contain up to 32767 volumes. The creation and management of the data volumes are invisible to the user.
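The limits above bound how large a CmpioFile can grow. The sketch below turns them into two small calculations: the raw capacity for a given volume size, and the number of volumes a given amount of data would occupy. The helper names are hypothetical, and both functions ignore volume and block header overhead, so real files need somewhat more room than they suggest.

```c
#include <stdint.h>

/* Documented limits: at most 32767 volumes per CmpioFile, and a volume
   size of at most 2147479552 bytes (see cmpio_open2 ()). */
#define MAX_VOLUMES      32767ULL
#define MAX_VOLUME_SIZE  2147479552ULL

/* Hypothetical helper: upper bound on the raw bytes a CmpioFile can
   span with the given volume size, ignoring header overhead. */
static uint64_t max_raw_capacity(uint64_t volume_size)
{
    return volume_size * MAX_VOLUMES;
}

/* Hypothetical helper: number of volumes needed for data_bytes when
   each volume holds at most volume_size bytes (ceiling division,
   again ignoring header overhead). */
static uint64_t volumes_needed(uint64_t data_bytes, uint64_t volume_size)
{
    return (data_bytes + volume_size - 1) / volume_size;
}
```

With the maximum volume size, max_raw_capacity () returns 70366462480384 bytes, just under 64 TiB.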

Types
Types defined in Libcmpio header files.

Synopsis
#include   <cmpio.h>

typedef        cmp_off_t;
typedef        cmpio_t;
typedef        CmpioFile;


Description
Libcmpio defines a number of types in order to ensure source code and data portability among different platforms.

Details
cmp_off_t

typedef  int   cmp_off_t;

Variables of this type hold offsets within data volumes. Application programmers rarely need it directly; in the public API it appears only as the volume_size argument of cmpio_open2 ().

cmpio_t

typedef   unsigned int   cmpio_t;

Variables of this type are used by Libcmpio functions instead of size_t. They usually contain the number of bytes read or written, or a record number.
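Because cmpio_t takes the place of size_t, code that computes record sizes as size_t should check that they fit before passing them to cmpio_append () or cmpio_overwrite (): where size_t is 64 bits wide and unsigned int is 32 bits, a silent conversion would truncate. A minimal sketch; the typedef and macro repeat the definitions documented in this section so the fragment compiles on its own, and the helper name is hypothetical.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the definitions in <cmpio.h>; omit them when the real
   header is included. */
typedef unsigned int cmpio_t;
#define CMPIO_T_MAX UINT_MAX

/* Hypothetical helper: stores n in *out and returns 1 if it fits in a
   cmpio_t; returns 0 and leaves *out untouched otherwise. */
static int size_to_cmpio_t(size_t n, cmpio_t *out)
{
    if (n > CMPIO_T_MAX)
        return 0;
    *out = (cmpio_t) n;
    return 1;
}
```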

CmpioFile

typedef   _CmpioFile   CmpioFile;

The CmpioFile struct is an opaque data structure, which represents a CmpioFile. It should only be accessed through the cmpio_* functions.

Macros
Macros defined in Libcmpio header files

Synopsis
#include   <cmpio.h>

#define   CMP_OFF_T_MAX
#define   CMPIO_T_MAX      

#define   CMP_COMPRESS_NO    
#define   CMP_COMPRESS_ZLIB  
#define   CMP_COMPRESS_BZLIB   
#define   CMP_COMPRESS_LZO   

#define   FALSE
#define   TRUE

#define   CMP_DEFAULT    
 
Description
Commonly used macros in Libcmpio-based programs.
Details
CMP_OFF_T_MAX

#define   CMP_OFF_T_MAX        INT_MAX

The maximum value which can be held in a cmp_off_t.

CMPIO_T_MAX

#define   CMPIO_T_MAX            UINT_MAX

The maximum value which can be held in a cmpio_t.

CMP_COMPRESS_NO

#define    CMP_COMPRESS_NO     0

Selects uncompressed record data. Passed as the compress argument of cmpio_open2 ().

CMP_COMPRESS_ZLIB

#define   CMP_COMPRESS_ZLIB    1

Selects record data compression with zlib. Passed as the compress argument of cmpio_open2 ().

CMP_COMPRESS_BZLIB

#define   CMP_COMPRESS_BZLIB   2

Selects record data compression with libbzip2. Passed as the compress argument of cmpio_open2 ().

CMP_COMPRESS_LZO

#define  CMP_COMPRESS_LZO       3

Selects record data compression with minilzo. Passed as the compress argument of cmpio_open2 ().

FALSE

#define    FALSE               (0)

Defines the FALSE value used by Libcmpio.

TRUE

#define    TRUE                (!FALSE)

Defines the TRUE value used by Libcmpio.

CMP_DEFAULT

#define CMP_DEFAULT            0

Selects the default value when passed as any cmpio_open2 () argument other than path.

Error reporting
Error Reporting - a system for reporting errors

Synopsis
#include  <cmpio_errors.h>

#define   CMP_ERR_SEEK       
#define   CMP_ERR_TELL       
#define   CMP_ERR_VOLFULL       
#define   CMP_ERR_WRITE       
#define   CMP_ERR_MALLOC       
#define   CMP_ERR_READ       
#define   CMP_ERR_BLOCKFULL  
#define   CMP_ERR_LONGPATH   
#define   CMP_ERR_VOLEXIST   
#define   CMP_ERR_NEWVOL       
#define   CMP_ERR_STAT       
#define   CMP_ERR_EOF       
#define   CMP_ERR_NOVOL       
#define   CMP_ERR_NOREC       
#define   CMP_ERR_UNDEFINED   
#define   CMP_ERR_LOCK       
#define   CMP_ERR_COMPRESS   
#define   CMP_ERR_INVARG       
#define   CMP_ERR_INVVOL       
#define   CMP_ERR_INVVER       
#define   CMP_ERR_NOERROR

#define   CMP_ERR_IOERROR      

int      cmp_errno;

char*  cmpio_strerror (int cmp_errnum);

Description
Libcmpio provides a standard method of reporting errors from a called function to the calling code. Every cmpio_* function that takes a CmpioFile as an argument, or returns one, sets the cmp_errno variable. If there is no error to report, cmp_errno is set to CMP_ERR_NOERROR. Otherwise, cmp_errno holds one of the following error codes:

CMP_ERR_SEEK: Failed to seek.
CMP_ERR_TELL: Failed to determine the current position of the file pointer.
CMP_ERR_VOLFULL: The volume has reached its maximum size.
CMP_ERR_WRITE: Failed to write the data.
CMP_ERR_MALLOC: Failed to allocate memory from the heap.
CMP_ERR_READ: Failed to read the data.
CMP_ERR_BLOCKFULL: The block has reached the maximum number of records.
CMP_ERR_LONGPATH: The CmpioFile path is too long.
CMP_ERR_VOLEXIST: The volume already exists.
CMP_ERR_NEWVOL: Volume creation error.
CMP_ERR_STAT: Cannot obtain information about the file.
CMP_ERR_EOF: End of file reached.
CMP_ERR_NOVOL: The volume does not exist.
CMP_ERR_NOREC: The record does not exist.
CMP_ERR_UNDEFINED: Undefined error.
CMP_ERR_LOCK: The record is locked.
CMP_ERR_COMPRESS: Error during data compression/decompression.
CMP_ERR_INVARG: Invalid argument in cmpio_open2 ().
CMP_ERR_INVVOL: File is not a CmpioFile.
CMP_ERR_INVVER: This version of the Cmpio data format is not supported by Libcmpio.
CMP_ERR_NOERROR: No error to report. Operation completed successfully.


CMP_ERR_IOERROR is the value some cmpio_* functions return to indicate that cmp_errno has been set to a value other than CMP_ERR_NOERROR. Inspect cmp_errno (for example with cmpio_strerror ()) to find out what went wrong.

Details
cmp_errno

int      cmp_errno;

Holds the error code of the last called cmpio_* function.

cmpio_strerror ()

char*  cmpio_strerror  (int cmp_errnum);

Returns a string describing the error code cmp_errnum.

Functions
The functions needed to create and use a CmpioFile from within a C program.
Synopsis
#include  <cmpio_errors.h>
#include  <cmpio.h>



CmpioFile*    cmpio_open         (const char *path);
CmpioFile*    cmpio_open2        (const char *path,
                                  cmp_off_t volume_size,
                                  unsigned char compress,
                                  int record_boundary,
                                  int block_boundary,
                                  int block_header_size,
                                  int olist_size);
cmpio_t       cmpio_append       (CmpioFile *cfp,
                                  void *data,
                                  cmpio_t size);
cmpio_t       cmpio_read         (CmpioFile *cfp,
                                  void *data,
                                  cmpio_t recno);
int           cmpio_start        (CmpioFile *cfp,
                                  cmpio_t recno);
cmpio_t       cmpio_readnext     (CmpioFile *cfp,
                                  void *data);
cmpio_t       cmpio_totalrecs    (CmpioFile *cfp);
cmpio_t       cmpio_overwrite    (CmpioFile *cfp,
                                  void *data,
                                  cmpio_t size,
                                  cmpio_t recno);
int           cmpio_lockrec      (CmpioFile *cfp,
                                  cmpio_t recno);
int           cmpio_unlockrec    (CmpioFile *cfp,
                                  cmpio_t recno);
cmpio_t       cmpio_close        (CmpioFile *cfp);
unsigned int  cmpio_libversion   (void);
unsigned int  cmpio_dataversion  (void);

Description
This section describes the functions for creating, reading from and writing to a CmpioFile.

Details
cmpio_open ()

CmpioFile* cmpio_open (const char *path);

Opens an existing CmpioFile, or creates a new one if it does not exist. A new file gets the same permissions it would get if created by fopen(3), that is, mode 0666 as modified by the process umask.
This function is a wrapper around cmpio_open2 (): it passes path through unchanged and CMP_DEFAULT for every other argument.

path:  
The full path of the CmpioFile.
Returns: 
CmpioFile or NULL on failure.

cmpio_open2 ()

CmpioFile* cmpio_open2 (const char *path,
  cmp_off_t volume_size,
  unsigned char compress,  
  int record_boundary,  
  int block_boundary,    
  int block_header_size,
  int olist_size); 

Opens an existing CmpioFile, or creates a new one if it does not exist. A new file gets the same permissions it would get if created by fopen(3). When a CmpioFile is created, the values of the volume_size, compress, record_boundary, block_boundary, block_header_size and olist_size arguments are saved in the master volume header. These stored values determine the internal structure of the CmpioFile and are used by all later calls to cmpio_open2 (), which ignore the corresponding arguments. To change the internal structure of an existing CmpioFile, use the cmpio-rebuild utility.

path:
The full path of the CmpioFile
volume_size:
The maximum size of each volume, in bytes. Minimum 368640, maximum 2147479552; must be a multiple of 1024. CMP_DEFAULT selects the maximum.
compress:
The compression algorithm to use. Valid values are CMP_COMPRESS_NO, CMP_COMPRESS_ZLIB, CMP_COMPRESS_BZLIB and CMP_COMPRESS_LZO. CMP_DEFAULT selects CMP_COMPRESS_NO.
record_boundary:
The boundary on which each record's offset is aligned. Minimum 64, maximum 8192; must be a multiple of 64. CMP_DEFAULT selects 128.
block_boundary:
The boundary on which each block of records is aligned. Minimum 512, maximum 16384; must be a multiple of 512. CMP_DEFAULT selects 4096.
block_header_size:
The size of the header of each block of records. Minimum 1024, maximum 65536; must be a multiple of 1024. CMP_DEFAULT selects 8192.
olist_size:
The size of the list that keeps the orphan records. Minimum 1024, maximum 1048576; must be a multiple of 1024. CMP_DEFAULT selects 4096.
Returns:
CmpioFile or NULL on failure.
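When the values come from a configuration file or the command line, it can help to check them before calling cmpio_open2 (); Libcmpio itself reports a bad value by setting cmp_errno to CMP_ERR_INVARG. The sketch below is an illustration, not Libcmpio code: the struct and function names are hypothetical, the macro values repeat those in the Macros section, and it only mirrors the documented ranges, replacing CMP_DEFAULT with the documented default before checking.

```c
#include <stdbool.h>

/* Values as documented in the Macros section. */
#define CMP_DEFAULT       0
#define CMP_COMPRESS_LZO  3

/* Hypothetical bundle of the non-path cmpio_open2 () arguments. */
struct open2_args {
    int volume_size;            /* cmp_off_t in <cmpio.h> */
    unsigned char compress;
    int record_boundary;
    int block_boundary;
    int block_header_size;
    int olist_size;
};

/* Substitutes the documented default when CMP_DEFAULT is given. */
static int or_default(int value, int dflt)
{
    return value == CMP_DEFAULT ? dflt : value;
}

/* True if v lies in [lo, hi] and is a multiple of step. */
static bool in_range_multiple(int v, int lo, int hi, int step)
{
    return v >= lo && v <= hi && v % step == 0;
}

/* Fills in defaults, then checks every documented constraint. */
static bool open2_args_valid(struct open2_args *a)
{
    a->volume_size       = or_default(a->volume_size, 2147479552);
    a->record_boundary   = or_default(a->record_boundary, 128);
    a->block_boundary    = or_default(a->block_boundary, 4096);
    a->block_header_size = or_default(a->block_header_size, 8192);
    a->olist_size        = or_default(a->olist_size, 4096);
    /* compress needs no substitution: CMP_DEFAULT and CMP_COMPRESS_NO
       are both 0. */

    return a->compress <= CMP_COMPRESS_LZO
        && in_range_multiple(a->volume_size, 368640, 2147479552, 1024)
        && in_range_multiple(a->record_boundary, 64, 8192, 64)
        /* 16384 is the largest multiple of 512 within the documented limit */
        && in_range_multiple(a->block_boundary, 512, 16384, 512)
        && in_range_multiple(a->block_header_size, 1024, 65536, 1024)
        && in_range_multiple(a->olist_size, 1024, 1048576, 1024);
}
```

A caller would fill the struct, call open2_args_valid (), and pass the fields to cmpio_open2 () in the same order. The check only matters when the file is created, since an existing CmpioFile uses the values stored in its master volume header.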

cmpio_append ()

cmpio_t cmpio_append (CmpioFile *cfp,
  void *data,
  cmpio_t size);

Appends a new record to a CmpioFile.

cfp:
a CmpioFile
data:
the buffer containing the record to append.
size:
the size of the buffer.
Returns:
The record number of the appended record, or CMP_ERR_IOERROR if the operation failed.

cmpio_read ()

cmpio_t cmpio_read (CmpioFile *cfp,
  void *data,
  cmpio_t recno);

Reads a record from a CmpioFile.

cfp:
a CmpioFile
data:
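a buffer to read the record into. Since cmpio_read () takes no buffer size, the buffer must be large enough to hold the whole record.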
recno:
the number of the record.
Returns:
the number of bytes read, or CMP_ERR_IOERROR if the operation failed.

cmpio_start ()
 
int cmpio_start (CmpioFile *cfp,
  cmpio_t recno);

Prepares the CmpioFile for fast sequential access with cmpio_readnext (), starting at record recno.

cfp:
a CmpioFile
recno:
the number of the record at which sequential access starts.
Returns:
TRUE on success, or an error code if the operation failed.

cmpio_readnext ()

cmpio_t cmpio_readnext (CmpioFile *cfp,
  void *data);

Reads the next record from the CmpioFile.

cfp:
a CmpioFile
data:
a buffer to read the record into.
Returns:
the number of bytes read, or CMP_ERR_IOERROR if the operation failed.

Example:
#include <stdio.h>
#include <cmpio.h>
#include <cmpio_errors.h>

int
main (void)
{
  CmpioFile *cfp;
  cmpio_t read_rslt;
  int start_rslt;
  int status = 0;
  /* cmpio_readnext () takes no buffer size, so the buffer must be
     large enough for the largest record in the file */
  char buffer[16385];

  /* Open the CmpioFile (it is created if it does not exist) */
  cfp = cmpio_open ("testfile.dat");
  if (cfp == NULL)
    {
      fprintf (stderr, "%s\n", cmpio_strerror (cmp_errno));
      return 1;
    }

  /* Start reading from record 80 */
  start_rslt = cmpio_start (cfp, 80);
  if (start_rslt != TRUE)
    {
      fprintf (stderr, "%s\n", cmpio_strerror (cmp_errno));
      cmpio_close (cfp);
      return 1;
    }

  /* Read records until cmpio_readnext () reports an error */
  read_rslt = cmpio_readnext (cfp, buffer);
  while (read_rslt != CMP_ERR_IOERROR)
    {
      /* Process the read_rslt bytes of record data in buffer here */
      read_rslt = cmpio_readnext (cfp, buffer);
    }

  /* Reaching the end of the file sets cmp_errno to CMP_ERR_EOF;
     any other value is a real error */
  if (cmp_errno != CMP_ERR_EOF)
    {
      fprintf (stderr, "%s\n", cmpio_strerror (cmp_errno));
      status = 1;
    }

  cmpio_close (cfp);
  return status;
}


cmpio_totalrecs ()

cmpio_t cmpio_totalrecs (CmpioFile *cfp);

Returns the total number of records in the CmpioFile.

cfp:
a CmpioFile
Returns:
the total number of records in the CmpioFile, or CMP_ERR_IOERROR if the operation failed.

cmpio_overwrite ()

cmpio_t cmpio_overwrite (CmpioFile *cfp,
  void *data,
  cmpio_t size,
  cmpio_t recno);

Overwrites an existing record of the CmpioFile. The new data may be larger than the old data: the record still keeps its logical position (record number) in the CmpioFile, and the record that follows it is not overwritten.

cfp:
a CmpioFile
data:
the buffer containing the new data of the record.
size:
the size of the buffer.
recno:
the number of the record to be overwritten.
Returns:
the number of bytes written, or CMP_ERR_IOERROR if the operation failed.

cmpio_lockrec ()

int cmpio_lockrec (CmpioFile *cfp,
  cmpio_t recno);

Locks a record of the CmpioFile.

cfp:
a CmpioFile
recno:
the number of the record to be locked.
Returns:
TRUE on success, CMP_ERR_LOCK if the record is already locked, or an error code if the operation failed.

cmpio_unlockrec ()

int cmpio_unlockrec (CmpioFile *cfp,
   cmpio_t  recno);

Unlocks a locked record of the CmpioFile.

cfp:
a CmpioFile
recno:
the number of the record to be unlocked
Returns:
TRUE on success, or an error code if the operation failed.

cmpio_close ()

cmpio_t cmpio_close (CmpioFile *cfp);

Closes a CmpioFile.

cfp:
a CmpioFile
Returns:
TRUE on success, or CMP_ERR_IOERROR if the operation failed.

cmpio_libversion ()

unsigned int cmpio_libversion (void);

Returns the version of Libcmpio.

Returns:
the version encoded as major version * 100 + minor version * 10 + patch level. For example, version 0.1.0 is encoded as 10.
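Because the version is packed into a single integer, a program can compare it directly or split it back into its parts. The sketch below inverts the formula; the struct and function names are hypothetical, and the decoding is only unambiguous while the minor version and patch level stay single digits (0 to 9).

```c
/* Hypothetical container for a decoded version number. */
struct cmp_version {
    unsigned int major, minor, patch;
};

/* Inverse of major * 100 + minor * 10 + patch. Only unambiguous while
   minor and patch are single digits. */
static struct cmp_version decode_version(unsigned int v)
{
    struct cmp_version r;

    r.major = v / 100;
    r.minor = (v / 10) % 10;
    r.patch = v % 10;
    return r;
}
```

The same helper applies to the value returned by cmpio_dataversion (), which uses the same encoding.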

cmpio_dataversion ()

unsigned int cmpio_dataversion (void);

Returns the version of the Cmpio data format supported by this library.

Returns:
the version encoded as major version * 100 + minor version * 10 + patch level, the same formula used by cmpio_libversion ().


Utilities

cmpio-rebuild
The cmpio-rebuild utility displays information about a CmpioFile or changes its internal structure. When invoked with no arguments, it displays a help screen.

$ cmpio-rebuild
Usage: cmpio-rebuild [OPTIONS]...[CmpioFile]
OPTIONS:
-h                  Help and version information
-i                   [CmpioFile] information
-r                  Rebuild [CmpioFile]
-c ALGORITHM         Compress with ALGORITHM: zlib, bzlib, lzo or none. DEFAULT none
-v SIZE              Use SIZE as volume size: MIN 368640, MAX 2147479552, DEFAULT 2147479552
-o FILE              Rebuild output to FILE


Details on arguments:
-h
Displays help and version information.
-i [CmpioFile]
Displays information about [CmpioFile].
-r [CmpioFile]
Rebuilds [CmpioFile] using the defaults described under cmpio_open2 (). Without the -o argument, [CmpioFile] is renamed before the rebuild.
-c ALGORITHM
Compresses with ALGORITHM: zlib, bzlib, lzo or none (default: none). zlib is a good choice for records smaller than 100k. bzlib works well with large records but is slow. lzo is faster than bzlib with large records, but compresses less effectively.
-v SIZE
Uses SIZE as volume size: MIN 368640, MAX 2147479552, DEFAULT 2147479552. SIZE must always be a multiple of 1024.
-o FILE
Writes rebuild output to FILE. If FILE already exists, it will be deleted.
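The -c names correspond to the compression macros documented in the Macros section. A program offering the same choice to its users could map them as below. The function is hypothetical (cmpio-rebuild's own parsing code is not documented here), and the macro values repeat those listed in the Macros section.

```c
#include <string.h>

/* Values as documented in the Macros section. */
#define CMP_COMPRESS_NO     0
#define CMP_COMPRESS_ZLIB   1
#define CMP_COMPRESS_BZLIB  2
#define CMP_COMPRESS_LZO    3

/* Hypothetical helper: maps a -c style algorithm name to a compress
   value for cmpio_open2 (). Returns -1 for an unknown name. */
static int compress_from_name(const char *name)
{
    static const struct { const char *name; int value; } table[] = {
        { "none",  CMP_COMPRESS_NO },
        { "zlib",  CMP_COMPRESS_ZLIB },
        { "bzlib", CMP_COMPRESS_BZLIB },
        { "lzo",   CMP_COMPRESS_LZO },
    };
    size_t i;

    for (i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(name, table[i].name) == 0)
            return table[i].value;
    return -1;
}
```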

cmpio-rebuild does not retain the endianness of the original CmpioFile.