Avoiding Redundancy with Function Pointers

I am currently writing OpenGit, a BSD-licensed re-implementation of Linus Torvald’s Git (lower-cased going forward). This frequently involves reviewing git’s source code to understand how it works under the hood. One of the things I consistently encounter is git performing similar and sizable tasks in multiple different ways and places, resulting in redundancy and a higher maintenance cost.

In this brief entry, I will discuss a classic problem and how I solve it: When minor variants of a routine result in multiple implementations.

Example Pseudo-Code Problem

Git makes heavy use of zlib deflation, a library used to decompress arbitrary data. In the process, git will perform different subroutines, such as calculating an object’s cyclic redundancy check (CRC) or SHA1 value, both on deflated and inflated data, consuming the data in different ways. Rather than having a single deflation function, git re-implemented the deflation code in numerous different ways.

To help understand the problem, lets use this sample pseudo-code, which are decompression routines. The primary difference between the two functions is that one executes additional_routine_one() on the decompressed data, while the other executes additional_routine_two() on the uncompressed data.

void
decompress_routine_one(int fd, char *data, uint8_t *bin)
{
   int size;
   char compressed_data[1000];
   char *decompressed_data;

   do {
      size = read(fd, compressed_data, 1000);

      ... decompress the data ...

      additional_routine_one(decompressed_data, bin);
   } while(size <= 0);
}

void
decompress_routine_two(int fd, char *var)
{
   int size;
   char compressed_data[1000];
   char *decompressed_data;

   do {
      size = read(fd, compressed_data, 1000);
      additional_routine_two(compressed_data, var);

      ... decompress the data ...

   } while(size <= 0);
}

decompress_routine_one(fd, data, bin);
decompress_routine_two(fd, data, var);

In this example, both will decompress the data in the do-while loop, but perform different tasks and require different arguments for the routine.

  • The first executes additional_routine_one(), which uses two arguments: The decompressed_data and uses the bin variable.
  • The second executes additional_routine_two(), which utilizes the raw compressed data, and the variables var and size.

In other words, not only does the additional task change, the location of the task changes. This could be further complicated by the number of arguments that the additional routine utilizes.

Possible Solution

The best approach I have concluded with is to implement handler function pointers at the appropriate points in the primary routine. The application should verify if the handler is not necessary by checking if the pointer is set to NULL. This is, of course, not my own creation, but based on reviewing Linux and BSD (oh, and Windows) kernel implementations. Consider the following alternative.

typedef void datahandler(char *, int, void *);

void
additional_routine_one(char *data, int size, void *arg)
{
   char *x = arg;
   ... do something ...
}

void
additional_routine_two(char *data, int size, void *arg)
{
   int *x = arg;
   ... do something ...
}

void
decompress_routine(int fd, datahandler decompressed_handler, void *darg,
    datahandler compressed_handler, void *iarg)
{
   int size;
   char buf[1000];

   do {
      size = read(fd, buf, 1000);
      if (decompressed_handler)
         decompressed_handler(darg);

      ... decompress the data ...

      if (compressed_handler)
         compressed_handler(iarg);
   } while(size <= 0);
}

decompress_routine(fd, additional_routine_one, bin, NULL, NULL);
decompress_routine(fd, NULL, NULL, additional_routine_two, var);

In this example, the decompress_routine performs the same complex decompression algorithm, but rather than having two separate functions, at the appropriate points in each function they verify if a function pointer was passed, If so, it passes the respective argument.

Additionally, if the program must run both additional_routine_one and additional_routine_two the program can run:

decompress_routine(fd, additional_routine_one, bin, additional_routine_two, var);

Finally, if in cases where I require more than one argument, I typically pass a pointer to a structure with the data I need.

Potential Tradeoffs

  • Performance: There is a trivial cost to verifying if the function pointer is set to NULL or not. This may be irrelevant for general-purpose applications, but could be something to consider for extremely high-performing systems, where memory or disk utilization is not a concern.
  • Code clarity: Having a series of NULLs is “ugly” code. However, this can be trivially resolved by using macros to hide away the NULL references.

Thoughts? Comments? Threats?

Leave a Reply

Your email address will not be published. Required fields are marked *