Ekstatische Lyriken Pinnwand

Automatic Array Limit Checks in C

written by Pj on Tuesday March 22nd, 2016 -- 5:08 p.m.

Everyone insists that C cannot have automatic array limit checking without a massive performance penalty.  I disagree.

I'm assuming that the way it would be implemented is with a new pointer type, which I'll describe here, and then provide some example code that implements something similar in order to see just what the performance impact would be.

To understand how my new "safe" pointers might work, consider how C's existing "complex" float type works.  It is in reality a structure of two floats, but the compiler allows it to be used as if it is a simple variable.  So you can add them together, multiply them, call functions with them, etc., and it just generates the correct code for those operations, e.g. in the case of multiplying two complex floats, it generates four multiplication operations, an addition and a subtraction, then puts the results into the correct places.  It also knows how to allow complex values to interact with ordinary floats, by considering the ordinary floats to be complex floats with a zero imaginary part.  The result is seamless integration of something new into the existing language in a way that is very easy to use.

In the same way, I'd like to see array limit checking integrated as just a new type of pointer:

safe char array[100];

When the compiler stumbles upon this code, it would actually create a char pointer and a size_t, placing "100" into the size_t automatically.

Alternately, you could allocate an array like this:

safe char *array;
array = malloc(100);


Much like a complex float returned from a function actually returns a structure of two floats, the return from this malloc() would be a structure of a char pointer and a size_t that is equal to the size of the memory allocated.  In the event that malloc() were used to assign to an ordinary pointer without size data, the compiler would know how to discard the size data and copy only the pointer data automatically, making a cast unnecessary.  This is good as it makes it compatible with existing code.
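As a sketch of what the compiler might lower this to internally (the struct and function names here are made up for illustration), the `safe char *` would become a pointer paired with a size, and `malloc()` on a safe pointer would fill in both members:

```c
#include <stdlib.h>

/* Hypothetical lowering: what a "safe char *" might become internally. */
typedef struct {
    char   *pointer;
    size_t  size;       /* number of addressable bytes */
} safe_char_ptr;

/* "array = malloc(100);" on a safe pointer would lower to something like: */
static safe_char_ptr safe_malloc_lowered(size_t n) {
    safe_char_ptr p = { malloc(n), n };
    if (p.pointer == NULL)
        p.size = 0;     /* a failed allocation carries no usable bytes */
    return p;
}

/* Assigning to an ordinary pointer would just discard the size member:
       char *plain = p.pointer;                                         */
```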

Then you could use the array like this:

safe char array[11] = "Cool stuff!";
printf("String: %s\n", array);


This would pass the array pointer and its size to printf(), which could then be certain not to read past the end of the array, useful in the case that you failed to put the null character at the end of the string.  Note that this would require a conversion specifier other than 's', since 's' is already defined to accept the unsafe pointers that carry no limit, but beyond that new specifier there would be no need for a new printf(), as printf() already accepts arguments of any type through its variable argument list.

You could also call old functions that expect old unsafe pointers like this:

void old_stuff(char *);
safe char array[100];
old_stuff(array);


Here, the compiler would see that old_stuff() wants a "char *" while our array is a "safe char *", so it would simply turn the "safe char *" into a "char *" by discarding the size data and passing along only the pointer.  This would allow people to use the new data type immediately, even before old libraries are updated to support it.
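Emulated with the struct from before (again, the names are made up for illustration), the implicit safe-to-unsafe conversion the compiler would perform is nothing more than dropping the size member:

```c
#include <stddef.h>

typedef struct { char *pointer; size_t size; } safe_char_ptr;

/* An old-style function that knows nothing about sizes. */
void old_stuff(char *p) {
    (void)p;    /* stands in for whatever old code does with the pointer */
}

/* What the compiler would generate for old_stuff(array): */
void call_old(safe_char_ptr a) {
    old_stuff(a.pointer);   /* the ".pointer" is inserted for you */
}
```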

As for compatibility in the other direction, that would require an explicit cast, to insert the size data either from a constant or from the variable it is being stored in:

void new_stuff(safe char *);
char array[100];
int size;
size = fread(array, 1, 100, file);
new_stuff( (safe char *) {array, size} );


...or that might also be implemented as a compile-time function...

void new_stuff(safe char *);
char array[100];
int size;
size = fread(array, 1, 100, file);
new_stuff( safe_char(array, size) );
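In today's C, that compile-time constructor can be approximated with an inline function that pairs a plain pointer with a size in one expression (the struct and function names here are hypothetical):

```c
#include <stddef.h>

typedef struct { char *pointer; size_t size; } safe_char_ptr;

/* Emulating the proposed safe_char() constructor. */
static inline safe_char_ptr safe_char(char *p, size_t n) {
    safe_char_ptr s;
    s.pointer = p;
    s.size = n;
    return s;
}

/* Usage mirroring the fread() example above:

       size = fread(array, 1, 100, file);
       new_stuff( safe_char(array, size) );
*/
```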


I've mentioned this idea before on Slashdot and probably Reddit, but of course, people are always like "that would ruin the performance of C code." Someone even linked me to some C project that would add limit checks to your C programs, which claimed that they caused a 30% slow-down (or something like that) when added to an executable.  If it were that severe then I could see people taking issue with the idea, but frankly, if it slows the code down that much, then I think someone is doing something wrong. 

To demonstrate, I wrote some code in which I implemented a crude form of what I've described above.  I created a structure that contains both a pointer and a size.  I then created a function to allocate memory for this new pointer type, a function to read from it, and a function to write to it.  Since, if it were implemented as I've described above, assignment would be done with the usual '=' operator rather than with functions, I made the functions inline so that the compiler will in-line the limit checking and the assignment, and thus be able to optimize those operations.

This is the code I ended up with:

#include <stdlib.h>
#include <stdio.h>

typedef struct {
    char *pointer;
    int size;
} safe_char_pointer;

static safe_char_pointer safe_malloc(int size) {
    safe_char_pointer t;
    t.pointer = malloc(size);
    t.size = size;
    return t;
}

static inline void safe_write(safe_char_pointer x, int i, char c) {
    if (i < 0 || i >= x.size) {
        fprintf(stderr, "Array limits exceeded!\n");
        exit(1);
    }
    x.pointer[i] = c;
}

static inline char safe_read(safe_char_pointer x, int i) {
    if (i < 0 || i >= x.size) {
        fprintf(stderr, "Array limits exceeded!\n");
        exit(1);
    }
    return x.pointer[i];
}

int main() {

    safe_char_pointer a = safe_malloc(10);

    for (int i = 0; i < 10; i++) {
        safe_write(a, i, 65 + i);
    }

    safe_write(a, 9, 0);
    printf("Some shit: %s\n", a.pointer);

    /* deliberately runs one element past the end */
    for (int i = 0; i <= 10; i++) {
        int c = safe_read(a, i);
        printf("Byte %d is %d\n", i, c);
    }

    return 0;
}

So just how much does this hurt the performance?

I compiled it like this: gcc -std=gnu99 -g -O9 -o test test.c

This is the assembly code of the first for() loop:

0040058d  E8AEFFFFFF    call qword 0x400540          ; the call to malloc()
00400592  BEA4074000    mov esi,0x4007a4             ; pointer to the format string used for printf()
00400597  C60041        mov byte [rax],0x41
0040059a  C6400142      mov byte [rax+0x1],0x42
0040059e  C6400243      mov byte [rax+0x2],0x43
004005a2  C6400344      mov byte [rax+0x3],0x44
004005a6  4889C2        mov rdx,rax
004005a9  C6400445      mov byte [rax+0x4],0x45
004005ad  C6400546      mov byte [rax+0x5],0x46
004005b1  4889C5        mov rbp,rax
004005b4  C6400647      mov byte [rax+0x6],0x47
004005b8  C6400748      mov byte [rax+0x7],0x48
004005bc  BF01000000    mov edi,0x1
004005c1  C6400849      mov byte [rax+0x8],0x49
004005c5  C6400900      mov byte [rax+0x9],0x0
004005c9  31C0          xor eax,eax
004005cb  E880FFFFFF    call qword 0x400550          ; the call to printf()

In this loop, the limit checks had no effect on the performance at all: the compiler recognized that the limit was never going to be violated and so never emitted the limit checks.  It also unrolled the for() loop and reduced it to only 9 iterations, since the last position in the array is overwritten after the loop.  Clearly the limit checking isn't slowing down the code in this case, as this is exactly the same code the compiler would have created without limit checks.

So what about the second loop where the limit is violated?

004005d0  0FBE4C1D00      movsx ecx,byte [rbp+rbx+0x0]
004005d5  89DA            mov edx,ebx
004005d7  31C0            xor eax,eax
004005d9  BECA074000      mov esi,0x4007ca           ; address of the format string for printf()
004005de  BF01000000      mov edi,0x1
004005e3  4883C301        add rbx,byte +0x1
004005e7  E864FFFFFF      call qword 0x400550        ; the call to printf()
004005ec  4883FB0A        cmp rbx,byte +0xa          ; compare variable 'i' to 10
004005f0  75DE            jnz 0x4005d0 (main+0x50)   ; if it isn't ten, loop back to the beginning
004005f2  488B0D5F0A2000  mov rcx,[rel 0x601058]     ; __TMC_END__
004005f9  BFB3074000      mov edi,0x4007b3           ; the string that contains the error message
004005fe  BA16000000      mov edx,0x16
00400603  BE01000000      mov esi,0x1
00400608  E863FFFFFF      call qword 0x400570        ; the call to fprintf() writing the error message
0040060d  BF01000000      mov edi,0x1
00400612  E849FFFFFF      call qword 0x400560        ; the call to exit()

Here the compiler recognizes that every access except the last one is valid.  So it reduces the for() loop to only 10 iterations, then in place of the 11th iteration, it instead places a copy of the code that generates the error message and calls exit(). 

The result is that, in every case where the limit was not violated, this limit checking added no new code to the program.  Indeed, where the limit was violated, the compiler removed the code for that operation from the program, meaning that using the limit checks actually sped up the execution of the program by some insignificant amount.  That's a fine result there.

Now, obviously, in more complex programs the compiler is occasionally going to have to actually check the limit.  An example is when the pointer is passed to a function in a separately compiled object file.  So what happens if we do that?

I created a multi-file example, but rather than put multiple files of code in this post, I'll just make a download link for it.  The above example is in there as well. 

The new code is basically the same as the above example, except that the safe_malloc() call is in one file, and the safe_char_pointer is passed as a function argument to a function in another file which contains the for() loops and the printf() calls.  Each file is compiled separately to ensure that GCC generates a function that works with an array of any length, since it doesn't know what the array length will be when it compiles the function.

The new function isn't so awesome.  Basically, the compiler chose to unroll the loop, and each iteration looks roughly like this, just with different values:

00000026  0F8EB9000000    jng qword 0xe5             ; if size is insufficient, jump to the error routine
0000002c  83FE03          cmp esi,byte +0x3          ; test size against what the next iteration of this loop needs
0000002f  C6470243        mov byte [rdi+0x2],0x43    ; write the value for this iteration of this loop

So it is performing the limit check on every iteration of the loop.  That's a bummer, as there is no reason it has to do that. 

The exit() involved in the error routine means that, if the limit is violated, what ends up in the array is irrelevant.  Thus there is no reason that the array size needs to be checked more than once.  ...and the compiler knows that.  If I modify the first example so that the first for() loop goes to 11, it creates an executable which has code that first calls malloc(), then calls fprintf(), then calls exit(), with no sign of the for() loop or the rest of the program anywhere. 

The problem seems to be that, while the compiler realizes that if the limit is greater than 9 then it is also greater than everything less than 9, it doesn't consider that perhaps it can perform that comparison to 9 before it performs the comparison to the smaller numbers, and in doing so, avoid all of the other comparisons.  Indeed, if I comment out the assignment within safe_write(), the generated code is just a long list of compares and conditional jumps which could quite obviously be optimized by eliminating all but the last one.  So either this is an obvious optimization that the compiler simply doesn't yet know how to make, or it is some sort of bug in the optimizer causing it to miss this optimization.

Anyway, hoping to force the compiler to make this optimization, before the first for() loop I added a safe_write() call that writes to the tenth element of the array, to "trick" the compiler into performing that comparison first.  So the code now looks like this:

void external_function(safe_char_pointer a) {

    safe_write(a, 9, 0);

    for (int i = 0; i < 10; i++) {
        safe_write(a, i, 65 + i);
    }

    safe_write(a, 9, 0);
    printf("Some shit: %s\n", a.pointer);

    for (int i = 0; i <= 10; i++) {
        int c = safe_read(a, i);
        printf("Byte %d is %d\n", i, c);
    }
}

Upon compiling that code, I get this:

00000032  83FE09          cmp esi,byte +0x9          ; compare array size to 9
00000035  55              push rbp
00000036  53              push rbx
00000037  0F8EB5000000    jng qword 0xf2             ; if size is not greater than 9, jump to the error routine
0000003d  4889FD          mov rbp,rdi
00000040  4889FA          mov rdx,rdi
00000043  C60741          mov byte [rdi],0x41
00000046  C6470142        mov byte [rdi+0x1],0x42
0000004a  C6470243        mov byte [rdi+0x2],0x43
0000004e  4189F4          mov r12d,esi
00000051  C6470344        mov byte [rdi+0x3],0x44
00000055  C6470445        mov byte [rdi+0x4],0x45
00000059  BE00000000      mov esi,0x0
0000005e  C6470546        mov byte [rdi+0x5],0x46
00000062  C6470647        mov byte [rdi+0x6],0x47
00000066  31C0            xor eax,eax
00000068  C6470748        mov byte [rdi+0x7],0x48
0000006c  C6470849        mov byte [rdi+0x8],0x49
00000070  4883C501        add rbp,byte +0x1
00000074  C6470900        mov byte [rdi+0x9],0x0
00000078  BF01000000      mov edi,0x1

That's what I expected it to do to begin with.  It put one check at the beginning of the function for the limit, and if it succeeds, it proceeds to the unrolled for() loop, this time without limit checks.  It also skips writing the value from that initial call to safe_write(), since it knows it is overwritten by the for() loop.  Thus the performance penalty here is one compare and one conditional jump, exactly what you'd get if you were to implement the limit checks manually by passing a size along with every pointer and simply checking the size at the top of the function.
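That manual equivalent, checking the size once at the top of the function so every store after it can be a plain, unchecked one, looks roughly like this (function name hypothetical):

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct { char *pointer; int size; } safe_char_pointer;

/* One size check at the top covers every access that follows,
   so the loop body can use ordinary unchecked stores. */
void fill_checked_once(safe_char_pointer a) {
    if (a.size < 10) {                  /* covers indexes 0 through 9 */
        fprintf(stderr, "Array limits exceeded!\n");
        exit(1);
    }
    for (int i = 0; i < 9; i++)
        a.pointer[i] = 'A' + i;         /* no per-iteration check needed */
    a.pointer[9] = '\0';
}
```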

Anyway, that's an optimization that I think the compiler authors could easily implement when adding support for this new pointer type with limit checking, and with that, the performance penalty of limit checks would be almost non-existent.

...but I know people won't think that's good enough.  Any performance penalty is too much for some people.  The way I look at it, I can buy a faster computer, but I can't buy a more secure one.  So even if something like this cut my computer down to half speed, I'd be loving every minute of it as I finally feel confident enough to open the SSH port on my router.

...and besides, in cases where people really don't care about security, they can just disable the limit checks with a compiler flag.  Meanwhile, the fact that the limit checks are even an option will mean that some people will use them and so bugs will be discovered and fixed.  The result of that will be that software is safer even when compiled without the limit checks.
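Such a compiler flag could work much like NDEBUG does for assert(): with the struct emulation, a preprocessor symbol (SAFE_NOCHECK here is a made-up name) compiles the checks out entirely, leaving a plain store:

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct { char *pointer; int size; } safe_char_pointer;

/* Build with -DSAFE_NOCHECK (a hypothetical flag) and the limit
   check vanishes, leaving only the unchecked store. */
static inline void safe_write(safe_char_pointer x, int i, char c) {
#ifndef SAFE_NOCHECK
    if (i < 0 || i >= x.size) {
        fprintf(stderr, "Array limits exceeded!\n");
        exit(1);
    }
#endif
    x.pointer[i] = c;
}
```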

Hopefully this idea will catch on someday.  I've noticed that I sometimes spot people repeating ideas I was talking about five to ten years ago.  I don't know if the ideas started with me or if they were independently discovered (certainly no one seemed to like the ideas when I was talking about them, so I assume it's the latter), but either way, it's nice that stuff isn't doomed to never improve.  So I expect C will eventually have pointers like this, it just might be five to twenty years before it happens.
