Ekstatische Lyriken Pinnwand

Automatic Array Limit Checks in C

written by Pj on Tuesday March 22nd, 2016 -- 5:08 p.m.

Everyone insists that C cannot have automatic array limit checking without a massive performance penalty.  I disagree.

I'm assuming that the way it would be implemented is with a new pointer type, which I'll describe here, and then provide some example code that implements something similar in order to see just what the performance impact would be.

To understand how my new "safe" pointers might work, consider how C's existing "complex" float type works.  It is in reality a structure of two floats, but the compiler allows it to be used as if it is a simple variable.  So you can add them together, multiply them, call functions with them, etc., and it just generates the correct code for those operations, e.g. in the case of multiplying two complex floats, it generates four multiplication operations, an addition and a subtraction, then puts the results into the correct places.  It also knows how to allow complex values to interact with ordinary floats, by considering the ordinary floats to be complex floats with a zero imaginary part.  The result is seamless integration of something new into the existing language in a way that is very easy to use.

In the same way, I'd like to see array limit checking integrated as just a new type of pointer:

safe char array[100];

When the compiler stumbles upon this code, it would actually create a char pointer and a size_t, placing "100" into the size_t automatically.

Alternately, you could allocate an array like this:

safe char *array;
array = malloc(100);


Much like a complex float returned from a function actually returns a structure of two floats, the return from this malloc() would be a structure of a char pointer and a size_t that is equal to the size of the memory allocated.  In the event that malloc() were used to assign to an ordinary pointer without size data, the compiler would know how to discard the size data and copy only the pointer data automatically, making a cast unnecessary.  This is good as it makes it compatible with existing code.
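As a sketch of what the compiler might lower this to internally (the struct and function names here are made up for illustration), the `safe char *` would become a pointer paired with a size, and `malloc()` on a safe pointer would fill in both members:

```c
#include <stdlib.h>

/* Hypothetical lowering: what a "safe char *" might become internally. */
typedef struct {
    char   *pointer;
    size_t  size;       /* number of addressable bytes */
} safe_char_ptr;

/* "array = malloc(100);" on a safe pointer would lower to something like: */
static safe_char_ptr safe_malloc_lowered(size_t n) {
    safe_char_ptr p = { malloc(n), n };
    if (p.pointer == NULL)
        p.size = 0;     /* a failed allocation carries no usable bytes */
    return p;
}

/* Assigning to an ordinary pointer would just discard the size member:
       char *plain = p.pointer;                                         */
```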

Then you could use the array like this:

safe char array[11] = "Cool stuff!";
printf("String: %s\n", array);


This would pass the array pointer and its size to printf(), which could then be certain not to read past the end of the array, useful in the case that you failed to put the null character at the end of the string.  Note that this would require a conversion specifier other than 's', since 's' is already defined to accept the unsafe pointers that carry no limit, but beyond that new specifier there would be no need for a new printf(), as printf() already accepts arguments of any type through its variable argument list.

You could also call old functions that expect old unsafe pointers like this:

void old_stuff(char *);
safe char array[100];
old_stuff(array);


Here, the compiler would see that old_stuff() wants a "char *" while our array is a "safe char *", so it would simply turn the "safe char *" into a "char *" by discarding the size data and passing along only the pointer.  This would allow people to use the new data type immediately, even before old libraries are updated to support it.
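Emulated with the struct from before (again, the names are made up for illustration), the implicit safe-to-unsafe conversion the compiler would perform is nothing more than dropping the size member:

```c
#include <stddef.h>

typedef struct { char *pointer; size_t size; } safe_char_ptr;

/* An old-style function that knows nothing about sizes. */
void old_stuff(char *p) {
    (void)p;    /* stands in for whatever old code does with the pointer */
}

/* What the compiler would generate for old_stuff(array): */
void call_old(safe_char_ptr a) {
    old_stuff(a.pointer);   /* the ".pointer" is inserted for you */
}
```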

As for compatibility in the other direction, that would require an explicit cast, to insert the size data either from a constant or from the variable it is being stored in:

void new_stuff(safe char *);
char array[100];
int size;
size = fread(array, 1, 100, file);
new_stuff( (safe char *) {array, size} );


...or that might also be implemented as a compile-time function...

void new_stuff(safe char *);
char array[100];
int size;
size = fread(array, 1, 100, file);
new_stuff( safe_char(array, size) );
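In today's C, that compile-time constructor can be approximated with an inline function that pairs a plain pointer with a size in one expression (the struct and function names here are hypothetical):

```c
#include <stddef.h>

typedef struct { char *pointer; size_t size; } safe_char_ptr;

/* Emulating the proposed safe_char() constructor. */
static inline safe_char_ptr safe_char(char *p, size_t n) {
    safe_char_ptr s;
    s.pointer = p;
    s.size = n;
    return s;
}

/* Usage mirroring the fread() example above:

       size = fread(array, 1, 100, file);
       new_stuff( safe_char(array, size) );
*/
```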


I've mentioned this idea before on Slashdot and probably Reddit, but of course, people are always like "that would ruin the performance of C code." Someone even linked me to some C project that would add limit checks to your C programs, which claimed that they caused a 30% slow-down (or something like that) when added to an executable.  If it were that severe then I could see people taking issue with the idea, but frankly, if it slows the code down that much, then I think someone is doing something wrong. 

To demonstrate, I wrote some code in which I implemented a crude form of what I've described above.  I created a structure that contains both a pointer and a size.  I then created a function to allocate memory for this new pointer type, a function to read from it, and a function to write to it.  Since, if it were implemented as I've described above, assignment would be done with the usual '=' operator rather than with functions, I made the functions inline so that the compiler will in-line the limit checking and the assignment, and thus be able to optimize those operations.

This is the code I ended up with:

#include <stdlib.h>
#include <stdio.h>

typedef struct {
    char *pointer;
    int size;
} safe_char_pointer;

static safe_char_pointer safe_malloc(int size) {
    safe_char_pointer t;
    t.pointer = malloc(size);
    t.size = size;
    return t;
}

static inline void safe_write(safe_char_pointer x, int i, char c) {
    if (i < 0 || i >= x.size) {
        fprintf(stderr, "Array limits exceeded!\n");
        exit(1);
    }
    x.pointer[i] = c;
}

static inline char safe_read(safe_char_pointer x, int i) {
    if (i < 0 || i >= x.size) {
        fprintf(stderr, "Array limits exceeded!\n");
        exit(1);
    }
    return x.pointer[i];
}

int main() {

    safe_char_pointer a = safe_malloc(10);

    for (int i = 0; i < 10; i++) {
        safe_write(a, i, 65 + i);
    }

    safe_write(a, 9, 0);
    printf("Some shit: %s\n", a.pointer);

    /* deliberately runs one element past the end */
    for (int i = 0; i <= 10; i++) {
        int c = safe_read(a, i);
        printf("Byte %d is %d\n", i, c);
    }

    return 0;
}

So just how much does this hurt the performance?

I compiled it like this: gcc -std=gnu99 -g -O9 -o test test.c

This is the assembly code of the first for() loop:

0040058d  E8AEFFFFFF    call qword 0x400540          ; the call to malloc()
00400592  BEA4074000    mov esi,0x4007a4             ; pointer to the format string used for printf()
00400597  C60041        mov byte [rax],0x41
0040059a  C6400142      mov byte [rax+0x1],0x42
0040059e  C6400243      mov byte [rax+0x2],0x43
004005a2  C6400344      mov byte [rax+0x3],0x44
004005a6  4889C2        mov rdx,rax
004005a9  C6400445      mov byte [rax+0x4],0x45
004005ad  C6400546      mov byte [rax+0x5],0x46
004005b1  4889C5        mov rbp,rax
004005b4  C6400647      mov byte [rax+0x6],0x47
004005b8  C6400748      mov byte [rax+0x7],0x48
004005bc  BF01000000    mov edi,0x1
004005c1  C6400849      mov byte [rax+0x8],0x49
004005c5  C6400900      mov byte [rax+0x9],0x0
004005c9  31C0          xor eax,eax
004005cb  E880FFFFFF    call qword 0x400550          ; the call to printf()

In this loop, the limit checks had no effect on the performance at all: the compiler recognized that the limit was never going to be violated and so never emitted the limit checks.  It also unrolled the for() loop and reduced it to only 9 iterations, since the last position in the array is overwritten after the loop.  Clearly the limit checking isn't slowing down the code in this case, as this is exactly the same code the compiler would have created without limit checks.

So what about the second loop where the limit is violated?

004005d0  0FBE4C1D00      movsx ecx,byte [rbp+rbx+0x0]
004005d5  89DA            mov edx,ebx
004005d7  31C0            xor eax,eax
004005d9  BECA074000      mov esi,0x4007ca           ; address of the format string for printf()
004005de  BF01000000      mov edi,0x1
004005e3  4883C301        add rbx,byte +0x1
004005e7  E864FFFFFF      call qword 0x400550        ; the call to printf()
004005ec  4883FB0A        cmp rbx,byte +0xa          ; compare variable 'i' to 10
004005f0  75DE            jnz 0x4005d0 (main+0x50)   ; if it isn't ten, loop back to the beginning
004005f2  488B0D5F0A2000  mov rcx,[rel 0x601058]     ; __TMC_END__
004005f9  BFB3074000      mov edi,0x4007b3           ; the string that contains the error message
004005fe  BA16000000      mov edx,0x16
00400603  BE01000000      mov esi,0x1
00400608  E863FFFFFF      call qword 0x400570        ; the call to fprintf() writing the error message
0040060d  BF01000000      mov edi,0x1
00400612  E849FFFFFF      call qword 0x400560        ; the call to exit()

Here the compiler recognizes that every access except the last one is valid.  So it reduces the for() loop to only 10 iterations, then in place of the 11th iteration, it instead places a copy of the code that generates the error message and calls exit(). 

The result is that, in every case where the limit was not violated, this limit checking added no new code to the program.  Indeed, where the limit was violated, the compiler removed the code for that operation from the program, meaning that using the limit checks actually sped up the execution of the program by some insignificant amount.  That's a fine result there.

Now, obviously, in more complex programs the compiler is occasionally going to have to actually check the limit.  An example is when the pointer is passed to a function in a separately compiled object file.  So what happens if we do that?

I created a multi-file example, but rather than put multiple files of code in this post, I'll just make a download link for it.  The above example is in there as well. 

The new code is basically the same as the above example, except that the safe_malloc() call is in one file, and the safe_char_pointer is passed as a function argument to a function in another file which contains the for() loops and the printf() calls.  Each file is compiled separately to ensure that GCC generates a function that works with an array of any length, since it doesn't know what the array length will be when it compiles the function.

The new function isn't so awesome.  Basically, the compiler chose to unroll the loop, and each iteration looks roughly like this, just with different values:

00000026  0F8EB9000000    jng qword 0xe5             ; if size is insufficient, jump to the error routine
0000002c  83FE03          cmp esi,byte +0x3          ; test size against what the next iteration of this loop needs
0000002f  C6470243        mov byte [rdi+0x2],0x43    ; write the value for this iteration of this loop

So it is performing the limit check on every iteration of the loop.  That's a bummer, as there is no reason it has to do that. 

The exit() involved in the error routine means that, if the limit is violated, what ends up in the array is irrelevant.  Thus there is no reason that the array size needs to be checked more than once.  ...and the compiler knows that.  If I modify the first example so that the first for() loop goes to 11, it creates an executable which has code that first calls malloc(), then calls fprintf(), then calls exit(), with no sign of the for() loop or the rest of the program anywhere. 

The problem seems to be that, while the compiler realizes that if the limit is greater than 9 then it is also greater than everything less than 9, it doesn't consider that perhaps it can perform that comparison to 9 before it performs the comparison to the smaller numbers, and in doing so, avoid all of the other comparisons.  Indeed, if I comment out the assignment within safe_write(), the generated code is just a long list of compares and conditional jumps which could quite obviously be optimized by eliminating all but the last one.  So either this is an obvious optimization that the compiler simply doesn't yet know how to make, or it is some sort of bug in the optimizer causing it to miss this optimization.

Anyway, hoping to force the compiler to make this optimization, before the first for() loop I added a safe_write() call that writes to the tenth element of the array, to "trick" the compiler into performing that comparison first.  So the code now looks like this:

void external_function(safe_char_pointer a) {

    safe_write(a, 9, 0);

    for (int i = 0; i < 10; i++) {
        safe_write(a, i, 65 + i);
    }

    safe_write(a, 9, 0);
    printf("Some shit: %s\n", a.pointer);

    for (int i = 0; i <= 10; i++) {
        int c = safe_read(a, i);
        printf("Byte %d is %d\n", i, c);
    }
}

Upon compiling that code, I get this:

00000032  83FE09          cmp esi,byte +0x9          ; compare array size to 9
00000035  55              push rbp
00000036  53              push rbx
00000037  0F8EB5000000    jng qword 0xf2             ; if size is not greater than 9, jump to the error routine
0000003d  4889FD          mov rbp,rdi
00000040  4889FA          mov rdx,rdi
00000043  C60741          mov byte [rdi],0x41
00000046  C6470142        mov byte [rdi+0x1],0x42
0000004a  C6470243        mov byte [rdi+0x2],0x43
0000004e  4189F4          mov r12d,esi
00000051  C6470344        mov byte [rdi+0x3],0x44
00000055  C6470445        mov byte [rdi+0x4],0x45
00000059  BE00000000      mov esi,0x0
0000005e  C6470546        mov byte [rdi+0x5],0x46
00000062  C6470647        mov byte [rdi+0x6],0x47
00000066  31C0            xor eax,eax
00000068  C6470748        mov byte [rdi+0x7],0x48
0000006c  C6470849        mov byte [rdi+0x8],0x49
00000070  4883C501        add rbp,byte +0x1
00000074  C6470900        mov byte [rdi+0x9],0x0
00000078  BF01000000      mov edi,0x1

That's what I expected it to do to begin with.  It put one check at the beginning of the function for the limit, and if it succeeds, it proceeds to the unrolled for() loop, this time without limit checks.  It also skips writing the value from that initial call to safe_write(), since it knows it is overwritten by the for() loop.  Thus the performance penalty here is one compare and one conditional jump, exactly what you'd get if you were to implement the limit checks manually by passing a size along with every pointer and simply checking the size at the top of the function.
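That manual equivalent, checking the size once at the top of the function so every store after it can be a plain, unchecked one, looks roughly like this (function name hypothetical):

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct { char *pointer; int size; } safe_char_pointer;

/* One size check at the top covers every access that follows,
   so the loop body can use ordinary unchecked stores. */
void fill_checked_once(safe_char_pointer a) {
    if (a.size < 10) {                  /* covers indexes 0 through 9 */
        fprintf(stderr, "Array limits exceeded!\n");
        exit(1);
    }
    for (int i = 0; i < 9; i++)
        a.pointer[i] = 'A' + i;         /* no per-iteration check needed */
    a.pointer[9] = '\0';
}
```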

Anyway, that's an optimization that I think the compiler authors could easily implement when adding support for this new pointer type with limit checking, and with that, the performance penalty of limit checks would be almost non-existent.

...but I know people won't think that's good enough.  Any performance penalty is too much for some people.  The way I look at it, I can buy a faster computer, but I can't buy a more secure one.  So even if something like this cut my computer down to half speed, I'd be loving every minute of it as I finally feel confident enough to open the SSH port on my router.

...and besides, in cases where people really don't care about security, they can just disable the limit checks with a compiler flag.  Meanwhile, the fact that the limit checks are even an option will mean that some people will use them and so bugs will be discovered and fixed.  The result of that will be that software is safer even when compiled without the limit checks.
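Such a compiler flag could work much like NDEBUG does for assert(): with the struct emulation, a preprocessor symbol (SAFE_NOCHECK here is a made-up name) compiles the checks out entirely, leaving a plain store:

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct { char *pointer; int size; } safe_char_pointer;

/* Build with -DSAFE_NOCHECK (a hypothetical flag) and the limit
   check vanishes, leaving only the unchecked store. */
static inline void safe_write(safe_char_pointer x, int i, char c) {
#ifndef SAFE_NOCHECK
    if (i < 0 || i >= x.size) {
        fprintf(stderr, "Array limits exceeded!\n");
        exit(1);
    }
#endif
    x.pointer[i] = c;
}
```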

Hopefully this idea will catch on someday.  I've noticed that I sometimes spot people repeating ideas I was talking about five to ten years ago.  I don't know if the ideas started with me or if they were independently discovered (certainly no one seemed to like the ideas when I was talking about them, so I assume it's the latter), but either way, it's nice that stuff isn't doomed to never improve.  So I expect C will eventually have pointers like this, it just might be five to twenty years before it happens.
