Ideas about C Strings and Arrays

by Octapoo

Sunday, June 8, 2025 at 09:01:24 UTC


When I wrote this blog, in C, I also wrote a string library, because not doing so would have been asking for problems. The other day I was writing something else and typed up this code, using the array-of-strings type that I also created for the blog:

if (things.count) {
    printf("Existing things:\n");
    for (int i = 0; i < things.count; i++) {
        printf(" '%s'\n", things.string[i].buffer);
    }
}

I was momentarily overwhelmed with how much easier this is than what I would be doing without the library.

I've thought before about releasing my personal library as an actual library, largely because I tried to put some projects on my GitHub page but the way that I manage my personal library on my own computer isn't compatible with uploading code to GitHub. So I only uploaded a couple of things and then stopped until I figure out what to do about that problem.

Anyway, this got me thinking about turning my personal library into a real library again, but, kind of the reason I don't is that it's not really "finished." So this made me think a little about what would make it more "finished" and that gave me an idea I think is worth sharing.


The main thing I like about my personal library is the string type, which is defined like this:

struct easy_string {
    int allocation_size;
    int length;
    union {
        char *buffer;
        const char *const_buffer;
    };
};

Overall it isn't much different than the data type used by the already-existing "bstrings" library, but of course I prefer my own implementation. In particular, in reading about bstrings, it sounds like it requires the strings to be initialized by calling a function to allocate them, whereas my string functions are fine with this:

struct easy_string test = {};
easy_string_format(&test, "The number is %d\n", number);
printf("The test string is: %s\n", test.buffer);
easy_string_free(&test);

Call me lazy, but I'd rather not have to call an initialization function if I don't have to. I'm annoyed enough that I have to call the function to free the string afterwards. Every little thing that makes it harder to use just puts me one step closer to using a mere char * which just denies me the benefits of using the library, like the impossibility of overflowing a buffer. So I think it's an important feature that it doesn't make me have to do anything it can do for me.
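As a rough illustration of why an all-zero structure can be enough, here is a minimal sketch of a formatting function that allocates on first use. The names sketch_string, sketch_string_format, and sketch_string_free are hypothetical stand-ins, not the actual easy_string code, which isn't shown in this post. The sketch relies on realloc(NULL, n) behaving like malloc(n) and on vsnprintf() reporting the length it needs.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct easy_string (the const_buffer union is omitted). */
struct sketch_string {
    int allocation_size;   /* bytes currently allocated for buffer */
    int length;            /* characters stored, not counting the '\0' */
    char *buffer;          /* NULL until the first write */
};

/* Replace the contents of s with printf-style formatted text.
   Returns 0 on success, -1 on failure (s is left unchanged). */
int sketch_string_format(struct sketch_string *s, const char *format, ...)
{
    va_list args;

    assert(s != NULL && format != NULL);

    /* First pass: ask vsnprintf how many characters are needed. */
    va_start(args, format);
    int needed = vsnprintf(NULL, 0, format, args);
    va_end(args);
    if (needed < 0) return -1;

    /* Grow the buffer if necessary. realloc(NULL, n) acts like malloc(n),
       which is what lets a zeroed struct work without an init call. */
    if (needed + 1 > s->allocation_size) {
        char *grown = realloc(s->buffer, (size_t)needed + 1);
        if (grown == NULL) return -1;
        s->buffer = grown;
        s->allocation_size = needed + 1;
    }

    /* Second pass: write the text for real. */
    va_start(args, format);
    vsnprintf(s->buffer, (size_t)s->allocation_size, format, args);
    va_end(args);
    s->length = needed;
    return 0;
}

/* Release the buffer and return the struct to its all-zero state. */
void sketch_string_free(struct sketch_string *s)
{
    assert(s != NULL);
    free(s->buffer);
    s->buffer = NULL;
    s->allocation_size = 0;
    s->length = 0;
}
```

Because the free function puts the structure back into the all-zero state, the same variable can be formatted into again afterwards without any extra setup.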

After thinking about this for a day, it occurred to me that what I was most enjoying at the moment I started thinking about making a library again wasn't the string type itself, but the array of the string type.

struct easy_string_array {
    int count;
    struct easy_string *string;
};

This, of course, isn't much of anything special. Perhaps the only unusual thing about it is that it's a structure. More often than not, when declaring a variable-length list, I don't bother with the structure, even though I would say that it's probably always a good idea. Instead I tend to just do this:

char **string = NULL;
int string_count = 0;

However, the two variables are so related that, even though it's easier to just do that, I do think they belong in a structure. Indeed, I find one of the things that generally makes my code feel better is making little structures out of things that I typically wouldn't. For example, when having x and y coordinates, I usually declare them separately, but in the few programs where I've done this...

struct int_xy { int x; int y; };

...I'm generally happier with the resulting code. The only thing I don't like about it is when I want to pass a constant to a function that expects this structure, as it's a little annoying to type this:

set_pixel(&image, (struct int_xy) {123, 456});

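I wish GCC would allow just {123, 456} without the explicit (struct int_xy) in front, as I think it's obvious what I want to do there, so I don't know why I have to tell it explicitly what kind of structure to turn it into. (Strictly speaking, that syntax is a compound literal rather than a cast, which is why the type name is required.)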
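One way to shorten the call site is to hide the compound literal behind a small macro. The sketch below is only an illustration: XY and pixel_index are hypothetical names, and pixel_index stands in for set_pixel(), whose definition isn't shown here.

```c
#include <assert.h>

struct int_xy { int x; int y; };

/* Hypothetical helper macro: builds a struct int_xy compound literal. */
#define XY(x_value, y_value) ((struct int_xy){ (x_value), (y_value) })

/* Hypothetical stand-in for set_pixel(): returns the index of the pixel
   in a row-major image that is `width` pixels wide. */
int pixel_index(int width, struct int_xy position)
{
    assert(position.x >= 0 && position.x < width && position.y >= 0);
    return position.y * width + position.x;
}
```

With this, a call looks like pixel_index(640, XY(123, 456)), which is close to the bare {123, 456} while remaining standard C99.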

...but, back to the code that made me start thinking about creating a library again:

if (things.count) {
    printf("Existing things:\n");
    for (int i = 0; i < things.count; i++) {
        printf(" '%s'\n", things.string[i].buffer);
    }
}

Obviously my strings have a length attached, but what's making them so useful here is that they're wrapped up in an array type that also has the length of the array attached. This got me thinking that perhaps the problem is less about C having poor string support, and more about C having poor array support.

I kind of wrote about this on my old blog in "Automatic Array Limit Checks in C" but, while waiting for the GCC developers to come up with my idea on their own, I've just been keeping separate pointer and length variables.

In doing so, I've found myself typing code like this a lot:

// insert "number" at position "index" in the array
array_count++;
easy_memory_allocate(&array_data, array_count * sizeof(*array_data));
memmove(&array_data[index + 1], &array_data[index], (array_count - index - 1) * sizeof(*array_data));
array_data[index] = number;

I've done it so many times that, while the first time, it took a lot of thought to get the pointer arithmetic right, I don't even feel all that compelled to compile it and make sure it works before posting it online and making myself look dumb when it doesn't work.

Anyway, this code, from the perspective of someone used to a higher-level language where they don't have to do stuff like this, probably looks insane. In other languages this is as trivial as array_insert(&array, index, number) or something like that.

Also, months ago, I was working on a bit of code that I wanted to be easy to read for any programmer, even ones not familiar with C, but I also needed a two-dimensional array. In more trivial cases, that's not so difficult:

int array[10][10];

// write number into (x,y) in array
array[y][x] = 10;

Explaining why I put y before x is still more than I want to explain, so I would probably just put them in the wrong order and let it be, but I also needed the array to be of a dynamic size not known until runtime. So I needed this:

int *array = malloc(x_size * y_size * sizeof(int));

// write number into (x,y) in array
array[y * x_size + x] = number;

Granted, it's not a difficult formula. But the goal was to make the code as accessible as possible, and someone who is only familiar with higher-level languages has probably never had to calculate an index into an array like this, so they won't immediately know what the formula is for. It's just a stumbling block in their effort to understand what my code is actually about, as they first have to stop and learn what this irrelevant formula is about.

I ended up wrapping this up in a macro, so I could just explain what the code does next to the macro, so that the reader would realize they can just skip it, and then the macro itself would make the rest of the code easier to read:

// Two-dimensional arrays of size known only at run-time are kinda hard in C.
// This macro wraps up the math involved so it's a bit easier for readers who
// aren't familiar with having to make these kinds of calculations.
//
// array(ballots, 12, 7) will access ballot 12, index 7

#define array(name, y, x) (*((name).data + (x) + (y) * (name).dimension_x_size))

struct two_dimensional_array {
    int *data;
    int dimension_x_size;
    int dimension_y_size;
};

struct two_dimensional_array ballots = {}; // array containing all ballots
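Since the structure already carries both dimensions, the same index math could also be wrapped in a function that checks the limits before touching memory. The following is a sketch under assumptions: two_d_allocate, two_d_at, and two_d_free are hypothetical helpers that are not part of the original program, and the checks use assert(), so they disappear when compiled with NDEBUG.

```c
#include <assert.h>
#include <stdlib.h>

/* Same layout as the two_dimensional_array structure above. */
struct two_dimensional_array {
    int *data;
    int dimension_x_size;
    int dimension_y_size;
};

/* Hypothetical helper: allocate a zero-filled array of x_size by y_size.
   Returns 0 on success, -1 if the allocation fails. */
int two_d_allocate(struct two_dimensional_array *a, int x_size, int y_size)
{
    assert(a != NULL && x_size > 0 && y_size > 0);
    a->data = calloc((size_t)x_size * (size_t)y_size, sizeof(*a->data));
    if (a->data == NULL) return -1;
    a->dimension_x_size = x_size;
    a->dimension_y_size = y_size;
    return 0;
}

/* Hypothetical checked accessor: returns a pointer to element (y, x),
   aborting through assert() if either index is out of range. */
int *two_d_at(struct two_dimensional_array *a, int y, int x)
{
    assert(a != NULL && a->data != NULL);
    assert(y >= 0 && y < a->dimension_y_size);
    assert(x >= 0 && x < a->dimension_x_size);
    return &a->data[y * a->dimension_x_size + x];
}

/* Hypothetical helper: release the storage and zero the structure. */
void two_d_free(struct two_dimensional_array *a)
{
    assert(a != NULL);
    free(a->data);
    a->data = NULL;
    a->dimension_x_size = 0;
    a->dimension_y_size = 0;
}
```

Writing *two_d_at(&ballots, 12, 7) = 5; reads much like the macro version, at the cost of a function call and a dereference.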

This all has me thinking that C's biggest failure may be a failure to keep size information with pointers.

When you think about what functions a useful string library has, like inserting, finding, or deleting characters from a string, those are operations that would be useful on any array type. I might have an array of integers and want to insert an integer into the middle of the array. However, I can't do that without knowing the length of the array, and without being able to reallocate the array to make it larger. At least with strings, they're defined as ending with a null byte, so you can at least determine the length, but how useful would an "insert an integer" function be if it required the final integer to be 0? Not very useful.

So, imagine we solve the problem with arrays in general. Then we can write that "insert an integer into the middle of an array" function, and that same function will insert a character into the middle of a string. It will also insert a structure into the middle of an array of structures. It would solve every problem at once.
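One conventional way to get a single insert function for every element type is the approach qsort() takes: treat the array as raw bytes and carry the element size along with it. The sketch below assumes a hypothetical struct generic_array and generic_array_insert(); it is an illustration of that idea, not code from my library.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type-erased array: element_size is recorded once so the
   function below never needs to know the element type. */
struct generic_array {
    void *data;
    int count;
    size_t element_size;
};

/* Insert a copy of *item at position index (0 <= index <= count).
   Returns 0 on success, -1 if out of memory (the array is unchanged). */
int generic_array_insert(struct generic_array *a, int index, const void *item)
{
    assert(a != NULL && item != NULL && a->element_size > 0);
    assert(index >= 0 && index <= a->count);

    void *grown = realloc(a->data, ((size_t)a->count + 1) * a->element_size);
    if (grown == NULL) return -1;
    a->data = grown;

    /* Use char * so that the pointer arithmetic is measured in bytes. */
    char *bytes = a->data;

    /* Shift everything from index onward up by one element. */
    memmove(bytes + ((size_t)index + 1) * a->element_size,
            bytes + (size_t)index * a->element_size,
            (size_t)(a->count - index) * a->element_size);

    /* Copy the new element into the gap. */
    memcpy(bytes + (size_t)index * a->element_size, item, a->element_size);
    a->count++;
    return 0;
}
```

The trade-off is that the item has to be passed by address and the compiler can no longer check that it has the right type, which is part of what makes a macro-based approach attractive.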

This gave me an idea, one largely inspired by this code which I already showed above:

// insert "number" at position "index" in the array
array_count++;
easy_memory_allocate(&array_data, array_count * sizeof(*array_data));
memmove(&array_data[index + 1], &array_data[index], (array_count - index - 1) * sizeof(*array_data));
array_data[index] = number;

A neat thing about this code is that it doesn't need to know what type of data is in the array. It could be an int or a char or even a structure. No matter what it is, this code works, because it just relies on sizeof(*array_data) to get the size of the data type.

I got into the habit of doing sizeof(*array_data) because, if I do sizeof(int) instead, and later decide to change the type of the array to long or something else, then I have to find all of those sizeof(int) and change them. This means that sizeof(int) is actually kind of incorrect. I actually don't care what the size of an int is, I care what the size of the data pointed to by array_data is, and it's just a coincidence that they were the same when I wrote the line of code. So I basically always do sizeof(*array_data) now, even if I know the data type won't change later, as it just seems like the correct way to do it.
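Here is a small sketch of that habit in a complete function. The name make_sequence is hypothetical. The element type appears only in the declarations, so changing long to something else would not require touching the allocation line. It also shows a detail that is easy to miss: sizeof does not evaluate its operand, so writing sizeof(*array_data) is fine even while array_data is still NULL.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical example: returns a heap array holding 0, 1, ..., count - 1,
   or NULL if the allocation fails. The caller frees the result. */
long *make_sequence(int count)
{
    assert(count > 0);

    long *array_data = NULL;

    /* sizeof only looks at the type of *array_data; nothing is
       dereferenced here, even though array_data is NULL. */
    array_data = malloc((size_t)count * sizeof(*array_data));
    if (array_data == NULL) return NULL;

    for (int i = 0; i < count; i++) {
        array_data[i] = i;
    }
    return array_data;
}
```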

Anyway, it occurred to me that this could be exploited to write macros which perform operations on arrays and work no matter what type the array is composed of. So, it took me a little bit of thinking, but I came up with the following code, which I actually did test, so I know it works.

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

// Each multi-statement macro is wrapped in do { ... } while (0) so that it
// behaves as a single statement, and every argument is parenthesized.
// There is no error checking on realloc() here.

#define easy_declare_array_type(type) \
    typedef struct easy_ ## type ## _array { \
        int count; \
        type *data; \
    } easy_ ## type ## _array

#define easy_array_append(pointer, item) do { \
    (pointer)->data = realloc((pointer)->data, ++(pointer)->count * sizeof(*(pointer)->data)); \
    (pointer)->data[(pointer)->count - 1] = (item); \
} while (0)

#define easy_array_insert(pointer, index, item) do { \
    (pointer)->data = realloc((pointer)->data, ++(pointer)->count * sizeof(*(pointer)->data)); \
    memmove(&(pointer)->data[(index) + 1], &(pointer)->data[(index)], \
            ((pointer)->count - (index) - 1) * sizeof(*(pointer)->data)); \
    (pointer)->data[(index)] = (item); \
} while (0)

#define easy_array_delete(pointer, index) do { \
    memmove(&(pointer)->data[(index)], &(pointer)->data[(index) + 1], \
            ((pointer)->count - (index) - 1) * sizeof(*(pointer)->data)); \
    (pointer)->data = realloc((pointer)->data, --(pointer)->count * sizeof(*(pointer)->data)); \
} while (0)

#define easy_array_free(pointer) do { \
    free((pointer)->data); \
    (pointer)->data = NULL; \
    (pointer)->count = 0; \
} while (0)

// declare the array types somewhere globally:
easy_declare_array_type(char);
easy_declare_array_type(int);

int main(void)
{
    easy_int_array thing = {};
    easy_array_append(&thing, 100);
    easy_array_append(&thing, 200);
    easy_array_append(&thing, 300);
    easy_array_delete(&thing, 1);
    easy_array_insert(&thing, 1, 400);

    printf("There are %d items in the array.\n", thing.count);
    for (int i = 0; i < thing.count; i++) {
        printf("Item %d is %d\n", i, thing.data[i]);
    }
    easy_array_free(&thing);

    easy_char_array string = {};
    easy_array_append(&string, 'h');
    easy_array_append(&string, 'e');
    easy_array_append(&string, 'l');
    easy_array_append(&string, 'l');
    easy_array_append(&string, 'o');
    string.data[0] += 'A' - 'a';
    easy_array_append(&string, 0);
    printf("The string is '%s'\n", string.data);
    easy_array_free(&string);

    return 0;
}

If you compile this with gcc -o test test.c it does exactly what you'd expect:

There are 3 items in the array.
Item 0 is 100
Item 1 is 400
Item 2 is 300
The string is 'Hello'

So there's one set of functions that can work on arrays of any data type. Indeed, even if I didn't have my easy_string_array type already, I could just easy_declare_array_type(easy_string) and then there would be an easy_easy_string_array type available that I could use, and the same functions would allow me to insert or delete strings from the array. (That does require easy_string to be a typedef name, since the macro pastes the type into an identifier, so a two-word name like struct easy_string won't work there.) ...and, once you get past the macros at the top and into the main() function, the code is fairly high-level and doesn't require much explanation.

This is how easy C should be. I don't know why it has existed for 50 years and still requires people to write their own memmove() and realloc() lines when doing something as basic as working with arrays. Indeed, we had the programming language BASIC a very long time ago and it had this problem solved, and so has every language that has come since, yet even with the very limited functionality of C's string library, we still don't have functions that will guarantee that the result always has a terminating null character, never mind doing anything as advanced as resizing the string's buffer if necessary.
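To make the terminating-null complaint concrete: strncpy() writes no terminator at all when the source is at least as long as the size it is given, so the usual workaround is to lean on snprintf(), which always terminates as long as the size is nonzero. Here is a minimal sketch, where copy_terminated is a hypothetical helper name.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: copy source into destination, always leaving a
   terminating '\0'. Returns 1 if the whole string fit, 0 if it had to
   be truncated (or if snprintf reported an error). */
int copy_terminated(char *destination, size_t destination_size, const char *source)
{
    assert(destination != NULL && source != NULL && destination_size > 0);

    /* snprintf returns the length the full result would have had. */
    int written = snprintf(destination, destination_size, "%s", source);
    return written >= 0 && (size_t)written < destination_size;
}
```

It still can't grow the buffer, which is the part that needs the length and allocation size to travel with the pointer.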

Anyway, as cool as this seems, it still seems very incomplete to me. There's no accounting for two dimensional arrays, for example, other than doing this:

easy_declare_array_type(int);
easy_declare_array_type(easy_int_array);

int main(void)
{
    easy_int_array one_d_a = {};
    easy_array_append(&one_d_a, 1);
    easy_array_append(&one_d_a, 2);
    easy_array_append(&one_d_a, 3);

    easy_int_array one_d_b = {};
    easy_array_append(&one_d_b, 10);
    easy_array_append(&one_d_b, 20);
    easy_array_append(&one_d_b, 30);

    easy_easy_int_array_array two_d = {};
    easy_array_append(&two_d, one_d_a);
    easy_array_append(&two_d, one_d_b);

    for (int y = 0; y < two_d.count; y++) {
        for (int x = 0; x < two_d.data[y].count; x++) {
            printf("Item (%d, %d) is %d\n", x, y, two_d.data[y].data[x]);
        }
    }

    return 0;
}

I don't think this is a syntax I'd enjoy, but it does demonstrate the power of these macros to make an array out of anything, even another array.

Item (0, 0) is 1
Item (1, 0) is 2
Item (2, 0) is 3
Item (0, 1) is 10
Item (1, 1) is 20
Item (2, 1) is 30
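One wrinkle with the nested version is cleanup. Appending one_d_a to two_d copies the structure by value, so two_d.data[0].data and one_d_a.data end up pointing at the same buffer, and each row has to be freed exactly once. The sketch below shows one way to do that through the outer array. free_two_d is a hypothetical helper, and the typedefs are written out by hand in the shape the declaration macro would produce so that the example stands alone.

```c
#include <assert.h>
#include <stdlib.h>

/* Written out by hand; these match what easy_declare_array_type(int) and
   easy_declare_array_type(easy_int_array) would generate. */
typedef struct easy_int_array {
    int count;
    int *data;
} easy_int_array;

typedef struct easy_easy_int_array_array {
    int count;
    easy_int_array *data;
} easy_easy_int_array_array;

/* Hypothetical helper: free every row, then the outer array itself.
   After this, any easy_int_array that was appended by value (such as
   one_d_a) holds a dangling pointer and must not be freed again. */
void free_two_d(easy_easy_int_array_array *two_d)
{
    assert(two_d != NULL);
    for (int y = 0; y < two_d->count; y++) {
        free(two_d->data[y].data);
        two_d->data[y].data = NULL;
        two_d->data[y].count = 0;
    }
    free(two_d->data);
    two_d->data = NULL;
    two_d->count = 0;
}
```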

So, while I think this idea still needs a lot of work, I feel like I'm on to an interesting idea here.
