Today I noticed GCC using the fsqrt instruction in a function, rather than calling the library sqrt() function, which would surely just execute that same instruction anyway. The code was so much better optimized than what I remember seeing in the past that I felt inclined to write a simple function to see how well GCC generates code now.

So I wrote this function:

    #include <math.h>

    void whatever(double *x, double *y, double a) {
        double t;

        t = *x * cos(a) - *y * sin(a);
        *y = *x * sin(a) + *y * cos(a);
        *x = t;
    }

In case you're interested, these are the equations for rotating an (x, y) coordinate around the origin by the angle a. The temporary variable is necessary because the new x and the new y each depend on both of the original x and y values, so we can't overwrite either one until both results have been calculated.
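Written as a matrix product, with x' and y' standing for the new values, the two assignments are the standard 2-D rotation (a positive a turns counterclockwise). Rotating (1, 0) by a quarter turn is a quick sanity check:

```latex
\begin{pmatrix} x' \\ y' \end{pmatrix}
=
\begin{pmatrix} \cos a & -\sin a \\ \sin a & \cos a \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix},
\qquad
a = \frac{\pi}{2}:\quad
\begin{pmatrix} 1 \\ 0 \end{pmatrix} \mapsto
\begin{pmatrix} \cos\frac{\pi}{2} \\ \sin\frac{\pi}{2} \end{pmatrix}
=
\begin{pmatrix} 0 \\ 1 \end{pmatrix}.
```

Each row mixes both x and y, which is exactly why one of the new values has to wait in t.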

Here's the code that GCC generated for this function, in NASM syntax because GAS is fucking unreadable:

    0804a460 <whatever>:
    0804a460  55          push ebp
    0804a461  89E5        mov ebp,esp
    0804a463  56          push esi
    0804a464  53          push ebx
    0804a465  83EC20      sub esp,byte +0x20
    0804a468  8B5D08      mov ebx,[ebp+0x8]
    0804a46b  8D45F0      lea eax,[ebp-0x10]
    0804a46e  DD4510      fld qword [ebp+0x10]
    0804a471  8D55E8      lea edx,[ebp-0x18]
    0804a474  8B750C      mov esi,[ebp+0xc]
    0804a477  DD1C24      fstp qword [esp]
    0804a47a  8954240C    mov [esp+0xc],edx
    0804a47e  89442408    mov [esp+0x8],eax
    0804a482  E861EBFFFF  call dword 0x8048fe8 <sincos@plt>
    0804a487  DD45E8      fld qword [ebp-0x18]
    0804a48a  DD45F0      fld qword [ebp-0x10]
    0804a48d  DD03        fld qword [ebx]
    0804a48f  DD06        fld qword [esi]
    0804a491  D9C1        fld st1
    0804a493  D8CB        fmul st3
    0804a495  D9C4        fld st4
    0804a497  D8CA        fmul st2
    0804a499  DEC1        faddp st1
    0804a49b  DD1E        fstp qword [esi]
    0804a49d  D9CB        fxch st3
    0804a49f  DEC9        fmulp st1
    0804a4a1  D9C9        fxch st1
    0804a4a3  DECA        fmulp st2
    0804a4a5  DEE1        fsubrp st1
    0804a4a7  DD1B        fstp qword [ebx]
    0804a4a9  83C420      add esp,byte +0x20
    0804a4ac  5B          pop ebx
    0804a4ad  5E          pop esi
    0804a4ae  5D          pop ebp
    0804a4af  C3          ret

Wow...

On the one hand, I'm impressed that it figured out that all of the sin() and cos() calls use the same angle, and thus it only needs to calculate the values once. I'm also impressed that it realizes both values can be calculated at once, and calls a sincos() function to do so. It also doesn't need the temporary variable: the values have to be copied into the FPU stack anyway, so the original x and y values remain available there even as the copies in memory are overwritten with the results of the two equations. Even the sequence of instructions that solves the equations is nicely written.
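To check that, here is a rough C sketch that replays the FPU part of the listing (from fld qword [ebp-0x18] through the final fstp qword [ebx]) on a toy model of the x87 register stack. The Fpu type, its helpers and replay_listing are made up for this illustration only; the model uses plain doubles instead of the 80-bit register format, ignores the tag word and exceptions, and takes the sine and cosine as arguments instead of reading them back from the two locals that sincos() filled in.

```c
/* Toy model of the x87 register stack (hypothetical, for tracing only):
   r[n - 1] is st0; no 80-bit format, tag word, overflow check or exceptions. */
typedef struct {
    double r[8];
    int n;
} Fpu;

static double *st(Fpu *f, int i) { return &f->r[f->n - 1 - i]; }
static void push(Fpu *f, double v) { f->r[f->n++] = v; }
static double pop(Fpu *f) { return f->r[--f->n]; }
static void fxch(Fpu *f, int i) {
    double t = *st(f, 0);
    *st(f, 0) = *st(f, i);
    *st(f, i) = t;
}

/* Replays the FPU instructions of the listing. x and y stand for [ebx] and
   [esi]; s and c for the sine and cosine stored at [ebp-0x10] and [ebp-0x18]. */
void replay_listing(double *x, double *y, double s, double c) {
    Fpu f = { {0}, 0 };

    push(&f, c);                        /* fld qword [ebp-0x18]  st0 = cos     */
    push(&f, s);                        /* fld qword [ebp-0x10]  st0 = sin     */
    push(&f, *x);                       /* fld qword [ebx]       st0 = x       */
    push(&f, *y);                       /* fld qword [esi]       st0 = y       */
    push(&f, *st(&f, 1));               /* fld st1               st0 = x       */
    *st(&f, 0) *= *st(&f, 3);           /* fmul st3              st0 = x*sin   */
    push(&f, *st(&f, 4));               /* fld st4               st0 = cos     */
    *st(&f, 0) *= *st(&f, 2);           /* fmul st2              st0 = cos*y   */
    *st(&f, 1) += *st(&f, 0); pop(&f);  /* faddp st1             x*sin + cos*y */
    *y = pop(&f);                       /* fstp qword [esi]      new y stored  */
    fxch(&f, 3);                        /* fxch st3              st0 = cos     */
    *st(&f, 1) *= *st(&f, 0); pop(&f);  /* fmulp st1             st0 = x*cos   */
    fxch(&f, 1);                        /* fxch st1              st0 = sin     */
    *st(&f, 2) *= *st(&f, 0); pop(&f);  /* fmulp st2             st1 = y*sin   */
    *st(&f, 1) = *st(&f, 0) - *st(&f, 1); pop(&f); /* fsubrp st1 x*cos - y*sin */
    *x = pop(&f);                       /* fstp qword [ebx]      new x stored  */
}
```

Stepping through it shows the original x and y still sitting on the register stack after the new y has already been stored, which is why no temporary is needed.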

However, why the fuck is it calling sincos()? Ever since the 80387, the x87 FPU has had an fsincos instruction (alongside separate fsin and fcos) that does exactly the same thing, calculating both the sine and the cosine in one go. So why is it calling sincos()? It obviously expects that I have an FPU, since it has planted the call to this function in the middle of a bunch of FPU instructions, so it can't be worried that the instruction might not be available. So what the hell?

I can't get over how absurd this is. Here's what the above code does, written in plain English:

The code above first sets up the stack frame, as functions normally do on entry, and saves registers according to the calling convention. It then loads the angle into the FPU, and from there stores it onto the CPU stack. It also places two pointers to local variables onto the CPU stack. Then it calls sincos(). Inside sincos(), the usual stack manipulation occurs, and then the angle is loaded from the CPU stack back into the FPU. Then the fsincos instruction calculates the sine and cosine of that angle, and the two results are stored from the FPU stack through the two pointers that were passed to sincos(). Then sincos() undoes its stack manipulation and returns. Now the code above loads the sine and cosine back into the FPU stack from the local variables that sincos() stored them to. It then loads the x and y values from their pointers into the FPU stack. Next, the two equations are solved, and the results are written through the pointers given to the function as parameters. Finally, the stack setup is reversed, and the function returns.

The ridiculous thing about this is that the steps around the call are just the inverses of each other: the angle goes from the FPU to the CPU stack only to be loaded straight back into the FPU, the sine and cosine go from the FPU into memory only to be loaded straight back into the FPU, and sincos()'s stack setup exists only to be undone. All of that, the call included, could be replaced by a single fsincos instruction and the same fucking thing would happen, without a bunch of unnecessary movement of data. (The only difference is that the sine and cosine would stay in the FPU's 80-bit format instead of being rounded to double on their way through memory.)

...but, whatever. I always knew GCC was bad at compiling math. I'm just thrilled to see that it no longer calls a function every time I use sqrt().

...but GCC is pretty good at integer code. - Pj - 4/5/12

FORTRAN is the answer - crunge - 4/5/12

-ffast-math, fesetround(), and nearbyint() - Pj - 4/6/12

Again, I rescind my previous statement. - Pj - 12/19/12

It seems that awful cos() bug has been fixed. - Pj - 10/8/13