I eventually realized that, since I don't care about the actual distances but only about what is further than what, I don't actually need the sqrt() at all: comparing the squared values works just as well.
That allowed me to reduce it to purely integer math, which cut the C version down from 16 ms to 1.7 ms.
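To illustrate the idea, here's a minimal sketch of comparing distances without sqrt(). This is not my actual code: the struct point type and the dist_sq and is_farther names are made up for the example, and I'm assuming 3D integer coordinates. It works because sqrt() is monotonic for non-negative values, so a < b exactly when sqrt(a) < sqrt(b).

```c
#include <stdint.h>

/* Hypothetical integer 3D point, assumed for this example only. */
struct point { int32_t x, y, z; };

/* Squared Euclidean distance between a and b, in pure integer math.
 * Widening to 64 bits keeps the squares from overflowing. */
static int64_t dist_sq(struct point a, struct point b)
{
    int64_t dx = (int64_t)a.x - b.x;
    int64_t dy = (int64_t)a.y - b.y;
    int64_t dz = (int64_t)a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

/* Nonzero if p is further from origin than q is. No sqrt() needed,
 * since only the ordering of the two distances matters. */
static int is_farther(struct point origin, struct point p, struct point q)
{
    return dist_sq(origin, p) > dist_sq(origin, q);
}
```

A comparison function like this can be handed to a sort more or less as-is, and it never touches the floating point unit.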
So I did the same thing with my assembly code, but it still took 2.1 ms. Checking out GCC's code, I saw that it made an optimization by calculating dx outside of the inner loop. Clever GCC, I hadn't thought of that. So I adjusted my code accordingly. It still seems my code is a bit slower, but it's hard to judge since there's some variability in timing between runs. Averaging more trials doesn't help because for some reason the variability is constant per invocation of the test program. It'll do 100 runs of 2.1 ms, then I immediately run the program again and see it do 100 runs of 1.8 ms. It must be ending up in different memory or something, I guess.
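For anyone wondering what that dx optimization looks like, here's a rough sketch in C. The loop itself is hypothetical (count_within and its parameters are invented for the example, not taken from my code or from GCC's output), but it shows the same trick: dx only depends on the outer loop variable, so it and its square can be computed once per outer iteration instead of once per inner iteration.

```c
/* Counts the cells of a w-by-h grid that lie within `radius` of the
 * cell (cx, cz). Hypothetical example, using squared distances only. */
static int count_within(int w, int h, int cx, int cz, int radius)
{
    int r2 = radius * radius;
    int count = 0;

    for (int x = 0; x < w; x++) {
        /* dx depends only on x, so compute it (and its square)
         * here, outside of the inner loop. */
        int dx = x - cx;
        int dx2 = dx * dx;

        for (int z = 0; z < h; z++) {
            int dz = z - cz;
            if (dx2 + dz * dz <= r2)
                count++;
        }
    }
    return count;
}
```

The inner loop is left with one subtraction, one multiply, one add, and a compare per cell.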
Anyway, after spending hours upon hours debugging the assembly code I wrote for the sort function, I now have an assembly language version that runs in 63 ms vs. GCC's version which runs in 53 ms. I didn't even bother to examine GCC's code this time, as I've wasted enough hours on my assembly language version already. So fuck it. GCC wins, both in speed and in reduced effort in writing and debugging the code.
I just wish it would compile floating point stuff more efficiently than it does. I'm so tempted to create something which you put a set of formulas into and it generates an assembly language stream of floating point instructions that you can assemble to perform those calculations. It'd be, like, a compiler, but only for math.
As for the Minecraft server, I don't know that I want to blow 61 ms on this. I think I'll just keep thinking about it for a while and see if I can think of a better way to do it. (Like maybe writing a server for my own game, which doesn't have stupid lag problems, making all of this irrelevant since I can just send the block changes when they occur.)