Sarcasm Z80 Assembler
General Information
Sarcasm is a Z80 assembler written in Perl. Perhaps its best feature
is that it is totally awesome. Second to that would be... ...yes,
definitely the multiple instructions on a line separated by semicolons.
If that idea turns you off, you'd best go elsewhere because there's
a lot more where that came from. Can you say "pays more attention to
whitespace than commas" without having a seizure? It truly is an
awesome assembler, even if it seems that nineteen out of twenty people
are completely uninterested after reading this page.
Download
The current version, 2016-05-05, is available in both ZIP and TGZ formats, and is also available as a bunch of files.
Note: This latest release of Sarcasm contains a significant change to how labels are specified. (A colon is now required, whereas in previous versions, using a colon would have been a syntax error.) Because of this, if you are compiling old code and don't want to have to update it, you'll want to download the previous version. Note that this is essentially the only change in this release, so there's no reason to update if you don't feel like changing all of your code at the moment.
The previous version, 2014-12-03, is also available in both ZIP and TGZ formats, as well as a bunch of files.
Archival copies of even older releases are available in the download area just in case you feel like using something with known bugs.
Windows users will likely need something like Strawberry Perl so that they can execute Perl scripts. Linux and FreeBSD users can likely use Sarcasm without any additional software.
Documentation
Information
Sarcasm is what I call a "search and replace" assembler.
Many years ago I wanted to write an assembler, but the thought of writing hundreds of lines of code to compare each opcode to string constants, then dozens more to compare each operand to string constants, then bunches of code to calculate the bytes that represent that opcode and those operands, seemed totally unlike anything I was interested in doing. ...but eventually I had an idea: What if I just made a file that contained every possible instruction one might type, and the byte sequence that represents that instruction? Then the assembler could be dumb and just look up the answers in a table. That sounded like a much easier programming challenge, one which I might actually complete.
Sure enough, I did complete it, because now we have Sarcasm the Z80 Assembler. There's a file that comes with it named "opcodes.txt" which isn't simply a list of opcodes it accepts, but rather, it's actually a part of Sarcasm and tells it what byte sequences to generate for each instruction. Once Sarcasm has cleaned up the formatting of your source code, and accounted for all of your labels and any directives, it just does a search-and-replace on what's left over. If you type "ld a [label]" it finds a line that reads "3A ld a [xxxx]" in opcodes.txt and subsequently knows that, once it calculates the value of your label, it just needs to output byte "3A" followed by the two byte representation of your label and it's done.
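As a rough illustration of the table-lookup idea (written in Python rather than Sarcasm's own Perl, and not taken from its source), the sketch below loads a two-line table and assembles an instruction by swapping the operand for a placeholder. Only the "3A ld a [xxxx]" line is quoted from the text above; the second table line, the function names, and the exact file layout are assumptions made for the example.

```python
import re

# Hypothetical excerpt in the style of opcodes.txt. Only the first line is
# quoted in the text above; the second is an assumed entry for illustration.
OPCODES_TXT = """\
3A ld a [xxxx]
21 ld hl xxxx
"""


def load_table(text):
    """Map each instruction pattern to the opcode bytes that precede it."""
    table = {}
    for line in text.splitlines():
        hex_bytes, pattern = line.split(" ", 1)
        table[pattern] = bytes.fromhex(hex_bytes)
    return table


def assemble(statement, labels, table):
    """Swap 16-bit operands for 'xxxx', look the result up, append the values."""
    found = []

    def to_placeholder(match):
        word = match.group(0)
        if word in labels:              # a label: use its calculated address
            found.append(labels[word])
            return "xxxx"
        if word.startswith("$"):        # a hexadecimal literal
            found.append(int(word[1:], 16))
            return "xxxx"
        return word                     # mnemonic or register: leave alone

    pattern = re.sub(r"\$?\w+", to_placeholder, statement)
    code = table[pattern]
    for value in found:
        code += value.to_bytes(2, "little")  # Z80 stores words low byte first
    return code


TABLE = load_table(OPCODES_TXT)
```

With a label at $4000, `assemble("ld a [label]", {"label": 0x4000}, TABLE)` yields the bytes 3A 00 40. The sketch only handles 16-bit operands; byte operands and multi-byte opcodes would need more placeholder kinds.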
General Syntax
Sarcasm, like most non-ancient programming languages, uses the # symbol for comments. This frees up the ; symbol so that, also like most non-ancient programming languages, it can be used to separate multiple instructions on a single line.
However, unlike every other programming language, it pays no attention to commas whatsoever. This came about as I was writing the parser and somehow found looking for commas and spitting out error messages when they weren't present to be an insurmountable pain in the ass. So I decided there would be no commas. However, an unavoidable habit of typing commas in assembly code eventually forced me to make Sarcasm accept them, but they're treated as being no different from spaces; Sarcasm doesn't care whether you use them or where you put them as they simply aren't part of its syntax.
Sarcasm also uses square brackets [ and ] instead of parentheses ( and ) for dereferencing pointers. I can't say there's more reason to this other than that it's what I'm used to from having used NASM for many years, and parentheses are for math. Of course, Sarcasm won't recognize parentheses in math, which makes it all the more strange that it also won't recognize them for dereferencing pointers, but if you continue reading you'll realize this is relatively meaningless in the clusterfuck of how Sarcasm isn't like other Z80 assemblers.
Labels are declared with a : suffix, which functions just like a ; but which also indicates that what precedes it is a label. There's much more information about labels in the section documenting the namespace directive.
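The syntax rules in this section can be sketched as a small line splitter. This is a Python illustration under stated assumptions, not Sarcasm's actual parser: the function name is made up, and the handling of quoted strings (so that a # or ; inside a text string is left alone) is an assumption based on the example strings used elsewhere on this page.

```python
def split_statements(line):
    """Split one source line into statements, each a list of tokens."""
    statements, tokens, token, quote = [], [], "", None

    def end_token():
        nonlocal token
        if token:
            tokens.append(token)
        token = ""

    def end_statement():
        nonlocal tokens
        end_token()
        if tokens:
            statements.append(tokens)
        tokens = []

    for ch in line:
        if quote:                 # inside a string: copy everything verbatim
            token += ch
            if ch == quote:
                quote = None
        elif ch in "\"'":         # opening quotation mark
            quote = ch
            token += ch
        elif ch == "#":           # comment runs to the end of the line
            break
        elif ch in " \t,":        # commas are treated exactly like spaces
            end_token()
        elif ch == ";":           # statement separator
            end_statement()
        elif ch == ":":           # label suffix, also ends the statement
            token += ch
            end_statement()
        else:
            token += ch
    end_statement()
    return statements
```

For example, `split_statements('ld a, $00; out port_a # comment')` returns `[['ld', 'a', '$00'], ['out', 'port_a']]`, the same result as with the comma left out.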
Directives
range [name] [lowest address] [highest address]
This directive allows you to define and name a range of memory. Memory ranges allow you to tell Sarcasm where code and data will exist in Z80 memory. They also allow Sarcasm to warn you if you add too much code or data and end up outside of the range of memory you wanted your code or data to exist in.
Examples:

    range rom $0000 $7FFF
    range ram $8000 $FFFF

...or perhaps something more elaborate...

    range start $0000 $0037
    range int $0038 $0065
    range nmi $0066 $007F
    range code $0080 $3FFF
    range data $4000 $7FFF
    range ram $8000 $EFFF
    range stack $F000 $FFFF
section [name]
This directive tells Sarcasm which memory range you want the following code or data to be assembled into. Sarcasm maintains a separate "address pointer" for each memory range so that you can switch back and forth between them without accidentally overwriting code previously added to the section.
Example:

    range code $0080 $3FFF
    range data $4000 $7FFF

    section code
    ld hl message_1
    call print_message

    section data
    message_1: data "This is message #1." $00

    section code
    ld hl message_2
    call print_message

    section data
    message_2: data "This is message #2." $00

    section code
    print_message: # dummy label for non-existent function

    output test.rom $0000 $7FFF

In the above example, despite being interspersed within the code instructions in the source file, the message strings will be separated into the 'data' section of the generated ROM file, and the two pieces of code within the 'code' section will be contiguous, as shown in this hex code dump:

    $ hexdump -C test.rom
    00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
    *
    00000080  21 00 40 cd 8c 00 21 14  40 cd 8c 00 00 00 00 00  |!.@...!.@.......|
    00000090  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
    *
    00004000  54 68 69 73 20 69 73 20  6d 65 73 73 61 67 65 20  |This is message |
    00004010  23 31 2e 00 54 68 69 73  20 69 73 20 6d 65 73 73  |#1..This is mess|
    00004020  61 67 65 20 23 32 2e 00  00 00 00 00 00 00 00 00  |age #2..........|
    00004030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
    *
    00008000

At this point you probably feel like you know everything there is to know about how to use Sarcasm, but keep reading as there are seven more directives!
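The per-range "address pointer" bookkeeping behind range and section can be sketched as follows. This is a Python illustration with made-up class and method names, not Sarcasm's code; it raises an error on overflow where Sarcasm is described as warning, and the instruction sizes (3 bytes each for ld hl and call, 20 bytes per message) are read off the hex dump of the section example.

```python
class Ranges:
    """Tracks one write pointer per named memory range."""

    def __init__(self):
        self.bounds = {}     # name -> (lowest, highest), both inclusive
        self.pointer = {}    # name -> next address to be written
        self.current = None  # name of the active section

    def range(self, name, lowest, highest):
        self.bounds[name] = (lowest, highest)
        self.pointer[name] = lowest

    def section(self, name):
        self.current = name

    def emit(self, size):
        """Reserve `size` bytes in the active section; return their address."""
        start = self.pointer[self.current]
        highest = self.bounds[self.current][1]
        if start + size - 1 > highest:
            raise OverflowError(f"range '{self.current}' is full")
        self.pointer[self.current] = start + size
        return start


# Replaying the section example: 6 bytes of code, then a 20-byte message, twice.
r = Ranges()
r.range("code", 0x0080, 0x3FFF)
r.range("data", 0x4000, 0x7FFF)
r.section("code"); r.emit(6)            # ld hl message_1 / call print_message
r.section("data"); message_1 = r.emit(20)
r.section("code"); r.emit(6)            # ld hl message_2 / call print_message
r.section("data"); message_2 = r.emit(20)
r.section("code"); print_message = r.emit(0)
```

The resulting addresses ($4000, $4014, and $008C) agree with the operands visible in the hex dump (21 00 40, 21 14 40, and cd 8c 00).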
output [filename] [lowest address] [highest address]
Sarcasm internally keeps a 64 kB buffer which it assembles all instructions and data into. This directive tells it to write a portion of that memory to a file. You can use this directive multiple times to create multiple output files. The data written to the file is only what is in that 64 kB memory buffer at the time this directive is encountered, and so you generally want this directive to appear as the last line of your source code. If Sarcasm never encounters an output directive, it displays an error message.
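In other words, output copies an inclusive address range out of the memory image. A minimal Python sketch, assuming a zero-filled buffer (which is what the hex dumps on this page suggest) and using hypothetical function names:

```python
MEMORY = bytearray(0x10000)  # the 64 kB image, assumed to start out as zeros


def output_slice(memory, lowest, highest):
    """Return the bytes from `lowest` to `highest`, both addresses inclusive."""
    return bytes(memory[lowest:highest + 1])


def write_output(memory, filename, lowest, highest):
    """Write the selected portion of the image to a file."""
    with open(filename, "wb") as handle:
        handle.write(output_slice(memory, lowest, highest))
```

A range of $0000 to $7FFF therefore produces a file of $8000 bytes, which is why the hex dumps on this page end at offset 00008000.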
goto [address]
This directive simply changes the "address pointer" that Sarcasm writes code or data to within the current range/section. The specified address must be within the current range/section, otherwise an error message is generated.
Example:

    range test $0080 $3FFF
    section test

    data "one"
    goto $1000; data "two"
    goto $0800; data "three"

    output test.rom $0000 $7FFF

...which outputs this ROM file...

    $ hexdump -C test.rom
    00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
    *
    00000080  6f 6e 65 00 00 00 00 00  00 00 00 00 00 00 00 00  |one.............|
    00000090  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
    *
    00000800  74 68 72 65 65 00 00 00  00 00 00 00 00 00 00 00  |three...........|
    00000810  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
    *
    00001000  74 77 6f 00 00 00 00 00  00 00 00 00 00 00 00 00  |two.............|
    00001010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
    *
    00008000
bytes [list of byte values]
words [list of word values]
These directives insert binary numbers into the assembly output. The bytes directive accepts only bytes and the words directive accepts only words. Each may be decimal numbers, hexadecimal numbers (prefixed with a $ symbol), code labels, or simple arithmetic (addition and subtraction) involving any of those types of numbers.

Example:

    range test $4000 $7FFF
    section test

    # Remember, commas are unnecessary and are effectively spaces,
    # and are included only to make the statements easier to read.
    bytes 1, 2, 1+2
    words $4567, random_label, random_label+$10

    random_label: data "random_label is here"

    output test.rom $0000 $7FFF

...which outputs this ROM file...

    $ hexdump -C test.rom
    00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
    *
    00004000  01 02 03 67 45 09 40 19  40 72 61 6e 64 6f 6d 5f  |...gE.@.@random_|
    00004010  6c 61 62 65 6c 20 69 73  20 68 65 72 65 00 00 00  |label is here...|
    00004020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
    *
    00008000
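The encoding used by bytes and words can be sketched like this. It is a Python illustration, not Sarcasm's implementation: the function names are hypothetical, and the evaluator supports only what the text describes (decimal numbers, $-prefixed hexadecimal, labels, and + or - between them). Words are stored low byte first, as the hex dump of the example shows ($4567 becomes 67 45).

```python
import re


def evaluate(expr, labels):
    """Evaluate decimal, $hex, and label terms joined by + or -."""
    total, sign = 0, 1
    for part in re.findall(r"[+-]|[^+-]+", expr):
        if part == "+":
            sign = 1
        elif part == "-":
            sign = -1
        else:
            if part.startswith("$"):
                value = int(part[1:], 16)
            elif part.isdigit():
                value = int(part)
            else:
                value = labels[part]    # a code label
            total += sign * value
    return total


def encode_bytes(items, labels):
    """One byte per item; bytes() rejects values outside 0..255."""
    return bytes(evaluate(item, labels) for item in items)


def encode_words(items, labels):
    """Two bytes per item, low byte first."""
    return b"".join(evaluate(item, labels).to_bytes(2, "little") for item in items)
```

With `random_label` at $4009 (three bytes plus three words past $4000), `encode_words(["$4567", "random_label", "random_label+$10"], labels)` gives 67 45 09 40 19 40, the same bytes as in the dump.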
data [list of data items]
This directive inserts binary data into the assembly output. It supports four data types: bytes, words, text strings, and byte strings. To differentiate between bytes and words, each is required to be exactly two or four hexadecimal digits in length. Text strings may be enclosed in single or double quotation marks. Byte strings are a string of hexadecimal digits prefixed with a ! symbol, an even number of digits in length, which will be stored in "big endian" order, a.k.a. the order in which you type the various bytes within the string.

Example:

    range test $4000 $7FFF
    section test

    # The data directive allows easy mixing of data types:
    data $01 $0203 "Four" !12345678 !DECADE

    # Text strings can be in multiple formats:
    data "This is a text string."
    data 'This is also a string.'

    output test.rom $0000 $7FFF

...which outputs this ROM file...

    $ hexdump -C test.rom
    00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
    *
    00004000  01 03 02 46 6f 75 72 12  34 56 78 de ca de 54 68  |...Four.4Vx...Th|
    00004010  69 73 20 69 73 20 61 20  74 65 78 74 20 73 74 72  |is is a text str|
    00004020  69 6e 67 2e 54 68 69 73  20 69 73 20 61 6c 73 6f  |ing.This is also|
    00004030  20 61 20 73 74 72 69 6e  67 2e 00 00 00 00 00 00  | a string.......|
    00004040  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
    *
    00008000
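The four data types can be told apart by their first character and length. The Python sketch below shows one way to do that; the function name is hypothetical and it covers only the four item types described for the data directive. Note that text strings get no terminator added, which matches the dump ("Four" is followed directly by 12).

```python
def encode_item(item):
    """Encode one item of a data directive into bytes."""
    if item[0] in "\"'":                 # text string in either quote style
        return item[1:-1].encode("ascii")
    if item.startswith("!"):             # byte string: kept in typed order
        return bytes.fromhex(item[1:])
    if item.startswith("$"):
        digits = item[1:]
        if len(digits) == 2:             # exactly two digits: a byte
            return bytes.fromhex(digits)
        if len(digits) == 4:             # exactly four digits: a word, low byte first
            return int(digits, 16).to_bytes(2, "little")
    raise ValueError(f"unrecognized data item: {item}")
```

Joining the results for `$01 $0203 "Four" !12345678 !DECADE` gives 01 03 02 46 6f 75 72 12 34 56 78 de ca de, the first fourteen bytes at $4000 in the dump.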
replace [name] [replacement text]
This directive replaces "name" with the "replacement text" every time it is encountered in the source file. This allows you to create named constants, so that frequently used numbers can be easily changed simply by changing one line in the source code rather than dozens.
Unfortunately, these statements are parsed only after Sarcasm has reformatted all of the code for easy parsing, and so you can't do any arithmetic with these substitutions as Sarcasm won't understand what it sees, and correcting this seems beyond my ability at the moment. So you'll have to define a name for each constant, like this...

    range test $0080 $3FFF

    replace port_a $60
    replace port_b $61
    replace port_c $62

    section test
    ld a $00; out port_a
    ld a $20; out port_b
    ld a $FC; out port_c

    output test.rom $0000 $7FFF

...which really isn't a big deal in my opinion. It'd be nice if we could just define "port" as "$60" and then use "port+1" and "port+2" for the other two ports, but defining three separate constants is more in keeping with the purpose of named constants anyway. What if port_b becomes "port+5" in the future? Do you want to search your code for instances of "+1" and replace them with "+5" or do you want to just change the definition of "port_b" and be done?
namespace [name]
This directive is perhaps the most difficult to explain, largely because Sarcasm features two levels of local variables.
In most assemblers, you'll create labels, and those labels will be accessible only within the source file you create them in, unless you export the labels with an "export" directive and then import them in another file with a "global" directive. Sarcasm similarly allows one source file to avoid polluting the namespace of another source file, but without the "export" and "global" directives.
In Sarcasm, each source file exists in its own namespace. By default, the name of this namespace is the name of the source file minus its file extension. If, for example, you have a file named "tree.asm" and it contains a label "leaf", you can access that label from other source files by typing "tree.leaf" so that Sarcasm knows to look in the "tree" namespace for the "leaf" label.
However, if you aren't happy with this default name for the namespace, you can specify your own with the namespace directive. Additionally, if you want, you can have multiple namespaces within a single file, and switch back and forth between them as much as you like.
...but we're not done yet. Labels in Sarcasm are kind of complex. It's a good thing, as assembly language is full of so many labels and having to rename them all every time you copy and paste a loop gets old real quick, and these complex label features help with that, but they're hard to explain.
Perhaps the best I can do is with this example code:

    range test $0000 $7FFF
    section test

    namespace one
    apple: data "apple in one"
    peach: data "peach in one"

    namespace two
    apple: data "apple in two"
    peach: data "peach in two"

    # The identical labels are allowed because each is in a separate namespace.
    # When you use a label, the current namespace is used if one is not specified:

    namespace one
    ld hl apple # loads address of "apple in one"
    ld hl one.apple # loads address of "apple in one"
    ld hl two.apple # loads address of "apple in two"

    # However, to complicate things further, there are also sub-labels:

    namespace three
    apple:
    .skin; data "apple.skin in three"
    .core; data "apple.core in three"
    pear:
    .skin; data "pear.skin in three"
    .core; data "pear.core in three"

    # Sub labels can only be accessed in short form until a new label is declared.
    ld hl .skin # loads address of "pear.skin in three"
    ld hl .core # loads address of "pear.core in three"

    orange:
    .skin; data "orange.skin in three"

    # Once you declare a new label, you need to specify sub-labels as such:
    ld hl apple.skin # loads address of "apple.skin in three"

    # ...or if you are in a different namespace...
    namespace four
    ld hl three.apple.skin # loads address of "apple.skin in three"

    # So you might ask, what if you have this:
    namespace xxx
    yyy: data "xxx.yyy"
    .zzz; data "xxx.yyy.zzz"

    namespace whatever
    xxx: data "whatever.xxx"
    .yyy; data "whatever.xxx.yyy"

    # ...and then you do something like this...
    ld hl xxx.yyy # Is that 'yyy' in the 'xxx' namespace, or
                  # is it 'xxx.yyy' in the 'whatever' namespace?

    # Well, the answer is that it is "whatever.xxx.yyy" so long as you are in
    # the 'whatever' namespace, as Sarcasm prefers the more local match.

Well, I hope that explains it anyway.
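The "more local match wins" rule can be written down as an ordered list of candidates. The Python sketch below is only an interpretation of the example code: the function name, the flat symbol table keyed by full dotted path, and the exact lookup order are assumptions, not a description of Sarcasm's internals.

```python
def resolve(name, namespace, last_label, symbols):
    """Look up a label reference, trying the most local interpretation first.

    `symbols` maps fully qualified names such as 'three.apple.skin' to addresses.
    """
    if name.startswith("."):
        # Short-form sub-label: belongs to the most recently declared label.
        candidates = [f"{namespace}.{last_label}{name}"]
    else:
        # First assume the name lives in the current namespace,
        # then fall back to treating it as fully qualified.
        candidates = [f"{namespace}.{name}", name]
    for candidate in candidates:
        if candidate in symbols:
            return symbols[candidate]
    raise KeyError(name)


# Hypothetical addresses for the ambiguous labels from the example.
SYMBOLS = {
    "xxx.yyy": 0x0000,
    "xxx.yyy.zzz": 0x0007,
    "whatever.xxx": 0x0012,
    "whatever.xxx.yyy": 0x001E,
}
```

Here `resolve("xxx.yyy", "whatever", "xxx", SYMBOLS)` returns the address of whatever.xxx.yyy, while the same reference made from any other namespace falls through to xxx.yyy.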
Opcodes
Sarcasm's opcodes aren't identical to the typical Z80 opcode syntax you'll
see elsewhere on the internet. I'll try to document all of the
differences here.
IX, IY, IXL, IXH, IYL, IYH
The Z80 had such a nice scheme going on with its register names until those undocumented registers came along. Byte registers were one letter and word registers were two letters. Now we have three letter registers that aren't 24-bit registers? ...and it was so cool that the two 8-bit halves of HL were H and L.
IX in Sarcasm's syntax is ST, and its 8-bit halves are S and T
IY in Sarcasm's syntax is UV, and its 8-bit halves are U and V
RST, BIT, SET, RES, and IM
The way that Sarcasm works doesn't allow for instructions to include numbers as operands when those numbers don't become bytes in the machine code. For this reason, these instructions have a different format in Sarcasm.
RST $38 in Sarcasm's syntax is rst38
BIT 3, A in Sarcasm's syntax is bit3 a
SET 3, (IX+7), A in Sarcasm's syntax is set3 a [st+7]
IM 1 in Sarcasm's syntax is im1
EX AF, AF'
As Sarcasm uses the apostrophe as a quotation mark, having a single quote in an instruction just isn't possible. As such, this instruction in Sarcasm syntax is merely ex af with no second operand.
JP (HL)
This instruction's syntax is just the result of someone's confusion. As written in typical Z80 assembly syntax, one would think that it reads an address from the memory pointed to by HL and then jumps to that address, but in reality it simply jumps to the address stored in HL. To avoid this unnecessary confusion, in Sarcasm the syntax of this instruction is simply jp hl so that it looks like what it does.
RLC, RRC, RL, RR, RLCA, RRCA, RLA, RRA
These just confuse the fuck out of me. I'm far too used to seeing "C" as a symbol for the carry flag, but the ones with "C" in them are the ones that don't rotate through the carry flag. Then it becomes even more confusing when you realize that the "C" stands for "circular" and so you start thinking that the opcodes without a "C" in the name don't rotate the bits but instead simply shift them.
I think Intel 8086 assembly named these instructions much better, and so I've adopted those opcode names for Sarcasm:
RLC and RLCA in Sarcasm's syntax are rol, a.k.a. "rotate left"
RRC and RRCA in Sarcasm's syntax are ror, a.k.a. "rotate right"
RL and RLA in Sarcasm's syntax are rcl, a.k.a. "rotate carry left"
RR and RRA in Sarcasm's syntax are rcr, a.k.a. "rotate carry right"
As for the versions with "A" appended on the end, which generate one-byte opcodes instead of two-byte opcodes, in Sarcasm you just use the instruction without an operand to generate those.
CPL
I just couldn't remember CPL for the life of me. So I swapped it out with the 8086 opcode name.
CPL in Sarcasm's syntax is not, like the logic gate.
SCF, CCF
"Set carry flag" and "clear carry flag?" No, fuck you, it's "complement carry flag." Bullshit like this causes me to waste days debugging code, so I went with the less ambiguous 8086 opcodes:
SCF in Sarcasm's syntax is stc, "set carry"
CCF in Sarcasm's syntax is cmc, "complement carry"
Tossing that "m" in there makes it so much less ambiguous.
SLA, SRA, SLL, SRL
Well, fuck if SLL isn't a useless instruction made to resemble a useful instruction.
SLA in Sarcasm's syntax is SHL a.k.a. "shift left"
SRL in Sarcasm's syntax is SHR a.k.a. "shift right"
These two instructions are useful for scaling unsigned numbers.
SLA in Sarcasm's syntax is also SAL a.k.a. "shift arithmetic left"
SRA in Sarcasm's syntax is SAR a.k.a. "shift arithmetic right"
These two instructions are useful for scaling signed numbers.
Finally, SLL in Sarcasm's syntax is SIL a.k.a. "shift illogically left", to reflect the fact that what it does doesn't make a damn bit of sense and so you probably shouldn't be using it.
ADD, ADC, SUB, SBC, AND, XOR, OR, CP
The 8-bit versions of these instructions take only one operand, as the destination register is always the A register. Seeing a register as an operand always causes me to assume I can specify a different register, which just wastes time: I rewrite the code, attempt to compile, am reminded that I don't actually have a choice, and then have to restore the code to its original version.
IN, OUT
Similarly, you only get to choose one of the two operands to these instructions, so I dropped the non-optional operands.
IN A, $A5 in Sarcasm's syntax is in $A5
IN A, (C) in Sarcasm's syntax is in a
IN C, (C) in Sarcasm's syntax is in c
IN (C) in Sarcasm's syntax is in
OUT (C), 0 in Sarcasm's syntax is out
OTIR, OTDR
Tossing the U out of OUT just to avoid having a five letter opcode is dumb as fuck. Just like Perl's "elsif" which just makes me want to break someone's head open. Well, maybe not break someone's head open, but it definitely makes me want to cry. Why create countless typos just to avoid typing one fucking letter?
OTIR in Sarcasm's syntax is outir
OTDR in Sarcasm's syntax is outdr
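For anyone porting existing Z80 source, the plain mnemonic renames documented in this section can be collected in one lookup table. The Python sketch below is a convenience illustration with hypothetical names; it translates the mnemonic only and leaves operands alone, so the register renames, the bracket syntax, the fused forms such as rst38 and bit3, and the dropped operands of IN, OUT, and the 8-bit arithmetic instructions still have to be handled separately.

```python
# Standard Z80 mnemonic -> Sarcasm mnemonic, spelled as in the lists above.
RENAMES = {
    "RLC": "rol", "RLCA": "rol",
    "RRC": "ror", "RRCA": "ror",
    "RL": "rcl", "RLA": "rcl",
    "RR": "rcr", "RRA": "rcr",
    "CPL": "not",
    "SCF": "stc",
    "CCF": "cmc",
    "SLA": "SHL",   # SAL is documented as an alternative name
    "SRL": "SHR",
    "SRA": "SAR",
    "SLL": "SIL",
    "OTIR": "outir",
    "OTDR": "outdr",
}


def sarcasm_mnemonic(z80_mnemonic):
    """Return the Sarcasm name for a Z80 mnemonic, or the lowercased original."""
    return RENAMES.get(z80_mnemonic.upper(), z80_mnemonic.lower())
```

Mnemonics that are not in the table, such as LD or CALL, pass through unchanged apart from being lowercased to match the examples on this page.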
Contact Information
Feel free to send comments and questions to my email address.
As far as I know, only five people use Sarcasm, so it isn't as if I'm swamped with
email or anything. In fact, learning that someone else uses Sarcasm would make me
ecstatic. ...and might even convince me to make it even more awesome.
Also, since you're clearly interested in Z80 stuff, you might want to have a look at my Z80 EEPROM Programmer or my Z80 System Design as they're awesome too. Especially that EEPROM programmer, it's shit-your-pants awesome.