Tiny Asm
I’ve finished writing the Tiny CPU assembler, and it works. It took about four hours across two nights to get something with basic functionality. The curious can take a look at the assembler source code for details.
I don’t have much experience with writing these kinds of tools, so my parser is a little ugly. It goes line by line, ignoring whitespace and comments, until it finds a line beginning with a token. This token must be either an instruction mnemonic or a label. If it’s a mnemonic, a few additional checks determine the operand and address mode, and then a table lookup determines the opcode value for that instruction and address mode combination. If it’s a label, its address is stored, and all pending references to that label are resolved. Anonymous forward and backward labels are also supported.
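To make that flow concrete, here is a minimal Python sketch of the same idea: one pass over the lines, a table lookup keyed on mnemonic and address mode, and a list of pending references that gets patched when a label is finally defined. It is not the real assembler source. The mnemonics, opcode values, `#` immediate syntax, `;` comments, `name:` label syntax, and one-byte addresses are all assumptions made up for illustration, and anonymous labels are left out.

```python
# Hypothetical opcode table: (mnemonic, address mode) -> opcode byte.
# These values are invented for the example, not Tiny CPU's real encodings.
OPCODES = {
    ("NOP", "none"): 0x00,
    ("LDA", "imm"): 0x01,
    ("LDA", "abs"): 0x02,
    ("JMP", "abs"): 0x03,
}


def assemble(source):
    """Assemble source text into bytes, assuming one-byte addresses."""
    code = bytearray()
    labels = {}   # label name -> address
    pending = {}  # label name -> offsets in `code` waiting for that address

    for raw in source.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        parts = line.split(None, 1)
        token = parts[0]
        operand = parts[1].strip() if len(parts) > 1 else ""

        if token.endswith(":"):
            # Label definition: record its address, then patch every
            # earlier reference that was waiting on it.
            name = token[:-1]
            labels[name] = len(code)
            for offset in pending.pop(name, []):
                code[offset] = labels[name]
            continue

        # Instruction: work out the address mode from the operand's shape.
        mnemonic = token.upper()
        if not operand:
            mode = "none"
        elif operand.startswith("#"):
            mode = "imm"
        else:
            mode = "abs"
        if (mnemonic, mode) not in OPCODES:
            raise ValueError(f"bad instruction: {line!r}")
        code.append(OPCODES[(mnemonic, mode)])

        if mode == "imm":
            code.append(int(operand[1:], 0) & 0xFF)
        elif mode == "abs":
            if operand[0].isdigit():
                code.append(int(operand, 0) & 0xFF)
            elif operand in labels:
                code.append(labels[operand])       # backward reference
            else:
                pending.setdefault(operand, []).append(len(code))
                code.append(0)                     # placeholder, patched later

    if pending:
        raise ValueError(f"undefined labels: {sorted(pending)}")
    return bytes(code)
```

Feeding it a short program with one forward reference (`JMP end`) and one backward reference (`JMP start`) exercises both paths: the forward reference is emitted as a zero placeholder and overwritten once `end:` is reached.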
It would be nice to add features like named constants, conditional compilation, and macros. The assembler also lacks directives for setting the assembly address, or for embedding constant data like tables and strings. I’ll add some of those features later, as the need arises.
5 Comments so far
Have you thought about adding a backend to gas — the GNU assembler? That way you’d get all the parsing for free. My apologies if you’ve already considered this and dismissed it.
Here’s a pointer to an old gas internals manual:
http://sunsite.ualberta.ca/Documentation/Gnu/binutils-2.9.1/html_chapter/internals_4.html#SEC11
I wasn’t aware of the GNU assembler– doh!
Taking a quick look at it now, it seems fairly complicated to set up. Maybe it would be clearer if I just took an existing CPU config and modified it, but the docs for adding a new CPU were not especially comprehensible.
I’m a little dubious of the whole idea of a generic assembler engine that can be configured to support any syntax and CPU. Sounds a bit like a generic compiler that can be configured to compile C++, Fortran, Perl, or Lisp. I have to guess that many things that should be simple are made more complex by needing to express them in a super-generic way.
I’ll definitely give gas another look when I start running into the limits of my home-made assembler.
Agh, I didn’t mean to be so negative regarding the GNU assembler– mostly just concerned that a big, general solution might not be as good as a small, simple one. I’ll definitely check it out though.
I looked further at the GNU assembler, and got scared. It’s pretty complicated. Just the table of contents for the documentation is hefty. Most of the complexity relates to features that would be very important to a “real” assembler, like relocatable code and complex object file formats, but which are an unnecessary complication given the small needs of Tiny CPU. I don’t think I’d have fun trying to retarget gas for Tiny CPU, then.
Looking at this did give me an appreciation of just how much more complicated even a “simple” assembler can get, though. A few things I hadn’t considered when I wrote my original feature wish-list above:
– label references can involve expressions, like LOAD DEST+OFF-1
– macros need to take parameters: they’re not just simple text substitution
– macros can reference other macros
– conditional assembly can be used inside macros, testing the parameters
– negative constants need support
– assembly of constant data is more than just string or integer literals
It also made me realize that my naive if..else parsing code probably won’t hold up for more complex use, and I’ll need to do some kind of context-free grammar based parsing.
I’m not too worried about this, since I’ll probably never add half those features anyway, but at least now I have a better picture of how complex assembly can be.
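For a sense of scale, the first and fifth items on that list (expressions such as DEST+OFF-1, and negative constants) can be covered by a small recursive-descent evaluator, which is one hand-written way of parsing a context-free grammar. The Python sketch below is only an illustration under assumptions: the grammar, the `evaluate` function, and the `symbols` dictionary of already-known label addresses are hypothetical, and forward references inside expressions are not handled.

```python
import re

# One token: a hex or decimal number, an identifier, or an operator/paren.
TOKEN_RE = re.compile(r"\s*(0[xX][0-9a-fA-F]+|\d+|[A-Za-z_]\w*|[-+*()])")


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if not match:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(text, symbols):
    """Evaluate an operand expression using known label/constant values."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        pos += 1
        return tokens[pos - 1]

    def expr():    # expr := term (('+' | '-') term)*
        value = term()
        while peek() in ("+", "-"):
            if take() == "+":
                value += term()
            else:
                value -= term()
        return value

    def term():    # term := factor ('*' factor)*
        value = factor()
        while peek() == "*":
            take()
            value *= factor()
        return value

    def factor():  # factor := '-' factor | '(' expr ')' | number | symbol
        tok = take()
        if tok == "-":
            return -factor()
        if tok == "(":
            value = expr()
            if take() != ")":
                raise ValueError("expected ')'")
            return value
        if tok[0].isdigit():
            return int(tok, 0)
        if tok in symbols:
            return symbols[tok]
        raise ValueError(f"undefined symbol or unexpected token: {tok!r}")

    result = expr()
    if pos != len(tokens):
        raise ValueError(f"unexpected token: {tokens[pos]!r}")
    return result
```

With `symbols = {"DEST": 0x40, "OFF": 3}`, the call `evaluate("DEST+OFF-1", symbols)` walks the grammar left to right and yields 0x42. Each grammar rule maps to one small function, so adding an operator means adding or extending one rule instead of threading another case through a chain of if..else checks.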
Have you considered using lex + yacc (or bison) instead, and building a grammar for your tiny asm? Such a grammar would be really simple, and you would get features like label and constant expressions in no time. For macros you could use an external macro processor.