AMOS:Sourcecode file format
Article by Kyzer (added by Spellcoder)
Latest revision as of 00:11, 8 March 2008
AMOS source code is normally stored in a file with the extension ".AMOS". It begins with 16 bytes of ASCII text from the following list:
Text | Tested? | Saved from which AMOS? |
---|---|---|
"AMOS Pro101V\0\0\0\0" | Yes | AMOS Professional |
"AMOS Pro101v\0\0\0\0" | No | AMOS Professional |
"AMOS Basic V134 " | Yes | AMOS Pro, but AMOS 1.3 compatible |
"AMOS Basic v134 " | No | AMOS Pro, but AMOS 1.3 compatible |
"AMOS Basic V1.3 " | Yes | AMOS The Creator v1.3 |
"AMOS Basic v1.3 " | No | AMOS The Creator v1.3 |
"AMOS Basic V1.00" | Yes | AMOS The Creator v1.0 - v1.2 |
"AMOS Basic v1.00" | No | AMOS The Creator v1.0 - v1.2 |
As the table shows, the 12th character of the text (offset 11) is either "V", meaning "tested", or "v", meaning "not tested". "Tested" here means that the AMOS interpreter has syntax-checked every line of code and found no syntax errors. You can save AMOS source code to disk at any time, but you can only run or compile it once it has been tested.
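The header check can be sketched in C as follows. This is a minimal sketch, assuming the first 16 bytes of the file have already been read into memory. The enum and the function `amos_identify` are hypothetical names; only the header strings come from the table above.

```c
#include <string.h>

/* Hypothetical identifiers for the header strings in the table. */
enum amos_version {
  AMOS_UNKNOWN = 0,
  AMOS_PRO,    /* "AMOS Pro101V\0\0\0\0" */
  AMOS_PRO_13, /* "AMOS Basic V134 " */
  AMOS_13,     /* "AMOS Basic V1.3 " */
  AMOS_10_12   /* "AMOS Basic V1.00" */
};

/* Identifies the 16-byte header. Returns AMOS_UNKNOWN if it matches none of
   the known strings. If tested is non-NULL, *tested is set to 1 when the 12th
   character (offset 11) is 'V' and to 0 when it is 'v'. */
enum amos_version amos_identify(const unsigned char hdr[16], int *tested) {
  static const struct { const char *text; enum amos_version ver; } known[] = {
    { "AMOS Pro101V\0\0\0\0", AMOS_PRO },
    { "AMOS Basic V134 ", AMOS_PRO_13 },
    { "AMOS Basic V1.3 ", AMOS_13 },
    { "AMOS Basic V1.00", AMOS_10_12 },
  };
  unsigned char copy[16];
  size_t i;

  if (hdr[11] != 'V' && hdr[11] != 'v') return AMOS_UNKNOWN;

  /* normalise the tested flag so one entry covers both "V" and "v" */
  memcpy(copy, hdr, 16);
  copy[11] = 'V';

  for (i = 0; i < sizeof known / sizeof known[0]; i++) {
    if (memcmp(copy, known[i].text, 16) == 0) {
      if (tested) *tested = (hdr[11] == 'V');
      return known[i].ver;
    }
  }
  return AMOS_UNKNOWN;
}
```

Normalising offset 11 before comparing keeps the lookup table at four entries instead of eight.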
After the 16-byte header is a 32-bit (4-byte) unsigned integer, stored big-endian as is native to the Amiga's 68000 processor, giving the length in bytes of the tokenised BASIC code. The BASIC code itself follows immediately, for the length given.
Finally, after the BASIC code comes the 4-byte ASCII identifier "AmBs", followed by a 16-bit (2-byte) unsigned integer giving the number of memory banks that follow. The banks themselves come next, each individually sized. Each bank can be a sprite bank, an icon bank or a regular memory bank. There is no more data in the source code file after the banks.

If a sprite bank is present, it always occupies bank 1, and there must not be another sprite bank or a regular memory bank numbered 1. Likewise, if an icon bank is present, it always occupies bank 2, and there must not be another icon bank or a regular memory bank numbered 2.
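A minimal C sketch of walking this top-level layout follows. It assumes the whole file is in memory and that the integers are big-endian (the Amiga's native byte order, and the order used by the `EndGetM32` macro in the decryption code below). The names `struct amos_layout` and `amos_parse_layout` are hypothetical. The sketch stops at the start of the first bank, because the bank formats are not described here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical description of the top-level layout of a .AMOS file. */
struct amos_layout {
  const unsigned char *code;  /* start of the tokenised BASIC code */
  uint32_t code_len;          /* its length in bytes */
  unsigned num_banks;         /* bank count following the "AmBs" identifier */
  const unsigned char *banks; /* start of the first bank */
};

/* Reads a 32-bit big-endian unsigned integer. */
static uint32_t get_be32(const unsigned char *p) {
  return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
         ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns 0 on success, or -1 if the buffer is too short or "AmBs" is
   missing. The header itself is skipped, not validated. */
int amos_parse_layout(const unsigned char *buf, size_t len,
                      struct amos_layout *out) {
  size_t pos = 16; /* skip the 16-byte header */

  if (len < pos + 4) return -1;
  out->code_len = get_be32(&buf[pos]);
  pos += 4;

  if (out->code_len > len - pos) return -1; /* code runs past end of file */
  out->code = &buf[pos];
  pos += out->code_len;

  /* 4-byte "AmBs" identifier, then a 16-bit big-endian bank count */
  if (len - pos < 6 || memcmp(&buf[pos], "AmBs", 4) != 0) return -1;
  out->num_banks = ((unsigned)buf[pos + 4] << 8) | buf[pos + 5];
  out->banks = &buf[pos + 6];
  return 0;
}
```

Checking `code_len` against the remaining length before advancing keeps a corrupt length field from reading past the buffer.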
Tokenised BASIC code format
The tokenised BASIC code is a stream of tokenised lines. Each tokenised line has the following format:
- 1 byte: The length of this line in words (2 bytes), including this byte. To get the length of the line in bytes, double this value.
- 1 byte: The indent level of this line. AMOS automatically indents lines to show program structure. If printing this line as ASCII text, you should print {indent level + 1} space characters at the beginning of the line, or no spaces if the indent level is less than 2.
- many bytes: A sequence of tokens. Each token is individually sized, at least two bytes long, and always rounded up to a multiple of two bytes. The sequence always ends with a compulsory null token.
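The line structure above can be checked with a short C sketch. `amos_count_lines` and `amos_indent_spaces` are hypothetical helpers. The sketch relies only on the rules listed: the first byte is the line length in words, every line holds at least its two header bytes plus the 2-byte null token, and the last token of every line is the null token. The indent helper follows the rule exactly as stated above.

```c
#include <stddef.h>

/* Number of spaces to print before a line, per the rule stated above. */
unsigned amos_indent_spaces(unsigned char indent_level) {
  return indent_level < 2 ? 0 : (unsigned)indent_level + 1;
}

/* Counts the tokenised lines in code[0..len). Returns -1 if a line is
   shorter than 4 bytes (length byte, indent byte, null token), runs past
   the end of the buffer, or does not end with the null token 0x0000. */
long amos_count_lines(const unsigned char *code, size_t len) {
  size_t pos = 0;
  long count = 0;

  while (pos < len) {
    size_t line_len = (size_t)code[pos] * 2; /* length is stored in words */

    if (line_len < 4 || line_len > len - pos) return -1;
    if (code[pos + line_len - 2] != 0 || code[pos + line_len - 1] != 0)
      return -1; /* missing compulsory null token */

    pos += line_len;
    count++;
  }
  return count;
}
```

Because the length byte counts words, a line can never be longer than 510 bytes.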
AMOS treats each token as a signed 16-bit number. Token values from 0x0000 to 0x004E are printed specially and vary in size. All other values are a signed offset into AMOS's internal token table, and the text of the token in that table is what should be printed. A few of these table tokens have special size rules (listed below); all others are 2 bytes in size.
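As a sketch, assuming tokens are stored big-endian like the other integers in the file, a token can be read as a signed value and classified like this. Both function names are hypothetical.

```c
/* Reads a big-endian 16-bit token as a signed value, without relying on
   implementation-defined conversions to a signed type. */
int amos_token_value(const unsigned char *p) {
  int v = (p[0] << 8) | p[1];
  return v >= 0x8000 ? v - 0x10000 : v;
}

/* Nonzero if the token falls in the specially printed range 0x0000..0x004E;
   any other value (including negative ones) is an offset into AMOS's
   internal token table. */
int amos_token_is_special(int token) {
  return token >= 0x0000 && token <= 0x004E;
}
```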
Specially printed tokens
Token | Type | Interpretation |
---|---|---|
0x0000 | null token | Marks the end of the line. Always 2 bytes long. |
0x0006 | Variable reference, e.g. Print XYZ | The ASCII string is null terminated and its length is rounded up to a multiple of two. |
0x000C | Label, e.g. XYZ: or 190 at the start of a line | |
0x0012 | Procedure call reference, e.g. XYZ["hello"] | |
0x0018 | Label reference, e.g. Goto XYZ | |
0x0026 | String with double quotes, e.g. "XYZ" | |
0x002E | String with single quotes, e.g. 'XYZ' | |
0x001E | Binary integer value, e.g. %100101 | |
0x0036 | Hexadecimal integer value, e.g. $80FAA010 | |
0x003E | Decimal integer value, e.g. 1234567890 | |
0x0046 | Floating point value, e.g. 3.1452 | |
0x004E | Extension command | |
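A lister might dispatch on these values with a switch. The descriptions returned by this hypothetical `amos_special_token_name` function are paraphrased from the table; decoding the data that follows each token is left out, because its layout is not given here.

```c
#include <stddef.h>

/* Returns a short description of a specially printed token from the table
   above, or NULL for any value not listed there. */
const char *amos_special_token_name(int token) {
  switch (token) {
  case 0x0000: return "null token (end of line)";
  case 0x0006: return "variable reference";
  case 0x000C: return "label";
  case 0x0012: return "procedure call reference";
  case 0x0018: return "label reference";
  case 0x001E: return "binary integer";
  case 0x0026: return "double-quoted string";
  case 0x002E: return "single-quoted string";
  case 0x0036: return "hexadecimal integer";
  case 0x003E: return "decimal integer";
  case 0x0046: return "floating point value";
  case 0x004E: return "extension command";
  default:     return NULL;
  }
}
```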
Specially sized tokens
Token | Type | Interpretation |
---|---|---|
0x064A | Rem | Print the remark string in addition to the remark token. The ASCII string is null terminated and its length is rounded up to a multiple of two. |
0x0652 | Rem type 2 | |
0x023C | For | |
0x0250 | Repeat | |
0x0268 | While | |
0x027E | Do | |
0x02BE | If | |
0x02D0 | Else | |
0x0404 | Data | |
0x0290 | Exit If | |
0x029E | Exit | |
0x0316 | On | |
0x0376 | Procedure | |
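The two rules in this table can be expressed as small C helpers: a membership test for the specially sized tokens, and the padded size of a null-terminated string. Both functions are hypothetical, and the padding helper assumes the null terminator is counted before the length is rounded up to a multiple of two.

```c
#include <stddef.h>
#include <string.h>

/* Nonzero for the tokens in the "Specially sized tokens" table, whose
   size is not simply 2 bytes. */
int amos_token_has_special_size(int token) {
  switch (token) {
  case 0x064A: case 0x0652:                           /* Rem, Rem type 2 */
  case 0x023C: case 0x0250: case 0x0268: case 0x027E: /* For, Repeat, While, Do */
  case 0x02BE: case 0x02D0: case 0x0404:              /* If, Else, Data */
  case 0x0290: case 0x029E: case 0x0316:              /* Exit If, Exit, On */
  case 0x0376:                                        /* Procedure */
    return 1;
  default:
    return 0;
  }
}

/* Bytes occupied by a null-terminated string padded to an even length.
   Assumption: the terminating null is counted before rounding up. */
size_t amos_padded_string_size(const char *s) {
  size_t n = strlen(s) + 1;    /* include the null terminator */
  return (n + 1) & ~(size_t)1; /* round up to a multiple of two */
}
```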
Procedure decryption source code
If you find a Procedure (0x0376) token with the "is encrypted" bit set (bit 0x20 of the byte at offset 10 of the Procedure line), pass a pointer to the start of that line to this C function and it will decrypt the contents of the procedure in place.
```c
#include <stdint.h>

/* fetches a 4-byte integer in big-endian format (casts avoid shifting into
   the sign bit of a promoted int) */
#define EndGetM32(a) ((((uint32_t)(a)[0]) << 24) | (((uint32_t)(a)[1]) << 16) | \
                      (((uint32_t)(a)[2]) << 8) | ((uint32_t)(a)[3]))
/* fetches a 2-byte integer in big-endian format */
#define EndGetM16(a) ((((uint32_t)(a)[0]) << 8) | ((uint32_t)(a)[1]))

/* Decrypts, in place, the procedure whose PROCEDURE line starts at src.
   uint32_t keeps the key arithmetic and rotation exactly 32 bits wide. */
void decrypt_procedure(unsigned char *src) {
  unsigned char *line, *next, *endline;
  uint32_t key, key2, key3, size;

  /* ensure src is a pointer to a line with the PROCEDURE token on it */
  if (EndGetM16(&src[2]) != 0x0376) return;

  /* do not operate on compiled procedures */
  if (src[10] & 0x10) return;

  /* size+8+6 is the start of the line after ENDPROC */
  size = EndGetM32(&src[4]);
  endline = &src[size + 8 + 6];
  line = next = &src[src[0] * 2];

  /* initialise encryption keys */
  key = (size << 8) | src[11];
  key2 = 1;
  key3 = EndGetM16(&src[8]);

  while (line < endline) {
    line = next;
    if (line[0] == 0) break; /* a zero-length line would otherwise loop forever */
    next = &line[line[0] * 2];

    /* decrypt one line two bytes at a time, leaving its first 4 bytes as they are */
    for (line += 4; line < next;) {
      *line++ ^= (key >> 8) & 0xFF;
      *line++ ^= key & 0xFF;
      key += key2;
      key2 += key3;
      key = (key >> 1) | (key << 31); /* rotate right by one bit */
    }
  }

  src[10] ^= 0x20; /* toggle "is encrypted" bit */
}
```
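To locate procedures worth passing to `decrypt_procedure`, a caller can walk the lines and check the same bytes the function above inspects: the Procedure token at offsets 2-3, and the "compiled" flag 0x10 and "is encrypted" flag 0x20 at offset 10. This is a sketch with a hypothetical function name; like `decrypt_procedure`, it only recognises Procedure as the first token of a line.

```c
#include <stddef.h>

/* Scans tokenised code for Procedure lines that are not compiled and have
   the "is encrypted" flag set. Stores up to max line offsets in found[] and
   returns the total number of matches. Stops at the first malformed line. */
size_t amos_find_encrypted_procs(const unsigned char *code, size_t len,
                                 size_t *found, size_t max) {
  size_t pos = 0, n = 0;

  while (pos < len) {
    size_t line_len = (size_t)code[pos] * 2; /* length is stored in words */
    if (line_len < 4 || line_len > len - pos) break;

    if (line_len >= 12 && code[pos + 2] == 0x03 && code[pos + 3] == 0x76 &&
        !(code[pos + 10] & 0x10) && (code[pos + 10] & 0x20)) {
      if (n < max) found[n] = pos;
      n++;
    }
    pos += line_len;
  }
  return n;
}
```

Each offset found this way can be added to the start of the code buffer to get the `src` pointer that `decrypt_procedure` expects.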