8.2 WORDS.TOK Format

by Lance Ewing
Last updated: 31 August 1997
Retrieved from the Internet Archive

The WORDS.TOK file stores the game's vocabulary, i.e. the dictionary of words that the interpreter understands. Each word is stored along with a word number, and these numbers are what the 'said' test command takes as its arguments. Many words can share the same word number, which means that those words are synonyms as far as the game is concerned.

The file itself is both packed and encrypted. Words are stored in alphabetic order which is required for the compression method to work.

THE FIRST SECTION

At the start of the file is a section that is always 52 (26 x 2) bytes long. It contains a two-byte entry for every letter of the alphabet and is essentially an index giving the starting offset of the words that begin with the corresponding letter.

Byte	Purpose
0-1	Hi and then Lo byte for 'A' offset.
.....
50-51	Hi and then Lo byte for 'Z' offset.
52-	Words section.

The important thing to note here is that the normal Lo-Hi byte order used everywhere else in the AGI system is not used: the Hi byte comes first. For example, the bytes 0x00 and 0x24 mean 0x0024, not 0x2400. The same Hi-Lo order is used later for word numbers as well.

All offsets are taken from the beginning of the file. If no words start with a particular letter, then the offset in that field will be 0x0000.
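The index can be read with a few lines of code. The sketch below is a minimal illustration, not the interpreter's own routine; the function name read_letter_index is hypothetical. It combines each Hi/Lo pair as (hi << 8) | lo, which is the same as Python's int.from_bytes(pair, "big"), and leaves out letters whose offset is 0x0000.

```python
import string
from typing import Dict

INDEX_SIZE = 26 * 2  # one 2-byte entry per letter, 'A' to 'Z'


def read_letter_index(data: bytes) -> Dict[str, int]:
    """Map each letter that has words to the file offset of its first word.

    Offsets are stored Hi byte first, so (hi << 8) | lo rebuilds the value.
    Letters whose entry is 0x0000 (no words) are left out of the result.
    """
    if len(data) < INDEX_SIZE:
        raise ValueError("data is shorter than the 52-byte letter index")
    index = {}
    for i, letter in enumerate(string.ascii_uppercase):
        hi, lo = data[2 * i], data[2 * i + 1]
        offset = (hi << 8) | lo
        if offset != 0:
            index[letter] = offset
    return index
```

For example, read_letter_index(open("WORDS.TOK", "rb").read()) would give the offset of the first word for each letter that has words.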

THE WORDS SECTION

Words are stored in a compressed form in which each word uses part of the previous word as its starting point. For example, "forearm" and "forest" share the prefix "fore". If "forest" comes immediately after "forearm", the data for "forest" specifies that it starts with the first four characters of the previous word. Whether this method is meant to further confuse would-be cheaters or to help the searching process, I don't yet know, but it certainly isn't purely for compression: the WORDS.TOK file is usually quite small, and no attempt is made to compress any of the larger files (before AGI version 3, that is).
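The prefix scheme can be illustrated by working out, for an alphabetically sorted list, how many leading characters each word shares with the one before it. This is a sketch under one assumption: that the longest shared prefix is always used, which is not confirmed for Sierra's own tools. The function names are hypothetical. Because the list is sorted and has no duplicates, every word keeps at least one character of its own.

```python
from typing import List, Tuple


def shared_prefix_len(prev: str, word: str) -> int:
    """Count the leading characters that `word` has in common with `prev`."""
    n = 0
    for a, b in zip(prev, word):
        if a != b:
            break
        n += 1
    return n


def prefix_compress(words: List[str]) -> List[Tuple[int, str]]:
    """Turn alphabetically sorted words into (prefix, remaining chars) pairs.

    Assumption: the longest possible shared prefix is always chosen.
    """
    result = []
    prev = ""
    for word in words:
        n = shared_prefix_len(prev, word)
        result.append((n, word[n:]))
        prev = word
    return result
```

With ["forearm", "forest"], "forest" becomes (4, "st"): copy "fore" from the previous word and add "st".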

Prefix | Char.1 | Char.2 | ...... | Last Char | WordNum Hi | WordNum Lo

Prefix - Number of characters to include from the start of the previous word.
Char.n - 0x7F xor Char.n gives the ASCII code for the character.
Last Char - 0x7F xor (Char.n & 0x7F) gives ASCII code. Top bit is set to indicate end of word.
WordNum Hi - Hi byte of word number.
WordNum Lo - Lo byte of word number.
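Decoding one record follows directly from the field list above. The sketch below is one possible reading of it; the name decode_entry and its (word, word number, next offset) return value are illustrative choices, not part of the format. Masking with 0x7F before the XOR handles both ordinary characters (whose top bit is already clear) and the last character.

```python
from typing import Tuple


def decode_entry(data: bytes, pos: int, prev_word: str) -> Tuple[str, int, int]:
    """Decode the word record that starts at data[pos].

    Returns (word, word_number, offset_of_next_record).
    """
    prefix = data[pos]                # characters to copy from previous word
    pos += 1
    chars = list(prev_word[:prefix])
    while True:
        b = data[pos]
        pos += 1
        # Clear the end-of-word bit, then undo the XOR with 0x7F.
        chars.append(chr((b & 0x7F) ^ 0x7F))
        if b & 0x80:                  # top bit set: that was the last char
            break
    word_number = (data[pos] << 8) | data[pos + 1]   # Hi byte first
    return "".join(chars), word_number, pos + 2
```

For instance, the bytes 04 0C 8B 01 02 following "forearm" decode to "forest" with word number 0x0102: 0x0C ^ 0x7F is 's', and (0x8B & 0x7F) ^ 0x7F is 't'.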

If a word does not use any part of the previous word, its prefix field is zero; this is always the case for the first word starting with a new letter. Nothing marks where the words for one letter end and the next set begins: in fact, the words section is just one continuous chain of records in the above format. The index section is not needed to read the words in, which suggests that the WORDS.TOK format as a whole is organised for finding words quickly.
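Since each record depends only on the word decoded just before it, the whole vocabulary can be read by walking the chain from byte 52 without touching the index. The sketch below is illustrative; read_vocabulary is a hypothetical name. One assumption is flagged in the code: reading stops once fewer bytes remain than the smallest possible record (4 bytes), so short trailing padding, if a file has any, is skipped rather than misread.

```python
from typing import Dict

MIN_RECORD = 4  # prefix + at least one character + 2-byte word number


def read_vocabulary(data: bytes, start: int = 52) -> Dict[str, int]:
    """Walk the chain of word records that follows the 52-byte index.

    The letter index is never consulted: each record only needs the
    word decoded just before it.
    """
    vocab = {}
    prev = ""
    pos = start
    # Assumption: stop when too few bytes remain for a whole record.
    while len(data) - pos >= MIN_RECORD:
        prefix = data[pos]
        pos += 1
        chars = list(prev[:prefix])
        while True:
            b = data[pos]
            pos += 1
            chars.append(chr((b & 0x7F) ^ 0x7F))  # undo end bit and XOR
            if b & 0x80:
                break
        word = "".join(chars)
        vocab[word] = (data[pos] << 8) | data[pos + 1]  # Hi byte first
        pos += 2
        prev = word
    return vocab
```

Grouping the resulting dictionary by value gives the synonyms that share each word number.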

A NOTE ABOUT WORD NUMBERS

Some word numbers have special meaning. They are listed below:

Word #	Meaning
0	Ignored words (e.g. the, at).
1	Anyword. Matches any word, e.g. if (said(take, anyword)) { print("You can't - Blackbeard has chopped both your arms off."); }
9999	ROL (Rest Of Line). It does not matter what the rest of the input is.
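How these special numbers might interact when matching a said() test can be sketched as follows. This is a hypothetical matcher, not the interpreter's code: it assumes the input has already been turned into word numbers, drops words numbered 0, lets 1 stand for exactly one word, and lets 9999 accept whatever follows (including nothing, which is an assumption).

```python
from typing import List

IGNORED, ANYWORD, ROL = 0, 1, 9999  # special word numbers from the table


def said_matches(pattern: List[int], input_words: List[int]) -> bool:
    """Hypothetical check of whether parsed input satisfies said(pattern).

    Assumes input_words are word numbers; words numbered 0 are dropped
    first, since the game ignores them.
    """
    words = [w for w in input_words if w != IGNORED]
    i = 0
    for p in pattern:
        if p == ROL:
            return True        # rest of line: whatever follows is accepted
        if i >= len(words):
            return False       # pattern is longer than the input
        if p != ANYWORD and p != words[i]:
            return False
        i += 1
    return i == len(words)     # no unmatched input words left over
```

With a made-up word number 20 for "take", said_matches([20, ANYWORD], [20, 0, 35]) is True because the ignored word is removed before matching.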
