SimCity 2000 DOS Data Formats ::

SimCity 2000 DOS Data Formats::02.28.2015+18:15

(I’ve been meaning to write this up for a while.) Around a year and a half ago, I was bored and felt like digging around in some game engines, because it’s interesting to see how people have solved various problems, what formats they use, and what libraries they use. I ended up focusing on SimCity 2000 for DOS because it’s pretty old and I wasn’t familiar with the limitations of DOS programming. I’m going to include bits of my thought process, so feel free to skim ahead if you just want the spoilers.

The DAT File

Understanding the SC2000.DAT file is the meat of this post. The GOG version of the game also includes an SC2000SE.DAT file, which is actually a modified ISO image of the Special Edition CD-ROM (it doesn’t include the Windows version, sadly). ISOs are boring and well documented, so we’ll ignore it.

After opening the file in a hex editor, I noticed that there was no header (no identifying magic words or bytes) and that a large portion of the beginning of the file seemed to have a uniform format: some letters that looked like filenames, followed by two shorts. Clearly it was an index of some sort. This was a DOS game, so the filenames were all in 8.3 format, which put them at 12 bytes each. They were fixed-width fields rather than NUL-terminated C strings, so every entry had the same size, which made extracting the index a lot easier. The format is exactly as follows:

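#include <stdint.h>

struct Entry {
    char filename[12];     /* 8.3 name, not NUL-terminated */
    uint16_t someNumber;   /* meaning unknown at this point */
    uint16_t otherNumber;  /* meaning unknown at this point */
};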

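Because the names aren’t NUL-terminated, you can’t hand the field straight to printf or fopen. Here’s a minimal sketch of copying one into a regular C string. The function name dat_name_to_cstr is made up for illustration, and the sketch assumes, without confirmation, that names shorter than 12 characters are padded with NUL bytes or spaces.

```c
#include <stddef.h>
#include <string.h>

/* Width of the filename field in each index entry (8.3 name, including the dot). */
#define DAT_NAME_LEN 12

/*
 * Hypothetical helper: copy a fixed-width, non-NUL-terminated filename field
 * into a normal C string. ASSUMPTION: short names are padded with NUL bytes
 * or spaces, so we stop at the first NUL and trim trailing spaces.
 */
void dat_name_to_cstr(const char raw[DAT_NAME_LEN], char out[DAT_NAME_LEN + 1])
{
    size_t n;

    /* Copy the whole field and terminate it ourselves. */
    memcpy(out, raw, DAT_NAME_LEN);
    out[DAT_NAME_LEN] = '\0';

    /* strlen stops at the first NUL pad byte, or returns 12 if there is none. */
    n = strlen(out);

    /* Trim any trailing space padding. */
    while (n > 0 && out[n - 1] == ' ') {
        n--;
    }
    out[n] = '\0';
}
```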
I scoured the file looking for some indication of how many entries the index has, and as far as I can tell nothing explicitly tells the game that. While writing this post, however, I realized that you can calculate the number of entries from the first entry in the index (more on that later). At the time, I just hardcoded the number of files into the short program I wrote to dump the contents (a nearly 20-year-old game isn’t likely to change).

The next important bit was understanding what the two numbers after the filename meant. My initial guess was that they were the size and offset of the file in the archive. The first number looked plausible enough as a size, but the second number was confusing: it was really small (0 for the first couple of entries), only ever increased, and was the same for a bunch of consecutive entries. When I added up the first number across all of the entries, I ended up with something much smaller than the 2.5 MB that the file actually is. I was wrong on both counts.

My next guess about the second number was that it was some sort of block number. One might think that it was just the 20-bit addressing scheme of segment:offset. That’s not right for a number of reasons:

  1. 20-bit addressing can only reach one megabyte of memory.
  2. The data file is 2.5 MB.
  3. In 20-bit addressing, segments are only 16 bits each, and the potential offset values were much larger than that.
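For reference, here’s the real-mode arithmetic behind point 1 as a small sketch: the linear address is segment * 16 + offset, truncated to 20 bits. The function and macro names are made up for illustration, and the mask models the original 8086’s wraparound (later CPUs with the A20 line enabled can reach a little past 1 MB).

```c
#include <stdint.h>

/* Largest address reachable on an 8086: 2^20 - 1 bytes, just under 1 MB. */
#define REAL_MODE_MAX_ADDR 0xFFFFFu

/*
 * Hypothetical helper: real-mode segment:offset translation. The 16-bit
 * segment is shifted left by 4 bits (multiplied by 16) and added to the
 * 16-bit offset; the 8086 only has 20 address lines, so the sum wraps.
 */
uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return (((uint32_t)segment << 4) + offset) & REAL_MODE_MAX_ADDR;
}
```

Even with the segment and offset maxed out, the result stays below 1 MB, nowhere near the 2.5 MB the DAT needs.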
If the first number wasn’t a size, perhaps it was an offset of some sort. If so, the first index entry’s offset would be the length of the index, since the first file starts right after it. This turned out to be true, and it’s how you can calculate the number of index entries: each entry is 16 bytes (12 for the filename plus two 2-byte numbers), so just divide that offset by 16. So then, what was the second number? I tried to find the start points of various files to get some landmarks I could use to solve for that second value. As it turns out, the second number is the 64 KB block that the file starts in, and the offset is relative to the start of that block. The file’s start position is then offset + (block * 64 * 1024).
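Both calculations from that paragraph fit in a couple of functions. This is a minimal sketch; dat_start and dat_entry_count are hypothetical names, and the count assumes, as described above, that the first file begins immediately after the index.

```c
#include <stddef.h>
#include <stdint.h>

/* Each index entry: 12-byte filename plus two 16-bit fields. */
#define DAT_ENTRY_SIZE 16u
#define DAT_BLOCK_SIZE (64u * 1024u)

/* Absolute start of a file: the block's base position plus the offset into it. */
uint32_t dat_start(uint16_t offset, uint16_t block)
{
    /* Widen before multiplying; block * 65536 can overflow a 32-bit int. */
    return (uint32_t)offset + (uint32_t)block * DAT_BLOCK_SIZE;
}

/*
 * The first file starts right after the index, so its start position is
 * the index length in bytes; dividing by the entry size gives the count.
 */
size_t dat_entry_count(uint16_t first_offset, uint16_t first_block)
{
    return dat_start(first_offset, first_block) / DAT_ENTRY_SIZE;
}
```

Worth noting: because offset is stored before block and x86 is little-endian, the four bytes at positions 12 through 15 of an entry, read as a single little-endian uint32_t, equal exactly offset + block * 65536.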

The final file entry structure looks like this:

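#include <stdint.h>

struct Entry {
    char filename[12];  /* 8.3 name, not NUL-terminated */
    uint16_t offset;    /* byte offset from the start of the 64 KB block */
    uint16_t block;     /* index of the 64 KB block the file starts in */
};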

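Since DOS ran on little-endian x86, I’m assuming the two 16-bit fields are stored little-endian. Here’s a sketch of decoding one raw 16-byte record into that structure without relying on the host’s byte order or struct packing; dat_decode_entry and read_le16 are illustrative names, not anything from the game.

```c
#include <stdint.h>
#include <string.h>

struct Entry {
    char filename[12];  /* 8.3 name, not NUL-terminated */
    uint16_t offset;    /* byte offset from the start of the 64 KB block */
    uint16_t block;     /* index of the 64 KB block the file starts in */
};

#define DAT_ENTRY_SIZE 16

/* Read a little-endian 16-bit value (ASSUMED on-disk byte order). */
static uint16_t read_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Decode one 16-byte index record field by field, rather than reading it
 * straight into the struct, so the result doesn't depend on the host's
 * endianness or on how the compiler lays out struct Entry.
 */
void dat_decode_entry(const unsigned char raw[DAT_ENTRY_SIZE], struct Entry *e)
{
    memcpy(e->filename, raw, sizeof e->filename);  /* bytes 0..11 */
    e->offset = read_le16(raw + 12);               /* bytes 12..13 */
    e->block = read_le16(raw + 14);                /* bytes 14..15 */
}
```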
Dumping the Contents of the DAT

Now that I’d figured out the format, I needed to dump the files. The DAT is tightly packed, so you don’t have to worry about alignment or anything like that. Dumping each file is basically just slicing out the bytes from its starting position up to the next file’s starting position (or the end of the DAT for the last entry). The code I wrote to do this is trivial, so it’s left as an exercise for the reader.
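One possible take on that exercise, as a hedged sketch: it works on an in-memory copy of the DAT, derives the entry count from the first entry as described earlier, and reports where a given entry’s bytes start and how long they are. The function names are hypothetical, the little-endian field order is an assumption, and the sanity checks are my own additions.

```c
#include <stddef.h>
#include <stdint.h>

#define DAT_ENTRY_SIZE 16u

/* Read a little-endian 16-bit value (ASSUMED on-disk byte order). */
static uint16_t read_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Absolute start of entry i: offset (bytes 12..13) + block (bytes 14..15) * 64 KB. */
static uint32_t entry_start(const unsigned char *dat, size_t i)
{
    const unsigned char *e = dat + i * DAT_ENTRY_SIZE;
    return (uint32_t)read_le16(e + 12) + (uint32_t)read_le16(e + 14) * 65536u;
}

/*
 * Locate entry i in the DAT: it runs from its own start to the next entry's
 * start, or to the end of the DAT for the last entry.
 * Returns 0 on success, -1 if i is out of range or the index looks broken.
 */
int dat_slice(const unsigned char *dat, size_t dat_len, size_t i,
              size_t *start, size_t *size)
{
    if (dat_len < DAT_ENTRY_SIZE)
        return -1;

    /* The first file starts right after the index, so this is the entry count. */
    size_t count = entry_start(dat, 0) / DAT_ENTRY_SIZE;
    if (i >= count || count * DAT_ENTRY_SIZE > dat_len)
        return -1;

    size_t begin = entry_start(dat, i);
    size_t end = (i + 1 < count) ? entry_start(dat, i + 1) : dat_len;
    if (begin > end || end > dat_len)
        return -1;

    *start = begin;
    *size = end - begin;
    return 0;
}
```

From there, dumping is a loop over the entries that writes each slice out with fwrite to a file named after the entry (after turning the 12-byte name into a proper C string).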

What’s Inside

Part of my initial motivation was getting at the tasty music files inside the archive, so I was hoping they were in a sane, somewhat standard format, and not, say, an XM or MOD file that had been stripped and rewritten into some other binary format, or something similarly custom. As luck would have it, they’re run-of-the-mill XMI files, which can easily be converted to MID.

The file formats inside of the DAT are (in no particular order):

To keep this from running long, I’m not going to delve into the non-“standard” formats here. Maybe I’ll dig in and document them, along with the SCURK formats, at some later point.


I hope this was as interesting to read as it was for me to discover. My biggest unanswered question at this point is why the index doesn’t use a 32-bit unsigned int for the offset from the start of the file. I’ve poked around the Watcom C/C++ docs and can’t find anything that sheds light on this (the game uses DOS/4GW, which was distributed with Watcom). The DOS/4G docs are behind a $49 paywall, and I’m not that interested in finding out the answer.
