ZFS File Format
Over the years I’ve reverse engineered quite a few file formats, but I’ve never really sat down and picked apart why a format was designed the way it was. With that said, I wanted to show the ZFS archive file format and highlight some of the peculiarities I saw and perhaps you guys can answer some of my questions.
For some context, Z-engine was created around 1995 and was used on Macintosh, MS-DOS, and Windows 95.
Format
The main file header is defined as:
struct ZfsHeader { uint32 magic; uint32 unknown1; uint32 maxNameLength; uint32 filesPerBlock; uint32 fileCount; byte xorKey[4]; uint32 fileSectionOffset; };
magic
andunknown1
are self explanatorymaxNameLength
refers to the length of the block that stores a file’s name. Any extra spaces are null.- The archive is split into ‘pages’ or ‘blocks’. Each ‘page’ contains, at max,
filesPerBlock
files fileCount
is total number of files the archive containsxorKey
is the XOR cipher used for encryption of the filesfileSectionOffset
is the offset of the main data section, aka fileLength - mainHeaderLength
The file entry header is defined as:
struct ZfsEntryHeader { char name[16]; uint32 offset; uint32 id; uint32 size; uint32 time; uint32 unknown; };
name
is the file name right-padded with null charactersoffset
is the offset to the actual file dataid
is a the numeric id of the file. The id’s increment from 0 tofileCount
size
is the length of the fileunknown
is self explanatory
Therefore, the entire file structure is as follows:
[Main Header] [uint32 offsetToPage2] [Page 1 File Entry Headers] [Page 1 File Data] [uint32 offsetToPage3] [Page 2 File Entry Headers] [Page 2 File Data] etc.
Questions and Observations
maxNameLength
Why have a fixed size name block vs. null terminated or [size][string]? Was that just the popular thing to do back then so the entire header to could be cast directly to a struct?
filesPerBlock
What is the benefit to pagination? The only explanation I can see atm is that it was some artifact of their asset compiler max memory. Maybe I’m missing something since I’ve never programmed for that type of hardware.
fileSectionOffset
I’ve seen things like this a lot in my reverse engineering; they give the offset to a section that’s literally just after the header. Even if they were doing straight casting instead of incremental reading, a simple sizeof(mainHeader) would give them the offset to the next section. Again, if I’m missing something, please let me know.
Happy coding! :)