HPI File Format Document version 1.4 I figured some of this stuff out by disassembling WriteHPI. All hail Eric DeZert. I'd also like to thank Jesse Michael for his clear and concise explanation of the compression scheme (which I shamelessly incorporated into this document), and Barry Pedersen for helpful comments and miscellaneous useful info. ZLib compression and decompression by Jean-loup Gailly (compression) and Mark Adler (decompression). For more info, see the zlib Home Page at http://www.cdrom.com/pub/infozip/zlib/ The rest I figured out on my own by looking at the data and using a bit of common sense. Warning: This is intended for use by people that already know what they're doing. I'm a C programmer, so I'm doing things in C notation here, but I'll try to explain it so that those of you that don't speak C will be able to understand. If you don't understand, write me at joed@cws.org and I'll try to clear things up. I'm also a big believer in examples, so I'll be walking you through an HPI file as I explain. The first part of the file is a header. Except for the copyright statement at the end, this is the only unencrypted portion of the file. The header looks like this: typedef struct _HPIHeader { long HPIMarker; /* 'HAPI' */ long SaveMarker; /* 'BANK' if saved gamed */ long DirectorySize; /* The size of the directory */ long HeaderKey; /* Decrypt key */ long Start; /* File offset of directory */ } HPIHeader; Here's a hex dump of a sample header: 00000000 48 41 50 49 00 00 01 00 24 02 00 00 7D 00 00 00 HAPI....$...}... 00000010 14 00 00 00 Taken individually: HPIMarker This is just a marker. The value is always HAPI in ASCII. In hex, it's 0x49504148. SaveMarker If it's a saved game, the value is BANK in ASCII, or 0x4B4E4142 in hex. Save game files are something of a special case, and I haven't done much to try to decode those. The value in normal HPI files is 0x00010000, but I have no idea if this means anything. I just check for BANK, and ignore it otherwise. DirectorySize This is the size of the directory contained in the HPI file. Here, the value is 0x224, or 548 bytes. This includes the size of the header. HeaderKey The decryption key. Its value is 0x0000007D. More on this later. Start The offset in the file where the directory starts. I have yet to see one that didn't start immediately after the header at offset 0x14, but you never know. Now we know enough to read the directory. But first, a small implementation note. Instead of allocating a buffer of DirectorySize bytes and then reading the directory into it, allocate a buffer of DirectorySize bytes, and read DirectorySize-Start bytes into the buffer at position Start. This is because the directory contains pointers, but the pointers are relative to the start of the file, not the start of the directory. By moving the directory down Start bytes into the buffer, we simplify the program. If we didn't do this, we'd have to subtract Start from every offset, and that would be a royal pain. Now some of you are undoubtedly looking at an HPI file with a hex dump program, and saying "That sure doesn't look like a directory to me!" Well, you're right. That's because it's encrypted. To decrypt it, first calculate the decryption key from the HeaderKey variable: Key = NOT ((HeaderKey * 4) OR (HeaderKey >> 6)) Doing this on the 0x0000007D, you get FFFFFE0A (I think). Here is the C code for the decryption routine. Since everything in the file is encrypted, I found it easier to combine the read and decryption functions into one. int ReadAndDecrypt(int fpos, char *buff, int buffsize) /* Read "buffsize" bytes from the HPI file at position "fpos" into "buff", and then decrypt it. */ { int count; int tkey; int result; /* first, position the file */ fseek(HPIFile, fpos, SEEK_SET); /* read the data into buff */ result = fread(buff, 1, buffsize, HPIFile); /* for each character in buff... */ for (count = 0; count < buffsize; count++) { /* compute tkey = (fpos + count) XOR Key */ tkey = (fpos + count) ^ Key; /* and then decode the character: buff[count] = tkey XOR (NOT buff[count]) */ buff[count] = tkey ^ ~buff[count]; } /* result is the number of bytes actually read in, and should be equal to buffsize */ return result; } Note that the position of the byte in the file (fpos+count) is used to decrypt. And here is a decoded directory to make it easy to follow. Note that I loaded the actual directory starting at offset 0x14, so that the first 0x14 bytes are all zeros. See the implementation note above. All numbers here are 32-bit integers, ie "longs". 00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000010 00 00 00 00 08 00 00 00 1C 00 00 00 64 00 00 00 ............d... 00000020 6A 00 00 00 01 97 00 00 00 A0 00 00 00 01 C6 00 j............... 00000030 00 00 CF 00 00 00 01 13 01 00 00 1D 01 00 00 01 ................ 00000040 66 01 00 00 6E 01 00 00 01 94 01 00 00 9D 01 00 f...n........... 00000050 00 01 C3 01 00 00 C9 01 00 00 01 EF 01 00 00 F7 ................ 00000060 01 00 00 01 61 6E 69 6D 73 00 01 00 00 00 72 00 ....anims.....r. 00000070 00 00 7B 00 00 00 8E 00 00 00 00 61 72 6D 66 6C ..{........armfl 00000080 61 6B 5F 67 61 64 67 65 74 2E 67 61 66 00 24 02 ak_gadget.gaf.$. 00000090 00 00 D8 2D 00 00 01 64 6F 77 6E 6C 6F 61 64 00 ...-...download. 000000A0 01 00 00 00 A8 00 00 00 B1 00 00 00 BD 00 00 00 ................ 000000B0 00 41 52 4D 46 4C 41 4B 2E 54 44 46 00 61 28 00 .ARMFLAK.TDF.a(. 000000C0 00 01 01 00 00 01 66 65 61 74 75 72 65 73 00 01 ......features.. 000000D0 00 00 00 D7 00 00 00 E0 00 00 00 E8 00 00 00 01 ................ 000000E0 63 6F 72 70 73 65 73 00 01 00 00 00 F0 00 00 00 corpses......... 000000F0 F9 00 00 00 0A 01 00 00 00 61 72 6D 66 6C 61 6B .........armflak 00000100 5F 64 65 61 64 2E 74 64 66 00 E3 28 00 00 68 02 _dead.tdf..(..h. 00000110 00 00 01 6F 62 6A 65 63 74 73 33 64 00 02 00 00 ...objects3d.... 00000120 00 25 01 00 00 37 01 00 00 43 01 00 00 00 4C 01 .%...7...C....L. 00000130 00 00 5D 01 00 00 00 61 72 6D 66 6C 61 6B 2E 33 ..]....armflak.3 00000140 64 6F 00 39 2A 00 00 7B 14 00 00 01 61 72 6D 66 do.9*..{....armf 00000150 6C 61 6B 5F 64 65 61 64 2E 33 64 6F 00 C9 34 00 lak_dead.3do..4. 00000160 00 1A 11 00 00 01 73 63 72 69 70 74 73 00 01 00 ......scripts... 00000170 00 00 76 01 00 00 7F 01 00 00 8B 01 00 00 00 41 ..v............A 00000180 52 4D 46 4C 41 4B 2E 43 4F 42 00 67 3F 00 00 E4 RMFLAK.COB.g?... 00000190 09 00 00 01 75 6E 69 74 70 69 63 73 00 01 00 00 ....unitpics.... 000001A0 00 A5 01 00 00 AE 01 00 00 BA 01 00 00 00 41 52 ..............AR 000001B0 4D 46 4C 41 4B 2E 50 43 58 00 B4 42 00 00 91 25 MFLAK.PCX..B...% 000001C0 00 00 01 75 6E 69 74 73 00 01 00 00 00 D1 01 00 ...units........ 000001D0 00 DA 01 00 00 E6 01 00 00 00 41 52 4D 46 4C 41 ..........ARMFLA 000001E0 4B 2E 46 42 49 00 89 63 00 00 39 05 00 00 01 77 K.FBI..c..9....w 000001F0 65 61 70 6F 6E 73 00 01 00 00 00 FF 01 00 00 08 eapons.......... 00000200 02 00 00 1B 02 00 00 00 61 72 6D 66 6C 61 6B 5F ........armflak_ 00000210 77 65 61 70 6F 6E 2E 74 64 66 00 2D 67 00 00 42 weapon.tdf.-g..B 00000220 02 00 00 01 Let's get started... 00000010 00 00 00 00 08 00 00 00 1C 00 00 00 64 00 00 00 ............d... ^^^^^^^^^^^ ^^^^^^^^^^^ At offset 0x14, you see the number 0x8. This is the number of entries in the directory. Grabbing the next 32-bit number at offset 0x18, you get 0x1C. This is the offset of a list of directory entries. In this case, there are 8 entries in the list. The format of an entry is: typedef struct _HPIEntry { long NameOffset; /* points to the file name */ long DirDataOffset; /* points to directory data */ char Flag; /* file flag */ } HPIEntry; NameOffset Pointer to the file name. This is a 0-terminated string of varying length. DirDataOffset Pointer to the directory data for the file. The actual data varies depending on whether it's a file or a subdirectory. Flag If this is 1, the entry is a subdirectory. If it's 0, it's a file. Looking at offset 0x1C, we see: 00000010 64 00 00 00 ............d... 00000020 6A 00 00 00 01 97 00 00 00 A0 00 00 00 01 C6 00 j............... 00000030 00 00 CF 00 00 00 01 13 01 00 00 1D 01 00 00 01 ................ 00000040 66 01 00 00 6E 01 00 00 01 94 01 00 00 9D 01 00 f...n........... 00000050 00 01 C3 01 00 00 C9 01 00 00 01 EF 01 00 00 F7 ................ 00000060 01 00 00 01 The 8 entries are: 0x064, 0x06A, 1 0x097, 0x0A0, 1 0x0C6, 0x0CF, 1 0x113, 0x11D, 1 0x166, 0x16E, 1 0x194, 0x19D, 1 0x1C3, 0x1C9, 1 0x1EF, 0x1F7, 1 Let's look at the first entry. The Flag is 1, so it's a subdirectory. At offset 0x64, we see: 00000060 01 00 00 01 61 6E 69 6D 73 00 01 00 00 00 72 00 ....anims.....r. ^^^^^^^^^^^^^^^^^ or 'anims'. This is the name. Since this is a subdirectory, offset 0x6A contains the number of entries in the subdirectory, followed by a pointer to the first entry. This is exactly like the count/pointer pair at 0x14 that got us started. Think recursion. 00000060 01 00 00 01 61 6E 69 6D 73 00 01 00 00 00 72 00 ....anims.....r. ^^^^^^^^^^^ ^^^^^ 00000070 00 00 7B 00 00 00 8E 00 00 00 00 61 72 6D 66 6C ..{........armfl ^^^^^ The number at offset 0x6A is a 1, indicating that there's only 1 file in this subdirectory. 0x6E contains the offset of the first (and only) entry in the subdirectory, which is: 0x7B, 0x8E, 0 The 0 indicates that this is a file. Looking at offset 0x7B, we see: 00000070 00 00 7B 00 00 00 8E 00 00 00 00 61 72 6D 66 6C ..{........armfl ^^^^^^^^^^^^^^ 00000080 61 6B 5F 67 61 64 67 65 74 2E 67 61 66 00 24 02 ak_gadget.gaf.$. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ or 'armflak_gadget.gaf'. This is the name of the first (and only) file in the 'anims' subdirectory. Since this is a file, the data at offset 0x8E is a little different. There are 3 items here instead if one: typedef struct _HPIFileData { long DataOffset; /* starting offset of the file */ long FileSize; /* size of the decompressed file */ char Flag; /* file flag */ } HPIEntry; DataOffset This is the offset in the HPI file that this file starts at. FileSize This is the decompressed file size. When you extract the file, it should be this many bytes long. Flag If this is 1, the file is compressed with LZ77 compression. If it's 2, it's compressed with ZLIb compression. If it's 0, it's not compressed at all. This is the format used by the unit viewer. 00000080 61 6B 5F 67 61 64 67 65 74 2E 67 61 66 00 24 02 ak_gadget.gaf.$. ^^^^^ 00000090 00 00 D8 2D 00 00 01 64 6F 77 6E 6C 6F 61 64 00 ...-...download. ^^^^^ ^^^^^^^^^^^ ^^ Looking at offset 0x8E, we see that the three items are: 0x224, 0x2DD8, 1 If you recall, the directory size was 0x224 bytes. This says the file starts at the first offset after the directory, which makes sense and means we're interpreting things correctly. This also says that the extracted file should be 0x2DD8 (or 11,736) bytes long. At this point, we know enough to actually traverse the directory tree in the HPI file. Here's a recursive pseudocode function to do it. The initial call to it would be 'TraverseTree(".", Header.Start)'. TraverseTree(string ParentName, int offset) Entries = Directory[offset] EntryOffset = Directory[offset+4] for count = 1 to Entries NameOffset = Directory[EntryOffset] DataOffset = Directory[EntryOffset+4] Flag = Directory[EntryOffset+8] Name = ParentName+"\"+Directory[NameOffset] print "Processing ",Name if Flag = 1 TraverseTree(Name, DataOffset) <- recursion! else ProcessFile(Name, DataOffset) End If EntryOffset = EntryOffset + 9 Next Count If you code this up in your language of choice and run it, it should print something like this: (if you haven't guessed already, the file I'm using as an example is the "Arm Flakker" unit's aflakker.ufo file) .\anims .\anims\armflak_gadget.gaf .\download .\download\ARMFLAK.TDF .\features .\features\corpses .\features\corpses\armflak_dead.tdf .\objects3d .\objects3d\armflak.3do .\objects3d\armflak_dead.3do .\scripts .\scripts\ARMFLAK.COB .\unitpics .\unitpics\ARMFLAK.PCX .\units .\units\ARMFLAK.FBI .\weapons .\weapons\armflak_weapon.tdf At this point, I urge you to go look at that directory hex dump and traverse the thing by hand until it makes sense. I can hear you now. "What the heck is that 'ProcessFile' function?" It decodes the file. I'll explain in a bit. But first, here's a list of the various files in this HPI file, and their starting offsets. If you don't understand where I got the starting offsets, go reread the directory hex dump until you do. anims\armflak_gadget.gaf 0x0224 download\ARMFLAK.TDF 0x2861 features\corpses\armflak_dead.tdf 0x28E3 objects3d\armflak.3do 0x2A39 objects3d\armflak_dead.3do 0x34C9 scripts\ARMFLAK.COB 0x3F67 unitpics\ARMFLAK.PCX 0x42B4 units\ARMFLAK.FBI 0x6389 weapons\armflak_weapon.tdf 0x672D Because it's a short file, and because it decodes to readable text, I'm going to use the ARMFLAK.TDF file as the example. If the file was not compressed at all, then the file is just inserted into the HPI file as one big chunk. But if it is... This is where the -REAL- fun begins. I'm going to take it slow here because I'm still half figuring it out myself (writing this has actually made me realize some things that I hadn't before). When the file was compressed, it was broken up into chunks of 64K (65536) bytes each, plus one more chunk to hold anything left over. Each chunk was then compressed. Note that some chunks are larger when compressed than decompressed, which means that some compressed chunks can be larger than 64K. The total number of chunks in the file can be obtained by the following formula: chunks = Entry.FileSize / 65536 if (Entry.FileSize mod 65536) <> 0 chunks = chunks + 1 The offset in the directory points to a list of 32-bit numbers that are the compressed sizes of each compressed chunk of data. Following the list of sizes are the actual compressed chunks of data. In this HPI file, each file has only one chunk, but the totala1.hpi file contains some files with a dozen or so, and the hpi files on the CDs have files in them with over a hundred. Going to offset 0x2861, we read in a chunk o'data and decrypt it to find this: 00002860 7E 00 00 00 53 51 53 48 02 01 01 6B 00 00 00 .~...SQSH...k... 00002870 01 01 00 00 FE 36 00 00 20 5B 51 49 4E 55 3C 0E .....6.. [QINU<. 00002880 64 64 94 5D 49 5D 11 14 29 7B D5 26 18 55 6E 75 dd.]I]..){.&.Unu 00002890 64 54 34 41 79 6C 9C 71 81 83 8B 3B 59 49 CB 4D dT4Ayl.q...;YI.M 000028A0 43 D1 42 54 9A A5 A8 AA AF B0 B8 64 61 AC 6D B0 C.BT.......da.m. 000028B0 B1 82 72 34 79 B8 B0 BD DD A3 82 81 E8 86 AC 89 ..r4y........... 000028C0 98 92 C2 CF 98 EB 9D E0 56 BF A2 6F AB 5F A8 96 ........V..o._.. 000028D0 B5 C3 9F B8 EB B9 BE 7D 4F C7 5F CE 2F D1 4C D1 .......}O._./.L. 000028E0 D0 D2 90 The decompressed file size of ARMFLAK.TDF is 257 bytes. This tells us that there's only one chunk. The size of this chunk is 0x7E bytes. The chunk itself immediately follows. Each chunk looks like this: typedef struct _HPIChunk { long Marker; /* always 0x48535153 (SQSH) */ char Unknown1; char CompMethod; /* 1=LZ77, 2=ZLib */ char Encrypt; /* is the block encrypted? */ long CompressedSize; /* the length of the compressed data */ long DecompressedSize; /* the length of the decompressed data */ long Checksum; /* Checksum */ char data[]; /* 'CompressedSize' bytes of data */ } HPIChunk; Marker This is the start-of-chunk marker, and is always 0x48535153 (ASCII 'SQSH'). Unknown1 I know not what this is for. It's always 0x02. Maybe some sort of version number? CompMethod This is the compression method. It's 1 for LZ77, 2 for ZLib. Encrypt This tells whether the block is encrypted a second time. See below. CompressedSize This is the size of the compressed data in the chunk. 0x6B bytes. DecompressedSize This is the size of the decompressed data in the chunk. 0x101 bytes. Checksum This is a checksum of the data. It's merely the sum of all the bytes of data (treated as unsigned numbers) added together. data The actual compressed data in the chunk. CompressedSize (0x6B) bytes of it. Let's look at the data. 00002870 20 5B 51 49 4E 55 3C 0E .....6.. [QINU<. 00002880 64 64 94 5D 49 5D 11 14 29 7B D5 26 18 55 6E 75 dd.]I]..){.&.Unu 00002890 64 54 34 41 79 6C 9C 71 81 83 8B 3B 59 49 CB 4D dT4Ayl.q...;YI.M 000028A0 43 D1 42 54 9A A5 A8 AA AF B0 B8 64 61 AC 6D B0 C.BT.......da.m. 000028B0 B1 82 72 34 79 B8 B0 BD DD A3 82 81 E8 86 AC 89 ..r4y........... 000028C0 98 92 C2 CF 98 EB 9D E0 56 BF A2 6F AB 5F A8 96 ........V..o._.. 000028D0 B5 C3 9F B8 EB B9 BE 7D 4F C7 5F CE 2F D1 4C D1 .......}O._./.L. 000028E0 D0 D2 90 Doesn't look like much, does it. That's because (YOU GUESSED IT!) it's encrypted yet again! Note: the checksum is calculated BEFORE this decryption. The 'Encrypt' field in the HPIChunk header is set to 1 to indicate that this decryption needs to be done. To decrypt, do this (more pseudocode): for x = 0 to CompressedSize-1 data[x] = (data[x] - x) XOR x next x This gives us: 00002870 20 5B 4D 45 4E 55 30 00 [MENU0. 00002880 54 52 80 59 31 5D 0D 0A 09 7B D1 00 10 55 4E 49 TR.Y1]...{...UNI 00002890 54 22 00 3D 41 52 60 4D 41 43 4B 3B 11 01 83 01 T".=AR`MACK;.... 000028A0 33 81 32 02 42 55 54 54 4F 4E B4 02 19 42 01 4E 3.2.BUTTON...B.N 000028B0 41 70 02 C2 01 46 4C 41 DD 23 02 7D E0 04 20 05 Ap...FLA.#.}.. . 000028C0 18 00 32 CF 00 D3 01 DE 56 3F 02 4F 03 5F 04 68 ..2.....V?.O._.h 000028D0 05 33 1F 06 D3 01 3E 41 8F 07 9F 08 AF 09 80 0D .3....>A........ 000028E0 00 00 4C ..L Woohoo! Look! Readable word fragments! But remember, the chunk is still compressed. In this case, the block is compressed with LZ77, since CompMethod is 1. The compression algorithm is a very basic sliding window compression scheme from the LZ77 family using a 4095 byte history and matches from 2 to 17 bytes long. The first byte is kind of a "tag" byte which determines if the next eight pieces of data are literal bytes or history matches. Starting with the least-significant bit, this tag byte is scanned to figure out what to do. When the current bit is a zero, the next byte of the input is transfered directly to the output and added to end of the history buffer. When the current bit is a one, the next two bytes taken from the input are used as a offset/length pair. The upper 12 bits are the offset into the history buffer and the lower 4 bits are the length. If the offset is zero, the end of the input data has been reached and the decompressor simply exits. Since we can assume that there will be no matches with a length of zero or only one byte, any match is a mimimum of two bytes so we just add two to the length which extends our range from 0-15 to 2-17 bytes. The match is then copied from the history buffer to the output and also added onto the end of the history buffer to keep it in sync with the output. When all eight bits of the tag byte have been used, the mask is reset and the next tag byte is loaded. Here is some decompress code: int Decompress(char *out, char *in, int len) { /* Decompress buffer "in" of size "len" into buffer "out" (previously allocated) returns the number of decompressed bytes. */ int x; int outbufptr; int mask; int tag; int inptr; int outptr; int count; int done; char Window[4096]; int inbufptr; for (x = 0; x < len; x++) { in[x] = (in[x] - x) ^ x; } done = FALSE; inptr = 0; outptr = 0; outbufptr = 1; mask = 1; tag = in[inptr++]; while (!done) { if ((mask & tag) == 0) { out[outptr++] = in[inptr]; Window[outbufptr] = in[inptr]; outbufptr = (outbufptr + 1) & 0xFFF; inptr++; } else { count = *((unsigned short *) (in+inptr)); inptr += 2; inbufptr = count >> 4; if (inbufptr == 0) return outptr; else { count = (count & 0x0f) + 2; if (count >= 0) { for (x = 0; x < count; x++) { out[outptr++] = Window[inbufptr]; Window[outbufptr] = Window[inbufptr]; inbufptr = (inbufptr + 1) & 0xFFF; outbufptr = (outbufptr + 1) & 0xFFF; } } } } mask *= 2; if (mask & 0x0100) { mask = 1; tag = in[inptr++]; } } return outptr; } When fed the data, the routine spits out: 00000000 5B 4D 45 4E 55 45 4E 54 52 59 31 5D 0D 0A 09 7B [MENUENTRY1]...{ 00000010 0D 0A 09 55 4E 49 54 4D 45 4E 55 3D 41 52 4D 41 ...UNITMENU=ARMA 00000020 43 4B 3B 0D 0A 09 4D 45 4E 55 3D 33 3B 0D 0A 09 CK;...MENU=3;... 00000030 42 55 54 54 4F 4E 3D 33 3B 0D 0A 09 55 4E 49 54 BUTTON=3;...UNIT 00000040 4E 41 4D 45 3D 41 52 4D 46 4C 41 4B 3B 0D 0A 09 NAME=ARMFLAK;... 00000050 7D 0D 0A 0D 0A 5B 4D 45 4E 55 45 4E 54 52 59 32 }....[MENUENTRY2 00000060 5D 0D 0A 09 7B 0D 0A 09 55 4E 49 54 4D 45 4E 55 ]...{...UNITMENU 00000070 3D 41 52 4D 41 43 56 3B 0D 0A 09 4D 45 4E 55 3D =ARMACV;...MENU= 00000080 33 3B 0D 0A 09 42 55 54 54 4F 4E 3D 33 3B 0D 0A 3;...BUTTON=3;.. 00000090 09 55 4E 49 54 4E 41 4D 45 3D 41 52 4D 46 4C 41 .UNITNAME=ARMFLA 000000A0 4B 3B 0D 0A 09 7D 0D 0A 0D 0A 5B 4D 45 4E 55 45 K;...}....[MENUE 000000B0 4E 54 52 59 33 5D 0D 0A 09 7B 0D 0A 09 55 4E 49 NTRY3]...{...UNI 000000C0 54 4D 45 4E 55 3D 41 52 4D 41 43 41 3B 0D 0A 09 TMENU=ARMACA;... 000000D0 4D 45 4E 55 3D 33 3B 0D 0A 09 42 55 54 54 4F 4E MENU=3;...BUTTON 000000E0 3D 33 3B 0D 0A 09 55 4E 49 54 4E 41 4D 45 3D 41 =3;...UNITNAME=A 000000F0 52 4D 46 4C 41 4B 3B 0D 0A 09 7D 0D 0A 0D 0A 0D RMFLAK;...}..... 00000100 0A . Yay! Clear decoded text. Write this chunk out, and go get the next one. When there are no more chunks, close the file, and go process the next one. To recompress, do something like the following: WHILE look ahead buffer is not empty find a match in the window to previously output data IF match length > minimum match length output reference pair move the window match length to the right ELSE output window first data item move the window one to the right ENDIF END If CompMethod is 2, use ZLib compression to decompress the block. You can get the zlib source code from the zlib home page at http://www.cdrom.com/pub/infozip/zlib/ From here, you've got enough data to proceed on your own. Good luck! Like I said, if you have any questions, let me know. Joe D joed@cws.org