author | Kim Kibum <kb0929.kim@samsung.com> | 2012-08-24 14:34:26 +0900 |
---|---|---|
committer | Kim Kibum <kb0929.kim@samsung.com> | 2012-08-24 14:34:26 +0900 |
commit | 4acc22dd2f30f063c7b07fdbc911384feeda58eb (patch) | |
tree | a78b9ba604297cddb9a23424ccc8abfc6ae7ba94 /proginfo | |
parent | 3101b2a7be8f0e3cc6ff469ce2597945c862264b (diff) | |
upload source
Diffstat (limited to 'proginfo')
-rw-r--r-- | proginfo/3rdparty.bug | 114 |
-rw-r--r-- | proginfo/ZipPorts | 285 |
-rw-r--r-- | proginfo/algorith.txt | 68 |
-rw-r--r-- | proginfo/extra.fld | 1441 |
-rw-r--r-- | proginfo/fileinfo.cms | 231 |
-rw-r--r-- | proginfo/infozip.who | 232 |
-rw-r--r-- | proginfo/nt.sd | 111 |
-rw-r--r-- | proginfo/perform.dos | 183 |
-rw-r--r-- | proginfo/txtvsbin.txt | 112 |
-rw-r--r-- | proginfo/ziplimit.txt | 218 |
10 files changed, 2995 insertions, 0 deletions
diff --git a/proginfo/3rdparty.bug b/proginfo/3rdparty.bug
new file mode 100644
index 0000000..32e7823
--- /dev/null
+++ b/proginfo/3rdparty.bug
@@ -0,0 +1,114 @@
+Known, current PKZIP bugs/limitations:
+-------------------------------------
+
+ - PKUNZIP 2.04g is reported to corrupt some files when compressing them with
+   the -ex option; when tested, the files fail the CRC check, and comparison
+   with the original file shows bogus data (6K in one case) embedded in the
+   middle. PKWARE apparently characterized this as a "known problem."
+
+ - PKUNZIP 2.04g considers volume labels valid only if originated on a FAT
+   file system, but other OSes and file systems (e.g., Amiga and OS/2 HPFS)
+   support volume labels, too.
+
+ - PKUNZIP 2.04g can restore volume labels created by Zip 2.x but not by
+   PKZIP 2.04g (OS/2 DOS box only??).
+
+ - PKUNZIP 2.04g gives an error message for stored directory entries created
+   under other OSes (although it creates the directory anyway), and PKZIP -vt
+   does not report the directory attribute bit as being set, even if it is.
+
+ - PKZIP 2.04g mangles unknown extra fields (especially OS/2 extended attri-
+   butes) when adding new files to an existing zipfile [example: Walnut Creek
+   Hobbes March 1995 CD-ROM, FILE_ID.DIZ additions].
+
+ - PKUNZIP 2.04g is unable to detect or deal with prepended junk in a zipfile,
+   reporting CRC errors in valid compressed data.
+
+ - PKUNZIP 2.04g (registered version) incorrectly updates/freshens the AV extra
+   field in authenticated archives. The resultant extra block length and total
+   extra field length are inconsistent.
+
+ - [Windows version 2.01] Win95 long filenames (VFAT) are stored OK, but the
+   file system is always listed as ordinary DOS FAT.
+
+ - [Windows version 2.50] NT long filenames (NTFS) are stored OK, but the
+   file system is always listed as ordinary DOS FAT.
+
+ - PKZIP 2.04 for DOS encrypts using the OEM code page for 8-bit passwords,
+   while PKZIP 2.50 for Windows uses Latin-1 (ISO 8859-1). This means an
+   archive encrypted with an 8-bit password with one of the two PKZIP versions
+   cannot be decrypted with the other version.
+
+ - PKZIP for Windows GUI (v 2.60), PKZIP for Windows command line (v 2.50) and
+   PKZIP for Unix (v 2.51) save the host's native file timestamps, but
+   only in a local extra field. Thus, timestamp-related selections (update
+   or freshen, both in extraction or archiving operations) use the DOS-format
+   localtime records in the Zip archives for comparisons. This may result
+   in wrong decisions of the program when updating archives that were
+   previously created in a different local time zone.
+
+ - PKZIP releases newer than PKZIP for DOS 2.04g (PKZIP for Windows, both
+   GUI v 2.60 and console v 2.50; PKZIP for Unix v 2.51; probably others too)
+   use different code pages for storing filenames in central (OEM Codepage)
+   and local (ANSI / ISO 8859-1 Codepage) headers. When a stored filename
+   contains extended-ASCII characters, the local and central filename fields
+   do not match. As a consequence, Info-ZIP's Zip program considers such
+   archives as being corrupt and does not allow to modify them. Beginning
+   with release 5.41, Info-ZIP's UnZip contains a workaround to list AND
+   extract such archives with the correct filenames.
+   Maybe PKWARE has implemented this "feature" to allow extraction of their
+   "made-by-PKZIP for Unix/Windows" archives using old (v5.2 and earlier)
+   versions of Info-ZIP's UnZip for Unix/WinNT ??? (UnZip versions before
+   v 5.3 assumed that all archive entries were encoded in the codepage of
+   the UnZip program's host system.)
+
+ - PKUNZIP 2.04g is reported to have problems with archives created on and/or
+   copied from Iomega ZIP drives (irony, eh?).
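Reviewer's note: the code-page mismatch between central and local headers described above can be illustrated with a short Python sketch. It is not taken from the patch. The function names are hypothetical, and `cp437` is only an assumed stand-in for "the OEM code page", which in practice depends on the system that created the archive.

```python
from typing import Tuple


def encode_headers(name: str) -> Tuple[bytes, bytes]:
    """Return (central_bytes, local_bytes) for one filename.

    Assumption: cp437 plays the role of the OEM code page used for the
    central header; latin-1 is the ANSI / ISO 8859-1 code page used for
    the local header.
    """
    central = name.encode("cp437")
    local = name.encode("latin-1")
    return central, local


def headers_match(name: str) -> bool:
    """True when both headers would hold byte-identical filename fields."""
    central, local = encode_headers(name)
    return central == local


if __name__ == "__main__":
    for name in ("readme.txt", "r\u00e9sum\u00e9.txt"):
        central, local = encode_headers(name)
        print(name, central.hex(), local.hex(), headers_match(name))
```

Pure ASCII names encode identically under both code pages, so only names with extended characters would trip a byte-for-byte comparison of the two filename fields.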
+
+Known, current WinZip bugs/limitations:
+--------------------------------------
+
+ - [16-bit version 6.1a] NT short filenames (FAT) are stored OK, but the
+   file system is always listed as NTFS.
+
+ - WinZip doesn't allow 8-bit passwords, which means it cannot decrypt an
+   archive created with an 8-bit password (by PKZIP or Info-ZIP's Zip).
+
+ - WinZip (at least Versions 6.3 PL1, 7.0 SR1) fails to remove old extra
+   fields when freshening existing archive entries. When updating archives
+   created by Info-ZIP's Zip that contain UT time stamp extra field blocks,
+   UnZip cannot display or restore the updated (DOS) time stamps of the
+   freshened archive members.
+
+Known, current other third-party Zip utils bugs/limitations:
+------------------------------------------------------------
+
+ - Asi's PKZip clones for Macintosh (versions 2.3 and 2.10d) are thoroughly
+   broken. They create invalid Zip archives!
+   a) For the first entry, both compressed size and uncompressed length
+      are recorded as 0, despite the fact that compressed data of non-zero
+      length has been added.
+   b) Their program creates extra fields with an (undocumented) internal
+      structure that violates the requirements of PKWARE's Zip format
+      specification document "appnote.txt": Their extra field seems to
+      contain pure data; the 4-byte block header consisting of block ID
+      and data length is missing.
+
+Possibly current PKZIP bugs:
+---------------------------
+
+ - PKZIP (2.04g?) can silently ignore read errors on network drives, storing
+   the correct CRC and compressed length but an incorrect and inconsistent
+   uncompressed length.
+
+ - PKZIP (2.04g?), when deleting files from within a zipfile on a Novell
+   drive, sometimes only zeros out the data while failing to shrink the
+   zipfile.
+
+Other limitations:
+-----------------
+
+ - PKZIP 1.x and 2.x encryption has been cracked (known-plaintext approach;
+   see http://www.cryptography.com/ for details).
+
+[many other bugs in PKZIP 1.0, 1.1, 1.93a, 2.04c and 2.04e]
diff --git a/proginfo/ZipPorts b/proginfo/ZipPorts
new file mode 100644
index 0000000..2d946d3
--- /dev/null
+++ b/proginfo/ZipPorts
@@ -0,0 +1,285 @@
+__________________________________________________________________________
+
+  This is the Info-ZIP file ZipPorts, last updated on 17 February 1996.
+__________________________________________________________________________
+
+
+This document defines a set of rules and guidelines for those who wish to
+contribute patches to Zip and UnZip (or even entire ports to new operating
+systems). The list below is something between a style sheet and a "Miss
+Manners" etiquette guide. While Info-ZIP encourages contributions and
+fixes from anyone who finds something worth changing, we are also aware
+of the fact that no two programmers have the programming style and that
+unrestrained changes by a few dozen contributors would result in hideously
+ugly (and unmaintainable) Frankenstein code. So consider the following an
+attempt by the maintainers to maintain sanity as well as useful code.
+
+(The first version of this document was called either "ZipRules" or the
+"No Feelthy ..." file and was compiled by David Kirschbaum in consulta-
+tion with Mark Adler, Cave McNewt and others. The current incarnation
+expands upon the original with insights gained from a few more years of
+happy hacking...)
+
+
+Summary:
+
+  (0) The Platinum Rule: DON'T BREAK EXISTING PORTS
+(0.1) The Golden Rule: DO UNTO THE CODE AS OTHERS HAVE DONE BEFORE
+(0.2) The Silver Rule: DO UNTO THE LATEST BETA CODE
+(0.3) The Bronze Rule: NO FEELTHY PIGGYBACKS
+
+  (1) NO FEELTHY TABS
+  (2) NO FEELTHY CARRIAGE RETURNS
+  (3) NO FEELTHY 8-BIT CHARS
+  (4) NO FEELTHY LEFT-JUSTIFIED DASHES
+  (5) NO FEELTHY FANCY_FILENAMES
+  (6) NO FEELTHY NON-ZIPFILES AND NO FEELTHY E-MAIL BETAS
+  (7) NO FEELTHY E-MAIL BINARIES
+
+
+Explanations:
+
+  (0) The Platinum Rule: DON'T BREAK EXISTING PORTS
+
+      No doubt about it, this is the one which really pisses us off and
+      pretty much guarantees that your port or patch will be ignored and/
+      or laughed at. Examples range from the *really* severe cases which
+      "port" by ripping out all of the existing multi-OS code, to more
+      subtle oopers like relying on a local capability which doesn't exist
+      on other OSes or in older compilers (e.g., the use of ANSI "#elif"
+      or "#pragma" or "##" constructs, C++ comments, GNU extensions, etc.).
+      As to the former, use #ifdefs for your new code (see rule 0.3). And
+      as to the latter, trust us--there are few things we'd like better
+      than to be able to use some of the elegant "new" features out there
+      (many of which have been around for a decade or more). But our code
+      still compiles on machines dating back even longer, at least in spirit
+      --e.g., the AT&T 3B1 family and Dynix/ptx. Until we say otherwise,
+      dinosaurs are supported.
+
+
+(0.1) The Golden Rule: DO UNTO THE CODE AS OTHERS HAVE DONE BEFORE
+
+      In other words, try to fit into the local style of programming--no
+      matter how painful it may be. This includes cosmetic aspects like
+      indenting the same amount (both in the main C code and in the in-
+      clude files), using braces and comments similarly, NO TABS (see rule
+      #1), etc.; but also more substantive things like (for UnZip) putting
+      character strings into static (far) variables and using the LoadFar-
+      String macros to avoid overflowing limited MS-DOS data segments, and
+      using the ugly Info() macro instead of the more usual *printf()
+      functions so that dynamic-link-library ports are simpler. NEVER put
+      single-OS code (e.g., OS/2) of more than two or three lines into the
+      main (generic) modules; those are shared by everybody, and nobody else
+      cares about it or wants to see it.
+
+      Note that not only do Zip and UnZip differ in these respects, so do
+      individual parts of each program. While it would be nice to have
+      global consistency, cosmetic changes are not a high priority; for
+      now we'll settle for local consistency--i.e., don't make things any
+      worse than they already are.
+
+      Exception (BIG exception): single-letter variable names. Despite
+      the prevailing practice in much of Zip and parts of UnZip, and de-
+      spite the fact that one-letter variables allow you to pack really
+      cool, compact and complicated expressions onto one line, they also
+      make the code very difficult to maintain and are therefore *strongly*
+      discouraged. Don't ask us who is responsible in the first place;
+      while this sort of brain damage is not uncommon among former BASIC
+      programmers, it is nevertheless a lifelong embarrassment, and we do
+      try to pity the poor sod (that is, when we're not chasing bugs and
+      cursing him). :-)
+
+
+(0.2) The Silver Rule: DO UNTO THE LATEST BETA CODE
+
+      Few things are as annoying as receiving a large patch which obviously
+      represents a lot of time and careful work but which is relative to
+      an old version of Info-ZIP code. As wonderful as Larry Wall's patch
+      program is at applying context diffs to modified code, we regularly
+      make near-global changes and/or reorganize big chunks of the sources
+      (particularly in UnZip), and "patch" can't work miracles--big changes
+      invariably break any patch which is relative to an old version of the
+      code.
+
+      Bottom line: contact the Info-ZIP core team FIRST (via the zip-bugs
+      e-mail address) and get up to date with the latest code before begin-
+      ning a big new port. And try to *stay* up to date while working on
+      your port--at least, as much as possible.
+
+
+(0.3) The Bronze Rule: NO FEELTHY PIGGYBACKS
+
+      UnZip is currently ported to something like 12 operating systems
+      (a few more or less depending on how one counts), and each of these,
+      with the possible exception of VM/CMS, has a unique macro identifying
+      it: AMIGA, ATARI_ST, __human68k__, MACOS, MSDOS, MVS, OS2, TOPS20,
+      UNIX, VMS, WIN32. Zip is moving in the same direction. New ports
+      should NOT piggyback one of the existing ports unless they are sub-
+      stantially similar--for example, Minix and Coherent are basically Unix
+      and therefore are included in the UNIX macro, but DOS djgpp ports and
+      OS/2 emx ports (both of which use the Unix-originated GNU C compiler
+      and often have "unix" defined by default) are obviously *not* Unix.
+      [The existing MTS port is a special exception; basically only one per-
+      son knows what MTS really is, and he's not telling. Presumably it's
+      not very close to Unix, but it's not worth arguing about it now.]
+      Along the same lines, neither OS/2 nor Human68K is the same as (or
+      even close to) MS-DOS. MVS and VM/CMS, on the other hand, are quite
+      similar to each other and are therefore combined in most places.
+
+      Bottom line: when adding a new port (e.g., QDOS), create a new macro
+      for it ("QDOS"), a new subdirectory ("qdos") and a new source file for
+      OS-specific code ("qdos/qdos.c"). Use #ifdefs to fit any OS-specific
+      changes into the existing code (e.g., unzpriv.h). If it's close enough
+      to an existing port that piggybacking is a temptation, define a new
+      "combination macro" (e.g., "CMS_MVS") and replace the old macros as
+      required. (This last applies to UnZip, at least; the old preference
+      in Zip was fewer macros and long #ifdef lines, so talk to Onno or Jean-
+      loup about that.) See also rule 0.1.
+
+      (Note that, for UnZip, new ports need not attempt to deal with all
+      features. Among other things, the wildcard-zipfile code in do_wild()
+      may be replaced with a supplied dummy version, since opendir/readdir/
+      closedir() or the equivalent can be difficult to implement.)
+
+
+  (1) NO FEELTHY TABS
+
+      Some editors and e-mail systems either have no capability to use
+      and/or display tab characters (ASCII 9) correctly, or they use non-
+      standard or variable-width tab columns, or other horrors. Some edi-
+      tors auto-convert spaces to tabs, after which the blind use of "diff
+      -c" results in a huge and mostly useless patch. Yes, *we* know about
+      diff's "-b" option, but not everyone does. And yes, we also know this
+      makes the source files bigger, even after compression; so be it. If
+      we *really* cared that much about the size of the sources, we'd still
+      be writing Unix-only utilities.
+
+      Bottom line: use spaces, not tabs.
+
+      Exception: some of the makefiles (the Unix one in particular) require
+      tabs as part of the syntax.
+
+      Related utility programs:
+          Unix, OS/2 and MS-DOS: expand, unexpand.
+          MS-DOS: Buerg's TABS; Toad Hall's TOADSOFT.
+          And some editors have the conversion built-in.
+
+
+  (2) NO FEELTHY CARRIAGE RETURNS
+
+      All source, documentation and other text files shall have Unix style
+      line endings (LF only, a.k.a. ctrl-J), not the DOS/OS2/NT CR+LF or Mac
+      CR-only line endings.
+
+      Reason: "real programmers" in any environment can convert back and
+      forth between Unix and DOS/Mac style. All PC compilers but a few old
+      Borland versions can use either Unix or MS-DOS end-of-lines. Buerg's
+      LIST (file-display utility) for MS-DOS can use Unix or MS-DOS EOLs.
+      Both Zip and UnZip can convert line-endings as appropriate. But Unix
+      utilities like diff and patch die a horrible death (or produce horrible
+      output) if the target files have CRs.
+
+      Related utilities: flip for Unix, OS/2 and MS-DOS; Unix "tr".
+
+      Exceptions: documentation in pre-compiled binary distributions should
+      be in the local (target) format.
+
+
+  (3) NO FEELTHY 8-BIT CHARS
+
+      Do all your editing in a plain-text ASCII editor. No WordPerfect, MS
+      Word, WordStar document mode, or other word processor files, thenkyew.
+      No desktop publishing. *Especially* no EBCDIC. No TIFFs, no GIFs, no
+      embedded pictures or dancing ladies (too bad, Cave Newt). [Sigh... -CN]
+
+      Reason: compatibility with different consoles. My old XT clone is
+      the most limited!
+
+      Exceptions: some Macintosh makefiles apparently require some 8-bit
+      characters; the Human68k port uses 8-bit characters for Kanji or Kana
+      comments (I think); etc.
+
+      Related utilities: vi, emacs, EDLIN, Turbo C editor, other programmers'
+      editors, various word processor -> text conversion utilities.
+
+
+  (4) NO FEELTHY LEFT-JUSTIFIED DASHES
+
+      Always precede repeated dashes (------) with one or more leading non-
+      dash characters: spaces, tabs, pound signs (#), comments (/*), what-
+      ever.
+
+      Reason: sooner or later your source file will be e-mailed through an
+      undigestifier utility, most of which treat leading dashes as end-of-
+      message separators. We'd rather not have your code broken up into a
+      dozen separate untitled messages, thank you.
+
+
+  (5) NO FEELTHY FANCY_FILENAMES
+
+      Assume the worst: that someone on a brain-damaged DOS system has to
+      work with everything your magic fingers produced. Keep the filenames
+      unimaginative and within MS-DOS limits (i.e., ordinary A..Z, 1..9,
+      "-$_!"-type characters, in the 8.3 "filename.ext" format). Mac and
+      Unix users, giggle all you want, but no spaces or multiple dots.
+
+      Reason: compatibility with different file systems. MS-DOS FAT is the
+      most limited, with the exception of CompuServe (6.3, argh).
+
+      Exceptions: slightly longer names are occasionally acceptable within
+      OS-specific subdirectories, but don't do that unless there's a good
+      reason for it.
+
+
+  (6) NO FEELTHY NON-ZIPFILES AND NO FEELTHY E-MAIL BETAS
+
+      Beta testers and developers are in general expected to have both
+      ftp capability and the ability to deal with zipfiles. Those without
+      should either find a friend who does or else learn about ftp-mailers.
+
+      Reason: the core development team barely has time to work on the
+      code, much less prepare oddball formats and/or mail betas out (and
+      the situation is getting worse, sigh).
+
+      Exceptions: anyone seriously proposing to do a new port will be
+      given special treatment, particularly with respect to UnZip; we
+      obviously realize that bootstrapping a completely new port can be
+      quite difficult and have no desire to make it even harder due to
+      lack of access to the latest code (rule 0.2).
+
+      Public releases of UnZip, on the other hand, will be available in
+      two formats: .tar.Z (16-bit compress'd tar) and .zip (either "plain"
+      or self-extracting). Zip sources and executables will generally only
+      be distributed in .zip format, since Zip is pretty much useless without
+      UnZip.
+
+
+  (7) NO FEELTHY E-MAIL BINARIES
+
+      Binary files (e.g., executables, test zipfiles, etc.) should NEVER
+      be mailed raw. Where possible, they should be uploaded via ftp in
+      BINARY mode; if that's impossible, Mark's "ship" ASCII-encoder should
+      be used; and if that's unavailable, uuencode or xxencode should be
+      used. Weirdo NeXTmail, mailtool and MIME formats are also Right Out.
+
+      Files larger than 50KB may need to be broken into pieces for mailing
+      (be sure to label them in order!), unless "ship" is used (it can
+      auto-split, label and mail files if told to do so). If Down Under
+      is involved, files must be broken into under-20KB chunks.
+
+      Reasons: to prevent sounds of gagging mailers from resounding through-
+      out the land. To be relatively efficient in the binary->ASCII conver-
+      sion. (Yeah, yeah, I know, there's better conversions out there. But
+      not as widely known, and they often break on BITNET gateways.)
+
+      Related utilities: ship, uuencode, uudecode, uuxfer20, quux, others.
+      Just make sure they don't leave embedded or trailing spaces (that is,
+      they should use the "`" character in place of ASCII 32). Otherwise
+      mailers are prone to truncate or whatever.
+
+
+Greg Roelofs (a.k.a. Cave Newt)
+Info-ZIP UnZip maintainer
+
+David Kirschbaum
+former Info-ZIP Coordinator
diff --git a/proginfo/algorith.txt b/proginfo/algorith.txt
new file mode 100644
index 0000000..867e30b
--- /dev/null
+++ b/proginfo/algorith.txt
@@ -0,0 +1,68 @@
+Zip's deflation algorithm is a variation of LZ77 (Lempel-Ziv 1977, see
+reference below). It finds duplicated strings in the input data. The
+second occurrence of a string is replaced by a pointer to the previous
+string, in the form of a pair (distance, length). Distances are
+limited to 32K bytes, and lengths are limited to 258 bytes. When a
+string does not occur anywhere in the previous 32K bytes, it is
+emitted as a sequence of literal bytes. (In this description,
+'string' must be taken as an arbitrary sequence of bytes, and is not
+restricted to printable characters.)
+
+Literals or match lengths are compressed with one Huffman tree, and
+match distances are compressed with another tree. The trees are stored
+in a compact form at the start of each block. The blocks can have any
+size (except that the compressed data for one block must fit in
+available memory). A block is terminated when zip determines that it
+would be useful to start another block with fresh trees. (This is
+somewhat similar to compress.)
+
+Duplicated strings are found using a hash table. All input strings of
+length 3 are inserted in the hash table. A hash index is computed for
+the next 3 bytes. If the hash chain for this index is not empty, all
+strings in the chain are compared with the current input string, and
+the longest match is selected.
+
+The hash chains are searched starting with the most recent strings, to
+favor small distances and thus take advantage of the Huffman encoding.
+The hash chains are singly linked. There are no deletions from the
+hash chains, the algorithm simply discards matches that are too old.
+
+To avoid a worst-case situation, very long hash chains are arbitrarily
+truncated at a certain length, determined by a runtime option (zip -1
+to -9). So zip does not always find the longest possible match but
+generally finds a match which is long enough.
+
+zip also defers the selection of matches with a lazy evaluation
+mechanism. After a match of length N has been found, zip searches for a
+longer match at the next input byte. If a longer match is found, the
+previous match is truncated to a length of one (thus producing a single
+literal byte) and the longer match is emitted afterwards. Otherwise,
+the original match is kept, and the next match search is attempted only
+N steps later.
+
+The lazy match evaluation is also subject to a runtime parameter. If
+the current match is long enough, zip reduces the search for a longer
+match, thus speeding up the whole process. If compression ratio is more
+important than speed, zip attempts a complete second search even if
+the first match is already long enough.
+
+The lazy match evaluation is not performed for the fastest compression
+modes (speed options -1 to -3). For these fast modes, new strings
+are inserted in the hash table only when no match was found, or
+when the match is not too long. This degrades the compression ratio
+but saves time since there are both fewer insertions and fewer searches.
+
+Jean-loup Gailly
+jloup@chorus.fr
+
+References:
+
+[LZ77] Ziv J., Lempel A., "A Universal Algorithm for Sequential Data
+Compression", IEEE Transactions on Information Theory", Vol. 23, No. 3,
+pp. 337-343.
+
+APPNOTE.TXT documentation file in PKZIP 1.93a. It is available by
+ftp in ftp.cso.uiuc.edu:/pc/exec-pc/pkz193a.exe [128.174.5.59]
+
+'Deflate' Compressed Data Format Specification:
+ftp://ftp.uu.net/pub/archiving/zip/doc/deflate-1.1.doc
diff --git a/proginfo/extra.fld b/proginfo/extra.fld
new file mode 100644
index 0000000..769fef1
--- /dev/null
+++ b/proginfo/extra.fld
@@ -0,0 +1,1441 @@
+The following are the known types of zipfile extra fields as of this
+writing. Extra fields are documented in PKWARE's appnote.txt and are
+intended to allow for backward- and forward-compatible extensions to
+the zipfile format. Multiple extra-field types may be chained together,
+provided that the total length of all extra-field data is less than 64KB.
+(In fact, PKWARE requires that the total length of the entire file header,
+including timestamp, file attributes, filename, comment, extra field, etc.,
+be no more than 64KB.)
+
+Each extra-field type (or subblock) must contain a four-byte header con-
+sisting of a two-byte header ID and a two-byte length (little-endian) for
+the remaining data in the subblock. If there are additional subblocks
+within the extra field, the header for each one will appear immediately
+following the data for the previous subblock (i.e., with no padding for
+alignment).
+
+All integer fields in the descriptions below are in little-endian (Intel)
+format unless otherwise specified. Note that "Short" means two bytes,
+"Long" means four bytes, and "Long-Long" means eight bytes, regardless
+of their native sizes. Unless specifically noted, all integer fields should
+be interpreted as unsigned (non-negative) numbers.
+
+Christian Spieler, 20040507
+
+         -------------------------
+
+          Header ID's of 0 thru 31 are reserved for use by PKWARE.
+          The remaining ID's can be used by third party vendors for
+          proprietary usage.
+
+          The current Header ID mappings defined by PKWARE are:
+
+          0x0001        ZIP64 extended information extra field
+          0x0007        AV Info
+          0x0008        Reserved for future Unicode file name data (PFS)
+          0x0009        OS/2 extended attributes (also Info-ZIP)
+          0x000a        NTFS (Win9x/WinNT FileTimes)
+          0x000c        OpenVMS (also Info-ZIP)
+          0x000d        Unix
+          0x000e        Reserved for file stream and fork descriptors
+          0x000f        Patch Descriptor
+          0x0014        PKCS#7 Store for X.509 Certificates
+          0x0015        X.509 Certificate ID and Signature for
+                        individual file
+          0x0016        X.509 Certificate ID for Central Directory
+          0x0017        Strong Encryption Header
+          0x0018        Record Management Controls
+          0x0019        PKCS#7 Encryption Recipient Certificate List
+          0x0065        IBM S/390 (Z390), AS/400 (I400) attributes
+                        - uncompressed
+          0x0066        Reserved for IBM S/390 (Z390), AS/400 (I400)
+                        attributes - compressed
+
+          The Header ID mappings defined by Info-ZIP and third parties are:
+
+          0x07c8        Info-ZIP Macintosh (old, J. Lee)
+          0x2605        ZipIt Macintosh (first version)
+          0x2705        ZipIt Macintosh v 1.3.5 and newer (w/o full filename)
+          0x2805        ZipIt Macintosh 1.3.5+
+          0x334d        Info-ZIP Macintosh (new, D. Haase's 'Mac3' field)
+          0x4154        Tandem NSK
+          0x4341        Acorn/SparkFS (David Pilling)
+          0x4453        Windows NT security descriptor (binary ACL)
+          0x4704        VM/CMS
+          0x470f        MVS
+          0x4854        Theos, old inofficial port
+          0x4b46        FWKCS MD5 (see below)
+          0x4c41        OS/2 access control list (text ACL)
+          0x4d49        Info-ZIP OpenVMS (obsolete)
+          0x4d63        Macintosh SmartZIP, by Macro Bambini
+          0x4f4c        Xceed original location extra field
+          0x5356        AOS/VS (binary ACL)
+          0x5455        extended timestamp
+          0x554e        Xceed unicode extra field
+          0x5855        Info-ZIP Unix (original; also OS/2, NT, etc.)
+          0x6542        BeOS (BeBox, PowerMac, etc.)
+          0x6854        Theos
+          0x7441        AtheOS (AtheOS/Syllable attributes)
+          0x756e        ASi Unix
+          0x7855        Info-ZIP Unix (new)
+          0xfb4a        SMS/QDOS
+
+The following are detailed descriptions of the known extra-field block types:
+
+         -ZIP64 Extended Information Extra Field (0x0001):
+          ===============================================
+
+          The following is the layout of the ZIP64 extended
+          information "extra" block. If one of the size or
+          offset fields in the Local or Central directory
+          record is too small to hold the required data,
+          a ZIP64 extended information record is created.
+          The order of the fields in the ZIP64 extended
+          information record is fixed, but the fields will
+          only appear if the corresponding Local or Central
+          directory record field is set to 0xFFFF or 0xFFFFFFFF.
+
+          Note: all fields stored in Intel low-byte/high-byte order.
+
+          Value         Size        Description
+          -----         ----        -----------
+  (ZIP64) 0x0001        2 bytes     Tag for this "extra" block type
+          Size          2 bytes     Size of this "extra" block
+          Original
+          Size          8 bytes     Original uncompressed file size
+          Compressed
+          Size          8 bytes     Size of compressed data
+          Relative Header
+          Offset        8 bytes     Offset of local header record
+          Disk Start
+          Number        4 bytes     Number of the disk on which
+                                    this file starts
+
+          This entry in the Local header must include BOTH original
+          and compressed file sizes.
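Reviewer's note: the subblock chaining from the introduction of extra.fld and the ZIP64 layout above can be sketched in Python as follows. This is a minimal illustration, not Info-ZIP code. The names `iter_subblocks` and `parse_zip64` are hypothetical, and the caller is assumed to pass in the 32-bit and 16-bit values already read from the local or central header.

```python
import struct

ZIP64_ID = 0x0001


def iter_subblocks(extra: bytes):
    """Yield (header_id, data) for each subblock of an extra field.

    Each subblock is a little-endian 2-byte ID and 2-byte length followed
    by that many data bytes, with no padding between subblocks.
    """
    pos = 0
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        pos += 4
        if pos + size > len(extra):
            raise ValueError("subblock overruns the extra field")
        yield header_id, extra[pos:pos + size]
        pos += size


def parse_zip64(data: bytes, usize: int, csize: int, offset: int = 0, disk: int = 0):
    """Replace saturated header values with the 64-bit ones from a ZIP64 block.

    The field order is fixed, but a field is present only when the matching
    header value is 0xFFFFFFFF (sizes, offset) or 0xFFFF (disk number).
    """
    pos = 0
    if usize == 0xFFFFFFFF:
        (usize,) = struct.unpack_from("<Q", data, pos)
        pos += 8
    if csize == 0xFFFFFFFF:
        (csize,) = struct.unpack_from("<Q", data, pos)
        pos += 8
    if offset == 0xFFFFFFFF:
        (offset,) = struct.unpack_from("<Q", data, pos)
        pos += 8
    if disk == 0xFFFF:
        (disk,) = struct.unpack_from("<L", data, pos)
        pos += 4
    return usize, csize, offset, disk


if __name__ == "__main__":
    # One 5-byte extended-timestamp block followed by a ZIP64 block.
    extra = struct.pack("<HH", 0x5455, 5) + b"\x01\x00\x00\x00\x00"
    extra += struct.pack("<HHQQ", ZIP64_ID, 16, 5 * 2**30, 4 * 2**30)
    for header_id, data in iter_subblocks(extra):
        if header_id == ZIP64_ID:
            print(parse_zip64(data, 0xFFFFFFFF, 0xFFFFFFFF))
```

Unknown header IDs are simply skipped by their declared length, which is what lets readers stay compatible with extra-field types they do not understand.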
+
+
+         -OS/2 Extended Attributes Extra Field (0x0009):
+          =============================================
+
+          The following is the layout of the OS/2 extended attributes "extra"
+          block. (Last Revision 19960922)
+
+          Note: all fields stored in Intel low-byte/high-byte order.
+
+          Local-header version:
+
+          Value         Size        Description
+          -----         ----        -----------
+  (OS/2)  0x0009        Short       tag for this extra block type
+          TSize         Short       total data size for this block
+          BSize         Long        uncompressed EA data size
+          CType         Short       compression type
+          EACRC         Long        CRC value for uncompressed EA data
+          (var.)        variable    compressed EA data
+
+          Central-header version:
+
+          Value         Size        Description
+          -----         ----        -----------
+  (OS/2)  0x0009        Short       tag for this extra block type
+          TSize         Short       total data size for this block (4)
+          BSize         Long        size of uncompressed local EA data
+
+          The value of CType is interpreted according to the "compression
+          method" section above; i.e., 0 for stored, 8 for deflated, etc.
+
+          The OS/2 extended attribute structure (FEA2LIST) is
+          compressed and then stored in its entirety within this
+          structure. There will only ever be one "block" of data in
+          the variable-length field.
+
+
+         -OS/2 Access Control List Extra Field:
+          ====================================
+
+          The following is the layout of the OS/2 ACL extra block.
+          (Last Revision 19960922)
+
+          Local-header version:
+
+          Value         Size        Description
+          -----         ----        -----------
+  (ACL)   0x4c41        Short       tag for this extra block type ("AL")
+          TSize         Short       total data size for this block
+          BSize         Long        uncompressed ACL data size
+          CType         Short       compression type
+          EACRC         Long        CRC value for uncompressed ACL data
+          (var.)        variable    compressed ACL data
+
+          Central-header version:
+
+          Value         Size        Description
+          -----         ----        -----------
+  (ACL)   0x4c41        Short       tag for this extra block type ("AL")
+          TSize         Short       total data size for this block (4)
+          BSize         Long        size of uncompressed local ACL data
+
+          The value of CType is interpreted according to the "compression
+          method" section above; i.e., 0 for stored, 8 for deflated, etc.
+
+          The uncompressed ACL data consist of a text header of the form
+          "ACL1:%hX,%hd\n", where the first field is the OS/2 ACCINFO acc_attr
+          member and the second is acc_count, followed by acc_count strings
+          of the form "%s,%hx\n", where the first field is acl_ugname (user
+          group name) and the second acl_access. This block type will be
+          extended for other operating systems as needed.
+
+
+         -Windows NT Security Descriptor Extra Field (0x4453):
+          ===================================================
+
+          The following is the layout of the NT Security Descriptor (another
+          type of ACL) extra block. (Last Revision 19960922)
+
+          Local-header version:
+
+          Value         Size        Description
+          -----         ----        -----------
+  (SD)    0x4453        Short       tag for this extra block type ("SD")
+          TSize         Short       total data size for this block
+          BSize         Long        uncompressed SD data size
+          Version       Byte        version of uncompressed SD data format
+          CType         Short       compression type
+          EACRC         Long        CRC value for uncompressed SD data
+          (var.)        variable    compressed SD data
+
+          Central-header version:
+
+          Value         Size        Description
+          -----         ----        -----------
+  (SD)    0x4453        Short       tag for this extra block type ("SD")
+          TSize         Short       total data size for this block (4)
+          BSize         Long        size of uncompressed local SD data
+
+          The value of CType is interpreted according to the "compression
+          method" section above; i.e., 0 for stored, 8 for deflated, etc.
+          Version specifies how the compressed data are to be interpreted
+          and allows for future expansion of this extra field type. Currently
+          only version 0 is defined.
+
+          For version 0, the compressed data are to be interpreted as a single
+          valid Windows NT SECURITY_DESCRIPTOR data structure, in self-relative
+          format.
+
+
+         -PKWARE Win95/WinNT Extra Field (0x000a):
+          =======================================
+
+          The following description covers PKWARE's "NTFS" attributes
+          "extra" block, introduced with the release of PKZIP 2.50 for
+          Windows.  (Last Revision 20001118)
+
+          (Note: At this time the Mtime, Atime and Ctime values may
+          be used on any WIN32 system.)
+         [Info-ZIP note: In the current implementations, this field has
+          a fixed total data size of 32 bytes and is only stored as local
+          extra field.]
+
+          Value         Size        Description
+          -----         ----        -----------
+  (NTFS)  0x000a        Short       Tag for this "extra" block type
+          TSize         Short       Total Data Size for this block
+          Reserved      Long        for future use
+          Tag1          Short       NTFS attribute tag value #1
+          Size1         Short       Size of attribute #1, in bytes
+          (var.)        SubSize1    Attribute #1 data
+          .
+          .
+          .
+          TagN          Short       NTFS attribute tag value #N
+          SizeN         Short       Size of attribute #N, in bytes
+          (var.)        SubSizeN    Attribute #N data
+
+          For NTFS, values for Tag1 through TagN are as follows:
+          (currently only one set of attributes is defined for NTFS)
+
+          Tag           Size        Description
+          -----         ----        -----------
+          0x0001        2 bytes     Tag for attribute #1
+          Size1         2 bytes     Size of attribute #1, in bytes (24)
+          Mtime         8 bytes     64-bit NTFS file last modification time
+          Atime         8 bytes     64-bit NTFS file last access time
+          Ctime         8 bytes     64-bit NTFS file creation time
+
+          The total length of this attribute sub-block is 28 bytes (4 bytes
+          of tag and size plus 24 bytes of time data).  Together with the
+          4-byte Reserved field, this results in a fixed value of 32 for
+          the TSize field of the NTFS block.
+
+          The NTFS filetimes are 64-bit unsigned integers, stored in Intel
+          (least significant byte first) byte order.  They count the
+          number of 1.0E-07-second intervals (1/10th microseconds!) elapsed
+          since the WinNT "epoch", which is "01-Jan-1601 00:00:00 UTC".
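The NTFS timestamp layout described above can be illustrated with a short Python sketch. It is not taken from Zip or UnZip; the function names `filetime_to_datetime` and `parse_ntfs_times` are hypothetical, and the input is assumed to be the data part of a 0x000a block (everything after the tag and TSize words).

```python
import struct
from datetime import datetime, timedelta, timezone

# WinNT epoch as stated in the field description: 01-Jan-1601 00:00:00 UTC.
NTFS_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(filetime):
    """Convert a count of 1.0E-07-second ticks since the WinNT epoch."""
    # Python datetimes resolve microseconds, so the last decimal digit is dropped.
    return NTFS_EPOCH + timedelta(microseconds=filetime // 10)


def parse_ntfs_times(data):
    """Return (mtime, atime, ctime) from a 0x000a data block, or None.

    `data` is assumed to start at the 4-byte Reserved field.
    """
    pos = 4  # skip Reserved
    while pos + 4 <= len(data):
        tag, size = struct.unpack_from("<HH", data, pos)
        pos += 4
        if tag == 0x0001 and size >= 24 and pos + 24 <= len(data):
            ticks = struct.unpack_from("<QQQ", data, pos)
            return tuple(filetime_to_datetime(t) for t in ticks)
        pos += size  # unknown attribute: skip its data
    return None
```

Skipping unknown attribute tags by their declared size keeps the parser tolerant of attributes that may be defined later; whether real archives ever carry such attributes is not stated in the description.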
+
+
+         -PKWARE OpenVMS Extra Field (0x000c):
+          ===================================
+
+          The following is the layout of PKWARE's OpenVMS attributes
+          "extra" block.  (Last Revision 12/17/91)
+
+          Note: all fields stored in Intel low-byte/high-byte order.
+
+          Value         Size        Description
+          -----         ----        -----------
+  (VMS)   0x000c        Short       Tag for this "extra" block type
+          TSize         Short       Total Data Size for this block
+          CRC           Long        32-bit CRC for remainder of the block
+          Tag1          Short       OpenVMS attribute tag value #1
+          Size1         Short       Size of attribute #1, in bytes
+          (var.)        Size1       Attribute #1 data
+          .
+          .
+          .
+          TagN          Short       OpenVMS attribute tag value #N
+          SizeN         Short       Size of attribute #N, in bytes
+          (var.)        SizeN       Attribute #N data
+
+          Rules:
+
+          1. One or more attributes will be present, each preceded by
+             the above TagX & SizeX values.  These values are identical
+             to the ATR$C_XXXX and ATR$S_XXXX constants which are
+             defined in ATR.H under OpenVMS C.  Neither of these values
+             will ever be zero.
+
+          2. No word alignment or padding is performed.
+
+          3. A well-behaved PKZIP/OpenVMS program should never produce
+             more than one sub-block with the same TagX value.  Also,
+             there will never be more than one "extra" block of type
+             0x000c in a particular directory record.
+
+
+         -Info-ZIP VMS Extra Field:
+          ========================
+
+          The following is the layout of Info-ZIP's VMS attributes extra
+          block for VAX or Alpha AXP.  The local-header and central-header
+          versions are identical.  (Last Revision 19960922)
+
+          Value         Size        Description
+          -----         ----        -----------
+  (VMS2)  0x4d49        Short       tag for this extra block type ("IM")
+          TSize         Short       total data size for this block
+          ID            Long        block ID
+          Flags         Short       info bytes
+          BSize         Short       uncompressed block size
+          Reserved      Long        (reserved)
+          (var.)        variable    compressed VMS file-attributes block
+
+          The block ID is one of the following unterminated strings:
+
+                "VFAB"          struct FAB
+                "VALL"          struct XABALL
+                "VFHC"          struct XABFHC
+                "VDAT"          struct XABDAT
+                "VRDT"          struct XABRDT
+                "VPRO"          struct XABPRO
+                "VKEY"          struct XABKEY
+                "VMSV"          version (e.g., "V6.1"; truncated at hyphen)
+                "VNAM"          reserved
+
+          The lower three bits of Flags indicate the compression method.  The
+          currently defined methods are:
+
+                0       stored (not compressed)
+                1       simple "RLE"
+                2       deflated
+
+          The "RLE" method simply replaces zero-valued bytes with zero-valued
+          bits and non-zero-valued bytes with a "1" bit followed by the byte
+          value.
+
+          The variable-length compressed data contains only the data
+          corresponding to the indicated structure or string.  Typically
+          multiple VMS2 extra fields are present (each with a unique block
+          type).
+
+
+         -Info-ZIP Macintosh Extra Field:
+          ==============================
+
+          The following is the layout of the (old) Info-ZIP resource-fork extra
+          block for Macintosh.  The local-header and central-header versions
+          are identical.  (Last Revision 19960922)
+
+          Value         Size        Description
+          -----         ----        -----------
+  (Mac)   0x07c8        Short       tag for this extra block type
+          TSize         Short       total data size for this block
+          "JLEE"        beLong      extra-field signature
+          FInfo         16 bytes    Macintosh FInfo structure
+          CrDat         beLong      HParamBlockRec fileParam.ioFlCrDat
+          MdDat         beLong      HParamBlockRec fileParam.ioFlMdDat
+          Flags         beLong      info bits
+          DirID         beLong      HParamBlockRec fileParam.ioDirID
+          VolName       28 bytes    volume name (optional)
+
+          All fields but the first two are in native Macintosh format
+          (big-endian Motorola order, not little-endian Intel).  The least
+          significant bit of Flags is 1 if the file is a data fork, 0
+          otherwise.  In addition, if this extra field is present, the
+          filename has an extra 'd' or 'r' appended to indicate data fork or
+          resource fork.  The 28-byte VolName field may be omitted.
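As an illustration of the mixed byte order in the old Macintosh block, the following Python sketch unpacks the big-endian part that follows the little-endian tag and TSize words. The function name `parse_jlee` is hypothetical, the signedness of the beLong fields is an assumption (they are read as unsigned here), and VolName is returned as raw bytes because its internal format is not described above.

```python
import struct

# Fixed part after tag/TSize: "JLEE", FInfo, CrDat, MdDat, Flags, DirID.
# All big-endian; treating the four beLong values as unsigned is an assumption.
_JLEE_FIXED = struct.Struct(">4s16sIIII")  # 36 bytes


def parse_jlee(data):
    """Parse the data part of a 0x07c8 block; return a dict or None."""
    if len(data) < _JLEE_FIXED.size or data[:4] != b"JLEE":
        return None
    _sig, finfo, crdat, mddat, flags, dirid = _JLEE_FIXED.unpack_from(data, 0)
    # The 28-byte VolName may be omitted, so only slice it when it is there.
    end = _JLEE_FIXED.size + 28
    volname = data[_JLEE_FIXED.size:end] if len(data) >= end else None
    return {
        "finfo": finfo,
        "crdat": crdat,
        "mddat": mddat,
        "is_data_fork": bool(flags & 1),  # least significant bit of Flags
        "dirid": dirid,
        "volname": volname,
    }
```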
+
+
+         -ZipIt Macintosh Extra Field (long):
+          ==================================
+
+          The following is the layout of the ZipIt extra block for Macintosh.
+          The local-header and central-header versions are identical.
+          (Last Revision 19970130)
+
+          Value         Size        Description
+          -----         ----        -----------
+  (Mac2)  0x2605        Short       tag for this extra block type
+          TSize         Short       total data size for this block
+          "ZPIT"        beLong      extra-field signature
+          FnLen         Byte        length of FileName
+          FileName      variable    full Macintosh filename
+          FileType      Byte[4]     four-byte Mac file type string
+          Creator       Byte[4]     four-byte Mac creator string
+
+
+         -ZipIt Macintosh Extra Field (short, for files):
+          ==============================================
+
+          The following is the layout of a shortened variant of the
+          ZipIt extra block for Macintosh (without "full name" entry).
+          This variant is used by ZipIt 1.3.5 and newer for entries of
+          files (not directories) that do not have a MacBinary encoded
+          file.  The local-header and central-header versions are identical.
+          (Last Revision 20030602)
+
+          Value         Size        Description
+          -----         ----        -----------
+  (Mac2b) 0x2705        Short       tag for this extra block type
+          TSize         Short       total data size for this block (min. 12)
+          "ZPIT"        beLong      extra-field signature
+          FileType      Byte[4]     four-byte Mac file type string
+          Creator       Byte[4]     four-byte Mac creator string
+          fdFlags       beShort     attributes from FInfo.frFlags,
+                                    may be omitted
+          0x0000        beShort     reserved, may be omitted
+
+
+         -ZipIt Macintosh Extra Field (short, for directories):
+          ====================================================
+
+          The following is the layout of a shortened variant of the
+          ZipIt extra block for Macintosh used only for directory
+          entries.  This variant is used by ZipIt 1.3.5 and newer to
+          save some optional Mac-specific information about directories.
+          The local-header and central-header versions are identical.
+
+          Value         Size        Description
+          -----         ----        -----------
+  (Mac2c) 0x2805        Short       tag for this extra block type
+          TSize         Short       total data size for this block (12)
+          "ZPIT"        beLong      extra-field signature
+          frFlags       beShort     attributes from DInfo.frFlags, may
+                                    be omitted
+          View          beShort     ZipIt view flag, may be omitted
+
+
+          The View field specifies ZipIt-internal settings as follows:
+
+          Bits of the Flags:
+              bit 0           if set, the folder is shown expanded (open)
+                              when the archive contents are viewed in ZipIt.
+              bits 1-15       reserved, zero;
+
+
+         -Info-ZIP Macintosh Extra Field (new):
+          ====================================
+
+          The following is the layout of the (new) Info-ZIP extra
+          block for Macintosh, designed by Dirk Haase.
+          All values are little-endian.
+          (Last Revision 19981005)
+
+          Local-header version:
+
+          Value         Size        Description
+          -----         ----        -----------
+  (Mac3)  0x334d        Short       tag for this extra block type ("M3")
+          TSize         Short       total data size for this block
+          BSize         Long        uncompressed finder attribute data size
+          Flags         Short       info bits
+          fdType        Byte[4]     Type of the File (4-byte string)
+          fdCreator     Byte[4]     Creator of the File (4-byte string)
+          (CType)       Short       compression type
+          (CRC)         Long        CRC value for uncompressed MacOS data
+          Attribs       variable    finder attribute data (see below)
+
+
+          Central-header version:
+
+          Value         Size        Description
+          -----         ----        -----------
+  (Mac3)  0x334d        Short       tag for this extra block type ("M3")
+          TSize         Short       total data size for this block
+          BSize         Long        uncompressed finder attribute data size
+          Flags         Short       info bits
+          fdType        Byte[4]     Type of the File (4-byte string)
+          fdCreator     Byte[4]     Creator of the File (4-byte string)
+
+          The third bit of Flags (bit 2) in both headers indicates whether
+          the LOCAL extra field is uncompressed (and therefore whether CType
+          and CRC are omitted):
+
+          Bits of the Flags:
+              bit 0           if set, file is a data fork; otherwise unset
+              bit 1           if set, filename will not be changed
+              bit 2           if set, Attribs is uncompressed (no CType, CRC)
+              bit 3           if set, date and times are in 64 bit;
+                              if zero, date and times are in 32 bit.
+              bit 4           if set, timezone offset fields for the native
+                              Mac times are omitted (UTC support deactivated)
+              bits 5-15       reserved;
+
+
+          Attributes:
+
+          Attribs is a Mac-specific block of data in little-endian format with
+          the following structure (if compressed, uncompress it first):
+
+          Value         Size        Description
+          -----         ----        -----------
+          fdFlags       Short       Finder Flags
+          fdLocation.v  Short       Finder Icon Location
+          fdLocation.h  Short       Finder Icon Location
+          fdFldr        Short       Folder containing file
+
+          FXInfo        16 bytes    Macintosh FXInfo structure
+            FXInfo-Structure:
+                fdIconID        Short
+                fdUnused[3]     Short       unused but reserved 6 bytes
+                fdScript        Byte        Script flag and number
+                fdXFlags        Byte        More flag bits
+                fdComment       Short       Comment ID
+                fdPutAway       Long        Home Dir ID
+
+          FVersNum      Byte        file version number
+                                    may not be used by MacOS
+          ACUser        Byte        directory access rights
+
+          FlCrDat       ULong       date and time of creation
+          FlMdDat       ULong       date and time of last modification
+          FlBkDat       ULong       date and time of last backup
+            These time numbers are original Mac FileTime values (local time!).
+            Currently, date-time width is 32-bit, but future versions may
+            support 64-bit times (see flags).
+
+          CrGMTOffs     Long(signed!)   difference "local Creat. time - UTC"
+          MdGMTOffs     Long(signed!)   difference "local Modif. time - UTC"
+          BkGMTOffs     Long(signed!)   difference "local Backup time - UTC"
+            These "local time - UTC" differences (stored in seconds) may be
+            used to support timestamp adjustment after inter-timezone transfer.
+            These fields are optional; bit 4 of the flags word controls their
+            presence.
+
+          Charset       Short       TextEncodingBase (Charset)
+                                    valid for the following two fields
+
+          FullPath      variable    Path of the current file.
+                                    Zero terminated string (C-String)
+                                    Currently coded in the native Charset.
+
+          Comment       variable    Finder Comment of the current file.
+                                    Zero terminated string (C-String)
+                                    Currently coded in the native Charset.
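The way the Flags word of the new Macintosh block controls the presence of CType and CRC can be sketched as follows. This is only an illustration in Python under the layout given above; `decode_mac3_flags` and `split_mac3_local` are hypothetical names, and the input is assumed to be the data part of a local-header 0x334d block (after tag and TSize). Decompressing Attribs is left out.

```python
import struct


def decode_mac3_flags(flags):
    """Name the individual bits of the Mac3 Flags word."""
    return {
        "data_fork": bool(flags & 0x0001),             # bit 0
        "keep_filename": bool(flags & 0x0002),         # bit 1
        "attribs_uncompressed": bool(flags & 0x0004),  # bit 2
        "times_64bit": bool(flags & 0x0008),           # bit 3
        "no_gmt_offsets": bool(flags & 0x0010),        # bit 4
        "reserved": flags & 0xFFE0,                    # bits 5-15
    }


def split_mac3_local(data):
    """Split a local-header Mac3 data block into its header and Attribs."""
    bsize, flags, fd_type, fd_creator = struct.unpack_from("<IH4s4s", data, 0)
    pos = 14
    ctype = crc = None
    if not flags & 0x0004:
        # CType and CRC are only stored when Attribs is compressed.
        ctype, crc = struct.unpack_from("<HI", data, pos)
        pos += 6
    return {
        "bsize": bsize,
        "flags": decode_mac3_flags(flags),
        "fd_type": fd_type,
        "fd_creator": fd_creator,
        "ctype": ctype,
        "crc": crc,
        "attribs": data[pos:],
    }
```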
+
+
+         -SmartZIP Macintosh Extra Field:
+          ====================================
+
+          The following is the layout of the SmartZIP extra
+          block for Macintosh, designed by Marco Bambini.
+
+          Local-header version:
+
+          Value         Size        Description
+          -----         ----        -----------
+          0x4d63        Short       tag for this extra block type ("cM")
+          TSize         Short       total data size for this block (64)
+          "dZip"        beLong      extra-field signature
+          fdType        Byte[4]     Type of the File (4-byte string)
+          fdCreator     Byte[4]     Creator of the File (4-byte string)
+          fdFlags       beShort     Finder Flags
+          fdLocation.v  beShort     Finder Icon Location
+          fdLocation.h  beShort     Finder Icon Location
+          fdFldr        beShort     Folder containing file
+          CrDat         beLong      HParamBlockRec fileParam.ioFlCrDat
+          MdDat         beLong      HParamBlockRec fileParam.ioFlMdDat
+          frScroll.v    Byte        vertical pos. of folder's scroll bar
+          fdScript      Byte        Script flag and number
+          frScroll.h    Byte        horizontal pos. of folder's scroll bar
+          fdXFlags      Byte        More flag bits
+          FileName      Byte[32]    full Macintosh filename (pascal string)
+
+          All fields but the first two are in native Macintosh format
+          (big-endian Motorola order, not little-endian Intel).
+          The extra field size is fixed to 64 bytes.
+          The local-header and central-header versions are identical.
+
+
+         -Acorn SparkFS Extra Field:
+          =========================
+
+          The following is the layout of David Pilling's SparkFS extra block
+          for Acorn RISC OS.  The local-header and central-header versions are
+          identical.  (Last Revision 19960922)
+
+          Value         Size        Description
+          -----         ----        -----------
+  (Acorn) 0x4341        Short       tag for this extra block type ("AC")
+          TSize         Short       total data size for this block (20)
+          "ARC0"        Long        extra-field signature
+          LoadAddr      Long        load address or file type
+          ExecAddr      Long        exec address
+          Attr          Long        file permissions
+          Zero          Long        reserved; always zero
+
+          The following bits of Attr are associated with the given file
+          permissions:
+
+                bit 0           user-writable ('W')
+                bit 1           user-readable ('R')
+                bit 2           reserved
+                bit 3           locked ('L')
+                bit 4           publicly writable ('w')
+                bit 5           publicly readable ('r')
+                bit 6           reserved
+                bit 7           reserved
+
+
+         -VM/CMS Extra Field:
+          ==================
+
+          The following is the layout of the file-attributes extra block for
+          VM/CMS.  The local-header and central-header versions are
+          identical.  (Last Revision 19960922)
+
+          Value         Size        Description
+          -----         ----        -----------
+ (VM/CMS) 0x4704        Short       tag for this extra block type
+          TSize         Short       total data size for this block
+          flData        variable    file attributes data
+
+          flData is an uncompressed fldata_t struct.
+
+
+         -MVS Extra Field:
+          ===============
+
+          The following is the layout of the file-attributes extra block for
+          MVS.  The local-header and central-header versions are identical.
+          (Last Revision 19960922)
+
+          Value         Size        Description
+          -----         ----        -----------
+  (MVS)   0x470f        Short       tag for this extra block type
+          TSize         Short       total data size for this block
+          flData        variable    file attributes data
+
+          flData is an uncompressed fldata_t struct.
+
+
+         -PKWARE Unix Extra Field (0x000d):
+          ================================
+
+          The following is the layout of PKWARE's Unix "extra" block.
+          It was introduced with the release of PKZIP for Unix 2.50.
+          Note: all fields are stored in Intel low-byte/high-byte order.
+          (Last Revision 19980901)
+
+          This field has a minimum data size of 12 bytes and is only stored
+          as local extra field.
+
+          Value         Size        Description
+          -----         ----        -----------
+ (Unix0)  0x000d        Short       Tag for this "extra" block type
+          TSize         Short       Total Data Size for this block
+          AcTime        Long        time of last access (UTC/GMT)
+          ModTime       Long        time of last modification (UTC/GMT)
+          UID           Short       Unix user ID
+          GID           Short       Unix group ID
+          (var)         variable    Variable length data field
+
+          The variable length data field will contain file type
+          specific data.  Currently the only values allowed are
+          the original "linked to" file names for hard or symbolic
+          links, and the major and minor device node numbers for
+          character and block device nodes.  Since device nodes
+          cannot be either symbolic or hard links, only one set of
+          variable length data is stored.  Link files will have the
+          name of the original file stored.  This name is NOT NULL
+          terminated.  Its size can be determined by checking TSize -
+          12.  Device entries will have eight bytes stored as two 4
+          byte entries (in little-endian format).  The first entry
+          will be the major device number, and the second the minor
+          device number.
+
+         [Info-ZIP note: The fixed part of this field has the same layout as
+          Info-ZIP's abandoned "Unix1 timestamps & owner ID info" extra field;
+          only the two tag bytes are different.]
+
+
+         -PATCH Descriptor Extra Field (0x000f):
+          =====================================
+
+          The following is the layout of the Patch Descriptor "extra"
+          block.
+
+          Note: all fields stored in Intel low-byte/high-byte order.
+
+          Value         Size        Description
+          -----         ----        -----------
+  (Patch) 0x000f        Short       Tag for this "extra" block type
+          TSize         Short       Size of the total "extra" block
+          Version       Short       Version of the descriptor
+          Flags         Long        Actions and reactions (see below)
+          OldSize       Long        Size of the file about to be patched
+          OldCRC        Long        32-bit CRC of the file about to be patched
+          NewSize       Long        Size of the resulting file
+          NewCRC        Long        32-bit CRC of the resulting file
+
+
+          Actions and reactions
+
+          Bits          Description
+          ----          ----------------
+          0             Use for auto detection
+          1             Treat as a self-patch
+          2-3           RESERVED
+          4-5           Action (see below)
+          6-7           RESERVED
+          8-9           Reaction (see below) to absent file
+          10-11         Reaction (see below) to newer file
+          12-13         Reaction (see below) to unknown file
+          14-15         RESERVED
+          16-31         RESERVED
+
+          Actions
+
+          Action       Value
+          ------       -----
+          none         0
+          add          1
+          delete       2
+          patch        3
+
+          Reactions
+
+          Reaction     Value
+          --------     -----
+          ask          0
+          skip         1
+          ignore       2
+          fail         3
+
+          Patch support is provided by PKPatchMaker(tm) technology and is
+          covered under U.S. Patents and Patents Pending.
+
+
+         -PKCS#7 Store for X.509 Certificates (0x0014):
+          ============================================
+
+          This field contains information about each of the certificates
+          files may be signed with.  When the Central Directory Encryption
+          feature is enabled for a ZIP file, this record will appear in
+          the Archive Extra Data Record, otherwise it will appear in the
+          first central directory record and will be ignored in any
+          other record.
+
+          Note: all fields stored in Intel low-byte/high-byte order.
+
+          Value         Size        Description
+          -----         ----        -----------
+  (Store) 0x0014        2 bytes     Tag for this "extra" block type
+          TSize         2 bytes     Size of the store data
+          SData         TSize       Data about the store
+
+          SData
+          Value         Size        Description
+          -----         ----        -----------
+          Version       2 bytes     Version number, 0x0001 for now
+          StoreD        (variable)  Actual store data
+
+          The StoreD member is suitable for passing as the pbData
+          member of a CRYPT_DATA_BLOB to the CertOpenStore() function
+          in Microsoft's CryptoAPI.  The TSize member above will be
+          cbData + 6, where cbData is the cbData member of the same
+          CRYPT_DATA_BLOB.  The encoding type to pass to
+          CertOpenStore() should be
+          PKCS_7_ASN_ENCODING | X509_ASN_ENCODING.
+
+
+         -X.509 Certificate ID and Signature for individual file (0x0015):
+          ===============================================================
+
+          This field contains the information about which certificate in
+          the PKCS#7 store was used to sign a particular file.  It also
+          contains the signature data.  This field can appear multiple
+          times, but can only appear once per certificate.
+
+          Note: all fields stored in Intel low-byte/high-byte order.
+
+          Value         Size        Description
+          -----         ----        -----------
+  (CID)   0x0015        2 bytes     Tag for this "extra" block type
+          CSize         2 bytes     Size of Method
+          Method        (variable)
+
+          Method
+          Value         Size        Description
+          -----         ----        -----------
+          Version       2 bytes     Version number, for now 0x0001
+          AlgID         2 bytes     Algorithm ID used for signing
+          IDSize        2 bytes     Size of Certificate ID data
+          CertID        (variable)  Certificate ID data
+          SigSize       2 bytes     Size of Signature data
+          Sig           (variable)  Signature data
+
+          CertID
+          Value         Size        Description
+          -----         ----        -----------
+          Size1         4 bytes     Size of CertID, should be (IDSize - 4)
+          Size1         4 bytes     A bug in version one causes this value
+                                    to appear twice.
+          IssSize       4 bytes     Issuer data size
+          Issuer        (variable)  Issuer data
+          SerSize       4 bytes     Serial Number size
+          Serial        (variable)  Serial Number data
+
+          The Issuer and IssSize members are suitable for creating a
+          CRYPT_DATA_BLOB to be the Issuer member of a CERT_INFO
+          struct.  The Serial and SerSize members would be the
+          SerialNumber member of the same CERT_INFO struct.  This
+          struct would be used to find the certificate in the store
+          the file was signed with.  Those structures are from the MS
+          CryptoAPI.
+
+          Sig and SigSize are the actual signature data and size
+          generated by signing the file with the MS CryptoAPI using a
+          hash created with the given AlgID.
+
+
+         -X.509 Certificate ID and Signature for central directory (0x0016):
+          =================================================================
+
+          This field contains the information about which certificate in
+          the PKCS#7 store was used to sign the central directory structure.
+          When the Central Directory Encryption feature is enabled for a
+          ZIP file, this record will appear in the Archive Extra Data Record,
+          otherwise it will appear in the first central directory record,
+          along with the store.  The data structure is the
+          same as the CID, except that SigSize will be 0, and there
+          will be no Sig member.
+
+          This field is also kept after the last central directory
+          record, as the signature data (ID 0x05054b50, it looks like
+          a central directory record of a different type).  This
+          second copy of the data is the Signature Data member of the
+          record, and will have a SigSize that is non-zero, and will
+          have Sig data.
+
+          Note: all fields stored in Intel low-byte/high-byte order.
+
+          Value         Size        Description
+          -----         ----        -----------
+  (CDID)  0x0016        2 bytes     Tag for this "extra" block type
+          TSize         2 bytes     Size of data that follows
+          TData         TSize       Data
+
+
+         -Strong Encryption Header (0x0017) (EFS):
+          ===============================
+
+          Value         Size        Description
+          -----         ----        -----------
+          0x0017        2 bytes     Tag for this "extra" block type
+          TSize         2 bytes     Size of data that follows
+          Format        2 bytes     Format definition for this record
+          AlgID         2 bytes     Encryption algorithm identifier
+          Bitlen        2 bytes     Bit length of encryption key
+          Flags         2 bytes     Processing flags
+          CertData      TSize-8     Certificate decryption extra field data
+                                    (refer to the explanation for CertData
+                                     in the section describing the
+                                     Certificate Processing Method under
+                                     the Strong Encryption Specification)
+
+
+         -Record Management Controls (0x0018):
+          ===================================
+
+          Value         Size        Description
+          -----         ----        -----------
+(Rec-CTL) 0x0018        2 bytes     Tag for this "extra" block type
+          CSize         2 bytes     Size of total extra block data
+          Tag1          2 bytes     Record control attribute 1
+          Size1         2 bytes     Size of attribute 1, in bytes
+          Data1         Size1       Attribute 1 data
+          .
+          .
+          .
+          TagN          2 bytes     Record control attribute N
+          SizeN         2 bytes     Size of attribute N, in bytes
+          DataN         SizeN       Attribute N data
+
+
+         -PKCS#7 Encryption Recipient Certificate List (0x0019): (EFS)
+          =====================================================
+
+          This field contains the information about each of the certificates
+          that files may be encrypted with.  This field should only appear
+          in the archive extra data record.  This field is not required and
+          serves only to aid archive modifications by preserving public
+          encryption data.  Individual security requirements may dictate
+          that this data be omitted to deter information exposure.
+
+          Note: all fields stored in Intel low-byte/high-byte order.
+
+          Value         Size        Description
+          -----         ----        -----------
+ (CStore) 0x0019        2 bytes     Tag for this "extra" block type
+          TSize         2 bytes     Size of the store data
+          TData         TSize       Data about the store
+
+          TData:
+
+          Value         Size        Description
+          -----         ----        -----------
+          Version       2 bytes     Format version number - must be 0x0001
+                                    at this time
+          CStore        (var)       PKCS#7 data blob
+
+
+         -MVS Extra Field (PKWARE, 0x0065):
+          ================================
+
+          The following is the layout of the MVS "extra" block.
+          Note: Some fields are stored in Big Endian format.
+          All text is in EBCDIC format unless otherwise specified.
+
+          Value         Size        Description
+          -----         ----        -----------
+  (MVS)   0x0065        2 bytes     Tag for this "extra" block type
+          TSize         2 bytes     Size for the following data block
+          ID            4 bytes     EBCDIC "Z390" 0xE9F3F9F0 or
+                                    "T4MV" for TargetFour
+          (var)         TSize-4     Attribute data
+
+
+         -OS/400 Extra Field (0x0065):
+          ===========================
+
+          The following is the layout of the OS/400 "extra" block.
+          Note: Some fields are stored in Big Endian format.
+          All text is in EBCDIC format unless otherwise specified.
+
+          Value         Size        Description
+          -----         ----        -----------
+  (OS400) 0x0065        2 bytes     Tag for this "extra" block type
+          TSize         2 bytes     Size for the following data block
+          ID            4 bytes     EBCDIC "I400" 0xC9F4F0F0 or
+                                    "T4MV" for TargetFour
+          (var)         TSize-4     Attribute data
+
+
+         -Extended Timestamp Extra Field:
+          ==============================
+
+          The following is the layout of the extended-timestamp extra block.
+          (Last Revision 19970118)
+
+          Local-header version:
+
+          Value         Size        Description
+          -----         ----        -----------
+  (time)  0x5455        Short       tag for this extra block type ("UT")
+          TSize         Short       total data size for this block
+          Flags         Byte        info bits
+          (ModTime)     Long        time of last modification (UTC/GMT)
+          (AcTime)      Long        time of last access (UTC/GMT)
+          (CrTime)      Long        time of original creation (UTC/GMT)
+
+          Central-header version:
+
+          Value         Size        Description
+          -----         ----        -----------
+  (time)  0x5455        Short       tag for this extra block type ("UT")
+          TSize         Short       total data size for this block
+          Flags         Byte        info bits (refers to local header!)
+          (ModTime)     Long        time of last modification (UTC/GMT)
+
+          The central-header extra field contains the modification time only,
+          or no timestamp at all.  TSize is used to flag its presence or
+          absence.  But note:
+
+              If "Flags" indicates that Modtime is present in the local header
+              field, it MUST be present in the central header field, too!
+              This correspondence is required because the modification time
+              value may be used to support trans-timezone freshening and
+              updating operations with zip archives.
+
+          The time values are in standard Unix signed-long format, indicating
+          the number of seconds since 1 January 1970 00:00:00.  The times
+          are relative to Coordinated Universal Time (UTC), also sometimes
+          referred to as Greenwich Mean Time (GMT).  To convert to local time,
+          the software must know the local timezone offset from UTC/GMT.
+
+          The lower three bits of Flags in both headers indicate which time-
+          stamps are present in the LOCAL extra field:
+
+                bit 0           if set, modification time is present
+                bit 1           if set, access time is present
+                bit 2           if set, creation time is present
+                bits 3-7        reserved for additional timestamps; not set
+
+          Those times that are present will appear in the order indicated, but
+          any combination of times may be omitted.  (Creation time may be
+          present without access time, for example.)  TSize should equal
+          (1 + 4*(number of set bits in Flags)), as the block is currently
+          defined.  Other timestamps may be added in the future.
+
+
+         -Info-ZIP Unix Extra Field (type 1):
+          ==================================
+
+          The following is the layout of the old Info-ZIP extra block for
+          Unix.  It has been replaced by the extended-timestamp extra block
+          (0x5455) and the Unix type 2 extra block (0x7855).
+          (Last Revision 19970118)
+
+          Local-header version:
+
+          Value         Size        Description
+          -----         ----        -----------
+  (Unix1) 0x5855        Short       tag for this extra block type ("UX")
+          TSize         Short       total data size for this block
+          AcTime        Long        time of last access (UTC/GMT)
+          ModTime       Long        time of last modification (UTC/GMT)
+          UID           Short       Unix user ID (optional)
+          GID           Short       Unix group ID (optional)
+
+          Central-header version:
+
+          Value         Size        Description
+          -----         ----        -----------
+  (Unix1) 0x5855        Short       tag for this extra block type ("UX")
+          TSize         Short       total data size for this block
+          AcTime        Long        time of last access (GMT/UTC)
+          ModTime       Long        time of last modification (GMT/UTC)
+
+          The file access and modification times are in standard Unix signed-
+          long format, indicating the number of seconds since 1 January 1970
+          00:00:00.  The times are relative to Coordinated Universal Time
+          (UTC), also sometimes referred to as Greenwich Mean Time (GMT).  To
+          convert to local time, the software must know the local timezone
+          offset from UTC/GMT.  The modification time may be used by non-Unix
+          systems to support inter-timezone freshening and updating of zip
+          archives.
+
+          The local-header extra block may optionally contain UID and GID
+          info for the file.  The local-header TSize value is the only
+          indication of this.  Note that Unix UIDs and GIDs are usually
+          specific to a particular machine, and they generally require root
+          access to restore.
+
+          This extra field type is obsolete, but it has been in use since
+          mid-1994.  Therefore future archiving software should continue to
+          support it.  Some guidelines:
+
+              An archive member should either contain the old "Unix1"
+              extra field block or the new extra field types "time" and/or
+              "Unix2".
+
+              If both the old "Unix1" block type and one or both of the new
+              block types "time" and "Unix2" are found, the "Unix1" block
+              should be considered invalid and ignored.
+
+              Unarchiving software should recognize both old and new extra
+              field block types, but the info from new types overrides the
+              old "Unix1" field.
+
+              Archiving software should recognize "Unix1" extra fields for
+              timestamp comparison but never create it for updated, freshened
+              or new archive members.  When copying existing members to a new
+              archive, any "Unix1" extra field blocks should be converted to
+              the new "time" and/or "Unix2" types.
+
+
+         -Info-ZIP Unix Extra Field (type 2):
+          ==================================
+
+          The following is the layout of the new Info-ZIP extra block for
+          Unix.  (Last Revision 19960922)
+
+          Local-header version:
+
+          Value         Size        Description
+          -----         ----        -----------
+  (Unix2) 0x7855        Short       tag for this extra block type ("Ux")
+          TSize         Short       total data size for this block (4)
+          UID           Short       Unix user ID
+          GID           Short       Unix group ID
+
+          Central-header version:
+
+          Value         Size        Description
+          -----         ----        -----------
+  (Unix2) 0x7855        Short       tag for this extra block type ("Ux")
+          TSize         Short       total data size for this block (0)
+
+          The data size of the central-header version is zero; it is used
+          solely as a flag that UID/GID info is present in the local-header
+          extra field.  If additional fields are ever added to the local
+          version, the central version may be extended to indicate this.
+
+          Note that Unix UIDs and GIDs are usually specific to a particular
+          machine, and they generally require root access to restore.
+
+
+         -ASi Unix Extra Field:
+          ====================
+
+          The following is the layout of the ASi extra block for Unix.  The
+          local-header and central-header versions are identical.
+          (Last Revision 19960916)
+
+          Value         Size        Description
+          -----         ----        -----------
+  (Unix3) 0x756e        Short       tag for this extra block type ("nu")
+          TSize         Short       total data size for this block
+          CRC           Long        CRC-32 of the remaining data
+          Mode          Short       file permissions
+          SizDev        Long        symlink'd size OR major/minor dev num
+          UID           Short       user ID
+          GID           Short       group ID
+          (var.)        variable    symbolic link filename
+
+          Mode is the standard Unix st_mode field from struct stat, containing
+          user/group/other permissions, setuid/setgid and symlink info, etc.
+
+          If Mode indicates that this file is a symbolic link, SizDev is the
+          size of the file to which the link points.  Otherwise, if the file
+          is a device, SizDev contains the standard Unix st_rdev field from
+          struct stat (includes the major and minor numbers of the device).
+          SizDev is undefined in other cases.
+
+          If Mode indicates that the file is a symbolic link, the final field
+          will be the name of the file to which the link points.  The file-
+          name length can be inferred from TSize.
+
+          [Note that TSize may incorrectly refer to the data size not counting
+           the CRC; i.e., it may be four bytes too small.]
+
+
+         -BeOS Extra Field:
+          ================
+
+          The following is the layout of the file-attributes extra block for
+          BeOS.  (Last Revision 19970531)
+
+          Local-header version:
+
+          Value         Size        Description
+          -----         ----        -----------
+  (BeOS)  0x6542        Short       tag for this extra block type ("Be")
+          TSize         Short       total data size for this block
+          BSize         Long        uncompressed file attribute data size
+          Flags         Byte        info bits
+          (CType)       Short       compression type
+          (CRC)         Long        CRC value for uncompressed file attribs
+          Attribs       variable    file attribute data
+
+          Central-header version:
+
+          Value         Size        Description
+          -----         ----        -----------
+  (BeOS)  0x6542        Short       tag for this extra block type ("Be")
+          TSize         Short       total data size for this block (5)
+          BSize         Long        size of uncompr. local EF block data
+          Flags         Byte        info bits
+
+          The least significant bit of Flags in both headers indicates whether
+          the LOCAL extra field is uncompressed (and therefore whether CType
+          and CRC are omitted):
+
+                bit 0           if set, Attribs is uncompressed (no CType, CRC)
+                bits 1-7        reserved; if set, assume error or unknown data
+
+          Currently the only supported compression types are deflated (type 8)
+          and stored (type 0); the latter is not used by Info-ZIP's Zip but is
+          supported by UnZip.
+
+          Attribs is a BeOS-specific block of data in big-endian format with
+          the following structure (if compressed, uncompress it first):
+
+          Value         Size        Description
+          -----         ----        -----------
+          Name          variable    attribute name (null-terminated string)
+          Type          Long        attribute type (32-bit unsigned integer)
+          Size          Long Long   data size for this sub-block (64 bits)
+          Data          variable    attribute data
+
+          The attribute structure is repeated for every attribute.  The Data
+          field may contain anything--text, flags, bitmaps, etc.
+
+
+         -AtheOS Extra Field:
+          ==================
+
+          The following is the layout of the file-attributes extra block for
+          AtheOS.  This field is a very close spin-off from the BeOS e.f.
+          The only differences are:
+          - a new extra field signature
+          - numeric fields in the attribute data are stored in little-endian
+            format ("i386" was the initial hardware for AtheOS)
+          (Last Revision 20040908)
+
+          Local-header version:
+
+          Value         Size        Description
+          -----         ----        -----------
+ (AtheOS) 0x7441        Short       tag for this extra block type ("At")
+          TSize         Short       total data size for this block
+          BSize         Long        uncompressed file attribute data size
+          Flags         Byte        info bits
+          (CType)       Short       compression type
+          (CRC)         Long        CRC value for uncompressed file attribs
+          Attribs       variable    file attribute data
+
+          Central-header version:
+
+          Value         Size        Description
+          -----         ----        -----------
+ (AtheOS) 0x7441        Short       tag for this extra block type ("At")
+          TSize         Short       total data size for this block (5)
+          BSize         Long        size of uncompr. local EF block data
+          Flags         Byte        info bits
+
+          The least significant bit of Flags in both headers indicates whether
+          the LOCAL extra field is uncompressed (and therefore whether CType
+          and CRC are omitted):
+
+                bit 0           if set, Attribs is uncompressed (no CType, CRC)
+                bits 1-7        reserved; if set, assume error or unknown data
+
+          Currently the only supported compression types are deflated (type 8)
+          and stored (type 0); the latter is not used by Info-ZIP's Zip but is
+          supported by UnZip.
+
+          Attribs is an AtheOS-specific block of data in little-endian format
+          with the following structure (if compressed, uncompress it first):
+
+          Value         Size        Description
+          -----         ----        -----------
+          Name          variable    attribute name (null-terminated string)
+          Type          Long        attribute type (32-bit unsigned integer)
+          Size          Long Long   data size for this sub-block (64 bits)
+          Data          variable    attribute data
+
+          The attribute structure is repeated for every attribute.  The Data
+          field may contain anything--text, flags, bitmaps, etc.
+
+
+         -SMS/QDOS Extra Field:
+          ====================
+
+          The following is the layout of the file-attributes extra block for
+          SMS/QDOS.  The local-header and central-header versions are identical.
+          (Last Revision 19960929)
+
+          Value         Size        Description
+          -----         ----        -----------
+  (QDOS)  0xfb4a        Short       tag for this extra block type
+          TSize         Short       total data size for this block
+          LongID        Long        extra-field signature
+          (ExtraID)     Long        additional signature/flag bytes
+          QDirect       64 bytes    qdirect structure
+
+          LongID may be "QZHD" or "QDOS".  In the latter case, ExtraID will
+          be present.  Its first three bytes are "02\0"; the last byte is
+          currently undefined.
+
+          QDirect contains the file's uncompressed directory info (qdirect
+          struct).
Its elements are in native (big-endian) format: + + d_length beLong file length + d_access byte file access type + d_type byte file type + d_datalen beLong data length + d_reserved beLong unused + d_szname beShort size of filename + d_name 36 bytes filename + d_update beLong time of last update + d_refdate beLong file version number + d_backup beLong time of last backup (archive date) + + + -AOS/VS Extra Field: + ================== + + The following is the layout of the extra block for Data General + AOS/VS. The local-header and central-header versions are identical. + (Last Revision 19961125) + + Value Size Description + ----- ---- ----------- + (AOSVS) 0x5356 Short tag for this extra block type ("VS") + TSize Short total data size for this block + "FCI\0" Long extra-field signature + Version Byte version of AOS/VS extra block (10 = 1.0) + Fstat variable fstat packet + AclBuf variable raw ACL data ($MXACL bytes) + + Fstat contains the file's uncompressed fstat packet, which is one of + the following: + + normal fstat packet (P_FSTAT struct) + DIR/CPD fstat packet (P_FSTAT_DIR struct) + unit (device) fstat packet (P_FSTAT_UNIT struct) + IPC file fstat packet (P_FSTAT_IPC struct) + + AclBuf contains the raw ACL data; its length is $MXACL. + + + -Tandem NSK Extra Field: + ====================== + + The following is the layout of the file-attributes extra block for + Tandem NSK. The local-header and central-header versions are + identical. (Last Revision 19981221) + + Value Size Description + ----- ---- ----------- + (TA) 0x4154 Short tag for this extra block type ("TA") + TSize Short total data size for this block (20) + NSKattrs 20 Bytes NSK attributes + + + -THEOS Extra Field: + ================= + + The following is the layout of the file-attributes extra block for + Theos. The local-header and central-header versions are identical. 
+ (Last Revision 19990206) + + Value Size Description + ----- ---- ----------- + (Theos) 0x6854 Short 'Th' signature + size Short size of extra block + flags Byte reserved for future use + filesize Long file size + fileorg Byte type of file (see below) + keylen Short key length for indexed and keyed files, + data segment size for 16 bits programs + reclen Short record length for indexed,keyed and direct, + text segment size for 16 bits programs + filegrow Byte growing factor for indexed,keyed and direct + protect Byte protections (see below) + reserved Short reserved for future use + + File types + ========== + + 0x80 library (keyed access list of files) + 0x40 directory + 0x10 stream file + 0x08 direct file + 0x04 keyed file + 0x02 indexed file + 0x0e reserved + 0x01 16 bits real mode program (obsolete) + 0x21 16 bits protected mode program + 0x41 32 bits protected mode program + + Protection codes + ================ + + User protection + --------------- + 0x01 non readable + 0x02 non writable + 0x04 non executable + 0x08 non erasable + + Other protection + ---------------- + 0x10 non readable + 0x20 non writable + 0x40 non executable Theos before 4.0 + 0x40 modified Theos 4.x + 0x80 not hidden + + + -THEOS old unofficial Extra Field: + ================================ + + The following is the layout of an unofficial former version of the + Theos file-attributes extra block. This layout was never published + and is no longer created. However, UnZip can optionally support it + when compiled with the option flag OLD_THEOS_EXTRA defined. + The local-header and central-header versions are identical.
+ (Last Revision 19990206) + + Value Size Description + ----- ---- ----------- + (THS0) 0x4854 Short 'TH' signature + size Short size of extra block + flags Short reserved for future use + filesize Long file size + reclen Short record length for indexed,keyed and direct, + text segment size for 16 bits programs + keylen Short key length for indexed and keyed files, + data segment size for 16 bits programs + filegrow Byte growing factor for indexed,keyed and direct + reserved 3 Bytes reserved for future use + + + -FWKCS MD5 Extra Field (0x4b46): + ============================== + + The FWKCS Contents_Signature System, used in automatically + identifying files independent of filename, optionally adds + and uses an extra field to support the rapid creation of + an enhanced contents_signature. + There is no local-header version; the following applies + only to the central header. (Last Revision 19961207) + + Central-header version: + + Value Size Description + ----- ---- ----------- + (MD5) 0x4b46 Short tag for this extra block type ("FK") + TSize Short total data size for this block (19) + "MD5" 3 bytes extra-field signature + MD5hash 16 bytes 128-bit MD5 hash of uncompressed data + (low byte first) + + When FWKCS revises a .ZIP file central directory to add + this extra field for a file, it also replaces the + central directory entry for that file's uncompressed + file length with a measured value. + + FWKCS provides an option to strip this extra field, if + present, from a .ZIP file central directory. In adding + this extra field, FWKCS preserves .ZIP file Authenticity + Verification; if stripping this extra field, FWKCS + preserves all versions of AV through PKZIP version 2.04g. + + FWKCS, and FWKCS Contents_Signature System, are + trademarks of Frederick W. Kantor. + + (1) R. Rivest, RFC1321.TXT, MIT Laboratory for Computer + Science and RSA Data Security, Inc., April 1992. 
+ ll.76-77: "The MD5 algorithm is being placed in the + public domain for review and possible adoption as a + standard." diff --git a/proginfo/fileinfo.cms b/proginfo/fileinfo.cms new file mode 100644 index 0000000..9d21935 --- /dev/null +++ b/proginfo/fileinfo.cms @@ -0,0 +1,231 @@ +[Quoting from a C/370 manual, courtesy of Carl Forde.] + + C/370 supports three types of input and output: text streams, binary + streams, and record I/O. Text and binary streams are both ANSI + standards; record I/O is a C/370 extension. + +[...] + + Record I/O is a C/370 extension to the ANSI standard. For files + opened in record format, C/370 reads and writes one record at a + time. If you try to write more data to a record than the record + can hold, the data is truncated. For record I/O, C/370 only allows + the use of fread() and fwrite() to read and write to the files. Any + other functions (such as fprintf(), fscanf(), getc(), and putc()) + fail. For record-oriented files, records do not change size when + you update them. If the new data has fewer characters than the + original record, the new data fills the first n characters, where + n is the number of characters of the new data. The record will + remain the same size, and the old characters (those after n) are + left unchanged. A subsequent update begins at the next boundary. + For example, if you have the string "abcdefgh": + + abcdefgh + + and you overwrite it with the string "1234", the record will look + like this: + + 1234efgh + + C/370 record I/O is binary. That is, it does not interpret any of + the data in a record file and therefore does not recognize control + characters. + + + The record model consists of: + + * A record, which is the unit of data transmitted to and from a + program + * A block, which is the unit of data transmitted to and from a + device. Each block may contain one or more records.
+ + In the record model of I/O, records and blocks have the following + attributes: + + RECFM Specifies the format of the data or how the data is organized + on the physical device. + LRECL Specifies the length of logical records (as opposed to + physical ones). + + BLKSIZE Specifies the length of physical records (blocks on the + physical device). + + + Opening a File by Filename + + The filename that you specify on the call to fopen() or freopen() + must be in the following format: + + >> ----filename---- ----filetype-------------------- + | | | | + --.-- -- --filemode-- + | | + --.-- + where + + filename is a 1- to 8-character string of any of the characters, + A-Z, a-z, 0-9, and +, -, $, #, @, :, and _. You can separate it + from the filetype with one or more spaces, or with a period. + [Further note: filenames are fully case-sensitive, as in Unix.] + + filetype is a 1- to 8-character string of any of the characters, + A-Z, a-z, 0-9, and +, -, $, #, @, :, and _. You can separate it + from the filemode with one or more spaces, or with a period. The + separator between filetype and filemode must be the same as the + one between filename and filetype. + + filemode is a 1- to 2-character string. The first must be any of + the characters A-Z, a-z, or *. If you use the asis parameter on + the fopen() or freopen() call, the first character of the filemode + must be a capital letter or an asterisk. Otherwise, the function + call fails. The second character of filemode is optional; if you + specify it, it must be any of the digits 0-6. You cannot specify + the second character if you have specified * for the first one. + + If you do not use periods as separators, there is no limit to how + much whitespace you can have before and after the filename, the + filetype, and filemode. 
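The component rules quoted above (filename, filetype, filemode) can be sketched as two small checks. This is only an illustration under stated assumptions: the function names cms_valid_name and cms_valid_mode are hypothetical and are not part of C/370 or Zip, the code assumes the default "C" locale so that isalnum() and isalpha() accept only A-Z, a-z and 0-9, and it does not model the extra "asis" requirement that the filemode letter be a capital.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: returns 1 if s satisfies the quoted rule for a
   filename or filetype, i.e. 1 to 8 characters, each a letter, a digit,
   or one of + - $ # @ : _ ; returns 0 otherwise. */
int cms_valid_name(const char *s)
{
    size_t len = strlen(s);
    size_t i;

    if (len < 1 || len > 8)
        return 0;
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];
        if (!isalnum(c) && strchr("+-$#@:_", c) == NULL)
            return 0;
    }
    return 1;
}

/* Hypothetical helper: returns 1 if s satisfies the quoted rule for a
   filemode, i.e. a letter or '*', optionally followed by a digit 0-6,
   where the digit is not allowed after '*'; returns 0 otherwise. */
int cms_valid_mode(const char *s)
{
    size_t len = strlen(s);

    if (len < 1 || len > 2)
        return 0;
    if (s[0] == '*')
        return len == 1;
    if (!isalpha((unsigned char)s[0]))
        return 0;
    return len == 1 || (s[1] >= '0' && s[1] <= '6');
}
```

Under these assumptions, "JaMeS" and "dAtA" pass cms_valid_name(), "A1" and "*" pass cms_valid_mode(), and "*1" or "A7" are rejected. Splitting a full "filename filetype filemode" string on spaces or periods is left out of the sketch.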
+ + + Opening a File without a File Mode Specified + + If you omit the file mode or specify * for it, C/370 does one + of the following when you call fopen() or freopen(): + + * If you have specified a read mode, C/370 looks for the named file + on all the accessed readable disks, in order. If it does not find + the file, the fopen() or freopen() call fails. + * If you have specified any of the write modes, C/370 writes the file + on the first writable disk you have accessed. Specifying a write + mode on an fopen() or freopen() call that contains the filename of + an existing file destroys that file. If you do not have any + writable disks accessed, the call fails. + + + fopen() and freopen() parameters + + recfm + CMS supports only two RECFMs, V and F. [Note that MVS supports + 27(!) different RECFMs.] If you do not specify the RECFM for a + file, C/370 determines whether it is in fixed or variable format. + + lrecl and blksize + For files in fixed format, CMS allows records to be read and + written in blocks. To have a fixed format CMS file treated as a + fixed blocked CMS file, you can open the file with recfm=fb and + specify the lrecl and blksize. If you do not specify a recfm on + the open, the blksize can be a multiple of the lrecl, and the + file is treated as if it were blocked. + + For files in variable format, the CMS LRECL is different from the + LRECL for the record model. In the record model, the LRECL is + equal to the data length plus 4 bytes (for the record descriptor + word), and the BLKSIZE is equal to the LRECL plus 4 bytes (for + the block descriptor word). In CMS, BDWs and RDWs do not exist, + but because CMS follows the record model, you must still account + for them. When you specify V, you must still allocate the record + descriptor word and block descriptor word. That is, if you want + a maximum of n bytes per record, you must specify a minimum LRECL + of n+4 and a minimum BLKSIZE of n+8.
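The n+4 / n+8 arithmetic for variable-format files can be written out as a minimal sketch. The names RDW_BYTES, BDW_BYTES, min_lrecl_v and min_blksize_v are hypothetical and chosen only for illustration; the 4-byte descriptor word sizes are the ones stated in the quoted text.

```c
/* Sizes of the record and block descriptor words, as given in the
   quoted manual text (4 bytes each). */
enum { RDW_BYTES = 4, BDW_BYTES = 4 };

/* Minimum LRECL for a variable-format file whose records carry at most
   n bytes of data: the data plus the record descriptor word. */
unsigned long min_lrecl_v(unsigned long n)
{
    return n + RDW_BYTES;
}

/* Minimum BLKSIZE for the same file: the minimum LRECL plus the block
   descriptor word, i.e. n + 8. */
unsigned long min_blksize_v(unsigned long n)
{
    return min_lrecl_v(n) + BDW_BYTES;
}
```

For example, records of up to 80 data bytes would need an LRECL of at least 84 and a BLKSIZE of at least 88 under this rule.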
+ + When you are appending to V files, you can enlarge the record size + dynamically, but only if you have not specified LRECL or BLKSIZE + on the fopen() or freopen() call that opened the file. + + type + If you specify this parameter, the only valid value for CMS disk + files is type=record. This opens a file for record I/O. + + asis + If you use this parameter, you can open files with mixed-case + filenames such as JaMeS dAtA or pErCy.FILE. If you specify this + parameter, the file mode that you specify must be a capital letter + (if it is not an asterisk); otherwise, the function call fails and + the value returned is NULL. + + + Reading from Record I/O Files + fread() is the only interface allowed for reading record I/O files. + Each time you call fread() for a record I/O file, fread() reads + one record from the system. If you call fread() with a request for + less than a complete record, the requested bytes are copied to your + buffer, and the file position is set to the start of the next + record. If the request is for more bytes than are in the record, + one record is read and the position is set to the start of the next + record. C/370 does not strip any blank characters or interpret any + data. + + fread() returns the number of items read successfully, so if you + pass a size argument equal to 1 and a count argument equal to the + maximum expected length of the record, fread() returns the length, + in bytes, of the record read. If you pass a size argument equal + to the maximum expected length of the record, and a count argument + equal to 1, fread() returns either 0 or 1, indicating whether a + record of length size was read. If a record is read successfully but + is less than size bytes long, fread() returns 0. + + + Writing to Record I/O Files + fwrite() is the only interface allowed for writing to a file + opened for record I/O. Only one record is written at a time.
If + you attempt to write more new data than a full record can hold or + try to update a record with more data than it currently has, C/370 + truncates your output at the record boundary. When C/370 performs + a truncation, it sets errno and raises SIGIOERR, if SIGIOERR is not + set to SIG_IGN. + + When you are writing new records to a fixed-record I/O file, if you + try to write a short record, C/370 pads the record with nulls out + to LRECL. + + At the completion of an fwrite(), the file position is at the start + of the next record. For new data, the block is flushed out to the + system as soon as it is full. + + + fldata() Behavior + When you call the fldata() function for an open CMS minidisk file, + it returns a data structure that looks like this: + + struct __filedata { + unsigned int __recfmF : 1, /* fixed length records */ + __recfmV : 1, /* variable length records */ + __recfmU : 1, /* n/a */ + __recfmS : 1, /* n/a */ + __recfmBlk : 1, /* n/a */ + __recfmASA : 1, /* text mode and ASA */ + __recfmM : 1, /* n/a */ + __dsorgPO : 1, /* n/a */ + __dsorgPDSmem : 1, /* n/a */ + __dsorgPDSdir : 1, /* n/a */ + __dsorgPS : 1, /* sequential data set */ + __dsorgConcat : 1, /* n/a */ + __dsorgMem : 1, /* n/a */ + __dsorgHiper : 1, /* n/a */ + __dsorgTemp : 1, /* created with tmpfile() */ + __dsorgVSAM : 1, /* n/a */ + __reserve1 : 1, /* n/a */ + __openmode : 2, /* see below 1 */ + __modeflag : 4, /* see below 2 */ + __reserve2 : 9, /* n/a */ + + char __device; __DISK + unsigned long __blksize, /* see below 3 */ + __maxreclen; /* see below 4 */ + unsigned short __vsamtype; /* n/a */ + unsigned long __vsamkeylen; /* n/a */ + unsigned long __vsamRKP; /* n/a */ + char * __dsname; /* fname ftype fmode */ + unsigned int __reserve4; /* n/a */ + + /* note 1: values are: __TEXT, __BINARY, __RECORD + note 2: values are: __READ, __WRITE, __APPEND, __UPDATE + these values can be added together to determine + the return value; for example, a file opened with + a+ will have the 
value __READ + __APPEND. + note 3: total block size of the file, including ASA + characters as well as RDW information + note 4: maximum record length of the data only (includes + ASA characters but excludes RDW information). + */ + }; diff --git a/proginfo/infozip.who b/proginfo/infozip.who new file mode 100644 index 0000000..242cd95 --- /dev/null +++ b/proginfo/infozip.who @@ -0,0 +1,232 @@ +These members of the Info-ZIP group contributed to the development and +testing of portable Zip. They are responsible for whatever works in +Zip. Whatever doesn't work is solely the fault of the authors of Zip +(Mark Adler, Rich Wales, Jean-loup Gailly, Kai Uwe Rommel, Igor Mandrichenko, +Onno van der Linden, Christian Spieler, John Bush, Paul Kienitz, Sergio Monesi +and Karl Davis). If you have contributed and your name +has been forgotten, please send a reminder to the zip-bugs address given +in the Readme file. The names are given here in alphabetical order, +because it's impossible to classify them by importance of the +contribution. Some have made a complete port to a new target, some +have provided a one line fix. All are to be thanked. + +Mark Adler madler@tybalt.caltech.edu NeXT 2.x + alan@spri.levels.unisa.edu.au Linux +Jeffrey Altman jaltman@watsun.cc.columbia.edu fseek bug on NT +Glenn J. Andrews oper1%drcv06.decnet@drcvax.af.mil VAX VMS +James Van Artsdalen james@raid.dell.com bug report +Eric Backus ericb@lsid.hp.com bug report +Quentin Barnes qbarnes@urbana.css.mot.com unix/Makefile mode of + installed files +Elmar Bartel bartel@informatik.tu-muenchen.de +Mark E. 
Becker mbecker@cs.uml.edu bug report +Paul von Behren Paul_von_Behren@stortek.com OS/390 port +Jon Bell swy@wsdot.wa.gov Intergraph/CLIX +Michael Bernardi mike@childsoc.demon.co.uk RS6000 +Tom Betz marob!upaya!tbetz@phri.nyu.edu SCO Xenix 2.3.1 +James Birdsall jwbirdsa@picarefy.com AT&T 3B1 +George boer@fwi.uva.nl OS/2 +Michael Bolton bolton@vaxc.erim.org VAX/VMS +Wim Bonner 27313853@WSUVM1.CSC.WSU.EDU HP 9000/840a HPUX +Paul Borman prb@cray.com Cray-X/YMP,2 UNICOS 6-8 +Kurt Van den Branden kvd2@bipsy.se.bel.alcatel.be VAX VMS +Scott Briggs briggs@nashua.progress.com Windows NT +Leslie C. Brown lbrown@BRL.MIL Pyramid MIS-4 +Ralf Brown ralf@b.gp.cs.cmu.edu Pyramid MIS-4 +Rodney Brown rdb@cmutual.com.au SunOS 4.1.3 DGUX OSF/1 + HP-UX CRC optimization +Jeremy Daniel Buhler jbuhler@owlnet.rice.edu BC++ +John Bush john.bush@east.sun.com Amiga (SAS/C) +Pietro Caselli zaphod@petruz.sublink.org Minix 1.5.10 +Andrew A. Chernov ache@astral.msk.su FreeBSD +Jeff Coffler jeffcof@microsoft.com Windows NT +David Dachtera David.Dachtera@advocatehealth.com VMS + link_zip.com bug +Bill Davidsen davidsen@crdos1.crd.ge.com Xenix (on what?) +Karl Davis riscman@geko.com.au Acorn +Daniel Deimert daniel@pkmab.se zeus3.21 Zilog S8000 +David Denholm denholm@sotona.physics.southampton.ac.uk VMS +Harald Denker harry@hal.westfalen.de ATARI +Matthew J. D'Errico doc@magna.com Bull +L. Peter Deutsch ghost@aladdin.com Linux +Uwe Doering gemini@geminix.in-berlin.de 386 Unix +Jean-Michel Dubois jmdubois@ibcfrance.fr Theos support +James P. Dugal jpd@usl.edu Pyramid 90X OSx4.1 +"Evil Ed" esaffle@gmuvax2.gmu.edu Ultrix-32 V3.1 (Rev. 9) +Patrick Ellis pellis@aic.mdc.com VMS zip -h appearance +Thomas Esken esken@uni-muenster.de Acorn fix +Dwight Estep estep@dlo10.enet.dec.com MSDOS +David A. 
Feinleib t-davefe@microsoft.com Windows NT +Joshua Felsteiner joshua@phys1.technion.ac.il Linux +Greg Flint afc@klaatu.cc.purdue.edu ETA-10P* hybrid Sys V +Carl Forde cforde@bcsc02.gov.bc.ca VM/CMS +Jeff Foy jfoy@glia.biostr.washington.edu IRIX Sys V Rel 3.3.1 +Mike Freeman mikef@pacifier.com Vax VMS +Kevin M. Fritz kmfritz@apgea.army.mil Turbo C++ 1.0 + Pyramid +Jean-loup Gailly jloup@chorus.fr MS-DOS Microsoft C 5.1 +Scott D. Galloway sgallowa@letterkenn-emh1.army.mil Sperry 5000 SysV.3 +Rainer Gerling gerling@faupt101.physik.uni-erlangen.de HPUX, MSDOS +Henry Gessau henryg@kullmar.kullmar.se Windows NT +Ian E. Gorman ian@iosphere.net ported zip 2.2 to VM/CMS +Wayne R. Graves graves@csa2.lbl.gov Vax VMS +George Grimes grimes@netcom.com Apollo Domain SR10.4 +Hunter Goatley goathunter@MadGoat.com VMS (VAX & Alpha) +Arnt Gulbrandsen agulbra@pvv.unit.no Linux +David Gundlach david@rolf.stat.uga.edu Sun SS1+ SunOS 4.1 +Peter Gutmann pgut1@cs.aukuni.ac.nz bug report +Dirk Haase d_haase@sitec.de MacOS port +Mark Hanning-Lee markhl@iris-355.jpl.nasa.gov SGI +Walter Haidinger e9225662@student.tuwien.ac.at Amiga and general fixes +Charles Hannum mycroft@ai.mit.edu bug report +Greg Hartwig ghartwig@ix.netcom.com VM/CMS cleanup +Tanvir Hassan tanvir.hassan@autodesk.com NT +Bob Hardy hardy@lucid.com Power C on MSDOS +Zachary Heilig heilig@plains.nodak.edu Turbo C++ 3.0 +Chris Herborth chrish@pobox.com BeOS port +Jonathan Hudson jrhudson@bigfoot.com QDOS port +Mark William Jacobs mark@mensch.stanford.edu MSDOS +Aubrey Jaffer jaffer@martigny.ai.mit.edu Pixel +Peter Jones jones.peter@uqam.ca MIPS UMIPS 4.0 + +Onolimit fix for HP-UX +Kjetil W. J{\o}rgensen jorgens@lise.unit.no OSF/1, DJGPP v2 +Bruce Kahn bkahn@archive.webo.dg.com MS-DOS Microsoft C 5.1 +Jonathan I. 
Kamens jik@pit-manager.mit.edu ultrix on DECstation +Dave Kapalko d.kapalko@att.com bug report +Bob Kemp Robert.V.Kemp@att.com AT&T 3B2 SysV 3.2v2 +Vivek Khera khera@cs.duke.edu SunOS +Earl Kiech KIECH@utkvx.utk.edu VAX VMS V5.4-1A +Paul Kienitz Paul.Kienitz@shelter.sf.ca.us Amiga, Watcom C +David Kirschbaum kirsch@usasoc.soc.mil He got us all in this + mess in the first place +Thomas Klausner wiz@danbala.tuwien.ac.at cygwin32 and -k fix +D. Krumbholz krumbh00@marvin.informatik.uni-dortmund.de + Acorn filetype and + timestamp bug report + +Bo Kullmar bk@kullmar.se DNIX 5.3, SunOS 4.1 +Baden Kudrenecky baden@unixg.ubc.ca OS/2 +Giuseppe La Sala lasala@mail.esa.esrin.it VMS +Jean-Marc Lasgouttes jean-marc.lasgouttes@inria.fr Bug report +Harry Langenbacher harry@neuron6.Jpl.Nasa.Gov Sun SS1+ SunOS 4.1 +Michael D. Lawler mdlawler@gwmicro.com Mt.Xinu BSD 4.3 on VAX + Borland C++ 4.51 +Johnny Lee johnnyl@microsoft.com Microsoft C 7.0 +Michael Lemke michael@io.as.utexas.edu VMS +David Lemson lemson@ux1.cso.uiuc.edu Sequent Dynix 3.0.17 +Tai-Shan Lin tlin@snakeyes.eecs.wsu.edu OS/2 +Onno van der Linden onno@simplex.nl NetBSD, Borland C++, + MSC 7.0, DJGPP 2 + +Michel loehden%mv13.decnet@vax.hrz.uni-marburg.de VMS +Warner Losh imp@Solbourne.COM packing algorithm help +Dave Lovelace davel@grex.cyberspace.org DG AOS/VS +Erik Luijten erik@tntnhb3.tn.tudelft.nl problem report +John Lundin lundin@urvax.urich.edu VAX VMS +Igor Mandrichenko mandrichenko@m10.ihep.su VAX VMS +Cliff Manis root@csoftec.csf.com SCO 2.3.1 (386) +Fulvio Marino fulvio@iconet.ico.olivetti.it X/OS 2.3 & 2.4 +Bill Marsh bmarsh@cod.nosc.mil SGI Iris 4D35 +Michael Mauch mauch@gmx.de djgpp LFN attribute fix +Peter Mauzey ptm@mtdcr.mt.lucent.com AT&T 6300, 7300 +Rafal Z. Maszkowski rzm@mat.torun.edu.pl Convex +Robert McBroom (?) rm3@ornl.gov DECsystem 5810 +Tom McConnell tmcconne@sedona.intel.com NCR SVR4 +Frank P. 
McIngvale frankm@eng.auburn.edu Bug report +Conor McMenamin C.S.McMenamin@sussex.ac.uk MSDOS +John Messenger jlm@proteon.com Bug report +Michael kuch@mailserv.zdv.uni-tuebingen.de SGI +Dan Mick dmick@pongo.west.sun.com Solaris +Alan Modra alan@spri.levels.unisa.edu.au Linux +Laszlo Molnar lmolnar@goliat.eik.bme.hu DJGPP v2 +Jim Mollmann jmq@nccibm1.bitnet OS/2 & MVS +Sergio Monesi pel0015@cdc8g5.cdc.polimi.it Acorn +J. Mukherjee jmukherj@ringer.cs.utsa.edu OS/2 +Anthony Naggs amn@ubik.demon.co.uk bug report +Matti Narkia matti.narkia@ntc.nokia.com VAX VMS +Robert E. Newman Jr. newmanr@ssl.msfc.nasa.gov bug report +Robert Nielsen NielsenRJ@ems.com 2.2 -V VMS bug report +Christian Michel cmichel@de.ibm.com 2.2 check_dup OS/2 bug + report +Thomas S. Opheys opheys@kirk.fmi.uni-passau.de OS/2 +Humberto Ortiz-Zuazaga zuazaga@ucunix.san.uc.edu Linux +James E. O'Dell jim@fpr.com MacOS +William O'Shaughnessy williamo@hpcupt1.cup.hp.com HPUX +Neil Parks neil.parks@pcohio.com MSDOS +Enrico Renato Palmerini palmer@vxscaq.cineca.it UNISYS 7000 Sys 5 r2.3 +Geoff Pennington Geoff.Pennington@sgcs.co.uk -q output bug +Keith Petersen w8sdz@simtel20.army.mil Pyramid UCB OSx4.4c +George Petrov VM/CMS, MVS +Alan Phillips postmaster@lancaster.ac.uk Dynix/ptx 1.3 +Bruno Pillard bp@chorus.fr SunOS 4.1 +Piet W. 
Plomp piet@icce.rug.nl MSC 7.0, SCO 3.2v4.0 +John Poltorak j.poltorak@bradford.ac.uk problem report +Kenneth Porter 72420.2436@compuserve.com OS/2 +Norbert Pueschel pueschel@imsdd.meb.uni-bonn.de Amiga time.lib +Yuval Rakavy yuval@cs.huji.ac.il MSDOS +David A Rasmussen dave@convex.csd.uwm.edu Convex C220 with 9.0 OS +Eric Raymond esr@snark.thyrsus.com Unix +Jim Read 74312.3103@compuserve.com OS/2 +Michael Regoli mr@cica.indiana.edu Ultrix 3.1 VAX 8650 + BSD 4.3 IBM RT/125 + BSD 4.3 MicroVAX 3500 + SunOS 4.0.3 Sun 4/330 +Jochen Roderburg roderburg@rrz.uni-koeln.de Digital Unix with + AFS/NFS converter +Rick Rodgers rodgers@maxwell.mmwb.ucsf.EDU Unix man page +Greg Roelofs roe2@midway.uchicago.edu SunOS 4.1.1,4.1.2 Sun 4 + Unicos 5.1--6.1.5 Cray + OS/2 1.3 MS C 6.0 + Ultrix 4.1,4.2 DEC 5810 + VMS 5.2, 5.4 VAX 8600 + Irix 3.3.2, SGI Iris 4D + UTS 1.2.4 Amdahl 5880 +Phil Ritzenthaler phil@cgrg.ohio-state.edu SYSV +Kai Uwe Rommel rommel@ars.de or rommel@leo.org OS/2 +Markus Ruppel m.ruppel@imperial.ac.uk OS/2 +Shimazaki Ryo eririn@ma.mailbank.ne.jp human68k +Jon Saxton jrs@panix.com Microsoft C 6.0 +Steve Salisbury stevesa@msn.com Microsoft C 8.0 +Timo Salmi ts@uwasa.fi bug report +Darren Salt ds@youmustbejoking.demon.co.uk RISC OS +NIIMI Satoshi a01309@cfi.waseda.ac.jp Human68K +Tom Schmidt tschmidt@micron.com SCO 286 +Martin Schulz martin.schulz@isltd.insignia.com Windows NT, Atari +Dan Seyb dseyb@halnet.com AIX +Mark Shadley shadcat@catcher.com unix fixes +Timur Shaporev tim@rd.relcom.msk.su MSDOS +W. T. 
Sidney sidney@picard.med.ge.com bug report +Dave Sisson daves@vtcosy.cns.vt.edu AIX 1.1.1 PS/2 & 3090 +Dave Smith smithdt@bp.com Tandem port +Fred Smith fredex@fcshome.stoneham.ma.us Coherent +Christian Spieler spieler@ikp.tu-darmstadt.de VMS, MSDOS, emx, djgpp, + WIN32, Linux +Ron Srodawa srodawa@vela.acs.oakland.edu SCO Xenix/386 2.3.3 +Adam Stanley astanley@winternet.com MSDOS +Bertil Stenstr|m stenis@heron.dafa.se HP-UX 7.0 HP9000/835 +Carl Streeter streeter@oshkoshw.bitnet OS/2 +Reuben Sumner rasumner@undergrad.math.uwaterloo.ca Suggestions +E-Yen Tan e-yen.tan@brasenose.oxford.ac.uk Borland C++ win32 +Yoshioka Tsuneo tsuneo-y@is.aist-nara.ac.jp Multibyte charset + support +Paul Telles paul@pubnet.com SCO Xenix +Julian Thompson jrt@oasis.icl.co.uk bug report +Christopher C. Tjon tjon@plains.nodak.edu bug report +Robert F Tobler rft@cs.stanford.edu bug report +Eric Tomio tomio@acri.fr bug report +Cosmin Truta cosmint@cs.ubbcluj.ro win32 gcc based + asm +Anthony R. Venson cevens@unix1.sncc.lsu.edu MSDOS/emx +Antoine Verheijen antoine@sysmail.ucs.ualberta.ca envargs fix +Arjan de Vet devet@info.win.tue.nl SunOS 4.1, MSC 5.1 +Santiago Vila Doncel sanvila@ba.unex.es MSDOS +Johan Vromans jv@mh.nl bug report +Rich Wales wales@cs.ucla.edu SunOS 4.0.3 Sun-3/50 +Scott Walton scottw@io.com BSD/386 +Frank J. Wancho wancho@wsmr-simtel20.army.mil TOPS-20 + oyvind@stavanger.sgp.slb.com Bug report. +Takahiro Watanabe wata@first.tsukuba.ac.jp fixes for INSTALL +Mike White mwhite@pumatech.com wizzip DLL +Ray Wickert wickert@dc-srv.pa-x.dec.com MSDOS/DJGPP +Winfried Winkler willi@wap0109.chem.tu-berlin.de AIX +Norman J. 
Wong as219@freenet.carleton.ca MSDOS +Martin Zinser m.zinser@gsi.de VMS 7.x diff --git a/proginfo/nt.sd b/proginfo/nt.sd new file mode 100644 index 0000000..8ac31ba --- /dev/null +++ b/proginfo/nt.sd @@ -0,0 +1,111 @@ +Info-ZIP portable Zip/UnZip Windows NT security descriptor support +================================================================== +Scott Field (sfield@microsoft.com), 8 October 1996 + + +This version of Info-ZIP's Win32 code allows for processing of Windows +NT security descriptors if they were saved in the .zip file using the +appropriate Win32 Zip running under Windows NT. This also requires +that the file system that Zip/UnZip operates on supports persistent +Acl storage. When the operating system is not Windows NT and the +target file system does not support persistent Acl storage, no security +descriptor processing takes place. + +A Windows NT security descriptor consists of any combination of the +following components: + + an owner (Sid) + a primary group (Sid) + a discretionary ACL (Dacl) + a system ACL (Sacl) + qualifiers for the preceding items + +By default, Zip will save all aspects of the security descriptor except +for the Sacl. The Sacl contains information pertaining to auditing of +the file, and requires a security privilege be granted to the calling +user in addition to being enabled by the calling application. In order +to save the Sacl during Zip, the user must specify the -! switch on the +Zip commandline. The user must also be granted either the SeBackupPrivilege +"Backup files and directories" or the SeSystemSecurityPrivilege "Manage +auditing and security log". + +By default, UnZip will not restore any aspects of the security descriptor. +If the -X option is specified to UnZip, the Dacl is restored to the file. +The other items in the security descriptor on the new file will receive +default values. If the -XX option is specified to UnZip, as many aspects +of the security descriptor as possible will be restored. 
If the calling +user is granted the SeRestorePrivilege "Restore files and directories", +all aspects of the security descriptor will be restored. If the calling +user is only granted the SeSystemSecurityPrivilege "Manage auditing and +security log", only the Dacl and Sacl will be restored to the new file. + +Note that when operating on files that reside on remote volumes, the +privileges specified above must be granted to the calling user on that +remote machine. Currently, there is no way to directly test what privileges +are present on a remote machine, so Zip and UnZip make a remote privilege +determination based on an indirect method. + +UnZip considerations +-------------------- + +In order for file security to be processed correctly, any directory entries +that have a security descriptor will be processed at the end of the unzip +cycle. This allows for unzip to process files within the newly created +directory regardless of the security descriptor associated with the directory +entry. This also prevents security inheritance problems that can occur as +a result of creating a new directory and then creating files in that directory +that will inherit parent directory permissions; such inherited permissions may +prevent the security descriptor taken from the zip file from being applied +to the new file. + +If directories exist which match directory/extract paths in the .zip file, +file security is not updated on the target directory. It is assumed that if +the target directory already exists, then appropriate security has already +been applied to that directory. + +"unzip -t" will test the integrity of stored security descriptors when +present and the operating system is Windows NT. + +ZipInfo (unzip -Z) will display information on stored security descriptors +when "unzip -Zv" is specified. 
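The restore rules that nt.sd gives for UnZip (no option, -X, -XX, and the two privileges) can be summarized as a small decision function. This is only a sketch of the documented behaviour, not code from UnZip: the function name, the component labels, and the fallback used when -XX is given without either privilege are assumptions made for illustration. It also assumes the archive actually contains the components in question (the Sacl is only stored when Zip was run with -!).

```python
# Illustrative model of the UnZip restore rules described in nt.sd.
# All names are hypothetical; this is not code taken from UnZip.

ALL_COMPONENTS = frozenset({"owner", "group", "dacl", "sacl"})


def restored_components(option=None, privileges=()):
    """Return the security descriptor parts UnZip is described as restoring."""
    granted = set(privileges)
    if option is None:
        # Default: no part of the security descriptor is restored.
        return frozenset()
    if option == "-X":
        # Only the Dacl is restored; other items receive default values.
        return frozenset({"dacl"})
    if option == "-XX":
        if "SeRestorePrivilege" in granted:
            return ALL_COMPONENTS
        if "SeSystemSecurityPrivilege" in granted:
            return frozenset({"dacl", "sacl"})
        # ASSUMPTION: nt.sd only says "as many aspects as possible" here;
        # this sketch falls back to the Dacl alone.
        return frozenset({"dacl"})
    raise ValueError("unknown option: %r" % (option,))


if __name__ == "__main__":
    print(sorted(restored_components("-XX", ["SeSystemSecurityPrivilege"])))
```

Writing the rules as a table like this makes the one unspecified case visible: the text does not say what -XX restores when the caller holds neither privilege.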
+ + +Potential uses +============== + +The obvious use for this new support is to better support backup and restore +operations in a Windows NT environment where NTFS file security is utilized. +This allows individuals and organizations to archive files in a portable +fashion and transport these files across the organization. + +Another potential use of this support is setup and installation. This +allows for distribution of Windows NT based applications that have preset +security on files and directories. For example, prior to creation of the +.zip file, the user can set file security via File Manager or Explorer on +the files to be contained in the .zip file. In many cases, it is appropriate +to only grant Everyone Read access to .exe and .dll files, while granting +Administrators Full control. Using this support in conjunction with the +unzipsfx.exe self-extractor stub can yield a useful and powerful way to +install software with preset security (note that -X or -XX should be +specified on the self-extractor commandline). + +When creating .zip files with security which are intended for transport +across systems, it is important to take into account the relevance of +access control entries and the associated Sid of each entry. For example, +if a .zip file is created on a Windows NT workstation, and file security +references local workstation user accounts (like an account named Fred), +this access entry will not be relevant if the .zip file is transported to +another machine. Where possible, take advantage of the built-in well-known +groups, like Administrators, Everyone, Network, Guests, etc. These groups +have the same meaning on any Windows NT machine. Note that the names of +these groups may differ depending on the language of the installed Windows +NT, but this isn't a problem since each name has well-known ID that, upon +restore, translates to the correct group name regardless of locale. 
+ +When access control entries contain Sid entries that reference Domain +accounts, these entries will only be relevant on systems that recognize +the referenced domain. Generally speaking, the only side effects of +irrelevant access control entries is wasted space in the stored security +descriptor and loss of complete intended access control. Such irrelevant +access control entries will show up as "Account Unknown" when viewing file +security with File Manager or Explorer. diff --git a/proginfo/perform.dos b/proginfo/perform.dos new file mode 100644 index 0000000..98744ee --- /dev/null +++ b/proginfo/perform.dos @@ -0,0 +1,183 @@ +Date: Wed, 27 Mar 1996 01:31:50 CET +0100 +From: Christian Spieler (IKDA, THD, D-64289 Darmstadt) +Subject: More detailed comparison of MSDOS Info-ZIP programs' performance + +Hello all, + +In response to some additional questions and requests concerning +my previous message about DOS performance of 16/32-bit Info-ZIP programs, +I have produced a more detailed comparison: + +System: +Cx486DX-40, VL-bus, 8MB; IDE hard disk; +DOS 6.2, HIMEM, EMM386 NOEMS NOVCPI, SMARTDRV 3MB, write back. + +I have used the main directory of UnZip 5.20p as source, including the +objects and executable of an EMX compile for unzip.exe (to supply some +binary test files). + +Tested programs were (my current updated sources!) Zip 2.0w and UnZip 5.20p +- 16-bit MSC 5.1, compressed with LZEXE 0.91e +- 32-bit Watcom C 10.5, as supplied by Kai Uwe Rommel (PMODE 1.22) +- 32-bit EMX 0.9b +- 32-bit DJGPP v2 +- 32-bit DJGPP v1.12m4 + +The EMX and DJ1 (GO32) executables were bound with the full extender, to +create standalone executables. + +A) Tests of Zip + Command : "<system>\zip.exe -q<#> tes.zip unz/*" (unz/*.* for Watcom!!) + where <#> was: 0, 1, 6, 9. + The test archive "tes.zip" was never deleted, this test + measured "time to update archive". 
+ + The following table contains average execution seconds (averaged over + at least 3 runs, with the first run discarded to fill disk cache); + numbers in parentheses specify the standard deviation of the last + digits. + + cmpr level| 0 | 1 | 6 | 9 + =============================================================== + EMX win95 | 7.77 | 7.97 | 12.82 | 22.31 + --------------------------------------------------------------- + EMX | 7.15(40) | 8.00(6) | 12.52(25) | 20.93 + DJ2 | 13.50(32) | 14.20(7) | 19.05 | 28.48(9) + DJ1 | 13.56(30) | 14.48(3) | 18.70 | 27.43(13) + WAT | 6.94(22) | 8.93 | 15.73(34) | 30.25(6) + MSC | 5.99(82) | 9.40(4) | 13.59(9) | 20.77(4) + =============================================================== + + The "EMX win95" line was created for comparison, to check the performance + of emx 0.9 with the RSX extender in a DPMI environment. (This line was + produced by applying the "stubbed" EMX executable in a full screen DOS box.) + + +B) Tests of UnZip + Commands : <system>\unzip.exe -qt tes.zip (testing performance) + <system>\unzip.exe -qo tes.zip -dtm (extracting performance) + + The tes.zip archive created by maximum compression with the Zip test + was used as example archive. Contents (archive size was 347783 bytes): + 1028492 bytes uncompressed, 337235 bytes compressed, 67%, 85 files + + The extraction directory tm was not deleted between the individual runs, + thus this measurement checks the "overwrite all" time. + + | testing | extracting + =================================================================== + EMX | 1.98 | 6.43(8) + DJ2 | 2.09 | 11.85(39) + DJ1 | 2.09 | 7.46(9) + WAT | 2.42 | 7.10(27) + MSC | 4.94 | 9.57(31) + +Remarks: + +The executables compiled by me were generated with all "performance" +options enabled (ASM_CRC, and ASMV for Zip), and with full crypt support. +For DJ1 and DJ2, the GCC options were "-O2 -m486", for EMX "-O -m486". 
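The timing tables use a compact notation in which "7.15(40)" stands for a mean of 7.15 s with a standard deviation of 0.40 s, the parenthesized digits applying to the last digits of the mean. The sketch below shows one way such entries could be produced and read back. The function names are invented for this example, and the use of the sample standard deviation and of a single discarded warm-up run are assumptions; the message does not say how its figures were computed.

```python
import re
import statistics


def summarize(times, warmup=1):
    """Format timings as 'mean(sd)' after dropping the first `warmup` runs.

    ASSUMPTION: sample standard deviation, reported in units of the last
    two decimal digits of the mean.
    """
    kept = times[warmup:]
    mean = statistics.mean(kept)
    sd = statistics.stdev(kept)
    return "%.2f(%d)" % (mean, round(sd * 100))


def parse_entry(text):
    """Split a table entry such as '8.00(6)' into (mean, sd or None)."""
    match = re.fullmatch(r"(\d+)\.(\d+)(?:\((\d+)\))?", text.strip())
    if match is None:
        raise ValueError("not a timing entry: %r" % (text,))
    whole, frac, sd_digits = match.groups()
    mean = float(whole + "." + frac)
    if sd_digits is None:
        return mean, None
    # The digits in parentheses scale with the number of decimals shown.
    return mean, int(sd_digits) / 10 ** len(frac)


if __name__ == "__main__":
    print(summarize([9.0, 6.75, 7.25, 7.75]))
    print(parse_entry("8.00(6)"))
```

Entries without parentheses, such as 8.93, are returned with no deviation rather than a deviation of zero, since the table does not say which was meant.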
+ +The Watcom UnZip was compiled with ASM_CRC code enabled as well, +but the Watcom Zip example was made without any optional assembler code! + + + +Discussion of the results: + +In overall performance, the EMX executables clearly win. +For UnZip, emx is by far the fastest program, and the Zip performance is +comparable to the 16-bit "reference". + +Whenever "real" work including I/O is requested, the DJGPP versions +lose badly because of poor I/O performance, this is the case especially +for the "newer" DJGPP v2 !!! +(I tried to tweak with the transfer buffer size, but without any success.) +An interesting result is that DJ v1 UnZip works remarkably better than +DJ v2 (in contrast to Zip, where both executables' performance is +approximately equal). + +The Watcom C programs show a clear performance deficit in the "computational +part" (Watcom C compiler produces code that is far from optimal), but +the extender (which is mostly responsible for the I/O throughput) seems +to be quite fast. + +The "natural" performance deficit of the 16-bit MSC code, which can be +clearly seen in the "testing task" comparison for UnZip, is (mostly, +for Zip more than) compensated by the better I/O throughput (due to the +"direct interface" between "C RTL" and "DOS services", without any mode +switching). 
+ +But performance is only one aspect when choosing which compiler should +be used for official distribution: + +Sizes of the executables: + | Zip || UnZip + | standalone stub || standalone | stub +====================================================================== +EMX | 143,364 (1) | 94,212 || 159,748 (1) | 110,596 +DJ2 | 118,272 (2) | -- || 124,928 (2) | -- +DJ1 | 159,744 | 88,064 || 177,152 | 105,472 +WAT | 140,073 | -- || 116,231 | -- +MSC | 49,212 (3) | -- || 45,510 (3) | -- + +(1) does not run in "DPMI only" environment (Windows DOS box) +(2) requires externally supplied DPMI server +(3) compressed with LZexe 0.91 + +Caveats/Bugs/Problems of the different extenders: + +EMX: +- requires two different extenders to run in all DOS-compatible environments, + EMX for "raw/himem/vcpi" and RSX for "dpmi" (Windows). +- does not properly support time zones (no daylight savings time) + +DJv2: +- requires an external (freely available) DPMI extender when run on plain + DOS; this extender cannot (currently ??) be bound into the executable. + +DJv1: +- uses up large amount of "low" dos memory (below 1M) when spawning + another program, each instance of a DJv1 program requires its private + GO32 extender copy in low dos memory (may be problem for the zip + "-T" feature) + +Watcom/PMODE: +- extended memory is allocated statically (default: ALL available memory) + This means that a spawned program does not get any extended memory. + You can work around this problem by setting a hard limit on the amount + of extended memory available to the PMODE program, but this limit is + "hard" and restricts the allocatable memory for the program itself. + In detail: + The Watcom zip.exe as distributed did not allow the "zip -T" feature; + there was no extended memory left to spawn unzip. + I could work around this problem by applying PMSETUP to change the + amount of allocated extended memory to 2.0 MByte (I had 4MB free extended + memory on my test system). 
But, this limit cannot be enlarged at + runtime, when zip needs more memory to store "header info" while + zipping up a huge drive, and on a system with less free memory, this + method is not applicable, either. + +Summary: + +For Zip: +Use the 16-bit executable whenever possible (unless you need the +larger memory capabilities when zipping up a huge amount of files) + +As 32-bit executable, we may distribute Watcom C (after we have confirmed +that enabling ASMV and ASM_CRC give us some better computational +performance.) +The alternative for 32-bit remains DJGPP v1, which shows the least problems +(to my knowledge); v2 and EMX cannot be used because of their lack of +"universality". + +For UnZip: +Here, the Watcom C 32-bit executable is probably the best compromise, +but DJ v1 could be used as well. +And, after all, the 16-bit version does not lose badly when doing +"real" extraction! For the SFX stub, the 16-bit version remains first +choice because of its much smaller size! + +Best regards + +Christian Spieler diff --git a/proginfo/txtvsbin.txt b/proginfo/txtvsbin.txt new file mode 100644 index 0000000..6ba2805 --- /dev/null +++ b/proginfo/txtvsbin.txt @@ -0,0 +1,112 @@ +A Fast Method of Identifying Plain Text Files +============================================= + + +Introduction +------------ + +Given a file coming from an unknown source, it is generally impossible +to conclude automatically, and with 100% accuracy, whether that file is +a plain text file, without performing a heavy-duty semantic analysis on +the file contents. It is, however, possible to obtain a fairly high +degree of accuracy, by employing various simple heuristics. + +Previous versions of the zip tools were using a crude detection scheme, +originally used by PKWare in its PKZip programs: if more than 80% (4/5) +of the bytes are within the range [7..127], the file is labeled as plain +text, otherwise it is labeled as binary. 
A prominent limitation of this +scheme is the restriction to Latin-based alphabets. Other alphabets, +like Greek, Cyrillic or Asian, make extensive use of the bytes within +the range [128..255], and texts using these alphabets are most often +mis-identified by this scheme; in other words, the rate of false +negatives is sometimes too high, which means that the recall is low. +Another weakness of this scheme is a reduced precision, due to the false +positives that may occur when binary files containing a large amount of +textual characters are mis-identified as plain text. + +In this article we propose a new detection scheme, with a much increased +accuracy and precision, and a near-100% recall. This scheme is designed +to work on ASCII and ASCII-derived alphabets, and it handles single-byte +alphabets (ISO-8859, OEM, KOI-8, etc.), and variable-sized alphabets +(DBCS, UTF-8, etc.). However, it cannot handle fixed-sized, multi-byte +alphabets (UCS-2, UCS-4), nor UTF-16. The principle used by this scheme +can easily be adapted to non-ASCII alphabets like EBCDIC. + + +The Algorithm +------------- + +The algorithm works by dividing the set of bytes [0..255] into three +categories: +- The white list of textual bytecodes: + 9 (TAB), 10 (LF), 13 (CR), 32 (SPACE) to 255 +- The gray list of tolerated bytecodes: + 7 (BEL), 8 (BS), 11 (VT), 12 (FF), 26 (SUB), 27 (ESC) +- The black list of undesired, non-textual bytecodes: + 0 (NUL) to 6, 14 to 31. + +If a file contains at least one byte that belongs to the white list, and +no byte that belongs to the black list, then the file is categorized as +plain text. Otherwise, it is categorized as binary. + + +Rationale +--------- + +The idea behind this algorithm relies on two observations. + +The first observation is that, although the full range of 7-bit codes +(0..127) is properly specified by the ASCII standard, most control +characters in the range 0..31 are not used in practice. 
The only +widely-used, almost universally-portable control codes are 9 (TAB), +10 (LF), and 13 (CR). There are a few more control codes that are +recognized on a reduced range of platforms and text viewers/editors: +7 (BEL), 8 (BS), 11 (VT), 12 (FF), 26 (SUB), and 27 (ESC); but these +codes are rarely (if ever) used alone, without being accompanied by +some printable text. Even the newer, portable text formats, such as +XML, avoid using control characters outside the list mentioned here. + +The second observation is that most of the binary files tend to contain +control characters, especially 0 (NUL); even though the older text +detection schemes observe the presence of non-ASCII codes from the range +[128..255], the precision rarely has to suffer if this upper range is +labeled as textual, because the files that are genuinely binary tend to +contain both control characters, and codes from the upper range. On the +other hand, the upper range needs to be labeled as textual, because it +is being used by virtually all ASCII extensions. In particular, this +range is being heavily used to encode non-Latin scripts. + +Given the two observations, the plain text detection algorithm becomes +straightforward. There must be at least some printable material, or +some portable whitespace such as TAB, CR or LF, otherwise the file is +not labeled as plain text. (The boundary case, when the file is empty, +automatically falls into this category.) However, there must be no +non-portable control characters, otherwise it's very likely that the +intended reader of that file is a machine, rather than a human. + +Since there is no counting involved, other than simply observing the +presence or the absence of some byte values, the algorithm produces +uniform results on any particular text file, no matter what alphabet +encoding is being used for that text. 
(In contrast, if counting were +involved, it could be possible to obtain different results on a text +encoded, say, using ISO-8859-2 versus UTF-8.) There is the category +of plain text files that are "polluted" with one or a few black-listed +codes, either by mistake, or by peculiar design considerations. In such +cases, a scheme that tolerates a small percentage of black-listed codes +would provide an increased recall (i.e. more true positives). This, +however, incurs a reduced precision, since false positives are also more +likely to appear in binary files that contain large chunks of textual +data. "Polluted" plain text may, in fact, be regarded as binary, on +which text conversions should not be performed. Under this premise, it +is safe to say that the detection method provides a near-100% recall. + +Experiments have been run on a large set of files of various categories, +including plain old texts, system logs, source code, formatted office +documents, compiled object code, etcetera. The results confirm the +optimistic assumptions about the high accuracy, precision and recall +offered by this algorithm. 
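The white/gray/black list test described in txtvsbin.txt can be sketched in a few lines, alongside the older 80% rule for comparison. This is an illustration of the description, not the implementation used in Zip; the function and constant names are invented here. Because 26 (SUB) and 27 (ESC) appear in the gray list, the sketch treats the black-list range "14 to 31" as excluding them, which is an interpretation of the text.

```python
# Sketch of the plain-text test described in txtvsbin.txt.
# Names are illustrative and do not come from the Zip sources.

WHITE = frozenset([9, 10, 13]) | frozenset(range(32, 256))  # TAB, LF, CR, SPACE..255
GRAY = frozenset([7, 8, 11, 12, 26, 27])                    # BEL, BS, VT, FF, SUB, ESC
# ASSUMPTION: gray codes 26 and 27 are not black, although "14 to 31" spans them.
BLACK = (frozenset(range(0, 7)) | frozenset(range(14, 32))) - GRAY


def is_plain_text(data):
    """At least one white-listed byte and no black-listed byte."""
    seen = set(data)
    return bool(seen & WHITE) and not (seen & BLACK)


def old_80_percent_rule(data):
    """Older scheme: more than 4/5 of the bytes lie in [7..127]."""
    in_range = sum(1 for b in data if 7 <= b <= 127)
    return in_range * 5 > len(data) * 4


if __name__ == "__main__":
    cyrillic = "Привет, мир".encode("koi8_r")
    print(is_plain_text(cyrillic), old_80_percent_rule(cyrillic))
    print(is_plain_text(b"abc\x00def"), is_plain_text(b""))
```

Only the presence or absence of byte values matters, so the result for a given text does not depend on how often each byte occurs. The KOI8-R sample is classified as text by the new test and as binary by the 80% rule, which is the false-negative case the article describes.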
+ + +-- +Cosmin Truta +Last updated: 2005-Feb-27 diff --git a/proginfo/ziplimit.txt b/proginfo/ziplimit.txt new file mode 100644 index 0000000..e72d917 --- /dev/null +++ b/proginfo/ziplimit.txt @@ -0,0 +1,218 @@ +ziplimit.txt + +A) Hard limits of the Zip archive format: + + Number of entries in Zip archive: 64 k (2^16 - 1 entries) + Compressed size of archive entry: 4 GByte (2^32 - 1 Bytes) + Uncompressed size of entry: 4 GByte (2^32 - 1 Bytes) + Size of single-volume Zip archive: 4 GByte (2^32 - 1 Bytes) + Per-volume size of multi-volume archives: 4 GByte (2^32 - 1 Bytes) + Number of parts for multi-volume archives: 64 k (2^16 - 1 parts) + Total size of multi-volume archive: 256 TByte (4G * 64k) + + The number of archive entries and of multivolume parts are limited by + the structure of the "end-of-central-directory" record, where these + numbers are stored in 2-Byte fields. + Some Zip and/or UnZip implementations (for example Info-ZIP's) allow + handling of archives with more than 64k entries. (The information + from the "number of entries" field in the "end-of-central-directory" record + is not really necessary to retrieve the contents of a Zip archive; + it should rather be used for consistency checks.) + + Length of an archive entry name: 64 kByte (2^16 - 1) + Length of archive member comment: 64 kByte (2^16 - 1) + Total length of "extra field": 64 kByte (2^16 - 1) + Length of a single e.f. block: 64 kByte (2^16 - 1) + Length of archive comment: 64 kByte (2^16 - 1) + + Additional limitation claimed by PKWARE: + Size of local-header structure (fixed fields of 30 Bytes + filename + local extra field): < 64 kByte + Size of central-directory structure (46 Bytes + filename + + central extra field + member comment): < 64 kByte + + Note: + In 2001, PKWARE published version 4.5 of the Zip format specification + (together with the release of PKZIP for Windows 4.5). 
This specification + defines new extra field blocks that allow breaking the size limits of the + standard zipfile structures. In this extended "Zip64" format, the limits + on the size of zip entries and the size of the complete zip archive are + extended to (2^64 - 1) Bytes; the maximum number of archive entries and + split volumes are enlarged to (2^64 - 1) and (2^32 - 1), respectively. + Currently, these extensions are not yet supported by the released Info-ZIP + software. However, new major releases (Zip 3.0 and UnZip 6.0) are under + development and will support Zip64 archives on selected environments. + (Beta releases are already available for Unix, VMS and Win32.) + +B) Implementation limits of UnZip: + + 1. Size limits caused by file I/O and decompression handling: + Size of Zip archive: 2 GByte (2^31 - 1 Bytes) + Compressed size of archive entry: 2 GByte (2^31 - 1 Bytes) + + Note: On some systems, UnZip may support archive sizes up to 4 GByte. + To get this support, the target environment has to meet the following + requirements: + a) The compiler's intrinsic "long" data types must be able to hold + integer numbers of 2^32. In other words - the standard intrinsic + integer types "long" and "unsigned long" have to be wider than + 32 bit. + b) The system has to supply a C runtime library that is compatible + with the more-than-32-bit-wide "long int" type of condition a) + c) The standard file positioning functions fseek(), ftell() (and/or + the Unix style lseek() and tell() functions) have to be capable + of moving to absolute file offsets of up to 4 GByte from the file + start. + On 32-bit CPU hardware, you generally cannot expect that a C compiler + provides a "long int" type that is wider than 32-bit. So, many of the + most popular systems (i386, PowerPC, 680x0, et al.) are out of luck. + You may find environments that meet all requirements on systems + with 64-bit CPU hardware. Examples might be Cray number crunchers + or Compaq (former DEC) Alpha AXP machines. 
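The limits quoted in sections A and B.1 all follow from the width of the field or variable that holds the value. As a quick check of the arithmetic, the sketch below recomputes them; the helper names are made up for this example and nothing here is taken from the Zip or UnZip sources.

```python
# Recompute the limits quoted in ziplimit.txt from field widths.
# Helper names are hypothetical, chosen only for this illustration.

def max_unsigned(nbytes):
    """Largest value that fits in an unsigned field of `nbytes` bytes."""
    return 2 ** (8 * nbytes) - 1


def max_signed(nbytes):
    """Largest value that fits in a signed field of `nbytes` bytes."""
    return 2 ** (8 * nbytes - 1) - 1


ENTRY_COUNT_LIMIT = max_unsigned(2)      # 2-byte count fields: 64 k entries/parts
ENTRY_SIZE_LIMIT = max_unsigned(4)       # 4-byte size fields: 4 GByte
SIGNED_OFFSET_LIMIT = max_signed(4)      # 32-bit signed "long" offsets: 2 GByte
ZIP64_SIZE_LIMIT = max_unsigned(8)       # Zip64 8-byte size fields

# "256 TByte (4G * 64k)": per-volume size times number of volumes, rounded.
MULTI_VOLUME_TOTAL_TBYTE = (2 ** 32) * (2 ** 16) // 2 ** 40

if __name__ == "__main__":
    print(ENTRY_COUNT_LIMIT, ENTRY_SIZE_LIMIT, SIGNED_OFFSET_LIMIT)
    print(MULTI_VOLUME_TOTAL_TBYTE, ZIP64_SIZE_LIMIT)
```

The gap between ENTRY_SIZE_LIMIT and SIGNED_OFFSET_LIMIT is the difference between what the format allows (4 GByte) and what an implementation using signed 32-bit offsets can reach (2 GByte).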
+ + The number of Zip archive entries is unlimited. The "number-of-entries" + field of the "end-of-central-dir" record is checked against the "number + of entries found in the central directory" modulus 64k (2^16). + + Multi-volume archive extraction is not supported. + + Memory requirements are mostly independent of the archive size + and archive contents. + In general, UnZip needs a fixed amount of internal buffer space + plus the size to hold the complete information of the currently + processed entry's local header. Here, a large extra field + (could be up to 64 kByte) may exceed the available memory + for MSDOS 16-bit executables (when they were compiled in small + or medium memory model, with a fixed 64kByte limit on data space). + + The other exception where memory requirements scale with "larger" + archives is the "restore directory attributes" feature. Here, the + directory attributes info for each restored directory has to be held + in memory until the whole archive has been processed. So, the amount + of memory needed to keep this info scales with the number of restored + directories and may cause memory problems when a lot of directories + are restored in a single run. + +C) Implementation limits of the Zip executables: + + 1. Size limits caused by file I/O and compression handling: + Size of Zip archive: 2 GByte (2^31 - 1 Bytes) + Compressed size of archive entry: 2 GByte (2^31 - 1 Bytes) + Uncompressed size of entry: 2 GByte (2^31 - 1 Bytes), + (could/should be 4 GBytes...) + Multi-volume archive creation is not supported. + + 2. Limits caused by handling of archive contents lists + + 2.1. Number of archive entries (freshen, update, delete) + a) 16-bit executable: 64k (2^16 -1) or 32k (2^15 - 1), + (unsigned vs. signed type of size_t) + a1) 16-bit executable: <16k ((2^16)/4) + (The smaller limit a1) results from the array size limit of + the "qsort()" function.) 
+ 32-bit executables <1G ((2^32)/4) + (usual system limit of the "qsort()" function on 32-bit systems) + + b) stack space needed by qsort to sort list of archive entries + + NOTE: In the current executables, overflows of limits a) and b) are NOT + checked! + + c) amount of free memory to hold "central directory information" of + all archive entries; one entry needs: + 96 bytes (32-bit) resp. 80 bytes (16-bit) + + 3 * length of entry name + + length of zip entry comment (when present) + + length of extra field(s) (when present, e.g.: UT needs 9 bytes) + + some bytes for book-keeping of memory allocation + + Conclusion: + For systems with limited memory space (MSDOS, small AMIGAs, other + environments without virtual memory), the number of archive entries + is most often limited by condition c). + For example, with approx. 100 kBytes of free memory after loading and + initializing the program, a 16-bit DOS Zip cannot process more than 600 + to 1000 (+) archive entries. (For the 16-bit Windows DLL or the 16-bit + OS/2 port, limit c) is less important because Windows or OS/2 executables + are not restricted to the 1024k area of real mode memory. These 16-bit + ports are limited by conditions a1) and b), say: at maximum approx. + 16000 entries!) + + + 2.2. Number of "new" entries (add operation) + In addition to the restrictions above (2.1.), the following limits + caused by the handling of the "new files" list apply: + + a) 16-bit executable: <16k ((2^16)/4) + + b) stack size required for "qsort" operation on "new entries" list. + + NOTE: In the current executables, the overflow checks for these limits + are missing! + + c) amount of free memory to hold the directory info list for new entries; + one entry needs: + 24 bytes (32-bit) resp. 22 bytes (16-bit) + + 3 * length of filename + +D) Some technical remarks: + + 1. The 2GByte size limit on archive files is a consequence of the portable + C implementation of the Info-ZIP programs. 
+ Zip archive processing requires random access to the archive file for + jumping between different parts of the archive's structure. + In standard C, this is done via stdio functions fseek()/ftell() resp. + unix-io functions lseek()/tell(). In many (most?) C implementations, + these functions use "signed long" variables to hold offset pointers + into sequential files. In most cases, this is a signed 32-bit number, + which is limited to ca. 2E+09. There may be specific C runtime library + implementations that interpret the offset numbers as unsigned, but for + us, this is not reliable in the context of portable programming. + + 2. The 2GByte limit on the size of a single compressed archive member + is again a consequence of the implementation in C. + The variables used internally to count the size of the compressed + data stream are of type "long", which is guaranteed to be at least + 32-bit wide on all supported environments. + + But, why do we use "signed" long and not "unsigned long"? + + Throughout the I/O handling of the compressed data stream, the + sign bit of the "long" numbers is (mis-)used as a kind of overflow + detection. In the end, this is caused by the fact that standard C + lacks any overflow checking on integer arithmetic and does not + support access to the underlying hardware's overflow detection + (the status bits, especially "carry" and "overflow" of the CPU's + flags-register) in a system-independent manner. + + So, we "misuse" the most-significant bit of the compressed data + size counters as carry bit for efficient overflow/underflow detection. + We could change the code to a different method of overflow detection, + by using a bunch of "sanity" comparisons (kind of "is the calculated + result plausible when compared with the operands"). But, this would + "blow up" the code of the "inner loop", with remarkable loss of + processing speed. Or, we could reduce the amount of consistency checks + of the compressed data (e.g. 
detection of premature end of stream) to + an absolute minimum, at the cost of the programs' stability when + processing corrupted data. + + Summary: Changing the compression/decompression core routines to + be "unsigned safe" would require excessive recoding, with little + gain on maximum processable uncompressed size (a gain can only be + expected for hardly compressible data), but at severe costs on + performance, stability and maintainability. Therefore, it is + quite unlikely that this will ever happen for Zip/UnZip. + + The argumentation above is somewhat outdated. The new releases + Zip 3 and UnZip 6 will support archive sizes larger than 4GB on + systems where the required underlying support for 64-bit file offsets + and file sizes is available from the OS (and the C runtime environment). + However, this new support will partially break compatibility with + older "legacy" systems. And it should be expected that the portability + and readability of the UnZip and Zip code may be reduced due to the + extensive use of non-standard language extensions needed for 64-bit + support on the major target systems. + +Please report any problems to: Zip-Bugs at www.info-zip.org + +Last updated: 22 February 2005, Christian Spieler
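The per-entry memory figures in section C 2.1 c) of ziplimit.txt can be turned into a rough estimate of how many entries fit into a given amount of free memory. The sketch below is a back-of-the-envelope calculation only. The function names are invented, and the 8 bytes of allocation book-keeping, the 12-character names and the single 9-byte UT extra field are assumptions chosen for the example, since the text gives no figures for them.

```python
# Rough estimate based on the per-entry figures in ziplimit.txt, C 2.1 c).
# Function names and the default book-keeping overhead are assumptions.

def central_dir_bytes(name_len, comment_len=0, extra_len=0,
                      is_32bit=False, bookkeeping=8):
    """Approximate memory needed to hold one archive entry's directory info."""
    base = 96 if is_32bit else 80        # fixed part per entry
    return base + 3 * name_len + comment_len + extra_len + bookkeeping


def max_entries(free_bytes, **entry_kwargs):
    """Number of entries whose directory info fits into `free_bytes`."""
    return free_bytes // central_dir_bytes(**entry_kwargs)


if __name__ == "__main__":
    free = 100 * 1024                    # "approx. 100 kBytes of free memory"
    # 16-bit DOS Zip, 8.3-style names (12 characters), one UT extra field.
    print(max_entries(free, name_len=12, extra_len=9))
    # Longer stored paths reduce the count noticeably.
    print(max_entries(free, name_len=25, extra_len=9))
```

Under these assumptions the estimate lands between roughly 600 and 800 entries, which is consistent with the "600 to 1000" range quoted in the Conclusion.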