author | jbj <devnull@localhost> | 2002-09-30 21:45:34 +0000 |
---|---|---|
committer | jbj <devnull@localhost> | 2002-09-30 21:45:34 +0000 |
commit | 9fd8525a8b53609578216eec887c8876f36da236 (patch) | |
tree | ee3caed5ac4855d7c37e7a6bd74f863db81c502a /file/file.man | |
parent | 6d8b923dfb4182d49c74bae474fb1ff303a4bbd6 (diff) | |
Initial revision
CVS patchset: 5732
CVS date: 2002/09/30 21:45:34
Diffstat (limited to 'file/file.man')
-rw-r--r-- | file/file.man | 448 |
1 files changed, 448 insertions, 0 deletions
diff --git a/file/file.man b/file/file.man
new file mode 100644
index 000000000..dd510b475
--- /dev/null
+++ b/file/file.man
@@ -0,0 +1,448 @@
+.TH FILE __CSECTION__ "Copyright but distributable"
+.\" Id: file.man,v 1.39 2001/04/27 22:48:33 christos Exp
+.SH NAME
+file
+\- determine file type
+.SH SYNOPSIS
+.B file
+[
+.B \-bciknsvzL
+]
+[
+.B \-f
+.I namefile
+]
+[
+.B \-m
+.I magicfiles
+]
+.I file
+\&...
+.br
+.B file
+.B -C
+[
+.B \-m
+magicfile ]
+.SH DESCRIPTION
+This manual page documents version __VERSION__ of the
+.B file
+command.
+.PP
+.B File
+tests each argument in an attempt to classify it.
+There are three sets of tests, performed in this order:
+filesystem tests, magic number tests, and language tests.
+The
+.I first
+test that succeeds causes the file type to be printed.
+.PP
+The type printed will usually contain one of the words
+.B text
+(the file contains only
+printing characters and a few common control
+characters and is probably safe to read on an
+.SM ASCII
+terminal),
+.B executable
+(the file contains the result of compiling a program
+in a form understandable to some \s-1UNIX\s0 kernel or another),
+or
+.B data
+meaning anything else (data is usually `binary' or non-printable).
+Exceptions are well-known file formats (core files, tar archives)
+that are known to contain binary data.
+When modifying the file
+.I __MAGIC__
+or the program itself,
+.B "preserve these keywords" .
+People depend on knowing that all the readable files in a directory
+have the word ``text'' printed.
+Don't do as Berkeley did and change ``shell commands text''
+to ``shell script''.
+Note that the file
+.I __MAGIC__
+is built mechanically from a large number of small files in
+the subdirectory
+.I Magdir
+in the source distribution of this program.
+.PP
+The filesystem tests are based on examining the return from a
+.BR stat (2)
+system call.
+The program checks to see if the file is empty,
+or if it's some sort of special file.
+Any known file types appropriate to the system you are running on
+(sockets, symbolic links, or named pipes (FIFOs) on those systems that
+implement them)
+are intuited if they are defined in
+the system header file
+.IR <sys/stat.h> .
+.PP
+The magic number tests are used to check for files with data in
+particular fixed formats.
+The canonical example of this is a binary executable (compiled program)
+.I a.out
+file, whose format is defined in
+.I a.out.h
+and possibly
+.I exec.h
+in the standard include directory.
+These files have a `magic number' stored in a particular place
+near the beginning of the file that tells the \s-1UNIX\s0 operating system
+that the file is a binary executable, and which of several types thereof.
+The concept of `magic number' has been applied by extension to data files.
+Any file with some invariant identifier at a small fixed
+offset into the file can usually be described in this way.
+The information identifying these files is read from the compiled
+magic file
+.I __MAGIC__.mgc ,
+or
+.I __MAGIC__
+if the compile file does not exist.
+.PP
+If a file does not match any of the entries in the magic file,
+it is examined to see if it seems to be a text file.
+ASCII, ISO-8859-x, non-ISO 8-bit extended-ASCII character sets
+(such as those used on Macintosh and IBM PC systems),
+UTF-8-encoded Unicode, UTF-16-encoded Unicode, and EBCDIC
+character sets can be distinguished by the different
+ranges and sequences of bytes that constitute printable text
+in each set.
+If a file passes any of these tests, its character set is reported.
+ASCII, ISO-8859-x, UTF-8, and extended-ASCII files are identified
+as ``text'' because they will be mostly readable on nearly any terminal;
+UTF-16 and EBCDIC are only ``character data'' because, while
+they contain text, it is text that will require translation
+before it can be read.
+In addition,
+.B file
+will attempt to determine other characteristics of text-type files.
+If the lines of a file are terminated by CR, CRLF, or NEL, instead
+of the Unix-standard LF, this will be reported.
+Files that contain embedded escape sequences or overstriking
+will also be identified.
+.PP
+Once
+.B file
+has determined the character set used in a text-type file,
+it will
+attempt to determine in what language the file is written.
+The language tests look for particular strings (cf
+.IR names.h )
+that can appear anywhere in the first few blocks of a file.
+For example, the keyword
+.B .br
+indicates that the file is most likely a
+.BR troff (1)
+input file, just as the keyword
+.B struct
+indicates a C program.
+These tests are less reliable than the previous
+two groups, so they are performed last.
+The language test routines also test for some miscellany
+(such as
+.BR tar (1)
+archives).
+.PP
+Any file that cannot be identified as having been written
+in any of the character sets listed above is simply said to be ``data''.
+.SH OPTIONS
+.TP 8
+.B \-b
+Do not prepend filenames to output lines (brief mode).
+.TP 8
+.B \-c
+Cause a checking printout of the parsed form of the magic file.
+This is usually used in conjunction with
+.B \-m
+to debug a new magic file before installing it.
+.TP 8
+.B \-C
+Write a magic.mgc output file that contains a pre-parsed version of
+file.
+.TP 8
+.BI \-f " namefile"
+Read the names of the files to be examined from
+.I namefile
+(one per line)
+before the argument list.
+Either
+.I namefile
+or at least one filename argument must be present;
+to test the standard input, use ``\-'' as a filename argument.
+.TP 8
+.B \-i
+Causes the file command to output mime type strings rather than the more
+traditional human readable ones. Thus it may say
+``text/plain; charset=us-ascii''
+rather
+than ``ASCII text''. In order for this option to work, file changes the way
+it handles files recognised by the command itself (such as many of the
+text file types, directories etc), and makes use of an alternative
+``magic'' file.
+(See ``FILES'' section, below).
+.TP 8
+.B \-k
+Don't stop at the first match, keep going.
+.TP 8
+.BI \-m " list"
+Specify an alternate list of files containing magic numbers.
+This can be a single file, or a colon-separated list of files.
+.TP 8
+.B \-n
+Force stdout to be flushed after checking each file. This is only useful if
+checking a list of files. It is intended to be used by programs that want
+filetype output from a pipe.
+.TP 8
+.B \-v
+Print the version of the program and exit.
+.TP 8
+.B \-z
+Try to look inside compressed files.
+.TP 8
+.B \-L
+option causes symlinks to be followed, as the like-named option in
+.BR ls (1).
+(on systems that support symbolic links).
+.TP 8
+.B \-s
+Normally,
+.B file
+only attempts to read and determine the type of argument files which
+.BR stat (2)
+reports are ordinary files.
+This prevents problems, because reading special files may have peculiar
+consequences.
+Specifying the
+.BR \-s
+option causes
+.B file
+to also read argument files which are block or character special files.
+This is useful for determining the filesystem types of the data in raw
+disk partitions, which are block special files.
+This option also causes
+.B file
+to disregard the file size as reported by
+.BR stat (2)
+since on some systems it reports a zero size for raw disk partitions.
+.SH FILES
+.I __MAGIC__.mgc
+\- defaults compiled list of magic numbers
+.PP
+.I __MAGIC__
+\- default list of magic numbers
+.PP
+.I __MAGIC__.mime
+\- default list of magic numbers, used to output mime types when the -i option
+is specified.
+
+.SH ENVIRONMENT
+The environment variable
+.B MAGIC
+can be used to set the default magic number files.
+.SH SEE ALSO
+.BR magic (__FSECTION__)
+\- description of magic file format.
+.br
+.BR strings (1), " od" (1), " hexdump(1)"
+\- tools for examining non-textfiles.
+.SH STANDARDS CONFORMANCE
+This program is believed to exceed the System V Interface Definition
+of FILE(CMD), as near as one can determine from the vague language
+contained therein.
+Its behaviour is mostly compatible with the System V program of the same name.
+This version knows more magic, however, so it will produce
+different (albeit more accurate) output in many cases.
+.PP
+The one significant difference
+between this version and System V
+is that this version treats any white space
+as a delimiter, so that spaces in pattern strings must be escaped.
+For example,
+.br
+>10 string language impress\ (imPRESS data)
+.br
+in an existing magic file would have to be changed to
+.br
+>10 string language\e impress (imPRESS data)
+.br
+In addition, in this version, if a pattern string contains a backslash,
+it must be escaped. For example
+.br
+0 string \ebegindata Andrew Toolkit document
+.br
+in an existing magic file would have to be changed to
+.br
+0 string \e\ebegindata Andrew Toolkit document
+.br
+.PP
+SunOS releases 3.2 and later from Sun Microsystems include a
+.BR file (1)
+command derived from the System V one, but with some extensions.
+My version differs from Sun's only in minor ways.
+It includes the extension of the `&' operator, used as,
+for example,
+.br
+>16 long&0x7fffffff >0 not stripped
+.SH MAGIC DIRECTORY
+The magic file entries have been collected from various sources,
+mainly USENET, and contributed by various authors.
+Christos Zoulas (address below) will collect additional
+or corrected magic file entries.
+A consolidation of magic file entries
+will be distributed periodically.
+.PP
+The order of entries in the magic file is significant.
+Depending on what system you are using, the order that
+they are put together may be incorrect.
+If your old
+.B file
+command uses a magic file,
+keep the old magic file around for comparison purposes
+(rename it to
+.IR __MAGIC__.orig ).
+.SH EXAMPLES
+.nf
+$ file file.c file /dev/hda
+file.c: C program text
+file: ELF 32-bit LSB executable, Intel 80386, version 1,
+ dynamically linked, not stripped
+/dev/hda: block special
+
+$ file -s /dev/hda{,1,2,3,4,5,6,7,8,9,10}
+/dev/hda: x86 boot sector
+/dev/hda1: Linux/i386 ext2 filesystem
+/dev/hda2: x86 boot sector
+/dev/hda3: x86 boot sector, extended partition table
+/dev/hda4: Linux/i386 ext2 filesystem
+/dev/hda5: Linux/i386 swap file
+/dev/hda6: Linux/i386 swap file
+/dev/hda7: Linux/i386 swap file
+/dev/hda8: Linux/i386 swap file
+/dev/hda9: empty
+/dev/hda10: empty
+
+$ file -i file.c file /dev/hda
+file.c: text/x-c
+file: application/x-executable, dynamically linked (uses shared libs), not stripped
+/dev/hda: application/x-not-regular-file
+
+.fi
+.SH HISTORY
+There has been a
+.B file
+command in every \s-1UNIX\s0 since at least Research Version 6
+(man page dated January 16, 1975).
+The System V version introduced one significant major change:
+the external list of magic number types.
+This slowed the program down slightly but made it a lot more flexible.
+.PP
+This program, based on the System V version,
+was written by Ian Darwin <ian@darwinsys.com>
+without looking at anybody else's source code.
+.PP
+John Gilmore revised the code extensively, making it better than
+the first version.
+Geoff Collyer found several inadequacies
+and provided some magic file entries.
+Contributions by the `&' operator by Rob McMahon, cudcv@warwick.ac.uk, 1989.
+.PP
+Guy Harris, guy@netapp.com, made many changes from 1993 to the present.
+.PP
+Primary development and maintenance from 1990 to the present by
+Christos Zoulas (christos@astron.com).
+.PP
+Altered by Chris Lowth, chris@lowth.com, 2000:
+Handle the ``-i'' option to output mime type strings and using an alternative
+magic file and internal logic.
+.PP
+Altered by Eric Fischer (enf@pobox.com), July, 2000,
+to identify character codes and attempt to identify the languages
+of non-ASCII files.
+.PP
+The list of contributors to the "Magdir" directory (source for the
+/etc/magic
+file) is too long to include here. You know who you are; thank you.
+.SH LEGAL NOTICE
+Copyright (c) Ian F. Darwin, Toronto, Canada, 1986-1999.
+Covered by the standard Berkeley Software Distribution copyright; see the file
+LEGAL.NOTICE in the source distribution.
+.PP
+The files
+.I tar.h
+and
+.I is_tar.c
+were written by John Gilmore from his public-domain
+.B tar
+program, and are not covered by the above license.
+.SH BUGS
+There must be a better way to automate the construction of the Magic
+file from all the glop in Magdir. What is it?
+Better yet, the magic file should be compiled into binary (say,
+.BR ndbm (3)
+or, better yet, fixed-length
+.SM ASCII
+strings for use in heterogenous network environments) for faster startup.
+Then the program would run as fast as the Version 7 program of the same name,
+with the flexibility of the System V version.
+.PP
+.B File
+uses several algorithms that favor speed over accuracy,
+thus it can be misled about the contents of
+text
+files.
+.PP
+The support for
+text
+files (primarily for programming languages)
+is simplistic, inefficient and requires recompilation to update.
+.PP
+There should be an ``else'' clause to follow a series of continuation lines.
+.PP
+The magic file and keywords should have regular expression support.
+Their use of
+.SM "ASCII TAB"
+as a field delimiter is ugly and makes
+it hard to edit the files, but is entrenched.
+.PP
+It might be advisable to allow upper-case letters in keywords
+for e.g.,
+.BR troff (1)
+commands vs man page macros.
+Regular expression support would make this easy.
+.PP
+The program doesn't grok \s-2FORTRAN\s0.
+It should be able to figure \s-2FORTRAN\s0 by seeing some keywords which
+appear indented at the start of line.
+Regular expression support would make this easy.
+.PP
+The list of keywords in
+.I ascmagic
+probably belongs in the Magic file.
+This could be done by using some keyword like `*' for the offset value.
+.PP
+Another optimisation would be to sort
+the magic file so that we can just run down all the
+tests for the first byte, first word, first long, etc, once we
+have fetched it. Complain about conflicts in the magic file entries.
+Make a rule that the magic entries sort based on file offset rather
+than position within the magic file?
+.PP
+The program should provide a way to give an estimate
+of ``how good'' a guess is.
+We end up removing guesses (e.g. ``From '' as first 5 chars of file) because
+they are not as good as other guesses (e.g. ``Newsgroups:'' versus
+``Return-Path:''). Still, if the others don't pan out, it should be
+possible to use the first guess.
+.PP
+This program is slower than some vendors' file commands.
+The new support for multiple character codes makes it even slower.
+.PP
+This manual page, and particularly this section, is too long.
+.SH AVAILABILITY
+You can obtain the original author's latest version by anonymous FTP
+on
+.B ftp.astron.com
+in the directory
+.I /pub/file/file-X.YY.tar.gz
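Reading aid (not part of the commit): the DESCRIPTION section added above says that tests run in a fixed order, that the first success wins, and that a magic number is an invariant identifier at a small fixed offset. The Python sketch below illustrates that idea only. It is not how `file` is implemented: the table `MAGIC_TABLE`, the function `classify`, and the returned labels are hypothetical names chosen for illustration, the filesystem tests are reduced to an emptiness check on an in-memory buffer, and the text test covers plain ASCII only.

```python
import string

# Hypothetical, hand-written table standing in for a magic file:
# (offset, expected bytes, description). The three signatures are
# well-known ones; the descriptions are illustrative labels.
MAGIC_TABLE = [
    (0, b"\x7fELF", "ELF executable"),
    (0, b"\x1f\x8b", "gzip compressed data"),
    (257, b"ustar", "tar archive"),
]

# Printing characters plus a few common control characters
# (tab, newline, carriage return, vertical tab, form feed).
TEXT_BYTES = frozenset(string.printable.encode("ascii"))


def classify(data: bytes) -> str:
    """Return a rough type label for a buffer; the first matching test wins."""
    # Stand-in for the filesystem tests: only the "empty" case is modelled.
    if not data:
        return "empty"
    # Magic number tests: compare bytes at a fixed offset with a signature.
    for offset, magic, description in MAGIC_TABLE:
        if data[offset:offset + len(magic)] == magic:
            return description
    # Text test, simplified to ASCII only.
    if all(byte in TEXT_BYTES for byte in data):
        return "ASCII text"
    # Anything else is reported as data.
    return "data"
```

Under these assumptions, `classify(b"hello\n")` falls through the table to the text test, while a buffer starting with `b"\x7fELF"` is caught by the first table entry. Keeping the signatures in a table rather than in code loosely mirrors the external list of magic number types that the HISTORY section credits to System V.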