summaryrefslogtreecommitdiff
path: root/shared-mime-info-spec.xml
diff options
context:
space:
mode:
Diffstat (limited to 'shared-mime-info-spec.xml')
-rw-r--r--shared-mime-info-spec.xml1110
1 files changed, 1110 insertions, 0 deletions
diff --git a/shared-mime-info-spec.xml b/shared-mime-info-spec.xml
new file mode 100644
index 0000000..b156295
--- /dev/null
+++ b/shared-mime-info-spec.xml
@@ -0,0 +1,1110 @@
+<?xml version="1.0" standalone="no"?>
+<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
+"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" [
+ <!ENTITY updated "8 October 2010">
+ <!ENTITY version "0.20">
+]>
+<article id="index">
+
+<articleinfo>
+ <authorgroup>
+ <corpauthor>
+ <ulink url="http://www.freedesktop.org">
+ X Desktop Group
+ </ulink>
+ </corpauthor>
+ <author>
+ <firstname>Thomas</firstname>
+ <surname>Leonard</surname>
+ <affiliation>
+ <address><email>tal197 at users.sf.net</email></address>
+ </affiliation>
+ </author>
+ </authorgroup>
+
+ <title>Shared MIME-info Database</title>
+ <date>&updated;</date>
+</articleinfo>
+
+<sect1>
+ <title>Introduction</title>
+ <sect2>
+ <title>Version</title>
+ <para>
+This is version &version; of the Shared MIME-info Database specification, last updated &updated;.</para>
+ </sect2>
+ <sect2>
+ <title>What is this spec?</title>
+ <para>
+Many programs and desktops use the MIME system<citation>MIME</citation>
+to represent the types of files. Frequently, it is necessary to work out the
+correct MIME type for a file. This is generally done by examining the file's
+name or contents, and looking up the correct MIME type in a database.
+ </para>
+ <para>
+It is also useful to store information about each type, such as a textual
+description of it, or a list of applications that can be used to view or edit
+files of that type.
+ </para>
+ <para>
+For interoperability, it is useful for different programs to use the same
+database so that different programs agree on the type of a file and
+information is not duplicated. It is also helpful for application authors to
+only have to install new information in one place.
+ </para>
+ <para>
+This specification attempts to unify the MIME database systems currently in
+use by GNOME<citation>GNOME</citation>, KDE<citation>KDE</citation> and
+ROX<citation>ROX</citation>, and provide room for future extensibility.
+ </para>
+ <para>
+The MIME database does NOT store user preferences (such as a user's preferred
+application for handling files of a particular type). It may be used to store
+static information, such as that files of a certain type may be viewed with
+a particular application.
+ </para>
+ </sect2>
+ <sect2>
+ <title>Language used in this specification</title>
+ <para>
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
+"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
+interpreted as described in RFC 2119<citation>RFC-2119</citation>.
+ </para>
+ </sect2>
+</sect1>
+
+<sect1>
+ <title>Unified system</title>
+ <para>
+In discussions about the previous systems used by GNOME, KDE and ROX (see the
+"History and related systems" document), it was clear that the differences
+between the databases were simply a result of them being separate, and not due
+to any fundamental disagreements between developers. Everyone is keen to see
+them merged.
+ </para>
+ <para>
+This specification proposes:
+
+ <itemizedlist>
+ <listitem><para>
+A standard way for applications to install new MIME related information.
+ </para></listitem>
+ <listitem><para>
+A standard way of getting the MIME type for a file.
+ </para></listitem>
+ <listitem><para>
+A standard way of getting information about a MIME type.
+ </para></listitem>
+ <listitem><para>
+Standard locations for all the files, and methods of resolving conflicts.
+ </para></listitem>
+ </itemizedlist>
+Further, the existing databases have been merged into a single package
+<citation>SharedMIME</citation>.
+ </para>
+ <sect2 id="s2_layout">
+ <title>Directory layout</title>
+ <para>
+There are two important requirements for the way the MIME database is stored:
+ <itemizedlist>
+ <listitem><para>
+Applications must be able to extend the database in any way when they are installed,
+to add both new rules for determining type, and new information about specific types.
+ </para></listitem>
+ <listitem><para>
+It must be possible to install applications in /usr, /usr/local and the user's home directory
+(in the normal Unix way) and have the MIME information used.
+ </para></listitem>
+ </itemizedlist>
+ </para>
+ <para>
+This specification uses the XDG Base Directory Specification<citation>BaseDir</citation> to
+define the prefixes below which the database is stored.
+In the rest of this document, paths shown with the prefix
+<filename>&lt;MIME&gt;</filename> indicate the files should be loaded from
+the <filename>mime</filename> subdirectory of every directory in
+<envar>XDG_DATA_HOME</envar>:<envar>XDG_DATA_DIRS</envar>.
+ </para>
+ <para>
+For example, when using the default paths, <quote>Load all the
+<filename>&lt;MIME&gt;/text/html.xml</filename> files</quote> means to
+load <filename>/usr/share/mime/text/html.xml</filename>,
+<filename>/usr/local/share/mime/text/html.xml</filename>, and
+<filename>~/.local/share/mime/text/html.xml</filename> (if they exist, and in this order).
+Information found in a directory is added to the information found in previous
+directories, except when <userinput>glob-deleteall</userinput> or <userinput>magic-deleteall</userinput>
+is used to overwrite parts of a mimetype definition.
+ </para>
+ <para>
+Each application that wishes to contribute to the MIME database will install a
+single XML file, named after the application, into one of the three
+<filename>&lt;MIME&gt;/packages/</filename> directories (depending on where the user requested
+the application be installed). After installing, uninstalling or modifying this
+file, the application MUST run the <command>update-mime-database</command> command,
+which is provided by the freedesktop.org shared database<citation>SharedMIME</citation>.
+ </para>
+ <para>
+<command>update-mime-database</command> is passed the <filename>mime</filename>
+directory containing the <filename>packages</filename> subdirectory which was
+modified as its only argument. It scans all the XML files in the <filename>packages</filename>
+subdirectory, combines the information in them, and creates a number of output files.
+ </para>
+ <para>
+Where the information from these files is conflicting, information from directories
+lower in the list takes precedence.
+Any file named <filename>Override.xml</filename> takes precedence over all other files in
+the same <filename>packages</filename> directory. This can be used by
+tools which let the user edit the database to ensure that the user's
+changes take effect.
+ </para>
+ <para>
+The files created by <command>update-mime-database</command> are:
+ <itemizedlist>
+ <listitem><para>
+<filename>&lt;MIME&gt;/globs</filename> (contains a mapping from names to MIME types) [deprecated for globs2]
+ </para></listitem>
+ <listitem><para>
+<filename>&lt;MIME&gt;/globs2</filename> (contains a mapping from names to MIME types and glob weight)
+ </para></listitem>
+ <listitem><para>
+<filename>&lt;MIME&gt;/magic</filename> (contains a mapping from file contents to MIME types)
+ </para></listitem>
+ <listitem><para>
+<filename>&lt;MIME&gt;/subclasses</filename> (contains a mapping from MIME types to types they inherit from)
+ </para></listitem>
+ <listitem><para>
+<filename>&lt;MIME&gt;/aliases</filename> (contains a mapping from aliases to MIME types)
+ </para></listitem>
+ <listitem><para>
+<filename>&lt;MIME&gt;/icons</filename> (contains a mapping from MIME types to icons)
+ </para></listitem>
+ <listitem><para>
+<filename>&lt;MIME&gt;/generic-icons</filename> (contains a mapping from MIME types to generic icons)
+ </para></listitem>
+ <listitem><para>
+<filename>&lt;MIME&gt;/XMLnamespaces</filename> (contains a mapping from XML
+(namespaceURI, localName) pairs to MIME types)
+ </para></listitem>
+ <listitem><para>
+<filename>&lt;MIME&gt;/MEDIA/SUBTYPE.xml</filename> (one file for each MIME
+type, giving details about the type, including comment, icon and generic-icon)
+ </para></listitem>
+ <listitem><para>
+<filename>&lt;MIME&gt;/mime.cache</filename> (contains the same information as the <filename>globs2</filename>,
+<filename>magic</filename>, <filename>subclasses</filename>, <filename>aliases</filename>,
+<filename>icons</filename>, <filename>generic-icons</filename> and <filename>XMLnamespaces</filename> files,
+in a binary, mmappable format)
+ </para></listitem>
+ </itemizedlist>
+The format of these generated files and the source files in <filename>packages</filename>
+are explained in the following sections. This step serves several purposes. First, it allows
+applications to quickly get the data they need without parsing all the source XML files (the
+base package alone is over 700K). Second, it allows the database to be used for other
+purposes (such as creating the <filename>/etc/mime.types</filename> file if
+desired). Third, it allows validation to be performed on the input data,
+and removes the need for other applications to carefully check the input for
+errors themselves.
+ </para>
+ </sect2>
+ <sect2>
+ <title>The source XML files</title>
+ <para>
+Each application provides only a single XML source file, which is installed in the
+<filename>packages</filename> directory as described above. This file is an XML file
+whose document element is named <userinput>mime-info</userinput> and whose namespace URI
+is <ulink url="http://www.freedesktop.org/standards/shared-mime-info"/>. All elements
+described in this specification MUST have this namespace too.
+ </para><para>
+The document element may contain zero or more <userinput>mime-type</userinput> child nodes,
+in any order, each describing a single MIME type. Each element has a <userinput>type</userinput>
+attribute giving the MIME type that it describes.
+ </para><para>
+Each <userinput>mime-type</userinput> node may contain any combination of the following elements,
+and in any order:
+ <itemizedlist>
+ <listitem><para>
+<userinput>glob</userinput> elements have a <userinput>pattern</userinput> attribute. Any file
+whose name matches this pattern will be given this MIME type (subject to conflicting rules in
+other files, of course). There is also an optional <userinput>weight</userinput> attribute which
+is used when resolving conflicts with other glob matches. The default weight value is 50, and
+the maximum is 100.
+ </para>
+ <para>
+KDE's glob system replaces GNOME's and ROX's ext/regex fields, since it
+is trivial to detect a pattern in the form '*.ext' and store it in an
+extension hash table internally. The full power of regular expressions was
+not being used by either desktop, and glob patterns are more suitable for
+filename matching anyway.
+ </para></listitem>
+ <listitem><para>
+A <userinput>glob-deleteall</userinput> element, which indicates that patterns from
+previously parsed directories must be discarded. The patterns defined in this file
+(if any) are used instead.
+ </para></listitem>
+ <listitem><para>
+<userinput>magic</userinput> elements contain a list of
+<userinput>match</userinput> elements, any of which may match, and an optional
+<userinput>priority</userinput> attribute for all of the contained rules. Low
+numbers should be used for more generic types (such as 'gzip compressed data')
+and higher values for specific subtypes (such as a word processor format that
+happens to use gzip to compress the file). The default priority value is 50, and
+the maximum is 100.
+ </para><para>
+Each <userinput>match</userinput> element has a number of attributes:
+
+<informaltable>
+ <tgroup cols="3">
+ <thead><row><entry>Attribute</entry><entry>Required?</entry><entry>Value</entry></row></thead>
+ <tbody>
+
+ <row><entry>type</entry><entry>Yes</entry><entry>
+<userinput>string</userinput>, <userinput>host16</userinput>,
+<userinput>host32</userinput>, <userinput>big16</userinput>,
+<userinput>big32</userinput>, <userinput>little16</userinput>,
+<userinput>little32</userinput> or <userinput>byte</userinput>.
+ </entry></row>
+
+ <row><entry>offset</entry><entry>Yes</entry><entry>The byte offset(s)
+ in the file to check. This may be a single number or a range in the
+ form `start:end', indicating that all offsets in the range should be
+ checked. The range is inclusive.</entry></row>
+
+ <row><entry>value</entry><entry>Yes</entry><entry>
+ The value to compare the file contents with, in the format indicated by the type
+ attribute.
+ </entry></row>
+
+ <row><entry>mask</entry><entry>No</entry><entry>
+ The number to AND the value in the file with before comparing it to
+ `value'. Masks for numerical types can be any number, while masks for strings
+ must be in base 16, and start with 0x.
+ </entry></row>
+
+ </tbody></tgroup>
+</informaltable>
+
+Each element corresponds to one line of
+<citerefentry><refentrytitle>file</refentrytitle>
+<manvolnum>1</manvolnum></citerefentry>'s <filename>magic.mime</filename> file.
+They can be nested in the same way to provide the equivalent of continuation
+lines. That is, <![CDATA[<a><b/><c/></a>]]> means 'a and (b or c)'.
+ </para></listitem>
+ <listitem><para>
+A <userinput>magic-deleteall</userinput> element, which indicates that magic matches from
+previously parsed directories must be discarded. The magic defined in this file
+(if any) is used instead.
+ </para></listitem>
+ <listitem><para>
+<userinput>alias</userinput> elements indicate that the type is also sometimes
+known by another name, given by the <userinput>type</userinput> attribute. For
+example, <userinput>audio/midi</userinput> has an alias of
+<userinput>audio/x-midi</userinput>. Note that there should not be a
+<userinput>mime-type</userinput> element defining each alias; a single
+element defines the canonical name for the type and lists all its aliases.
+ </para></listitem>
+ <listitem><para>
+<userinput>sub-class-of</userinput> elements indicate that any data of this
+type is also some other type, given by the <userinput>type</userinput>
+attribute. See <xref linkend="subclassing"/>.
+ </para></listitem>
+ <listitem><para>
+<userinput>comment</userinput> elements give a human-readable textual description of the MIME
+type, usually composed of an acronym of the file name extension and a short description, like
+"ODS spreadsheet".
+There may be many of these elements with different <userinput>xml:lang</userinput> attributes
+to provide the text in multiple languages.
+ </para></listitem>
+ <listitem><para>
+<userinput>acronym</userinput> elements give experienced users a terse idea of the document contents.
+for example "ODS", "GEDCOM", "JPEG" and "XML".
+There may be many of these elements with different <userinput>xml:lang</userinput> attributes
+to provide the text in multiple languages, although these should only be used if absolutely neccessary.
+ </para></listitem>
+ <listitem><para>
+<userinput>expanded-acronym</userinput> elements are the expanded versions of the acronym elements,
+for example "OpenDocument Spreadsheet", "GEnealogical Data COMmunication", and "eXtensible Markup Language".
+The purpose of these elements is to provide users a way to look up information on various MIME types or
+file formats in third-party resources.
+There may be many of these elements with different <userinput>xml:lang</userinput> attributes
+to provide the text in multiple languages, although these should only be used if absolutely neccessary.
+ </para></listitem>
+ <listitem><para>
+<userinput>icon</userinput> elements specify the icon to be used for this particular mime-type, given
+by the <userinput>name</userinput> attribute. Generally the icon used for a mimetype is created
+based on the mime-type by mapping "/" characters to "-", but users can override this by using
+the <userinput>icon</userinput> element to customize the icon for a particular mimetype.
+This element is not used in the system database, but only used in the user overridden database.
+Only one <userinput>icon</userinput> element is allowed.
+ </para></listitem>
+ <listitem><para>
+<userinput>generic-icon</userinput> elements specify the icon to use as a generic icon for this
+particular mime-type, given by the <userinput>name</userinput> attribute. This is used if there
+is no specific icon (see <userinput>icon</userinput> for how these are found). These are
+used for categories of similar types (like spreadsheets or archives) that can use a common icon.
+The Icon Naming Specification lists a set of such icon names. If this element is not specified
+then the mimetype is used to generate the generic icon by using the top-level media type (e.g.
+"video" in "video/ogg") and appending "-x-generic" (i.e. "video-x-generic" in the previous example).
+Only one <userinput>generic-icon</userinput> element is allowed.
+ </para></listitem>
+ <listitem><para>
+<userinput>root-XML</userinput> elements have <userinput>namespaceURI</userinput>
+and <userinput>localName</userinput> attributes. If a file is identified as being an XML file,
+these rules allow a more specific MIME type to be chosen based on the namespace and localname
+of the document element.
+ </para><para>
+If <userinput>localName</userinput> is present but empty then the document element may have
+any name, but the namespace must still match.
+ </para></listitem>
+ <listitem><para>
+<userinput>treemagic</userinput> elements contain a list of <userinput>treematch</userinput> elements,
+any of which may match, and an optional <userinput>priority</userinput> attribute for all of the
+contained rules. The default priority value is 50, and the maximum is 100.
+ </para><para>
+Each <userinput>treematch</userinput> element has a number of attributes:
+
+<informaltable>
+ <tgroup cols="3">
+ <thead><row><entry>Attribute</entry><entry>Required?</entry><entry>Value</entry></row></thead>
+ <tbody>
+
+ <row><entry>path</entry><entry>Yes</entry><entry>A path that must be present on the mounted volume/filesystem. The path is interpreted as a relative path starting at the root of the tested volume/filesystem</entry></row>
+
+ <row><entry>type</entry><entry>No</entry><entry>The type of path. Possible values: <userinput>file</userinput>, <userinput>directory</userinput>, <userinput>link</userinput></entry></row>
+
+ <row><entry>match-case</entry><entry>No</entry><entry>Whether path should be matched case-sensitively. Possible values: <userinput>true</userinput>, <userinput>false</userinput></entry></row>
+
+ <row><entry>executable</entry><entry>No</entry><entry>Whether the file must be executable. Possible values: <userinput>true</userinput>, <userinput>false</userinput></entry></row>
+
+ <row><entry>non-empty</entry><entry>No</entry><entry>Whether the directory must be non-empty. Possible values: <userinput>true</userinput>, <userinput>false</userinput></entry></row>
+
+ <row><entry>mimetype</entry><entry>No</entry><entry>The mimetype for the file at path</entry></row>
+
+ </tbody></tgroup>
+</informaltable>
+
+<userinput>treematch</userinput> elements can be nested, meaning that both the outer and the inner <userinput>treematch</userinput>
+must be satisfied for a "match".
+ </para></listitem>
+ </itemizedlist>
+Applications may also define their own elements, provided they are namespaced to prevent collisions.
+Unknown elements are copied directly to the output XML files like <userinput>comment</userinput>
+elements. A typical use for this would be to indicate the default handler
+application for a particular desktop
+("Galeon is the GNOME default text/html browser"). Note that this doesn't
+indicate the user's preferred application, only the (fixed) default.
+ </para>
+ <para>
+Here is an example source file, named <filename>diff.xml</filename>:
+ <programlisting><![CDATA[
+<?xml version="1.0"?>
+<mime-info xmlns='http://www.freedesktop.org/standards/shared-mime-info'>
+ <mime-type type="text/x-diff">
+ <comment>Differences between files</comment>
+ <comment xml:lang="af">verskille tussen lĂȘers</comment>
+ ...
+ <magic priority="50">
+ <match type="string" offset="0" value="diff\t"/>
+ <match type="string" offset="0" value="***\t"/>
+ <match type="string" offset="0" value="Common subdirectories: "/>
+ </magic>
+ <glob pattern="*.diff"/>
+ <glob pattern="*.patch"/>
+ </mime-type>
+</mime-info>
+]]></programlisting>
+ </para><para>
+In practice, common types such as text/x-diff are provided by the freedesktop.org shared
+database. Also, only new information needs to be provided, since this information will be merged
+with other information about the same type.
+ </para>
+ </sect2>
+ <sect2>
+ <title>The MEDIA/SUBTYPE.xml files</title>
+ <para>
+These files have a <userinput>mime-type</userinput> element as the root node. The format is
+as described above. They are created by merging all the <userinput>mime-type</userinput>
+elements from the source files and creating one output file per MIME type. Each file may contain
+information from multiple source files. The <userinput>magic</userinput>,
+<userinput>glob</userinput> and <userinput>root-XML</userinput> elements will
+have been removed.
+ </para>
+ <para>
+The example source file given above would (on its own) create an output file called
+<filename>&lt;MIME&gt;/text/x-diff.xml</filename> containing the following:
+ <programlisting><![CDATA[
+<?xml version="1.0" encoding="utf-8"?>
+<mime-type xmlns="http://www.freedesktop.org/standards/shared-mime-info" type="text/x-diff">
+<!--Created automatically by update-mime-database. DO NOT EDIT!-->
+ <comment>Differences between files</comment>
+ <comment xml:lang="af">verskille tussen lĂȘers</comment>
+ ...
+</mime-type>
+
+]]></programlisting>
+ </para>
+ </sect2>
+ <sect2>
+ <title>The glob files</title>
+ <para>
+The globs2 file is a simple list of lines containing weight, MIME type and pattern, separated by a colon.
+The lines are ordered by glob weight.
+For example:
+ <programlisting><![CDATA[
+# This file was automatically generated by the
+# update-mime-database command. DO NOT EDIT!
+...
+55:text/x-diff:*.patch
+50:text/x-diff:*.diff
+50:text/x-c++src:*.C:cs
+...
+]]></programlisting>
+ </para>
+ <para>
+The glob file is a simple list of lines containing a MIME type and pattern, separated by a colon. It is
+deprecated in favour of the globs2 file which also lists the weight of the glob rule.
+The lines are ordered by glob weight.
+For example:
+ <programlisting><![CDATA[
+# This file was automatically generated by the
+# update-mime-database command. DO NOT EDIT!
+...
+text/x-diff:*.patch
+text/x-diff:*.diff
+...
+]]></programlisting>
+ </para>
+ <para>
+Applications MUST match globs case-insensitively, except when the case-sensitive attribute
+is set to true.
+This is so that e.g. <filename>main.C</filename> will be seen as a C++ file,
+but <filename>IMAGE.GIF</filename> will still use the *.gif pattern.
+ </para>
+ <para>
+If several patterns of the same weight match then the longest pattern SHOULD be used.
+In particular, files with multiple extensions (such as
+<filename>Data.tar.gz</filename>) MUST match the longest sequence of extensions
+(eg '*.tar.gz' in preference to '*.gz'). Literal patterns (eg, 'Makefile') must
+be matched before all others. It is suggested that patterns beginning with `*.'
+and containing no other special characters (`*?[') should be placed in a hash
+table for efficient lookup, since this covers the majority of the patterns. Thus,
+patterns of this form should be matched before other wildcarded patterns.
+ </para>
+ <para>
+If a matching pattern is provided by two or more MIME types, applications
+SHOULD not rely on one of them. They are instead supposed to use magic data
+(see below) to detect the actual MIME type. This is for instance required to
+deal with container formats like Ogg or AVI, that map various video and/or
+audio-encoded data to one extension.
+ </para>
+ <para>
+There may be several rules mapping to the same type. They should all be merged.
+If the same pattern is defined twice, then they MUST be ordered by the
+directory the rule came from, as described above.
+ </para>
+ <para>
+The <userinput>glob-deleteall</userinput> element, which means that implementations
+SHOULD discard information from previous directories, is written out into the globs2 file using
+__NOGLOBS__ as the pattern. For instance:
+ <programlisting><![CDATA[
+0:text/x-diff:__NOGLOBS__
+50:text/x-diff:*.diff
+...
+]]></programlisting>
+ </para>
+ <para>
+In the above example, the mimetype text/x-diff is redefined (for instance in a user's
+~/.local/share/mime) to only be associated with the pattern *.diff, so the other patterns
+like *.patch were removed. The weight in front of the __NOGLOBS__ line is ignored.
+In a given globs2 file, the __NOGLOBS__ line for a given mimetype is always written
+out before any other globs for this mimetype.
+ </para>
+ <para>
+Lines beginning with `#' are comments and should be ignored. Everything from
+the `:' character to the newline is part of the pattern; spaces should not be
+stripped. The file is in the UTF-8 encoding. The format of the glob pattern
+is as for fnmatch(3). The format does not allow a pattern to contain a literal
+newline character, but this is not expected to be a problem.
+ </para>
+ <para>
+Common types (such as MS Word Documents) will be provided in the X Desktop
+Group's package, which MUST be required by all applications using this
+specification. Since each application will then only be providing information
+about its own types, conflicts should be rare.
+ </para>
+ <para>
+The fourth field ("cs" in the first globs2 example) contains a list of comma-separated flags.
+The flags currently defined are: cs (for case-sensitive). Implementations should ignore
+unknown flags.
+ </para>
+ <para>
+Implementations should also ignore further fields, so that the syntax of the globs2 file
+can be extended in the future. Example: "50:text/x-c++src:*.C:cs,newflag:newfeature:somethingelse"
+should currently be parsed as "50:text/x-c++src:*.C:cs".
+ </para>
+ </sect2>
+ <sect2>
+ <title>The magic files</title>
+ <para>
+The magic data is stored in a binary format for ease of parsing. The old magic database
+had complex escaping rules; these are now handled by <command>update-mime-database</command>.
+ </para><para>
+The file starts with the magic string "MIME-Magic\0\n".
+There is no version number in the file. Incompatible changes will be handled by
+creating both the current `magic' file and a newer `magic2' in the new format.
+Where possible, compatible changes only will be made.
+All numbers are big-endian, so need to be byte-swapped on little-endian machines.
+ </para><para>
+The rest of the file is made up of a sequence of small sections.
+Each section is introduced by giving the priority and type in brackets, followed by
+a newline character. Higher priority entries come first. Example:
+<screen>[50:text/x-diff]\n</screen>
+Each line in the section takes the form:
+<screen>[ indent ] ">" start-offset "=" value
+[ "&amp;" mask ] [ "~" word-size ] [ "+" range-length ] "\n"</screen>
+<informaltable>
+ <tgroup cols="3">
+ <thead><row><entry>Part</entry><entry>Example</entry><entry>Meaning</entry></row></thead>
+ <tbody>
+
+ <row><entry>indent</entry><entry>1</entry><entry>The nesting
+ depth of the rule, corresponding to the number of '>' characters in the traditional file format.</entry></row>
+ <row><entry>">" start-offset</entry><entry>&gt;4</entry><entry>The offset into the
+ file to look for a match.</entry></row>
+ <row><entry>"=" value</entry><entry>=\0x0\0x2\0x55\0x40</entry><entry>
+ Two bytes giving the (big-endian) length of the value, followed by the value itself.
+ </entry></row>
+ <row><entry>"&amp;" mask</entry><entry>&amp;\0xff\0xf0</entry><entry>
+ The mask, which (if present) is exactly the same length as the value.
+ </entry></row>
+ <row><entry>"~" word-size</entry><entry>~2</entry><entry>On little-endian machines, the
+ size of each group to byte-swap.</entry></row>
+ <row><entry>"+" range-length</entry><entry>+8</entry><entry>The length of the region
+ in the file to check.
+ </entry></row>
+ </tbody>
+ </tgroup>
+</informaltable>
+ </para><para>
+Note that the value, value length and mask are all binary, whereas everything
+else is textual. Each of the elements begins with a single character to
+identify it, except for the indent level.
+ </para><para>
+The word size is used for byte-swapping. Little-endian systems should reverse
+the order of groups of bytes in the value and mask if this is greater than one.
+This only affects `host' matches (`big32' entries still have a word size of 1,
+for example, because no swapping is necessary, whereas `host32' has a word size
+of 4).
+ </para><para>
+The indent, range-length, word-size and mask components are optional. If
+missing, indent defaults to 0, range-length to 1, the word-size to 1, and the
+mask to all 'one' bits.
+ </para><para>
+Indent corresponds to the nesting depth of the rule. Top-level rules have an
+indent of zero. The parent of an entry is the preceding entry with an indent
+one less than the entry.
+ </para><para>
+If an unknown character is found where a newline is expected then the whole
+line should be ignored (there will be no binary data after the new
+character, so the next line starts after the next "\n" character). This is for
+future extensions.
+ </para><para>
+The text/x-diff above example would (on its own) create this magic file:
+ <programlisting><![CDATA[
+00000000 4d 49 4d 45 2d 4d 61 67 69 63 00 0a 5b 35 30 3a |MIME-Magic..[50:|
+00000010 74 65 78 74 2f 78 2d 64 69 66 66 5d 0a 3e 30 3d |text/x-diff].>0=|
+00000020 00 05 64 69 66 66 09 0a 3e 30 3d 00 04 2a 2a 2a |..diff..>0=..***|
+00000030 09 0a 3e 30 3d 00 17 43 6f 6d 6d 6f 6e 20 73 75 |..>0=..Common su|
+00000040 62 64 69 72 65 63 74 6f 72 69 65 73 3a 20 0a |bdirectories: .|
+]]></programlisting>
+ </para>
+ <para>
+The <userinput>magic-deleteall</userinput> attribute, which means that implementations
+SHOULD discard information from previous directories, is written out into the magic file using
+__NOMAGIC__ as the value:
+<screen>>0=__NOMAGIC__\n</screen>
+This can be followed by other magic rules for the mimetype.
+ </para>
+ </sect2>
+ <sect2>
+ <title>The XMLnamespaces files</title>
+ <para>
+Each <filename>XMLnamespaces</filename> file is a list of lines in the form:
+<screen>namespaceURI " " localName " " MIME-Type "\n"</screen>
+For example:
+<screen>
+http://www.w3.org/1999/xhtml html application/xhtml+xml
+</screen>
+The lines are sorted (using strcmp in the C locale) and there are no lines with the same namespaceURI and
+localName in one file. If the localName was empty then there will be two spaces following
+the namespaceURI.
+ </para>
+ </sect2>
+ <sect2>
+ <title>The icon files</title>
+ <para>
+The <filename>icons</filename> and <filename>generic-icons</filename> files are list of lines in the form:
+<screen>MIME-Type ":" icon-name "\n"</screen>
+For example:
+<screen>
+application/msword:x-office-document
+</screen>
+ </para>
+ </sect2>
+ <sect2>
+ <title>The treemagic files</title>
+ <para>
+The tree magic data is stored in a file with a format that is very similar to the magic file format.
+ </para>
+ <para>
+The file starts with the magic string "MIME-TreeMagic\0\n". There is no version number in the file.
+Incompatible changes will be handled by creating both the current `treemagic' and a newer `treemagic2'
+in the new format. Where possible, changes will be made in a compatible fashion.
+ </para>
+ <para>
+The rest of the file is made up of a sequence of small sections. Each section is introduced by giving
+the priority and type in brackeds, followed by a newline character. Higher priority entries come
+first. Example:
+<screen>[50:x-content/image-dcf]\n</screen>
+Each line in the section takes the form:
+<screen>[ indent ] ">" "\"" path "\"" "=" type [ "," option ]* "\n"</screen>
+<informaltable>
+ <tgroup cols="2">
+ <thead><row><entry>Part</entry><entry>Meaning</entry></row></thead>
+ <tbody>
+
+ <row><entry>indent</entry><entry>The nesting depth of the rule.</entry></row>
+ <row><entry>path</entry><entry>The path to match.</entry></row>
+ <row><entry>type</entry><entry>The required file type, one of "file", "directory", "link" or "any"</entry></row>
+ <row><entry>option</entry><entry>Optional for the optional attributes of <userinput>treematch</userinput> elements.
+Possible values are "executable", "match-case", "non-empty", or a MIME type</entry></row>
+ </tbody>
+ </tgroup>
+</informaltable>
+ </para><para>
+ </para>
+ </sect2>
+ <sect2>
+ <title>The mime.cache files</title>
+ <para>
+The <filename>mime.cache</filename> files contain the same information as the
+<filename>globs2</filename>, <filename>magic</filename>, <filename>subclasses</filename>,
+<filename>aliases</filename> and <filename>XMLnamespaces</filename> files, in a binary,
+mmappable format:
+</para>
+<programlisting>
+Header:
+2 CARD16 MAJOR_VERSION 1
+2 CARD16 MINOR_VERSION 2
+4 CARD32 ALIAS_LIST_OFFSET
+4 CARD32 PARENT_LIST_OFFSET
+4 CARD32 LITERAL_LIST_OFFSET
+4 CARD32 REVERSE_SUFFIX_TREE_OFFSET
+4 CARD32 GLOB_LIST_OFFSET
+4 CARD32 MAGIC_LIST_OFFSET
+4 CARD32 NAMESPACE_LIST_OFFSET
+4 CARD32 ICONS_LIST_OFFSET
+4 CARD32 GENERIC_ICONS_LIST_OFFSET
+
+AliasList:
+4 CARD32 N_ALIASES
+8*N_ALIASES AliasListEntry
+
+AliasListEntry:
+4 CARD32 ALIAS_OFFSET
+4 CARD32 MIME_TYPE_OFFSET
+
+ParentList:
+4 CARD32 N_ENTRIES
+8*N_ENTRIES ParentListEntry
+
+ParentListEntry:
+4 CARD32 MIME_TYPE_OFFSET
+4 CARD32 PARENTS_OFFSET
+
+Parents:
+4 CARD32 N_PARENTS
+4*N_PARENTS CARD32 MIME_TYPE_OFFSET
+
+LiteralList:
+4 CARD32 N_LITERALS
+12*N_LITERALS LiteralEntry
+
+LiteralEntry:
+4 CARD32 LITERAL_OFFSET
+4 CARD32 MIME_TYPE_OFFSET
+4 CARD32 WEIGHT in lower 8 bits
+ FLAGS in rest:
+ 0x100 = case-sensitive
+
+
+GlobList:
+4 CARD32 N_GLOBS
+12*N_GLOBS GlobEntry
+
+GlobEntry:
+4 CARD32 GLOB_OFFSET
+4 CARD32 MIME_TYPE_OFFSET
+4 CARD32 WEIGHT in lower 8 bits
+ FLAGS in rest:
+ 0x100 = case-sensitive
+
+ReverseSuffixTree:
+4 CARD32 N_ROOTS
+4 CARD32 FIRST_ROOT_OFFSET
+
+ReverseSuffixTreeNode:
+4 CARD32 CHARACTER
+4 CARD32 N_CHILDREN
+4 CARD32 FIRST_CHILD_OFFSET
+
+ReverseSuffixTreeLeafNode:
+4 CARD32 0
+4 CARD32 MIME_TYPE_OFFSET
+4 CARD32 WEIGHT in lower 8 bits
+ FLAGS in rest:
+ 0x100 = case-sensitive
+
+MagicList:
+4 CARD32 N_MATCHES
+4 CARD32 MAX_EXTENT
+4 CARD32 FIRST_MATCH_OFFSET
+
+Match:
+4 CARD32 PRIORITY
+4 CARD32 MIME_TYPE_OFFSET
+4 CARD32 N_MATCHLETS
+4 CARD32 FIRST_MATCHLET_OFFSET
+
+Matchlet:
+4 CARD32 RANGE_START
+4 CARD32 RANGE_LENGTH
+4 CARD32 WORD_SIZE
+4 CARD32 VALUE_LENGTH
+4 CARD32 VALUE_OFFSET
+4 CARD32 MASK_OFFSET (0 if no mask)
+4 CARD32 N_CHILDREN
+4 CARD32 FIRST_CHILD_OFFSET
+
+NamespaceList:
+4 CARD32 N_NAMESPACES
+12*N_NAMESPACES NamespaceEntry
+
+NamespaceEntry:
+4 CARD32 NAMESPACE_URI_OFFSET
+4 CARD32 LOCAL_NAME_OFFSET
+4 CARD32 MIME_TYPE_OFFSET
+
+GenericIconsList:
+IconsList:
+4 CARD32 N_ICONS
+8*N_ICONS IconListEntry
+
+IconListEntry:
+4 CARD32 MIME_TYPE_OFFSET
+4 CARD32 ICON_NAME_OFFSET
+</programlisting>
+<para>
+Lists in the file are sorted, to enable binary searching. The list of
+aliases is sorted by alias, the list of literal globs is sorted by the
+literal. The SuffixTreeNode siblings are sorted by character.
+The list of namespaces is sorted by namespace uri. The list of icons
+is sorted by mimetype.
+</para>
+<para>
+Mimetypes are stored in the suffix tree by appending suffix
+tree leaf nodes with '\0' as character. These nodes appear at the
+beginning of the list of children.
+</para>
+<para>
+All offsets are in bytes from the beginning of the file.
+</para>
+<para>
+Strings are zero-terminated.
+</para>
+<para>
+All numbers are in network (big-endian) order. This is necessary because the data will be stored in
+arch-independent directories like <filename>/usr/share/mime</filename> or even in user's home directories.
+</para>
+<para>
+Cache files have to be written atomically - write to a temporary name, then move over the old file - so
+that clients that have the old cache file open and mmap'ed won't get corrupt data.
+</para>
+ </sect2>
+ <sect2>
+ <title>Storing the MIME type using Extended Attributes</title>
+ <para>
+An implementation MAY also get a file's MIME type from the
+<userinput>user.mime_type</userinput> extended attribute. <!-- The attr(5) man
+page documents this name --> The type given here should normally be used in
+preference to any guessed type, since the user is able to set it explicitly.
+Applications MAY choose to set the type when saving files. Since many
+applications and filesystems do not support extended attributes,
+implementations MUST NOT rely on this method being available.
+ </para>
+ </sect2>
+ <sect2 id="subclassing">
+ <title>Subclassing</title>
+ <para>
+A type is a subclass of another type if any instance of the first type is
+also an instance of the second. For example, all image/svg files are also
+text/xml, text/plain and application/octet-stream files. Subclassing is about
+the format, rather than the category of the data (for example, there is no
+'generic spreadsheet' class that all spreadsheets inherit from).
+ </para>
+ <para>
+Some subclass rules are implicit:
+ <itemizedlist>
+<listitem><para>All text/* types are subclasses of text/plain.</para></listitem>
+<listitem><para>All streamable types (ie, everything except the inode/* types)
+are subclasses of application/octet-stream.</para></listitem>
+ </itemizedlist>
+In addition to these rules, explicit subclass information may be given using
+the <userinput>sub-class-of</userinput> element.
+ </para>
+ <para>
+Note that some file formats are also compressed files (application/x-jar files
+are also application/zip files). However, this is different to a case such as a
+compressed postscript file, which is not a valid postscript file itself (so
+application/x-gzpostscript does not inherit from application/postscript,
+because an application that can handle the latter may not cope with the
+former).
+ </para>
+ <para>
+Some types may or may not be instances of other types. For example, a
+spreadsheet file may be compressed or not. It is a valid spreadsheet file
+either way, but only inherits from application/x-gzip in one case. This
+information cannot be represented statically; instead an application
+interested in this information should run all of the magic rules, and
+use the list of types returned as the subclasses.
+ </para>
+ </sect2>
+ <sect2>
+ <title>Recommended checking order</title>
+ <para>
+Because different applications have different requirements, they may choose to
+use the various methods provided by this specification in any order. However, the
+RECOMMENDED order to perform the checks is:
+ <itemizedlist>
+ <listitem><para>
+If a MIME type is provided explicitly (eg, by a ContentType HTTP header, a MIME
+email attachment, an extended attribute or some other means) then that should
+be used instead of guessing.
+ </para></listitem>
+
+ <listitem><para>
+Otherwise, start by doing a glob match of the filename. Keep only globs with the biggest weight.
+If the patterns are different, keep only globs with the longest pattern, as previously discussed.
+If after this, there is one or more matching glob, and all the matching globs
+result in the same mimetype, use that mimetype as the result.
+ </para></listitem>
+
+ <listitem><para>
+If the glob matching fails or results in multiple conflicting mimetypes, read the
+contents of the file and do magic sniffing on it. If no magic rule matches the data (or if
+the content is not available), use the default type of application/octet-stream for
+binary data, or text/plain for textual data. If there was no glob match, use the magic match
+as the result.
+ </para><para>
+Note: Checking the first 32 bytes of the file for ASCII control characters is
+a good way to guess whether a file is binary or text, but note that files with high-bit-set
+characters should still be treated as text since these can appear in UTF-8 text,
+unlike control characters.
+ </para></listitem>
+
+ <listitem><para>
+If any of the mimetypes resulting from a glob match is equal to or a subclass of
+the result from the magic sniffing, use this as the result. This allows us for example to
+distinguish text files called "foo.doc" from MS-Word files with the same name, as the
+magic match for the MS-Word file would be application/x-ole-storage which the MS-Word type
+inherits.
+ </para></listitem>
+
+ <listitem><para>
+Otherwise use the result of the glob match that has the highest weight.
+ </para></listitem>
+ </itemizedlist>
+ </para>
+ <para>
+There are several reasons for checking the glob patterns before the magic.
+First of all doing magic sniffing is very expensive as reading the contents of the files
+causes a lot of seeks, which is very expensive. Secondly, some applications don't check
+the magic at all (sometimes the content is not available or too slow to read), and this
+makes it more likely that both will get the same type.
+ </para>
+ <para>
+Also, users can easily understand why calling their
+text file <filename>README.mp3</filename> makes the system think it's an MP3,
+whereas they have trouble understanding why their computer thinks
+<filename>README.txt</filename> is a PostScript file. If the system guesses wrongly,
+the user can often rename the file to fix the problem.
+ </para>
+ </sect2>
+ <sect2>
+ <title>Non-regular files</title>
+ <para>
+Sometimes it is useful to assign MIME types to other objects in the filesystem,
+such as directories, sockets and device files. This could be useful when looking up
+an icon for a type, or for providing a textual description of one of these objects.
+The media type 'inode' is provided for this purpose, with the following types corresponding
+to the standard types of object found in a Unix filesystem:
+ </para>
+ <simplelist>
+ <member>inode/blockdevice</member>
+ <member>inode/chardevice</member>
+ <member>inode/directory</member>
+ <member>inode/fifo</member>
+ <member>inode/mount-point</member>
+ <member>inode/socket</member>
+ <member>inode/symlink</member>
+ </simplelist>
+ <para>
+An inode/mount-point is a subclass of inode/directory. It can be useful when adding extra
+actions for these directories, such as 'mount' or 'eject'. Mounted directories can be
+detected by comparing the 'st_dev' of a directory with that of its parent. If
+they differ, they are from different devices and the directory is a mount
+point.
+ </para>
+ </sect2>
+ <sect2>
+ <title>Content types for volumes</title>
+ <para>
+Traditional MIME types apply to individual files or bytestreams. It is often useful
+to apply the same methodologies when classifying the content of mountable volumes or
+filesystems. The x-content type has been introduced for this purpose. Typical examples
+are x-content/audio-dvd, x-content/blank-cd or x-content/image-dcf.
+ </para>
+ <para>
+Matching of content types works with <userinput>treemagic</userinput> elements, which
+are analogous to the <userinput>magic</userinput> elements used for MIME type matching.
+Instead of looking for byte sequences in files, <userinput>treemagic</userinput> element
+allow to look for files with certain names, permissions or mime types in a directory
+hierarchy.
+ </para>
+ </sect2>
+ <sect2>
+ <title>URI scheme handlers</title>
+ <para>
+URI scheme handling (such as a movie player handling mms:// URIs, or a Podcast program
+handling feed:// URIs) are handled through applications handling the x-scheme-handler/foo
+mime-type, where foo is the URI scheme in question.
+ </para>
+ <para>
+This scheme allows URI scheme handling to enjoy the same benefits as mime-type handlers,
+such as the ability to change the default handler, the cross-desktop support, and
+easier application launching.
+ </para>
+ <para>
+Note that this virtual mime-type is not for listing URI schemes that an application
+can load files from. For example, a movie player would not list x-scheme-handler/http
+in its supported mime-types, but it would list x-scheme-handler/rtsp if it supported
+playing back from RTSP locations.
+ </para>
+ </sect2>
+ <sect2>
+ <title>Security implications</title>
+ <para>
+The system described in this document is intended to allow different programs
+to see the same file as having the same type. This is to help interoperability.
+The type determined in this way is only a guess, and an application MUST NOT
+trust a file based simply on its MIME type. For example, a downloader should
+not pass a file directly to a launcher application without confirmation simply
+because the type looks `harmless' (eg, text/plain).
+ </para>
+ <para>
+Do not rely on two applications getting the same type for the same file, even
+if they both use this system. The spec allows some leeway in implementation,
+and in any case the programs may be following different versions of the spec.
+ </para>
+ </sect2>
+ <sect2>
+ <title>User modification</title>
+ <para>
+The MIME database is NOT intended to store user preferences. Users should never
+edit the database. If they wish to make corrections or provide MIME entries for
+software that doesn't provide these itself, they should do so by means of the
+Override.xml mentioned in <xref linkend="s2_layout"/>. Information such as
+"text/html files need to be opened with Mozilla" should NOT go in the database.
+ </para>
+ </sect2>
+</sect1>
+
+<sect1>
+ <title>Contributors</title>
+ <simplelist>
+ <member>
+ Thomas Leonard <email>tal197 at users.sf.net</email>
+ </member>
+ <member>
+ David Faure <email>faure at kde.org</email>
+ </member>
+ <member>
+ Alex Larsson <email>alexl at redhat.com</email>
+ </member>
+ <member>
+ Seth Nickell <email>snickell at stanford.edu</email>
+ </member>
+ <member>
+ Keith Packard <email>keithp at keithp.com</email>
+ </member>
+ <member>
+ Filip Van Raemdonck <email>mechanix at debian.org</email>
+ </member>
+ <member>
+ Christos Zoulas <email>christos at zoulas.com</email>
+ </member>
+ <member>
+ Matthias Clasen <email>mclasen at redhat.com</email>
+ </member>
+ <member>
+ Bastien Nocera <email>hadess at hadess.net</email>
+ </member>
+ </simplelist>
+</sect1>
+
+<bibliography>
+ <title>References</title>
+
+ <bibliomixed>
+ <abbrev>GNOME</abbrev><citetitle>The GNOME desktop,
+ <ulink url="http://www.gnome.org"/></citetitle>
+ </bibliomixed>
+ <bibliomixed>
+ <abbrev>KDE</abbrev><citetitle>The KDE desktop,
+ <ulink url="http://www.kde.org"/></citetitle>
+ </bibliomixed>
+ <bibliomixed>
+ <abbrev>ROX</abbrev><citetitle>The ROX desktop,
+ <ulink url="http://rox.sourceforge.net"/></citetitle>
+ </bibliomixed>
+ <bibliomixed>
+ <abbrev>DesktopEntries</abbrev><citetitle>Desktop Entry Specification,
+ <ulink url="http://www.freedesktop.org/standards/desktop-entry-spec.html"/>
+ </citetitle>
+ </bibliomixed>
+ <bibliomixed>
+ <abbrev>SharedMIME</abbrev><citetitle>Shared MIME-info Database,
+ <ulink url="http://www.freedesktop.org/standards/shared-mime-info.html"/>
+ </citetitle>
+ </bibliomixed>
+ <bibliomixed>
+ <abbrev>RFC-2119</abbrev>
+ <citetitle>Key words for use in RFCs to Indicate Requirement Levels,
+ <ulink url="http://www.ietf.org/rfc/rfc2119.txt?number=2119"/>
+ </citetitle>
+ </bibliomixed>
+ <bibliomixed>
+ <abbrev>BaseDir</abbrev>
+ <citetitle>XDG Base Directory Specification
+ <ulink url="http://www.freedesktop.org/standards/basedir/draft/basedir-spec/basedir-spec.html"/>
+ </citetitle>
+ </bibliomixed>
+ <bibliomixed>
+ <abbrev>ACAP</abbrev>
+ <citetitle>ACAP Media Type Dataset Class
+ <ulink url="ftp://ftp.ietf.org/internet-drafts/draft-ietf-acap-mediatype-01.txt"/>
+ </citetitle>
+ </bibliomixed>
+</bibliography>
+
+</article>