/*! \page pkgformat Package format

This document describes the RPM file format version 3.0, which is used
by RPM versions 2.1 and greater.  The format is subject to change, and
you should not assume that this document is kept up to date with the
latest RPM code.  That said, the 3.0 format should not change for
quite a while, and when it does, it will not be 3.0 anymore :-).

\warning In any case, THE PROPER WAY TO ACCESS THESE STRUCTURES IS THROUGH
THE RPM LIBRARY!!

The RPM file format covers both source and binary packages.  An RPM
package file is divided into 4 logical sections:

\verbatim
. Lead      -- 96 bytes of "magic" and other info
. Signature -- collection of "digital signatures"
. Header    -- holding area for all the package information (aka "metadata")
. Payload   -- compressed archive of the file(s) in the package (aka "payload")
\endverbatim

All 2 and 4 byte "integer" quantities (int16 and int32) are stored in
network byte order.  When data is presented, the first number is the
byte number, or address, in hex, followed by the byte values in hex,
followed by character "translations" (where appropriate).
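
As a minimal sketch of what "network byte order" means in practice, the
helpers below decode big-endian int16 and int32 values from a raw byte
buffer.  The names be16 and be32 are made up for illustration and are
not part of the RPM library.

```c
#include <stdint.h>

// Illustrative helpers (hypothetical names, not RPM library functions):
// decode big-endian (network byte order) integers from raw bytes.
static uint16_t be16(const unsigned char *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | (unsigned)p[1]);
}

static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}
```

Decoding byte by byte like this gives the same result regardless of the
host's own byte order.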

\subsection pkgformat_lead Lead

The Lead is basically for file(1).  All the information contained in
the Lead is duplicated or superseded by information in the Header.
Much of the info in the Lead was used in old versions of RPM but is
now ignored.  The Lead is stored as a C structure:

\code
struct rpmlead {
    unsigned char magic[4];
    unsigned char major, minor;
    short type;
    short archnum;
    char name[66];
    short osnum;
    short signature_type;
    char reserved[16];
};
\endcode

and is illustrated with one pulled from the rpm-2.1.2-1.i386.rpm
package:

\verbatim
00000000: ed ab ee db 03 00 00 00
\endverbatim

The first 4 bytes (0-3) are "magic" used to uniquely identify an RPM
package.  It is used by RPM and file(1).  The next two bytes (4-5)
are int8 quantities denoting the "major" and "minor" RPM file format
version.  This package is in 3.0 format.  The following 2 bytes (6-7)
form an int16 which indicates the package type.  As of this writing
there are only two types: 0 == binary, 1 == source.

\verbatim
00000008: 00 01 72 70 6d 2d 32 2e    ..rpm-2.
\endverbatim

The next two bytes (8-9) form an int16 that indicates the architecture
the package was built for.  While this is used by file(1), the true
architecture is stored as a string in the Header.  See lib/misc.c for
a list of architecture->int16 translations.  In this case, 1 == i386.
Bytes 10 through 75 (66 bytes in all) hold up to 65 characters plus a
terminating null byte: the familiar "name-version-release" of the
package, padded with null (0) bytes.

\verbatim
00000010: 31 2e 32 2d 31 00 00 00    1.2-1...
00000018: 00 00 00 00 00 00 00 00    ........
00000020: 00 00 00 00 00 00 00 00    ........
00000028: 00 00 00 00 00 00 00 00    ........
00000030: 00 00 00 00 00 00 00 00    ........
00000038: 00 00 00 00 00 00 00 00    ........
00000040: 00 00 00 00 00 00 00 00    ........
00000048: 00 00 00 00 00 01 00 05    ........
\endverbatim

Bytes 76-77 ("00 01" above) form an int16 that indicates the OS the
package was built for.  In this case, 1 == Linux.  The next 2 bytes
(78-79) form an int16 that indicates the signature type.  This tells
RPM what to expect in the Signature.  For version 3.0 packages, this
is 5, which indicates the new "Header-style" signatures.

\verbatim
00000050: 04 00 00 00 68 e6 ff bf    ........
00000058: ab ad 00 08 3c eb ff bf    ........
\endverbatim

The remaining 16 bytes (80-95) are currently unused and are reserved
for future expansion.
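
The byte offsets walked through above can be collected into a small
decoder.  This is only a sketch for illustration, since real code should
go through the RPM library.  The struct lead_info and the function
parse_lead are hypothetical names, and the offsets are taken directly
from the description above.

```c
#include <stdint.h>
#include <string.h>

// Hypothetical decoded view of the 96-byte Lead (not an RPM library type).
struct lead_info {
    unsigned char ver_major, ver_minor;
    uint16_t type, archnum, osnum, signature_type;
    char name[66];
};

// Returns 0 on success, -1 if the magic bytes do not match.
static int parse_lead(const unsigned char buf[96], struct lead_info *out)
{
    static const unsigned char magic[4] = { 0xed, 0xab, 0xee, 0xdb };

    if (memcmp(buf, magic, sizeof magic) != 0)
        return -1;
    out->ver_major = buf[4];
    out->ver_minor = buf[5];
    // int16 fields are stored in network byte order
    out->type = (uint16_t)((buf[6] << 8) | buf[7]);
    out->archnum = (uint16_t)((buf[8] << 8) | buf[9]);
    memcpy(out->name, buf + 10, 66);
    out->name[65] = 0;  // force termination even for malformed input
    out->osnum = (uint16_t)((buf[76] << 8) | buf[77]);
    out->signature_type = (uint16_t)((buf[78] << 8) | buf[79]);
    return 0;
}
```

Fed the Lead shown above, this sketch would report version 3.0, type 0
(binary), archnum 1, name "rpm-2.1.2-1", osnum 1 and signature type 5.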

\subsection pkgformat_signature Signature

A 3.0 format signature (denoted by signature type 5 in the Lead) uses
the same structure as the Header.  For historical reasons, this
structure is called a "header structure", which can be confusing since
it is used for both the Header and the Signature.  The details of the
header structure are given below, and you'll want to read them so the
rest of this makes sense.  The tags for the Signature are defined in
lib/signature.h.

The Signature can contain multiple signatures, of different types.
There are currently only three types, each with its own tag in the
header structure:

\verbatim
	Name	Tag	Header Type
	----	----	-----------
	SIZE	1000	INT_32
	MD5	1001	BIN
	PGP	1002	BIN
\endverbatim

The MD5 signature is 16 bytes, and the PGP signature varies with
the size of the PGP key used to sign the package.

As of RPM 2.1, all packages carry at least SIZE and MD5 signatures,
and the Signature section is padded to a multiple of 8 bytes.
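
The padding rule can be sketched as a small calculation.  The function
names below are hypothetical.  The header_start helper additionally
assumes that the Header begins immediately after the 96-byte Lead and
the padded Signature, which this document implies but does not state
outright.

```c
#include <stdint.h>

// Number of pad bytes needed to round size up to a multiple of 8.
static uint32_t pad_to_8(uint32_t size)
{
    return (8 - (size % 8)) % 8;
}

// Assumed file offset of the Header: 96-byte Lead, then the Signature
// (sig_size bytes, unpadded), then the padding computed above.
static uint32_t header_start(uint32_t sig_size)
{
    return 96 + sig_size + pad_to_8(sig_size);
}
```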

\subsection pkgformat_header Header

The Header contains all the information about a package: name,
version, file list, etc.  It uses the same "header structure" as the
Signature, which is described in detail below.  The complete list of
Header tags is too long to reproduce here, and it grows fairly
frequently.  For the complete list see lib/rpmlib.h in the RPM sources.

\subsection pkgformat_payload Payload

The Payload is currently a gzipped cpio archive.  The cpio
archive type used is SVR4 with a CRC checksum.

\subsection pkgformat_header_structure The Header Structure

The header structure is a little complicated, but actually performs a
very simple function.  It acts almost like a small database in that it
allows you to store and retrieve arbitrary data with a key called a
"tag".  When a header structure is written to disk, the data is
written in network byte order, and when it is read from disk, it is
converted to host byte order.

Along with the tag and the data, a data "type" is stored, which indicates,
obviously, the type of the data associated with the tag.  There are
currently 10 types:

\verbatim
	Type		Number
	----		------
	NULL		0
	CHAR		1
	INT8		2
	INT16		3
	INT32		4
	INT64		5
	STRING		6
	BIN		7
	STRING_ARRAY	8
	I18NSTRING	9
\endverbatim

One final piece of information is a "count" which is stored with each
tag, and indicates the number of items of the associated type that are
stored.  As a special case, the STRING type is not allowed to have a
count greater than 1.  To store more than one string you must use a
STRING_ARRAY.
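
The type numbers and the STRING count rule can be written down as a
short sketch.  The enum constants and the entry_count_ok function are
hypothetical names chosen for this example.  The real definitions live
in the RPM sources.

```c
#include <stdint.h>

// Type numbers from the table above (hypothetical constant names).
enum hdr_type {
    HDR_NULL = 0, HDR_CHAR = 1, HDR_INT8 = 2, HDR_INT16 = 3,
    HDR_INT32 = 4, HDR_INT64 = 5, HDR_STRING = 6, HDR_BIN = 7,
    HDR_STRING_ARRAY = 8, HDR_I18NSTRING = 9
};

// Returns 1 if the (type, count) pair is acceptable under the rule above,
// 0 if the type is unknown or a STRING has a count greater than 1.
static int entry_count_ok(uint32_t type, uint32_t count)
{
    if (type > HDR_I18NSTRING)
        return 0;
    if (type == HDR_STRING && count > 1)
        return 0;
    return 1;
}
```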

Altogether, the tag, type, count, and data are called an "Entry" or
"Header Entry".

\verbatim
00000000: 8e ad e8 01 00 00 00 00    ........
\endverbatim

A header begins with 3 bytes of magic "8e ad e8" and a single byte to
indicate the header version.  The next four bytes (4-7) are reserved.

\verbatim
00000008: 00 00 00 20 00 00 07 77    ........
\endverbatim

The next four bytes (8-11) form an int32 that is a count of the number
of entries stored (in this case, 32).  Bytes 12-15 form an int32 that
is a count of the number of bytes of data stored (that is, the number
of bytes made up by the data portion of each entry).  In this case it
is 1911 bytes.
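
The first 16 bytes described so far can be decoded as in the following
sketch.  The names hdr_intro, parse_hdr_intro and be32 are hypothetical
and exist only for illustration.

```c
#include <stdint.h>
#include <string.h>

// Decode a big-endian (network byte order) int32.
static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

// Hypothetical summary of the first 16 bytes of a header structure.
struct hdr_intro {
    unsigned char version;
    uint32_t nindex;   // number of index entries
    uint32_t datalen;  // number of bytes in the data section
};

// Returns 0 on success, -1 if the 3 magic bytes do not match.
static int parse_hdr_intro(const unsigned char buf[16], struct hdr_intro *out)
{
    static const unsigned char magic[3] = { 0x8e, 0xad, 0xe8 };

    if (memcmp(buf, magic, sizeof magic) != 0)
        return -1;
    out->version = buf[3];
    // bytes 4-7 are reserved and skipped here
    out->nindex = be32(buf + 8);
    out->datalen = be32(buf + 12);
    return 0;
}
```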

\verbatim
00000010: 00 00 03 e8 00 00 00 06 00 00 00 00 00 00 00 01    ................
\endverbatim

Following the first 16 bytes is the part of the header called the
"index".  The index is made up of "index entries", one for each entry
in the header.  Each index entry contains four int32 quantities.  In
order, they are: tag, type, offset, count.  In the above example, we
have tag=1000, type=6, offset=0, count=1.  By looking up the tag
in lib/rpmlib.h we can see that this entry is for the package name.
The type of the entry is a STRING.  The offset is an offset from the
start of the data part of the header to the data associated with this
entry.  The count indicates that there is only one string associated
with the entry (which we really already knew since STRING types are
not allowed to have a count greater than 1).
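
A single index entry can be decoded as in the sketch below, which also
computes where the data section starts for a given number of entries.
The struct and function names are hypothetical.  The field order (tag,
type, offset, count) is the one given above.

```c
#include <stdint.h>

// Decode a big-endian (network byte order) int32.
static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

// Hypothetical in-memory form of one 16-byte index entry.
struct index_entry {
    uint32_t tag, type, offset, count;
};

static struct index_entry parse_index_entry(const unsigned char buf[16])
{
    struct index_entry e;

    e.tag = be32(buf);
    e.type = be32(buf + 4);
    e.offset = be32(buf + 8);    // relative to the start of the data section
    e.count = be32(buf + 12);
    return e;
}

// Byte position of the data section within the header structure:
// 16 bytes before the index, then 16 bytes per index entry.
static uint32_t data_section_start(uint32_t nindex)
{
    return 16 + 16 * nindex;
}
```

The data for an entry is then found at data_section_start(nindex) plus
the entry's offset, counted from the first byte of the header structure.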

In our example there would be 32 such 16-byte index entries, followed
by the data section:

\verbatim
00000210: 72 70 6d 00 32 2e 31 2e 32 00 31 00 52 65 64 20    rpm.2.1.2.1.Red 
00000220: 48 61 74 20 50 61 63 6b 61 67 65 20 4d 61 6e 61    Hat Package Mana
00000230: 67 65 72 00 31 e7 cb b4 73 63 68 72 6f 65 64 65    ger.1...schroede
00000240: 72 2e 72 65 64 68 61 74 2e 63 6f 6d 00 00 00 00    r.redhat.com....
...
00000970: 6c 69 62 63 2e 73 6f 2e 35 00 6c 69 62 64 62 2e    libc.so.5.libdb.
00000980: 73 6f 2e 32 00 00                                  so.2..
\endverbatim

The data section begins at byte 528 (4 magic and version, 4 reserved,
4 index entry count, 4 data byte count, 16 * 32 index entries).  At
offset 0, bytes 528-531 are "rpm" plus a null byte, which is the data
for the first index entry (the package name).  Following it is the data
for each of the other entries.  Each string is null terminated; the strings
in a STRING_ARRAY are also null terminated and are placed one after
another.  The integer types are aligned to appropriate byte boundaries,
so that the data of INT64 type starts on an 8 byte boundary, INT32
type starts on a 4 byte boundary, and an INT16 type starts on a 2 byte
boundary.  For example:

\verbatim
00000060: 00 00 03 ef 00 00 00 06 00 00 00 28 00 00 00 01    ................
00000070: 00 00 03 f1 00 00 00 04 00 00 00 40 00 00 00 01    ................
...
00000240: 72 2e 72 65 64 68 61 74 2e 63 6f 6d 00 00 00 00    r.redhat.com....
00000250: 00 09 9b 31 52 65 64 20 48 61 74 20 4c 69 6e 75    ....Red Hat Linu
\endverbatim

Index entry number 6 is the BUILDHOST, of type STRING.  Index entry
number 7 is the SIZE, of type INT32.  The corresponding data for entry
6 ends at byte 588 with "....redhat.com\0".  The next piece of data
could start at byte 589, but that is an improper boundary for an INT32.
As a result, 3 null bytes are inserted and the data for the SIZE actually
starts at byte 592: "00 09 9b 31" (which is 629553).
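
The alignment rule can be sketched as follows.  The constant and
function names are hypothetical, and the sketch assumes that offsets are
measured from the start of the data section, as the index entries do.
In the example above, the next free byte is 589, which is offset 61 in
the data section (589 - 528).  Aligning that for an INT32 gives offset
64, or byte 592.

```c
#include <stdint.h>

// Type numbers of the aligned integer types (hypothetical constant names).
enum { TYPE_INT16 = 3, TYPE_INT32 = 4, TYPE_INT64 = 5 };

// Required alignment in bytes for a given type number; every other type
// is treated here as needing no alignment.
static uint32_t type_alignment(uint32_t type)
{
    switch (type) {
    case TYPE_INT16: return 2;
    case TYPE_INT32: return 4;
    case TYPE_INT64: return 8;
    default:         return 1;
    }
}

// Round a data-section offset up to the next boundary suitable for type.
static uint32_t align_offset(uint32_t offset, uint32_t type)
{
    uint32_t a = type_alignment(type);

    return ((offset + a - 1) / a) * a;
}
```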

\subsection pkgformat_tools Tools

The tools directory in the RPM sources contains a number of small
programs that use the RPM library to pick apart packages.  These
tools are mostly used for debugging, but can also be used to help
you understand the internals of the RPM package format.

\verbatim
	rpmlead		- extracts the Lead from a package
	rpmsignature	- extracts the Signature from a package
	rpmheader	- extracts the Header from a package
	rpmarchive	- extracts the Archive from a package
	dump		- displays a header structure in readable format
\endverbatim

Given a package foo.rpm you might try:

\verbatim
	rpmlead foo.rpm | od -x
	rpmsignature foo.rpm | dump
	rpmheader foo.rpm | dump
	rpmarchive foo.rpm | zcat | cpio --list
\endverbatim

*/