Diffstat (limited to 'docs/source/Internals.md')
-rwxr-xr-x | docs/source/Internals.md | 244 |
1 files changed, 244 insertions, 0 deletions
diff --git a/docs/source/Internals.md b/docs/source/Internals.md
new file mode 100755
index 00000000..b744c784
--- /dev/null
+++ b/docs/source/Internals.md
@@ -0,0 +1,244 @@
+# FlatBuffer Internals
+
+This section is entirely optional for the use of FlatBuffers. In normal
+usage, you should never need the information contained herein. If you're
+interested however, it should give you more of an appreciation of why
+FlatBuffers is both efficient and convenient.
+
+### Format components
+
+A FlatBuffer is a binary file and in-memory format consisting mostly of
+scalars of various sizes, all aligned to their own size. Each scalar is
+also always represented in little-endian format, as this corresponds to
+all commonly used CPUs today. FlatBuffers will also work on big-endian
+machines, but will be slightly slower because of additional
+byte-swap intrinsics.
+
+On purpose, the format leaves a lot of details about where exactly
+things live in memory undefined, e.g. fields in a table can have any
+order, and objects to some extent can be stored in many orders. This is
+because the format doesn't need this information to be efficient, and it
+leaves room for optimization and extension (for example, fields can be
+packed in a way that is most compact). Instead, the format is defined in
+terms of offsets and adjacency only.
+
+### Format identification
+
+The format also doesn't contain information for format identification
+and versioning, which is also by design. FlatBuffers is a statically typed
+system, meaning the user of a buffer needs to know what kind of buffer
+it is. FlatBuffers can of course be wrapped inside other containers
+where needed, or you can use its union feature to dynamically identify
+multiple possible sub-objects stored. Additionally, it can be used
+together with the schema parser if full reflective capabilities are
+desired.
+
+Versioning is something that is intrinsically part of the format (the
+optionality / extensibility of fields), so the format itself does not
+need a version number (it's a meta-format, in a sense). We're hoping
+that this format can accommodate all data needed. If format breaking
+changes are ever necessary, it would become a new kind of format rather
+than just a variation.
+
+### Offsets
+
+The most important and generic offset type (see `flatbuffers.h`) is
+`uoffset_t`, which is currently always a `uint32_t`, and is used to
+refer to all tables/unions/strings/vectors. 32bit is
+intentional, since we want to keep the format binary compatible between
+32 and 64bit systems, and a 64bit offset would bloat the size for almost
+all uses. A version of this format with 64bit (or 16bit) offsets is easy to set
+when needed. Unsigned means they can only point in one direction, which
+typically is forward (towards a higher memory location). Any backwards
+offsets will be explicitly marked as such.
+
+The format starts with a `uoffset_t` to the root object in the buffer.
+
+We have two kinds of objects, structs and tables.
+
+### Structs
+
+These are the simplest, and as mentioned, intended for simple data that
+benefits from being extra efficient and doesn't need versioning /
+extensibility. They are always stored inline in their parent (a struct,
+table, or vector) for maximum compactness. Structs define a consistent
+memory layout where all components are aligned to their size, and
+structs are aligned to their largest scalar member. This is done independent
+of the alignment rules of the underlying compiler to guarantee a cross
+platform compatible layout. This layout is then enforced in the generated
+code.
+
+### Tables
+
+These start with an `soffset_t` to a vtable (signed version of
+`uoffset_t`, since vtables may be stored anywhere), followed by all the
+fields as aligned scalars. Unlike structs, not all fields need to be
+present. There is no set order and layout.
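The struct layout rule described under Structs (every member aligned to its own size, the struct aligned to its largest scalar member, padding added by hand) can be illustrated with a small compile-time check. This is only a sketch: `Example` is a hypothetical struct that does not appear in the FlatBuffers sources, and the asserted values assume a mainstream ABI where `uint32_t` has 4-byte alignment.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical struct for illustration only (not part of FlatBuffers).
// A 1-byte member is followed by a 4-byte member, so three bytes of
// explicit padding keep `value` aligned to its own size.
struct Example {
  uint8_t tag;      // offset 0
  uint8_t pad_[3];  // manual padding, offsets 1..3
  uint32_t value;   // offset 4, aligned to sizeof(uint32_t)
};

// With the padding written out by hand, the layout no longer depends on
// what padding a particular compiler would have chosen.
static_assert(offsetof(Example, value) == 4, "value must sit at offset 4");
static_assert(sizeof(Example) == 8, "struct must be exactly 8 bytes");
static_assert(alignof(Example) == 4, "struct aligns to its largest scalar");
```

Because the checks are `static_assert`s, a compiler that would lay the struct out differently fails the build instead of silently producing an incompatible buffer.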
+
+To be able to access fields regardless of these uncertainties, we go
+through a vtable of offsets. Vtables are shared between any objects that
+happen to have the same vtable values.
+
+The elements of a vtable are all of type `voffset_t`, which is currently
+a `uint16_t`. The first element is the number of elements of the vtable,
+including this one. The second one is the size of the object, in bytes
+(including the vtable offset). This size is used for streaming, to know
+how many bytes to read to be able to access all fields of the object.
+The remaining elements are the N offsets, where N is the number of fields
+declared in the schema when the code that constructed this buffer was
+compiled (thus, the size of the vtable is N + 2).
+
+All accessor functions in the generated code for tables contain the
+offset into this vtable as a constant. This offset is checked against the
+first field (the number of elements), to protect against newer code
+reading older data. If this offset is out of range, or the vtable entry
+is 0, that means the field is not present in this object, and the
+default value is returned. Otherwise, the entry is used as offset to the
+field to be read.
+
+### Strings and Vectors
+
+Strings are simply a vector of bytes, and are always
+null-terminated. Vectors are stored as contiguous aligned scalar
+elements prefixed by a count.
+
+### Construction
+
+The current implementation constructs these buffers backwards, since
+that significantly reduces the amount of bookkeeping and simplifies the
+construction API.
+
+### Code example
+
+Here's an example of the code that gets generated for the `samples/monster.fbs`.
+What follows is the entire file, broken up by comments:
+
+    // automatically generated, do not modify
+
+    #include "flatbuffers/flatbuffers.h"
+
+    namespace MyGame {
+    namespace Sample {
+
+Nested namespace support.
+
+    enum {
+      Color_Red = 0,
+      Color_Green = 1,
+      Color_Blue = 2,
+    };
+
+    inline const char **EnumNamesColor() {
+      static const char *names[] = { "Red", "Green", "Blue", nullptr };
+      return names;
+    }
+
+    inline const char *EnumNameColor(int e) { return EnumNamesColor()[e]; }
+
+Enums and convenient reverse lookup.
+
+    enum {
+      Any_NONE = 0,
+      Any_Monster = 1,
+    };
+
+    inline const char **EnumNamesAny() {
+      static const char *names[] = { "NONE", "Monster", nullptr };
+      return names;
+    }
+
+    inline const char *EnumNameAny(int e) { return EnumNamesAny()[e]; }
+
+Unions share a lot with enums.
+
+    struct Vec3;
+    struct Monster;
+
+Predeclare all datatypes since there may be circular references.
+
+    MANUALLY_ALIGNED_STRUCT(4) Vec3 {
+     private:
+      float x_;
+      float y_;
+      float z_;
+
+     public:
+      Vec3(float x, float y, float z)
+        : x_(flatbuffers::EndianScalar(x)), y_(flatbuffers::EndianScalar(y)), z_(flatbuffers::EndianScalar(z)) {}
+
+      float x() const { return flatbuffers::EndianScalar(x_); }
+      float y() const { return flatbuffers::EndianScalar(y_); }
+      float z() const { return flatbuffers::EndianScalar(z_); }
+    };
+    STRUCT_END(Vec3, 12);
+
+These ugly macros do a couple of things: they turn off any padding the compiler
+might normally do, since we add padding manually (though none in this example),
+and they enforce alignment chosen by FlatBuffers. This ensures the layout of
+this struct will look the same regardless of compiler and platform. Note that
+the fields are private: this is because these store little endian scalars
+regardless of platform (since this is part of the serialized data).
+`EndianScalar` then converts back and forth, which is a no-op on all current
+mobile and desktop platforms, and a single machine instruction on the few
+remaining big endian platforms.
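To make the role of `EndianScalar` concrete, here is a minimal sketch of the same idea for a single type. It is not the FlatBuffers implementation: `IsLittleEndianHost`, `ToFromLittleEndian32` and `ReadLE32` are hypothetical names introduced for illustration, and only `uint32_t` is handled.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: detects the host byte order at run time by
// inspecting the first byte of a known 16-bit value.
inline bool IsLittleEndianHost() {
  const uint16_t probe = 1;
  uint8_t first_byte = 0;
  std::memcpy(&first_byte, &probe, 1);
  return first_byte == 1;
}

// Hypothetical stand-in for the conversion described above: a no-op on
// little-endian hosts, a byte swap on big-endian hosts. Applying it twice
// returns the original value, so one function converts in both directions.
inline uint32_t ToFromLittleEndian32(uint32_t v) {
  if (IsLittleEndianHost()) return v;
  return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
         ((v & 0x00FF0000u) >> 8) | ((v & 0xFF000000u) >> 24);
}

// Reads a serialized (little-endian) 32-bit scalar from raw buffer bytes.
inline uint32_t ReadLE32(const uint8_t *p) {
  assert(p != nullptr);
  uint32_t raw = 0;
  std::memcpy(&raw, p, sizeof(raw));  // avoids unaligned-access issues
  return ToFromLittleEndian32(raw);
}
```

Keeping the stored members private and routing every access through such a conversion is what lets the same serialized bytes be read correctly on either kind of host.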
+
+    struct Monster : private flatbuffers::Table {
+      const Vec3 *pos() const { return GetStruct<const Vec3 *>(4); }
+      int16_t mana() const { return GetField<int16_t>(6, 150); }
+      int16_t hp() const { return GetField<int16_t>(8, 100); }
+      const flatbuffers::String *name() const { return GetPointer<const flatbuffers::String *>(10); }
+      const flatbuffers::Vector<uint8_t> *inventory() const { return GetPointer<const flatbuffers::Vector<uint8_t> *>(14); }
+      int8_t color() const { return GetField<int8_t>(16, 2); }
+    };
+
+Tables are a bit more complicated. A table accessor struct is used to point at
+the serialized data for a table, which always starts with an offset to its
+vtable. It derives from `Table`, which contains the `GetField` helper functions.
+`GetField` takes a vtable offset, and a default value. It will look in the vtable
+at that offset. If the offset is out of bounds (data from an older version) or
+the vtable entry is 0, the field is not present and the default is returned.
+Otherwise, it uses the entry as an offset into the table to locate the field.
+
+    struct MonsterBuilder {
+      flatbuffers::FlatBufferBuilder &fbb_;
+      flatbuffers::uoffset_t start_;
+      void add_pos(const Vec3 *pos) { fbb_.AddStruct(4, pos); }
+      void add_mana(int16_t mana) { fbb_.AddElement<int16_t>(6, mana, 150); }
+      void add_hp(int16_t hp) { fbb_.AddElement<int16_t>(8, hp, 100); }
+      void add_name(flatbuffers::Offset<flatbuffers::String> name) { fbb_.AddOffset(10, name); }
+      void add_inventory(flatbuffers::Offset<flatbuffers::Vector<uint8_t>> inventory) { fbb_.AddOffset(14, inventory); }
+      void add_color(int8_t color) { fbb_.AddElement<int8_t>(16, color, 2); }
+      MonsterBuilder(flatbuffers::FlatBufferBuilder &_fbb) : fbb_(_fbb) { start_ = fbb_.StartTable(); }
+      flatbuffers::Offset<Monster> Finish() { return flatbuffers::Offset<Monster>(fbb_.EndTable(start_, 7)); }
+    };
+
+`MonsterBuilder` is the base helper struct to construct a table using a
+`FlatBufferBuilder`. You can add the fields in any order, and the `Finish`
+call will ensure the correct vtable gets generated.
+
+    inline flatbuffers::Offset<Monster> CreateMonster(flatbuffers::FlatBufferBuilder &_fbb, const Vec3 *pos, int16_t mana, int16_t hp, flatbuffers::Offset<flatbuffers::String> name, flatbuffers::Offset<flatbuffers::Vector<uint8_t>> inventory, int8_t color) {
+      MonsterBuilder builder_(_fbb);
+      builder_.add_inventory(inventory);
+      builder_.add_name(name);
+      builder_.add_pos(pos);
+      builder_.add_hp(hp);
+      builder_.add_mana(mana);
+      builder_.add_color(color);
+      return builder_.Finish();
+    }
+
+`CreateMonster` is a convenience function that calls all functions in
+`MonsterBuilder` above for you. Note that if you pass values which are
+defaults as arguments, it will not actually construct that field, so
+you can probably use this function instead of the builder class in
+almost all cases.
+
+    inline const Monster *GetMonster(const void *buf) { return flatbuffers::GetRoot<Monster>(buf); }
+
+This function is only generated for the root table type, to be able to
+start traversing a FlatBuffer from a raw buffer pointer.
+
+    }; // namespace Sample
+    }; // namespace MyGame
+
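The vtable lookup that the text attributes to `GetField` can be sketched on hand-built bytes. This follows the description in the document (first vtable element is the element count, second is the object size, the rest are field offsets, and an entry of 0 means "absent"); it is not the library's code. `ReadLE16` and `GetInt16Field` are hypothetical names, and the sketch assumes the caller has already located the vtable rather than following the table's `soffset_t`.

```cpp
#include <cassert>
#include <cstdint>

// Reads a little-endian 16-bit value, independent of host byte order.
inline uint16_t ReadLE16(const uint8_t *p) {
  return static_cast<uint16_t>(p[0] | (p[1] << 8));
}

// Hypothetical helper, for illustration only.
//   vtable:         [element count][object size][field offsets...], uint16 each
//   table:          start of the table object the vtable describes
//   vt_byte_offset: the constant baked into a generated accessor (4, 6, 8, ...)
inline int16_t GetInt16Field(const uint8_t *vtable, const uint8_t *table,
                             uint16_t vt_byte_offset, int16_t default_value) {
  assert(vt_byte_offset % 2 == 0);
  const uint16_t num_elements = ReadLE16(vtable);
  const uint16_t index = vt_byte_offset / 2;
  // Newer code reading older data: the vtable has no slot for this field.
  if (index >= num_elements) return default_value;
  const uint16_t field_offset = ReadLE16(vtable + vt_byte_offset);
  // A zero entry means the field was not stored in this object.
  if (field_offset == 0) return default_value;
  return static_cast<int16_t>(ReadLE16(table + field_offset));
}
```

For example, with a four-element vtable whose slot at byte offset 4 is 0 and whose slot at byte offset 6 is 4, a lookup at offset 6 reads the stored value from `table + 4`, while lookups at offset 4 (absent) or offset 8 (past the end of the vtable) fall back to the supplied default.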