This document describes the binary encoding for Fudge-encoded data.
While the term "Fudge" can be used to mean multiple things, fundamentally it is a set of tools and code which can be used to generate and consume data which adheres to the Fudge Encoding Specification.
This specification is designed with the following key characteristics:
- Compactness. Fudge-encoded data should be as small as reasonable.
- Hardware Efficiency. Key data structures should be easy to process in hardware or in embedded systems.
- Software Efficiency. Key data structures should be simple to encode and decode in software, and should tax the CPU as little as possible while doing it.
- Flexibility. The same encoding specification should be usable both in extremely verbose environments, where self-description is more important than size, and in extremely performance-critical environments, where every byte matters.
The classic example of how this works together (and contrasts with a system like Google Protocol Buffers) is in the case of integral values:
- Fudge stores them in native network-byte-order integral format; Google Protocol Buffers stores them in base-128 varint encoding (with ZigZag coding for its sint types).
- Fudge reduces size by storing a value that fits in a narrower type as that narrower type (for example, if a value of 4 is provided for an int32 type, an int8 will actually be encoded).
- This has the desired effect of storing fewer bytes for small integral values, without the CPU overhead of varint encoding.
- Fudge allows for field names to be included in the data stream, linked in via a taxonomy, or excluded altogether
- Google Protocol Buffers only allows them to be excluded entirely
- Key message and field boundaries are aligned on 32 and 16 bits respectively.
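The integer reduction described in the list above can be sketched as follows. This is a minimal illustration, not the reference implementation: the function name `encode_reduced_int` and the format table are assumptions made for the example, and a real encoder would also write the type byte of the narrower type it selected.

```python
import struct

# (struct format, min, max), narrowest first.
# ">" selects network (big-endian) byte order with no padding.
_INT_FORMATS = [
    (">b", -2**7, 2**7 - 1),    # int8
    (">h", -2**15, 2**15 - 1),  # int16
    (">i", -2**31, 2**31 - 1),  # int32
    (">q", -2**63, 2**63 - 1),  # int64
]


def encode_reduced_int(value: int) -> bytes:
    """Encode value in the narrowest signed fixed-width type that holds it."""
    for fmt, lo, hi in _INT_FORMATS:
        if lo <= value <= hi:
            return struct.pack(fmt, value)
    raise OverflowError(f"{value} does not fit in 64 bits")
```

With this sketch, the value 4 occupies one byte even if the caller thinks of it as an int32, while 300 needs two bytes. The cost is a handful of range comparisons rather than a per-byte loop.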
For messages containing fields of a fixed width type, Fudge encoding adds the following overhead:
- Per-message overhead - 8 bytes
- Only Sequence-based Field Identification - 2 bytes per field
- Only Ordinal Field Identification - 4 bytes per field
- Ordinal + 10-char Name Identification - 15 bytes per field
So, for example, a single message with a single field using only ordinal field identification adds 12 bytes of total overhead (8 for the message plus 4 for the field).
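The overhead figures above follow from simple arithmetic, sketched here for fixed-width fields. The constant and function names are illustrative assumptions; the byte counts come from the envelope and field layouts described in this document (a 2-byte field header, a 2-byte ordinal, and a 1-byte length in front of the name).

```python
from typing import Iterable, Optional

MESSAGE_HEADER_BYTES = 8  # per-message envelope
FIELD_HEADER_BYTES = 2    # field prefix byte + type byte
ORDINAL_BYTES = 2         # present only if an ordinal is provided
NAME_LENGTH_BYTES = 1     # length byte in front of the name


def field_overhead(has_ordinal: bool = False,
                   name_length: Optional[int] = None) -> int:
    """Header bytes added by one fixed-width field, excluding its data."""
    overhead = FIELD_HEADER_BYTES
    if has_ordinal:
        overhead += ORDINAL_BYTES
    if name_length is not None:
        overhead += NAME_LENGTH_BYTES + name_length
    return overhead


def message_overhead(field_overheads: Iterable[int]) -> int:
    """Total overhead of a message given the overhead of each field."""
    return MESSAGE_HEADER_BYTES + sum(field_overheads)
```

For instance, `field_overhead(has_ordinal=True, name_length=10)` gives the 15 bytes listed for ordinal plus 10-character name identification, assuming the name encodes to 10 bytes.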
For situations where messages should be self-descriptive, but also support a rich taxonomy at runtime with full names for display and search, a message may specify a two-byte taxonomy identifier.
At runtime, then, the Fudge system can locate a description of field ordinal (or sequential number) to name mappings, and provide these automatically to end-user applications without the overhead of encoding the names of all fields in all messages.
Taxonomies are not intended to be internet-scale, which is why only two bytes are provided. Applications will compose the taxonomies they use from any that are publicly specified. Furthermore, a taxonomy doesn't have to be unique to a particular application or message format: as long as the ordinals are all unique, a number of different message formats can share the same taxonomy definition.
For more information, see the Taxonomy page.
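As a rough illustration of the lookup described above, a taxonomy can be modelled as a table from ordinal to field name, selected by the two-byte taxonomy identifier. The table contents, the identifier value 1, and the function name `resolve_field_name` are all hypothetical; how a real system loads and distributes taxonomies is outside the scope of this sketch.

```python
from typing import Dict, Optional

# Hypothetical taxonomy table: taxonomy id -> {ordinal -> field name}.
TAXONOMIES: Dict[int, Dict[int, str]] = {
    1: {1: "bid", 2: "ask", 3: "timestamp"},
}


def resolve_field_name(taxonomy_id: int, ordinal: int) -> Optional[str]:
    """Return the name registered for an ordinal, or None if unknown."""
    return TAXONOMIES.get(taxonomy_id, {}).get(ordinal)
```

A decoder could call `resolve_field_name` for each field that carries an ordinal but no name, so that names never need to travel in the message itself.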
Encoding a Fudge message involves writing two distinct elements:
- A fixed-size Message Header, which contains processing directives for the message
- A series of fields, terminated by a special field type
The processing directives in the Message Header apply to all fields contained in the message, whether top-level fields or embedded (sub-message) fields. Therefore, sub-messages are simply encoded as a series of fields.
Each message envelope begins with the following sequence:
|Processing Directives||Schema Version||Taxonomy||Message Size|
|1 byte||1 byte||2 bytes||4 bytes|
Processing Directives - One byte containing a bit field specifying options for the processing of the message. This byte is not currently used; it is reserved for future expansion and improves the alignment of the fields in the message header.
Schema Version - One byte indicating the version of the logical schema that was used to generate the message. This field is specifically for the use of the encoding application, and signals to receiving applications which application-specific decoding logic should be used.
Taxonomy - A 2-byte identifier for the taxonomy to be looked up in the taxonomy reference table specific to the message sender. This is not a globally unique reference, and is specific to the taxonomy of the message encoder.
Message Size - The size, in bytes, of the message. The message size is the total size of all data in the message, and includes the message envelope itself.
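The 8-byte envelope described above can be packed and unpacked with Python's `struct` module, as in the sketch below. The names `MessageHeader`, `pack_header` and `unpack_header` are illustrative, and treating every field as unsigned is an assumption: the text does not state the signedness of the taxonomy or message size.

```python
import struct
from typing import NamedTuple

# ">" = network byte order, no padding; B = 1 byte, H = 2 bytes, I = 4 bytes.
# Unsigned fields are an assumption made for this sketch.
HEADER_FORMAT = ">BBHI"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 8 bytes


class MessageHeader(NamedTuple):
    processing_directives: int
    schema_version: int
    taxonomy: int
    message_size: int  # total size, including this envelope


def pack_header(header: MessageHeader) -> bytes:
    """Serialize the envelope in the order given in the layout table."""
    return struct.pack(HEADER_FORMAT, *header)


def unpack_header(data: bytes) -> MessageHeader:
    """Parse the first 8 bytes of data as a message envelope."""
    return MessageHeader(*struct.unpack(HEADER_FORMAT, data[:HEADER_SIZE]))
```

Because the message size includes the envelope, an encoder would typically serialize the fields first, then write `HEADER_SIZE + len(fields)` into the header.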
A particular field is encoded in the following manner:
|Field Prefix||Type||Ordinal||Name||Data|
|1 byte||1 byte||[2 bytes]||[var, up to 256 bytes]||[var]|
Field Prefix - See below.
Type - 1 byte indicating the type of data included.
Ordinal - If provided, a 2-byte signed integral ordinal for the specification of the field.
Name - If provided, a variable-width textual description for the field, preceded by exactly 1 byte giving its length.
Data - Variable-width or fixed-width data (determined by the type). For variable-width data, the number of bytes in the size prefix is given in the field prefix.
The field prefix is a 1-byte header which contains the following bit fields:
|Bit 7||Bits 6-5||Bit 4||Bit 3||Bits 2-0|
|1 if fixed width, 0 if variable||Variable Width Size Indicator||1 if ordinal provided||1 if name provided||For future expansion|
Variable Width Size Indicator - These two bits will have one of the following values:
|00||Either a fixed width type, or an empty variable width field|
|01||1 byte is used to encode the size of the variable width payload|
|10||2 bytes are used to encode the size of the variable width payload|
|11||4 bytes are used to encode the size of the variable width payload|
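The bit layout of the field prefix can be expressed with a few shifts and masks. The sketch below assumes that bit 7 is the most significant bit of the byte; the names `FieldPrefix`, `pack_prefix`, `unpack_prefix` and `SIZE_PREFIX_BYTES` are illustrative and do not come from the specification.

```python
from typing import NamedTuple

# Size-indicator code -> number of bytes used for the payload size.
SIZE_PREFIX_BYTES = {0b00: 0, 0b01: 1, 0b10: 2, 0b11: 4}


class FieldPrefix(NamedTuple):
    fixed_width: bool    # bit 7
    size_indicator: int  # bits 6-5
    has_ordinal: bool    # bit 4
    has_name: bool       # bit 3 (bits 2-0 are reserved and left as zero)


def pack_prefix(prefix: FieldPrefix) -> int:
    """Build the prefix byte, assuming bit 7 is the most significant bit."""
    if prefix.size_indicator not in SIZE_PREFIX_BYTES:
        raise ValueError("size indicator must fit in two bits")
    return (
        (int(prefix.fixed_width) << 7)
        | (prefix.size_indicator << 5)
        | (int(prefix.has_ordinal) << 4)
        | (int(prefix.has_name) << 3)
    )


def unpack_prefix(byte: int) -> FieldPrefix:
    """Split a prefix byte back into its bit fields."""
    return FieldPrefix(
        fixed_width=bool(byte & 0x80),
        size_indicator=(byte >> 5) & 0b11,
        has_ordinal=bool(byte & 0x10),
        has_name=bool(byte & 0x08),
    )
```

A decoder would read this byte first, then use `SIZE_PREFIX_BYTES[prefix.size_indicator]` to know how many size bytes precede a variable-width payload.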
The Fudge specification includes a set of Standard Types that any compliant system must support, with defined reduction rules so that the smallest possible encoding for the data is used.
Sub-messages are encoded using the following system:
- In the containing message, a field entry is provided with the StartFudgeMsg type. A name, ordinal, both or neither may be provided as with any other field.
- The StartFudgeMsg field is variable-width. The 2-bit variable-width field size indicator will be provided as usual.
- The size of the sub-message is specified as defined by the 2-bit variable-width field size indicator.
- The end of the sub-message is implied by the size sent with the StartFudgeMsg field.
- The fields of the sub-message will immediately follow in the data stream
Note that aside from the StartFudgeMsg field type header, this is the exact same processing as for the fields in a Fudge Message Envelope.
All variable-width data is composed of an N-byte prefix (the number of bytes is either fixed or specified in the field encoding prefix) coupled with the actual variable data.
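One way to produce such a size-prefixed payload is sketched below. It assumes that the encoder picks the narrowest size prefix that fits and that the size is written as an unsigned integer; neither choice is stated in the text above. The function name `encode_variable` is illustrative.

```python
import struct
from typing import Tuple


def encode_variable(payload: bytes) -> Tuple[int, bytes]:
    """Return (2-bit size indicator, size prefix + payload).

    Assumes the narrowest prefix is chosen and sizes are unsigned.
    """
    n = len(payload)
    if n == 0:
        return 0b00, b""  # empty variable-width field: no size bytes
    if n <= 0xFF:
        return 0b01, struct.pack(">B", n) + payload
    if n <= 0xFFFF:
        return 0b10, struct.pack(">H", n) + payload
    if n <= 0xFFFFFFFF:
        return 0b11, struct.pack(">I", n) + payload
    raise ValueError("payload too large for a 4-byte size prefix")
```

For text, the payload would be the UTF-8 bytes of the string, for example `encode_variable("héllo".encode("utf-8"))`, where the size counts bytes rather than characters.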
All text MUST be encoded in UTF-8.
All data MUST be encoded in Network Byte Order, notably integers and IEEE 754 floating point numbers.
The ordering of repeated fields within a message, or sub-message, MUST be preserved by any compliant system as the ordering can be used to represent sequence within a list or other higher level programming construct. Fields are considered to be repeated if:
- neither specifies a Name or an Ordinal (anonymous fields); or
- both specify a Name and the Names match; or
- both specify an Ordinal and the Ordinals match.
For any other fields, the ordering is not guaranteed to be preserved.
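The three conditions for repeated fields translate directly into a predicate. In this sketch, `FieldId` and `are_repeated` are hypothetical names, and a missing Name or Ordinal is represented by `None`.

```python
from typing import NamedTuple, Optional


class FieldId(NamedTuple):
    name: Optional[str] = None
    ordinal: Optional[int] = None


def are_repeated(a: FieldId, b: FieldId) -> bool:
    """True if the relative order of fields a and b must be preserved."""
    both_anonymous = (
        a.name is None and a.ordinal is None
        and b.name is None and b.ordinal is None
    )
    same_name = a.name is not None and a.name == b.name
    same_ordinal = a.ordinal is not None and a.ordinal == b.ordinal
    return both_anonymous or same_name or same_ordinal
```

Note that two fields count as repeated if either the Names or the Ordinals match, so `FieldId("px", 1)` and `FieldId("px", 2)` are repeated under this reading.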