When learning Protocol Buffers (Protobuf), many developers focus on message structure and data types but do not initially pay much attention to the numbers assigned to fields. These numbers, called field numbers or tags, are actually one of the most important parts of a Protobuf schema because they directly influence serialization, compatibility, and message size.
This tutorial explains what field numbers are, why they matter, how Protobuf uses them internally, and what best practices you should follow when designing real-world schemas.
1. What Are Field Numbers in Protobuf?
When you define a Protobuf message, each field has three parts:
- Field name
- Field type
- Field number (tag)
Example:
message Person {
  string last_name = 1;
  int32 age = 2;
  bool employed = 3;
}
Each field number is a unique numeric identifier used by Protobuf during serialization and deserialization.
The field number is not optional metadata. It is the actual identity of the field in the binary format.
2. How Protobuf Messages Are Modeled in Real Systems
In real applications, when two services communicate:
- We model the message structure.
- We identify properties (fields).
- We assign types.
- We assign field numbers.
Example:
message Person {
  string first_name = 1;
  string last_name = 2;
  int32 age = 3;
  bool employed = 4;
}
After you define this schema, the Protobuf compiler (protoc) generates code for it in Java, Python, C#, and many other languages. Applications use this generated code to build and send messages.
3. Field Names Are for Humans, Tags Are for Protobuf
Field names exist for developers, but field numbers are what Protobuf actually uses on the wire.
Unlike JSON, Protobuf does not send labels like:
{
"last_name": "Sam"
}
Instead, each field is written as a binary key/value pair in which the key carries the field number. Conceptually:
(tag=1, value="Sam")
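Because each field name is replaced by a small number, and values are stored in compact binary form rather than as text, Protobuf messages are usually much smaller than the equivalent JSON.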
Field names never travel over the network.
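A minimal hand-written sketch in Python (standard library only, no Protobuf runtime) makes this concrete. It encodes the last_name = "Sam" field from the first Person schema the way the wire format lays it out: a key built from field number 1 and wire type 2 (length-delimited), then the length, then the UTF-8 bytes. The helper names encode_varint and encode_string_field are illustrative, not part of any Protobuf API.

```python
import json

def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 bits per byte)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)         # high bit clear: last byte
            return bytes(out)

def encode_string_field(field_number: int, text: str) -> bytes:
    """Key (field number + wire type 2), then the length, then the UTF-8 bytes."""
    key = (field_number << 3) | 2
    data = text.encode("utf-8")
    return encode_varint(key) + encode_varint(len(data)) + data

proto_bytes = encode_string_field(1, "Sam")
json_bytes = json.dumps({"last_name": "Sam"}).encode("utf-8")

print(proto_bytes)                        # b'\n\x03Sam'
print(len(proto_bytes), len(json_bytes))  # 5 20
```

The only trace of last_name in the five Protobuf bytes is the number 1 packed into the key byte 0x0A.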
4. Do Field Numbers Need to Be Sequential?
Field numbers do not need to be in ascending order or consecutive.
This is valid:
message Example {
  string name = 108;
  int32 age = 32;
  bool active = 13;
}
Although this is valid, it is not always optimal, because field numbers above 15 cost an extra byte on the wire (see Section 7).
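A quick Python sketch shows the difference for the Example message above, assuming all three fields are set. The key_size helper is hypothetical, written only for this illustration; it counts the varint bytes of each field's key, (field_number << 3) | wire_type.

```python
def key_size(field_number: int) -> int:
    """Bytes needed for a field's key: (field_number << 3) | wire_type, as a varint."""
    key = field_number << 3  # the 3 wire-type bits never change the byte count
    size = 1
    while key >= 0x80:       # each varint byte carries 7 bits of payload
        key >>= 7
        size += 1
    return size

scattered = {"name": 108, "age": 32, "active": 13}  # numbering from the example
compact = {"name": 1, "age": 2, "active": 3}        # same fields, numbered 1-3

print(sum(key_size(n) for n in scattered.values()))  # 5
print(sum(key_size(n) for n in compact.values()))    # 3
```

Two bytes per message is small, but it is paid on every message that carries these fields.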
5. Valid and Reserved Ranges
Protobuf allows a very large range of field numbers.
Valid range:
1 to 536,870,911
Reserved range:
19,000 to 19,999
This range is reserved for the Protobuf implementation itself, and the compiler rejects schemas that use numbers from it.
In practice, real messages use only a tiny fraction of the valid range, so the upper limit is rarely a concern.
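Both rules fit in a one-line check. The function below is a hypothetical helper for illustration, not something provided by protoc or a Protobuf library. The upper bound is 2^29 − 1 because the field number is shifted left by three bits to make room for the wire type, and the resulting key must fit in 32 bits.

```python
MAX_FIELD_NUMBER = 2**29 - 1            # 536,870,911
RESERVED_RANGE = range(19_000, 20_000)  # 19,000 to 19,999 inclusive

def is_valid_field_number(n: int) -> bool:
    """Return True if n may be used as a field number in a .proto schema."""
    return 1 <= n <= MAX_FIELD_NUMBER and n not in RESERVED_RANGE

for n in (1, 25, 19_500, 0, 536_870_912):
    print(n, is_valid_field_number(n))
```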
6. Field Numbers Must Be Unique
Within a single message, every field number must be unique.
Invalid example:
message BadExample {
  string first_name = 1;
  string last_name = 1; // ❌ Not allowed
}
The Protobuf compiler rejects this, because a decoder that sees tag 1 could not tell which field the value belongs to.
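The uniqueness rule is easy to check mechanically. The helper below is hypothetical (not part of protoc or any Protobuf library); it groups a message's (name, number) pairs by number and reports every number claimed by more than one field.

```python
from collections import defaultdict

def find_duplicate_numbers(fields: list[tuple[str, int]]) -> dict[int, list[str]]:
    """Map each field number used more than once to the names that share it."""
    by_number: dict[int, list[str]] = defaultdict(list)
    for name, number in fields:
        by_number[number].append(name)
    return {n: names for n, names in by_number.items() if len(names) > 1}

bad_example = [("first_name", 1), ("last_name", 1)]
print(find_duplicate_numbers(bad_example))  # {1: ['first_name', 'last_name']}
```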
7. Why Field Numbers Are So Important
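Field numbers directly affect serialization size. Every field written to the wire is preceded by a key that combines the field number with a 3-bit wire type (field_number << 3 | wire_type), and that key is encoded as a variable-length integer (varint), so larger field numbers need more bytes.
Encoding cost of the key: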
- Tags 1–15 → 1 byte
- Tags 16–2047 → 2 bytes
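- Tags 2048 and above → 3 to 5 bytes
This cost is paid for every field present in a message. What matters is which side of these boundaries a field falls on: tags 1 and 15 cost the same, but 15 and 16 do not.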
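The boundaries in the list above can be checked directly. This Python sketch (helper names are hypothetical) builds the key for several field numbers with wire type 0 (varint) and prints how many bytes it takes. The wire type only occupies the low three bits, so it does not change these sizes.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 bits per byte)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def encode_key(field_number: int, wire_type: int) -> bytes:
    """A field's key is (field_number << 3) | wire_type, written as a varint."""
    return encode_varint((field_number << 3) | wire_type)

# Expected sizes: 1, 1, 2, 2, 3, 5 bytes
for n in (1, 15, 16, 2047, 2048, 536_870_911):
    key = encode_key(n, 0)  # wire type 0 = varint (int32, bool, ...)
    print(f"field {n}: {len(key)} byte(s)")
```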
8. Optimization Strategy for Tag Assignment
Frequently used fields should receive smaller tag numbers.
Example:
message Person {
  string first_name = 1;
  string last_name = 2;
  int32 age = 3;
  string middle_name = 25;
}
First name, last name, and age are commonly present, so they get small tags, while middle name is less frequent and can use a larger number.
The saving is one byte per field occurrence, which is small for a single message but adds up to real bandwidth and storage across millions of messages.
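As a rough illustration, assume a hypothetical workload of 10 million Person messages in which first_name, last_name, and age are always set, and compare the layout above with one where those three fields had been given numbers between 16 and 2047. Each of their keys would then cost 2 bytes instead of 1 (the message count is an assumption, not a measurement):

```latex
\text{savings} = \underbrace{3}_{\text{fields}} \times \underbrace{(2 - 1)\,\text{B}}_{\text{per key}} \times \underbrace{10^{7}}_{\text{messages}} = 3 \times 10^{7}\,\text{B} \approx 30\,\text{MB}
```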
9. Avoid One Giant Message
Trying to fit everything into one massive message leads to confusion and poor tag planning.
A better approach is:
- Split data into multiple messages
- Keep messages focused
- Number fields from 1 in each message, so every message can use the cheap 1–15 range
Example:
message Name {
  string first = 1;
  string middle = 2;
  string last = 3;
}
message Employment {
  bool employed = 1;
  string company = 2;
}
Each message gets its own numbering space.
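To see why these numbering spaces do not collide, the Python sketch below hand-encodes a Name and an Employment value and nests them inside a wrapper message. The wrapper message Person { Name name = 1; Employment employment = 2; }, the value "Acme", and the helper names are hypothetical additions for illustration. Nested messages travel as length-delimited fields, so field 1 inside Name and field 1 of the wrapper live at different levels of the byte stream.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 bits per byte)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def encode_len_field(field_number: int, payload: bytes) -> bytes:
    """Length-delimited field (wire type 2): used for strings and nested messages alike."""
    return encode_varint((field_number << 3) | 2) + encode_varint(len(payload)) + payload

name = encode_len_field(1, "Sam".encode("utf-8"))         # Name.first = 1
employment = encode_len_field(2, "Acme".encode("utf-8"))  # Employment.company = 2

# Hypothetical wrapper: message Person { Name name = 1; Employment employment = 2; }
person = encode_len_field(1, name) + encode_len_field(2, employment)
print(person.hex(" "))  # 0a 05 0a 03 53 61 6d 12 06 12 04 41 63 6d 65
```

The outer 0a 05 says "wrapper field 1, 5 bytes follow"; the inner 0a 03 is Name's own field 1, interpreted only within those 5 bytes.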
10. How Serialization Works Conceptually
Using the Person schema from Section 1 (last_name = 1, age = 2, employed = 3), suppose a client sets:
- last_name = "Sam"
- age = 12
- employed = false
During serialization:
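- Protobuf writes each field as a key (field number plus wire type) followed by its value
- Fields that hold their default value are omitted (in proto3, this applies to scalar fields without explicit presence, that is, fields not declared optional)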
Since false is the default for bool, it is not serialized.
So the message might only contain:
- tag 1 → "Sam"
- tag 2 → 12
This keeps messages compact.
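The same steps can be sketched in a few lines of Python. This is a hand-rolled illustration, not the code protoc generates: the SCHEMA table is an assumed encoding of the Section 1 Person message, and negative integers are deliberately not supported.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 bits per byte)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

# Section 1 schema: field name -> (field number, wire type, default value)
SCHEMA = {
    "last_name": (1, 2, ""),    # string: length-delimited (wire type 2)
    "age": (2, 0, 0),           # int32: varint (wire type 0)
    "employed": (3, 0, False),  # bool: varint (wire type 0)
}

def serialize(values: dict) -> bytes:
    out = bytearray()
    for name, (number, wire_type, default) in SCHEMA.items():
        value = values.get(name, default)
        if value == default:
            continue  # default values are not written at all
        out += encode_varint((number << 3) | wire_type)  # the key
        if wire_type == 2:
            data = value.encode("utf-8")
            out += encode_varint(len(data)) + data
        else:
            out += encode_varint(int(value))  # bool True becomes 1
    return bytes(out)

payload = serialize({"last_name": "Sam", "age": 12, "employed": False})
print(payload.hex(" "))  # 0a 03 53 61 6d 10 0c
```

The seven bytes contain exactly the two entries listed above; nothing is emitted for employed.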
11. How Deserialization Works
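The receiver needs a compatible schema, normally the same .proto file, to interpret the message: the bytes carry only field numbers, so the schema is what maps them back to names and types.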
If it receives:
(tag=1, "Sam"), (tag=2, 12)
The schema tells it:
- Tag 1 → last_name
- Tag 2 → age
Since employed is missing, it automatically becomes false because that is the default.
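The receiving side can be sketched the same way. This hypothetical Python decoder (not a Protobuf library API) reads each key, splits it into field number and wire type, looks the number up in its copy of the Section 1 schema, and leaves every field that did not arrive at its default value.

```python
def decode_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Read a base-128 varint starting at pos; return (value, new position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of this varint
            return result, pos
        shift += 7

# Receiver's copy of the Section 1 schema: field number -> (name, kind, default)
SCHEMA = {
    1: ("last_name", "string", ""),
    2: ("age", "int", 0),
    3: ("employed", "bool", False),
}

def deserialize(buf: bytes) -> dict:
    message = {name: default for name, _, default in SCHEMA.values()}  # start from defaults
    pos = 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:    # varint: int32, bool, ...
            value, pos = decode_varint(buf, pos)
        elif wire_type == 2:  # length-delimited: string, bytes, nested message
            length, pos = decode_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} is not handled in this sketch")
        if number not in SCHEMA:
            continue  # unknown field number: this sketch simply ignores it
        name, kind, _ = SCHEMA[number]
        if kind == "string":
            message[name] = value.decode("utf-8")
        elif kind == "bool":
            message[name] = bool(value)
        else:
            message[name] = value
    return message

print(deserialize(b"\x0a\x03Sam\x10\x0c"))
# {'last_name': 'Sam', 'age': 12, 'employed': False}
```

Because fields are looked up by number rather than position, the same two fields sent in the opposite order decode to the same result.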
12. Best Practices Summary
Always keep these rules in mind:
- Field numbers are the real identifiers in Protobuf.
- Field names are only for developers.
- Use tags 1–15 for frequently used fields.
- Never reuse a tag within the same message.
- Avoid the reserved range 19,000–19,999.
- Split large schemas into smaller messages.
- Remember that default values are not serialized.
