Learnitweb

Field Numbers (Tags) in Protocol Buffers

When learning Protocol Buffers (Protobuf), many developers focus on message structure and data types but do not initially pay much attention to the numbers assigned to fields. These numbers, called field numbers or tags, are actually one of the most important parts of a Protobuf schema because they directly influence serialization, compatibility, and message size.

This tutorial explains what field numbers are, why they matter, how Protobuf uses them internally, and what best practices you should follow when designing real-world schemas.

1. What Are Field Numbers in Protobuf?

When you define a Protobuf message, each field has three parts:

  • Field name
  • Field type
  • Field number (tag)

Example:

message Person {
  string last_name = 1;
  int32 age = 2;
  bool employed = 3;
}

Each field number is a unique numeric identifier used by Protobuf during serialization and deserialization.

The field number is not optional metadata. It is the actual identity of the field in the binary format.

2. How Protobuf Messages Are Modeled in Real Systems

In real applications, when two services communicate:

  1. We model the message structure.
  2. We identify properties (fields).
  3. We assign types.
  4. We assign field numbers.

Example:

message Person {
  string first_name = 1;
  string last_name = 2;
  int32 age = 3;
  bool employed = 4;
}

After defining this schema, the Protobuf compiler (protoc) generates code in Java, Python, C#, and many other languages. Applications use this generated code to build, send, and parse messages.

3. Field Names Are for Humans, Tags Are for Protobuf

Field names exist for developers, but field numbers are what Protobuf actually uses on the wire.

Unlike JSON, Protobuf does not send labels like:

{
  "last_name": "Sam"
}

Instead, it sends encoded pairs like:

(tag=1, value="Sam")

This is one of the main reasons Protobuf messages are usually much smaller than equivalent JSON messages.

Field names never travel over the network.
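To make the size difference concrete, here is a minimal sketch that hand-encodes the (tag=1, value="Sam") pair and compares it with the JSON form. The helper name encode_string_field is hypothetical and only covers short strings with small field numbers; real applications would use the generated Protobuf code instead.

```python
import json


def encode_string_field(field_number: int, text: str) -> bytes:
    """Hand-encode one short string field in Protobuf wire format (illustration only)."""
    payload = text.encode("utf-8")
    # Assumption for this sketch: key and length each fit in a single byte.
    if not (1 <= field_number <= 15 and len(payload) <= 127):
        raise ValueError("sketch only supports field numbers 1-15 and short strings")
    key = (field_number << 3) | 2  # wire type 2 = length-delimited
    return bytes([key, len(payload)]) + payload


protobuf_bytes = encode_string_field(1, "Sam")
json_bytes = json.dumps({"last_name": "Sam"}).encode("utf-8")

print(protobuf_bytes.hex(" "))                # 0a 03 53 61 6d
print(len(protobuf_bytes), len(json_bytes))   # 5 20
```

The field name appears nowhere in the Protobuf bytes; only the number 1 (packed into the first byte) identifies the field.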

4. Do Field Numbers Need to Be Sequential?

Field numbers do not need to be in ascending order or consecutive.

This is valid:

message Example {
  string name = 108;
  int32 age = 32;
  bool active = 13;
}

While this is valid, it is not always optimal, because larger field numbers can take more bytes to encode (see section 7).

5. Valid and Reserved Ranges

Protobuf allows a very large range of field numbers.

Valid range:

1 to 536,870,911

Reserved range:

19,000 to 19,999

The reserved range is set aside for the Protobuf implementation's internal use. The compiler rejects schemas that use these numbers.

In practice, most messages use fewer than 50 fields, so the upper limit is rarely a concern.
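The two ranges above can be expressed as a small check. This is only a sketch; the function and constant names are hypothetical, and the Protobuf compiler performs this validation for you.

```python
MAX_FIELD_NUMBER = 2**29 - 1             # 536,870,911
RESERVED_RANGE = range(19_000, 20_000)   # 19,000 through 19,999 inclusive


def is_usable_field_number(n: int) -> bool:
    """Return True if n is inside the valid range and outside the reserved range."""
    return 1 <= n <= MAX_FIELD_NUMBER and n not in RESERVED_RANGE


for candidate in (0, 1, 18_999, 19_000, 19_999, 20_000, 536_870_911, 536_870_912):
    print(candidate, is_usable_field_number(candidate))
```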

6. Field Numbers Must Be Unique

Within a single message, every field number must be unique.

Invalid example:

message BadExample {
  string first_name = 1;
  string last_name = 1; // ❌ Not allowed
}

The Protobuf compiler rejects this schema, because a decoder could not tell which field a value with tag 1 belongs to.

7. Why Field Numbers Are So Important

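Field numbers directly affect serialization size. On the wire, every field is prefixed with a key that combines the field number with a 3-bit wire type, and that key is stored as a variable-length integer (varint). Only 4 bits remain for the field number in the first byte, which is why numbers above 15 need a second byte.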

Encoding cost:

  • Tags 1–15 → 1 byte
  • Tags 16–2047 → 2 bytes
  • Tags 2048 and above → 3 or more bytes

Smaller tag numbers produce smaller messages.
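The byte counts above follow from how the field key is built: the field number is shifted left by 3 bits, combined with the wire type, and written as a varint that carries 7 bits per byte. A minimal sketch, with hypothetical helper names, that reproduces the boundaries:

```python
def varint_size(value: int) -> int:
    """Number of bytes needed to store a non-negative integer as a varint."""
    size = 1
    while value >= 0x80:  # more than 7 bits left, so another byte is needed
        value >>= 7
        size += 1
    return size


def key_size(field_number: int, wire_type: int = 0) -> int:
    """Size in bytes of the field key: (field_number << 3) | wire_type."""
    return varint_size((field_number << 3) | wire_type)


for number in (1, 15, 16, 2047, 2048):
    print(number, key_size(number))
# 1 1
# 15 1
# 16 2
# 2047 2
# 2048 3
```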

8. Optimization Strategy for Tag Assignment

Frequently used fields should receive smaller tag numbers.

Example:

message Person {
  string first_name = 1;
  string last_name = 2;
  int32 age = 3;

  string middle_name = 25;
}

First name, last name, and age are commonly present, so they get small tags, while middle name is less frequent and can use a larger number.

At scale, this optimization saves bandwidth and storage.

9. Avoid One Giant Message

Trying to fit everything into one massive message leads to confusion and poor tag planning.

A better approach is:

  • Split data into multiple messages
  • Keep messages focused
  • Reuse small numbers per message

Example:

message Name {
  string first = 1;
  string middle = 2;
  string last = 3;
}

message Employment {
  bool employed = 1;
  string company = 2;
}

Each message gets its own numbering space.

10. How Serialization Works Conceptually

Suppose a client builds:

  • last_name = “Sam”
  • age = 12
  • employed = false

During serialization:

  • Protobuf encodes tags and values
  • Default values are omitted

Since false is the default for bool, proto3 does not serialize it (unless the field is declared optional, in which case an explicitly set false is written).

So the message might only contain:

  • tag 1 → “Sam”
  • tag 2 → 12

This keeps messages compact.
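The following sketch hand-encodes this message to show which bytes end up on the wire. It assumes the three-field Person schema from section 1 (last_name = 1, age = 2, employed = 3) with proto3 default-omission behavior. The function names are hypothetical, and negative ages are rejected to keep the varint logic short.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_person(last_name: str, age: int, employed: bool) -> bytes:
    """Serialize the fields, skipping any that hold their default value."""
    if age < 0:
        raise ValueError("this sketch only handles non-negative ages")
    out = bytearray()
    if last_name:  # default "" is omitted
        data = last_name.encode("utf-8")
        out += encode_varint((1 << 3) | 2) + encode_varint(len(data)) + data
    if age:        # default 0 is omitted
        out += encode_varint((2 << 3) | 0) + encode_varint(age)
    if employed:   # default false is omitted
        out += encode_varint((3 << 3) | 0) + encode_varint(1)
    return bytes(out)


wire = encode_person("Sam", 12, False)
print(wire.hex(" "))  # 0a 03 53 61 6d 10 0c
print(len(wire))      # 7
```

The whole message takes 7 bytes, and nothing at all is written for employed.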

11. How Deserialization Works

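The receiver needs a compatible .proto definition, one that assigns the same field numbers and types, to decode the message. It does not have to be the identical file, but the numbers must match.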

If it receives:

(tag=1, "Sam"), (tag=2, 12)

The schema tells it:

  • Tag 1 → last_name
  • Tag 2 → age

Since employed is missing, it automatically becomes false because that is the default.
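The lookup from tag to field name can be sketched as a small decoder. This is a simplified illustration, not the real Protobuf parser: it assumes the three-field Person schema from section 1, handles only varint and length-delimited wire types, and uses hypothetical names throughout.

```python
def read_varint(buf: bytes, pos: int):
    """Read one varint starting at pos; return (value, next_position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # no continuation bit: this was the last byte
            return result, pos
        shift += 7


# Field number -> field name, as the schema would define it.
FIELDS = {1: "last_name", 2: "age", 3: "employed"}


def decode_person(buf: bytes) -> dict:
    # Start from the defaults; fields missing on the wire keep these values.
    person = {"last_name": "", "age": 0, "employed": False}
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:    # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        name = FIELDS.get(field_number)
        if name == "last_name":
            person[name] = value.decode("utf-8")
        elif name == "age":
            person[name] = value
        elif name == "employed":
            person[name] = bool(value)
        # Field numbers not in FIELDS are skipped.
    return person


print(decode_person(bytes.fromhex("0a0353616d100c")))
# {'last_name': 'Sam', 'age': 12, 'employed': False}
```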

12. Best Practices Summary

Always keep these rules in mind:

  • Field numbers are the real identifiers in Protobuf.
  • Field names are only for developers.
  • Use tags 1–15 for frequently used fields.
  • Never reuse a tag within the same message.
  • Avoid the reserved range 19,000–19,999.
  • Split large schemas into smaller messages.
  • Remember that default values are not serialized.