Learnitweb

Protobuf Schema Evolution and Backward Compatibility

In real-world systems, APIs do not remain static. Requirements change, fields are added, names are improved, and sometimes structures evolve. When using Protocol Buffers, these changes must be handled carefully so that different versions of services can still communicate safely.

This tutorial explains how Protobuf handles API changes, what happens when clients and servers use different schema versions, and what rules you must follow to maintain compatibility.

1. Real-World Scenario: Multiple Teams, One API

Imagine an organization with multiple teams:

  • Team A owns a service
  • Team B and Team C are clients of that service
  • The service communicates using Protobuf messages

Team A publishes a V1 schema. Everyone generates code and starts using it.

Later:

  • Team A upgrades to V2
  • Some clients upgrade, others do not

Later again:

  • Team A upgrades to V3
  • Clients are on mixed versions

The big question is:

Will communication break when versions differ?

To understand this, we simulate version evolution.

2. Version 1 (V1) Schema

television.proto (V1)

syntax = "proto3";

package section05.v1;

option java_package = "com.example.section05.v1";
option java_multiple_files = true;

message Television {
  string brand = 1;
  int32 year = 2;
}

    2.1 V1 Producer Example (Java)

    Television tv = Television.newBuilder()
            .setBrand("Samsung")
            .setYear(2019)
            .build();
    
    byte[] bytes = tv.toByteArray();
    

    2.2 V1 Parser (Client)

    Television tv = Television.parseFrom(bytes);
    
    System.out.println(tv.getBrand());
    System.out.println(tv.getYear());
    

    Everything works as expected.

    3. New Requirements → V2

    New needs arise:

    1. Add a new field: TV type (HD, UHD, OLED)
    2. Rename year to model for clarity

    We create V2.

    4. Version 2 (V2) Schema

    syntax = "proto3";
    
    package section05.v2;
    
    option java_package = "com.example.section05.v2";
    option java_multiple_files = true;
    
    message Television {
    
      enum TvType {
        HD = 0;
        UHD = 1;
        OLED = 2;
      }
    
      string brand = 1;
    
      // Renamed from year → model
      int32 model = 2;
    
      TvType type = 3;
    }
    

    Changes:

    • Field 2 renamed
    • Field 3 added

    5. Scenario: V2 Server → V1 Client

    Server sends V2 message:

    Television tv = Television.newBuilder()
            .setBrand("Samsung")
            .setModel(2019)
            .setType(TvType.UHD)
            .build();
    

    The message is serialized and sent to a V1 client.

    What Happens?

    V1 client sees:

    • Tag 1 → brand
    • Tag 2 → year

    It does NOT know about tag 3.

    Result:

    • brand = Samsung
    • year = 2019
    • type ignored

    No failure occurs.

    Why?

    Because Protobuf decodes by field number, not field name.
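The scenario above can be sketched without the Protobuf library by writing the wire format by hand. This is only an illustration under stated assumptions: the class `TagDecodingSketch` and its helpers are hypothetical names, only the varint and length-delimited wire types are handled, and real generated code does all of this for you.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hand-rolled sketch of tag-based decoding; not the generated Protobuf API. */
final class TagDecodingSketch {

    // Writes an unsigned base-128 varint.
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // Reads a varint starting at pos[0] and advances pos[0] past it.
    static long readVarint(byte[] data, int[] pos) {
        long result = 0;
        int shift = 0;
        while (true) {
            byte b = data[pos[0]++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
    }

    // Encodes what a V2 sender would emit: brand (1), model (2), type (3).
    static byte[] encodeV2(String brand, int model, int type) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] b = brand.getBytes(StandardCharsets.UTF_8);
        writeVarint(out, (1 << 3) | 2); // field 1, wire type 2 (length-delimited)
        writeVarint(out, b.length);
        out.write(b, 0, b.length);
        writeVarint(out, (2 << 3) | 0); // field 2, wire type 0 (varint)
        writeVarint(out, model);
        writeVarint(out, (3 << 3) | 0); // field 3, wire type 0 (varint)
        writeVarint(out, type);
        return out.toByteArray();
    }

    // Decodes as a V1 reader would: it knows tags 1 and 2 and skips everything else.
    static String decodeAsV1(byte[] data) {
        String brand = "";
        int year = 0;
        int[] pos = {0};
        while (pos[0] < data.length) {
            int key = (int) readVarint(data, pos);
            int field = key >>> 3;
            int wireType = key & 7;
            if (field == 1 && wireType == 2) {
                int len = (int) readVarint(data, pos);
                brand = new String(data, pos[0], len, StandardCharsets.UTF_8);
                pos[0] += len;
            } else if (field == 2 && wireType == 0) {
                year = (int) readVarint(data, pos);
            } else if (wireType == 0) {
                readVarint(data, pos); // unknown varint field: skip its value
            } else if (wireType == 2) {
                int len = (int) readVarint(data, pos);
                pos[0] += len;         // unknown length-delimited field: skip its payload
            } else {
                throw new IllegalArgumentException("wire type not handled in this sketch: " + wireType);
            }
        }
        return "brand=" + brand + ", year=" + year;
    }

    public static void main(String[] args) {
        byte[] bytes = encodeV2("Samsung", 2019, 1); // 1 = UHD
        System.out.println(decodeAsV1(bytes));       // brand=Samsung, year=2019
    }
}
```

The V1 reader never sees the name "model"; it only sees field number 2, which it already maps to year.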

    6. Key Rule: Renaming Fields Is Safe

    Changing:

    int32 year = 2;
    

    to:

    int32 model = 2;
    

    is safe because:

    • Tag number unchanged
    • Type unchanged

    Protobuf still maps tag 2 correctly.

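    On the wire, field names are only for developers. Renaming does change the generated accessors (getYear() becomes getModel()), so code that upgrades to the new schema must be updated, and it changes the JSON mapping, which uses names.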
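To make the point concrete, here is a minimal sketch of how a single int32 field is written. The class `RenameSketch` and its methods are hypothetical and hand-rolled for illustration; the thing to notice is that the encoder takes a field number and a value, and has no parameter for the name.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

/** Hand-rolled sketch: the field name is not an input to the encoder. */
final class RenameSketch {

    // Writes an unsigned base-128 varint.
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // Encodes one int32 field: key (field number plus wire type 0), then the value.
    static byte[] encodeInt32Field(int fieldNumber, int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarint(out, (fieldNumber << 3) | 0);
        writeVarint(out, value);
        return out.toByteArray();
    }

    // Formats bytes as lowercase hex for display.
    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] asYear = encodeInt32Field(2, 2019);  // V1: int32 year = 2
        byte[] asModel = encodeInt32Field(2, 2019); // V2: int32 model = 2
        System.out.println(hex(asYear));                    // 10e30f
        System.out.println(Arrays.equals(asYear, asModel)); // true
    }
}
```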

    7. Key Rule: Changing Types Is Dangerous

    Changing:

    int32 year = 2;
    

    to:

    string model = 2;
    

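    is unsafe, because the encoding depends on the type: int32 is written as a varint (wire type 0), while string is length-delimited (wire type 2).

    This can cause:

    • Parsing errors
    • Corrupted or silently dropped data (a reader that expects a varint at tag 2 typically treats the mismatched field as unknown)
    • Runtime failures

    Never change a field’s type once released. The narrow exceptions are the wire-compatible groups listed in the Protobuf documentation, such as int32, uint32, int64, uint64, and bool.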
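The reason is visible in the key that precedes every field on the wire, which combines the field number with a wire type. The sketch below computes that key for both declarations; `TypeChangeSketch` and its constants are hypothetical names used only for illustration.

```java
/** Sketch: the same field number yields a different key when the type changes. */
final class TypeChangeSketch {

    static final int WIRE_VARINT = 0;           // int32, int64, bool, enum
    static final int WIRE_LENGTH_DELIMITED = 2; // string, bytes, nested messages

    // Key written before each field: (field number << 3) | wire type.
    static int key(int fieldNumber, int wireType) {
        return (fieldNumber << 3) | wireType;
    }

    public static void main(String[] args) {
        int oldKey = key(2, WIRE_VARINT);           // int32 year = 2
        int newKey = key(2, WIRE_LENGTH_DELIMITED); // string model = 2
        System.out.printf("int32 key:  0x%02x%n", oldKey); // int32 key:  0x10
        System.out.printf("string key: 0x%02x%n", newKey); // string key: 0x12
    }
}
```

An old reader looks for key 0x10 at field 2; a new sender emits 0x12, so the two sides no longer agree on how to read the bytes that follow.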

    8. Unknown Fields Behavior

    When a client receives fields it does not know:

    • They are stored as unknown fields
    • They are ignored by getters
    • They do not break parsing

    V1 client receiving tag 3 (type):

    • Cannot access it
    • But it exists internally as unknown

    This allows forward compatibility.
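As a rough illustration of "exists internally as unknown", the sketch below walks V2 bytes the way a V1 reader might and keeps the fields it does not recognise instead of discarding them. Everything here is a hand-rolled assumption (`UnknownFieldsSketch`, `unknownVarints`, `sampleV2Bytes`); the real library keeps unknown fields in its own internal structure.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hand-rolled sketch of retaining unknown fields; not the Protobuf library. */
final class UnknownFieldsSketch {

    // Reads a varint starting at pos[0] and advances pos[0] past it.
    static long readVarint(byte[] data, int[] pos) {
        long result = 0;
        int shift = 0;
        while (true) {
            byte b = data[pos[0]++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
    }

    // Returns field number -> value for varint fields a V1 reader (tags 1 and 2) does not know.
    static Map<Integer, Long> unknownVarints(byte[] data) {
        Map<Integer, Long> unknown = new LinkedHashMap<>();
        int[] pos = {0};
        while (pos[0] < data.length) {
            int key = (int) readVarint(data, pos);
            int field = key >>> 3;
            int wireType = key & 7;
            boolean known = field == 1 || field == 2;
            if (wireType == 0) {
                long value = readVarint(data, pos);
                if (!known) {
                    unknown.put(field, value); // kept, but no getter exposes it
                }
            } else if (wireType == 2) {
                int len = (int) readVarint(data, pos);
                pos[0] += len; // sketch: length-delimited payloads are stepped over
            } else {
                throw new IllegalArgumentException("wire type not handled in this sketch: " + wireType);
            }
        }
        return unknown;
    }

    // V2 bytes for brand="Samsung", model=2019, type=UHD (1).
    static byte[] sampleV2Bytes() {
        return new byte[] {
            0x0a, 0x07, 'S', 'a', 'm', 's', 'u', 'n', 'g',
            0x10, (byte) 0xe3, 0x0f,
            0x18, 0x01
        };
    }

    public static void main(String[] args) {
        System.out.println(unknownVarints(sampleV2Bytes())); // {3=1}
    }
}
```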

    9. Scenario: V1 Server → V2 Client

    Server sends V1 message:

    • brand
    • year

    V2 client parses:

    • brand → OK
    • model (tag 2) → OK
    • type (tag 3) → missing

    Since type is missing:

    • Default enum value used
    • Default = 0 → HD

    Again, no failure occurs.
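The reverse direction can be sketched the same way: a V2-style reader starts every field at its default and only overwrites the ones that actually appear in the bytes. The names below (`MissingFieldSketch`, `decodeAsV2`, `sampleV1Bytes`) are hypothetical, and the Java enum is a stand-in for the generated TvType.

```java
import java.nio.charset.StandardCharsets;

/** Hand-rolled sketch of default values for absent fields; not the generated API. */
final class MissingFieldSketch {

    enum TvType { HD, UHD, OLED } // positions mirror the proto numbers 0, 1, 2

    // Reads a varint starting at pos[0] and advances pos[0] past it.
    static long readVarint(byte[] data, int[] pos) {
        long result = 0;
        int shift = 0;
        while (true) {
            byte b = data[pos[0]++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
    }

    // Decodes as a V2 reader: brand (1), model (2), type (3).
    static String decodeAsV2(byte[] data) {
        String brand = "";       // default for string
        int model = 0;           // default for int32
        TvType type = TvType.HD; // default for enum: the value numbered 0
        int[] pos = {0};
        while (pos[0] < data.length) {
            int key = (int) readVarint(data, pos);
            int field = key >>> 3;
            int wireType = key & 7;
            if (wireType == 2) {
                int len = (int) readVarint(data, pos);
                if (field == 1) {
                    brand = new String(data, pos[0], len, StandardCharsets.UTF_8);
                }
                pos[0] += len;
            } else if (wireType == 0) {
                long value = readVarint(data, pos);
                if (field == 2) {
                    model = (int) value;
                } else if (field == 3 && value >= 0 && value < TvType.values().length) {
                    type = TvType.values()[(int) value];
                }
            } else {
                throw new IllegalArgumentException("wire type not handled in this sketch: " + wireType);
            }
        }
        return "brand=" + brand + ", model=" + model + ", type=" + type;
    }

    // V1 bytes for brand="Samsung", year=2019; tag 3 is never written.
    static byte[] sampleV1Bytes() {
        return new byte[] {
            0x0a, 0x07, 'S', 'a', 'm', 's', 'u', 'n', 'g',
            0x10, (byte) 0xe3, 0x0f
        };
    }

    public static void main(String[] args) {
        System.out.println(decodeAsV2(sampleV1Bytes())); // brand=Samsung, model=2019, type=HD
    }
}
```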

    10. Why This Works

    Protobuf design principles:

    • Tag-based decoding
    • Default values for missing fields
    • Ignoring unknown fields

    This enables:

    • Backward compatibility
    • Forward compatibility

    11. Safe vs Unsafe Changes

    Safe Changes

    • Adding new fields
    • Renaming fields
    • Adding enum values (older readers see the new value as unrecognized, so handle that case)
    • Reordering fields in file

    Unsafe Changes

    • Changing field types
    • Reusing tag numbers
    • Changing tag numbers
    • Removing fields without reserving tags

    12. Practical Guidelines

    Always follow these rules in production:

    • Never change field numbers
    • Never change field types
    • Prefer adding new fields instead of modifying old ones
    • Use new tags for new fields
    • Keep old fields for compatibility
    • Reserve removed field numbers

    13. Version 3 (V3) — Removing a Field

    Suppose the requirement says:

    We no longer receive model/year from a third-party service, so we must remove it.

    So V3 becomes:

    syntax = "proto3";
    
    package section05.v3;
    
    option java_package = "com.example.section05.v3";
    option java_multiple_files = true;
    
    message Television {
    
      enum TvType {
        HD = 0;
        UHD = 1;
        OLED = 2;
      }
    
      string brand = 1;
      TvType type = 3;
    }
    

    Field 2 (model/year) is removed.

    14. What Happens to Older Clients?

    V1 Client Receiving V3 Data

    V1 expects:

    string brand = 1;
    int32 year = 2;
    

    But V3 does not send tag 2.

    So V1 sees:

    • brand → correct
    • year → default value 0

    Nothing breaks. Missing fields simply use defaults.

    15. Default Values in Proto3

    Proto3 automatically assigns defaults:

    • int32 → 0
    • bool → false
    • string → empty
    • enum → the value numbered 0, which proto3 requires to be the first value defined

    So when a field is absent:

    • It does NOT throw an error
    • It silently uses default

    This is a major reason Protobuf supports compatibility.

    16. Version 4 (V4) — A Common Mistake

    A new developer joins and adds price:

    int32 price = 2;  // ❌ BAD
    

    Why this is dangerous:

    • Tag 2 used to mean year/model
    • Old clients will decode price as year

    So:

    • price = 50000
    • V1 sees year = 50000

    This creates semantic corruption. No crash happens, but the data becomes wrong. This is worse than a failure, because nothing signals that anything went wrong.
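The failure mode is easy to reproduce in a hand-rolled sketch. Below, a sender following the mistaken V4 schema writes price at tag 2, and a reader following V1 interprets tag 2 as year. `TagReuseSketch` and its methods are hypothetical names, and only varint fields are handled.

```java
import java.io.ByteArrayOutputStream;

/** Hand-rolled sketch of what tag reuse does to an old reader. */
final class TagReuseSketch {

    // Writes an unsigned base-128 varint.
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // Reads a varint starting at pos[0] and advances pos[0] past it.
    static long readVarint(byte[] data, int[] pos) {
        long result = 0;
        int shift = 0;
        while (true) {
            byte b = data[pos[0]++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
    }

    // V4 sender (the mistaken schema): int32 price = 2
    static byte[] encodeV4Price(int price) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarint(out, (2 << 3) | 0);
        writeVarint(out, price);
        return out.toByteArray();
    }

    // V1 reader: int32 year = 2
    static int decodeV1Year(byte[] data) {
        int year = 0;
        int[] pos = {0};
        while (pos[0] < data.length) {
            int key = (int) readVarint(data, pos);
            if ((key & 7) != 0) {
                throw new IllegalArgumentException("this sketch handles varint fields only");
            }
            long value = readVarint(data, pos);
            if ((key >>> 3) == 2) {
                year = (int) value; // the reader has no way to know this was a price
            }
        }
        return year;
    }

    public static void main(String[] args) {
        int year = decodeV1Year(encodeV4Price(50000));
        System.out.println("V1 reads year=" + year); // V1 reads year=50000
    }
}
```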

    17. Correct Approach — Reserving Removed Fields

    Whenever a field is removed, reserve its tag. In the schema below, tag 2 is permanently blocked: the compiler rejects any field that tries to use it, so future developers cannot reuse it accidentally.

    message Television {
    
      reserved 2;
    
      enum TvType {
        HD = 0;
        UHD = 1;
        OLED = 2;
      }
    
      string brand = 1;
      TvType type = 3;
    }
    

    18. Reserving Field Names Too

    You can also reserve names:

    reserved "year", "model";
    

    This prevents reuse of old names. Helpful for large teams and long-lived APIs. Names and numbers cannot be mixed in a single reserved statement, so they go on separate lines.

    19. Adding Price Correctly

    Instead of reusing 2:

    int32 price = 4;  // ✅ Correct
    

    Now:

    • Old clients ignore it
    • No confusion
    • No data corruption

    20. Handling Default Value Ambiguity

    Sometimes 0 is not acceptable as a default.

    Example:

    • price = 0 → Is it free or unset?

    Two solutions:

    Option 1 — Wrapper Types

    import "google/protobuf/wrappers.proto";  // at the top of the .proto file

    google.protobuf.Int32Value price = 4;
    

    Allows:

    • hasPrice()

    Option 2 — optional

    optional int32 price = 4;
    

    Allows:

    • hasPrice()
    • No wrapper overhead

    The optional keyword is supported in proto3 from protoc 3.15 onward.
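What makes hasPrice() possible is that a field with explicit presence is written to the wire whenever it was set, even when the value is 0, whereas a plain proto3 scalar equal to its default is not written at all. The sketch below models that with `OptionalInt`; `PresenceSketch` and its methods are hypothetical and hand-rolled, not the generated API.

```java
import java.io.ByteArrayOutputStream;
import java.util.OptionalInt;

/** Hand-rolled sketch of explicit field presence for "optional int32 price = 4". */
final class PresenceSketch {

    static final int PRICE_KEY = (4 << 3) | 0; // field 4, wire type 0 (varint)

    // Writes an unsigned base-128 varint.
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // Reads a varint starting at pos[0] and advances pos[0] past it.
    static long readVarint(byte[] data, int[] pos) {
        long result = 0;
        int shift = 0;
        while (true) {
            byte b = data[pos[0]++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
    }

    // Explicit presence: the field is written whenever it was set, even to 0.
    static byte[] encode(OptionalInt price) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (price.isPresent()) {
            writeVarint(out, PRICE_KEY);
            writeVarint(out, price.getAsInt());
        }
        return out.toByteArray();
    }

    // The reader reports "set" only if field 4 actually appeared in the bytes.
    static OptionalInt decode(byte[] data) {
        OptionalInt price = OptionalInt.empty();
        int[] pos = {0};
        while (pos[0] < data.length) {
            int key = (int) readVarint(data, pos);
            if ((key & 7) != 0) {
                throw new IllegalArgumentException("this sketch handles varint fields only");
            }
            long value = readVarint(data, pos);
            if (key == PRICE_KEY) {
                price = OptionalInt.of((int) value);
            }
        }
        return price;
    }

    public static void main(String[] args) {
        System.out.println(decode(encode(OptionalInt.empty())).isPresent()); // false (unset)
        System.out.println(decode(encode(OptionalInt.of(0))).isPresent());   // true (free)
    }
}
```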

    21. Golden Rules of Schema Evolution

    Always follow these rules in production systems:

    • Never change field numbers
    • Never reuse old tags
    • Always reserve removed fields
    • Renaming fields is safe
    • Changing types is unsafe
    • Adding new fields is safe
    • Missing fields use defaults
    • Unknown fields are ignored