Learnitweb

Protobuf Serialization and Deserialization in Java

1. Introduction

When we use Protocol Buffers in real applications, defining messages is only the first step. The real value of Protobuf appears when we serialize those messages into a compact binary form for transmission or storage and later deserialize them back into objects. In this tutorial, we will focus on the essential ideas behind serialization and deserialization in Protobuf, how the generated Java code makes the process simple, and what this tells us about Protobuf’s design philosophy.

Rather than mechanically repeating steps, we will extract the important concepts and build a clear mental model that helps you use Protobuf confidently in real systems.

2. What Serialization and Deserialization Really Mean

Serialization is the process of converting an in-memory object into a format that can be stored or transmitted, while deserialization is the reverse process of reconstructing the object from that format. In distributed systems, this conversion is unavoidable because data must cross process and network boundaries, and Protobuf was specifically designed to make this conversion fast, compact, and reliable.

Unlike JSON or XML, which are text-based and human-readable, Protobuf uses a binary format optimized for machines. This usually means smaller payloads and faster parsing, at the cost of readability.

3. Why Protobuf Serialization Feels “Built-In”

One of the most elegant aspects of Protobuf in Java is that serialization and deserialization are not something you manually implement, because the Protobuf compiler generates classes that already know how to encode and decode themselves.

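Every generated message class extends a base class from the Protobuf runtime (GeneratedMessageV3 in the 3.x Java runtime; the exact class name depends on the runtime version). Between this base class and the generated code, each message gets rich functionality such as:

  • Binary serialization (writeTo, toByteArray)
  • Binary deserialization (the static parseFrom methods on the generated class)
  • Size calculation (getSerializedSize)
  • Equality checks (equals, hashCode)
  • Many utility methods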

Because of this inheritance, your message objects come with serialization capabilities “for free,” which explains why you see many methods on generated classes beyond the getters for your fields.

4. Serializing a Protobuf Message to a File

Suppose we have already created a Person object using a builder. Writing it to a file is straightforward because the message itself knows how to write its binary form to an output stream.

Example:

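// Imports needed at the top of the file
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

private static final Path PATH = Path.of("person.out");

public static void serialize(Person person) throws IOException {
    // try-with-resources closes the stream even if writeTo throws
    try (OutputStream os = Files.newOutputStream(PATH)) {
        person.writeTo(os); // writes the message's binary encoding
    }
}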

The important idea here is that:

  • You do not manually convert fields to bytes
  • You do not manage encoding logic
  • You simply provide an output stream

The message handles the rest using Protobuf’s binary format.

5. Deserializing a Protobuf Message from a File

Deserialization is equally simple because the generated class provides parseFrom methods that know how to reconstruct the message from binary data.

Example:

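// Needs java.io.InputStream in addition to the imports used by serialize()
public static Person deserialize() throws IOException {
    try (InputStream is = Files.newInputStream(PATH)) {
        // parseFrom reads the stream to its end and returns an immutable Person
        return Person.parseFrom(is);
    }
}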

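Here, Protobuf reads the binary data, matches each field number it finds against the fields compiled into the Person class, and builds a new object with the same field values as the one that was serialized.

The key takeaway is that deserialization is schema-aware: the binary data carries field numbers and wire types but no field names, so the reader depends on code generated from the .proto definition rather than on guesswork or reflection.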
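To make the role of the schema concrete, the following sketch walks a Protobuf-encoded byte array by hand, using only the JDK. It is not how the Protobuf runtime is implemented, and it handles only two wire types. The payload is an assumption made for illustration: it pretends that Person has a string field numbered 1 and an int32 field numbered 2, which the tutorial's actual .proto file may not match.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch only: walks the tag/value pairs of a Protobuf-encoded byte array
// without any schema. Handles just the two wire types used in the example.
final class WireWalkSketch {

    // Reads a base-128 varint starting at pos[0] and advances pos[0] past it.
    static long readVarint(byte[] data, int[] pos) {
        long result = 0;
        int shift = 0;
        while (true) {
            byte b = data[pos[0]++];
            result |= (long) (b & 0x7F) << shift; // low 7 bits carry data
            if ((b & 0x80) == 0) {                // high bit clear: last byte
                return result;
            }
            shift += 7;
        }
    }

    // Returns one description per field found in the payload.
    static List<String> walk(byte[] data) {
        List<String> fields = new ArrayList<>();
        int[] pos = {0};
        while (pos[0] < data.length) {
            long tag = readVarint(data, pos);
            int fieldNumber = (int) (tag >>> 3); // upper bits: field number
            int wireType = (int) (tag & 0x7);    // lower 3 bits: wire type
            if (wireType == 0) {                 // varint
                fields.add("field " + fieldNumber + " varint " + readVarint(data, pos));
            } else if (wireType == 2) {          // length-delimited
                int length = (int) readVarint(data, pos);
                String text = new String(data, pos[0], length, StandardCharsets.UTF_8);
                pos[0] += length;
                fields.add("field " + fieldNumber + " bytes \"" + text + "\"");
            } else {
                throw new IllegalArgumentException(
                        "wire type not handled in this sketch: " + wireType);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical payload: field 1 = "Sam" (string), field 2 = 25 (int32).
        byte[] payload = {0x0A, 0x03, 0x53, 0x61, 0x6D, 0x10, 0x19};
        walk(payload).forEach(System.out::println);
        // field 1 bytes "Sam"
        // field 2 varint 25
    }
}
```

Notice that the walker can recover field numbers and raw values, but it cannot know that field 1 is called name or that field 2 is an age. That knowledge lives in the generated Person class, which is why parseFrom needs it.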

6. Verifying Correctness with Equality

A good practice when learning serialization is to verify that the original and deserialized objects are equal.

Example:

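// createPerson() stands for whatever builder code produced your Person
Person p1 = createPerson();
serialize(p1);

// Read the file back into a new, separate object
Person p2 = deserialize();

// equals compares field values, not object identity
System.out.println(p1.equals(p2));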

If everything works correctly, this prints true, because Protobuf equality is value-based, and both objects contain the same field values.

This simple test builds confidence that:

  • No data was lost
  • Encoding and decoding were consistent
  • The schema matches on both ends

7. Understanding the Output File

If you open the generated file (e.g., person.out), you will notice that it is not human-readable, and this is completely intentional. Protobuf’s binary format is designed for efficiency, not readability, and it stores data using compact encodings and numeric field identifiers rather than textual names.

This design leads to:

  • Smaller payload sizes
  • Faster parsing
  • Lower bandwidth usage

In contrast, JSON prioritizes readability, and the same data usually takes more bytes and more time to parse because field names and punctuation are repeated as text in every message.
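The size difference can be illustrated with a small hand-written encoder that uses only the JDK. This is a sketch of the wire format, not the Protobuf runtime itself. It assumes a hypothetical Person with a string name in field 1 and an int32 age in field 2, and the JSON string is just one plausible text rendering of the same data.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Sketch only: hand-encodes two fields the way the Protobuf wire format does,
// then compares the result with a JSON rendering of the same data.
final class WireSizeSketch {

    // Writes an unsigned base-128 varint: 7 data bits per byte,
    // high bit set on every byte except the last.
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // A tag packs the field number and the wire type into one varint.
    static void writeTag(ByteArrayOutputStream out, int fieldNumber, int wireType) {
        writeVarint(out, ((long) fieldNumber << 3) | wireType);
    }

    // Assumed layout: field 1 = name (string), field 2 = age (int32).
    // Unlike real proto3 code, this sketch always writes both fields,
    // even when they hold default values.
    static byte[] encodePerson(String name, int age) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        writeTag(out, 1, 2);                       // wire type 2: length-delimited
        writeVarint(out, nameBytes.length);
        out.write(nameBytes, 0, nameBytes.length);
        writeTag(out, 2, 0);                       // wire type 0: varint
        writeVarint(out, age);
        return out.toByteArray();
    }

    // Formats bytes as space-separated uppercase hex pairs.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        byte[] binary = encodePerson("Sam", 25);
        byte[] json = "{\"name\":\"Sam\",\"age\":25}".getBytes(StandardCharsets.UTF_8);
        System.out.println(toHex(binary));                  // 0A 03 53 61 6D 10 19
        System.out.println(binary.length + " bytes binary"); // 7 bytes binary
        System.out.println(json.length + " bytes JSON");     // 23 bytes JSON
    }
}
```

Most of the saving in this toy case comes from replacing the field names with one-byte tags, which is the "numeric field identifiers" idea described above. The ratio for real messages depends on the data and should be measured rather than assumed.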

8. Getting the Raw Byte Array Directly

You are not required to use files at all, because Protobuf also allows you to get the serialized bytes directly in memory.

Example:

byte[] bytes = person.toByteArray(); // serialized form, held in memory
System.out.println(bytes.length);    // payload size in bytes

This is extremely useful when:

  • Sending messages over the network
  • Publishing to Kafka or message queues
  • Storing blobs in databases
  • Measuring payload sizes

The byte array contains the same binary form that writeTo would write to a stream, and Person.parseFrom(bytes) turns it back into a message on the receiving side.