
Protocol Buffers: Define Services Step-by-Step

Protocol Buffers (protobuf) are Google's language-neutral, platform-neutral mechanism for serializing structured data. Unlike JSON or XML, protobuf messages are encoded in a compact binary format: field names are replaced by small integer tags, so payloads are usually noticeably smaller than the equivalent JSON, and every field's type is fixed by the schema rather than inferred at parse time. Proto3, the version most new projects use, is supported in many languages and is designed for schema evolution: you can add fields without breaking old clients. For gRPC, Protocol Buffers are the default interface definition language and wire format. Your .proto files define the service contract: messages, RPC method signatures, and the field-numbering rules that keep versions compatible and prevent production incidents.

This guide teaches you to write, version, and evolve .proto files that serve as the source-of-truth for your microservices.

Proto3 Syntax Fundamentals

A .proto file is plain text with three main components: a package declaration, message definitions, and service definitions.

syntax = "proto3";

// Package name prevents naming conflicts across projects
package ecommerce.orders;

// Scalar message fields: type name = field_number
message Order {
  string order_id = 1;    // string field, field number 1
  int32 quantity = 2;     // 32-bit integer
  double price = 3;       // floating-point
  bool is_expedited = 4;  // boolean
  bytes metadata = 5;     // raw bytes for binary data
}

// Service definition: group of RPC methods
service OrderService {
  rpc CreateOrder (Order) returns (OrderResponse) {}
  rpc GetOrder (OrderID) returns (Order) {}
}

// Request/response messages
message OrderID {
  string id = 1;
}

message OrderResponse {
  string status = 1;
  int32 order_number = 2;
}

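Field Numbers Matter: Each field in a message has a unique number. Numbers 1–15 fit in a single tag byte, so reserve them for frequently set fields; numbers 16–2047 take two bytes. The range 19000–19999 is reserved for the protobuf implementation and cannot be used. Field numbers, not names, identify fields on the wire, so if you rename a field but keep its number, old and new code still interoperate. This is the basis of backward compatibility.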
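To make the role of field numbers concrete, here is a minimal hand-rolled sketch of how a tag and a string field are laid out on the wire. The helper names (encode_varint, encode_tag, encode_string_field) are illustrative only; real code would use the classes generated by protoc.

```python
WIRE_VARINT = 0  # int32, int64, bool, enum
WIRE_LEN = 2     # string, bytes, nested messages


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        value >>= 7
    out.append(value)
    return bytes(out)


def encode_tag(field_number: int, wire_type: int) -> bytes:
    """A tag is (field_number << 3) | wire_type, written as a varint."""
    return encode_varint((field_number << 3) | wire_type)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Tag, then payload length, then the UTF-8 payload."""
    payload = text.encode("utf-8")
    return encode_tag(field_number, WIRE_LEN) + encode_varint(len(payload)) + payload


# The field name never appears in the output; only the number does.
order_id_bytes = encode_string_field(1, "ORD-123")
print(order_id_bytes.hex())  # 0a074f52442d313233

# Field 15 still fits in a one-byte tag; field 16 needs two bytes.
print(len(encode_tag(15, WIRE_VARINT)), len(encode_tag(16, WIRE_VARINT)))  # 1 2
```

Because the bytes carry only the number 1, renaming order_id in the .proto file would produce exactly the same output.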

Scalar Types and Defaults

Proto3's most commonly used scalar types:

| Type | Python type | Default | Use case |
|---|---|---|---|
| bool | bool | False | Flags, toggles |
| int32 | int | 0 | Small counts, IDs (within about 2 billion) |
| int64 | int | 0 | Timestamps (Unix nanoseconds), large IDs |
| float / double | float | 0.0 | Measurements (avoid for money; use int64 cents) |
| string | str | "" | Text, names, URLs |
| bytes | bytes | b"" | Binary blobs, encoded images |
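The advice to store money as int64 cents can be illustrated with a short sketch. The helpers to_cents and format_cents are hypothetical names for this example, not part of any protobuf API.

```python
from decimal import Decimal

# Binary floating point cannot represent most decimal fractions exactly.
print(0.1 + 0.2 == 0.3)  # False


def to_cents(amount: str) -> int:
    """Convert a decimal string such as "19.99" to integer cents."""
    return int((Decimal(amount) * 100).to_integral_value())


def format_cents(cents: int) -> str:
    """Render integer cents back as a decimal string."""
    sign = "-" if cents < 0 else ""
    whole, frac = divmod(abs(cents), 100)
    return f"{sign}{whole}.{frac:02d}"


# Integer arithmetic on cents stays exact.
line_total = to_cents("19.99") * 3
print(line_total, format_cents(line_total))  # 5997 59.97
```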

Repeated Fields and Collections

Use repeated for lists and map<K, V> for key-value pairs:

message OrderBatch {
  string batch_id = 1;
  repeated Order orders = 2;  // List of orders
}

// Maps (key-value pairs) use the map<key_type, value_type> syntax
message OrderMetrics {
  map<string, int32> order_count_by_region = 1;  // region -> count
}

In Python, repeated fields behave like lists. Empty repeated fields are omitted from the binary output entirely, saving space.

# Generated Python code
order1 = Order(order_id="ORD-1")
order2 = Order(order_id="ORD-2")
batch = OrderBatch(batch_id="b123", orders=[order1, order2])
print(len(batch.orders))  # 2
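The claim that an empty repeated field costs nothing on the wire can be checked with a small hand-written encoder. This is a sketch of the wire layout for repeated length-delimited elements (strings, bytes, nested messages); encode_repeated and its helpers are illustrative names, not generated code.

```python
from typing import Iterable


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def encode_len_field(field_number: int, payload: bytes) -> bytes:
    """One length-delimited element: tag, length, payload."""
    tag = (field_number << 3) | 2  # wire type 2 = length-delimited
    return encode_varint(tag) + encode_varint(len(payload)) + payload


def encode_repeated(field_number: int, payloads: Iterable[bytes]) -> bytes:
    """Each element is written as its own field with the same number."""
    return b"".join(encode_len_field(field_number, p) for p in payloads)


print(encode_repeated(2, []))              # b''
print(encode_repeated(2, [b"a", b"bc"]))   # b'\x12\x01a\x12\x02bc'
```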

Enums for Status and Categories

Enums name a fixed set of values and take less space on the wire than strings:

enum OrderStatus {
  STATUS_UNSPECIFIED = 0;  // Proto3 requires a 0 value for default
  STATUS_PENDING = 1;
  STATUS_CONFIRMED = 2;
  STATUS_SHIPPED = 3;
  STATUS_DELIVERED = 4;
  STATUS_CANCELLED = 5;
}

message Order {
  string order_id = 1;
  OrderStatus status = 2;
  int64 created_at_unix_ns = 3;  // Nanoseconds since epoch
}

Enum fields serialize as varint integers (a single byte for values below 128), much smaller than the equivalent strings. In Python, generated enum values are plain integer constants; wrap them in an IntEnum if you want named members.
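If you want named members on the Python side, one option is a hand-written IntEnum that mirrors the proto enum. The class and the to_order_status helper below are assumptions for illustration and must be kept in sync with the .proto file manually.

```python
from enum import IntEnum


class OrderStatus(IntEnum):
    """Hand-written mirror of the OrderStatus enum in the .proto file."""
    STATUS_UNSPECIFIED = 0
    STATUS_PENDING = 1
    STATUS_CONFIRMED = 2
    STATUS_SHIPPED = 3
    STATUS_DELIVERED = 4
    STATUS_CANCELLED = 5


def to_order_status(raw: int) -> OrderStatus:
    """Map a raw wire value to a member, falling back for unknown numbers."""
    try:
        return OrderStatus(raw)
    except ValueError:
        # A newer peer may send a value this build does not know about.
        return OrderStatus.STATUS_UNSPECIFIED


print(to_order_status(3).name)   # STATUS_SHIPPED
print(to_order_status(99).name)  # STATUS_UNSPECIFIED
```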

Nested Messages for Logical Grouping

Nest messages to group related types and improve readability:

message Order {
  string order_id = 1;

  // Nested message: ShippingAddress
  message ShippingAddress {
    string street = 1;
    string city = 2;
    string zip_code = 3;
  }

  ShippingAddress address = 2;  // Use nested type directly
  repeated Item items = 3;

  message Item {
    string sku = 1;
    int32 quantity = 2;
    double price = 3;
  }
}

In Python:

order = Order(
    order_id="ORD-123",
    address=Order.ShippingAddress(street="123 Main St", city="Portland"),
    items=[Order.Item(sku="SKU-001", quantity=2, price=19.99)],
)

Versioning and Backward Compatibility

Proto3 does not enforce backward compatibility for you; it gives you rules that keep old and new code interoperable. When you evolve a proto:

  1. Never reuse field numbers. If you remove a field, mark its number reserved:
message Order {
  string order_id = 1;
  OrderStatus status = 2;
  int64 created_at_unix_ns = 3;
  // Removed payment_method = 4 (never reuse 4)
  reserved 4;
  string notes = 5;
}
  2. Give every new field a new, unused number. Its position in the file does not matter. Old clients ignore unknown fields automatically.
// Version 1
message Order {
  string order_id = 1;
  OrderStatus status = 2;
}

// Version 2 (safe: old clients won't break)
message Order {
  string order_id = 1;
  OrderStatus status = 2;
  int64 created_at_unix_ns = 3;  // New field, old clients skip it
}
  3. Never change a field to an incompatible type. Changing int32 quantity to string quantity changes the wire type, so old and new code can no longer read each other's data. (A few changes, such as int32 to int64, are wire-compatible, but treat them as exceptions.)

  4. Enums: safe to extend with new values. Proto3 clients that do not recognize a value keep its raw number instead of failing, so always handle unknown values explicitly.
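The way old readers skip unknown fields can be sketched with a tiny wire-format parser. This is an illustration of the mechanism under simplifying assumptions (no groups, no nested decoding), not how the protobuf runtime is actually implemented; decode_known_fields is a hypothetical helper.

```python
from typing import Dict, Set, Tuple


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Return (value, next_position) for the varint starting at pos."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_known_fields(buf: bytes, known: Set[int]) -> Dict[int, object]:
    """Decode fields whose numbers are in `known`; skip all others."""
    fields: Dict[int, object] = {}
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:      # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:    # length-delimited
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        elif wire_type == 1:    # fixed 64-bit
            value = buf[pos:pos + 8]
            pos += 8
        elif wire_type == 5:    # fixed 32-bit
            value = buf[pos:pos + 4]
            pos += 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        # The wire type alone tells the reader how many bytes to skip.
        if field_number in known:
            fields[field_number] = value
    return fields


# Bytes written by a "version 2" sender:
# order_id = "ORD-123" (field 1), status = 2 (field 2), created_at_unix_ns = 42 (field 3)
v2_bytes = b"\x0a\x07ORD-123" + b"\x10\x02" + b"\x18\x2a"

print(decode_known_fields(v2_bytes, {1, 2}))     # version 1 reader
print(decode_known_fields(v2_bytes, {1, 2, 3}))  # version 2 reader
```

The version 1 reader sees fields 1 and 2 and steps over field 3 without error, which is why adding a field under a fresh number is safe.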

Service Definitions and RPC Methods

Define your microservice's API as a service block containing one or more RPC methods, each naming its request and response messages:

service OrderService {
  // Unary RPC: one request, one response
  rpc CreateOrder (Order) returns (OrderResponse) {}

  // Server streaming: one request, stream of responses
  rpc GetOrderHistory (UserID) returns (stream OrderEvent) {}

  // Client streaming: stream of requests, one response
  rpc ProcessOrders (stream Order) returns (BatchProcessResult) {}

  // Bidirectional streaming: stream in, stream out
  rpc SubscribeToOrderUpdates (stream OrderID)
      returns (stream OrderUpdate) {}
}

Each RPC maps to a Python method on the server servicer class and a stub method on the client. The stream keyword marks which side sends a sequence of messages: before the request type it means client streaming, before the response type it means server streaming, and on both it means bidirectional streaming.
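The four RPC shapes translate into four method shapes on the Python side. The sketch below uses a plain class and dicts as stand-ins so it runs without generated code; a real servicer would subclass the generated OrderServiceServicer and work with message objects. The method signatures (request or request_iterator, then context) follow the grpc Python convention; the response contents are made up for illustration.

```python
from typing import Iterable, Iterator


class OrderServiceSketch:
    """Plain-Python stand-in for a generated servicer subclass."""

    def CreateOrder(self, request: dict, context=None) -> dict:
        # Unary: one request in, one response returned.
        return {"status": "created", "order_id": request["order_id"]}

    def GetOrderHistory(self, request: dict, context=None) -> Iterator[dict]:
        # Server streaming: the method is a generator that yields responses.
        for event in ("created", "confirmed", "shipped"):
            yield {"user_id": request["user_id"], "event": event}

    def ProcessOrders(self, request_iterator: Iterable[dict], context=None) -> dict:
        # Client streaming: consume an iterator of requests, return one response.
        return {"processed": sum(1 for _ in request_iterator)}

    def SubscribeToOrderUpdates(
        self, request_iterator: Iterable[dict], context=None
    ) -> Iterator[dict]:
        # Bidirectional: consume an iterator and yield responses as you go.
        for order_id in request_iterator:
            yield {"order_id": order_id["id"], "status": "pending"}


service = OrderServiceSketch()
print(service.CreateOrder({"order_id": "ORD-1"}))
print(list(service.GetOrderHistory({"user_id": "u1"})))
```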

Organizing Large Projects: Package and Import

For large systems, split protos across multiple files and use import:

// common/types.proto
syntax = "proto3";
package ecommerce.common;

message Money {
  int64 amount_cents = 1;
  string currency = 2;  // "USD", "EUR"
}

enum Country {
  COUNTRY_UNSPECIFIED = 0;
  COUNTRY_US = 1;
  COUNTRY_CA = 2;
}

// orders/service.proto
syntax = "proto3";
package ecommerce.orders;

import "common/types.proto";

message Order {
  string order_id = 1;
  ecommerce.common.Money total = 2;
  ecommerce.common.Country delivery_country = 3;
}

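In Python, generated modules follow the .proto file paths, not the proto package: common/types.proto becomes common/types_pb2.py, so Money is available as types_pb2.Money after from common import types_pb2.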

Code-Generation Best Practices

Store .proto files in a dedicated directory, version control them, and regenerate stubs whenever you update a proto:

# Directory structure
project/
├── protos/
│   ├── common/
│   │   └── types.proto
│   └── orders/
│       └── service.proto
├── generated/
│   └── # (generated Python stubs go here)
└── services/
    └── order_service.py

# Batch-compile all protos
find protos -name "*.proto" -print0 | \
  xargs -0 python -m grpc_tools.protoc \
    -I protos \
    --python_out=generated \
    --grpc_python_out=generated

Always commit .proto files. Avoid committing generated *_pb2.py or *_pb2_grpc.py files: for a pinned grpcio-tools version the output is reproducible, so it can be regenerated at build time.
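The same batch compile can be driven from a Python build script. The sketch below only assembles the argument list, so it runs without grpcio-tools installed; build_protoc_args is a hypothetical helper, and the flags are the ones used in the shell command above.

```python
from pathlib import Path
from typing import List


def build_protoc_args(proto_root: str, out_dir: str) -> List[str]:
    """Collect every .proto under proto_root and build a protoc argument list."""
    proto_files = sorted(str(p) for p in Path(proto_root).rglob("*.proto"))
    if not proto_files:
        raise FileNotFoundError(f"no .proto files under {proto_root}")
    return [
        "grpc_tools.protoc",             # argv[0] placeholder
        f"-I{proto_root}",               # import root for resolving imports
        f"--python_out={out_dir}",       # message classes (*_pb2.py)
        f"--grpc_python_out={out_dir}",  # service stubs (*_pb2_grpc.py)
        *proto_files,
    ]
```

With grpcio-tools installed, the resulting list can be passed to grpc_tools.protoc.main, which returns a nonzero exit code when compilation fails. The output directory must already exist.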

Key Takeaways

  • Proto3 is the lingua franca of gRPC: every service is defined in .proto files that compile to type-safe stubs in any language.
  • Backward compatibility is a discipline, not something the compiler enforces: never reuse field numbers, never change a field to an incompatible type, and give every new field a fresh number.
  • Enums and nested messages keep your contract readable and hierarchical.
  • Field numbers, not names, identify serialized data; renaming a field without changing its number preserves interoperability with old clients.
  • Imports and packages keep large projects organized; regenerate stubs at build time via version-controlled proto files.

Frequently Asked Questions

Can I use JSON instead of Protocol Buffers in gRPC?

Technically yes, but you lose the performance and type-safety guarantees that make gRPC valuable. Some frameworks support gRPC-JSON transcoding (HTTP/1.1 REST exposed by gRPC backends), but the service-to-service protocol should always be protobuf. Stay with proto3.

What's the difference between proto2 and proto3?

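Proto2 (open-sourced in 2008) has required fields, custom default values, and explicit presence tracking for every field. Proto3 (2016) dropped required fields and custom defaults and stopped tracking presence for scalar fields, which simplifies schema evolution; the optional keyword was later reintroduced for fields that need explicit presence. Proto3 is the usual choice for new gRPC code. Proto2 is still supported, but there is little reason to choose it unless you are maintaining existing proto2 schemas. Use syntax = "proto3" in new files.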

How do I handle null/optional fields in proto3?

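Proto3 has no null concept: an unset scalar field reads back as its default (0, empty string, empty list), and by default you cannot distinguish "unset" from "set to the default". For true optionality, mark the field optional (supported without extra flags since protoc 3.15), which adds presence tracking (HasField in Python), or use a wrapper message such as google.protobuf.StringValue. google.protobuf.FieldMask addresses a related problem: telling the server which fields a partial update should touch. For most cases, defaults are sufficient. The newer edition = "2023" syntax makes explicit presence the default.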

How large can a protobuf message be?

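Protobuf itself caps a serialized message at 2 GB. gRPC is stricter: the default maximum size for a received message is 4 MB in most implementations, including Java and Python, and it can be raised through channel and server options (grpc.max_receive_message_length in Python). For larger data, split the payload into chunks and send it with a streaming RPC. Avoid single messages larger than 100 MB even when the limits allow them; stream instead.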
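Chunking a large payload for a streaming RPC can be as simple as the generator below. This is a minimal sketch: iter_chunks and the 64 KiB default are illustrative choices, and a real handler would wrap each chunk in a message type defined in your .proto file.

```python
from typing import Iterator


def iter_chunks(payload: bytes, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield consecutive slices of payload, each at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(payload), chunk_size):
        yield payload[start:start + chunk_size]


chunks = list(iter_chunks(b"0123456789", chunk_size=4))
print([len(c) for c in chunks])  # [4, 4, 2]
```

The receiver reassembles the payload by concatenating chunks in arrival order, which gRPC preserves within a single stream.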

Can I validate proto messages (e.g., email format, numeric ranges)?

Protobuf itself has no validation syntax. Validation happens in your server code or via wrapper libraries (e.g., protoc-gen-validate). For simple cases, add comments documenting expectations; for complex rules, implement server-side validation logic.
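Server-side validation can start as a plain function that runs before any business logic. The sketch below is an assumption-laden example: validate_order, the 1 to 1000 quantity range, and the contact_email field are hypothetical and do not appear in the Order message above.

```python
import re
from typing import List

# Deliberately loose check: something@something.something, no whitespace.
_EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_order(order_id: str, quantity: int, contact_email: str) -> List[str]:
    """Return a list of human-readable problems; empty means valid."""
    errors: List[str] = []
    if not order_id:
        errors.append("order_id must not be empty")
    if not 1 <= quantity <= 1000:
        errors.append("quantity must be between 1 and 1000")
    if not _EMAIL_RE.match(contact_email):
        errors.append("contact_email is not a plausible address")
    return errors


print(validate_order("ORD-1", 2, "buyer@example.com"))  # []
print(len(validate_order("", 0, "not-an-email")))       # 3
```

In a gRPC handler, a non-empty result would typically be reported to the client with the INVALID_ARGUMENT status code.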

Further Reading