
Types of Sampling in OpenTelemetry

1. Why Sampling Exists in the First Place

When engineers first learn distributed tracing, the natural instinct is to say, “Let us trace everything, because more data is always better.” In a small development environment this approach works perfectly fine, and in fact it is strongly recommended because it helps you understand how your system behaves and how traces are structured. However, the moment you move to a real production system, especially in an organization like a bank or a large enterprise where traffic can reach thousands or even tens of thousands of requests per second, this idea immediately runs into very practical and unavoidable limits.

Every single request in a modern microservice system does not create just one span. Instead, it often creates tens of spans: one for the incoming HTTP request, several for database calls, a few for outgoing REST calls, some for Kafka producers or consumers, and often a few custom spans for business logic. If you multiply this by thousands of requests per second, you very quickly reach a scale of millions of spans per minute. Storing, indexing, transferring, and visualizing this volume of data is not only expensive, but also operationally painful and often unnecessary.
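
To put rough numbers on it: at 2,000 requests per second with an average of 30 spans per request, you are already producing 60,000 spans per second, which works out to 3.6 million spans per minute.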

This is why sampling exists. Sampling is not a feature added for convenience; it is a fundamental design tool that allows you to control the volume of telemetry data while still preserving enough information to understand system behavior, performance bottlenecks, and failure patterns.

In simple terms, sampling answers one very important question:

Out of all the requests flowing through my system, which ones should I record and store as traces, and which ones should I ignore?

2. The Most Important Concept to Understand About Sampling

One of the most critical mental models you must internalize is that sampling is usually a decision that applies to an entire trace, not to individual spans in isolation. In other words, for a given request, the system generally decides either “this whole trace will be recorded” or “this whole trace will be dropped.” The reason for this is very practical: a trace that contains only half of the spans is usually confusing and sometimes even misleading, because the missing parts hide the real cause of latency or failure.

So, when we talk about sampling, we are almost always talking about deciding the fate of an entire trace, starting from its root span.

3. Where and When Can Sampling Happen?

In OpenTelemetry and in distributed tracing systems in general, sampling decisions can be made in two fundamentally different phases of a trace’s life:

  1. At the beginning of the trace, when the root span is created. This is called head-based sampling.
  2. At the end of the trace, after all spans have been collected and the system can see the complete picture. This is called tail-based sampling.

These two approaches represent two very different philosophies, and they come with very different trade-offs in terms of cost, complexity, and quality of data.

4. Head-Based Sampling: Deciding at the Start

Head-based sampling is by far the most common form of sampling, and it is the default in most OpenTelemetry SDKs.

The idea is simple: when a request arrives and the root span is about to be created, the sampler immediately decides whether this trace should be recorded or not. If the decision is “yes,” then all spans created under this root span are recorded and eventually exported. If the decision is “no,” then the trace is effectively ignored: its spans are created as lightweight non-recording spans so that the trace context can still propagate to downstream services, but nothing is exported.

This approach is extremely fast and efficient, because it does not require any buffering or waiting. The system does not need to keep spans in memory while waiting to see how the trace turns out. The decision is cheap, immediate, and final.

This is also why head-based sampling is so widely used in high-throughput systems.

5. Different Types of Head-Based Sampling

Let us now go through the most important types of head-based sampling, one by one, and understand not just what they do, but also when they make sense in real systems.

5.1 Always-On Sampling

Always-on sampling means exactly what the name suggests: every single trace is recorded and exported. Nothing is dropped.

This mode is extremely useful in development environments, test environments, and during the early stages of learning OpenTelemetry, because it gives you perfect visibility into what your application is doing. You never have to worry about whether a particular request was sampled or not, because everything is always there.

However, in a real production system with any meaningful amount of traffic, always-on sampling becomes impractical very quickly. The volume of data grows too large, the cost becomes too high, and the tracing backend becomes overloaded with more information than humans can realistically analyze.

So the correct mental model is this:

Always-on sampling is a learning and debugging tool, not a production strategy for high-traffic systems.

Following is the configuration:

otel.traces.sampler=always_on

5.2 Always-Off Sampling

Always-off sampling is the exact opposite. In this mode, no traces are recorded at all.

At first glance, this might sound useless, and in most cases it is. However, it does have some niche uses, for example when you want to temporarily disable tracing without removing instrumentation, or when you want to measure the absolute minimum overhead of your application without any tracing data being produced.

For normal observability purposes, you will almost never use always-off sampling.

Following is the configuration:

otel.traces.sampler=always_off

5.3 Probability (Ratio-Based) Sampling

This is the most commonly used sampling strategy in production systems.

In probability-based sampling, you specify a ratio such as 0.1, 0.01, or 0.001. This means:

  • With a ratio of 0.1, approximately 10% of traces are recorded.
  • With a ratio of 0.01, approximately 1% of traces are recorded.
  • With a ratio of 0.001, approximately 0.1% of traces are recorded.
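
As a concrete illustration, a service handling 1,000 requests per second with a ratio of 0.01 keeps roughly 10 traces per second, which is still on the order of 860,000 traces per day, usually more than enough to see typical behavior while cutting trace volume by about 99%.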

For each new trace, the sampler derives the decision deterministically from the trace ID. Because trace IDs are generated randomly, the overall effect is a random selection, but every component that applies the same ratio to the same trace ID arrives at the same keep-or-drop decision.

Over time and over large traffic volumes, this gives you a statistically representative sample of your traffic. You do not see every request, but you do see enough requests to understand general performance patterns, typical latencies, and common call paths.

This is usually a very good trade-off between cost and visibility.

However, it has one important weakness: rare and important events, such as errors that happen only once in a thousand requests, might be missed entirely if your sampling ratio is too low.

Following is the configuration:

otel.traces.sampler=traceidratio
otel.traces.sampler.arg=0.2
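
If you configure the SDK through environment variables rather than system properties, the equivalent settings (using the standard OpenTelemetry environment variable names) are:

OTEL_TRACES_SAMPLER=traceidratio
OTEL_TRACES_SAMPLER_ARG=0.2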

5.4 Parent-Based Sampling

Parent-based sampling is not a sampling strategy by itself, but rather a rule about how sampling decisions propagate across service boundaries.

In a distributed system, one service often calls another service. The upstream service makes a sampling decision and sends that decision along with the trace context in HTTP headers or message headers. The downstream service then usually follows this rule:

If the parent span was sampled, the child span should also be sampled. If the parent span was not sampled, the child span should also not be sampled.

This ensures that you do not end up with half-traces, where one service recorded data but another service did not. In practice, most real-world configurations use something like:

Parent-based + probability sampler

Which means:

  • If there is already a parent decision, follow it.
  • If this is a new root trace, apply probability sampling.
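
Following is the configuration. The sampler name below assumes the standard OpenTelemetry SDK autoconfiguration values, where parentbased_traceidratio combines both behaviors and the argument is the ratio applied to new root traces:

otel.traces.sampler=parentbased_traceidratio
otel.traces.sampler.arg=0.2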

6. The Big Limitation of Head-Based Sampling

The fundamental limitation of head-based sampling is that it makes the decision before it knows what kind of trace this will be.

At the start of a request, you do not yet know:

  • Whether this request will be slow or fast.
  • Whether it will fail or succeed.
  • Whether it will trigger some rare and important edge case.

So head-based sampling is always making a somewhat blind decision.

This is where tail-based sampling comes in.

7. Tail-Based Sampling: Deciding at the End

Tail-based sampling works in the opposite way. Instead of deciding at the start, the system first collects all spans of a trace, keeps them in memory or in a buffer, and only after the trace is complete does it decide whether to keep or drop it.

Because the system now sees the full trace, it can make much smarter decisions, such as:

  • Keep all traces that contain an error.
  • Keep all traces that took longer than 2 seconds.
  • Keep all traces that touch a particular critical service.
  • For the rest, only keep a small percentage.

This approach is extremely powerful, because it allows you to say:

I do not care about most normal, fast, successful requests, but I care deeply about slow and failing ones, and I want to keep all of those.

8. The Cost and Complexity of Tail-Based Sampling

Tail-based sampling is not free. In order to work, the system must:

  • Buffer spans in memory or on disk.
  • Wait until a trace is complete or times out.
  • Then apply sampling rules.
  • Then either export or discard the trace.

This means:

  • More memory usage
  • More CPU usage
  • More operational complexity
  • Usually a need for a special component such as the OpenTelemetry Collector or a backend that supports tail-based sampling

So the correct way to think about tail-based sampling is this:

Tail-based sampling gives you much higher quality data, but at a significantly higher operational cost.

9. A Very Common Real-World Strategy

In many real production systems, teams use a combination of strategies:

  • Use head-based probability sampling at the SDK level to keep the raw data volume manageable.
  • Then use tail-based sampling in the collector or backend to apply smarter rules like “keep all errors” and “keep all slow traces.”

This gives a good balance between cost control and diagnostic power.
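
As a rough sketch of what this looks like in configuration, the SDK side can reuse the parent-based ratio sampler from section 5.4 (the 0.25 ratio here is only an illustrative value), while the collector side uses a tail_sampling processor such as the combined-policies example at the end of the next section:

otel.traces.sampler=parentbased_traceidratio
otel.traces.sampler.arg=0.25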

10. Examples

Tail sampling based on status code

# otel collector configuration example: https://opentelemetry.io/docs/collector/configuration/
# tail sampling configuration: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/tailsamplingprocessor

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: ":4317"
      http:
        endpoint: ":4318"

exporters:
  otlp/tempo:
    endpoint: "tempo:4317"
    tls:
      insecure: true

processors:
  tail_sampling:
    # how long to wait for a trace to complete before making a sampling decision.
    decision_wait: 10s
    # max number of traces to keep in memory while waiting for completion.
    num_traces: 100000
    # sampling policies
    policies:
      - name: errors-policy
        type: status_code
        status_code:
          status_codes: [ERROR]

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [tail_sampling]
      exporters: [otlp/tempo]

Tail sampling based on latency

# otel collector configuration example: https://opentelemetry.io/docs/collector/configuration/
# tail sampling configuration: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/tailsamplingprocessor

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: ":4317"
      http:
        endpoint: ":4318"

exporters:
  otlp/tempo:
    endpoint: "tempo:4317"
    tls:
      insecure: true

processors:
  tail_sampling:
    # how long to wait for a trace to complete before making a sampling decision.
    decision_wait: 10s
    # max number of traces to keep in memory while waiting for completion.
    num_traces: 100000
    # sampling policies
    policies:
      - name: latency-policy
        type: latency
        latency:
          threshold_ms: 1500      

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [tail_sampling]
      exporters: [otlp/tempo]

Tail sampling based on attributes

# otel collector configuration example: https://opentelemetry.io/docs/collector/configuration/
# tail sampling configuration: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/tailsamplingprocessor

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: ":4317"
      http:
        endpoint: ":4318"

exporters:
  otlp/tempo:
    endpoint: "tempo:4317"
    tls:
      insecure: true

processors:
  tail_sampling:
    # how long to wait for a trace to complete before making a sampling decision.
    decision_wait: 10s
    # max number of traces to keep in memory while waiting for completion.
    num_traces: 100000
    # sampling policies
    policies:
      - name: attribute-policy
        type: string_attribute
        string_attribute:
          key: "url.path"
          values: ["/api/movies/2"]

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [tail_sampling]
      exporters: [otlp/tempo]
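
Tail sampling combining multiple policies

The following sketch combines the strategy from section 9 in a single tail_sampling processor: keep every trace that contains an error, keep every trace slower than 2 seconds, and keep roughly 10% of everything else. The processor keeps a trace if any one of its policies matches, so the probabilistic policy acts as a baseline for otherwise uninteresting traffic. The endpoint names, policy names, and threshold values are illustrative, not prescriptive.

# otel collector configuration example: https://opentelemetry.io/docs/collector/configuration/
# tail sampling configuration: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/tailsamplingprocessor

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: ":4317"
      http:
        endpoint: ":4318"

exporters:
  otlp/tempo:
    endpoint: "tempo:4317"
    tls:
      insecure: true

processors:
  tail_sampling:
    # how long to wait for a trace to complete before making a sampling decision.
    decision_wait: 10s
    # max number of traces to keep in memory while waiting for completion.
    num_traces: 100000
    # sampling policies: a trace is kept if any policy matches.
    policies:
      - name: errors-policy
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: latency-policy
        type: latency
        latency:
          threshold_ms: 2000
      - name: baseline-probability
        type: probabilistic
        probabilistic:
          sampling_percentage: 10

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [tail_sampling]
      exporters: [otlp/tempo]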