
What Is OpenTelemetry and Why Do We Need It?

In this tutorial, we are going to understand what OpenTelemetry is, but instead of starting with a formal definition, it is much more useful to first understand the problem that OpenTelemetry is trying to solve. Once the problem becomes clear, the purpose and value of OpenTelemetry will feel very natural and obvious.

The Problem: Modern Systems Are Complex and Distributed

Modern applications are no longer simple monoliths that run as a single process written in a single programming language. Instead, most real-world systems today are:

  • Highly distributed and made of many microservices.
    In a large organization, it is very common to have hundreds or even thousands of microservices, all talking to each other over the network, and together forming one big logical system.
  • Polyglot, meaning written in multiple programming languages.
    One service might be written in Java with Spring Boot, another might be written in Python, another in Go, and yet another in Node.js. This is completely normal today, but it also makes tooling and standardization much harder.

To understand what is really happening inside such a system, we need telemetry data, namely:

  • Logs, to know what events and errors occurred
  • Metrics, to understand trends and overall health
  • Traces, to understand how requests flow across services

So far, this part is not new. The real problem starts when we look at how this telemetry data is collected and sent to observability tools.

The Old World: Every Tool Had Its Own SDK and Format

Both in the past and today, there have been many popular observability and monitoring tools, such as:

  • Prometheus
  • InfluxDB
  • Elasticsearch
  • Jaeger, Zipkin, Tempo, and many others

The problem is not that these tools exist. The problem is that:

  • Each tool or vendor provides its own SDK and its own data format.
    This means that if your application wants to expose metrics to Prometheus, you need to use Prometheus-specific client libraries. If you want to send traces or metrics to another backend, you need a different set of libraries and often a different data model.
  • Switching from one observability backend to another is painful and expensive.
    This is not like switching from MySQL to PostgreSQL, where you can often keep most of your code and just change the driver. In the observability world, the instrumentation code itself changes, the dependencies change, and sometimes even the way you think about the data changes.
  • In large microservice systems, this becomes a nightmare to maintain.
    Imagine having hundreds of services, all directly integrated with a specific vendor’s SDK. If tomorrow your organization decides to move from one vendor to another, you are suddenly looking at hundreds of code changes across many repositories.

In short, observability was fragmented, vendor-specific, and expensive to evolve.

The Core Idea of OpenTelemetry

This is exactly the problem that OpenTelemetry was created to solve.

To understand the idea, let us use a very familiar analogy from the Java world.

  • As a Java developer, you already know what JPA (Java Persistence API) is.
    JPA provides a standard API to interact with relational databases, and because of that, your application is not tightly coupled to a specific database vendor like MySQL or PostgreSQL.
  • In a very similar way, OpenTelemetry (often abbreviated as OTel) provides a standard for telemetry data.
    It defines standard APIs, SDKs, and data models for logs, metrics, and traces, so that your application is not tightly coupled to any specific observability backend.

You can think of it like this:

JPA decouples your application from the database vendor.
OpenTelemetry decouples your application from the observability vendor.
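
To make the analogy concrete, here is a minimal, illustrative sketch of vendor-neutral instrumentation using the OpenTelemetry API in Java. The class name, service name, and span attribute are assumptions made up for this example; the point is that the code depends only on the OpenTelemetry API and never mentions a specific backend.

import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;

public class OrderService {

    // The tracer comes from the vendor-neutral OpenTelemetry API.
    // Which backend eventually stores the spans is decided elsewhere.
    private static final Tracer tracer =
            GlobalOpenTelemetry.getTracer("order-service");

    public void placeOrder(String orderId) {
        // Start a span to record this unit of work.
        Span span = tracer.spanBuilder("placeOrder").startSpan();
        try {
            span.setAttribute("order.id", orderId);
            // ... business logic ...
        } finally {
            // Always end the span so it can be exported.
            span.end();
        }
    }
}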

Before OpenTelemetry: Tight Coupling Everywhere

Let us look at how things used to work before OpenTelemetry, in a simplified way.

  • Your application directly depends on the SDK of a specific observability tool.
  • If you want to send logs, metrics, and traces to, say, Backend A, you add Backend A’s dependencies and write code using Backend A’s APIs.
  • Now, if you decide to switch to Backend B:
    • You have to remove the old dependencies
    • Add new dependencies
    • Change the code
    • And possibly even change the data model, because the formats might be different

Now imagine this situation not for one service, but for hundreds of microservices. The migration effort becomes huge, risky, and extremely expensive.
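
To make the pain concrete, here is a rough, hedged sketch of what such vendor-specific instrumentation often looked like, assuming the classic Prometheus Java client library (the class and metric names are invented for illustration). Every import and type is Prometheus-specific, so moving to a different backend means rewriting this kind of code in every service.

import io.prometheus.client.Counter;
import io.prometheus.client.exporter.HTTPServer;

public class CheckoutMetrics {

    // The metric type, naming rules, and registry all come from the
    // Prometheus client library, so this code is tied to Prometheus.
    static final Counter checkouts = Counter.build()
            .name("checkouts_total")
            .help("Total number of checkouts.")
            .register();

    public static void main(String[] args) throws Exception {
        // Expose a /metrics endpoint for Prometheus to scrape.
        HTTPServer server = new HTTPServer(8080);
        checkouts.inc();
    }
}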

With OpenTelemetry: A Clean and Flexible Architecture

With OpenTelemetry, the architecture looks very different and much cleaner.

  • Your application is instrumented using OpenTelemetry APIs and SDKs only.
    It does not know or care whether the data will finally end up in Prometheus, InfluxDB, Elasticsearch, Tempo, or any other backend.
  • The application sends telemetry data in a standard format to something called the OpenTelemetry Collector.
    We will talk about the collector in much more detail later in the course, but for now you can think of it as a smart, configurable pipeline for telemetry data.
  • The OpenTelemetry Collector is responsible for:
    • Receiving telemetry data from applications
    • Processing it if needed
    • Exporting it to one or more observability backends
  • If tomorrow you want to switch from one backend to another, or even send data to multiple backends at the same time, you usually only need to change the collector configuration, not the application code (a configuration sketch follows this list).
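
To give a feel for what that configuration looks like, here is a minimal, illustrative OpenTelemetry Collector configuration in YAML. The exporter names and endpoints (a Jaeger address, a Prometheus scrape port) are assumptions for this sketch and depend on which Collector distribution and backends you actually run; the key point is that changing backends means editing this file, not your services.

receivers:
  otlp:                  # applications send telemetry here in the standard OTLP format
    protocols:
      grpc:
      http:

processors:
  batch:                 # batch telemetry before exporting

exporters:
  otlp/jaeger:           # example: forward traces to a Jaeger endpoint (assumed address)
    endpoint: jaeger:4317
    tls:
      insecure: true
  prometheus:            # example: expose metrics for Prometheus to scrape
    endpoint: 0.0.0.0:8889

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/jaeger]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]
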

So now:

Your application only speaks OpenTelemetry. The collector handles the rest.

This completely removes vendor lock-in from your application code.

A Simple Mental Picture

You can think of the flow like this:

  • Application → sends logs/metrics/traces in OpenTelemetry format
  • OpenTelemetry Collector → receives, processes, and routes the data
  • Observability backends → Prometheus, InfluxDB, Elasticsearch, Tempo, etc.

Your application does not need to know which backend is used. That decision becomes an operational configuration, not a code change.
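
As a hedged sketch of the application side, assuming the OpenTelemetry Java SDK with the OTLP exporter, the only endpoint the application knows about is the Collector (here assumed to run on localhost at the default OTLP gRPC port 4317):

import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.exporter.otlp.trace.OtlpGrpcSpanExporter;
import io.opentelemetry.sdk.OpenTelemetrySdk;
import io.opentelemetry.sdk.trace.SdkTracerProvider;
import io.opentelemetry.sdk.trace.export.BatchSpanProcessor;

public class TelemetrySetup {

    public static OpenTelemetry init() {
        // Export spans over OTLP to the Collector. The Collector address is
        // the only destination the application knows about; which backend
        // finally stores the data is decided in the Collector configuration.
        OtlpGrpcSpanExporter exporter = OtlpGrpcSpanExporter.builder()
                .setEndpoint("http://localhost:4317") // assumed Collector address
                .build();

        SdkTracerProvider tracerProvider = SdkTracerProvider.builder()
                .addSpanProcessor(BatchSpanProcessor.builder(exporter).build())
                .build();

        return OpenTelemetrySdk.builder()
                .setTracerProvider(tracerProvider)
                .build();
    }
}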

Formal Definition of OpenTelemetry

Now that we understand the problem and the idea, the formal definition will make much more sense:

OpenTelemetry is an open-source, vendor-neutral observability framework that provides standard APIs, SDKs, and tools to generate, collect, and export telemetry data such as logs, metrics, and traces.

Expanding on that, OpenTelemetry is:

An observability framework and toolkit designed to facilitate the generation, collection, and export of telemetry data such as traces, metrics, and logs.

Open source, as well as vendor- and tool-agnostic, meaning that it can be used with a broad variety of observability backends, including open source tools like Jaeger and Prometheus, as well as commercial offerings. OpenTelemetry is not an observability backend itself.

A major goal of OpenTelemetry is to enable easy instrumentation of your applications and systems, regardless of the programming language, infrastructure, and runtime environments used.

The backend (storage) and the frontend (visualization) of telemetry data are intentionally left to other tools.

What is observability?

Observability is the ability to understand the internal state of a system by examining its outputs. In the context of software, this means being able to understand the internal state of a system by examining its telemetry data, which includes traces, metrics, and logs.

To make a system observable, it must be instrumented. That is, the code must emit traces, metrics, or logs. The instrumented data must then be sent to an observability backend.
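
As a small illustrative example of instrumentation (the meter and counter names here are invented), emitting a metric with the OpenTelemetry API in Java can be as simple as incrementing a counter:

import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.metrics.LongCounter;
import io.opentelemetry.api.metrics.Meter;

public class PaymentInstrumentation {

    // Obtain a Meter through the OpenTelemetry API.
    private static final Meter meter =
            GlobalOpenTelemetry.getMeter("payment-service");

    // Define the counter once; record measurements wherever payments happen.
    private static final LongCounter paymentsProcessed =
            meter.counterBuilder("payments.processed")
                 .setDescription("Number of processed payments")
                 .build();

    public void processPayment() {
        // ... business logic ...
        paymentsProcessed.add(1); // emit one metric data point
    }
}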

Why OpenTelemetry?

With the rise of cloud computing, microservices architectures, and increasingly complex business requirements, the need for software and infrastructure observability is greater than ever.

OpenTelemetry satisfies the need for observability while following two key principles:

  1. You own the data that you generate. There’s no vendor lock-in.
  2. You only have to learn a single set of APIs and conventions.

Main OpenTelemetry components

OpenTelemetry consists of the following major components:

  • A specification for all components
  • A standard protocol (OTLP, the OpenTelemetry Protocol) that defines the shape of telemetry data
  • Semantic conventions that define a standard naming scheme for common telemetry data types
  • APIs that define how to generate telemetry data
  • Language SDKs that implement the specification, APIs, and export of telemetry data
  • A library ecosystem that implements instrumentation for common libraries and frameworks
  • Automatic instrumentation components that generate telemetry data without requiring code changes (a short example follows this list)
  • The OpenTelemetry Collector, a proxy that receives, processes, and exports telemetry data
  • Various other tools, such as the OpenTelemetry Operator for Kubernetes, OpenTelemetry Helm Charts, and community assets for FaaS

In addition, OpenTelemetry is used by a wide variety of libraries, services, and apps that have OpenTelemetry integrated to provide observability by default.
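
As a concrete example of the automatic instrumentation mentioned in the list above, the OpenTelemetry Java agent can be attached to an existing application at startup; the jar path, service name, and endpoint below are illustrative assumptions.

# Attach the OpenTelemetry Java agent without changing application code.
# The paths, service name, and Collector endpoint are only examples.
java -javaagent:./opentelemetry-javaagent.jar \
     -Dotel.service.name=order-service \
     -Dotel.exporter.otlp.endpoint=http://localhost:4317 \
     -jar order-service.jar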

OpenTelemetry is supported by numerous vendors, many of whom provide commercial support for OpenTelemetry and contribute to the project directly.

Language Support

OpenTelemetry is not tied to one ecosystem. Today, it supports many popular programming languages, including:

  • Java
  • JavaScript
  • Go
  • Python
  • .NET
  • C++
  • And several others

This makes it especially suitable for polyglot microservice architectures, where different teams use different technologies.