Kong Observability With Grafana: A Unified View for Logs, Metrics, and Traces

Introduction

Understanding the internal state of your application through telemetry data (traces, metrics, and logs) is crucial. Whether you need to monitor your application’s performance at a high level or debug a specific issue, observability plays a vital role.

Observability is so important that it should be a top priority before deploying the application itself. The three pillars of observability are logs, metrics, and traces:

  • Metrics provide an overview of your application’s performance, allowing you to build dashboards to track trends.
  • Logs offer detailed data for individual requests.
  • Traces give insights into the lifecycle of requests, helping you identify issues.

Now that we understand the importance of observability, the challenge is to find a solution that:

  1. Provides comprehensive data for system monitoring.
  2. Works with various backends to avoid vendor lock-in.
  3. Offers user-friendly tools for implementation.

In today’s post, let’s explore how to achieve these goals for Kong Gateway.

Solution

Data Format

To work with different backends, our logs, metrics, and traces need to be in a standardized format. This is where OpenTelemetry comes in.

What is OpenTelemetry?

OpenTelemetry is an Observability framework and toolkit designed to create and manage telemetry data such as traces, metrics, and logs. Crucially, OpenTelemetry is vendor- and tool-agnostic, meaning that it can be used with a broad variety of Observability backends, including open source tools like Jaeger and Prometheus, as well as commercial offerings.

OpenTelemetry is not an observability backend like Jaeger, Prometheus, or other commercial vendors. OpenTelemetry is focused on the generation, collection, management, and export of telemetry. A major goal of OpenTelemetry is that you can easily instrument your applications or systems, no matter their language, infrastructure, or runtime environment. Crucially, the storage and visualization of telemetry is intentionally left to other tools.

Thanks to its vendor- and tool-agnostic nature and its widespread support from various vendors, OpenTelemetry is an excellent choice for our solution.

Data Collection

Traditional data collection methods often require shipping data to different endpoints or exposing various ports for different backends to scrape. OpenTelemetry simplifies this process with the OpenTelemetry Collector, which acts as a central hub for receiving, processing, and exporting data to your chosen backend. This centralization makes data easier to manage and supports both push- and pull-based collection from your applications. By using the OpenTelemetry Collector, you can streamline your data collection process and simplify access control, ensuring consistency and flexibility in your observability stack.
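To make this concrete, here is a minimal Collector configuration sketch that wires an OTLP receiver through a batch processor to a single exporter. The component choices and the backend endpoint are assumptions for illustration only:

    receivers:
      otlp:
        protocols:
          http:   # applications push OTLP over HTTP (default port 4318)
          grpc:   # or over gRPC (default port 4317)
    processors:
      batch: {}
    exporters:
      otlp:
        endpoint: backend:4317   # placeholder backend address
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [otlp]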

The workflow of the OpenTelemetry Collector is as follows:

Monitoring Kong

We will also use the OpenTelemetry Collector for data collection from Kong. Since Kong is the API gateway, many applications run behind it. Both Kong and some of these applications will send their telemetry data to the OpenTelemetry Collector, which then forwards this data to the supported backends. The goal here is also to correlate logs with traces.

High-Level Overview

With an understanding of the data formats, collection methods, and data shipping processes, let’s explore how to gather each type of data from Kong.

Grafana Dashboard

Grafana is chosen for two primary reasons:

  1. It supports various backends like Jaeger and Prometheus.
  2. Kong has an official Grafana dashboard that we can leverage.

Logs Collection

Logs Collection Solution

Since we are using Grafana, Loki is a natural choice for log storage. Kong produces two types of logs: the standard NGINX-style access and error logs, and the structured logs emitted by its logging plugins (such as http-log).

While a log collector like Promtail or Fluent Bit can ship the access logs to Loki, those logs often lack detailed information, and it is difficult to inject a trace_id into them. Fortunately, Kong’s OpenTelemetry plugin injects the trace_id into the output of the logging plugins, which means we can leverage a logging plugin to produce logs that can be correlated with traces.

Logging Plugin

All Kong logging plugins output logs in a consistent JSON format, with each request logged as a separate JSON object. (You can check this JSON format in the official documentation.) When configured with the OpenTelemetry plugin, the trace ID of each request is added as the trace_id key in the log output.

I initially decided to use the http-log plugin to send logs directly to the OpenTelemetry Collector, but I encountered a couple of challenges:

  1. The logs sent to the OpenTelemetry Collector (OTLP receiver) must be in OTLP format.

  2. According to the OpenTelemetry log data model, event records must include TraceID and SpanID fields to correlate logs with traces. However, the logging plugin’s JSON output looks like this, and there doesn’t seem to be a way for the plugin to add these fields directly:

        "trace_id": {
          "w3c": "f42537d72e4e6c83d7cadbbaae704485"
        }

Therefore, we have two problems to solve:

  1. Convert the logging output from JSON to OTLP format.
  2. Map the trace_id in the output to the TraceID field of the log event.

Fluent Bit

To address the first issue, I introduced Fluent Bit to receive the JSON logs, convert them to OTLP format, and send them on to the OpenTelemetry Collector. This is straightforward: enable Fluent Bit’s HTTP input on port 8080 and point the http-log plugin at it.

For the second problem, we know that the Kong OpenTelemetry plugin also injects the following request header:

    "request": {
      "headers": {
        "traceparent": "00-f42537d72e4e6c83d7cadbbaae704485-3334a4a3481e7a90-01"
      }
    }

According to the W3C Trace Context specification, the traceparent header has the format version-trace_id-span_id-trace_flags. My solution is to use config.custom_fields_by_lua in the logging plugin to read this header and inject traceid and spanid fields back into the log. Fluent Bit’s OpenTelemetry output can then read traceid and spanid from the message body and set the TraceID and SpanID fields for each log event.

Here is the http-log plugin configuration:

    - name: http-log
      config:
        custom_fields_by_lua:
          traceid: |
            local h = kong.request.get_header('traceparent')
            return h:match("%-([a-f0-9]+)%-[a-f0-9]+%-")
          spanid: |
            local h = kong.request.get_header('traceparent')
            return h:match("%-[a-f0-9]+%-([a-f0-9]+)%-")
        http_endpoint: http://fluentbit:8080
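On the Fluent Bit side, the matching configuration is roughly as follows. This is only a sketch: the Collector address is an assumption, and the option names that map traceid/spanid from the record body onto each log record’s TraceID/SpanID should be verified against your Fluent Bit version:

    pipeline:
      inputs:
        # HTTP input that the http-log plugin posts its JSON log objects to
        - name: http
          listen: 0.0.0.0
          port: 8080
      outputs:
        # Forward the logs to the OpenTelemetry Collector as OTLP over HTTP
        - name: opentelemetry
          match: '*'
          host: otel-collector        # assumed Collector hostname
          port: 4318
          logs_uri: /v1/logs
          tls: off
          # Promote the fields injected by custom_fields_by_lua to the
          # OTLP TraceID/SpanID of each log record (check these option
          # names in the docs for your Fluent Bit version)
          logs_trace_id_message_key: traceid
          logs_span_id_message_key: spanid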

Metrics Collection

Metrics Collection Solution

Next, we need to address the collection of metrics. We will use Prometheus as the data backend because Kong has a Prometheus plugin that generates Prometheus metrics, and Kong provides an official Grafana dashboard built on top of them.

In this setup, the OpenTelemetry Collector will scrape metrics from the Kong container and itself. Then, the Prometheus server will scrape the OpenTelemetry Collector to gather all metrics.

The design behind this approach has a couple of advantages:

  1. Simplified Access Control: Especially in a service mesh environment, this setup allows Prometheus to scrape all metrics from the OpenTelemetry Collector without the need to grant access to each individual application.
  2. Monitoring the OpenTelemetry Collector: By also collecting the OpenTelemetry Collector’s own metrics, you can monitor its performance using a dashboard. For more information on configuring OpenTelemetry Collector metrics, please refer to this documentation.

The solution involves using the OpenTelemetry Collector’s Prometheus receiver to scrape metrics from Kong and itself. These metrics are then exposed via the Prometheus exporter on port 9090. The Prometheus server simply scrapes the OpenTelemetry Collector on port 9090 to retrieve all metrics.
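A Collector configuration for this metrics flow could look roughly like this. The Kong target assumes the Status API (with the Prometheus plugin enabled) is listening on port 8100, and the hostnames are placeholders:

    receivers:
      prometheus:
        config:
          scrape_configs:
            # Scrape Kong's Prometheus plugin metrics exposed on the Status API
            - job_name: kong
              static_configs:
                - targets: ['kong:8100']
            # Scrape the Collector's own internal metrics
            - job_name: otel-collector
              static_configs:
                - targets: ['localhost:8888']
    exporters:
      prometheus:
        endpoint: 0.0.0.0:9090   # the Prometheus server scrapes this single endpoint
    service:
      pipelines:
        metrics:
          receivers: [prometheus]
          exporters: [prometheus]

With this in place, the Prometheus server only needs a single scrape job pointing at the Collector’s port 9090.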

Traces Collection

Traces Collection Solution

The final piece of data collection involves shipping traces to the OpenTelemetry Collector. This process is straightforward because Kong Gateway has an OpenTelemetry plugin that supports sending traces to an HTTP backend. In this design, we use Jaeger as the backend.

For access control reasons, instead of sending traces directly to Jaeger, Kong and any other applications generating traces send them to the OpenTelemetry Collector, which then forwards the data to Jaeger.
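As a sketch, enabling the plugin and pointing it at the Collector might look like the following; the exact endpoint field name differs between Kong versions, and the Collector address is an assumption:

    - name: opentelemetry
      config:
        # Newer Kong releases use `traces_endpoint`; older 3.x releases call it `endpoint`
        traces_endpoint: http://otel-collector:4318/v1/traces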

Combining the data collection of logs, metrics, and traces, we have our complete solution demonstrated in the following diagram:

Complete Observability Solution
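To tie the pieces together, the Collector’s service section then routes each signal to its backend. This sketch assumes the receivers and exporters shown earlier, plus a Loki exporter for logs and an OTLP exporter instance named otlp/jaeger for traces; the available exporter names depend on your Collector distribution and version:

    service:
      pipelines:
        logs:                       # http-log -> Fluent Bit -> Collector -> Loki
          receivers: [otlp]
          exporters: [loki]
        metrics:                    # Collector scrapes Kong and itself; Prometheus scrapes the Collector
          receivers: [prometheus]
          exporters: [prometheus]
        traces:                     # Kong OpenTelemetry plugin -> Collector -> Jaeger
          receivers: [otlp]
          exporters: [otlp/jaeger]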

Demo

I’ve prepared a demo to help you better understand how this solution works. You can access the source code at this repo. Once all pods/containers are running, open the hotrod application in a browser at http://localhost:8000/. Click one of the buttons to generate some traces, then go to Grafana at localhost:3000. You should see logs correlated with traces, and you can open a trace by simply clicking the Jaeger button.
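For reference, the Jaeger button on a log line is typically produced by a derived field on the Loki data source. A provisioning sketch might look like this; the matcher regex, data source uid, and URLs are assumptions that depend on how the logs end up stored in Loki:

    apiVersion: 1
    datasources:
      - name: Loki
        type: loki
        url: http://loki:3100
        jsonData:
          derivedFields:
            # Extract the trace id from the stored log line and link it to Jaeger
            - name: TraceID
              matcherRegex: '"traceid"\s*:\s*"(\w+)"'
              url: '$${__value.raw}'      # "$$" escapes "$" in provisioning files
              datasourceUid: jaeger       # uid of the Jaeger data source (assumption)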

That’s all I want to share with you today. See you next time.