Many technology teams struggle with integrating data and applications across a growing mix of infrastructure. On one end, data platform teams are building faster and better data pipelines using Apache Kafka for real-time data handling. On the other end, application platform teams are increasing feature velocity with containers, microservices, and Kubernetes.
As these two technology teams accelerate in isolation, integrating real-time data into applications is a painful bottleneck.
Existing integration approaches suffer from a number of velocity-sapping weaknesses, such as low-level programming, custom app extensions, and expensive external consultants. In a recent poll, 67 percent of DevOps professionals said they wait a month or longer to integrate services across infrastructures.
This paper describes a declarative API for processing Kafka messages in Kubernetes-based workloads.
It also allows for declaratively managing the production and consumption of Kafka messages to and from Kubernetes workloads.
TriggerMesh is a new breed of open source integration platform built in the cloud on Kubernetes and Knative. With TriggerMesh, you can declaratively design, deploy, and manage integrations as code.
This paper shows how TriggerMesh removes the speed bumps introduced by current integration approaches, allowing enterprises to realize the full velocity and resiliency advantages of Kafka and Kubernetes.
Section 1: Consuming Kafka messages in Kubernetes
Enterprises now deal with staggering amounts of data. Whether in a data lake or broken across numerous silos, they now face new challenges such as data freshness, complications in data capturing, or building new event-driven workflows. They are on the hunt for new, better methods of real-time data handling.
To achieve real-time data handling, most enterprises have turned to Apache Kafka—arguably the leader in message brokering and data streaming. Enterprises who have gone this route are also showing a growing use of Kubernetes to manage containerized workloads.
Combining these two powerful technologies presented a challenge of its own, which is why TriggerMesh has developed a declarative API to facilitate the ingestion and processing of Kafka messages. A declarative API allows developers to declare a desired outcome instead of executing statements one after another, such as with an imperative API. This approach allows developers to declare a Kafka source, such as a Kafka topic, and set the target for those messages in a Kubernetes manifest. Now Kafka consumers can be managed with a Kubernetes API and a DevOps mindset.
A declarative Kafka source
Knative is an enterprise-level open source solution for creating serverless applications on Kubernetes. The serverless approach to application development relies on event management and event-driven processes. Knative provides a Kubernetes API extension—a CustomResourceDefinition or CRD—called KafkaSource that defines how messages from Kafka are consumed. This means that developers do not have to write custom clients, but instead just a configuration file.
Hands on: Deploying a KafkaSource CRD
Here are example commands showing how quickly a developer can deploy the KafkaSource CRD to a knative-eventing namespace.
With one "apply", we have our CRD and our controller. You can examine the success of these commands on your Kubernetes deployment like this:
The last step is to define which Kafka topic that the application will use to receive and consume messages. This can be done with a Kubernetes manifest like this:
Note that in this example, we are consuming messages from a Kafka cluster running in the Confluent cloud. You can find the Kafka documentation here, or contact us at TriggerMesh and we'd be happy to support you.
To display incoming Kafka events, we can deploy a simple event sink like this:
The target for the Kafka messages is defined in the sink section of the KafkaSource manifest. Every message in the Kafka topic will get consumed and displayed as a CloudEvent, which will look something like this:
That's all there is to declaratively defining Kafka messages and sending them in CloudEvent format to a Kubernetes workload.
Section 2: Send messages to Kafka declaratively
In this section, we show how to do the opposite operation: sending messages to a Kafka topic declaratively. We will show you how to declaratively define your Kafka sources and sinks with Kubernetes manifests. This avoids the need to compile and upload jar files to configure Kafka connections.
Our goal is to write Kubernetes manifests to declare how we send and receive—or produce and consume—messages to a Kafka topic.
A KafkaSink produces an event to a Kafka topic and therefore is a Kafka source from the point of view of the Kafka cluster. The KafkaSource consumes an event from a Kafka topic and is a Kafka sink from the point of view of the Kafka cluster.
To accomplish this configuration, we install two controllers and two new CRDs: KafkaSink and Knative Eventing.
Shortly after running these commands, the pods will be up and running. Note that the example below also shows the controller for KafkaSource.
And with this, our new CRD KafkaSink is in place, as you can see here.
Seeing the KafkaSink in action
To see the KafkaSink in action, we need to set up an HTTP endpoint in our Kubernetes cluster where we can send CloudEvents. These CloudEvents will then be produced—or sent to—the Kafka cluster. Using the same information that allowed us to talk to the Kafka cluster in Section 1, we can set up a Kafka sink declaratively with this configuration:
Now we have an HTTP endpoint that acts as an HTTP proxy to a Kafka topic. This example shows how the KafkaSink should look:
We can now POST a CloudEvent to this endpoint and the Sink will produce the event to the topic. To do this, use curl at the command line to craft a CloudEvent by hand. The following is an example:
We can now see the incoming CloudEvent being received by our KafkaSource. It should look like this:
The flow configured in this section allows us to produce events anywhere as CloudEvents, send them to a Kafka topic, and then consume the events by sending them to a Kubernetes workload. As an added bonus, all of this configuration has been defined declaratively.
Section 3: Configuring Kafka sources and sinks in Kubernetes
This section introduces a declarative method for configuring Kafka sources and sinks by incorporating third party event sources and sinks. Streaming data into and out of a Kafka cluster is usually done with a Kafka Connect cluster. However, this configuration process has proven to be a pain point for developers.
It involves a number of complicated steps, such as:
- Deploying and running a Kafka Connect cluster
- Updating and compiling connectors in Java
- Uploading JARs to specific directories in the Kafka Connect cluster
While there are Kafka Connectors available to self-managed Confluent cloud users, they still leave a lot of work for Kafka users who wish to manage their sources and sinks effectively. This section will demonstrate how to declaratively define and manage Kafka sources and sinks in Kubernetes clusters.
This approach allows users to utilize event sources from any third-party service, including GitHub, GitLab, AWS SQS, and Google Storage. These events can be consumed as Kafka messages and sent to any other third-party sink, such AWS Lambda, Azure Storage, and ElasticSearch.
Instead of a complex workflow compiling JARs and uploading them to a Kafka Connect cluster, this approach can leverage your Kubernetes cluster and define your Kafka sources and sinks with Kubernetes objects.
This brings a declarative mindset to your Kafka sources and sinks, declares a single source of truth, and unifies the management of Kafka flows and microservices. This shows it is possible to simplify the targeting of microservices running in Kubernetes when Kafka messages are present.
Configuring a Kafka source
Building on the examples in the previous sections, our next step is to create an addressable endpoint in our Kubernetes cluster. This endpoint will be the source of messages into our Kafka cluster. It is created using a configuration like the following.
Note that this configuration specifies both a bootstrap server on the Confluent cloud and a specific topic name.
Once the KafkaSink is defined, we can now create an AWS SQS source that will send messages directly into Kafka. This configuration requires the AWS SQS source's ARN and a Kubernetes secret that contains the AWS API keys. The manifest for this source will look similar to this:
ith this configuration, all SQS messages will be consumed and sent to Kafka using the CloudEvents specification. This example shows that with just two Kubernetes objects, it is possible to declaratively define a Kafka source.
Configuring a Kafka sink
When defining a Kafka sink, it is important to recall that a sink from the perspective of the Kafka cluster is a source from the perspective of the Kubernetes cluster. In other words, to define a Kafka sink, we instead define a KafkaSource object in our Kubernetes cluster.
This example manifest specifies a bootstrap server as a Kafka cluster in the Confluent cloud. It also specifies a topic as a source for messages to consume, and a sink—the end target of the Kafka messages. Note that this manifest also points to a Kubernetes service called display
As this example shows, all we need for a Kafka sink definition is a single Kubernetes manifest and a target microservice exposed as a traditional Kubernetes service.
Section 4: Simplifying Kafka connectors configuration with an integration language
All of the previous examples have relied heavily on YAML, a markup language that is intended to be human readable, but in practice can be bulky, verbose, and confusing. Everything in the Kubernetes ecosystem seems to be configured with YAML, and DevOps ends up writing and maintaining a lot of it. In the Kubernetes community, the proliferation of these configuration files has been referred to as a "face full of YAML."
TriggerMesh recently open sourced TriggerMesh Integration Language (TIL), an authoring tool for integrating various applications with event-driven mechanisms at its core. This tool has two goals:
- Simplifying the design of event-driven systems
- Generate complex YAML based on the more friendly and compact HashiCorp Configuration Language (HCL)
TIL declaratively represents the design of an integration, abstracting the API objects being used to implement enterprise integration patterns.
Let us examine how this new configuration language can represent a Kafka and Kubernetes flow similar to the examples used earlier.
TriggerMesh Integration Language
If we approach a Kafka flow as an integration, we can represent that flow with TriggerMesh Integration Language. This example shows that TIL vastly simplifies equivalent configurations written in YAML.
In an earlier section, we saw YAML configuration for a KafkaSink object that looked like this:
The TIL syntax shortens this significantly. It eliminates a number of extraneous configuration elements, such as API version. Since all we need to know is that this target endpoint is a Kafka endpoint, our configuration becomes:
GitHub Kafka source and microservice as a Kafka sink
Equipped with TIL, we can declare a message flow declaratively and manage it in Kubernetes. For example, a GitHub event source sending events to a Kafka stream and consuming from a Kafka stream to target a serverless workload on Kubernetes would be declared like this:
This configuration file can quickly be used to generate a Kubernetes-ready configuration on the command line using the til command like this:
Marrying the two biggest trends in enterprise IT—Kafka and Kubernetes—allows teams to fluidly connect all their data with all their apps. However, smoothly connecting these two technologies requires a new approach, one facilitated by TriggerMesh.
In this paper, we showed you how TriggerMesh simplifies writing enterprise integrations by adopting a cloud-native declaration. We then demonstrated an innovative solution to the "face full of YAML" problem by describing enterprise integration with the TriggerMesh Integration Language.