Configuring Kafka Sources and Sinks in Kubernetes

Sebastien Goasguen
Aug 23, 2021

Blog 3 in our four-part series: Using TriggerMesh to Connect Apache Kafka and Kubernetes

In blog 1 in this series, we looked at how TriggerMesh lets you declaratively define your Kafka messages and send them in the CloudEvent format to a Kubernetes workload. 

Blog 2 did the reverse, and explored how to use TriggerMesh to send messages to a Kafka topic declaratively from a Kubernetes workload.

This blog will show you how to define your Kafka sources and sinks declaratively and manage them in your Kubernetes clusters. This means you can avoid updating and compiling a lot of Java and uploading JARs.

Streaming data into a Kafka cluster, or out of it to somewhere else, is usually done with a Kafka Connect cluster and its associated configurations. This is a real pain point for Kafka users and a significant drag on velocity. It involves:

  • Deploying and running a Kafka Connect cluster
  • Updating and compiling connectors in Java
  • Uploading JARs to specific directories in your Kafka Connect cluster
[Image: Kafka Connect data pipeline, from the Confluent blog]

While there is a nice collection of Kafka connectors, only a few of them are fully managed on Confluent Cloud, which leaves Kafka users with a lot of work to do if they want to manage their sources and sinks efficiently.

Following up on those two blogs, where we reviewed how to produce messages to Kafka and how to consume messages from Kafka, I am now going to show you how to define your Kafka sources and sinks declaratively and manage them in your Kubernetes clusters.

Configuring a Kafka Source

From the point of view of a Kafka cluster, a Kafka source is something that produces events into a Kafka topic. With the help of Knative, we can configure such a source using an object called a KafkaSink. I know this is super confusing, and I am really sorry about it :) Everything is relative and depends on where you stand: what Kubernetes sees as a sink for its events is a source from the Kafka cluster's perspective.

To create an addressable endpoint in your Kubernetes cluster that will become the source of messages into your Kafka cluster, you create an object like the one below:

apiVersion: eventing.knative.dev/v1alpha1
kind: KafkaSink
metadata:
  name: my-kafka-topic
spec:
  auth:
    secret:
      ref:
        name: kafkahackathon
  bootstrapServers:
    - pkc-456q9.us-east4.gcp.confluent.cloud:9092
  topic: hackathon
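The auth section above references a Kubernetes secret named kafkahackathon. As a sketch, assuming SASL/PLAIN authentication against Confluent Cloud and the secret key names the Knative KafkaSink expects (protocol, sasl.mechanism, user, password), the secret could look like this:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: kafkahackathon
# stringData lets us write the values in plain text;
# Kubernetes base64-encodes them on creation
stringData:
  protocol: SASL_SSL
  sasl.mechanism: PLAIN
  user: <confluent-api-key>
  password: <confluent-api-secret>
```

You would apply this secret before the KafkaSink manifest; the user and password values are your Confluent Cloud API key and secret.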

Note the bootstrap server on Confluent Cloud, and the specified topic. Equipped with this, you can now create an AWS SQS source that will send messages directly into Kafka with a manifest like the one below:

apiVersion: sources.triggermesh.io/v1alpha1
kind: AWSSQSSource
metadata:
  name: samplequeue
spec:
  arn: arn:aws:sqs:us-west-2:1234567890:triggermeshqueue
  credentials:
    accessKeyID:
      valueFromSecret:
        name: awscreds
        key: aws_access_key_id
    secretAccessKey:
      valueFromSecret:
        name: awscreds
        key: aws_secret_access_key
  sink:
    ref:
      apiVersion: eventing.knative.dev/v1alpha1
      kind: KafkaSink
      name: my-kafka-topic

Note that the AWS SQS queue is represented by its ARN, and that you can access it thanks to a Kubernetes secret which contains your AWS API keys. The sink section points to the KafkaSink created previously. With this manifest, all SQS messages will be consumed and sent to Kafka using the CloudEvents specification.
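For completeness, the awscreds secret referenced by the manifest simply stores the AWS API keys under the aws_access_key_id and aws_secret_access_key key names; the values below are placeholders:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: awscreds
# Key names must match those referenced in the
# AWSSQSSource credentials section
stringData:
  aws_access_key_id: <your-access-key-id>
  aws_secret_access_key: <your-secret-access-key>
```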

Two Kubernetes objects, and you have a declarative definition of your Kafka source.

Configuring a Kafka Sink

For your Kafka sink, we sadly fall back to the same potential confusion: what is a sink from the Kafka cluster's perspective is a source outside of that cluster. So to define a Kafka sink, you define a KafkaSource object in your Kubernetes cluster.

For example, the manifest below specifies a bootstrap server pointing to a Kafka cluster in Confluent Cloud, a specific topic to consume messages from, and a sink which is the end target of your Kafka messages. Here we point to a Kubernetes service called display:

apiVersion: sources.knative.dev/v1beta1
kind: KafkaSource
metadata:
  name: my-kafka
spec:
  bootstrapServers:
    - pkc-453q9.us-east4.gcp.confluent.cloud:9092
  net:
    sasl:
      enable: true
  …
  sink:
    ref:
      apiVersion: v1
      kind: Service
      name: display
  topics:
    - hackathon

So all you need for your Kafka sink definition is a single Kubernetes manifest and a target micro-service exposed as a traditional Kubernetes service. Note that it could also be a Knative service, in which case your target would benefit from auto-scaling, including scale to zero.
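To make this concrete, here is a sketch of what the display target could be: a plain Deployment running the event_display image from the Knative Eventing samples, fronted by a regular Kubernetes Service. The image name and port are assumptions based on the upstream samples:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: display
spec:
  replicas: 1
  selector:
    matchLabels:
      app: display
  template:
    metadata:
      labels:
        app: display
    spec:
      containers:
        - name: display
          # Logs every CloudEvent it receives to stdout
          image: gcr.io/knative-releases/knative.dev/eventing/cmd/event_display
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: display
spec:
  selector:
    app: display
  ports:
    - port: 80
      targetPort: 8080
```

With this in place, a kubectl logs on the display pod shows the Kafka messages arriving as CloudEvents.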

Instead of a complex workflow of compiling JARs and uploading them to a Kafka Connect cluster, you can leverage your Kubernetes cluster and define your Kafka sources and sinks as Kubernetes objects.

This brings a declarative mindset to your Kafka sources and sinks, gives you a source of truth, and unifies the management of your Kafka flows and your micro-services. In addition, it simplifies targeting micro-services running in Kubernetes when Kafka messages arrive, and with the addition of Knative gives you the autoscaling necessary to handle changing stream volume.

If you need a collection of Kafka sources, suddenly all Knative event sources can be used (e.g. GitHub, GitLab, JIRA, Zendesk, Slack, Kinesis…).


