Multicloud object storage notifications
Multicloud allows businesses to avoid vendor lock-in, provides the flexibility to choose different cloud providers for different workloads, and can allow for more resilient disaster recovery implementations. However, organizations also want portability, so that workloads or applications can be moved between cloud environments and aren’t strongly coupled to a specific cloud platform vendor. It might be neither viable nor desirable to achieve 100% portability, but certain scenarios are emerging as good candidates for multicloud portability. In this post I’ll discuss one of them: object storage notifications.
I'm going to dive into an example of how to use TriggerMesh to easily create a cloud-agnostic object storage notification layer that provides a standardized event every time an object is created on either AWS S3 or Google Cloud Storage. The original events from each provider have very different structures and metadata, so I'll transform them to match a simpler event schema by extracting the object name, creation date, bucket name, and object size. Event-driven application developers can then consume these events more easily, regardless of their origin. This results in more portable applications, and shifts complexity from the app developers to the eventing platform.
The post is structured as follows:
- create the AWS S3 source
- create the Google Cloud Storage source
- transform events from both sources to create a unified event stream
- deliver unified events to an event display console
Ingest S3 object creation notifications
Make sure tmctl is installed, and then create a new broker:
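For example (the broker name `triggermesh` is an arbitrary choice; any name works):

```shell
# Create a local broker; tmctl will name created components after it
tmctl create broker triggermesh
```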
Then you can create an AWS S3 source. The prerequisites here are:
- an AWS IAM access key to authenticate with S3 (alternatively you can use an IAM role if you’re running TriggerMesh on EKS)
- an AWS S3 bucket ARN
Once you have those, create the AWS S3 source with the following command:
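A sketch of that command is below. The ARN and credential values are placeholders, and the flag names are from the tmctl AWS S3 source documentation as I recall it, so verify them with `tmctl create source awss3 --help`:

```shell
tmctl create source awss3 \
  --arn arn:aws:s3:::my-source-bucket \
  --auth.credentials.accessKeyID <AWS_ACCESS_KEY_ID> \
  --auth.credentials.secretAccessKey <AWS_SECRET_ACCESS_KEY> \
  --eventTypes s3:ObjectCreated:*
```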
Notice that I'm specifying a filter so that I only get notified about object creations, but you can omit the filter if you want to be informed about all events on this bucket.
Also note that, under the hood, the TriggerMesh S3 source creates an SQS queue that is used to deliver S3 events from the bucket to TriggerMesh.
Now run tmctl watch in a separate terminal so that you can see the events landing in the broker. Then add a new object to the S3 bucket. Below is an example of the event received by the broker; note the attributes I want to keep in the new unified schema: the object key, event time, bucket name, and object size.
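An illustrative example of such an event is below. All values are made up, and the event type and payload shape follow the S3 notification format wrapped by the TriggerMesh source as I recall it; your output from tmctl watch may differ in detail:

```json
{
  "specversion": "1.0",
  "type": "com.amazon.s3.objectcreated",
  "source": "arn:aws:s3:::my-source-bucket",
  "id": "1234-abcd",
  "time": "2023-01-17T10:30:00Z",
  "datacontenttype": "application/json",
  "data": {
    "Records": [
      {
        "eventTime": "2023-01-17T10:30:00.000Z",
        "eventName": "ObjectCreated:Put",
        "s3": {
          "bucket": { "name": "my-source-bucket" },
          "object": { "key": "hello.txt", "size": 1024 }
        }
      }
    ]
  }
}
```

The attributes carried into the unified schema are `Records[0].s3.object.key`, `Records[0].eventTime`, `Records[0].s3.bucket.name`, and `Records[0].s3.object.size`.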
Ingest Google Cloud Storage object creation notifications
To use the Google Cloud Storage source, start by following the steps below to configure the necessary resources and permissions. This is a simplified but overly permissive approach; you can tighten the permissions for production use.
- In IAM & Admin, create a service account for TriggerMesh and give it the Pub/Sub Editor and Storage Admin roles on the project
- Create a key for this service account and save it in JSON format in a file called serviceaccountkey.json
- Create a Google Cloud Storage bucket
- Copy the email address of the Cloud Storage service account for this project from the bucket settings page, then open IAM and give it the Pub/Sub Publisher role on the project
Now that we’re set, we can create the TriggerMesh Google Cloud Storage source with the following command:
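A sketch of that command follows. The bucket name is a placeholder, and the flag names and the `OBJECT_FINALIZE` event type (Cloud Storage's "object created" notification type) are from the tmctl Google Cloud Storage source documentation as I recall it, so verify with `tmctl create source googlecloudstorage --help`:

```shell
tmctl create source googlecloudstorage \
  --bucket my-gcs-bucket \
  --serviceAccountKey "$(cat serviceaccountkey.json)" \
  --eventTypes OBJECT_FINALIZE
```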
As with the AWS S3 source, we’re filtering for object creation events only; this filter is optional.
Under the hood, the TriggerMesh component will create and configure a Pub/Sub topic to be used to transport bucket notifications into TriggerMesh.
Upload an object to the bucket and you should see the event land in the broker. In the example below, note again the fields we want to use in our new unified event schema: the object name, creation time, size, and bucket name.
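An illustrative example is below. Values are made up, and the event type and payload shape follow the Cloud Storage Pub/Sub notification format as surfaced by the TriggerMesh source, from memory (note that Cloud Storage reports `size` as a string):

```json
{
  "specversion": "1.0",
  "type": "com.google.cloud.storage.notification",
  "source": "my-gcs-bucket",
  "id": "5678-efgh",
  "time": "2023-01-17T10:35:00Z",
  "datacontenttype": "application/json",
  "data": {
    "bucket": "my-gcs-bucket",
    "name": "hello.txt",
    "size": "1024",
    "timeCreated": "2023-01-17T10:35:00.000Z",
    "contentType": "text/plain"
  }
}
```

Here the fields of interest are `name`, `timeCreated`, `size`, and `bucket`.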
Create a unified event stream
Now let’s think about what the unified notification should look like. In the data attribute (the event payload), we’ll store the object name, creation date, size, and bucket name. In terms of context attributes (metadata headers), we’ll create a new event type called io.triggermesh.objectStorage.objectCreated that is common to all cloud providers, we’ll record the original source of the event in the source attribute, and we’ll repeat the object name in the subject attribute. Below is an example of what we expect to produce:
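An illustrative unified event could look like this (all values are hypothetical; this one derives from an S3 object creation, so the source attribute still carries the bucket ARN):

```json
{
  "specversion": "1.0",
  "type": "io.triggermesh.objectStorage.objectCreated",
  "source": "arn:aws:s3:::my-source-bucket",
  "subject": "hello.txt",
  "id": "1234-abcd",
  "time": "2023-01-17T10:30:00Z",
  "datacontenttype": "application/json",
  "data": {
    "name": "hello.txt",
    "date": "2023-01-17T10:30:00.000Z",
    "size": 1024,
    "bucket": "my-source-bucket"
  }
}
```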
To transform the original events into this new format, we’ll create one TriggerMesh JSON transformation for each event source.
AWS S3 event transformation
I’m going to transform the original AWS S3 events to match the new unified schema by using the following TriggerMesh JSON transformation that I'm saving in a file called s3-transform.yaml.
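A sketch of what s3-transform.yaml could contain is below. The `store`/`delete`/`add` operations follow the TriggerMesh JSON transformation spec (an empty `key` under `delete` wipes the whole payload before the `add` operations rebuild it), but the exact JSON paths depend on the S3 event payload you actually receive, so treat this as a starting point rather than a drop-in file:

```yaml
context:
- operation: add
  paths:
  - key: type
    value: io.triggermesh.objectStorage.objectCreated
  - key: subject
    value: $name
data:
- operation: store
  paths:
  - key: $name
    path: Records[0].s3.object.key
  - key: $date
    path: Records[0].eventTime
  - key: $size
    path: Records[0].s3.object.size
  - key: $bucket
    path: Records[0].s3.bucket.name
- operation: delete
  paths:
  - key:
- operation: add
  paths:
  - key: name
    value: $name
  - key: date
    value: $date
  - key: size
    value: $size
  - key: bucket
    value: $bucket
```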
I'm not going into the details of the JSON transformation syntax here (please check out the docs for that), but I will say that it's an easy, low-code, declarative way to add, remove, and move attributes in a JSON object, and it has just the level of expressivity needed to solve the problem at hand. For more complex transformations and processing you can use functions.
To create the transformation component, run the following command that references the transformation specification from the file:
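A sketch of that command; the source component name is a placeholder (tmctl derives it from your broker and source names — check `tmctl describe`), and the flag names should be verified with `tmctl create transformation --help`:

```shell
tmctl create transformation -f s3-transform.yaml --source <name-of-your-s3-source>
```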
Now if I create a new object in the S3 bucket, I can see the new unified event appear in the watch output!
Google Cloud Storage event transformation
Let’s map the Google Cloud Storage event type to this new standard type by using the following TriggerMesh JSON transformation that I'm saving in a file called gcs-transform.yaml:
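A sketch of what gcs-transform.yaml could contain, under the same caveats as the S3 transformation (spec operations from the TriggerMesh JSON transformation docs; paths based on the Cloud Storage notification payload shown earlier):

```yaml
context:
- operation: add
  paths:
  - key: type
    value: io.triggermesh.objectStorage.objectCreated
  - key: subject
    value: $name
data:
- operation: store
  paths:
  - key: $name
    path: name
  - key: $date
    path: timeCreated
  - key: $size
    path: size
  - key: $bucket
    path: bucket
- operation: delete
  paths:
  - key:
- operation: add
  paths:
  - key: name
    value: $name
  - key: date
    value: $date
  - key: size
    value: $size
  - key: bucket
    value: $bucket
```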
To create the transformation component, run the following command that references the transformation specification above from the file:
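Again as a sketch, with the source component name as a placeholder:

```shell
tmctl create transformation -f gcs-transform.yaml --source <name-of-your-gcs-source>
```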
Now if I create a new object in the GCS bucket, I can again see the new unified event appear in the watch output, except this time the source attribute refers to GCS instead of S3:
Route unified events to an event display console
Now developers can implement event-driven applications that react to object storage notifications regardless of their origin. This might be to trigger a function, a service, or simply route the event to a destination like Kafka or Splunk.
As an example, I'm now going to route all these events to a console that displays them in a uniform manner. We’ll use the simple TriggerMesh console for this, creating a target from its container image using tmctl like so:
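A sketch of that command; the console image reference is a placeholder (use the image published by TriggerMesh for its console), and the flag name should be verified with `tmctl create target --help`:

```shell
tmctl create target --from-image <triggermesh-console-image>
```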
I’ll then create a trigger that only routes the new event type to this console:
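A sketch of the trigger, assuming the console target was named triggermesh-target-service (the default derived from a broker named triggermesh):

```shell
tmctl create trigger \
  --eventTypes io.triggermesh.objectStorage.objectCreated \
  --target triggermesh-target-service
```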
Run tmctl describe to pick up the URL of the TriggerMesh console (called triggermesh-target-service by default, if you used the exact commands in this post) and open it in your browser. Create new objects in S3 and GCS, and witness the beauty of your new unified event stream 🙂.
We’ll soon release a fix for a known issue that prevents the JSON transformation component from deleting unwanted context attribute extensions, such as those that might appear in events from the Google Cloud Storage source.
Conclusion and next steps
By creating a cloud-agnostic eventing layer like this, you can abstract away much of the complexity of interacting with cloud services, shifting it from developers to the platform and increasing the portability of your event-driven applications.
In this post, we showed how to do this with the object storage services AWS S3 and Google Cloud Storage, but you could take a similar approach with other event sources. We routed these events to an event display console, but you could also trigger serverless functions or Knative services, or route the events to a Kafka topic, among many other things.
We used the tmctl CLI here, but you can export your configuration as a Kubernetes manifest with tmctl dump if you want to run this on a Kubernetes cluster with TriggerMesh installed.
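For example (assuming kubectl is configured against a cluster where TriggerMesh is already installed):

```shell
tmctl dump > manifest.yaml
kubectl apply -f manifest.yaml
```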
If you want to give this a try yourself, you can start by following the instructions in this post, or head to our quickstart guide. Please join us and reach out on Slack if you need help, our team is very responsive and loves to hear from users.