Multicloud object storage notifications

Jonathan Michaux

Apr 12, 2023

Multicloud allows businesses to avoid vendor lock-in, provides the flexibility to choose different cloud providers for different workloads, and can allow for more resilient disaster recovery implementations. However, organizations also want portability, so that workloads or applications can be moved between cloud environments and aren't strongly coupled to a specific cloud platform vendor. Achieving 100% portability might be neither viable nor desirable, but certain scenarios are emerging as good candidates for multicloud portability. In this post I'll discuss one of them: object storage notifications.

I'm going to dive into an example of how to use TriggerMesh to easily create a cloud-agnostic object storage notification layer that provides a standardized event every time an object is created on either AWS S3 or Google Cloud Storage. The original events from each provider have very different structures and metadata, so I'll transform them to match a simpler event schema by extracting the object name, creation date, bucket name, and object size. Event-driven application developers can then consume these events more easily, regardless of their origin. This results in more portable applications, and shifts complexity from the app developers to the eventing platform.

The post is structured as follows:

  1. create the AWS S3 source
  2. create the Google Cloud Storage source
  3. transform events from both sources to create a unified event stream
  4. deliver unified events to an event display console

Ingest S3 object creation notifications

Make sure tmctl is installed, and then create a new broker: 

tmctl create broker triggermesh

Then you can create an AWS S3 source. The prerequisites here are:

  • an AWS IAM access key to authenticate with S3 (alternatively you can use an IAM role if you’re running TriggerMesh on EKS)
  • an AWS S3 bucket ARN (an example follows below)
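
For reference, S3 bucket ARNs take the form arn:aws:s3:::<bucket-name>; the bucket used later in this post, for example, has the ARN:

arn:aws:s3:::jmcx-dev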

Once you have those, create the AWS S3 source with the following command: 

tmctl create source awss3 --arn <arn> \
                          --auth.credentials.accessKeyID <id> \
                          --auth.credentials.secretAccessKey <secret> \
                          --eventTypes "s3:ObjectCreated:*"

Notice that I'm specifying a filter so that I only get notified about object creations, but you can omit the filter if you want to be informed about all events on this bucket. 

Also note that, under the hood, the TriggerMesh S3 source creates an SQS queue that is used to deliver S3 events from the bucket to TriggerMesh. 

Now run tmctl watch in a separate terminal so that you can see the events landing in the broker. Then add a new object to the S3 bucket. Below is an example of the event received by the broker; the attributes I want to keep for the new unified schema are the object key, the event time, the bucket name, and the object size.

☁️  cloudevents.Event
Context Attributes,
  specversion: 1.0
  type: com.amazon.s3.objectcreated
  source: arn:aws:s3:::jmcx-dev
  subject: MonaLisa-daVinci.jpeg
  id: 3ff80a6b-29e4-47fd-86b9-c0580cdb20e6
  time: 2023-04-03T20:17:29.570420299Z
  datacontenttype: application/json
Data,
  {
    "awsRegion": "us-east-1",
    "eventName": "ObjectCreated:Put",
    "eventSource": "aws:s3",
    "eventTime": "2023-04-03T20:17:28.513Z",
    "eventVersion": "2.1",
    "requestParameters": {
      "sourceIPAddress": "88.170.205.78"
    },
    "responseElements": {
      "x-amz-id-2": "VMlFKbaL0zz0L3fO6gg8GVfPxhw4h68qXXVntIB9rXblQ4X+epMSO9Bfee+TMkeVVlO0Mr7I6A9wqZf4wgs5ESzLMjqZda6H",
      "x-amz-request-id": "BTY94PVFE626AH9G"
    },
    "s3": {
      "bucket": {
        "arn": "arn:aws:s3:::jmcx-dev",
        "name": "jmcx-dev",
        "ownerIdentity": {
          "principalId": "A2MWJ7PRW7U41K"

		}
      },
      "configurationId": "io.triggermesh.awss3sources.local.triggermesh-awss3source",
      "object": {
        "eTag": "a8d4fb3a57d904303dbc6ee2ae11efc3",
        "key": "MonaLisa-daVinci.jpeg",
        "sequencer": "00642B345862164155",
        "size": 61790
      },
      "s3SchemaVersion": "1.0"
    },
    "userIdentity": {
      "principalId": "AWS:AIDAQUHRFMYW5IL6X72NI"
    }
  }

Ingest Google Cloud Storage object creation notifications

To use the Google Cloud Storage source, start by following the steps below to configure the necessary resources and permissions. This is a simplified but overly permissive approach; there are ways to be more restrictive. (A sketch of equivalent gcloud commands follows the list.)

  • In IAM & Admin, create a service account for TriggerMesh and give it the Pub/Sub Editor and Storage Admin roles on the project
  • Create a key for this service account and save it in JSON format in a file called serviceaccountkey.json
  • Create a Google Cloud Storage bucket
  • Copy the email address of the Cloud Storage service account for this project from the bucket settings page, then open IAM and give it the Pub/Sub Publisher role on the project
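
If you prefer the command line, here is a rough sketch of the same setup using gcloud and gsutil. The project ID (my-project) and service account name (triggermesh-sa) are placeholders of mine, and the Cloud Storage service agent address follows the service-<project-number>@gs-project-accounts.iam.gserviceaccount.com pattern, so substitute your own values:

gcloud iam service-accounts create triggermesh-sa --project my-project

gcloud projects add-iam-policy-binding my-project \
    --member "serviceAccount:triggermesh-sa@my-project.iam.gserviceaccount.com" \
    --role roles/pubsub.editor

gcloud projects add-iam-policy-binding my-project \
    --member "serviceAccount:triggermesh-sa@my-project.iam.gserviceaccount.com" \
    --role roles/storage.admin

gcloud iam service-accounts keys create serviceaccountkey.json \
    --iam-account triggermesh-sa@my-project.iam.gserviceaccount.com

gsutil mb -p my-project gs://<bucket name>

gcloud projects add-iam-policy-binding my-project \
    --member "serviceAccount:service-<project-number>@gs-project-accounts.iam.gserviceaccount.com" \
    --role roles/pubsub.publisher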

Now that we’re set, we can create the TriggerMesh Google Cloud Storage source with the following command:

tmctl create source googlecloudstorage --bucket <bucket name> \
                                       --pubsub.project <project name> \
                                       --serviceAccountKey $(cat serviceaccountkey.json) \
                                       --eventTypes OBJECT_FINALIZE

As with the AWS S3 source, we're filtering for object creation events only; the filter is again optional. 

Under the hood, the TriggerMesh component will create and configure a Pub/Sub topic to be used to transport bucket notifications into TriggerMesh. 

Upload an object to the bucket and you should see the event land in the broker. In the example below, the fields we want for our new unified event schema are again the object name, bucket name, creation time, and size:

☁️  cloudevents.Event
Context Attributes,
  specversion: 1.0
  type: com.google.cloud.storage.notification
  source: gs://jmcx-test
  id: 7282987719349767
  time: 2023-04-04T07:43:39.862Z
  datacontenttype: application/json
Extensions,
  pubsubmsgbucketid: jmcx-test
  pubsubmsgeventtime: 2023-04-04T07:43:39.787266Z
  pubsubmsgeventtype: OBJECT_FINALIZE
  pubsubmsgnotificationconfig: projects/_/buckets/jmcx-test/notificationConfigs/1
  pubsubmsgobjectgeneration: 1680594219774907
  pubsubmsgobjectid: StarryNight-vanGogh.jpeg
  pubsubmsgpayloadformat: JSON_API_V1
Data,
  {
    "Data": {
      "kind": "storage#object",
      "id": "jmcx-test/StarryNight-vanGogh.jpeg/1680594219774907",
      "selfLink": "https://www.googleapis.com/storage/v1/b/jmcx-test/o/StarryNight-vanGogh.jpeg",
      "name": "StarryNight-vanGogh.jpeg",
      "bucket": "jmcx-test",
      "generation": "1680594219774907",
      "metageneration": "1",
      "contentType": "image/jpeg",
      "timeCreated": "2023-04-04T07:43:39.787Z",
      "updated": "2023-04-04T07:43:39.787Z",
      "storageClass": "STANDARD",
      "timeStorageClassUpdated": "2023-04-04T07:43:39.787Z",
      "size": "61790",
      "md5Hash": "qNT7OlfZBDA9vG7irhHvww==",
      "mediaLink": "https://storage.googleapis.com/download/storage/v1/b/jmcx-test/o/StarryNight-vanGogh.jpeg?generation=1680594219774907\u0026alt=media",
      "crc32c": "/b6VBw==",
      "etag": "CLvn7Kvdj/4CEAE="
    },
    "ID": "7282987719349767",
    "Attributes": {
      "bucketId": "jmcx-test",
      "eventTime": "2023-04-04T07:43:39.787266Z",
      "eventType": "OBJECT_FINALIZE",
      "notificationConfig": "projects/_/buckets/jmcx-test/notificationConfigs/1",
      "objectGeneration": "1680594219774907",
      "objectId": "StarryNight-vanGogh.jpeg",
      "payloadFormat": "JSON_API_V1"
    },
    "PublishTime": "2023-04-04T07:43:39.862Z",
    "DeliveryAttempt": null,
    "OrderingKey": ""
  }

Create a unified event stream

Now let's define what the unified notification should look like. In the data attribute (the event payload), we'll store the object name, creation date, size, and bucket name. In terms of context attributes (metadata headers), we'll create a new event type called io.triggermesh.objectStorage.objectCreated that is common to all cloud providers, we'll reference the original source of the event in the source attribute, and we'll repeat the object name in the subject attribute. Below is an example of what I'm expecting to produce: 

☁️  cloudevents.Event
Context Attributes,
  specversion: 1.0
  type: io.triggermesh.objectStorage.objectCreated
  source: googlecloudstorage:jmcx-test
  subject: MonaLisa.jpeg
  id: 7282987719349767
  time: 2023-04-04T07:43:39.862Z
  datacontenttype: application/json
Data,
  {
     "objectName": "MonaLisa.jpeg",
     "bucketName": "jmcx-test",
     "timeCreated": "2023-04-04T07:43:39.787Z",
     "size": "61790"
  }
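
Developers can then write handlers against this single schema. As an illustration, here is a minimal sketch of a consumer of these unified events; the Python CloudEvents SDK and Flask are my choices for the sketch, not part of the TriggerMesh setup, and any HTTP service that can parse CloudEvents could serve as a trigger target in the same way:

from cloudevents.http import from_http
from flask import Flask, request

app = Flask(__name__)

@app.route("/", methods=["POST"])
def receive():
    # Parse the incoming HTTP request into a CloudEvent.
    event = from_http(request.headers, request.get_data())

    # One code path handles notifications from S3 and GCS alike.
    if event["type"] == "io.triggermesh.objectStorage.objectCreated":
        data = event.data
        print(f"{data['objectName']} ({data['size']} bytes) created in "
              f"{data['bucketName']} at {data['timeCreated']}")
    return "", 204

if __name__ == "__main__":
    app.run(port=8080)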

To transform the original events into this new format, we’ll create one TriggerMesh JSON transformation for each event source. 

AWS S3 event transformation

I’m going to transform the original AWS S3 events to match the new unified schema by using the following TriggerMesh JSON transformation that I'm saving in a file called s3-transform.yaml.

I'm not going into the details of the JSON transformation syntax here (please check out the docs for that), but I will say that it's an easy, low-code, declarative way to add, remove, and move attributes in a JSON object, and it has just the level of expressivity needed to solve the problem at hand. For more complex transformations and processing you can use functions.

context:
- operation: add
  paths:
  - key: type
    value: io.triggermesh.objectStorage.objectCreated
  - key: source
    value: awsS3:$bucketName
  - key: subject
    value: $objectName
data:
- operation: store
  paths:
  - key: $objectName
    value: s3.object.key
  - key: $bucketName
    value: s3.bucket.name
  - key: $timeCreated
    value: eventTime
  - key: $size
    value: s3.object.size
- operation: delete
  paths:
  - key:
- operation: add
  paths:
  - key: objectName
    value: $objectName
  - key: bucketName
    value: $bucketName
  - key: timeCreated
    value: $timeCreated
  - key: size
    value: $size
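
In short (see the docs for the full semantics): the store operations stash values from the incoming payload into variables like $objectName, the delete operation with an empty key then wipes the payload clean, and the final add operation rebuilds it with only the four unified fields. The same pattern is reused for the Google Cloud Storage transformation below.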

To create the transformation component, run the following command, which references the transformation specification from the file: 

tmctl create transformation --name s3-transform \
                            -f s3-transform.yaml \
                            --source triggermesh-awss3source \

Now if I create a new object in the S3 bucket, I can see the new unified event appear in the watch output!

☁️  cloudevents.Event
Context Attributes,
  specversion: 1.0
  type: io.triggermesh.objectStorage.objectCreated
  source: awss3:jmcx-dev
  subject: MonaLisa.jpeg
  id: 0f4b1488-db38-4226-b8cd-7e32d7145efa
  time: 2023-04-04T08:59:05.612830967Z
  datacontenttype: application/json
Data,
  {
    "bucketName": "jmcx-dev",
    "objectName": "MonaLisa.jpeg",
    "size": 853892,
    "timeCreated": "2023-04-04T08:59:04.204Z"
  }

Google Cloud Storage event transformation 

Let’s map the Google Cloud Storage event type to this new standard type by using the following TriggerMesh JSON transformation that I'm saving in a file called gcs-transform.yaml:

context:
- operation: add
  paths:
  - key: type
    value: io.triggermesh.objectStorage.objectCreated
  - key: source
    value: googlecloudstorage:$bucketName
  - key: subject
    value: $objectName
- operation: delete
  paths:
  - key: Extensions
data:
- operation: store
  paths:
  - key: $objectName
    value: Data.name
  - key: $bucketName
    value: Data.bucket
  - key: $timeCreated
    value: Data.timeCreated
  - key: $size
    value: Data.size
- operation: delete
  paths:
  - key:
- operation: add
  paths:
  - key: objectName
    value: $objectName
  - key: bucketName
    value: $bucketName
  - key: timeCreated
    value: $timeCreated
  - key: size
    value: $size

To create the transformation component, run the following command, which references the transformation specification from the file: 

tmctl create transformation --name gcs-transform \
                            -f gcs-transform.yaml \
                            --source triggermesh-googlecloudstoragesource

Now if I create a new object in the GCS bucket, I can again see the new unified event appear in the watch output, except this time the source attribute refers to GCS instead of S3:

☁️  cloudevents.Event
Context Attributes,
  specversion: 1.0
  type: io.triggermesh.objectStorage.objectCreated
  source: googlecloudstorage:jmcx-test
  subject: MonaLisa.jpeg
  id: 7283388627018347
  time: 2023-04-04T08:52:01.487Z
  datacontenttype: application/json
Data,
  {
    "bucketName": "jmcx-test",
    "objectName": "MonaLisa.jpeg",
    "size": "853892",
    "timeCreated": "2023-04-04T08:52:01.448Z"
  }

Route unified events to an event display console

Now developers can implement event-driven applications that react to object storage notifications regardless of their origin. They might trigger a function or a service, or simply route the events to a destination like Kafka or Splunk.

As an example, I'm now going to route all of these events to a console that displays them in a uniform manner. We'll use the simple TriggerMesh console for this, creating a target from its container image with tmctl like so: 

tmctl create target --from-image gcr.io/triggermesh/triggermesh-console:v0.0.1

I’ll then create a trigger that only routes the new event type to this console:

tmctl create trigger --target triggermesh-target-service \
                     --eventTypes io.triggermesh.objectStorage.objectCreated 

Run tmctl describe to pick up the URL of the TriggerMesh console (called triggermesh-target-service by default, if you used the exact commands in this post) and open it in your browser. Create new objects in S3 and GCS, and witness the beauty of your new unified event stream 🙂.

Known issue

We'll soon release a fix for a known issue that prevents the JSON transformation component from deleting unwanted context attribute extensions, like those that appear in events from the Google Cloud Storage source.

Conclusion and next steps

By creating a cloud-agnostic eventing layer like this, you can abstract away a lot of the complexity of interacting with cloud services, shifting it from developers to the platform and increasing the portability of your event-driven applications.

In this post, we showed how to do this with the object storage services AWS S3 and Google Cloud Storage, but you could take a similar approach with other event sources. We routed the unified events to an event display console, but you could also trigger serverless functions or Knative services, or route the events to a Kafka topic, among many other things. 
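
For example, routing the unified events to Kafka would only take a new target and trigger. The sketch below is an assumption on my part rather than a tested recipe: the topic and bootstrapServers parameters come from the TriggerMesh KafkaTarget spec, so check the docs for the exact tmctl flags:

tmctl create target kafka --name kafka-target \
                          --topic object-notifications \
                          --bootstrapServers <server:port>

tmctl create trigger --target kafka-target \
                     --eventTypes io.triggermesh.objectStorage.objectCreated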

We used the tmctl CLI here, but you can export your configuration as a Kubernetes manifest by running tmctl dump if you want to run this on a Kubernetes cluster with TriggerMesh installed.
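
For instance, something like the following should work, assuming TriggerMesh is already installed on the cluster and that redirecting the dump output to a file is all that's needed:

tmctl dump > manifest.yaml
kubectl apply -f manifest.yaml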

If you want to give this a try yourself, you can start by following the instructions in this post, or head to our quickstart guide. Please join us and reach out on Slack if you need help; our team is very responsive and loves to hear from users. 
