This is a guest post by Isaac Dominguez, SRE at ManoMano. Many thanks to Isaac and the whole ManoMano tech team.
ManoMano is Europe's largest marketplace for products and services in the DIY, gardening and home improvement sector. The technology behind our digital platform is what lets us offer you the right products, services and advice every step of the way, which is why we invest in the best technologies every day. It is also why we decided to build every brick in-house and make ManoMano a European tech leader.
Increasing productivity for test engineers
The System Reliability Engineering (SRE) function at ManoMano is core to the way we are able to innovate at a pace that lets the company sustain its current growth trajectory.
One example that we'll discuss in this post is a ManoMano development team focused on benchmarking and performance testing the ManoMano stack, which includes the web and mobile experiences as well as backend systems. The SRE team is tasked with giving those test developers a serverless eventing experience that lets them run code when specific AWS events occur. For example, benchmarking tests write their results to Amazon Simple Storage Service (AWS S3), and test developers want to process those result files, with code written for runtimes such as Node.js, as soon as the files become available.
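To make the use case concrete: when a benchmark writes a result file, S3 emits an event notification whose `Records` list describes the new object. The sketch below shows how a test developer's code might pick out newly created files from such a notification. It assumes the standard S3 event notification JSON layout; the bucket name `bench-results` in the usage example is hypothetical.

```python
from urllib.parse import unquote_plus
from typing import List, Tuple


def created_objects(notification: dict) -> List[Tuple[str, str]]:
    """Return (bucket, key) pairs for object-creation records in an S3 event notification."""
    results = []
    for record in notification.get("Records", []):
        # eventName looks like "ObjectCreated:Put"; skip deletions and other event kinds
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # Object keys arrive URL-encoded in notifications (a space becomes '+')
        key = unquote_plus(s3["object"]["key"])
        results.append((bucket, key))
    return results
```

For a notification containing a `ObjectCreated:Put` record for the key `run+42/summary.json` in `bench-results`, this returns `[("bench-results", "run 42/summary.json")]`, ready to be downloaded and processed.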
Minimizing lock-in and optimizing resource usage
ManoMano runs on AWS, but we minimize vendor lock-in by choosing not to use FaaS solutions like AWS Lambda. We also want a unified way to handle CI/CD workflows and their related security concerns. Using containers for all applications allows us to achieve this, rather than spreading workflows and security practices across different compute abstractions like containers, FaaS, and EC2.
Beyond the testing use case described above, ManoMano has hundreds of microservices that address a variety of business needs, and they all run on AWS EKS. Many of these services are long-running and always up, but the question is: do they really need to be? Does their usage justify the continuous consumption of compute resources and the associated costs?
When we drilled down, we noticed that many of our long-running services only process a few requests per day. Others spend most of their time idling while waiting for an event to occur on another system. We wanted to change this so that services run only when needed.
An event-driven platform that checks all the boxes
The SRE team is creating an event-driven architecture that gathers events from across AWS services, ingests them into a centralized broker, and lets developers subscribe to specific events and consume them from their code. Developers are not weighed down by infrastructure and security concerns, which the platform abstracts away. Knative scale-to-zero is an important part of the solution: it lets workloads adapt their resource consumption to their load, down to running no pods at all when the load is zero. This can yield significant savings given the number of teams that deploy workloads that don't need to run 24/7.
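The idea behind scale-to-zero can be illustrated with a toy model: size the deployment to the observed concurrency, and once traffic has stopped for a grace period, remove the last pod. This is a simplified sketch for intuition only, not Knative's actual autoscaling algorithm (which averages metrics over stable and panic windows); the `ScalingPolicy` fields and their default values are illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    # Illustrative knobs, loosely named after concepts in Knative's autoscaler
    target_concurrency: float = 100.0  # in-flight requests one pod should handle
    min_scale: int = 0                 # 0 permits scale-to-zero
    max_scale: int = 10
    idle_grace_seconds: float = 30.0   # how long to keep one pod after traffic stops


def desired_replicas(policy: ScalingPolicy, concurrency: float,
                     seconds_since_last_request: float, current_replicas: int) -> int:
    """Toy scaling decision: pods proportional to load, zero when idle long enough."""
    if concurrency > 0:
        wanted = math.ceil(concurrency / policy.target_concurrency)
    elif current_replicas > 0 and seconds_since_last_request < policy.idle_grace_seconds:
        wanted = 1  # traffic just stopped: keep one warm pod for a while
    else:
        wanted = 0  # idle past the grace period: release all compute
    return max(policy.min_scale, min(policy.max_scale, wanted))
```

With the defaults, 250 concurrent requests yield 3 pods, while a service that has seen no request for two minutes drops to 0. Setting `min_scale=1` would instead keep one pod running permanently, which is the always-up behavior the platform is trying to avoid.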
ManoMano's SRE team centralizes AWS S3 events into a common Amazon Simple Queue Service (SQS) queue so that there is a single point from which they can be ingested into the Kubernetes cluster (EKS). The TriggerMesh SQS source consumes that queue and pushes the events into a series of Knative Brokers, which are organized and routed by tenant (application domain).
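When S3 publishes notifications to SQS, each message body is the S3 event JSON as a string. The post does not say how the tenant for a given event is decided, so the sketch below assumes a hypothetical bucket-to-tenant table and a hypothetical `<tenant>-broker` naming scheme. It also skips the `s3:TestEvent` message S3 sends when a notification is first configured, which carries no `Records`.

```python
import json
from typing import Dict, Optional

# Hypothetical mapping from bucket to tenant (application domain); names are illustrative
TENANT_BY_BUCKET: Dict[str, str] = {
    "bench-results": "performance",
    "catalog-imports": "catalog",
}


def broker_for(sqs_body: str, tenants: Optional[Dict[str, str]] = None) -> Optional[str]:
    """Pick the tenant broker for one SQS message that carries an S3 notification."""
    tenants = TENANT_BY_BUCKET if tenants is None else tenants
    payload = json.loads(sqs_body)
    records = payload.get("Records")
    if not records:
        # e.g. the s3:TestEvent message, which has no Records
        return None
    # Assumes all records in one notification come from the same bucket
    bucket = records[0]["s3"]["bucket"]["name"]
    tenant = tenants.get(bucket)
    # Hypothetical naming convention for per-tenant Knative Brokers
    return f"{tenant}-broker" if tenant else None
```

Returning `None` for unknown buckets keeps unrouted events out of every tenant's broker; a real platform might send them to a dead-letter destination instead.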
Developers are given a Helm chart template in which they can customize a few key parameters for Knative Serving and Eventing Triggers, such as the event types they want to consume, and supply their function code. Currently, AWS S3 events are the only ones supported, as part of this initial use case.
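The "event types they want to consume" parameter ends up in a Knative Trigger filter. Knative Triggers filter by exact match on CloudEvents attributes: an event is delivered to the subscriber only if every attribute listed in the filter is present with the same value, and a Trigger with no filter receives every event from its Broker. The sketch below models that matching rule in plain Python; the `Trigger` class, subscriber URLs and event type strings are hypothetical stand-ins, not the Knative API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Trigger:
    name: str
    subscriber: str  # e.g. the URL of the developer's Knative Service
    filter_attributes: Dict[str, str] = field(default_factory=dict)

    def matches(self, event_attributes: Dict[str, str]) -> bool:
        # Exact match on every filtered attribute; an empty filter matches all events
        return all(event_attributes.get(k) == v for k, v in self.filter_attributes.items())


def deliveries(triggers: List[Trigger], event_attributes: Dict[str, str]) -> List[str]:
    """Subscribers that would receive this event from the broker."""
    return [t.subscriber for t in triggers if t.matches(event_attributes)]
```

For instance, a Trigger filtering on a hypothetical `type` of `example.s3.objectcreated` and `source` of `bench-results` only forwards those S3 events, while an unfiltered audit Trigger on the same Broker sees everything.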
The CI/CD platform handles the rest:
With this new event-driven architecture, we no longer need to have idle long running tasks that process occasional AWS S3 objects. Instead, containers are scheduled on demand to run on EKS in reaction to AWS S3 events.
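On the receiving end, each container started on demand is a Knative Service that gets events as HTTP POST requests. The sketch below is a standard-library receiver, assuming the Broker delivers CloudEvents in binary content mode (attributes in `ce-*` headers, payload in the body) and that the container listens on the port Knative provides in the `PORT` environment variable. `parse_binary_cloudevent`, `EventHandler` and `serve` are hypothetical names; production code would more likely use the CloudEvents SDK for its language.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional

REQUIRED = ("id", "source", "type", "specversion")


def parse_binary_cloudevent(headers, body: bytes) -> dict:
    """Read a binary-mode CloudEvent: attributes in ce-* headers, data in the body."""
    lowered = {k.lower(): v for k, v in headers.items()}
    attrs = {k[3:]: v for k, v in lowered.items() if k.startswith("ce-")}
    missing = [a for a in REQUIRED if a not in attrs]
    if missing:
        raise ValueError(f"missing CloudEvent attributes: {missing}")
    is_json = lowered.get("content-type", "").startswith("application/json")
    attrs["data"] = json.loads(body) if is_json and body else body
    return attrs


class EventHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        try:
            event = parse_binary_cloudevent(self.headers, body)
        except ValueError:
            self.send_response(400)  # not a valid CloudEvent
            self.end_headers()
            return
        # The developer's function code would process the event here
        print(f"processing {event['type']} from {event['source']}")
        self.send_response(202)
        self.end_headers()


def serve(port: Optional[int] = None) -> None:
    # Knative Serving passes the port to listen on via the PORT environment variable
    port = port or int(os.environ.get("PORT", "8080"))
    HTTPServer(("", port), EventHandler).serve_forever()
```

Calling `serve()` as the container entrypoint is enough; when no events arrive, Knative can scale the Service to zero, and the next event brings a pod back.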
Less CO2, and a better developer experience
At ManoMano, we strive to keep our IT expenses and CO2 footprint to a minimum.
The combination of Knative and TriggerMesh is scratching ManoMano’s itch to reduce the number of idling containers that waste valuable compute resources, while also providing developers with an easy way to deploy their code. Developers can now build serverless event-driven applications that only run when needed, in reaction to events that are captured and routed in real time.