Yellow.ai and Signadot: Pioneering Parallel Feature Development for Faster Releases

When you’ve got over 100 developers and a single testing environment, you’re bound to run into conflicts. At Yellow.ai, conflicts and failed deployments to QA were adding significantly to the time it took to ship code. With Signadot’s model of request isolation, developers could run highly accurate tests and explore changes on a shared cluster without impacting each other’s work. The result was a 2-3x improvement in developer productivity.

Yellow.ai is a conversational AI platform, enabling enterprises to unlock business potential at scale. The platform is trusted across 85+ countries by 1000+ enterprises, including Domino’s, Sephora, Hyundai, MG Motors, and Biogen International. This is the story of how they came to adopt Signadot.


The problem: many developers sharing Staging

In conversation with Lokeshwaran, a senior Platform Engineer at Yellow.ai, we learned about the state of staging and integration testing at Yellow.ai before Signadot:

“We had a single staging environment which over 100 devs used to push code for testing, a lot of unstable code would get pushed, and it created a lot of friction while testing new features. It was very time consuming and annoying!” Lokeshwaran · Senior Platform Engineer, Yellow.ai

One thing we don’t discuss enough is how a failed push to Staging adds stress for the developer, especially when that environment is shared. Picture our product engineer: she’s just put in the time to develop a feature, write unit tests, and make sure it passes whatever contract testing is in place. She submits her PR. Then, a while later, after she has started a new task, she gets a report that her branch broke Staging. She may have little or no indication of how Staging is broken. With incomplete information, an out-of-band interruption, and a time crunch, fixing even a simple bug becomes a lot more stressful.

How did this affect Yellow.ai’s business? By hurting overall velocity: “These conflicts and failed merges meant the time taken to push the code from QA to production was significantly longer than expected, leading to much less developer productivity and an increase in overall feature development time.” Lokeshwaran · Senior Platform Engineer, Yellow.ai


One attempted fix: per-developer clusters

While the problem of conflicts over Staging is nearly universal in large microservice architectures, the possible solutions are quite varied. One that Lokeshwaran tried was creating new virtual clusters for each developer: “We were trying out having virtual clusters per developer, but maintaining services and database schemas in sync was the biggest challenge.”

This highlights the biggest problem with a per-developer cluster solution at large scale: keeping all those clusters updated correctly. Another concern is cost: “We were looking at building a virtual cluster using vcluster per developer, but it ended up costing quite a lot.”

If you only run these clusters when needed, your developers will be waiting for them to start up at least once a day. If they run continuously, infrastructure costs become an issue, which is especially galling since 99% of the time these clusters sit completely idle!


Choosing Signadot for a shared cluster with developer isolation

Signadot’s request isolation means developers can work on branches with modified versions of one or more services while still sending requests to the other services in a shared cluster. Other developers can use and test on the same cluster without impacting each other. This lets developers test on a shared cluster earlier, which makes it an effective way to ‘shift left’ integration testing.
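The core of request isolation is a routing decision: a request tagged with a sandbox’s routing key goes to that sandbox’s forked service if one exists, and everything else goes to the shared baseline. The sketch below illustrates that idea only; it is not Signadot’s implementation, and the `Router` class, service addresses, and routing-key values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Router:
    """Hypothetical per-service router, e.g. what a sidecar in front of 'cart' might do."""

    baseline: str  # address of the shared, unmodified service
    sandboxes: Dict[str, str] = field(default_factory=dict)  # routing key -> forked service

    def resolve(self, routing_key: Optional[str]) -> str:
        # Untagged requests always use the shared baseline.
        if routing_key is None:
            return self.baseline
        # Tagged requests use the fork only if that sandbox actually forked this
        # service; otherwise they fall through to the baseline like any other request.
        return self.sandboxes.get(routing_key, self.baseline)


router = Router("cart-baseline:8080", {"alice-feature": "cart-alice:8080"})
print(router.resolve("alice-feature"))  # Alice's sandbox forked 'cart'
print(router.resolve("bob-feature"))    # Bob's sandbox forked some other service
```

Because unknown keys fall back to the baseline, each sandbox only needs to deploy the services it changes, which is what lets many developers share one cluster.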

“[Signadot is] cost effective because of time-to-live (TTL) of sandboxes and we can deploy only the application that is required to be tested as a new deployment, while all the other microservices need not be affected.” Lokeshwaran · Senior Platform Engineer, Yellow.ai

Signadot makes it easy to get started

As Lokeshwaran explained, “We found Signadot promising because it is very simple to use and adopt. Setup and implementation was quite straightforward. We had to install the Signadot operator, and for the requests to be routed to a sandbox, we would need to add an annotation to that deployment, which would in turn add a sidecar to route the request to the appropriate service.”

In order for request-based isolation to work, there needs to be a system in place for context propagation. For most users, OpenTelemetry is the best project for adding context propagation, and Lokeshwaran at Yellow.ai was no exception: “Context propagation was a prerequisite for Signadot to work, but this was no problem since we had adopted OpenTelemetry recently. Signadot worked very well for us without any hassle for all the HTTP use cases which covers more than 90% of the platform currently.”
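Context propagation here means every service copies the routing key from the request it received onto every request it makes downstream. OpenTelemetry’s propagators handle this for you; the stdlib-only sketch below just shows what that amounts to using the W3C `baggage` header. The baggage key name `sd-routing-key` and the helper function names are assumptions for illustration, not taken from Yellow.ai’s setup.

```python
from typing import Dict, Mapping, Optional
from urllib.parse import quote, unquote

ROUTING_KEY = "sd-routing-key"  # assumed baggage entry name carrying the routing key


def parse_baggage(header: str) -> Dict[str, str]:
    """Parse a W3C baggage header ('k1=v1,k2=v2;prop') into a dict, ignoring properties."""
    entries: Dict[str, str] = {}
    for member in header.split(","):
        member = member.strip()
        if not member:
            continue
        kv = member.split(";", 1)[0]  # drop any ';property' metadata
        if "=" not in kv:
            continue
        key, value = kv.split("=", 1)
        entries[key.strip()] = unquote(value.strip())
    return entries


def serialize_baggage(entries: Mapping[str, str]) -> str:
    # Values are percent-encoded, as the baggage format requires.
    return ",".join(f"{k}={quote(v, safe='')}" for k, v in entries.items())


def routing_key_from_headers(headers: Mapping[str, str]) -> Optional[str]:
    # HTTP header names are case-insensitive, so normalize before the lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    return parse_baggage(lowered.get("baggage", "")).get(ROUTING_KEY)


def inject_routing_key(headers: Dict[str, str], routing_key: str) -> Dict[str, str]:
    """Return a copy of outgoing headers with the routing key merged into 'baggage'."""
    entries = parse_baggage(headers.get("baggage", ""))  # keep existing entries
    entries[ROUTING_KEY] = routing_key
    out = dict(headers)
    out["baggage"] = serialize_baggage(entries)
    return out
```

A handler would call `routing_key_from_headers` on the incoming request and `inject_routing_key` on each outgoing one; if any hop skips that step, downstream services lose the key and the request silently falls back to the baseline.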


Non-HTTP requests, and message isolation with Kafka and RabbitMQ queues

If your Test/Staging cluster includes a message queue like Kafka or RabbitMQ, there are extra requirements: context has to be propagated through the events passing through the queue, not just through direct requests. Supporting these non-HTTP, non-gRPC paths took only slightly more work:

As Lokeshwaran described, “Only HTTP and gRPC requests are supported since context propagation is supported from these protocols only. We had to write monkey patches of client libraries for Kafka and RabbitMQ to push the routing key in the message header and some logic from the consumer side to route it into sandbox/baseline.”
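The approach Lokeshwaran describes has two halves: the producer stamps the routing key into message headers, and each consumer decides whether a message is meant for it. Below is a minimal, library-agnostic sketch of that logic. It does not reproduce Yellow.ai’s monkey patches; the `Message` type, the `sd-routing-key` header name, and the helper names are hypothetical. Headers are modeled as `(str, bytes)` pairs, similar to how Kafka client libraries expose them.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional, Tuple

ROUTING_HEADER = "sd-routing-key"  # hypothetical message header name


@dataclass
class Message:
    value: bytes
    headers: List[Tuple[str, bytes]] = field(default_factory=list)


def with_routing_key(msg: Message, routing_key: Optional[str]) -> Message:
    """Producer side: copy the current request's routing key into the message headers."""
    if routing_key is None:
        return msg  # baseline traffic stays untagged
    headers = [h for h in msg.headers if h[0] != ROUTING_HEADER]  # avoid duplicates
    headers.append((ROUTING_HEADER, routing_key.encode()))
    return Message(msg.value, headers)


def routing_key_of(msg: Message) -> Optional[str]:
    for name, value in msg.headers:
        if name == ROUTING_HEADER:
            return value.decode()
    return None


def should_process(msg: Message, own_key: Optional[str],
                   active_sandbox_keys: FrozenSet[str]) -> bool:
    """Consumer side: decide whether this consumer (sandbox or baseline) handles msg."""
    key = routing_key_of(msg)
    if own_key is not None:
        # A sandboxed consumer handles only messages tagged for its own sandbox.
        return key == own_key
    # The baseline handles untagged messages and any whose sandbox has no fork
    # of this consumer, so those messages are not dropped.
    return key is None or key not in active_sandbox_keys
```

This scheme assumes the baseline and each sandboxed consumer read from the topic or queue independently (for example, in separate consumer groups), so every message reaches both and the filter alone decides who processes it.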


Results: for Yellow.ai, a 2-3x improvement in developer velocity

2–3x improvement in developer productivity. Sandboxes can’t be polluted by multiple people, testing is seamless, and developers report a significant “wow factor” when they first experience isolated testing on a shared cluster.

Adoption of the Signadot workflow, creating sandboxes in a shared cluster, has been a success at Yellow.ai:

“Signadot has definitely improved developer productivity by 2-3x, since sandboxes can’t be polluted by multiple people, and testing is seamless!” Lokeshwaran · Senior Platform Engineer, Yellow.ai

Local sandboxes, where developers run forked versions of services on their own machines, are seeing good use as well. As Lokeshwaran puts it: “Local sandboxes are a great feature, it improves local testing without devs having to run other services/api-gateways in their local.”

If you’d like to learn how Signadot can help you, or you want to hear others’ experiences, join our Slack community to find out more.
