Apps that kill themselves when they become obsolete

Oct 16, 2023

When it comes to putting a new version of an application into production, many teams still rely on a slew of manual actions.

With a good old-fashioned production deployment plan, someone acts as pilot for the version upgrade, stepping in to push the new components one by one. This can take an hour, a day or more, and the duration varies widely when problems arise. And we generally update the entire platform at once, whether it consists of 3 applications or 180.

The mistake.

What a mistake. What a mistake to treat all our applications as a single block. We often do it just to give the platform its famous version number. We redeploy what hasn’t changed, taking on needless risk, all to update a number. We create a monstrous amount of work for nothing.

Today we have the micro-service concept, and nearly everyone has embraced it. So why the hell deploy all these micro-services together? Often because of poor design: they are interdependent. We have gone from an old monolithic design to a distributed monolith. Great deal.

I need to update an application: where do I start?

So how do we simplify all this? By making our micro-services independent, each with its own update cycle. That doesn’t prevent us from synchronizing the production deployment of several components that contribute to a larger feature. We can also ship dormant features: deployed but not activated, and switched on once every component is ready.
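To make the idea of a dormant feature concrete, here is a minimal Python sketch of a feature flag. The `FeatureFlags` class, the `checkout_total_cents` function and the `new_discount` flag are all hypothetical; a real setup would more likely read flags from a shared configuration service so that every component sees the same switch.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureFlags:
    # Hypothetical in-memory flag store: the names of the features switched on.
    enabled: set[str] = field(default_factory=set)

    def is_enabled(self, name: str) -> bool:
        return name in self.enabled

    def enable(self, name: str) -> None:
        self.enabled.add(name)


def checkout_total_cents(prices_cents: list[int], flags: FeatureFlags) -> int:
    total = sum(prices_cents)
    # Dormant code path: deployed to production, but only active once the
    # hypothetical "new_discount" flag is switched on.
    if flags.is_enabled("new_discount"):
        total = total * 9 // 10  # 10% discount, in integer cents
    return total


flags = FeatureFlags()
print(checkout_total_cents([1000, 2000], flags))  # 3000: feature dormant
flags.enable("new_discount")
print(checkout_total_cents([1000, 2000], flags))  # 2700: feature active
```

Both behaviours ship in the same deployed artifact; activating the feature once every component is ready is a flag change, not a new deployment.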

Simplifying saves time in the long run.

By simplifying, we reduce risk, in particular the risk of introducing regressions into components that, in theory, have not changed. We can also drop the global version number that so many release managers obsess over, in favor of per-micro-service versions that change only when it is actually necessary.

Once this is done, production deployments become a non-event: no particular ceremony, few people involved. It’s fast and efficient. If something goes wrong, rolling back is simple. Compatibility between the new version and the rest of the platform is thought out in advance, so the platform as a whole stays consistent as its components evolve.

We’re getting close, aren’t we?

So why wait to go to production? There’s no longer any need to pile up features and throw them all in at the deep end together. If our tests are reliable and well optimized, we can replace one large production deployment of a component with many smaller ones, each carrying less risk. This further reduces both the risk of each update and the complexity of a rollback.

Which brings us to what you came here for.

Monitor your applications well

A quick aside on monitoring. All pre-production and production platforms must be thoroughly monitored. Continuous streams of metrics must feed databases that let us determine whether our components are behaving correctly and whether an update has increased error rates or degraded performance. This is a prerequisite for what follows.
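As an illustration of how those metrics can answer the question "did this update make things worse?", here is a minimal Python sketch comparing a candidate version against the previous one. The `Metrics` fields and both thresholds (+1 percentage point of error rate, +20% p95 latency) are assumptions chosen for the example, not recommended values.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    requests: int
    errors: int
    p95_latency_ms: float

    @property
    def error_rate(self) -> float:
        # Guard against empty windows so a quiet period does not divide by zero.
        return self.errors / self.requests if self.requests else 0.0


def is_degraded(
    baseline: Metrics,
    candidate: Metrics,
    max_error_rate_delta: float = 0.01,  # hypothetical: +1 percentage point
    max_latency_ratio: float = 1.2,      # hypothetical: 20% slower at p95
) -> bool:
    """Return True if the candidate version looks worse than the baseline."""
    more_errors = candidate.error_rate - baseline.error_rate > max_error_rate_delta
    slower = candidate.p95_latency_ms > baseline.p95_latency_ms * max_latency_ratio
    return more_errors or slower


baseline = Metrics(requests=10_000, errors=50, p95_latency_ms=120.0)
candidate = Metrics(requests=10_000, errors=300, p95_latency_ms=118.0)
print(is_degraded(baseline, candidate))  # True: error rate went from 0.5% to 3%
```

A check like this only means something if the baseline is measured the same way, over a comparable window, which is why continuous metric collection comes first.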

Monitoring applications is essential to automating their lifecycle

Now, back to the point.

Automation, the real kind

Our production deployments now happen on the fly, our components are updated by our integration pipeline, our tests are reliable enough to fully assure us that those components are operational, and production is tracked through metrics. So do we still need someone to press the button?

Well, no. That’s the whole point of this simplification. The integration pipeline triggers the production deployment automatically. In other words, the last human action is the developer’s push of their code to the server. The change then goes through several test stages, possibly including runs on large-scale test platforms, and if every light is green, it is deployed to production. Any rollback is itself automatic, and the developer is notified so they can fix their code before resubmitting it.
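The chain described above (test stages, deployment, post-deployment check, automatic rollback, developer notification) could be orchestrated roughly like this. It is a hypothetical Python sketch: `release` and each step passed into it are stand-ins for whatever CI system, deployment tool and metrics check a team actually uses.

```python
from typing import Callable

Stage = Callable[[str], bool]  # takes a version, returns True if the stage passes


def release(
    version: str,
    test_stages: list[Stage],
    deploy: Callable[[str], None],
    healthy_after_deploy: Callable[[str], bool],
    rollback: Callable[[], None],
    notify_developer: Callable[[str], None],
) -> str:
    # Every test stage must be green before production is touched.
    for stage in test_stages:
        if not stage(version):
            notify_developer(f"{version}: tests failed, not deployed")
            return "rejected"

    # No human presses the button: a green pipeline deploys directly.
    deploy(version)

    # Post-deployment check, e.g. comparing production metrics with the
    # previous version; a failure triggers an automatic rollback.
    if not healthy_after_deploy(version):
        rollback()
        notify_developer(f"{version}: degraded in production, rolled back")
        return "rolled_back"
    return "deployed"
```

Passing the steps in as functions keeps the decision logic (when to stop, when to roll back, when to warn the developer) separate from the tools that carry it out.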

So why would an application kill itself? Simply because in today’s environments, with Kubernetes, we often end up with a huge number of instances of the same application, scattered across different servers, different datacenters, even different continents. Kubernetes (and similar orchestrators) has one main instruction: keep the number of these instances at the desired count. So if some instances kill themselves upon detecting that they are no longer running the right version, the system immediately replaces them with up-to-date ones…
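Here is a minimal Python sketch of the self-check an instance might run periodically. `RUNNING_VERSION`, `check_and_exit_if_obsolete` and the way the desired version is fetched are assumptions: the desired version could, for instance, be published by the integration pipeline in a key/value store. The sketch also assumes the orchestrator starts replacement instances on the desired version.

```python
import sys
from typing import Callable

# Hypothetical: the version baked into this instance's image at build time.
RUNNING_VERSION = "1.4.2"


def is_obsolete(running: str, desired: str) -> bool:
    # Any difference counts, so a rollback to an older version also triggers renewal.
    return running != desired


def check_and_exit_if_obsolete(running: str, fetch_desired: Callable[[], str]) -> None:
    desired = fetch_desired()
    if is_obsolete(running, desired):
        print(f"running {running}, desired {desired}: exiting to be replaced")
        # Terminate the process; the orchestrator is expected to start a fresh
        # instance (assumed here to run the desired version) to restore the count.
        sys.exit(0)
```

Comparing with `!=` rather than "older than" is a deliberate choice in this sketch: if the pipeline rolls back by publishing a previous version, running instances renew themselves just the same.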

Kubernetes is mainly there to maintain the number of application instances at all times

Of course, we need to make sure the instances don’t all decide to kill themselves at the same time. To do so, we put a token system in place, where the number of tokens is the maximum number of instances that can be renewed simultaneously. Each instance reserves a token before terminating, and the new instance returns it once it is up. A simple key/value store such as Redis or Consul can act as the guardian of these tokens.
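The token mechanism can be sketched in Python with an in-memory stand-in for the key/value store. `TokenPool` and `maybe_terminate` are hypothetical names; the lock plays the role of the atomic operations a real shared store such as Redis or Consul would have to provide, so that instances on different servers see a single counter.

```python
import threading


class TokenPool:
    """In-memory stand-in for a key/value store holding renewal tokens."""

    def __init__(self, max_concurrent_renewals: int) -> None:
        self._available = max_concurrent_renewals
        # Stands in for the store's atomicity: no two instances take the last token.
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        # Take a token if one is left; never block.
        with self._lock:
            if self._available == 0:
                return False
            self._available -= 1
            return True

    def release(self) -> None:
        # Called by the *new* instance once it is up and running.
        with self._lock:
            self._available += 1

    @property
    def available(self) -> int:
        with self._lock:
            return self._available


def maybe_terminate(pool: TokenPool, obsolete: bool) -> bool:
    """Return True if this instance is obsolete and has reserved a token to exit."""
    # Short-circuit: an up-to-date instance never consumes a token.
    return obsolete and pool.try_acquire()


pool = TokenPool(max_concurrent_renewals=2)
print([maybe_terminate(pool, obsolete=True) for _ in range(4)])  # [True, True, False, False]
```

One point a real implementation would have to handle: if a terminating instance dies and its replacement never comes up, the token is never returned. Giving each reservation an expiry (Redis keys, for example, can carry a TTL) would keep the pool from draining permanently.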

My personal take on the method

I always favor making application instances autonomous. Letting them take the initiative for their own changes gives the system global intelligence without human intervention. It also makes scaling fast and simple. Some will say they lose control over what runs in production. That’s false. Automation built in cooperation with all stakeholders will always be more efficient than repeating a large number of manual actions, cycle after cycle.

Everything described in this article may seem utopian to some, given their day-to-day reality. But there is nothing magical about it, and this level is reachable in most cases. Automation lets you improve iteratively: each new error gets its fix, and with it the assurance that it won’t come back. Each dependency removed between two applications is one less problem for future deployments.

Most of the time, those who say it’s impossible are just making excuses and refusing to see what is right in front of them…

Simplifying allows automating.

Automation brings quality.

Quality frees time to produce features.
