A collection of lessons learned when moving from a centralized enterprise service bus (ESB) to a more fine-grained, cloud native deployment model, sometimes known as agile integration.
Many enterprises perform integration using a large, centrally deployed and administered “enterprise service bus” (ESB), containing 100s, or even 1000s of integrations. This was a necessary pattern in the past due to the way that hardware and operating systems were provisioned and maintained.
However, modern integration runtimes have become much more lightweight and are optimized to run on container orchestration platforms such as Kubernetes. This brings an opportunity to deploy integrations using a more cloud native approach which, amongst many other things, encourages more fine-grained component design. Integrations can instead be deployed in small groups, or even individually.
This is sometimes referred to as agile integration, as documented in 2018 when we were discussing the fate of the ESB pattern, although it really began several years earlier, in 2015, while we were attempting to unpick the relationship between service oriented architecture and microservices. Agile integration knowingly follows in the footsteps of modern applications deployed as discrete “microservice” components, targeting greater development agility, deployment consistency, improved isolation, optimized resource usage and more.
This article collates some of the most important conclusions we’ve come to over the intervening years. We will explore various aspects of this approach, and their associated benefits.
Breaking up an enterprise service bus is easier than refactoring an application to microservices
Microservices really grew up in the application space as an approach to improve agility by moving away from large monoliths of code. Applications are typically a complex mass of interdependent pieces and breaking them up into microservice components can be very challenging. As a result, whilst many new applications are built using microservice principles, projects refactoring existing applications are less commonly successful and are very hard to scope.
Integrations, on the other hand, will typically have been built in response to reasonably discrete requirements, to connect one or more systems together, or to expose specific data over an API. As such, splitting up an enterprise service bus into individual integrations typically requires considerably less redesign and refactoring than breaking up a large application. It is important to point out that this doesn’t necessarily mean you should always go to the lowest level of granularity and deploy one integration per container: if sets of integrations are strongly associated to one another, it may be more efficient, and indeed less complex, to deploy them as a contained group.
Fine grained image-based deployment enables better runtime currency and ensures consistency during promotion through environments
In a traditional setup, we deploy to shared environments that are often running old versions of the integration runtime. This means that development environments also have to run those older versions, making it hard to take advantage of new features, bug fixes and, critically, security patches. This can be particularly problematic in the integration space where, for example, the ability to keep up with the latest protocols, data formats and security mechanisms is critical to productivity. Upgrading shared environments requires complex regression testing to ensure compatibility with all the currently deployed integrations. Product upgrades are therefore complex, risky, and as a consequence, rare. Worse still, environment configuration “creep” often occurs, where changes are made to one environment but accidentally not made to others. Inevitably, this results in sporadic, hard to diagnose issues occurring as you promote through environments.
Integration issues are hard enough to diagnose already given all the inherent external dependencies, so you want to minimize the chance of any self-introduced inconsistencies. In comparison, for a cloud native approach, we create container images which consist of a specific version of the integration runtime and any related configuration. Each integration can then be created and deployed independently using the latest runtime image, regardless of what other images are already out there. The contents of the image can also optionally include your integration code. This provides the additional assurance that the stack you test against is exactly what you then deploy into all higher environments.
Image based deployment has multiple benefits, including:
- greater deployment confidence due to the inherent consistency of self-contained immutable container images,
- the ability to use the most recent features in any new integration,
- the option to roll out updates to the runtime such as security fixes gradually rather than as a big bang,
- the ability to re-create in isolation the exact deployment of an integration for diagnostics purposes.
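As a concrete illustration of what this looks like in practice, the sketch below shows a single integration deployed on Kubernetes from its own immutable image. All names, the registry, the tag and the port are hypothetical assumptions; the point is simply that each integration ships as a self-contained, versioned image.

```yaml
# A minimal sketch, assuming a hypothetical "order-sync" integration and a
# hypothetical registry; substitute your own integration runtime image.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-sync-integration
  labels:
    app: order-sync-integration
spec:
  replicas: 1
  selector:
    matchLabels:
      app: order-sync-integration
  template:
    metadata:
      labels:
        app: order-sync-integration
    spec:
      containers:
        - name: integration
          # Pinning a specific image tag (or better, a digest) means every
          # environment runs exactly the stack that was tested.
          image: registry.example.com/integration/order-sync:1.4.2
          ports:
            - containerPort: 7800   # illustrative listener port
```

Promotion through environments then simply means applying the same definition, with the same image reference, to a different cluster or namespace.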
Integrations vary significantly in their availability and performance requirements
Each integration serves a different purpose, handling different loads, and is therefore likely to have different non-functional requirements around availability and performance. In a traditional monolithic deployment, we have to prepare an integration infrastructure for the shared enterprise service bus that is adequate for all possible integrations. This often translates to a load balanced, highly available pair that has twice the capacity of the highest expected workloads, just in case of failure.
In contrast, with fine grained deployments, we can consider each integration’s needs separately, configuring each on deployment, based only on its own requirements.
Let’s consider how different these requirements could be. We might have an asynchronously fed workload that could tolerate server start up times, and which could lie completely dormant until workload arrives. Another workload might require “5 nines” availability (up 99.999% of the time) and need to always be ready to take a high load. Whilst the first workload needs a relatively low-spec’d server with no redundancy, for the second we might always need six available replicas, spread across three availability zones, with the ability to scale to 18 replicas when the workload increases. It simply doesn’t make sense to build a single one-size-fits-all environment in advance for those two types of workload. Fine-grained deployment enables us to create a separate topology for each component that precisely meets its needs. This results in better runtime isolation, more optimized use of infrastructure and licenses, and greater deployment confidence. It should be noted at this point that unique configurations for each and every integration may become overly complex. We typically observe that common patterns of non-functional needs can be captured as policies (“Bronze, Silver, Gold” NFRs, for example). By grouping together integrations with similar functional and non-functional requirements, the number of pre-defined deployment configurations can be drastically reduced, making it easier to manage the configuration permutations over time.
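To make this tangible, here is a hedged sketch of what a “gold” policy for the second workload above might look like on Kubernetes: a Deployment kept at six replicas spread across three availability zones, with a HorizontalPodAutoscaler allowed to scale it to 18. The names, image and CPU threshold are purely illustrative assumptions.

```yaml
# Hypothetical "gold" NFR policy for a single integration.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments-integration
spec:
  replicas: 6
  selector:
    matchLabels:
      app: payments-integration
  template:
    metadata:
      labels:
        app: payments-integration
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone   # spread evenly across zones
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: payments-integration
      containers:
        - name: integration
          image: registry.example.com/integration/payments:2.0.1
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: payments-integration
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payments-integration
  minReplicas: 6      # always enough capacity for the baseline load
  maxReplicas: 18     # burst capacity when workload increases
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # illustrative threshold
```

A “bronze” integration, by contrast, might be a single replica with no autoscaler at all; the point is that each integration carries its own policy rather than inheriting a one-size-fits-all topology.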
With fast start up times, and fine-grained deployment, you may not always need a “high availability pair”
Once you know the individual integration’s requirements, you can make more radical decisions about the infrastructure required. In traditional deployments, the standard approach to providing high availability (HA) has been to ensure you have at least two replicas of your server running. If either of them were to fail, the other could take on the full load immediately. This was largely because starting up servers took a significant amount of time, often taking several minutes for the operating system and the associated integration runtime to start and stabilize. However, standing up a minimum of two servers is costly, especially because we must ensure that each member of the HA pair runs with enough headroom to take on the whole load.
In a cloud native world, we deploy fine grained components optimized for rapid startup. Significant advances have been made with virtual machines over the years to enable radically faster start up times. Containers take almost no time to start, and modern lightweight integrations start in seconds. There are plenty of circumstances in integration where availability in the order of seconds is sufficient: integrations processing message queues may not be associated with user interaction, so a few seconds may not matter, as long as they manage the overall throughput of messages over time. This means that we can potentially let go of the minimum pair of servers for a subset of our integrations. Of course, we would not be able to take advantage of these low infrastructure requirements unless integrations were deployed independently on separate servers.
Selectively using serverless frameworks for sporadic workloads can significantly free up resources when doing fine grained deployment
Out of the box, container orchestration platforms like Kubernetes must have at least one replica running to be ready to take workload when it arrives. With functionality spread across many fine-grained components, this can result in a large minimum compute allocation even when none of it is being used.
However, serverless frameworks such as Knative and KEDA are available for Kubernetes environments, and these can start containers only when there are messages on incoming queues; in some cases, this can even work for incoming HTTP requests. If your performance requirements for a given integration can tolerate the relatively short container startup time for the first of each set of concurrent requests, you can bring the allocated compute down to zero when there is no workload coming in. However, you must check that your integration runtime is designed such that it can be used in a serverless fashion, which amongst other things means it can run “stateless” and can be configured for rapid start up.
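As a sketch of what this can look like, the example below uses a KEDA ScaledObject to scale a queue-fed integration down to zero replicas when its queue is empty. The Deployment name, queue name and thresholds are hypothetical, and RabbitMQ is used only as a familiar example; KEDA provides scalers for many other messaging systems.

```yaml
# A minimal sketch, assuming a Deployment named "order-sync-integration" already
# exists and reads from a hypothetical "orders" queue.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-sync-integration
spec:
  scaleTargetRef:
    name: order-sync-integration     # the Deployment running the integration
  minReplicaCount: 0                 # no pods at all while the queue is empty
  maxReplicaCount: 5
  cooldownPeriod: 120                # seconds of inactivity before scaling back to zero
  triggers:
    - type: rabbitmq
      metadata:
        queueName: orders
        mode: QueueLength
        value: "20"                  # target messages per replica
        hostFromEnv: RABBITMQ_CONNECTION_STRING   # connection string from an env var on the workload
```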
Mature integration environments have often built up quite a number of rarely used integrations that may be well suited to this serverless model.
How many “environments” do you actually need — maybe only one?
The traditional shared “one-size-fits-all” infrastructure often took months to provision. Given this lead time, any environments that were likely to be required had to be created in advance. As a minimum, you would expect three environments (e.g. development, test and production), and often more in order to provide isolated performance testing, user acceptance testing, pre-production and so on. In the integration space, we often need environments permanently configured to talk to specific versions of external systems. It is not unusual to see as many as ten permanent environments. Across these environments, the cost of infrastructure and licenses quickly multiplies. A lot of the time taken to build traditional infrastructure was related to the unique ways in which each runtime was installed, and how it was over-configured for non-functional capabilities such as availability and scaling. If an “environment” can instead be created in minutes or even seconds, we need to completely re-think our approach.
When applications are created for cloud native deployment, they follow design principles around image-based deployment, minimization of state, externalization of configuration, rapid start-up and more. These patterns enable us to provide requirements declaratively at deployment time, since platforms such as Kubernetes provide common approaches to deployment, load balancing and scaling. It is hard to overstate how game changing this is. Imagine consistent, instantaneous creation of a topology for a specific integration, based on declarative (and therefore consistently available and repeatable) instructions that can be stored in source control alongside your integration code. It is viable, indeed advisable, to only create environments as you require them, and then delete them after use. If previously you had, for example, seven environments from development to production, permanently provisioned with capacity for all your integrations, this new approach might bring those down to the equivalent of one or two environments, with others created only briefly, for example for performance tests. Furthermore, due to the ability to scale to zero with serverless technology, an “environment” might have no footprint at all when there is no workload.
Cloud native really brings into question what we mean by an “environment” in the first place. It might simply be better to refer to “deployments” with a specific purpose, differentiated by their access controls and connectivity. For example, based on its purpose, a deployment will only be accessible by specific users and administrators, will be constrained within a specific network boundary, and will only have access to a specific set of downstream systems. Environments by this definition become more transitory entities, created for a purpose, and torn down immediately afterwards.
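A minimal sketch of such a “deployment with a purpose” might be nothing more than a namespace whose connectivity is constrained declaratively, as below. The namespace, labels and policy are illustrative assumptions rather than a prescribed pattern.

```yaml
# A hypothetical, short-lived "environment": a namespace plus a policy that
# limits which downstream systems its workloads can reach.
apiVersion: v1
kind: Namespace
metadata:
  name: orders-uat
  labels:
    purpose: user-acceptance-test
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restrict-egress
  namespace: orders-uat
spec:
  podSelector: {}          # applies to every pod in the namespace
  policyTypes:
    - Egress
  egress:
    # Only allow traffic to the namespace hosting the test stubs this
    # "environment" is permitted to reach (a real policy would typically
    # also allow DNS egress).
    - to:
        - namespaceSelector:
            matchLabels:
              purpose: test-stubs
```

Because the whole definition lives in source control, the “environment” can be created for a test run and deleted immediately afterwards.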
Could test emulations be more transitory too?
It’s not just the integrations themselves that take up resources in environments. Emulating realistic back end systems to test against has always been a major challenge in the integration space. An integration’s job is, by definition, to connect systems together. To create a set of realistic tests for an integration, you need a set of dependent systems available to integrate with. Cloud native deployment in containers doesn’t solve the challenge of scheduling time on back end systems, but it does make it much simpler to repeatably and consistently set up and tear down test stubs that emulate those back end systems. It is also simpler to synthetically introduce error conditions, such as systems being unavailable, or to constrain the resources available to those stubs in order to test non-functional aspects of the integrations. Add-on capabilities such as a service mesh (for example, Istio) bring further testing options, such as declaratively injecting faults, performing A/B testing and making canary deployments.
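For example, assuming an Istio service mesh and a hypothetical back end stub called crm-stub, a VirtualService like the sketch below can declaratively inject delays and failures so that the integration’s timeout and error handling can be exercised without touching the real system.

```yaml
# A minimal sketch of Istio fault injection against a hypothetical test stub.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: crm-stub
spec:
  hosts:
    - crm-stub             # service name of the emulated back end
  http:
    - fault:
        delay:
          percentage:
            value: 50      # half of all requests are delayed...
          fixedDelay: 5s   # ...by five seconds, to test timeout handling
        abort:
          percentage:
            value: 10      # a further portion fail outright
          httpStatus: 503
      route:
        - destination:
            host: crm-stub
```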
Consider “fit for purpose” paths to production
The earlier discussion on environments leads us to explore a more fluid approach to our path to production. Rather than rigidly forcing every change through all environments, each fine-grained component can have its own “path to live”, choosing whatever is most appropriate to the functionality it delivers. Again, integrations can be particularly sensitive, and varied, in relation to the business criticality of a function, the sensitivity of the data processed, or their availability requirements. Some integrations might, for example, benefit from additional environments, allowing more rigorous acceptance testing or greater depth of performance and availability testing. We will also likely want different paths for different categories of change, such as a major functional change directly affecting key users, compared to a minor runtime patch.
By enabling paths to production to be described on a per-component basis, we can find an optimum balance between time to production and resource usage for each group of integrations. This approach also enables us to improve both compliance and agility, by ensuring all components receive precisely the right level of quality assurance, rather than enforcing a complex one-size-fits-all process that would be over-burdensome in most cases.
To be fair, many enterprises already customize their path to production based on the magnitude of the change. However, they do still have to consider deployment onto a shared server as one of the tests. The move to fine grained integrations may at least remove some of the need for that step, since the integrations are inherently more decoupled.
It is also worth noting the increased importance of automation of the path to production. With an inevitably increased number of separately deployed components, it becomes more critical than ever to have a robust and repeatable way to deploy them through automated pipelines. These pipelines can then encode the different paths to production to ensure they follow a structured approach. The pipeline itself can of course be run on the container platform too, and with the introduction of Kubernetes-native pipeline technologies such as Tekton, this makes the creation of pipelines all the more native to the platform.
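As an illustration, a per-component “path to live” could be encoded as a Tekton Pipeline along the lines of the sketch below. The Task names and parameters are hypothetical placeholders for your own build, test and deploy Tasks.

```yaml
# A minimal sketch of one integration's path to live, assuming hypothetical
# Tasks named build-integration-image, test-integration and deploy-integration.
apiVersion: tekton.dev/v1
kind: Pipeline
metadata:
  name: integration-path-to-live
spec:
  params:
    - name: git-url
      type: string
    - name: image
      type: string
  tasks:
    - name: build-image
      taskRef:
        name: build-integration-image
      params:
        - name: git-url
          value: $(params.git-url)
        - name: image
          value: $(params.image)
    - name: run-tests
      runAfter: ["build-image"]
      taskRef:
        name: test-integration
    - name: deploy
      runAfter: ["run-tests"]
      taskRef:
        name: deploy-integration
      params:
        - name: image
          value: $(params.image)
```

A more critical integration might add further stages (performance tests, an approval gate), while a simple departmental one might go straight from tests to deployment; the pipeline definition itself lives in source control alongside the integration.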
Integration runtimes must be able to adhere to a cloud native deployment approach
Cloud native encourages a move away from performing deployments on a pre-existing live server, to a declarative representation of the entire deployment.
The unit of deployment becomes, for example, a container image. This is essentially a complete definition of what you want to deploy. It includes files for the operating system, product/language runtimes, application/integration code, and also files containing any configuration details required by any of those layers. This means that before you even start a server, you have a top to bottom, immutable copy of exactly what will be deployed. Deployment itself then becomes a completely standardized procedure: copying that filesystem structure (e.g. as a container image) onto an infrastructure node, and starting it up. This image deployment mechanism is a fundamental and deeply mature part of the container platform, and is the same for any container deployed on the platform.
Furthermore, a change to any part, whether a new version of the product or a change to the code, results in a new file system representation (e.g. a new container image) which is deployed in exactly the same way: copy the image to the node, start it up, and allow workload to move to the new container based on an appropriate upgrade policy.
Modern integration runtimes must therefore adhere to a few cloud native design principles, such as simple installation based on product binaries, configuration primarily via properties files, and code deployment by placing the code on the file system prior to startup.
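A minimal sketch of what “configuration primarily via properties files” can look like on Kubernetes is shown below: a properties file held in a ConfigMap and mounted onto the container’s file system before the runtime starts. The file name, keys and mount path are assumptions and will differ between integration runtimes.

```yaml
# Hypothetical externalized configuration for the order-sync integration.
apiVersion: v1
kind: ConfigMap
metadata:
  name: order-sync-config
data:
  integration.properties: |
    backend.url=https://crm.internal.example.com
    timeout.seconds=30
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-sync-integration
spec:
  replicas: 1
  selector:
    matchLabels:
      app: order-sync-integration
  template:
    metadata:
      labels:
        app: order-sync-integration
    spec:
      containers:
        - name: integration
          image: registry.example.com/integration/order-sync:1.4.2
          volumeMounts:
            - name: config
              mountPath: /home/integration/config   # read by the runtime at startup
              readOnly: true
      volumes:
        - name: config
          configMap:
            name: order-sync-config
```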
If the integration runtime supports the above cloud native deployment approach then:
- Deployment pipelines no longer have a dependency on a running shared server with a compatible configuration.
- The “source” includes everything. Not just code, but also product, OS, all relevant configuration, which means deployments for different environments are inherently identical.
- No changes are made to running environments. An initial deployment and an update are handled identically: both start from the creation of an immutable image and its associated deployment configuration, and fresh servers are started up based on that.
A good way of summarizing the effect of all this is that many common testing issues “shift left”, because deploying to early environments so much more closely resembles the final production environment.
Zero trust means security hardening of individual components
A consequence of deploying fine-grained components onto shared platforms such as Kubernetes is that they need to be inherently designed with security in mind. Since integrations often connect to other systems with privileged access, it is particularly important that they themselves are not compromised. Just as the integration needs a declarative description of how it should be deployed topologically, it also needs to take ownership of how it minimizes its own security risks.
This is one of the key principles behind “zero trust” security. Everything takes appropriate responsibility for its own security measures, so fine-grained components need to comply with appropriate security standards. Examples include:
- Reducing or removing the privileges with which processes in the container run, in relation both to the operating system it is running on and to the Kubernetes platform itself. This might, for example, enable the container to be deployed under a “Restricted” security context constraint in Kubernetes (see the sketch after this list).
- Restricting access to the container platform and only allowing deployment via pipelines from source control.
- Restricting inbound and outbound traffic to the container to only that which it requires.
- Ensuring encryption of data in transit and at rest.
- Utilizing appropriately secure mechanisms (e.g. vaults) for storing credentials and certificates.
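The sketch below illustrates some of these measures for a single (hypothetical) integration pod: it runs as a non-root user under a restricted security context, and pulls its credentials from a Kubernetes Secret rather than baking them into the image. The exact settings required will depend on your platform and runtime.

```yaml
# A minimal sketch of hardened pod and container security settings; names,
# paths and the secret are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: order-sync-integration
spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: integration
      image: registry.example.com/integration/order-sync:1.4.2
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]
      env:
        # Credentials come from a secret store rather than the image or config files.
        - name: BACKEND_PASSWORD
          valueFrom:
            secretKeyRef:
              name: order-sync-credentials
              key: password
```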
By moving to image-based deployments, it is possible to bake enterprise-wide as well as fine-grained security configurations directly into your images to ensure adherence to standards.
Covering this topic in any real depth is well beyond the scope of this post, but you can find a more detailed discussion on it here.
Enable more decentralized ownership through fine grained security and low code tooling
Everything we’ve discussed so far offers up a further opportunity, one that is less about technology and more about organization. If each integration is so cleanly self-contained, we can transfer ownership of that integration much more easily.
Traditionally integrations have often been created and looked after by a centralized team, but in reality they often serve a particular part of the business. Why then couldn’t that sub-domain of the business actually own their integrations? They run in discrete containers, and can safely be deployed without affecting other teams. They have their own unique pipelines for build, testing and deployment. Through fine grained security on who can update the code and configuration in source control, we could easily enable the sub-domains within the business to own and maintain their own integrations. They could perform their own changes to integrations, decide when to deploy them, amend the performance and availability policies, and so on.
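On the container platform side, this kind of decentralized ownership can be expressed with standard Kubernetes RBAC, as in the hedged sketch below, which grants a (hypothetical) business-aligned team the built-in “edit” role within its own namespace. A similar pattern of fine-grained permissions would apply in source control.

```yaml
# A minimal sketch: the "orders" team can deploy and operate the integrations
# in its own namespace, and nothing else. Group and namespace names are hypothetical.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: orders-team-integration-admins
  namespace: orders-integrations
subjects:
  - kind: Group
    name: orders-team            # group defined in your identity provider
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: edit                     # built-in role allowing workloads to be managed within the namespace
  apiGroup: rbac.authorization.k8s.io
```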
Clearly not all parts of the business will have the desire to take ownership of their integrations. In some cases that may be down to the skillset required. However, when adopting the cloud native approach to integration we have discussed, if a department has other applications running on a container platform, then running cloud native designed integrations on that same platform will look and feel very similar. This would at least enable those teams to have more visibility and operational control over the integrations that matter to them.
Do local departments have the skills to build and maintain their own integrations?
Integration tooling has in the past had a reputation for being particularly complex. There’s no getting away from the fact that designing and building integrations does involve a significant level of specialism and subtlety. However, the tools used to build integrations have become simpler to use, the protocols and data formats typically used by disparate systems have become somewhat standardized, and the intelligence of connectors to common applications has improved. This means that building integrations has genuinely become easier than before. Is that simplification enough that local departments could take on some of the building of integrations themselves too?
There will always be a set of complex and/or business critical integrations that require deep technical skills from a more centralized team, and that are built to a higher level of quality and governance. It’s likely these will continue to be built, maintained, and even run by a central team, even if they are deployed as discrete fine-grained components. However, since these integrations are inevitably more expensive to create and run, they will only be able to satisfy integration requirements with a sufficiently high benefit-to-cost ratio. Furthermore, as the number of applications requiring integration is constantly increasing, the central team has historically become a bottleneck.
A fine-grained deployment model enables a subset of integrations to be built independently by departments’ local IT teams. As a result, it may gradually become financially viable to implement more of the “long tail” of transitory and/or departmental integrations that depend more on business domain expertise than on technical prowess. The reduced time spent in requirements analysis should bring down costs, allowing a broader range of integrations to be tackled. It’s also likely that these integrations have lower availability and performance requirements, which again plays into the hands of fine-grained deployment. There is now a good array of “low code” tooling available that might enable less technical users to build integrations, although the type of use cases that can be tackled this way needs to be carefully restricted.
Potential benefits are high, but don’t underestimate the level of fundamental change to processes, technologies and skillsets
If done in a measured way, deployment using fine-grained components combined with other aspects of the cloud native approach enables the unlocking of many benefits. It allows us to think very differently about how we satisfy requirements around availability, performance, paths to production, security and more. In turn, this enables us to reduce our resource and software license footprint, and improve overall agility in relation to creation, deployment and operation of integrations.
However, no enterprise will be able to easily cope with a move from one-size-fits-all to a proliferation of unique approaches to deployment whilst simultaneously decentralizing down into local IT departments. This is true for the move to a cloud native approach in general, not just for integration. The resulting fine-grained landscape is going to have many more moving parts. Full automation of CI/CD pipelines that embed the necessary governance is going to be critical. It’s also worth noting that whilst the target is to use standardized skills to deploy, administer and operate all runtimes (containers, Kubernetes and so on), these skills are still relatively new to the market, and are unlikely to be present in your established workforce.
To take advantage of agile integration, it is therefore wise to start small, exploring with just a few integrations on the new platform, maturing pipelines and familiarizing yourself with the cloud native approach, preferably hand in hand with the adoption of cloud native principles across the enterprise, beyond integration alone. Of course, almost by definition, fine-grained integration is well suited to this iterative approach.
Further reading
- Original “ebook” on agile integration
- “A cloud native approach to agile integration”
- Although written a while ago, the first four chapters of the IBM Redbook “Accelerating Modernization with Agile Integration” are architectural, and product agnostic so largely stand the test of time.
- The full list of publicly available material related to agile integration is here.
Acknowledgements
Specifically for their help with input and review on this piece I’d like to thank Callum Jackson and Claudio Tagliabue. On a broader note, however, it would be impossible to list all the people who have brought the agile integration approach to life. A glance at the authors and contributors to the Redbook mentioned above will give you some idea. In there you will find everything from architects to technical specialists in the field, people in the labs working on products, and of course customers on the sharp end, implementing it in production. Sincere thanks for all their input as we continue to mature these ideas.