The Problem

Recently, I worked with a customer in the courier industry whose core logistics scheduling application suffered from reliability, performance, and scalability issues. While I addressed the immediate problems, it was clear that the root cause was 20 years of accumulated patchwork fixes, enhancements, and technical debt.

The legacy architecture consisted of four Red Hat JBoss Enterprise Application Platform servers, each with an instance of the logistics scheduling application deployed and an embedded Artemis broker to handle the messaging. When deploying a new version of the application, each node was brought offline. 

However, because the embedded Artemis brokers were not clustered, each deployment resulted in message loss, and each lost message meant lost revenue. The JBoss EAP servers were also neither load balanced nor clustered, so each instance was a single point of failure.

The result of the legacy architecture was a system that was constrained, unreliable, and impossible to scale. To release new code safely, the company had to schedule a planned outage during a low-traffic period to minimize the impact on customers and the loss of revenue. In essence, a deployment cost money.

In the era of CI/CD, intentionally bringing down mission-critical applications is like burning money in a barrel. However, the organization was able to achieve a modern, performant, and highly available system by shedding the years of accumulated technical debt and introducing new concepts such as:

  • High availability and replication

  • Intelligent load balancing using mod_cluster 

  • Decoupling of Java Message Service ("JMS") servers and application servers 

To design a long-lasting solution and enable the organization to reach technical innovation goals, we had to address three very important questions:

  • What are we trying to achieve with this solution? 

  • What problems are we addressing?

  • Where are we going? 

It is easy to address the most obvious answer (the biggest area of pain and constraint) and move on. However, this is where many fail in their planning and this customer was no exception. 

Modernizing requires long-term solutions

If you focus only on the short-term solution and keep addressing today's problems with quick patchwork fixes, without considering and planning for what is to come, you may very well find yourself in the same situation: an out-of-date architecture carrying so much technical debt that it becomes increasingly difficult to stay current with best practices and the latest architecture.

The solution designed to address today’s problem must also fit into the long-term IT roadmap to enable the technical growth and innovation of tomorrow.

For example, in this case, the inability to release code regularly without taking a hard outage of the application was the result of poor planning which stunted the evolution of the architectural design and enablement. 

The logistics scheduling application is deployed on JBoss EAP and, like many applications, it uses Java Message Service (JMS) to integrate different components through messaging.

[Diagram: embedded Artemis architecture]

The diagram above is of the embedded Artemis architecture.

Solution Overview

One might think that the fix is simple: configure replication across the four Artemis brokers and call it a day. That would address the immediate issue, but it would fail in the future due to a lack of long-term planning. This is a common mistake that many organizations make, but we at Axcelinno are here to help prevent our customers from making that mistake.

That is why the question of "where are we going?" is so important.

This may in fact be the single most important question to ask yourself when designing a solution. In order to build a long-lasting and scalable solution, we must understand where we are going as an organization and what the goals of the company are and will be.

In the case of this particular customer, the roadmap consisted of:

  • Blue-green deployments

  • JBoss EAP version upgrade

  • Breaking down the monolithic architecture into microservices

  • Cloud migration 

To get there, we must design a solution that will enable the team to achieve these goals. A configuration change to enable replication would not let the customer easily perform a JBoss EAP upgrade without the risk of application downtime, because the broker would still be embedded in the application server. It also would not enable the team to achieve a microservices architecture. This is a short-sighted and fragmented solution.

A holistic approach for today and tomorrow

At Axcelinno it is our objective to design a holistic solution that addresses the issues, both immediate and future.

For this customer, a standalone Red Hat AMQ cluster was implemented in an active-passive architecture. By decoupling the application tier from the messaging tier, the organization achieved a scalable and fault-tolerant JMS environment that is not prone to outages and is architecturally superior to the previous embedded solution. The new design also eliminated the requirement of manually copying the message journals from one server group to the other, and it made it possible to perform JBoss EAP upgrades without affecting the messaging system.
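To make the active-passive idea concrete, here is a minimal sketch in Python. It is a toy model under stated assumptions, not the customer's configuration or the AMQ API: the class names `Broker` and `ActivePassivePair` are hypothetical, the "journal" is an in-memory queue, and replication is modeled as a synchronous copy to the passive broker.

```python
from collections import deque


class Broker:
    """Toy broker with an in-memory journal (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.journal = deque()


class ActivePassivePair:
    """Hypothetical active-passive pair that replicates every write."""

    def __init__(self, active, passive):
        self.active = active
        self.passive = passive

    def fail_over(self):
        # Promote the passive broker; the failed one becomes the new passive.
        if not self.passive.online:
            raise RuntimeError("no broker available")
        self.active, self.passive = self.passive, self.active

    def send(self, message):
        if not self.active.online:
            self.fail_over()
        self.active.journal.append(message)
        # Copy the message to the passive broker while it is reachable.
        if self.passive.online:
            self.passive.journal.append(message)

    def receive(self):
        if not self.active.online:
            self.fail_over()
        if not self.active.journal:
            return None
        message = self.active.journal.popleft()
        # Keep the passive journal in step so the message is not redelivered.
        if self.passive.online:
            self.passive.journal.popleft()
        return message
```

If the active broker goes offline between a send and a receive, the next call promotes the passive broker, which already holds the pending messages. Resynchronizing a broker that later returns is deliberately left out of this sketch.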

Decoupling the messaging tier gave the organization a higher level of flexibility. In addition to this newfound flexibility, the decoupled architecture provided other benefits:

  • The JMS messages can now be consumed by any JBoss EAP 7 consumer, which eliminates the question of "What if a server does not come back up when messages are persisted?"

  • Changes to the application tier and subsequent application deployments are now simpler and do not require the maintenance of two server groups. 

  • Changes can now be made to AMQ independently of the application. 

  • The standalone architecture provides greater flexibility and more options for the clustering architecture. The embedded Artemis architecture supports live/backup only, while the standalone architecture allows for multiple clustering options (symmetric or chain) for future flexibility.
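The first benefit in the list above can be illustrated with a small simulation. This is a sketch under simplifying assumptions rather than JMS code: `EmbeddedNode`, `StandaloneBroker`, `AppNode`, and `consume_all` are hypothetical names, and queues are plain in-memory deques.

```python
from collections import deque


class EmbeddedNode:
    """App server whose queue lives in its own embedded, unclustered broker."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.queue = deque()

    def consume(self):
        # Messages are reachable only while this specific node is up.
        if self.online and self.queue:
            return self.queue.popleft()
        return None


class StandaloneBroker:
    """Messaging tier that lives outside every application server."""

    def __init__(self):
        self.queue = deque()


class AppNode:
    """App server that consumes from a shared standalone broker."""

    def __init__(self, name, broker):
        self.name = name
        self.online = True
        self.broker = broker

    def consume(self):
        if self.online and self.broker.queue:
            return self.broker.queue.popleft()
        return None


def consume_all(nodes):
    """Poll every node until no node can deliver another message."""
    delivered = []
    progress = True
    while progress:
        progress = False
        for node in nodes:
            message = node.consume()
            if message is not None:
                delivered.append(message)
                progress = True
    return delivered
```

With embedded queues, messages held by an offline node stay stranded until that node returns. With the standalone broker, any online node can pick them up, so taking one application server down for a deployment does not block delivery.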

Decoupling the application server from the JMS server enables the organization to deploy code releases without taking a hard outage of the application, because messages are no longer lost. However, one gap remains: during a deployment, the node being updated is unavailable for a period of time.

Specifically, if a user is on that node while the new code is being deployed, they lose their current session and all work up to that point, and the application is inaccessible to them.

To address this, and enable the ability to achieve blue-green deployments, mod_cluster was implemented along with session replication. Session replication allows for users to be migrated (without the user knowing it) from one node to another in the case that the node the user is actively on is taken offline. Therefore, the user is not required to log in again, and no active work is lost in the process. 
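The effect of session replication can be sketched as follows. This is a deliberately simplified model, not how JBoss EAP implements it: the class `SessionCluster` and its methods are hypothetical, and every session is copied to every online node, whereas a real cluster may keep copies on only a subset of nodes.

```python
class SessionCluster:
    """Toy cluster that copies each session to every online node."""

    def __init__(self, node_names):
        self.stores = {name: {} for name in node_names}
        self.online = set(node_names)

    def put(self, session_id, attributes):
        # Write the session to all online nodes, not just the one serving the user.
        for name in self.online:
            self.stores[name][session_id] = dict(attributes)

    def get(self, node, session_id):
        if node not in self.online:
            raise RuntimeError(f"{node} is offline")
        return self.stores[node].get(session_id)

    def take_offline(self, node):
        # In-memory state on the node is lost when it goes down.
        self.online.discard(node)
        self.stores[node].clear()


cluster = SessionCluster(["eap-1", "eap-2"])
cluster.put("alice", {"draft_route": "depot-7"})
cluster.take_offline("eap-1")
# The user's work is still available from the surviving node.
survivor_copy = cluster.get("eap-2", "alice")
```

In this model the user served by `eap-1` can continue on `eap-2` without logging in again, because the session attributes were already copied there before the node went offline.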

Alongside session replication, mod_cluster acts as an intelligent load balancer. It directs traffic only to active nodes and lets the system administrator designate a node to be removed from the cluster, which makes it possible to drain sessions off that node in a controlled fashion. No new users are assigned to the node being removed until the new code is deployed and the node is reintroduced into the cluster.
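The draining behavior described above can be sketched as a small routing model. It is an illustration under assumptions, not the mod_cluster implementation: `DrainingBalancer` and its methods are hypothetical names, and new sessions are assigned round-robin here, whereas mod_cluster bases its decisions on information reported by the nodes.

```python
class DrainingBalancer:
    """Toy balancer with sticky sessions and controlled draining."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.draining = set()
        self.sticky = {}  # session id -> node currently serving it
        self._next = 0

    def route(self, session_id):
        # Existing sessions stay on their node, even while it drains.
        node = self.sticky.get(session_id)
        if node is not None:
            return node
        # New sessions go only to nodes that are not draining.
        candidates = [n for n in self.nodes if n not in self.draining]
        if not candidates:
            raise RuntimeError("no active nodes")
        node = candidates[self._next % len(candidates)]
        self._next += 1
        self.sticky[session_id] = node
        return node

    def end_session(self, session_id):
        self.sticky.pop(session_id, None)

    def drain(self, node):
        self.draining.add(node)

    def is_drained(self, node):
        # Safe to deploy once the node is draining and serves no sessions.
        in_use = any(n == node for n in self.sticky.values())
        return node in self.draining and not in_use

    def reintroduce(self, node):
        self.draining.discard(node)
```

A deployment in this model would call `drain` on a node, wait until `is_drained` returns true, deploy the new code, and then call `reintroduce`. Combined with session replication, sessions that do not end on their own could instead be moved to another node.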

The diagram above is of the standalone AMQ architecture.

Conclusion

The customer addressed the accumulated technical debt by updating to the latest supported JBoss EAP version, implementing a modernized holistic solution, and decoupling Red Hat AMQ from the application server. The resulting architecture enables the customer to continue to grow and innovate their IT infrastructure.

Axcelinno excels at designing solutions today that will enable the future, and we do it in weeks, not years. It is our goal to design a solution that enables our customers to succeed today and tomorrow.


About the author

Matthew Hughes is an experienced senior solutions consultant at Axcelinno and has helped transform companies by focusing on a holistic solution that incorporates people, processes, and tools for a long-lasting and scalable solution.