Taking shortcuts is very human. If you don’t believe me, just look at the desire path shown in the image below. Would you not take it?
If you wonder what a desire path is: it is another word for a shortcut. For instance, you might walk across a patch of grass to save yourself some time. We are inclined to take shortcuts, to speed things up or to make things easier. It is in our nature. In Dutch, a desire path is called an elephant path (olifantenpaadje), because elephants naturally take the shortest route when travelling. Desire paths are a well-documented phenomenon and have even been discussed on the TED stage.
Taking shortcuts might look tempting. It can save you time, but it can also cost you! What am I talking about? Taking shortcuts when you are developing software integrations. What you skip at the beginning of developing integrations, in many cases, takes more time (and therefore money) to develop later on in the process. So, caveat emptor!
The right and wrong way of software development
When you are developing a new piece of code, you can do it in two ways: the right way and the wrong way. The right way is debatable. To some extent, it depends on your opinions about software development, coding style, and so on. But it also depends on the conventions of the organization you work in.
The wrong way is also debatable, in the sense that there are cases where quickly showcasing a piece of code actually suffices. But such quick approaches often show you nothing that can prove or disprove a concept. They are usually just a disposable early-stage prototype. It might look like a big deal, but in reality it is bare-bones and not intended for anything beyond a quick demo.
Minimal Viable Integration (MVI)
In this blog, I would like to talk about something important: something that we at Yenlo refer to as a minimal viable integration (MVI).
As far as we are concerned, there is a minimal standard that you should adhere to when you are building integrations with the WSO2 products. You do not want to compromise on security, for instance. But there is more.
The basics and risks of software development
Of course, we all know the basics: do not duplicate code, endpoints, and other development artefacts. Copying code is a knee-jerk reaction, and we need to learn to control it. Do not mindlessly duplicate code; this should be a fundamental part of every 101 course on software development or configuration. If you do duplicate code, you may end up with a maintenance nightmare on your hands, and the likelihood of errors is much higher than with the normal use of a function, a library, or some other form of code reuse. Rather, create separate functions, services, or templates to capture reusable functionality.
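As a toy illustration (the function and field names here are made up, not part of any WSO2 API), compare copy-pasting the same validation into every flow with capturing it once in a reusable function:

```python
# Instead of copy-pasting the same validation into every integration flow,
# capture it once in a reusable function. A fix to the rules is then made
# in exactly one place.

def validate_order(order: dict) -> list:
    """Return a list of validation errors; an empty list means the order is valid."""
    errors = []
    if not order.get("id"):
        errors.append("missing order id")
    if order.get("amount", 0) <= 0:
        errors.append("amount must be positive")
    return errors

# Every flow now calls the same function instead of keeping its own copy.
print(validate_order({"id": "A-1", "amount": 10}))  # []
print(validate_order({"amount": -5}))               # ['missing order id', 'amount must be positive']
```

The same idea applies to Synapse sequences and endpoints: extract the shared part into a template or a reusable sequence rather than duplicating it.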
Think before you do!
Yes, you must think things through if you want to maximize reuse and reusability. Resist the pressure to deliver something quick and dirty. Think before you act, and discuss your approach with peers; it will benefit your team and your stakeholders, even though it requires more time.
On the other side of the spectrum, going to the other extreme can become paralyzing. This is called analysis paralysis, and it is commonly caused by failing to limit your thinking to a realistic problem scope. You should not push complexity boundaries beyond your bounded context. Especially when you start making changes at a lower level (e.g., changing standard components), it becomes ever more difficult for you to understand how people are using your work, and for peers to grasp the changes you have made. Keep in mind: not everyone is as brilliant as you are.
Seven Steps to a Minimal Viable Integration
So, what is this minimal viable integration that we are talking about? Well, it makes sure that the piece of code you are developing is fit for purpose. In addition, it can reveal things that might not initially be stated or demanded but could add significant value in the long run. Hence my earlier assertion that taking shortcuts will cost you. Following these seven steps will require more of your time, but an MVI-based approach is necessary if you want to improve the quality of your integrations and minimize the impact of potential errors. Just like seatbelts in cars help minimize injuries when something goes wrong. The fact that they are required by law shows their importance.
Below, I explain the seven practices of a minimal viable integration in more detail, along with an indication of the steps you can take.
#1 Make sure to cover all situations
One of those things is the selection of the appropriate mediator. In many cases, there are multiple options for implementing integration logic with mediators. The Filter Mediator, for instance, implements a binary if-then-else, which suffices when there is a choice between exactly two options. But what if there is a third option?
Imagine you have two departments that need to receive messages based on information in the message. You could say that if the message is not for department #1, it is automatically for department #2. But what if an erroneous message comes in that should go to neither #1 nor #2? The binary nature of the filter will send it to #1 or #2 depending on the setup, and not handle it correctly. So you must introduce an additional filter. Before you know it, you have created a messy sequence.
In reality, a problem is hardly ever binary. There are almost always exceptions to the binary rule, or at least potential future exceptions. By simply using the Switch Mediator from the start, your initial implementation will hardly take more work, but it will instantly improve the maintainability of your code.
→ Action to take: Switch to switch.
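The routing difference is easy to see in a language-agnostic sketch (the department names and queue targets below are invented for illustration):

```python
# The same routing decision, first filter-style (binary), then switch-style
# (n-way with an explicit default branch for unexpected values).

def route_binary(department: str) -> str:
    # Filter-style: anything that is not "sales" silently goes to finance,
    # including erroneous messages.
    return "sales-queue" if department == "sales" else "finance-queue"

def route_switch(department: str) -> str:
    # Switch-style: each known case is explicit, and unexpected values land
    # in a default (error) branch instead of a real department.
    routes = {"sales": "sales-queue", "finance": "finance-queue"}
    return routes.get(department, "error-queue")

print(route_binary("typo"))  # finance-queue -- the wrong department gets the message
print(route_switch("typo"))  # error-queue   -- handled explicitly
```

In Synapse terms, this is the difference between the Filter Mediator's then/else branches and the Switch Mediator's case/default branches: the default branch is where the third option lives.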
#2 Process your message prudently
One of the first lessons you learn in integration is to avoid keeping state in an integration service. For one, it limits your scalability. You also risk data loss during fail-over, A/B testing and canary deployments become more complex, and zero-downtime upgrades may become impossible.
In practice, it is surprisingly difficult to avoid keeping state, especially in integration services that (may) take longer to process. There are a couple of ways to make such integrations more robust. Do not use local storage. Ever. Rather, use a distributed object store or a network drive to store your files. Use queues whenever you are communicating over an unreliable channel; remember, with every retry you rely on your message still being available, so do not expect your in-memory message to be persistent. Then there are the intermediate states. If a message path is built from a sequence of steps that are not repeatable, you should keep track of every step. Consider building an event pipeline to track message processing; some event brokers even offer the option to restart processing exactly from the point where things went wrong.
Event channels come with an additional bonus. Like topics, they facilitate extensibility. Whenever you need to add a destination to the event pipeline, you can simply add a subscription to a topic. Presto!
→ Action to take: Choose a Message Queue solution, consider an event pipeline
→ Solutions: ActiveMQ, RabbitMQ (queues), Kafka (event streaming)
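The extensibility bonus of topics is easy to demonstrate with a minimal in-memory sketch (a stand-in for a real broker such as Kafka or RabbitMQ, not their actual API):

```python
# Minimal in-memory publish/subscribe sketch: adding a new destination is
# just one more subscription; the publisher does not change at all.
from collections import defaultdict

subscribers = defaultdict(list)

def subscribe(topic, handler):
    subscribers[topic].append(handler)

def publish(topic, message):
    for handler in subscribers[topic]:
        handler(message)

received = []
subscribe("orders", lambda m: received.append(("billing", m)))
subscribe("orders", lambda m: received.append(("shipping", m)))  # new destination, publisher untouched

publish("orders", {"id": 1})
print(received)  # [('billing', {'id': 1}), ('shipping', {'id': 1})]
```

A real broker adds what this sketch lacks: persistence, so a retry can still find the message, and (with some brokers) stored offsets, so processing can resume exactly where it failed.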
#3 Invest in Observability
No matter how hard you try to get everything right, things will go wrong. Integration is an inherently distributed endeavor, and communication paths are not designed for reliability. Packets get dropped, cables break, switch ports falter, network congestion slows things down, quotas get exceeded, and so on. Each of these may be rare, but when you operate at scale, their frequency may surprise you. Obviously, you do not want to spend time analyzing what went wrong in your integration when the issue was in the network. On the other hand, assuming a glitch is caused by the network while your integration is in fact failing is arguably even worse.
To stay on top of integration issues, you must invest in observability. Collect log files from all the nodes in the processing pipeline. Record correlation IDs so events can be traced across systems. Aggregate statistics for long-term trend analysis. Spend some time on proper use of the Log Mediator. Build the competence to analyze your log data quickly when needed, and plan time to inspect your log data regularly. After all, prevention is better than a heroic cure.
→ Action to take: Investigate a log solution
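A correlation ID is the simplest observability investment you can make. The sketch below (field and logger names are my own, not a WSO2 convention) attaches one ID to every log line a message produces, so logs from different nodes can be stitched back together later:

```python
# Sketch: propagate a correlation ID with each message so that log lines
# from every node in the pipeline can be joined into one trace.
import logging
import uuid

logging.basicConfig(format="%(levelname)s %(message)s")
log = logging.getLogger("integration")

def process(message: dict) -> dict:
    # Reuse an incoming correlation ID if present, otherwise mint a new one.
    corr_id = message.get("correlationId") or str(uuid.uuid4())
    log.info("corr=%s received message", corr_id)
    # ... mediation steps would go here ...
    log.info("corr=%s forwarded to backend", corr_id)
    return {**message, "correlationId": corr_id}

out = process({"payload": "hello"})
print("correlationId" in out)  # True
```

The key property is that a downstream node reuses the ID it received instead of minting its own; that is what turns scattered log files into a single traceable path.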
#4 Practice Modularization
Whenever you are tempted to order a giant monitor to help you manage complex sequences, you should probably think about breaking them up into manageable pieces. Preferably, break them up into chunks that can be reused in later integrations. There is a fine line not to cross, though: modular complexity can become a challenge in and of itself. You risk creating infinite loops through circular dependencies. Similarly, with complex event streams, you may cause an event storm that floods the stream. Fortunately, new design tools are entering the market to help you stay on top of your topics.
→ Action to take: Analyze code to determine where and when modularization should be implemented, keeping an eye on performance and maintainability
→ Solutions: Not applicable
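The circular-dependency risk mentioned above can be caught mechanically. As a sketch (the graph format is my own, not tied to any tool), a quick depth-first cycle check on a module dependency graph looks like this:

```python
# Sketch: before wiring modules or sequences together, a cycle check on the
# dependency graph catches the infinite-loop risk of circular dependencies.

def has_cycle(deps: dict) -> bool:
    """deps maps each module name to the list of modules it depends on."""
    visiting, done = set(), set()

    def visit(node: str) -> bool:
        if node in done:
            return False
        if node in visiting:
            return True  # back-edge found: we are inside our own ancestry
        visiting.add(node)
        if any(visit(n) for n in deps.get(node, [])):
            return True
        visiting.discard(node)
        done.add(node)
        return False

    return any(visit(n) for n in deps)

print(has_cycle({"a": ["b"], "b": ["c"], "c": []}))     # False
print(has_cycle({"a": ["b"], "b": ["c"], "c": ["a"]}))  # True
```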
#5 Deliver dead letters
When all else fails, you might be tempted to simply drop a faulty message and close the case. Often, however, this is not the expected behavior. Whenever there is an unrecoverable problem in processing an incoming message or file (it contains illegal characters, has too many characters in a certain parameter, omits a mandatory item, has an inconsistent header or footer, contains a value outside the list of predefined values, or triggers an array out of bounds), typically someone must go in and fix it. Sometimes applications are built to handle scrap messages or automatically correct expected inconsistencies. If not, pushing a failing message to a dead-letter queue is the measure of last resort. This should alert an operator to go in and fix it.
Be aware though, that message failure may have a ripple effect. If maintaining order in a processing chain is important, a failure should stop the entire chain from being processed. Especially in batch processing, thorough validation of the entire batch may be a prerequisite for processing to start.
→ Action to take: Dust off your knowledge of Enterprise Integration Patterns and research message queues
→ Solutions: Regular message queues, incident handling
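The dead-letter pattern itself fits in a few lines. Here is a minimal in-memory sketch (real brokers such as RabbitMQ offer this natively as dead-letter exchanges; the message fields below are invented):

```python
# Sketch of dead-lettering: a message with an unrecoverable problem is moved
# to a dead-letter queue for an operator to inspect, instead of being dropped.
from collections import deque

main_queue = deque([{"id": 1, "amount": 10}, {"id": 2}])  # the second message is broken
dead_letter_queue = deque()

def handle(message: dict) -> None:
    if "amount" not in message:  # unrecoverable: mandatory item omitted
        raise ValueError("missing mandatory field 'amount'")
    # ... normal processing would happen here ...

while main_queue:
    msg = main_queue.popleft()
    try:
        handle(msg)
    except ValueError as err:
        # Preserve both the message and the reason, so an operator can fix it.
        dead_letter_queue.append({"message": msg, "error": str(err)})

print(len(dead_letter_queue))  # 1
```

In production, the append to the dead-letter queue would also raise an alert, and nothing downstream of the failed message is processed if ordering matters.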
#6 Develop as an optimist, test as a pessimist
When it comes to testing, you should not take anything for granted. For starters, functional testing must be complete: every edge case you can think of should be included in your test scripts, tested not only against the stubs you have created, but also against the actual systems you are integrating with. No excuses. And because integration is inherently error-prone, you should also consider injecting some chaos into your testing.
I know, thorough testing can be a lot of work. This is where tooling can be tremendously helpful; it saves a lot of time and increases the quality of testing. Take, for instance, an automated conformance scan of your APIs, or a fault injection simulator. Remember, the more you can shift issues left and sniff them out before they become problematic, the easier they are to solve. At the same time, once you can convincingly demonstrate that an issue indeed has limited impact, you do not have to waste time fixing it for your minimal viable integration.
→ Action to take: Create extensive testing scripts
→ Solutions: Testing tools like Rest Assured, Postman, SoapUI but also testing suites
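Fault injection does not require heavyweight tooling to get started. A test double that fails a configurable fraction of calls already exercises your retry logic (the class and function names below are illustrative, not from any testing framework):

```python
# Sketch of fault injection in a test double: the stub backend fails a
# configurable fraction of calls, so retry behavior gets exercised in tests.
import random

class FlakyBackend:
    """Test double that raises ConnectionError on a fraction of calls."""
    def __init__(self, failure_rate: float, seed: int = 42):
        self.failure_rate = failure_rate
        self.rng = random.Random(seed)  # seeded, so the test is reproducible

    def call(self, payload):
        if self.rng.random() < self.failure_rate:
            raise ConnectionError("injected fault")
        return {"status": "ok", "echo": payload}

def call_with_retry(backend, payload, attempts=3):
    for i in range(attempts):
        try:
            return backend.call(payload)
        except ConnectionError:
            if i == attempts - 1:
                raise  # out of attempts: surface the failure

backend = FlakyBackend(failure_rate=0.5)
print(call_with_retry(backend, "ping")["status"])
```

Pessimistic testing then means asserting both sides: that the caller survives injected faults within its retry budget, and that it fails loudly once the budget is exhausted.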
#7 Avoid shortening the circuit
Whenever a service fails, sending it more messages might worsen the problem. Even retrying after a timeout might be problematic. On the other hand, a single timeout is no indication that a service is failing. It can be challenging to establish such behavior, especially in large, distributed systems. That is when you need a smart adapter to keep these services from becoming flooded: a single point in your architecture that channels all traffic to a back-end and can gracefully handle temporary integration issues. An API Gateway, for instance.
A circuit breaker is an advanced algorithm for gracefully handling temporary integration issues. It is smart enough to break the circuit to a failing node, and smart enough to automatically reconnect whenever it is sensible to do so. In effect, it is an adaptive throttling mechanism tuned to the health of an endpoint. Especially when your minimal viable integration relies on third-party components or services that are not yet battle-tested, a circuit breaker might save the day.
→ Action to take: Implement a circuit breaker in your API
→ Solutions: WSO2 API Microgateway, Istio
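The essence of the algorithm fits in one small class. This is a language-agnostic sketch (the class, parameters, and thresholds are my own, not the WSO2 or Istio implementation): after a number of consecutive failures the circuit opens and calls are rejected immediately; after a cool-down it lets one probe call through and closes again on success.

```python
# Minimal circuit-breaker sketch: closed -> open after max_failures
# consecutive failures; half-open (one probe allowed) after reset_timeout.
import time

class CircuitBreaker:
    def __init__(self, max_failures=3, reset_timeout=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.clock = clock       # injectable, so tests can fake the passage of time
        self.failures = 0
        self.opened_at = None    # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: call rejected")
            # cool-down elapsed: half-open, let this one probe call through
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # trip the breaker
            raise
        self.failures = 0                      # success closes the circuit
        self.opened_at = None
        return result

# Demo with a fake clock so the cool-down can be simulated instantly.
now = [0.0]
cb = CircuitBreaker(max_failures=2, reset_timeout=10.0, clock=lambda: now[0])

def failing_backend():
    raise ConnectionError("backend down")

for _ in range(2):
    try:
        cb.call(failing_backend)
    except ConnectionError:
        pass
try:
    cb.call(failing_backend)           # rejected without touching the backend
except RuntimeError as err:
    print(err)                         # circuit open: call rejected

now[0] = 11.0                          # cool-down elapsed: probe allowed
print(cb.call(lambda: "ok"))           # ok
```

Production implementations add sliding windows, error-rate thresholds, and per-endpoint state, but the open/half-open/closed cycle above is the core of what a gateway or service mesh does for you.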
Take your time and hurry
These seven steps will make sure you do not take shortcuts you are going to regret. A shortcut might get you somewhere quicker, but not in a solid state. All the gain from taking the shortcut evaporates, because fragile solutions tend to hurt the people using them and bite you back when you least expect it.