Microservices are small: they have only a few responsibilities.
The more microservices you use, the more complex the system becomes.
The microservice journey starts simple
- A few services communicate with each other and their databases
- Running locally is still simple
- The whole architecture is transparent
- Onboarding new developers is straightforward
- The communication is usually synchronous HTTP
As you extend your architecture with more features, you end up with more microservices, and the communication among them gets complicated.
Problems and solutions
- Hard to see the whole picture → create a descriptive architecture document (but it easily becomes outdated)
- Hard to know the payload/data structure exchanged between services → you can use OAS (OpenAPI Specification) documentation to describe the API, but if it’s maintained manually, you have to ask yourself how accurate it is
- Running the system locally is a challenge → using cloud services works quite well, up to a point
HTTP vs. async communication (Kafka)
HTTP works well when the entire environment is up and running, but you can lose data when some services become unavailable
- HTTP alone is not enough: it’s synchronous and doesn’t scale well enough
- You need a solution that keeps your application as operational as possible, without losing any data, when one of the components of your architecture isn’t working
- Solution: move to async communication (e.g., Pub/Sub, distributed queues, Kafka)
- There are many tools available; you have to pick the right one for your use case
- With async communication tools, services don’t communicate with each other directly; they communicate through an intermediary instead (e.g., Kafka)
- Service A sends data to Kafka → Kafka persists the data and ensures it won’t get lost
- Service B pulls the data from Kafka when it’s ready to process it
- This makes the architecture somewhat more complicated, but not excessively so, and the data is persisted
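The difference between the two styles can be sketched with a toy model in Python. `ToyLog` and `send_over_http` are hypothetical names for illustration and are not Kafka’s API. The log keeps every published message and tracks a separate read offset per consumer group, which loosely mirrors how Kafka consumers pull from a topic. A real broker also writes messages to disk and can replicate them across brokers, while this sketch only keeps them in memory.

```python
from collections import defaultdict


class ToyLog:
    """Hypothetical in-memory stand-in for a broker topic (NOT the Kafka API).

    Messages are appended and kept; each consumer group tracks its own offset,
    so a consumer that was offline can catch up later.
    """

    def __init__(self) -> None:
        self.messages: list = []
        self.offsets: dict = defaultdict(int)

    def publish(self, message: dict) -> None:
        # Producer side: append and return immediately, whether or not
        # any consumer is currently running.
        self.messages.append(message)

    def poll(self, group: str, max_messages: int = 10) -> list:
        # Consumer side: pull the next batch starting at this group's offset,
        # then advance the offset past what was returned.
        start = self.offsets[group]
        batch = self.messages[start:start + max_messages]
        self.offsets[group] = start + len(batch)
        return batch


def send_over_http(message: dict, service_b_up: bool, received: list) -> bool:
    """Hypothetical synchronous call: if Service B is down, the message is
    dropped unless the caller implements its own retry or storage."""
    if not service_b_up:
        return False
    received.append(message)
    return True


# Synchronous, HTTP-style: Service B is down, so the order is lost.
received_http: list = []
ok = send_over_http({"order_id": 1}, service_b_up=False, received=received_http)

# Async: Service A publishes while Service B is down...
topic = ToyLog()
topic.publish({"order_id": 1})
topic.publish({"order_id": 2})
# ...and Service B pulls both messages once it is back online.
received_async = topic.poll("service-b")
```

With the synchronous call, the order is simply dropped because nothing stored it. With the log, Service B receives both orders after the fact. Polling again returns nothing new, because the group’s offset has advanced.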
Downsides of async
- Swagger/OAS can’t be used because it describes HTTP communication
- You need to find other ways to visualize the communication → there are no mature, community-adopted solutions yet
- Running services locally is complicated and requires a creative solution (e.g., running Kafka locally)
- Breaking API changes: how you catch them depends on your working environment → e.g., Docker Compose or other contained environments you can control
- How do you debug a service that receives its data from another source? (You can’t just send it an API request via Postman.)
- The whole picture gets much more complicated as you start using more microservices and async communication
Tools to handle complexity
Distributed tracing – the ability to trace how microservices communicate with one another.
- An open-source tool lets you instrument distributed traces in microservices
- Within the code, it records what data comes into a service and what data goes out of it
- All of this trace data is sent to a central location
Distributed tracing tools: Jaeger UI and Zipkin
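The core idea can be sketched in plain Python: each service records what it received and what it sent, tags the record with a shared trace ID, and reports it to a central location. Everything here is hypothetical. `COLLECTOR` stands in for a central backend such as Jaeger or Zipkin, and the `x-trace-id` / `x-parent-span-id` header names are made up for illustration. Real instrumentation libraries use standard propagation formats such as the W3C Trace Context `traceparent` header.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    service: str
    operation: str
    attributes: dict = field(default_factory=dict)


# Hypothetical stand-in for the central location every service reports to.
COLLECTOR: list = []


def start_span(service: str, operation: str, headers: dict, **attributes) -> Span:
    # Reuse the incoming trace ID if there is one; otherwise this service is
    # the first in the flow, so a new trace is started.
    trace_id = headers.get("x-trace-id") or uuid.uuid4().hex
    span = Span(trace_id, uuid.uuid4().hex[:16], headers.get("x-parent-span-id"),
                service, operation, dict(attributes))
    # "Send" the span to the central location. A real tracer would also record
    # start/end timestamps and export the span when it finishes.
    COLLECTOR.append(span)
    return span


def outgoing_headers(span: Span) -> dict:
    # Propagate the trace context to the next service (HTTP or message header).
    return {"x-trace-id": span.trace_id, "x-parent-span-id": span.span_id}


def billing_service(headers: dict, payload: dict) -> None:
    # Incoming data is recorded as span attributes.
    start_span("billing", "charge", headers, amount=payload["amount"])


def order_service(headers: dict, payload: dict) -> None:
    span = start_span("orders", "POST /orders", headers, order_id=payload["order_id"])
    # Outgoing call carries the trace context along with the data.
    billing_service(outgoing_headers(span), {"amount": payload["amount"]})


order_service({}, {"order_id": 42, "amount": 9.99})
```

Because the trace ID travels with the request, both spans end up under the same trace. The parent ID lets a tracing UI draw the call hierarchy.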
For a developer, the ability to inspect and understand data and visualize the flow is very important.
Distributed tracing tools can help with:
- Visualizing the flow
- Debugging (trace ID)
- Understanding the big picture
- Understanding the outcome for a particular endpoint
- Understanding what components are involved
- Inspecting the relation between the components
- Monitoring which actions were performed during the API call
- Writing logs
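For the logging point, one common pattern is to stamp every log line with the current trace ID so that logs from different services can be correlated with the trace. Below is a minimal sketch using only Python’s standard `logging` and `contextvars` modules. The logger name, format string, and trace ID value are assumptions for illustration. Tracing libraries may provide their own log-correlation integrations instead.

```python
import contextvars
import io
import logging

# Hypothetical context variable holding the trace ID of the request in progress.
current_trace_id = contextvars.ContextVar("trace_id", default="-")


class TraceIdFilter(logging.Filter):
    """Attach the current trace ID to every log record passing through."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id = current_trace_id.get()
        return True


# Log into an in-memory stream so the output is easy to inspect.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s trace=%(trace_id)s %(message)s"))
handler.addFilter(TraceIdFilter())

logger = logging.getLogger("orders-demo")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False


def handle_request(trace_id: str) -> None:
    # Set the trace ID for the duration of this request, then restore it.
    token = current_trace_id.set(trace_id)
    try:
        logger.info("order received")
        logger.info("payment requested")
    finally:
        current_trace_id.reset(token)


handle_request("4bf92f3577b34da6a3ce929d0e0e4736")
print(stream.getvalue())
```

Every line written while handling the request carries the same trace ID. You can then search the logs for the ID you found in the tracing UI.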
If you don’t know how a line of code was executed or which endpoint triggered it, you can grab the trace ID, paste it into Jaeger UI or Zipkin, and search for it. The entire flow across the microservices becomes visible → seeing the bigger picture directly helps reduce the number of bugs in production.
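Searching by trace ID essentially means filtering the collected spans down to one trace and rebuilding the call tree from the parent/child links. Below is a minimal sketch with hypothetical, hard-coded span data. Real backends such as Jaeger or Zipkin store many more fields, including timestamps, and order spans by time rather than by insertion order.

```python
from collections import defaultdict

# Hypothetical spans as a tracing backend might store them: several traces are
# interleaved, and each span points to its parent span (None for the root).
SPANS = [
    {"trace_id": "t1", "span_id": "a", "parent_id": None, "service": "gateway", "operation": "POST /orders"},
    {"trace_id": "t2", "span_id": "x", "parent_id": None, "service": "gateway", "operation": "GET /health"},
    {"trace_id": "t1", "span_id": "b", "parent_id": "a", "service": "orders", "operation": "create order"},
    {"trace_id": "t1", "span_id": "c", "parent_id": "b", "service": "billing", "operation": "charge"},
    {"trace_id": "t1", "span_id": "d", "parent_id": "b", "service": "notifications", "operation": "publish OrderCreated"},
]


def flow_for_trace(spans: list, trace_id: str) -> list:
    """Return the call tree of one trace as indented lines, like a trace view."""
    in_trace = [s for s in spans if s["trace_id"] == trace_id]

    # Index spans by their parent so the tree can be walked top-down.
    children = defaultdict(list)
    for span in in_trace:
        children[span["parent_id"]].append(span)

    lines: list = []

    def walk(parent_id, depth):
        for span in children[parent_id]:
            lines.append("  " * depth + f'{span["service"]}: {span["operation"]}')
            walk(span["span_id"], depth + 1)

    walk(None, 0)
    return lines


for line in flow_for_trace(SPANS, "t1"):
    print(line)
```

For trace `t1` this prints the gateway call, with the orders service beneath it and the billing and notification spans beneath that. The unrelated `t2` health check is filtered out.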
Distributed tracing tools won’t help you with:
- Local development (which is a problem in this field)
- Deciding which tests you need to write (but it’s worth the investment)
- Reproducing an issue (it shows the flow, but not the actual data needed to reproduce a specific flow)
Takeaways
- Microservices are complicated by definition, and they get more complicated the more of them you use
- The developer’s role is to anticipate the complexity, be ready for it, and bring the right tools to overcome the issues
- Be aware that the implemented solutions are mostly programming-language specific
- Use HTTP where you have to, but use asynchronous communication wherever you can (it’s better and safer, but it makes development a bit more complicated)