What are Microservices on AKS
We see a lot of talk online, and in conversations with customers, about Microservices and Service Meshes, but what are they?
In this article, we will look into the following topics:
- What does Microservices mean?
- What is a Service Mesh?
- What can you do with a Service Mesh?
- Do you need a Service Mesh?
- Can you use Service Mesh with AKS?
What are Microservices?
Microservices is an architectural style that structures your application as a collection of services. Usually, each service is a small application (but not too small) that is owned by a small team. The services are designed to be loosely coupled, which basically means each one can be deployed independently without bringing your whole system down. By coupling a Microservices architecture with DevOps practices, you can do rapid, frequent, and reliable releases of complex applications.
Microservices are not a silver bullet for your needs; in fact, careful consideration is needed before you go down the Microservices route. Some of the pitfalls of Microservices relate to communication between services and external resources, think latency and security, and developers may need to write extra code to deal with this. Another con is that the more microservices you have, the more resources need to be looked after and developed. Debugging is another challenge. But do not let all the cons distract you from the positives; just make sure before you start the journey to microservices that it is right for you.
What is a Service Mesh?
A service mesh is a tool that helps bring observability, security, and reliability to your microservices applications by injecting itself into the platform layer rather than the application layer. With most service meshes, a proxy container is added to each of your application's pods; this is normally called a sidecar container. The main role of a service mesh, as you might have guessed by the use of a proxy, is to manage network traffic between services.
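As a rough sketch of how that sidecar injection happens in practice, assuming Istio as the mesh (other meshes use similar mechanisms, and the namespace name below is just an example): labelling a namespace tells the mesh's admission webhook to add the proxy to every pod scheduled there.

```yaml
# Hypothetical namespace; the istio-injection label asks Istio's
# mutating admission webhook to inject the Envoy sidecar proxy
# into every pod created in this namespace.
apiVersion: v1
kind: Namespace
metadata:
  name: my-app
  labels:
    istio-injection: enabled
```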
There are currently several different Service Meshes available. The most popular is Istio (the Greek word for sail). Istio was created by teams at Google and IBM in partnership with the Envoy team from Lyft and is being developed in the open on GitHub. Another popular service mesh is Linkerd, which claims to be the “World’s lightest, fastest service mesh”. Again, this is developed in the open on GitHub.
A few other companies are also creating service meshes, including Microsoft with their Open Service Mesh (OSM). OSM is very new and still in alpha at the time of writing. Nginx is working on its own service mesh called NGINX Service Mesh (NSM). NSM is developed behind closed doors, but issues can be logged via GitHub. Nice comparisons of these service meshes can be found online.
With the number of service mesh choices available, I highly recommend looking at each option carefully and trialing them before you pick one. Oh, and do not forget to keep an eye on the projects: as with anything related to Kubernetes, they get updated a lot, so breaking changes can happen.
What can you do with a Service Mesh?
Below you will find a few things that a service mesh can do.
Dynamic Service Discovery and Routing
By using a service mesh, you get dynamic service discovery and traffic management. This allows you to perform traffic shadowing (think duplication), which is very handy for testing. For example, you might have an application that communicates with a service, and you are planning to release a new version of this service but want to test how it behaves with live data. By using traffic shadowing, you can duplicate live traffic to both your production service and your new, in-test service to see how it behaves.
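As a minimal sketch of what shadowing looks like in configuration, assuming Istio (the resource and service names are hypothetical, and the v1/v2 subsets would be defined in a matching DestinationRule):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-service
spec:
  hosts:
    - my-service
  http:
    - route:
        # All live traffic is served by the current version.
        - destination:
            host: my-service
            subset: v1
      # A copy of 100% of requests is also sent to the new
      # version; its responses are discarded, so users only
      # ever see v1's answers.
      mirror:
        host: my-service
        subset: v2
      mirrorPercentage:
        value: 100.0
```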
Another feature is traffic splitting. This makes canary testing easy. Let us use the example from above, but now the service has finished testing and we want to roll it out, just not to everyone yet. With traffic splitting, we can split the traffic between the two services, say 70% to the old service and 30% to the new service. We can then monitor the new service for any errors or issues and continue to increase the percentage of load to 100% once we are confident the new service is working as expected. You can also use this for A/B-type experimentation with new features in, say, a web application.
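The 70/30 split described above could be expressed like this, again assuming Istio with hypothetical names and subsets:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-service
spec:
  hosts:
    - my-service
  http:
    - route:
        # 70% of requests keep going to the old version...
        - destination:
            host: my-service
            subset: v1
          weight: 70
        # ...while 30% are routed to the new one. Raising this
        # weight to 100 completes the rollout.
        - destination:
            host: my-service
            subset: v2
          weight: 30
```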
Service-to-Service Communication Reliability
The primary function of a service mesh is to manage service-to-service communication. Because of this, service meshes give you the option to implement features like request retries, timeouts, rate limiting, and circuit breaking.
As mentioned above, with most service mesh installations a proxy server sidecar container is added to each pod in your cluster. These sidecars are controlled by the service mesh control plane; once all the proxies are configured, they form the data plane. The data plane allows you to enable smart routing (think latency-aware load balancing) and implement routing rules based on request properties. One thing to note: not all service meshes are the same, so you may find this option is not available in the service mesh you picked.
By using options like timeouts and circuit breakers, you can ensure your services do not build up huge backlogs or deliver a bad user experience. I would definitely spend some time investigating these options and working with the business to fit them to your requirements.
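To make this concrete, here is a sketch of what retries, a timeout, and simple circuit breaking might look like, assuming Istio (the names are hypothetical, and other meshes expose similar knobs under different APIs):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-service
spec:
  hosts:
    - my-service
  http:
    - route:
        - destination:
            host: my-service
      # Fail the whole request after 5s instead of queueing forever.
      timeout: 5s
      retries:
        attempts: 3
        perTryTimeout: 2s
        retryOn: 5xx,connect-failure
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: my-service
spec:
  host: my-service
  trafficPolicy:
    # Circuit breaking: eject an instance from the load-balancing
    # pool for 60s after 5 consecutive 5xx responses.
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 60s
```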
Observability of Traffic
Because all service-to-service communication goes via the proxies in a service mesh, you get improved observability of the network traffic. This allows you to trace a request through all services in the mesh, track the frequency of HTTP error codes, and measure latency, be it service-to-service or global.
You may find some service meshes have their own dashboard to give you a single-pane-of-glass view of all network flows, but most also offer the option to export metrics to Prometheus for use with Grafana, the two most used tools for monitoring AKS.
Security of Service-to-Service Communication
I know it is last on the list, but it is probably the reason why most people adopt a service mesh. Most service meshes allow you to control which service can talk to which service. So “Service A” can communicate with “Service C” but not “Service B”, and “Service B” can only talk to “Service C”. By default, AKS does support network policies, but they are controlled via manifest files and can be cumbersome.
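Here is a sketch of that “only A and B may call C” rule, assuming Istio with mutual TLS enabled so callers are identified by their service accounts (all names and the namespace are hypothetical):

```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: service-c-callers
  namespace: default
spec:
  # Applies to the pods backing Service C.
  selector:
    matchLabels:
      app: service-c
  action: ALLOW
  rules:
    - from:
        - source:
            # Only workloads running as these service accounts may
            # call Service C; once an ALLOW policy exists for a
            # workload, all other callers are denied.
            principals:
              - cluster.local/ns/default/sa/service-a
              - cluster.local/ns/default/sa/service-b
```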
Closely related is encrypted traffic, another big reason people choose a service mesh. This means all internal network traffic is encrypted using certificates, often referred to as mutual TLS (mTLS). These certificates could even come from an external CA like Let’s Encrypt, or from Azure Key Vault, if supported by the service mesh.
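In Istio, for example, turning on strict in-mesh encryption can be as small as this (applying the policy in Istio's root namespace makes it mesh-wide):

```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  # STRICT: sidecars only accept mutually authenticated TLS
  # traffic; plain-text requests between pods are rejected.
  mtls:
    mode: STRICT
```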
Do you need a Service Mesh?
If you are running a few services that connect to, say, an event grid or message queue, then you probably do not need a service mesh. That is, unless you need any of the features above, like encrypted service-to-service communication inside the cluster.
If you have multiple services talking to each other then a service mesh might be for you. By using a service mesh, you also allow your developers to focus on the business value of your application rather than connecting the services.
Can you use Service Mesh with AKS?
Service meshes are fully compatible with AKS. AKS runs just like any other Kubernetes installation, be it on-premises, in another cloud, or on virtual machines. The benefit of pairing a service mesh with AKS is that you do not also have to manage the underlying systems; Microsoft looks after that for you. This means you have more time to develop your application and investigate whether a service mesh is for you.
With Microsoft creating its own service mesh (OSM), I would keep an eye on how it progresses, as it will probably end up being tightly integrated with AKS and maybe even very easy to deploy as an add-on.