High availability
The key to high availability is redundancy.
Hot swapping means redirecting traffic from a broken component to a healthy one.
You have two options for hot swapping a stateful component:
- Ignore in-flight requests and let users send them again
- Keep a synced copy of the component and switch the leader to it
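A minimal sketch of hot swapping, assuming each component exposes a simple health flag. The `Router` and `Backend` classes are hypothetical names used only for illustration; a real setup would typically rely on a load balancer with health checks.

```python
class Backend:
    """A component that can serve requests (hypothetical, for illustration)."""

    def __init__(self, name):
        self.name = name
        self.healthy = True

    def handle(self, request):
        return f"{self.name} handled {request}"


class Router:
    """Sends each request to the first backend that reports healthy."""

    def __init__(self, backends):
        self.backends = list(backends)

    def route(self, request):
        for backend in self.backends:
            if backend.healthy:
                return backend.handle(request)
        raise RuntimeError("no healthy backend available")


primary, standby = Backend("primary"), Backend("standby")
router = Router([primary, standby])

first = router.route("req-1")    # served by the primary
primary.healthy = False          # simulate a failure
second = router.route("req-2")   # traffic is swapped to the standby
```

This sketch corresponds to the first option: anything in flight on the primary when it fails is lost and has to be requested again.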
Leader election combines hot swapping and redundancy. One of the replicas is elected leader and performs the main operations; if it fails, another replica takes over.
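One common way to pick a leader is a lease: a replica holds leadership only while it keeps renewing a lock with a time-to-live. The sketch below is an in-memory stand-in, not a distributed implementation; `LeaseLock` and the replica names are hypothetical, and time is passed in explicitly to keep the example deterministic.

```python
class LeaseLock:
    """In-memory stand-in for a shared lock with a time-to-live."""

    def __init__(self):
        self.holder = None
        self.expires_at = 0.0

    def try_acquire(self, candidate, ttl, now):
        # The lease is free if nobody holds it or the holder let it expire.
        free = self.holder is None or now >= self.expires_at
        if free or self.holder == candidate:
            self.holder = candidate
            self.expires_at = now + ttl
            return True
        return False


lock = LeaseLock()
a_leads = lock.try_acquire("replica-a", ttl=5, now=0)       # replica-a becomes leader
b_leads = lock.try_acquire("replica-b", ttl=5, now=1)       # rejected, lease still held
a_renews = lock.try_acquire("replica-a", ttl=5, now=3)      # leader renews its lease
b_takes_over = lock.try_acquire("replica-b", ttl=5, now=9)  # lease expired, replica-b leads
```

In a real system the lock would live in a consistent store shared by all replicas, so that they all agree on who holds the lease.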
Operations should also be idempotent. Network issues and retries always happen, so redoing an action must not break anything. A broken component may also repeat work that the healthy component has already done.
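A minimal sketch of idempotency using an idempotency key: the caller sends the same key on every retry, and the service replays the stored result instead of repeating the action. `PaymentService`, `charge`, and the key format are hypothetical names for illustration.

```python
class PaymentService:
    """Applies each charge at most once per idempotency key."""

    def __init__(self, balance):
        self.balance = balance
        self._results = {}  # idempotency key -> result of the first attempt

    def charge(self, key, amount):
        if key in self._results:
            # Retry of a request we already processed: replay the result.
            return self._results[key]
        self.balance -= amount
        result = {"charged": amount, "balance": self.balance}
        self._results[key] = result
        return result


service = PaymentService(balance=100)
first = service.charge("order-42", 30)
retry = service.charge("order-42", 30)  # same key, so no second charge
```

The retry returns the same result as the first call and the balance is only reduced once.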
You should usually run at least 3 etcd replicas, so the cluster still has a majority (quorum) when one member fails. You can make the members aware of each other by passing them data from an existing cluster or by using etcd discovery.
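The reason for 3 replicas follows from majority arithmetic: a cluster of n members needs floor(n/2) + 1 of them to agree, so it tolerates n minus that many failures. A short sketch of the calculation (the function names are illustrative):

```python
def quorum(members):
    """Smallest majority of a cluster with the given number of members."""
    return members // 2 + 1


def fault_tolerance(members):
    """How many members can fail while a majority remains."""
    return members - quorum(members)


# cluster size -> (quorum, failures tolerated)
table = {n: (quorum(n), fault_tolerance(n)) for n in range(1, 6)}
```

With 1 or 2 members no failure is tolerated, 3 members tolerate one failure, and going from 3 to 4 adds nothing, which is why odd cluster sizes are preferred.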
The ideal state is zero downtime.