High availability

The key to high availability is redundancy.
Hot swapping means redirecting your traffic from a broken component to a healthy one.
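
A minimal sketch of the redirect step, assuming a hypothetical `FailoverRouter` class and an in-memory `health` table (neither is a real library API; a real system would probe the components over the network):

```python
from typing import Callable, Sequence


class FailoverRouter:
    """Hypothetical router: picks the first backend that reports healthy."""

    def __init__(self, backends: Sequence[str], is_healthy: Callable[[str], bool]):
        self.backends = list(backends)
        self.is_healthy = is_healthy

    def pick(self) -> str:
        # Walk the backends in priority order and skip broken ones.
        for backend in self.backends:
            if self.is_healthy(backend):
                return backend
        raise RuntimeError("no healthy backend available")


# Assumed health state for illustration only.
health = {"primary": True, "standby": True}
router = FailoverRouter(["primary", "standby"], lambda name: health[name])

first_choice = router.pick()    # the primary is healthy, so it is chosen
health["primary"] = False       # simulate the primary breaking
after_failure = router.pick()   # traffic is now redirected to the standby
```

The redundancy is the second entry in the backend list: without a standby there is nothing to swap to.
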
You have two options if you want to hot swap a stateful component:

  1. Ignore in-flight requests and let users send them again
  2. Keep a synchronized copy of the component and change the leader
    Leader election is a combination of hot swapping and redundancy. One of the replicas is selected as the leader, which performs the main operations.
    Operations should also be idempotent. Network issues and retries always happen, so you need to make sure that redoing an action does not break anything. A broken component may also keep doing work that the healthy component has already taken over, so the same action can run twice.
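
One common way to get idempotency is to attach a unique ID to every request and remember the result. The sketch below assumes a hypothetical `Account` class and caller-supplied `request_id` values; it keeps state in memory only, whereas a replicated component would need to store the seen IDs in its synchronized state.

```python
class Account:
    """Hypothetical stateful component with an idempotent deposit."""

    def __init__(self, balance: int = 0):
        self.balance = balance
        self._results = {}  # request_id -> balance returned for that request

    def deposit(self, request_id: str, amount: int) -> int:
        # A retried or duplicated request returns the stored result
        # instead of applying the deposit a second time.
        if request_id in self._results:
            return self._results[request_id]
        self.balance += amount
        self._results[request_id] = self.balance
        return self.balance


account = Account()
first = account.deposit("req-1", 50)   # applied
retry = account.deposit("req-1", 50)   # same ID: not applied again
other = account.deposit("req-2", 25)   # new ID: applied
```

With this in place it does not matter whether the retry comes from the user, the network layer, or an old leader that has not yet noticed it was replaced.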

You should usually run at least 3 etcd replicas, so that the cluster still has a majority when one member fails. Members can learn about each other either from the data of an existing cluster or through etcd discovery.
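
The reason for 3 replicas follows from majority arithmetic: etcd needs a majority (quorum) of members to agree before it commits a write. A small sketch of that arithmetic, using helper names (`quorum`, `fault_tolerance`) that are made up for illustration:

```python
def quorum(members: int) -> int:
    """Smallest number of members that forms a majority."""
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """How many members can fail while a majority is still reachable."""
    return members - quorum(members)


# cluster size -> (quorum, failures tolerated)
table = {n: (quorum(n), fault_tolerance(n)) for n in range(1, 6)}
```

Here `table[3]` is `(2, 1)` and `table[4]` is `(3, 1)`: a fourth member raises the quorum without tolerating any extra failure, which is why odd cluster sizes are the usual choice.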

The ideal state is zero downtime.