A node failure is not a critical event in StorPool Distributed Storage (StorPool) when multiple copies/replicas (3N or 2N) are used for data protection: it causes neither downtime nor partial unavailability. The system is self-healing. If the failed node returns, the StorPool cluster rebuilds only the data that changed or is missing on it. If the node is not back within a pre-set time (e.g., 5 minutes, since most failures are transient), the cluster creates a new copy of the missing data instead.
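The recovery behaviour described above can be pictured as a simple decision rule. The following Python sketch is illustrative only: the names `NodeOutage` and `recovery_action`, the returned labels, and the 300-second default are assumptions made for this example (the text gives 5 minutes only as an example value). It is not StorPool's actual code or configuration interface.

```python
from dataclasses import dataclass

# Assumed grace period; the article mentions 5 minutes only as an example.
REJOIN_TIMEOUT_S = 5 * 60


@dataclass
class NodeOutage:
    seconds_down: float   # how long the node has been unreachable
    node_returned: bool   # whether the node has rejoined the cluster


def recovery_action(outage: NodeOutage, timeout_s: float = REJOIN_TIMEOUT_S) -> str:
    """Pick a recovery path for a failed node (hypothetical labels)."""
    if outage.node_returned and outage.seconds_down <= timeout_s:
        # Node came back in time: copy over only what changed while it was away.
        return "resync_changed_data"
    if outage.seconds_down > timeout_s:
        # Node stayed away too long: create new copies of the missing data.
        return "rebuild_missing_copies"
    # Still inside the grace period: keep serving I/O from the other copies.
    return "wait"


print(recovery_action(NodeOutage(seconds_down=30, node_returned=True)))
print(recovery_action(NodeOutage(seconds_down=600, node_returned=False)))
```

Waiting before rebuilding avoids copying a full set of data for an outage that would have resolved on its own, which is the rationale the text gives for the pre-set time.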
StorPool uses replicas to guarantee data redundancy.
StorPool's implementation of replicas is called Copies:
- Maintaining 1 copy/replica (1N) means that data is kept only once and is not protected by another copy/replica.
- Maintaining 2 copies/replicas (2N) means that data is protected by writing 2 copies of the data to the StorPool cluster. Protection applies to both disk and node failures.
- Maintaining 3 copies/replicas (3N) means that data is protected by writing 3 copies of the data to the StorPool cluster. Protection applies to both disk and node failures.
StorPool recommends 3 copies/replicas as the standard and 2 copies/replicas for less critical data. With the standard (3N), the StorPool platform can withstand the failure of any two disks or any two nodes within the storage cluster.
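The relationship between the number of copies and the number of failures that can be absorbed is simple arithmetic: data stays readable as long as at least one copy survives. The Python sketch below spells this out. The function names are hypothetical, and the model assumes that each copy of a piece of data sits on a different disk and node, as the 3N description implies.

```python
def tolerated_failures(copies: int) -> int:
    """Simultaneous failures that still leave one copy intact: copies - 1."""
    if copies < 1:
        raise ValueError("at least one copy is required")
    return copies - 1


def data_available(copies: int, failed_copy_holders: int) -> bool:
    """True while at least one of the `copies` locations is still healthy."""
    if not 0 <= failed_copy_holders <= copies:
        raise ValueError("failed_copy_holders must be between 0 and copies")
    return failed_copy_holders < copies


for n in (1, 2, 3):
    print(f"{n}N tolerates {tolerated_failures(n)} failure(s)")
```

Under these assumptions 1N tolerates no failure, 2N tolerates one, and 3N tolerates two, which matches the recommendation to use 3N as the standard.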
Before any write is acknowledged to the host, it is synchronously replicated to the prescribed number of nodes. All nodes in the cluster participate in replication. With 3N, this means that one instance of the written data is stored on one node and the other two instances are stored on two different nodes in the cluster. Placement is fully distributed for all instances; in other words, there is no dedicated partner node. When a disk fails, it is marked offline and data is read from another instance instead. At the same time, re-replication of the affected copies/replicas is initiated to restore the desired number of copies/replicas.
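The write, read, and re-replication flow described above can be sketched with a toy in-memory model. Everything in the Python example below is an assumption made for illustration: the `ToyCluster` class, its methods, the random placement, and the simplification that each node stands for a single disk. It does not reflect StorPool's real placement algorithm or API, and the refusal to write when too few nodes are online is a limitation of the toy model, not a statement about StorPool behaviour.

```python
import random


class ToyCluster:
    """Toy model: every node can hold any copy; there is no fixed partner node."""

    def __init__(self, node_names, copies=3, seed=0):
        if copies > len(node_names):
            raise ValueError("need at least as many nodes as copies")
        self.copies = copies
        self.nodes = {name: {} for name in node_names}  # node -> {block_id: data}
        self.online = set(node_names)
        self.rng = random.Random(seed)

    def holders(self, block_id):
        """Online nodes that currently hold a copy of the block."""
        return sorted(n for n in self.online if block_id in self.nodes[n])

    def write(self, block_id, data):
        """Store `copies` instances on distinct nodes, then acknowledge."""
        candidates = sorted(self.online)
        if len(candidates) < self.copies:
            raise RuntimeError("toy model: not enough online nodes")
        targets = self.rng.sample(candidates, self.copies)
        for node in targets:
            self.nodes[node][block_id] = data
        return targets  # returning stands for the acknowledgement to the host

    def read(self, block_id):
        """Read from any online node that holds the block."""
        holders = self.holders(block_id)
        if not holders:
            raise KeyError(block_id)
        return self.nodes[holders[0]][block_id]

    def fail_node(self, name):
        """Mark a node offline and restore the desired number of copies."""
        self.online.discard(name)
        for block_id in self.nodes[name]:
            holders = self.holders(block_id)
            if not holders:
                continue  # no surviving copy to replicate from
            data = self.nodes[holders[0]][block_id]
            spares = [n for n in sorted(self.online) if n not in holders]
            for node in spares[: self.copies - len(holders)]:
                self.nodes[node][block_id] = data


cluster = ToyCluster(["n1", "n2", "n3", "n4", "n5"], copies=3)
written_to = cluster.write("blk-1", b"payload")
cluster.fail_node(written_to[0])
print(cluster.read("blk-1"), cluster.holders("blk-1"))
```

After the failure, the block is still readable from a surviving instance, and the model copies it to a node that did not previously hold it so that three instances exist again.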
StorPool, StorPool [SDS]/19.01, x86
Node Failure Protection, Reads/Writes, Data Availability