Product : StorPool, StorPool [SDS]/19.01, x86
Feature : Data Locality, General, Storage Support
Content Owner:  Herman Rutten
Data locality is not used by default but is partially supported. In most cases it is statistically better not to use data locality, because spreading data across the drives of all servers gives a larger pool and therefore higher performance. Data locality can be configured on a per-volume basis.

Physical drives are grouped in one or more pools called placement groups. One disk can participate in more than one placement group. In the simplest configuration all disks reside in a single placement group. By default StorPool will distribute user data across all the disks in the cluster proportional to their size.
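The default size-proportional distribution can be illustrated with a minimal sketch. This is not StorPool code. The function name, the disk names and the capacities are hypothetical and chosen only to show the arithmetic of spreading data in proportion to disk size.

```python
def proportional_shares(disk_sizes, total_data):
    """Split total_data across disks in proportion to each disk's capacity.

    disk_sizes: mapping of (hypothetical) disk name -> capacity, any consistent unit.
    total_data: amount of user data to place, in the same unit.
    """
    total_capacity = sum(disk_sizes.values())
    if total_capacity <= 0:
        raise ValueError("total disk capacity must be positive")
    return {
        disk: total_data * size / total_capacity
        for disk, size in disk_sizes.items()
    }


# Hypothetical placement group with three disks of unequal size.
disks = {"disk1": 1000, "disk2": 2000, "disk3": 1000}
shares = proportional_shares(disks, 400)
print(shares)
```

In this illustration the disk that is twice as large receives twice as much data as each of the other two.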

If a particular volume should store its data on only a subset of the disks, a separate placement group containing only the target disks can be created, and the volume can be configured to store one or all three copies of its data in this placement group.

The placement groups used by a volume can be changed in real time. This causes the data to be migrated from one set of disks to another in the background while the volume is in use, without a noticeable performance impact.

There is no automated mechanism that changes data locality based on current usage, because limiting data to a subset of disks usually doesn't add any performance benefit. However, such functionality can be achieved with external logic that uses the StorPool API to change the volume settings in real time.
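The decision part of such external logic might look like the sketch below. It is an assumption-laden illustration only: the function, the volume, node and placement group names are all hypothetical, and the actual StorPool API call that would apply each change is deliberately not shown.

```python
def plan_locality_changes(vm_hosts, volume_groups, host_groups):
    """Return {volume: target_group} for volumes whose placement group
    does not match the node their VM currently runs on.

    vm_hosts:      volume name -> node where the VM using it runs
    volume_groups: volume name -> placement group currently used
    host_groups:   node -> placement group holding that node's local disks
    All names are hypothetical; none of this is StorPool API.
    """
    changes = {}
    for volume, host in vm_hosts.items():
        target = host_groups.get(host)
        # Skip nodes without a local placement group and volumes already in place.
        if target is not None and volume_groups.get(volume) != target:
            changes[volume] = target
    return changes


# Hypothetical state: the VM using vol-b has moved to node2.
vm_hosts = {"vol-a": "node1", "vol-b": "node2"}
volume_groups = {"vol-a": "pg-node1", "vol-b": "pg-node1"}
host_groups = {"node1": "pg-node1", "node2": "pg-node2"}

planned = plan_locality_changes(vm_hosts, volume_groups, host_groups)
print(planned)
```

An external scheduler could run a check like this periodically and pass each planned change to the volume-update call of the StorPool API.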

Whether data locality is a good or a bad thing has turned into a philosophical debate. It's true that data locality can prevent a lot of network traffic between nodes, because the data is physically located on the same node where the VM resides. However, in dynamic environments where VMs frequently move to different hosts, data locality in most cases requires a lot of data to be copied between nodes in order to maintain the physical VM-data relationship. The SDS/HCI vendors that choose not to use data locality today argue that the additional network latency is negligible.