Deduplication can be performed in 4 ways:
1. Immediately when the write is processed (inline) and before the write is acknowledged back to the originator of the write (pre-ack).
2. Immediately when the write is processed (inline) and in parallel to the write being acknowledged back to the originator of the write (on-ack).
3. A short time after the write is processed (inline), that is, after the write is acknowledged back to the originator of the write, e.g. when flushing the write buffer to persistent storage (post-ack).
4. After the write has been committed to the persistent storage layer (post-process).
The first and second methods, when properly integrated into the solution, are most likely to offer both performance and capacity benefits. The third and fourth methods are primarily used for capacity benefits only.
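The four timings can be compared with a small Python sketch. It is only an illustration: the names DedupTiming, write_steps and dedup_precedes_commit are hypothetical, and placing "commit" after "ack" for the first two methods assumes a write-back buffer, which the text above does not specify.

```python
from enum import Enum


class DedupTiming(Enum):
    PRE_ACK = 1       # method 1
    ON_ACK = 2        # method 2
    POST_ACK = 3      # method 3
    POST_PROCESS = 4  # method 4


def write_steps(timing):
    """Return the ordered stages of one write.

    A tuple groups stages that run in parallel. The position of
    "commit" for PRE_ACK and ON_ACK is an assumption of this sketch.
    """
    if timing is DedupTiming.PRE_ACK:
        return ["dedup", "ack", "commit"]
    if timing is DedupTiming.ON_ACK:
        return [("dedup", "ack"), "commit"]
    if timing is DedupTiming.POST_ACK:
        return ["ack", "dedup", "commit"]
    return ["ack", "commit", "dedup"]


def dedup_precedes_commit(timing):
    """True if data is deduplicated before it reaches persistent storage."""
    flat = []
    for step in write_steps(timing):
        flat.extend(step if isinstance(step, tuple) else (step,))
    return flat.index("dedup") < flat.index("commit")
```

In this model only the post-process method writes the full data to persistent storage first and reduces it afterwards.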
StarWind Virtual SAN for vSphere inline deduplication works in the following manner:
1. In the initial phase, any blocks that consist entirely of zeros are identified and recorded only in metadata.
2. In the second phase, the incoming data is processed to determine whether it is redundant data (data that has been written before) or not. The redundancy of this data is checked through metadata maintained by the kernel module. Any block of data that is found to be redundant will not be written out. Instead, metadata will be updated to point to the original copy of the block already stored on media.
3. Once the initial and second phases are completed, compression is applied to the remaining individual data blocks. The compressed data blocks are then packed together into fixed-length (4 KB) blocks and stored on media.
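The three phases above can be sketched in Python as a toy model. This is not StarWind's implementation: the class and function names are hypothetical, SHA-256 and zlib are stand-ins chosen for the sketch, and an in-memory dictionary takes the place of both the kernel module's metadata and the media.

```python
import hashlib
import zlib

CONTAINER_SIZE = 4096  # fixed-length (4 KB) block used for packing


class InlineDedupSketch:
    """Toy model of zero detection, deduplication and compression."""

    def __init__(self):
        self.block_map = {}  # logical address -> ("zero", length) or ("data", digest)
        self.store = {}      # digest -> compressed payload (stands in for media)

    def write(self, lba, block):
        # Phase 1: an all-zero block is recorded in metadata only.
        if block.count(0) == len(block):
            self.block_map[lba] = ("zero", len(block))
            return "zero"
        # Phase 2: a block written before is not stored again;
        # metadata points to the original copy.
        digest = hashlib.sha256(block).digest()
        if digest in self.store:
            self.block_map[lba] = ("data", digest)
            return "duplicate"
        # Phase 3: the remaining unique block is compressed.
        self.store[digest] = zlib.compress(block)
        self.block_map[lba] = ("data", digest)
        return "unique"

    def read(self, lba):
        kind, ref = self.block_map[lba]
        if kind == "zero":
            return bytes(ref)  # rebuild the zero block from its length
        return zlib.decompress(self.store[ref])


def pack_fixed(payloads, size=CONTAINER_SIZE):
    """Greedily pack compressed payloads into fixed-length containers.

    A real layout would also record each payload's offset and length
    in metadata; that bookkeeping is omitted here.
    """
    containers, current = [], b""
    for payload in payloads:
        if len(payload) > size:
            raise ValueError("payload does not fit in one container")
        if len(current) + len(payload) > size:
            containers.append(current.ljust(size, b"\0"))
            current = b""
        current += payload
    if current:
        containers.append(current.ljust(size, b"\0"))
    return containers
```

For example, writing the same non-zero block to two logical addresses stores one compressed payload, and a zero block adds nothing to the store at all.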
Windows Server 2019 deduplication is performed outside of the I/O path (post-processing) and is multi-threaded to speed up processing and keep the performance impact minimal.