Deduplication can be performed at four different stages of a write:
1. Immediately when the write is processed (inline) and before the write is acknowledged back to the originator of the write (pre-ack).
2. Immediately when the write is processed (inline) and in parallel with the write being acknowledged back to the originator of the write (on-ack).
3. Shortly after the write is processed (inline), and therefore after the write has been acknowledged back to the originator of the write, e.g. when flushing the write buffer to persistent storage (post-ack).
4. After the write has been committed to the persistent storage layer (post-process).
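The four methods differ along two axes: where deduplication falls relative to the acknowledgement, and whether it runs before or only after the data is committed to persistent storage. The Python sketch below encodes the list as a small data structure to make those axes explicit; the names `AckOrder`, `DedupTiming`, `TIMINGS` and `modes_done_by_ack` are hypothetical and not taken from any product.

```python
from dataclasses import dataclass
from enum import Enum


class AckOrder(Enum):
    """Where deduplication runs relative to acknowledging the write."""
    BEFORE = "before"
    PARALLEL = "parallel"
    AFTER = "after"


@dataclass(frozen=True)
class DedupTiming:
    label: str
    ack_order: AckOrder   # before, in parallel with, or after the acknowledgement
    after_commit: bool    # True if dedup only runs once data is on persistent storage


# Hypothetical encoding of the four methods, in the order they are listed.
TIMINGS = [
    DedupTiming("pre-ack", AckOrder.BEFORE, after_commit=False),
    DedupTiming("on-ack", AckOrder.PARALLEL, after_commit=False),
    DedupTiming("post-ack", AckOrder.AFTER, after_commit=False),   # e.g. while flushing the write buffer
    DedupTiming("post-process", AckOrder.AFTER, after_commit=True),
]


def modes_done_by_ack(timings):
    """Labels of the methods whose dedup work is not deferred past the acknowledgement."""
    return [t.label for t in timings if t.ack_order is not AckOrder.AFTER]
```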
The first and second methods, when properly integrated into the solution, are most likely to offer both performance and capacity benefits. The third and fourth methods primarily offer capacity benefits.
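One way to see why an inline method can help performance as well as capacity is to count how much data actually reaches the persistent storage layer. The following is a deliberately simplified model comparing inline deduplication (methods 1 and 2) with post-process deduplication (method 4). It assumes SHA-256 block fingerprints, ignores metadata and compression, and assumes the post-process scan runs once after all data has been written; the function names are hypothetical.

```python
import hashlib

BLOCK = 4096  # assumed block size for this model


def inline_dedup(blocks):
    """Dedup before writing: only previously unseen blocks reach the backend."""
    seen = set()
    written = 0
    for b in blocks:
        h = hashlib.sha256(b).digest()
        if h not in seen:
            seen.add(h)
            written += len(b)
    # Duplicates never land on disk, so peak and final usage equal what was written.
    return {"bytes_written": written, "peak_bytes": written, "final_bytes": written}


def post_process_dedup(blocks):
    """Write every block first, then scan and reclaim duplicates afterwards."""
    written = sum(len(b) for b in blocks)   # every block is committed
    peak = written                          # all copies exist before the scan runs
    unique = {hashlib.sha256(b).digest(): len(b) for b in blocks}
    final = sum(unique.values())            # capacity left after duplicates are freed
    return {"bytes_written": written, "peak_bytes": peak, "final_bytes": final}
```

For 100 blocks of 4K of which only 4 are unique, both models end at 16 KiB of stored data, but the post-process model first writes, and temporarily holds, all 400 KiB.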
Datrium DVX uses global inline deduplication together with global inline compression. On the Datrium DVX Compute Node, incoming writes from VMs are broken into 4K blocks, each block is assigned a cryptographic hash, and the hash is checked for duplicates. Blocks that are not duplicates are compressed inline and passed by the DVX Hyperdriver on the Compute Node to the mirrored NVRAM in a Datrium DVX Data Node. As soon as the write to NVRAM completes, an acknowledgement is sent back to the VM where the write originated. Writes are collected in 8MB containers. Once a container is full, it is divided into 1MB erasure-coding chunks, which are striped sequentially across all disks (SSD or HDD) in the Data Node cluster together with the parity blocks calculated for them.
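The write path described above can be sketched in Python as follows. This is not Datrium's implementation and rests on several assumptions: SHA-256 stands in for the unspecified crypto-hash, `zlib` stands in for the compressor, a single XOR parity chunk per stripe stands in for the actual erasure-coding scheme, NVRAM mirroring and the acknowledgement are not modeled, and a compressed block may straddle two containers for simplicity. `IngestSketch`, `xor_parity` and `num_disks` are hypothetical names, and `num_disks` must be at least 2.

```python
import hashlib
import zlib

BLOCK_SIZE = 4 * 1024              # incoming writes are split into 4K blocks
CONTAINER_SIZE = 8 * 1024 * 1024   # writes are collected in 8MB containers
CHUNK_SIZE = 1 * 1024 * 1024       # a full container is cut into 1MB chunks


def xor_parity(chunks):
    """Single XOR parity over equal-length chunks (stand-in for the real EC code)."""
    acc = 0
    for chunk in chunks:
        acc ^= int.from_bytes(chunk, "big")
    return acc.to_bytes(len(chunks[0]), "big")


class IngestSketch:
    """Hypothetical model of the write path: dedup, compress, pack, stripe."""

    def __init__(self, num_disks):
        self.index = {}                              # fingerprint -> block id (global dedup index)
        self.container = bytearray()                 # open container being filled
        self.disks = [[] for _ in range(num_disks)]  # chunks stored on each disk
        self.cursor = 0                              # next disk in the sequential stripe

    def write(self, data):
        """Ingest one write and return how many previously unseen blocks it held."""
        new_blocks = 0
        for off in range(0, len(data), BLOCK_SIZE):
            block = data[off:off + BLOCK_SIZE]
            fp = hashlib.sha256(block).digest()      # SHA-256 assumed as the crypto-hash
            if fp in self.index:
                continue                             # duplicate: nothing new is stored
            self.index[fp] = len(self.index)
            self.container += zlib.compress(block)   # zlib assumed as the compressor
            new_blocks += 1
            if len(self.container) >= CONTAINER_SIZE:
                self._seal_container()
        return new_blocks

    def _seal_container(self):
        """Cut a full container into 1MB chunks and stripe them with parity."""
        sealed = bytes(self.container[:CONTAINER_SIZE])
        del self.container[:CONTAINER_SIZE]          # any overflow starts the next container
        chunks = [sealed[i:i + CHUNK_SIZE] for i in range(0, CONTAINER_SIZE, CHUNK_SIZE)]
        width = len(self.disks) - 1                  # data chunks per stripe; one parity chunk
        for s in range(0, len(chunks), width):
            stripe = chunks[s:s + width]
            for chunk in stripe + [xor_parity(stripe)]:
                self.disks[self.cursor].append(chunk)
                self.cursor = (self.cursor + 1) % len(self.disks)
```

For example, `IngestSketch(num_disks=5)` lays out each sealed 8MB container as two stripes of four 1MB data chunks plus one parity chunk, and writing the same data a second time stores nothing new because every block hash is already in the index.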