Module chunker

Content-defined chunking (CDC) for Parquet data pages.

CDC creates data page boundaries based on content rather than fixed sizes, enabling efficient deduplication in content-addressable storage (CAS) systems. See CdcOptions for configuration.
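To illustrate the general idea of content-defined boundaries, here is a minimal sketch of a gear-style rolling-hash chunker over raw bytes. It is not this module's implementation: `ChunkParams`, `gear_table`, and `chunk_boundaries` are hypothetical names introduced for illustration, the parameters are not the fields of `CdcOptions`, and the real chunker operates on Parquet levels and values rather than a flat byte slice.

```rust
/// Hypothetical parameters for this sketch (not the crate's `CdcOptions`).
struct ChunkParams {
    /// No content-defined cut is made before a chunk reaches this length.
    min_size: usize,
    /// A cut is forced once a chunk reaches this length.
    max_size: usize,
    /// Number of high hash bits that must be zero at a cut (1..=63).
    /// Larger values give larger chunks on average.
    mask_bits: u32,
}

/// Builds a deterministic 256-entry table of pseudo-random values
/// using the splitmix64 generator.
fn gear_table() -> [u64; 256] {
    let mut table = [0u64; 256];
    let mut state: u64 = 0x9E37_79B9_7F4A_7C15;
    for entry in table.iter_mut() {
        state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        *entry = z ^ (z >> 31);
    }
    table
}

/// Returns the exclusive end offset of every chunk in `data`.
fn chunk_boundaries(data: &[u8], params: &ChunkParams) -> Vec<usize> {
    let table = gear_table();
    let mut boundaries = Vec::new();
    let mut hash: u64 = 0;
    let mut start = 0;
    for (i, &byte) in data.iter().enumerate() {
        // Gear hash: older bytes are shifted out of the 64-bit state,
        // so the hash depends only on the most recent bytes.
        hash = (hash << 1).wrapping_add(table[byte as usize]);
        let len = i + 1 - start;
        let content_cut =
            len >= params.min_size && (hash >> (64 - params.mask_bits)) == 0;
        if content_cut || len >= params.max_size {
            boundaries.push(i + 1);
            start = i + 1;
            hash = 0;
        }
    }
    // Emit the trailing partial chunk, if any.
    if start < data.len() {
        boundaries.push(data.len());
    }
    boundaries
}
```

Because a cut depends on the bytes near it rather than on its absolute offset, an insertion early in the input typically disturbs only nearby chunks; later chunks tend to keep the same content and can be deduplicated by a CAS. The size limits in this sketch bound how small or large any one chunk can get.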

Modules

cdc 🔒
cdc_generated 🔒

Structs

CdcChunk 🔒
A chunk of data with level and value offsets for record-shredded nested data.