>block-level deduplication (saves 30-40% on typical codebases) How is savings of...

blackmanta · 2025-08-14T14:44:04 1755182644

I am working on improving the CLI tools to make this information easier to get. In the meantime, I have stored the yam repo in yams with multiple snapshots and metadata tags, and I am seeing about 32% storage savings.

elpocko · 2025-08-14T15:01:36 1755183696

Cool. I have no idea what "stored the yam repo in yams" means. What do you mean by "block-level deduplication"? What is a block?

blackmanta · 2025-08-14T15:10:26 1755184226

I stored the yams codebase in the tool itself. The "blocks" are content-defined blocks/chunks, not filesystem blocks. They're variable-size chunks (typically 4-64 KB) created using Rabin fingerprinting to find natural content boundaries. Because a boundary depends on the nearby bytes rather than on a fixed offset, an edit only changes the chunks around it, which enables deduplication across files that share similar content.
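
To make the idea concrete, here is a rough sketch of content-defined chunking plus a hash-keyed chunk store. It is not the yams implementation: it swaps in a simple polynomial rolling hash (Rabin-Karp style) for true Rabin fingerprinting, and every name and constant in it (`WINDOW`, `MASK`, `chunk`, `dedup_stats`, the 4 KiB and 64 KiB limits) is an assumption chosen only to match the size range mentioned above.

```python
import hashlib

# All constants below are illustrative assumptions, not values taken from yams.
WINDOW = 48                    # bytes covered by the rolling hash
MIN_SIZE = 4 * 1024            # never cut a chunk shorter than this
MAX_SIZE = 64 * 1024           # always cut once a chunk reaches this size
MASK = (1 << 13) - 1           # a cut candidate appears roughly every 8 KiB
BASE = 257
MOD = (1 << 61) - 1
BASE_POW = pow(BASE, WINDOW - 1, MOD)  # weight of the oldest byte in the window


def chunk(data: bytes) -> list[bytes]:
    """Split data at positions chosen by the content of the last WINDOW bytes."""
    chunks = []
    start = 0   # index where the current chunk begins
    h = 0       # rolling hash of the window, reset at every cut
    for i, byte in enumerate(data):
        if i - start >= WINDOW:
            # Slide the window: drop the contribution of the oldest byte.
            h = (h - data[i - WINDOW] * BASE_POW) % MOD
        h = (h * BASE + byte) % MOD
        size = i - start + 1
        if (size >= MIN_SIZE and (h & MASK) == 0) or size >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing chunk may be shorter than MIN_SIZE
    return chunks


def dedup_stats(files: list[bytes]) -> tuple[int, int]:
    """Return (total input bytes, bytes stored after deduplicating chunks)."""
    store: dict[bytes, bytes] = {}
    total = 0
    for data in files:
        total += len(data)
        for c in chunk(data):
            # Identical chunks hash to the same key and are stored only once.
            store.setdefault(hashlib.sha256(c).digest(), c)
    stored = sum(len(c) for c in store.values())
    return total, stored
```

Calling `dedup_stats([old_version, new_version])` on two snapshots of the same file gives the raw size and the deduplicated size, and `1 - stored / total` is the kind of savings figure quoted earlier. With fixed-size blocks, inserting a few bytes near the start of a file would shift every later block and defeat deduplication. Here the cut points move with the content, so the chunks after the edit usually come out identical again.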