>block-level deduplication (saves 30-40% on typical codebases)
How is savings of 40% on a typical codebase possible with block-level deduplication? What kind of blocks are you talking about? Blocks as in the filesystem?
I am working to improve the CLI tools to make getting this information easier but I have stored the yam repo in yams with multiple snapshots and metadata tags and I am seeing about 32% storage savings.
I stored the codebase for yams in the tool. The "blocks" are content-defined blocks/chunks, not filesystem blocks. They're variable-size chunks (typically 4-64KB) created using Rabin fingerprinting to find natural content boundaries. This enables deduplication across files that share similar content.
How is savings of 40% on a typical codebase possible with block-level deduplication? What kind of blocks are you talking about? Blocks as in the filesystem?