
>block-level deduplication (saves 30-40% on typical codebases)

How are savings of 40% on a typical codebase possible with block-level deduplication? What kind of blocks are you talking about? Blocks as in the filesystem?



I am working on the CLI tools to make this information easier to get, but I have stored the yam repo in yams with multiple snapshots and metadata tags, and I am seeing about 32% storage savings.


Cool. I have no idea what "stored the yam repo in yams" means. What do you mean by "block-level deduplication"? What is a block?


I stored the codebase for yams in the tool. The "blocks" are content-defined blocks/chunks, not filesystem blocks. They're variable-size chunks (typically 4-64KB) created using Rabin fingerprinting to find natural content boundaries. This enables deduplication across files that share similar content.
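
To make the idea concrete, here is a minimal sketch of content-defined chunking plus a dedup pool. It is not the yams implementation: it swaps in a simple Gear-style rolling hash in place of Rabin fingerprinting, and the names `chunk`, `store`, `MIN_SIZE`, `MAX_SIZE`, `MASK`, and `GEAR` are made up for illustration. The size limits are assumptions picked to match the 4-64KB range mentioned above.

```python
import hashlib
import random

# Hypothetical parameters, chosen to match the 4-64KB range mentioned above.
MIN_SIZE = 4 * 1024
MAX_SIZE = 64 * 1024
MASK = (1 << 13) - 1  # past MIN_SIZE, a boundary fires about once per 8 KiB

# Gear table: one fixed pseudo-random 64-bit value per possible byte value.
# (Assumption: a Gear-style hash stands in for Rabin fingerprinting here.)
_rng = random.Random(0)
GEAR = [_rng.getrandbits(64) for _ in range(256)]


def chunk(data: bytes):
    """Yield variable-size chunks whose boundaries depend on content only."""
    start = 0
    h = 0
    for i, b in enumerate(data):
        # The low 13 bits of h depend only on the last 13 bytes seen,
        # so the same local content produces the same boundary decision.
        h = ((h << 1) + GEAR[b]) & 0xFFFFFFFFFFFFFFFF
        size = i - start + 1
        if (size >= MIN_SIZE and (h & MASK) == 0) or size >= MAX_SIZE:
            yield data[start:i + 1]
            start = i + 1
            h = 0
    if start < len(data):
        yield data[start:]  # trailing chunk, may be shorter than MIN_SIZE


def store(files, pool):
    """Add each file's chunks to pool (digest -> chunk); return bytes seen."""
    total = 0
    for data in files:
        for c in chunk(data):
            # Identical chunks hash to the same key and are kept only once.
            pool.setdefault(hashlib.sha256(c).digest(), c)
            total += len(c)
    return total
```

Calling `store([v1, v2], pool)` on two versions of a file and comparing the returned total with `sum(len(c) for c in pool.values())` gives a rough savings figure. Because boundaries follow content rather than fixed offsets, an insertion near the start of a file usually only changes the chunks around the edit, and the later chunks still match the ones already in the pool.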



