I verified the fact via GHTorrent [1] when the previous commit (with 12 zeroes) was introduced (Korean tweet with a simple script [2]). However, I haven't verified the current commit yet, hence the "cursory" analysis.
On the git commit id collision: I don't know what mitigation is actually deployed at GitHub, but they are certainly aware of the possibility [3].
[1] http://ghtorrent.org/ (Note that what I actually used was a stripped-down version with only commit ids, which also contributes to the word "cursory" in my cursory analysis.)
I must be missing something... how does finding a higher (or lower) hash get exponentially harder?
For each extra leading 0, there are 1/16 as many hashes available as the previous one - so that is exponentially harder.
But higher could be just by 1, which is almost exactly the same difficulty... I guess it would usually be distributed uniformly at random in the remaining range, so on average the range would shrink to 1/2 of its size each time. Which is also exponential (though not as fast as 1/16). Is that the reasoning?
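A quick throwaway simulation of that halving intuition (sketch only, with a made-up 160-bit space):

    import random

    # Pick the next hash uniformly at or above the current one and measure
    # how much of the remaining range survives: the mean shrink is ~1/2.
    random.seed(1)
    SPACE = 2 ** 160                         # size of the SHA-1 output space
    ratios = []
    for _ in range(100_000):
        cur = random.randrange(SPACE)        # arbitrary current hash
        nxt = random.randrange(cur, SPACE)   # next hash, uniform at or above it
        ratios.append((SPACE - nxt) / (SPACE - cur))
    print(sum(ratios) / len(ratios))         # prints ~0.5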
For growing hashes, we can limit the growth of the search complexity by choosing to accept a more constrained range of hashes in each iteration.
Suppose H is an L-bit hash function and let H_n be the commit hash at the nth iteration. If we seek to generate H_n+1 subject only to the constraint H_n+1 > H_n, the search complexity grows exponentially, as you describe above.
But suppose we search subject to the additional constraint H_n < H_n+1 < min(H_n + k, 2^L) for some constant k. So long as 2^L - H_n > k, the search complexity for H_n+1 is only dependent on the constant k and the number of bits of the hash function, L.
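Sketched in code (hashing a counter as a stand-in for varying a nonce in the commit message; the k here is a made-up constant giving ~2^10 expected tries per step):

    import hashlib
    import itertools

    L = 160        # SHA-1 output bits
    K = 2 ** 150   # accept window; expected tries per step ~ 2**L / K = 2**10

    def next_hash(prev: int) -> int:
        # Look for a hash in (prev, min(prev + K, 2**L)); as long as
        # 2**L - prev > K, each step costs about the same on average.
        upper = min(prev + K, 2 ** L)
        for nonce in itertools.count():
            h = int(hashlib.sha1(f"{prev:x}:{nonce}".encode()).hexdigest(), 16)
            if prev < h < upper:
                return h

    h = 0
    for _ in range(5):
        h = next_hash(h)
        print(f"{h:040x}")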
Heh, you made me consider if kudeki-chain also restricted the range (though not to a constant):
One could read the title as exactly n leading zeros, but of course it just means the hash has at least n leading zeros (that is, it may have more), and the Python code is implemented that way. So, for example, a hash with 16 leading zeros would satisfy the check for any of the first 16 commits.
    work = len(sha1) - len(sha1.lstrip('0'))  # number of leading '0' hex digits
    ...
    assert commit_num <= work, "Not enough work done."  # at least n zeros, not exactly n
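Spelled out with a made-up commit id:

    sha1 = "0000" + "f" * 36                    # pretend commit id, 4 leading zeros
    work = len(sha1) - len(sha1.lstrip('0'))    # -> 4
    for commit_num in (1, 2, 3, 4):
        assert commit_num <= work               # passes for each of the first 4 commits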
I think it would converge to a similar value rather quickly. Someone with access to fast hashing would vastly overtake the crowd and keep his #1 spot until a bigger fish appears.
This is similar in concept to how Bitcoin defines its hardness, though it uses a different hash and block structure of course. I wonder if successful commits in this repo are coming from that world or maybe from SHA1 cryptographic weaknesses.
I have been bringing up this point in conversations about blockchain for a while now. Git is essentially a chain of commits: each commit includes its parent commit's hash in the body that gets hashed, so a commit's hash changes depending on what you apply it on top of, all the way up to HEAD.
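The chaining is easy to reproduce by hand; here's a sketch with made-up ids and identities (a real root commit omits the parent line, and real author lines carry actual timestamps):

    import hashlib

    def commit_sha1(tree, parent, ident, message):
        body = (
            f"tree {tree}\n"
            f"parent {parent}\n"   # the chain link: the parent id is hashed in
            f"author {ident}\n"
            f"committer {ident}\n"
            f"\n{message}\n"
        ).encode()
        # git hashes objects as "<type> <size>\0<content>"
        return hashlib.sha1(b"commit %d\0" % len(body) + body).hexdigest()

    tree = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"   # git's well-known empty tree
    a = commit_sha1(tree, "0" * 40, "me <me@example.com> 0 +0000", "first")
    b = commit_sha1(tree, a, "me <me@example.com> 0 +0000", "second")
    print(a, b)   # change anything that produced `a`, and `b` changes too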
Also reminds me of that time back when we were using 8-char truncated git hashes as deployment tags and we eventually began getting collisions. We also had an interesting bug when all 8 chars were numeric and Python (or the framework/library that was used) ingested what should have been a string as a number.
The difference between blockchain and git is the hashing function. Git’s is designed for speed and constant difficulty; blockchain’s is designed for slowness and increasing difficulty. The logical meaning of each block is different also.
Python is mostly strongly typed. You can get a mix of floats and ints from e.g. parsing JSON, but your library's authors must have been pretty explicit about wanting to confuse strings with numbers.
I know, I was surprised too. But I guess since it came in as a number from the get-go, it simply continued as a number until we tried to slice it or something, which made the deployer abort. :)
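Presumably something like this hypothetical reproduction, where the producer doesn't quote an all-digit tag:

    import json

    tag = json.loads('{"tag": "a1b2c3d4"}')["tag"]
    print(tag[:4])          # 'a1b2' - slicing a string works fine

    tag = json.loads('{"tag": 12345678}')["tag"]
    try:
        tag[:4]             # the all-numeric tag came in as an int...
    except TypeError as e:
        print(e)            # ...and this is where the deployer would abort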
I ran a computer and network security course from 2010 to 2013, and a simplified version of this was part of our wargames. The difficulty scaled with the recency of the last commit.
Admittedly we had to minimize the number of points awarded for this, as it would otherwise be a "money for marks" conversion, but it was fun enough to prove the concept.
Nth commit needing N leading zeros ups the ante a fairly hilarious amount ;)
(From my very limited understanding) making a commit requires you to find a hash with enough leading zeros, which makes new commits exponentially harder over time.
At some point it would cost significant money (besides the developer's time) to make commits. It's basically an economic game like Bitcoin: the people who have an economic interest in the code get to decide what's in it.
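Back of the envelope, with a made-up round number for the hash rate:

    RATE = 1e9   # hashes per second; made-up round number, very roughly one GPU
    for n in (8, 10, 12, 14):
        seconds = 16 ** n / RATE   # expected tries for n leading hex zeros
        print(f"{n} zeros: ~{seconds:,.0f} s (~{seconds / 86400:.1f} days)")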
HNers with long memories may remember the very similar XKCD "Alma mater" challenge from 2013 [1][2] (which is no longer active). There is a sort of continuation of that online at https://beatthehash.com/ .
The goal of that challenge was similar: find some input string by brute force such that Skein-1024(input) XOR a provided "target" has more zero bits than anyone else's. (At the time, Skein was a SHA-3 finalist.)
Git commits must be marginally structured, whereas the Skein challenge input was arbitrary, but that doesn't change the problem much.
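The scoring is easy to sketch, modulo two substitutions: sha512 stands in for Skein-1024 (which isn't in Python's stdlib), and the target below is made up rather than the challenge's real one:

    import hashlib

    TARGET = hashlib.sha512(b"stand-in target").digest()

    def differing_bits(guess: bytes) -> int:
        # Hamming distance between sha512(guess) and TARGET; lower is better.
        h = hashlib.sha512(guess).digest()
        return sum(bin(a ^ b).count("1") for a, b in zip(h, TARGET))

    best = min(range(100_000), key=lambda i: differing_bits(b"attempt %d" % i))
    print(best, differing_bits(b"attempt %d" % best))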
[1] https://github.com/seungwonpark/kudeki-chain/commit/00000000...