
Took me a while to realise that it is not actually validating whether a given number is a prime number or not.

It is validating whether a string of consecutive 1's has a prime length.

In other words, it can't tell whether "3" is prime, unless you express it as "111".

Much less exciting.
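For anyone who wants to see it concretely, here is a minimal Perl sketch of the trick in its widely circulated form (the exact one-liner in the original post may differ slightly; the value 19 and the script layout are just for illustration):

    # Unary-regex prime test: build a string of n ones and check whether
    # its length has a nontrivial divisor.
    my $n     = shift // 19;       # value to test
    my $unary = "1" x $n;          # unary encoding: n consecutive 1s
    # ^1?$          matches lengths 0 and 1 (not prime)
    # ^(11+?)\1+$   matches any length that is a product of two factors >= 2
    if ($unary !~ /^1?$|^(11+?)\1+$/) {
        print "$n is prime\n";
    } else {
        print "$n is not prime\n";
    }

The regex never sees the decimal "19"; it only ever inspects the unary string, which is exactly the point.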



Sorry, but how is this different from pointing to three apples and explaining to someone that this is what three is? Would you say it's not three because they are apples? I think it does not really matter how you represent the number. After all, all you see on your screen is "fake"; it's just binary under the hood.


The problem with the unary numeral system is that numerals grow linearly with the value, while in binary (ternary, etc.) they only grow logarithmically. So binary numerals need much less space and can be read and written much faster. An algorithm which relies on unary numerals probably has worse computational complexity than one that uses binary or a higher base.
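A quick Perl illustration of the size gap (my own sketch, not from the article):

    # The unary numeral grows linearly with the value,
    # the binary numeral only logarithmically.
    for my $n (10, 100, 1_000, 10_000) {
        my $unary_len  = length("1" x $n);
        my $binary_len = length(sprintf("%b", $n));
        print "$n -> unary: $unary_len digits, binary: $binary_len digits\n";
    }

For 10,000 that is 10,000 digits versus 14.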


Actually, an algorithm working on unary input tends to have better computational complexity than an algorithm working on binary input: an algorithm that is polynomial in the numeric value will be a polynomial algorithm on unary input, but an exponential algorithm on binary input. Such algorithms are usually called pseudo-polynomial[0]; indeed, this kind of primality testing is pseudo-polynomial.

[0] https://en.wikipedia.org/wiki/Pseudo-polynomial_time
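To make the term concrete, here is a hedged Perl sketch (mine, not from the linked article) of trial division, the textbook pseudo-polynomial algorithm: its loop count is bounded by the numeric value n, which is polynomial in the length of a unary input but exponential in the length of a binary input.

    # Trial division: up to n - 2 iterations, i.e. polynomial in the
    # *value* n but exponential in the number of binary digits of n.
    sub is_prime {
        my ($n) = @_;
        return 0 if $n < 2;
        for my $d (2 .. $n - 1) {
            return 0 if $n % $d == 0;
        }
        return 1;
    }
    print is_prime(19) ? "prime\n" : "not prime\n";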


> an algorithm that is polynomial in the numeric value will be a polynomial algorithm on unary input, but an exponential algorithm on binary input.

That's comparing apples to oranges, because the two input types have vastly different lengths relative to their value. Unary input length itself grows exponentially with binary input length, for the same numeric value, cancelling out the unary advantage. So unary isn't faster than binary.

In fact, I think it's pretty clear that unary representation is slower and takes more space. Just think about the rough number of steps you would need to add or even multiply two very large numbers, e.g. in a Turing machine. It would obviously be vastly more if the numbers are given in unary rather than in binary.


> That's comparing apples to oranges, because the two input types have vastly different lengths relative to their value.

It really is not, because complexity is fundamentally a function of _the length of the input_ (specifically, it measures how the runtime, or space usage, grows with growing input). If you have longer input, your Turing machine can spend more time to compute its answer. Also note that this assumes that _the input_ is encoded in unary; if you get binary input and need to spend time and space to convert it to unary representation, sure, that will lead to an exponential blow-up.

Edit: > Just think about the rough number of steps you would need to add or even multiply two very large numbers, e.g. in a Turing machine. It would be obviously vastly more if the numbers are given in unary rather than in binary.

First, note that those are not actually pseudo-polynomial, as the number of steps needed for addition (or multiplication) depends on the number of digits, not the numeric values involved. Yet even here unary encoding _does not have worse complexity_, since you still take polynomially many steps in the length of the input (i.e., the length of the unary encodings of the input values). Yes, all the inputs will be exponentially larger, so in practice it's certainly not the better algorithm, but _the complexity_ is not worse, since that's only concerned with the asymptotic behaviour.
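To illustrate that last paragraph (a hypothetical sketch in Perl, not anyone's actual code): unary addition is just concatenation, so it is linear in the length of its own, unary, input, even though that input is exponentially longer than the equivalent binary one.

    # Unary addition is concatenation: linear in the length of the
    # unary input, i.e. polynomial in *its own* input size.
    sub add_unary {
        my ($a, $b) = @_;          # both are strings of 1s
        return $a . $b;
    }
    my $sum = add_unary("1" x 6, "1" x 7);
    print length($sum), "\n";      # prints 13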


When we deal with algorithms involving numbers, we are interested in actual numbers, not in their numeral representation. It would be absurd to compare unary and binary addition relative to the same numeral length. We are ultimately interested in numbers, not numerals. Adding one million to one million is simply much faster for binary addition than for unary addition, even if unary addition has better complexity relative to its own encoding than binary addition has to its encoding. We are interested in complexity relative to the numerical value, not in the length of their respective encodings.

Edit: By the way, as the Wikipedia piece notes, the binary addition algorithm is O(log(n)) in time _relative to the value_, while unary addition is presumably O(n) relative to the value (just writing the numeral out alone takes O(n) steps). So the time complexity is in fact better. Probably the same holds for space complexity. Moreover, only in this case does a comparison even make sense, since we are comparing the same values in both cases, while in the numeral case we would be comparing the lengths of two different types of numerals, apples to oranges.


> It would be absurd to compare unary and binary addition relative to the same numeral length.

Why? The Turing machine has no concept of numeric values, it only knows about the length of whatever the input is.

> Adding one million to one million is simply much faster for binary addition than for unary addition, even if unary addition has better complexity relative to its own encoding than binary addition has to its encoding.

From a complexity standpoint, adding one million to one million is O(1), regardless of the encoding.

> We are interested in complexity relative to the numerical value, not in the length of their respective encodings.

But the complexity relative to the numerical value is not even well-defined, since it _depends_ on the choice of the encoding.

> By the way, as the Wikipedia piece notices, the binary addition algorithm is O(log(n)) in time _relative to the value_, while unary addition is presumably O(n) relative to the value (just writing the numeral out alone takes O(n) steps). So the time complexity is in fact better.

It really is not better. Time complexity is _always_ measured relative to the length of the input, and for binary encoding the length of the input is O(log(n)), so taking O(log(n)) steps for the addition is linear, the same as the linear time needed for adding numbers in unary encoding (which is basically just copying the input to the output).

> Moreover, only in this case does a comparison even make sense, since we are comparing the same values in both cases, while in the numeral case we would be comparing the lengths of two different types of numerals, apples to oranges.

But _the numeral is not the input_, its _encoding_ is. This is really the whole point, and it is _precisely_ why you get different complexities for different encodings.


> Why? The Turing machine has no concept of numeric values, it only knows about the length of whatever the input is.

That's irrelevant. We are interested in addition, or multiplication, or whatever; these are operations between numbers (values), and they can be faster or slower depending on which numerals (encodings) are used.

> But the complexity relative to the numerical value is not even well-defined, since it _depends_ on the choice of the encoding.

Yes, it depends on the encoding; no, it is of course well-defined. You can measure two different algorithms which use two different encodings relative to the same thing: the value.

> complexity is _always_ measured relative to the length of the input

That's simply not true. There is even a Wikipedia article about it. Even if it calls it "pseudo".

> But _the numeral is not the input_, its _encoding_ is

A numeral is an encoding of a number.


But indeed that isn't what three is; three is the concept that comes from saying "these are three apples", then "these are three pears", then "these are three cars", and then recognising that "three" is the conceptual thing they have in common.

And here, validating a string of ones is simply not what anyone would mean if you said "check whether this number is a prime number" without any qualification, so I agree that whilst still cute, the title is a bit click-baity and not at all what I expected it to be about either.

EDIT: it seems though that the original post, shared via archive.org elsewhere in the thread, does do this for what one would normally assume -- and does so by using `(1 x shift) !~` -- so if anyone else was confused, read the original post.


It's less interesting because decoding decimal numbers with regex would be a much more impressive feat.


You can certainly do calculations with apples, but doing RSA encryption with them is just impractical.


Think about it a different way: the regex thing can only deal with unary numbers, just as your computer can only deal with binary. It's really no different. I'm not sure if you can write a decimal-to-unary converter in regex, but I reckon it's possible. At least using non-regular regex features, like the original prime program did.


> validating whether a given number is a prime number or not ... It is validating whether a string of consecutive 1's has a prime length.

I don't think you've really explained why those two things are different. They're different methods but they're checking the exact same thing.


Well, no. The two things take input in two different forms, and one is substantially easier than the other.

It's a bit like if you claimed that a program that does voice to text is the same as one that does text to text.


You're totally right; I apologise.



