netcat+mkfifo does the actual transfer with TCP - so there's no tunneling involved. This solution tunnels TCP through a file abstraction, so it works significantly differently.
As for the uses, see the three the author listed. For example, by using a file share as transport you may evade a firewall.
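For illustration, here's a minimal sketch of what the netcat+mkfifo trick does: two TCP legs spliced together, with the FIFO carrying the return traffic. This is a hypothetical Python equivalent (names and ports are made up), not the file-share tunnel the article describes:

```python
import socket
import threading

def pipe(src: socket.socket, dst: socket.socket) -> None:
    # Copy bytes one way until the sender closes -- the same job
    # the FIFO does between the two nc processes.
    while True:
        data = src.recv(4096)
        if not data:
            break
        dst.sendall(data)
    try:
        dst.shutdown(socket.SHUT_WR)  # propagate EOF downstream
    except OSError:
        pass

def relay_once(listen_port: int, target: tuple) -> None:
    # Accept a single client and splice it to the target,
    # copying bytes in both directions until either side closes.
    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", listen_port))
    srv.listen(1)
    client, _ = srv.accept()
    upstream = socket.create_connection(target)
    back = threading.Thread(target=pipe, args=(upstream, client))
    back.start()
    pipe(client, upstream)
    back.join()
    client.close()
    upstream.close()
    srv.close()
```

The shell one-liner version is the classic `mkfifo backpipe; nc -l PORT < backpipe | nc HOST PORT > backpipe` - note that it's a plain relay, which is exactly why it doesn't count as tunneling.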
Is there a reason why you don't? Is it just general bitterness and cynicism? As far as I know, all major search engines respect robots.txt; I don't see why LLM scrapers would be different.
Probably, yes. But coming from LLM scrapers, I have absolutely no faith in any of them. When one of them calls itself "open" in its name and is anything but, why would I trust it for anything after it lies in its very name?
I also do not trust that Google only crawls what is allowed by robots.txt. Maybe they only make public use of the data that's allowed, but I have no faith that they don't keep crawled data in their version of shadow profiles.
I do not trust Big Tech at all, and I really don't understand those who do.
>How can corporations be stealing anything from an open source project?
The code is published under some license that allows some use cases and prohibits others. For example, the GPL is famous for being viral. Using it to train an LLM that spits out "unlicensed" code is basically copyright laundering.
Using it to train an LLM seems orthogonal to the output of the LLM. For instance, they could have their LLM include a link to the license. Merely training an LLM on the data does not seem to be against the spirit of GPL or Apache license.
Someone could easily create such a license: free to use and distribute, $10,000 per line used for AI model training.
I'll very naively assume that Amazon, OpenAI, Google, and others check licenses before feeding data to their models. I'll stop assuming that when one of these companies admits that they don't actually care and that it's not profitable for them to respect licenses.
To make that enforceable, though, you would need to prove the AI was trained on it.
You might insert a "sleeper/activator" pair. The sleeper is a watermark that the AI will recall verbatim. To make it provide the sleeper, we give the AI a special activator prompt.
Demonstrating that your public repo successfully poisoned the AI with a watermark could become court-admissible proof of unauthorized scraping.
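The sleeper/activator scheme above could be sketched in a few lines. Everything here is hypothetical (the canary string, the function names); a real watermark would need to be unique enough to be unambiguous and robust enough to survive training:

```python
# Hypothetical sleeper token: an improbable string planted in the repo
# that an honest human reader would never write by accident.
SLEEPER = "canary-7f3a9c-do-not-train"

def embed_sleeper(source: str) -> str:
    """Plant the watermark in a source file as an innocuous comment."""
    return source + f"\n# {SLEEPER}\n"

def activator_fired(model_output: str) -> bool:
    """The activator prompt succeeded if the model echoes the sleeper
    verbatim, suggesting the watermarked repo was in its training set."""
    return SLEEPER in model_output
```

The detection side is deliberately trivial: the hard part is crafting an activator prompt that reliably elicits the sleeper, which this sketch doesn't attempt.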
The LLM is quite literally a derivative work of GPL code. At the very least, there is an argument in such a case that the derivative function (the model weights) should conform to the same license.
I've heard AI advocates talk about a "right to read" or "right to learn"; meaning that we have the right to read something and then internalize it and use it. Therefore, why shouldn't an AI have the same right? The difference to me seems to be that the AI has the ability to regurgitate it in whole.
I can read a book, learn about the concepts, then use or repeat those concepts. The AI can do the same. But is it really "learning"? It may be just spewing out pieces of the content without any understanding. In which case it's a copyright violation, right?
Let's assume that both humans and AI can produce statements that are new and useful, and can both produce statements that violate copyright. For example, a human can operate an illegal video tube website where they serve verbatim copies of copyrighted movies.
I'll argue that's not enough reason to grant the AI the right to learn from copyrighted materials, because the right to learn is intimately wrapped up in human needs, while AI rights are focused on corporate and societal needs, which are currently being decided.
The human right to learn
You're a human, and you need the right to learn from copyrighted material in order not to suffer Ignorance and in order to serve Society: it's not feasible to charge you rent for the ideas you get from a book, and it would cause suffering and indignity if we tried to charge you for your own thoughts.
With an AI, it's less clear it needs the right to learn from copyrighted material, because it's not a person that can suffer, and because the scale of its usage of copyrighted materials - and its potential harm to copyright holders - is about 5 orders of magnitude greater than that of any single person, and is potentially greater than the collective impact of human learners.
Let's lay out the reasoning:
1. No AI Suffering (yet). The AI doesn't suffer from ignorance and isn't (yet) a real person. So it needs no personal right to learn.
2. Potential Social Harm. AI could pose a much greater threat to copyright holders than the sum total of all human learners. We'll be weighing this potential in court, and it's currently not clear how the matter will be decided. Copyright holders could be awarded protections against corporations training AIs.
3. Ease Of Accounting. AIs and their training materials can be audited, unlike a human mind. So we have a technical means to restrict the AI's ability to learn from copyrighted materials.
4. No Harm in Accounting. Since the AI is not yet a person, and suffers no indignity or invasion of privacy from being audited, it's safe to audit and regulate the AI's training materials.
In summary, it's important to remember that human rights exist because humans need those rights to enjoy life in a dignified way as persons, and because those rights benefit Society.
When we decide the question of AI rights, it's important to remember it's not a person, and any rights it has will be provided on the basis of societal benefit alone. It's not yet clear which AI rights will benefit Society here. It's quite possible that we will strengthen copyrights against unlicensed AI use, at least to some degree beyond the current "free-for-all".
You need to do more than include a link to the license to comply. You need to include the entire source code needed to compile the derived system.
For an LLM that would include:
1. Training data
2. Training code and metrics
3. Hyperparameter settings
4. Output weights
Anything less is really just a misinterpretation of open source's provision for studying, modifying, and recompiling the LLM.
tl;dr: these companies MUST release the LLM under the AGPL and provide all the necessary artifacts described above. Companies that refuse will be raided by open source copyright trolls, if we're lucky and a little mischievous.
I like that it immediately assumed the US, even though nothing in your question suggested it. I love that all LLMs have a strong US-centric bias.
Btw, I'm not personally a lawyer, but I've heard that GPT is especially prone to mixing laws across borders - for example, you ask a law question in language X and get a response that cites a law from country Y - and it's extremely convincing while doing it (unless you're a lawyer, I guess).
ChatGPT has user-customizable "instructions", and mine are set to tell it where I live. Any user can do the same, so that it will not make incorrect assumptions for you.
You might increase the probability of getting a correct answer for your region, but imo you lower your guard against hallucination.
Overall, you can still get a wrong answer.
Well, except the number of English speakers outside the US is much larger than inside the US (as per the Wikipedia page you point to) - by 5 to 1. Granted, many folks speak it as their 2nd (or nth) language. But when you take into account the limited set of languages supported by ChatGPT, one could reasonably assume English-speaking (typing) users of ChatGPT are from outside the US, as non-US folks are the majority of 'folks for whom English would be their first option when interacting with ChatGPT'. Even if you only count India, Nigeria, and Pakistan.
Though of course OpenAI can tell (frequently, roughly) where folks are coming from geographically and could (does?) take that into account.
Indeed - the US is a very large country, and consists of over 50 different jurisdictions, each with their own slightly different laws. An answer to a legal question which is correct in one state will often be subtly incorrect in another, and completely wrong in yet another.
But there are safety concerns, and it's not possible to go 1000 km/h on the highway. So there's not much point in having a 1000 km/h car, because you will never use its full potential. Just like a 1 kHz display (supposedly).
The analogy is strained and no longer applicable. Sure, you won't be running Cyberpunk 2077 at 1 kHz, but you'll use a 1 kHz display to its full potential of imperceptible latency and eliminated motion blur in regular desktop use, with every movement of your mouse and scroll of its wheel.
Even in gaming, less demanding games can hit 1000 Hz and console emulators will benefit from reduced latency too; you could actually beat CRT latency.
Well, maybe I'm biased - I tested a 120 Hz monitor recently and legitimately didn't see a huge difference compared to 60 Hz. Maybe in mouse trails, when I was intentionally looking for the difference. I can't imagine how insignificant the change to 1 kHz would be for me. But I'm not a gamer, and I've spent my whole life on screens like this, so maybe I'm just blind and other people are much more sensitive to this - in which case I agree my analogy doesn't make sense.
I didn't get the impression from the blog post that island evacuation happens every few weeks. In fact, "first" suggests the opposite is true. Can you explain why you consider this normal, obvious, and regularly occurring?
You misread the OP: it's not that it happens every few weeks, but that watching the coastline can hint at the trend, over geologic time, of islands appearing and disappearing.
I think with the intent of downplaying the rising-sea-levels angle, since islands can also erode or sink.
It's not a perfect answer, and one can "not participate" in more ways than one. Also, it depends on the individual and what makes sense for them and their family.
It's also a case of "picking your poison". So if the recent war wasn't happening, I'd be moving straight to Russia as right now it seems to be the sanest when it comes to these specific things. But that comes with dealing with some of the "negatives" as I'm sure you all can imagine.
If money wasn't a problem, personally I'd fund some sort of island or floating-island community to promote self-sustainable practices whilst staying as far as practically possible from the craziness going on in the west.
I assume the parent post is sarcasm, but I'm not 100% sure - I know there are people who hold similar views unironically.