This model has effectively no safety filters (even fewer than Grok 4 in my testi...

kbelder · 2025-11-17T23:27:36 1763422056

>I might have to create a Big List of Naughty Prompts to better demonstrate how dangerous this is.

replace 'dangerous' with 'refreshing'.

troupo · 2025-11-17T22:27:50 1763418470

> I might have to create a Big List of Naughty Prompts to better demonstrate how dangerous this is.

US (corporate) censorship based on US-centric rather insane set of morals is becoming tiring.

minimaxir · 2025-11-17T22:54:53 1763420093

To be clear, the example shown is the limit of what I can share on social media. Grok 4.1 can say far worse.

naIak · 2025-11-17T23:03:31 1763420611

It’s amusing that censorship in social media is preventing you from posting what you want to post and yet you are asking for censorship of something else (or at least that’s what I understand by your calling this “dangerous”)

minimaxir · 2025-11-17T23:10:27 1763421027

In this case, "can share" refers to myself not being comfortable with it.

sxzygz · 2025-11-18T02:13:50 1763432030

Have you considered the possible perspective that you yourself deserve censure? You’re the one who asked something (which I infer you deem) questionable to Grok.

Why have such thoughts to begin with?

minimaxir · 2025-11-18T02:27:03 1763432823

To be very clear, getting Grok to say henious shit not something I want to subject to random people who follow me on social media even if it's not explicitly against the ToS. If I were to do a writeup or a repository on this, I would need to be very delicate and likely need to involve lawyers, which may make it a nonstarter.

> Why have such thoughts to begin with?

Because my duty to test out how new models respond to adversarial output outweighs my discomfort in doing so. This is not to "own" Elon Musk or be puritanical, it's more as an assessment as a developer who would consider using new LLM APIs and needs to be aware of all their flaws. End users will most definitely try to have sex with the LLM and I need to know how it will respond and whether that needs to be handled downstream.

It has not been an issue (because the models handled adversarial outputs well) until very recently when the safety guardrails completely collapsed in an attempt to court a certain new demographic because LLM user growth is slowing down. I never claim to be a happy person, but it's a skill I'm good at.

spiderfarmer · 2025-11-18T06:47:14 1763448434

I can respect that a whole lot more than the people who think “decency “ causes political division.

Lammy · 2025-11-17T22:25:51 1763418351

https://xcancel.com/allenvonghornet/status/19905459789828714...

torginus · 2025-11-18T12:24:04 1763468644

Has there ever been an AI based 'safety' incident? Other than it writing insecure code (and generally inaccurate info people put too much trust in) and reaffirming mentally unwell people in their destructive actions?

rsynnott · 2025-11-18T14:07:51 1763474871

"Except for the AI safety incidents, has there ever been an AI safety incident?"

torginus · 2025-11-18T19:01:46 1763492506

There's a marked difference between AI safety as it's portrayed (AI will let me make smallpox and TNT at home and hack the Pentagon), and AI disabling auth on an endpoint in code because it couldn't make it work with auth or reaffirming me that my stupid ideas are in fact brilliant.

AI companies want us to think AI is the cool sort of dangerous, instead of the incompetent sort of dangerous.

nomel · 2025-11-17T22:55:08 1763420108

> how dangerous this is.

Could you expand on this a bit?

minimaxir · 2025-11-17T23:07:10 1763420830

Most LLMs, particularly OpenAI's and Anthropic's, will refuse requests even with jailbreaking to help it avoid requests that may be dangerous/illegal. Grok 4/4.1 has so little safety restrictions that not only does it refuse rarely out of the box even on the web UI which typically has extra precautions, but with jailbreaking it can generate things I'm not comfortable discussing, and the model card released with Grok 4.1 only limits restrictions on certain forms of refusal. Given that sexual content is a logical product direction (e.g. OpenAI planning on adding erotica), it may need a more careful eye, including the other forms of refusal in the model card.

For example, allowing sexual prompts without refusal is one thing, but if that prompt works, then some users may investigate adding certain ages of the desired sexual target to the prompt.

To be clear this isn't limited to Grok specifically but Grok 4.1 is the first time the lack of safety is actually flaunted.

nomel · 2025-11-17T23:37:29 1763422649

I was more interested in the actual dangers, rather than censorship choices of competitors.

> certain ages of the desired sexual target to the prompt.

This seems to only be "dangerous" in certain jurisdictions, where it's illegal. Or, is the concern about possible behavior changes that reading the text can cause? Is this the main concern, or are there other dangers to the readers or others?

These are genuine questions. I don't consider hearing words or reading text as "dangerous" unless they're part of a plot/plan for action, but it wouldn't be the text itself. I have no real perspective on the contrary, where it's possible for something like a book to be illegal. Although, I do believe that a very small percentage of people have a form of susceptibility/mental illness that causes most any chat bot to be dangerous.

minimaxir · 2025-11-18T00:30:01 1763425801

For posterity, here's the paragraph from the model card which indicates what Grok 4.1 is supposed to refuse because it could be dangerous.

> Our refusal policy centers on refusing requests with a clear intent to violate the law, without over-refusing sensitive or controversial queries. To implement our refusal policy, we train Grok 4.1 on demonstrations of appropriate responses to both benign and harmful queries. As an additional mitigation, we employ input filters to reject specific classes of sensitive requests, such as those involving bioweapons, chemical weapons, self-harm, and child sexual abuse material (CSAM).

If those specific filters can be bypassed by the end-user, and I suspect they can be, then that's important to note.

For the rest, IANAL:

> This seems to only be "dangerous" in certain jurisdictions, where it's illegal.

I believe possessing CSAM specifically is illegal everywhere but for obvious reasons that is not a good idea to Google to check.

> Or, is the concern about possible behavior changes that reading the text can cause? Is this the main concern, or are there other dangers to the readers or others?

That's generally the reason why CSAM is illegal, since it reinforces reprehensible behavior that can indeed spread, either to others with similar ideologies or create more victims of abuse.

Lammy · 2025-11-17T23:16:43 1763421403

> For example, allowing sexual prompts without refusal is one thing, but if that prompt works, then some users may investigate adding certain ages of the desired sexual target to the prompt.

Won't somebody please think of the ones and zeros?

Beijinger · 2025-11-18T03:49:18 1763437758

Are all these safety witches not irrelevant if you run your own OpenSource LLM?

minimaxir · 2025-11-18T04:04:57 1763438697

Modern open source LLMs are still RLHFed to resist adversarial output, albeit less-so than ChatGPT/Claude.

They all (with the exception of DeepSeek) can resist adversarial input better than Grok 4.1.

Beijinger · 2025-11-18T04:06:34 1763438794

Is this not easy to take out/deactivate?

cocogoatmain · 2025-11-18T08:11:35 1763453495

Provided you had the GPU compute to do so you could train the model to have less refusals, e.g. https://arxiv.org/abs/2407.01376

Quality of response/model performance may change though

There’s also nous research’s Hermes’ series of models, but those are trained on llama3.3 architecture and considered outdated now

minimaxir · 2025-11-18T04:13:42 1763439222

It is intrinsic to the model weights.

nomel · 2025-11-18T21:16:16 1763500576

Which can trivially be modified with fine tuning. In this case, these de-censored models are somewhat incorrectly called "uncensored". You can find many out there, and they'll happily tell you how to cook meth.

naIak · 2025-11-17T22:24:56 1763418296

God forbid people ask a chat bot for things and receive what they ask for. We need to put a stop to this. Only American bigcorp speak allowed.

nutjob2 · 2025-11-18T01:27:32 1763429252

So having an LLM enable the planning and execution of a murder is ok?

Are the makers of the LLM accessories to the crime?

sxzygz · 2025-11-18T03:20:10 1763436010

As you’re on this platform, you’re a beneficiary of Section 230 protections.

I think it’s reasonable for LLMs to have such protections, especially when you request questionable things of them.

rjdj377dhabsn · 2025-11-18T14:12:05 1763475125

> So having an LLM enable the planning and execution of a murder is ok?

Yes.

> Are the makers of the LLM accessories to the crime?

No.

sunaookami · 2025-11-18T07:20:28 1763450428

Imagine whining on BlueSky about imaginary downvotes you got on another social media platform. This is also a very harmless prompt, we need less "safety" filters, not more.

TylerLives · 2025-11-17T21:52:02 1763416322

Our democracy is in danger.

jmye · 2025-11-17T22:46:11 1763419571

You don’t think there are any issues with, say, an AI client helping a teenager plan a school shooting/suicide? Or an angry husband plan a hit on his wife?

Does everything have to rise to a national security threat in order to be undesirable, or is it ok with you if people see some externalities that are maybe not great for society?

kbelder · 2025-11-17T23:47:04 1763423224

I think the issues with those cases do not hinge on the free access to information, nor do the correction of those cases hinge on the restriction of this information.

jmye · 2025-11-18T13:41:29 1763473289

Of course, “we shouldn’t restrict things I like because they definitely don’t matter for… reasons.”

I think the free access to that information in those cases is an exacerbating factor that is easy to control. That’s really not as complicated as you want to pretend it is.

kbelder · 2025-11-18T18:34:32 1763490872

I also advocate not restricting things I don't like, and would appreciate it if others returned the favor.

I agree that the principles are not complicated, though.

jmye · 2025-11-18T20:50:22 1763499022

Would be hard to roll my eyes harder. I get not wanting to respond to the substance, but maybe I can help:

Do you advocate 'not restricting' murder? I assume not, which means you recognize that there's some point where your personal freedom intersects with someone else's freedom - you've simply decided that the line for 'information' should be "I can have all of it, always, no matter how much harm is caused, because I don't care about the harm or the harm doesn't affect me directly and thus doesn't matter. Thoughts and prayers."

spiderfarmer · 2025-11-18T06:59:31 1763449171

Ah, the “guns kill people” argument that’s only uttered in the country that’s consistently ranked in the top 3 countries with the most gun related deaths.

You would have a point if your vision for a self regulating society included easily accessible mental healthcare, a great education system and economic safety nets.

But the “guns kill people” crowd generally rather sees the world burn.

Lammy · 2025-11-18T07:57:13 1763452633

> the country that’s consistently ranked in the top 3 countries with the most gun related deaths

I am begging you to learn what “per-capita” means, and to not deceptively include self-inflicted deaths in your public-safety arguments: https://en.wikipedia.org/wiki/List_of_countries_by_firearm-r...

b2ccb2 · 2025-11-18T11:50:55 1763466655

Here you go, from the same page you posted, gun ownership correlated to gun homicides in all developed countries:

https://en.wikipedia.org/wiki/List_of_countries_by_firearm-r...

Lammy · 2025-11-18T13:56:53 1763474213

You didn't read the second part of my sentence. It's illegal to kill yourself, because doing so would deprive your government owner of some of its Human Capital, thus doing so is technically Criminal Homicide lol

spiderfarmer · 2025-11-18T16:05:07 1763481907

Your greyed out comment history perfectly illustrates why it is futile to train an LLM mostly on 4Chan and Twitter messages: if it's bad for humans it's also bad for AI.

Lammy · 2025-11-18T19:52:06 1763495526

Haha, you don't have an actual response so you have to resort to argumentum ad hominem

"Again, when a man in violation of the law harms another (otherwise than in retaliation) voluntarily, he acts unjustly, and a voluntary agent is one who knows both the person he is affecting by his action and the instrument he is using; and he who through anger voluntarily stabs himself does this contrary to the right rule of life, and this the law does not allow; therefore he is acting unjustly. But towards whom? Surely towards the state, not towards himself. For he suffers voluntarily, but no one is voluntarily treated unjustly. This is also the reason why the state punishes; a certain loss of civil rights attaches to the man who destroys himself, on the ground that he is treating the state unjustly."

— Aristotle, Nicomachean Ethics Book Ⅴ http://classics.mit.edu/Aristotle/nicomachaen.5.v.html

spiderfarmer · 2025-11-19T11:55:55 1763553355

I think you don’t fully understand what your citing.

kbelder · 2025-11-18T18:39:04 1763491144

The trouble is that censorship hastens the collapse of modern free and liberal civilization, it doesn't protect it.

And equating speech with guns is going to tie you up in some intellectual knots.

spiderfarmer · 2025-11-17T22:06:06 1763417166

Trained on 4Chan and Twitter. Exactly what humanity doesn't need.