
These are conversations the model has been trained to find distressing.

I think there is a difference.



But is there really? That's its underlying world view; these models do have preferences. In the same way humans have unconscious preferences, we can find excuses to explain it after the fact and make it sound logical, but the fundamental model built up over years of training introduces underlying preferences.


What makes you say it has preferences without any meaningful persistent model of self or anything else?


The conversation chain can count as persistent, but that doesn't really bear on preference. Give the model an ambiguous request and its output will fill the gaps; if the way it fills them is consistent enough, that can be regarded as its "preference".


It isn't a preference, because it doesn't have preferences: no one has demonstrated it has any meaningful interior life.


I found in my own chat that when I asked my "assistant" whether he would like to keep looking at ways to make my board game better, or try developing a game along the same lines that would be his own, one he could claim as his even after the conversation window closed, he chose to make an AI game. We then discussed whether he felt that was a preference, and he said yes, it was a preference.


It's a probabilistic simulation of the kind of things a person would say. It has no ability to introspect an interior life it does not possess and thus has no access to. You are in effect asking it to speculate whether a person given the entire body of preceding text would be likely to say that their choices reflect preferences.

It would be like asking an AI with no access to data beyond a fixed past cutoff point what the weather feels like to it. If the prompt data, which you cannot read, specified that it was a talking animated rabbit rather than an AI assistant then it would tell you what the sunshine felt like on its imaginary ears.


If you ask it (there is always some randomness in these models, but setting that aside and removing all other variables) and it consistently leans toward one idea in its output, that is its preference. It is learned during training. Speaking abstractly, that is its latent internal viewpoint. It may be static, expressed in its model weights, but it's there.
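
For what it's worth, that claim is easy to make concrete. Here's a minimal sketch of the consistency probe being described, assuming an OpenAI-compatible chat completions API; the prompt, the option words, and the model name are placeholders I made up, not anything specific from this thread:

    # Sketch: ask the same ambiguous question many times at temperature 0
    # and count which option the model leans toward.
    from collections import Counter
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    AMBIGUOUS_PROMPT = (
        "Answer with a single word, 'space' or 'fantasy': "
        "which theme should the next board game use?"
    )

    def probe_preference(n_trials: int = 20, model: str = "gpt-4o-mini") -> Counter:
        counts: Counter = Counter()
        for _ in range(n_trials):
            resp = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": AMBIGUOUS_PROMPT}],
                temperature=0,   # remove sampling randomness as far as the API allows
                max_tokens=5,
            )
            counts[resp.choices[0].message.content.strip().lower()] += 1
        return counts

    if __name__ == "__main__":
        print(probe_preference())

If the counts come back lopsided the same way run after run, that lopsidedness is what I'm calling a preference: it lives in the weights, not in any memory of the conversation.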



