airstrike 3 days ago \| on: New benchmark shows top LLMs struggle in real ment...

I hear you, but I feel it's also important to differentiate between the ways in which humans and LLMs can be quote-unquote "bad". "Good" is too broad and subjective to be a useful metric.