I think Claude-3.7 is particularly guilty of this issue. If anyone from Anthropic is reading this, you might want to put your thumb on the scale, so to speak, the next time you train the model, so that it doesn't resort to special-casing or outright forcing the test to pass.