My guess is that the degradation of JSON capability happened recently? The gpt-4 API switched over to gpt-4-0613 (the function calling version) on June 27. And given the performance increase for ChatGPT Plus at the end of May, my guess is they started testing the new model (which is much faster) on web users around then. In my testing [1], the new version is:
a. Worse at general code-like tasks without using functions
b. Equivalent or better at code-like tasks if you use the function API
c. Much faster than the older model either way.
I'd guess it's cheaper to run, too, and that they use the presence of a function in the API signature to weight their mixture of experts differently (and cull some experts?). The degradation in general purpose coding tasks is pretty obvious and repeatable (try the same prompts in the Playground with the -0314 model vs the -0613!), but it does seem like you can regain that lost capability with the new function call API, and it's faster. The tradeoff is that you only regain the capability when it calls functions; you can't really have a mix of prose-and-code in the same response as easily, or at least not with the same quality.
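For concreteness, here's roughly what a function-calling request against gpt-4-0613 looks like. This is just a sketch of the request/response payload shapes; the `run_code` function and its schema are made-up examples, not anything OpenAI ships:

```python
import json

# Sketch of a chat-completions request body for gpt-4-0613.
# The "run_code" function and its JSON-schema parameters are hypothetical.
payload = {
    "model": "gpt-4-0613",
    "messages": [
        {"role": "user", "content": "Write a function that reverses a string."}
    ],
    "functions": [
        {
            "name": "run_code",
            "description": "Return a snippet of Python source code.",
            "parameters": {
                "type": "object",
                "properties": {
                    "code": {"type": "string", "description": "Python source"}
                },
                "required": ["code"],
            },
        }
    ],
    # Forcing the function (rather than "auto") is how you steer the model
    # into the code path instead of mixed prose-and-code:
    "function_call": {"name": "run_code"},
}

# The reply comes back as message.function_call, with arguments encoded
# as a JSON string that you parse yourself (illustrative response below):
fake_reply = {
    "role": "assistant",
    "content": None,
    "function_call": {
        "name": "run_code",
        "arguments": '{"code": "def rev(s):\\n    return s[::-1]"}',
    },
}
args = json.loads(fake_reply["function_call"]["arguments"])
print(args["code"])
```

Note the tradeoff described above is visible in the shape of the response itself: when the model answers via `function_call`, `content` is typically null, so you get structured arguments instead of interleaved prose and code.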
1: https://twitter.com/reissbaker/status/1671361372092010497