
If we don't subtract from the second branch, there will be a discontinuity at x = 1, so the derivative will not be well-defined there. The value of the loss will also jump at that point, which, for one thing, makes the errors harder to inspect.
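For concreteness, here's a minimal sketch of what I mean, assuming the piecewise loss under discussion is the smooth-L1 / Huber-style loss with the branch point at |x| = 1 (the name smooth_l1 and the subtract flag are just for illustration):

    import numpy as np

    def smooth_l1(x, subtract=True):
        # Quadratic near zero, linear in the tails. Subtracting 0.5 in the
        # linear branch makes both branches equal 0.5 at |x| = 1, so the
        # function is continuous there.
        c = 0.5 if subtract else 0.0
        return np.where(np.abs(x) < 1.0, 0.5 * x**2, np.abs(x) - c)

    x = np.array([0.999, 1.001])
    print(smooth_l1(x))                  # ~[0.499, 0.501] -- continuous at the boundary
    print(smooth_l1(x, subtract=False))  # ~[0.499, 1.001] -- jumps by 0.5 at |x| = 1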


No, that's not how backprop works. There will be no discontinuity in a backpropagated gradient.
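To spell out what I mean, a sketch (again assuming a smooth-L1-style piecewise loss): autograd differentiates whichever branch is selected for each element, so the constant offset, and hence the jump in the value, never shows up in the backpropagated gradient.

    import torch

    x = torch.tensor([0.999, 1.001], requires_grad=True)
    # The version *without* the subtraction: the value jumps at |x| = 1 ...
    y = torch.where(x.abs() < 1.0, 0.5 * x**2, x.abs())
    y.sum().backward()
    # ... but the backpropagated gradient is just the derivative of the
    # active branch on each side: x for the quadratic part, sign(x) for
    # the linear part.
    print(x.grad)   # tensor([0.9990, 1.0000])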


I did not say there will be a discontinuity in the gradient; I said that the modified loss function will not have a mathematically well-defined derivative because of the discontinuity in the function.
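A quick numerical way to see this, under the same smooth-L1 assumption: at the branch point, the one-sided difference quotient taken from the quadratic side blows up when the constant isn't subtracted, precisely because the function value jumps there.

    def loss(x, subtract=True):
        c = 0.5 if subtract else 0.0
        return 0.5 * x**2 if abs(x) < 1.0 else abs(x) - c

    for h in (1e-2, 1e-4, 1e-6):
        left_ok  = (loss(1.0) - loss(1.0 - h)) / h
        left_bad = (loss(1.0, subtract=False) - loss(1.0 - h, subtract=False)) / h
        print(h, left_ok, left_bad)
    # With the subtraction the quotient tends to 1, matching the linear branch.
    # Without it, it grows like 0.5 / h: the limit doesn't exist, so there is
    # no derivative at x = 1 in the classical sense.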


Which is completely irrelevant to the point I was making.



