If we don't subtract from the second branch, there will be a discontinuity around x = 1, so the derivative will not be well-defined. Also the value of the loss will jump at this value, which will make it hard to inspect the errors, for one thing.
I did not say there will be a discontinuity in the gradient; I said that the modified loss function will not have a mathematically well-defined derivative because of the discontinuity in the function.