
If we don't subtract from the second branch, there will be a discontinuity at x = 1, so the derivative will not be well-defined there. The value of the loss will also jump at that point, which, for one thing, makes the errors harder to inspect.
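For concreteness, here's a minimal sketch of what I mean, assuming the piecewise loss under discussion is the smooth-L1 / Huber-style loss with the branch point at |x| = 1 (the name smooth_l1 and the subtract flag are just for illustration):

    import numpy as np

    def smooth_l1(x, subtract=True):
        # Quadratic near zero, linear in the tails. Subtracting 0.5 in the
        # linear branch makes both branches equal 0.5 at |x| = 1, so the
        # function is continuous there.
        c = 0.5 if subtract else 0.0
        return np.where(np.abs(x) < 1.0, 0.5 * x**2, np.abs(x) - c)

    x = np.array([0.999, 1.001])
    print(smooth_l1(x))                  # ~[0.499, 0.501] -- continuous at the boundary
    print(smooth_l1(x, subtract=False))  # ~[0.499, 1.001] -- jumps by 0.5 at |x| = 1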


No, that's not how backprop works. There will be no discontinuity in a backpropagated gradient.
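To spell out what I mean, a sketch (again assuming a smooth-L1-style piecewise loss): autograd differentiates whichever branch is selected for each element, so the constant offset, and hence the jump in the value, never shows up in the backpropagated gradient.

    import torch

    x = torch.tensor([0.999, 1.001], requires_grad=True)
    # The version *without* the subtraction: the value jumps at |x| = 1 ...
    y = torch.where(x.abs() < 1.0, 0.5 * x**2, x.abs())
    y.sum().backward()
    # ... but the backpropagated gradient is just the derivative of the
    # active branch on each side: x for the quadratic part, sign(x) for
    # the linear part.
    print(x.grad)   # tensor([0.9990, 1.0000])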


I did not say there will be a discontinuity in the gradient; I said that the modified loss function will not have a mathematically well-defined derivative because of the discontinuity in the function.
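A quick numerical way to see this, under the same smooth-L1 assumption: at the branch point, the one-sided difference quotient taken from the quadratic side blows up when the constant isn't subtracted, precisely because the function value jumps there.

    def loss(x, subtract=True):
        c = 0.5 if subtract else 0.0
        return 0.5 * x**2 if abs(x) < 1.0 else abs(x) - c

    for h in (1e-2, 1e-4, 1e-6):
        left_ok  = (loss(1.0) - loss(1.0 - h)) / h
        left_bad = (loss(1.0, subtract=False) - loss(1.0 - h, subtract=False)) / h
        print(h, left_ok, left_bad)
    # With the subtraction the quotient tends to 1, matching the linear branch.
    # Without it, it grows like 0.5 / h: the limit doesn't exist, so there is
    # no derivative at x = 1 in the classical sense.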


Which is completely irrelevant to the point I was making.



