The reverse pseudo-Huber loss function
- 5. July 2024 (modified 6. July 2024)
- #mathematics
In this article we introduce the reverse pseudo-Huber function, which acts like the absolute value \(\lvert x \rvert\) when \(x\) is near zero and like the square \(x^2/2\) as \(\lvert x \rvert\) becomes large.
Introduction to pseudo-Huber loss
The pseudo-Huber function \(H(x) = \sqrt{x^2 + 1} - 1\) is a smooth approximation to the Huber loss. It acts like \(x^2/2\) near zero and like \(\lvert x \rvert\) near infinity. A typical use-case is robust regression, where we want to employ a squared loss but simultaneously limit the influence of outliers. Below we plot the pseudo-Huber function along with the two functions it approximates:
The equations for the pseudo-Huber function and its derivative are
\[ H(x) = \sqrt{x^2 + 1} - 1, \qquad H'(x) = \frac{x}{\sqrt{x^2 + 1}}. \]
The reverse pseudo-Huber
What about the reverse case? What if we want a function that acts like \(\lvert x \rvert\) near \(x=0\) and like \(x^2/2\) near \(x = \pm \infty\)? I found no such function in the literature when searching, though it would not surprise me if someone has deduced it already.
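I propose the following equation for the reverse pseudo-Huber and its derivative:
\[ R(x) = \frac{1}{2} \left( \lvert x \rvert \sqrt{x^2 + 1} + \operatorname{arsinh} \lvert x \rvert \right), \qquad R'(x) = \operatorname{sign}(x) \sqrt{x^2 + 1}. \]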
Here’s a plot of the reverse pseudo-Huber:
Deriving the reverse pseudo-Huber
To derive the reverse pseudo-Huber, I plotted the derivative \(H'(x)\) for \(x>0\). On that range, \(H'(x)\) acts like \(x\) near \(x=0\) and like \(1\) as \(x \to \infty\).
Our goal is to find a function with the opposite behavior: it should act like \(1\) near \(x=0\) and like \(x\) as \(x \to \infty\). But this is exactly the behavior of \(H(x)\) if we simply add one to it, since \(H(x) + 1 = \sqrt{x^2 + 1}\)!
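The observation above can be checked numerically. The following is a minimal Python sketch, not part of the original derivation; the function names `pseudo_huber`, `pseudo_huber_grad` and `shifted_pseudo_huber` are hypothetical names chosen for illustration. It tabulates \(H'(x)\) and \(H(x) + 1\) for a few positive values of \(x\), so that the two opposite behaviors can be compared side by side.

```python
import math


def pseudo_huber(x):
    """H(x) = sqrt(x^2 + 1) - 1."""
    return math.sqrt(x * x + 1.0) - 1.0


def pseudo_huber_grad(x):
    """H'(x) = x / sqrt(x^2 + 1)."""
    return x / math.sqrt(x * x + 1.0)


def shifted_pseudo_huber(x):
    """H(x) + 1 = sqrt(x^2 + 1), the candidate derivative for x > 0."""
    return pseudo_huber(x) + 1.0


# H'(x) should track x for small x and flatten out at 1 for large x,
# while H(x) + 1 should start at 1 and track x for large x.
for x in (1e-3, 1e-1, 1.0, 1e1, 1e3):
    print(f"x = {x:10.3f}   H'(x) = {pseudo_huber_grad(x):.6f}   "
          f"H(x) + 1 = {shifted_pseudo_huber(x):.6f}")
```

For small \(x\) the first column of values follows \(x\) and the second stays close to \(1\); for large \(x\) the roles are swapped.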
Therefore we guess that \(R(x) = \int_0^x \sqrt{\tau^2 + 1} \, d\tau\) is a good candidate for \(x>0\). We mirror it across the vertical axis to obtain a symmetric function. It turns out that this guess is very good indeed!
- It has exactly the desired behavior at \(x=0\) and \(x = \pm \infty\).
- The functions \(H(x)\) and \(R(x)\) act as tight lower and upper bounds, respectively, on both \(\lvert x \rvert\) and \(x^2/2\): we have \(H(x) \leq \min(\lvert x \rvert, x^2/2)\) and \(R(x) \geq \max(\lvert x \rvert, x^2/2)\) for all \(x\).
- The proposed \(R(x)\) has a functional form similar to that of \(H(x)\), since \(R'(x) = H(x) + 1\) for \(x > 0\).
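These properties can be sanity-checked with a short script. The sketch below is an illustration under stated assumptions rather than a definitive implementation: it assumes the closed form \(\frac{1}{2}(\lvert x \rvert \sqrt{x^2+1} + \operatorname{arsinh} \lvert x \rvert)\) obtained by evaluating the integral, and the function names are hypothetical. It compares the closed form against a composite Simpson approximation of the integral and prints the bounds at a few sample points.

```python
import math


def pseudo_huber(x):
    """H(x) = sqrt(x^2 + 1) - 1."""
    return math.sqrt(x * x + 1.0) - 1.0


def reverse_pseudo_huber(x):
    """Assumed closed form of the integral of sqrt(t^2 + 1) from 0 to |x|."""
    a = abs(x)
    return 0.5 * (a * math.sqrt(a * a + 1.0) + math.asinh(a))


def reverse_pseudo_huber_numeric(x, n=2000):
    """Composite Simpson approximation of the same integral (n must be even)."""
    a = abs(x)
    h = a / n
    total = 1.0 + math.sqrt(a * a + 1.0)  # endpoint values f(0) + f(a)
    for i in range(1, n):
        t = i * h
        weight = 4.0 if i % 2 else 2.0  # Simpson weights 4, 2, 4, ...
        total += weight * math.sqrt(t * t + 1.0)
    return total * h / 3.0


for x in (-5.0, -0.5, 0.01, 2.0, 50.0):
    lower = min(abs(x), x * x / 2.0)
    upper = max(abs(x), x * x / 2.0)
    print(f"x = {x:7.2f}   H = {pseudo_huber(x):.6f}   min = {lower:.6f}   "
          f"max = {upper:.6f}   R = {reverse_pseudo_huber(x):.6f}   "
          f"R (Simpson) = {reverse_pseudo_huber_numeric(x):.6f}")
```

At each sample point the printed values should be ordered as \(H \leq \min \leq \max \leq R\), and the two \(R\) columns should agree to within the accuracy of the quadrature.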
Just like we can scale the pseudo-Huber as \(c H(x / c)\) for \(c > 0\), we can scale the reverse pseudo-Huber as \(c^2 R(x / c)\). The reason we scale with the square is that, as \(x \to \pm \infty\), \(H(x)\) acts like \(\lvert x \rvert\) while \(R(x)\) acts like \(x^2 / 2\). To undo the scaling of the argument as \(x \to \pm \infty\), we need to multiply \(H(x/c)\) and \(R(x/c)\) by \(c\) and \(c^2\) respectively.
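The effect of the scale parameter can be illustrated with a small sketch. The code below is a hedged example with hypothetical function names, and it assumes the closed form \(\frac{1}{2}(\lvert x \rvert \sqrt{x^2+1} + \operatorname{arsinh} \lvert x \rvert)\) for \(R\). With these scalings the behavior at infinity is unchanged (\(\lvert x \rvert\) and \(x^2/2\)), while near zero the scaled functions behave like \(x^2/(2c)\) and \(c \lvert x \rvert\) respectively.

```python
import math


def scaled_pseudo_huber(x, c=1.0):
    """Scaled pseudo-Huber c * H(x / c), with c > 0."""
    z = x / c
    return c * (math.sqrt(z * z + 1.0) - 1.0)


def scaled_reverse_pseudo_huber(x, c=1.0):
    """Scaled reverse pseudo-Huber c^2 * R(x / c), with c > 0."""
    z = abs(x) / c
    return c * c * 0.5 * (z * math.sqrt(z * z + 1.0) + math.asinh(z))


c = 3.0
x_small, x_large = 1e-3, 1e6

# Each ratio compares a scaled function to its expected limiting form,
# so all four ratios should be close to 1.
ratios = {
    "c H(x/c) / (x^2 / (2c)) near 0": scaled_pseudo_huber(x_small, c) / (x_small ** 2 / (2 * c)),
    "c H(x/c) / |x| at large x": scaled_pseudo_huber(x_large, c) / abs(x_large),
    "c^2 R(x/c) / (c |x|) near 0": scaled_reverse_pseudo_huber(x_small, c) / (c * abs(x_small)),
    "c^2 R(x/c) / (x^2 / 2) at large x": scaled_reverse_pseudo_huber(x_large, c) / (x_large ** 2 / 2),
}
for label, ratio in ratios.items():
    print(f"{label}: {ratio:.8f}")
```

In other words, \(c\) controls where the transition between the two regimes happens without changing the leading behavior far from the origin.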
Other approximations to Huber loss
Two alternative loss functions that act like \(x^2/2\) near \(x=0\) and like \(\lvert x \rvert\) near \(x = \pm \infty\) are
I was unable to construct reverse functions for these two. Applying the same trick as above does not work if we want to avoid special functions, since the resulting integral involves the dilogarithm function.
Thanks to my friend Floris for discussions on the contents of this article.