The huber distance function was added to Rdistance as of version 4.4.0. This tutorial illustrates what it looks like and compares it to other distance functions.
Huber Definition
Rdistance’s huber likelihood is an inverted version of the Huber loss function1, mixed with a uniform distribution at the upper end. We added two parameters to the classic Huber loss function to make it appropriate for use as a distance function; hence, Rdistance’s huber likelihood contains three canonical parameters.
The huber distance function is a negative quadratic function from w.lo to its first parameter \(\theta_1\), linear between \(\theta_1\) and the sum of its first and second parameters (\(\theta_1 + \theta_2\)), and constant at a height of \(p\) after \(\theta_1 + \theta_2\). \(p\) is huber’s third parameter. In equation form, the huber likelihood is, \[
f(d|\theta_1,\theta_2, p) = \left(1 -
\frac{(1-p)h(d)}{m}\right) \,
I(0 \leq d \leq \Theta) \ + \ p \, I(\Theta < d \leq w)
\] where \(I\) is the regular indicator function, \(\Theta = \theta_1 + \theta_2\), \(w\) = w.hi - w.lo, \(m = \theta_1(\Theta - 0.5\theta_1)\), and \(h(d)\) is the Huber loss between 0 and \(\Theta\), i.e., \[
h(d) = 0.5d^2\, I(0 \leq d \leq \theta_1) \ + \
\theta_1(d - 0.5\theta_1)\, I(\theta_1 < d \leq \Theta).
\] Like other distance functions, Rdistance’s huber likelihood can vary with covariates. The huber likelihood is related to covariates through its first parameter, \(\theta_1\), i.e., \[
log(\theta_1) = \beta_0 + \beta_1x_1 + \ldots + \beta_qx_q.
\] The other two parameters, \(\theta_2\) and \(p\), are constant across covariate values. Technically, the parameter vector for the huber likelihood contains \([\beta_0, \beta_1, \ldots, \beta_q, \theta_2, p, \text{<expansion coefficients>}]\).
Huber Shape
The Examples section of help(Rdistance::huber.like) contains a plot illustrating the huber likelihood’s shape. We replicate that plot here. The following code evaluates three huber distance functions from 0 to 200 at \(\theta_1\) values of 65, 80, and 120.
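For reference, the huber likelihood defined above can be evaluated directly in base R. The following is a minimal, self-contained sketch rather than the code from the help page: \(\theta_2\) = 20 and \(p\) = 0.05 are illustrative values, and the resulting d and h mirror the distance grid and the columns of y$L.unscaled that the plotting code expects.

```r
# Huber likelihood built from the definition above (not Rdistance internals)
huberLike <- function(d, theta1, theta2, p){
  Theta <- theta1 + theta2
  m <- theta1 * (Theta - 0.5 * theta1)
  # Huber loss between 0 and Theta
  hLoss <- ifelse(d <= theta1
                , 0.5 * d^2
                , theta1 * (d - 0.5 * theta1))
  # Inverted loss on [0, Theta]; constant at p afterward
  ifelse(d <= Theta, 1 - (1 - p) * hLoss / m, p)
}
d <- seq(0, 200, length = 401)
theta1 <- c(65, 80, 120)
# One column per value of theta1
h <- sapply(theta1, function(t1) huberLike(d, t1, theta2 = 20, p = 0.05))
```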
This code plots all three huber functions, showing parameter locations.
h <- y$L.unscaled              # a matrix
params <- y$params             # a matrix
params <- params[1,]           # but params constant over rows
params <- c(as.matrix(params)) # turn into vector
nFuncs <- ncol(h)
plot(range(d), range(h)
   , type = "n"
   , xlab = "d"
   , ylab = "Huber(d)"
   , ylim = c(0, 1))
legend("topright"
   , legend = c("theta1", "theta1 + theta2")
   , pch = c(16, 15)
   , col = 1)
abline(h = params[nFuncs + 2], lty = 2)
axis(2, at = params[nFuncs + 2], labels = "p", las = 2)
for(i in 1:nFuncs){
  # Distance function
  lines(d, h[,i], col = i, lwd = 2)
  # Quadratic to linear transition
  theta1 <- exp(params[i])
  points(theta1
       , h[(theta1 - 0.1) < d & d < (theta1 + 0.1), i]
       , pch = 16
       , col = i)
  # Linear to constant transition
  Theta <- theta1 + params[nFuncs + 1] # no link on theta2
  points(Theta
       , h[(Theta - 0.1) < d & d < (Theta + 0.1), i]
       , pch = 15
       , col = i)
}
Figure 1: Three huber distance functions showing shape and location of the canonical parameters. The function is a negative quadratic between 0 and theta1 (dots), linear between theta1 and theta1+theta2 (squares), and constant at p beyond theta1+theta2.
Huber and Others
The Huber distance function is flexible and has several characteristics we want in distance functions. For example, the derivative of the Huber distance function is zero at distance 0, which means the function is horizontal where it meets the y axis. This characteristic matches the half normal, hazard rate, and oneStep distance functions. The Huber distance function has a ‘shoulder’ similar to the half normal function. Huber’s ‘shoulder’ is quadratic in shape, while the half normal’s is exponential in the square of distance. Huber’s ‘shoulder’ does not (cannot) extend as far as the hazard rate function’s, and this is an important difference. Perhaps Huber’s most useful characteristic is the uniform distribution mixed in to model large distances. This mixture means that a ‘small trickle’ of large distance values does not cause instability, and right-truncating a small fraction of the data is not necessary.
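The zero-slope claim is easy to check numerically. A minimal sketch using the likelihood definition above, with illustrative parameter values (\(\theta_1\) = 120, \(\theta_2\) = 10, \(p\) = 0.05): forward-difference slopes at distance 0 shrink toward zero as the step shrinks, i.e., the function leaves the y axis horizontally.

```r
# Huber likelihood from the definition; parameter values are illustrative
f <- function(d, theta1 = 120, theta2 = 10, p = 0.05){
  Theta <- theta1 + theta2
  m <- theta1 * (Theta - 0.5 * theta1)
  hLoss <- ifelse(d <= theta1, 0.5 * d^2, theta1 * (d - 0.5 * theta1))
  ifelse(d <= Theta, 1 - (1 - p) * hLoss / m, p)
}
# Forward-difference slope at distance 0 for shrinking step sizes;
# the magnitudes decrease toward zero
eps <- c(1e-1, 1e-3, 1e-5)
slopes <- (f(eps) - f(0)) / eps
```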
To illustrate shape among distance functions, we plot huber, triangle, oneStep, halfnorm, and hazrate distance functions on top of one another. To make this illustration meaningful, we fit these distance functions to the same simulated dataset. The next code chunk generates 5000 random distances from the Huber likelihood using the inverse-CDF method. We compute the Huber CDF by scaling the output of Rdistance::huber.cumFunc. In this example, we use Huber parameters \(\theta_1\) = 120, \(\theta_2\) = 10, and \(p\) = 0.05, and generate distance values between 0 and 200.
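A self-contained sketch of the same inverse-CDF step, approximating the Huber CDF numerically on a grid instead of calling Rdistance::huber.cumFunc (the grid resolution and random seed are arbitrary choices):

```r
set.seed(42)
R <- 5000; theta1 <- 120; theta2 <- 10; p <- 0.05; w <- 200
# Unscaled huber likelihood from the definition above
f <- function(d){
  Theta <- theta1 + theta2
  m <- theta1 * (Theta - 0.5 * theta1)
  hLoss <- ifelse(d <= theta1, 0.5 * d^2, theta1 * (d - 0.5 * theta1))
  ifelse(d <= Theta, 1 - (1 - p) * hLoss / m, p)
}
# Numeric CDF on a fine grid, scaled to [0, 1], then inverted by
# linear interpolation at uniform random probabilities
grid <- seq(0, w, length = 4001)
cdf <- cumsum(f(grid))
cdf <- (cdf - cdf[1]) / (cdf[length(cdf)] - cdf[1])
rhuber <- approx(x = cdf, y = grid, xout = runif(R))$y
```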
# Parameter values
c(R = R, theta1 = theta1, theta2 = theta2, p = p, w = w)

      R  theta1  theta2       p       w 
5000.00  120.00   10.00    0.05  200.00
Figure 2: 5000 random values drawn from the Huber distance likelihood. Black line is the true generating Huber distance likelihood with parameters 120, 10, and 0.05.
Next, we fit distance functions to the simulated data. To do that in Rdistance, we create an ‘RdistDf’ data frame.
# Create the 'RdistDf'. Pretend only 1 transect
distDf <- data.frame(transect = 1, distance = setUnits(rhuber, "m"))
siteDf <- data.frame(transect = 1, length = setUnits(1000, "m"))
huberDf <- RdistDf(siteDf, distDf)
huberDf
Figure 3: Shape and fit comparison of distance functions for the same data set.
Figure 3 shows that the huber function fits this simulated dataset very well, as expected because we generated the data from a Huber likelihood. The AIC statistics and effective strip widths (ESW), sorted by AIC, appear in Table 1.
Table 1: AIC and ESW for the example distance functions.
Function     ESW          AIC        DeltaAIC
huber         92.45 [m]   49124.88       0.00
hazrate      100.99 [m]   49232.90     108.02
halfnorm      85.59 [m]   49332.02     207.13
oneStep      116.46 [m]   49606.73     481.84
The author notes that estimates of individual huber parameters are often different from their true values2, but that their sum usually estimates the threshold well (i.e., the discontinuity where the Huber meets the uniform). This makes sense because the transition from quadratic to linear at \(\theta_1\) is smooth3 and designed to be subtle. The discontinuity at \(\theta_1 + \theta_2\) is much less subtle and easier to estimate.
Here, estimates of the transition to linear (\(\hat{\theta}_1\)) and width of the linear part (\(\hat{\theta}_2\)) are relatively far from their generating values (Table 2). The discontinuity estimate (\(\hat{\theta}_1 + \hat{\theta}_2\)) is very close to the true value largely due to the enormous sample size.
Table 2: True and estimated parameters from the example data.
Parameter          True     Estimated
Theta1           120.00       110.776
Theta2            10.00        20.401
Theta1 + Theta2  130.00       131.178
p                  0.05         0.052
Regarding shape differences, Figure 3 shows that the halfnorm has trouble fitting the steep decline in observations between approximately 75 and 125, while the hazrate has trouble fitting the gradual decline in observations between 0 and approximately 75. These comparisons are not entirely fair because we generated data that explicitly do not fit the halfnorm and hazrate; but they illustrate a few key differences between huber and the others. The huber can fit relatively steep declines in observation numbers in the ‘mid range’ of the strip. The huber’s ‘shoulder’ (between 0 and approximately 75 here) is midway between the halfnorm and hazrate ‘shoulders’. Finally, huber accommodates a ‘small trickle’ of very large observations, and it provides an estimate of the location where the ‘small trickle’ starts (i.e., the discontinuity \(\hat{\theta}_1 + \hat{\theta}_2\), here 131.2). The halfnorm and hazrate functions do not (cannot) fit a histogram containing a small but constant fraction of large values. The latter two functions display marked lack-of-fit above approximately 115 and eventually decay to zero.
The huber likelihood can take on a wide variety of shapes depending on its parameters. Its first parameter, \(\theta_1\), must be greater than zero but has no maximum. When \(\theta_1\) is very small, the huber distance function converges to another distance function in Rdistance, the triangle function (Figure 4). As \(\theta_1\) grows large, the huber function converges to a uniform distribution on the strip from 0 to \(w\) (Figure 5).
Figure 4: The huber likelihood with small \(\theta_1\) is the triangle likelihood. Here, \(\theta_1\) = 0.01, \(\theta_2\) = 130, and \(p\) = 0.05.
Figure 5: The huber likelihood with large \(\theta_1\) resembles a uniform distribution on [0,\(w\)]. Here, \(\theta_1\) = 1000, \(\theta_2\) = 130, and \(p\) = 0.05.
In conclusion, the huber distance function will be useful in situations where:
The left side (short distances) of the histogram shows slight declines. The ‘shoulder’ is not exactly flat, and could extend all the way to \(w\).
The decline in observations of middle-distances is fairly rapid.
A threshold exists past which only a few (large) values are observed.
Footnotes
The original paper is Huber, P. J. (1964). “Robust estimation of a location parameter”. Annals of Mathematical Statistics, 35, 73–101.↩︎
That is, have poor precision. Whether individual parameter estimates \(\hat{\theta}_1\) and \(\hat{\theta}_2\), and for that matter the threshold estimate \(\hat{\theta}_1+\hat{\theta}_2\), are unbiased remains to be determined.↩︎
Smooth means that the slope of two tangent lines, one on either side of but close to \(\theta_1\), are equal.↩︎