Continuing the unofficial ongoing series of me learning things I should have learned in school but for realsies this time, I just had a really refreshing dive in to the tempestuous ocean of random effect calculations. If you happen upon this blog, I invite you to take a dip!

# What is a random effect (and why would we use one)?

As has been famously pointed out by numerous statistical resources (see here for some details), the definition of which covariates are random and which are fixed varies depending upon what dusty tome you’re reading. However, I find the following interpretation intuitive, and therefore, useful:

• If the levels of the covariate cover your entire population, it’s fixed
• If the levels of the covariate are a sample from a possibly infinite population, it’s random

In order to answer that question, we need some insight into what is actually happening in your computer when you specify a covariate to be random.

# Enter… some math

For the purposes of this journey, let’s restrict our discussion to random effect intercepts.

Let’s define our mixed effects model.

where the distribution of $b$ our random effects and $\epsilon$ are defined

Gelman and Hill is an excellent resource for mixed effects modeling, and they provide the following mathematical formula for random intercept estimates.

where the big matrix $\Sigma$ is defined as

We can already notice a few things here, without even jumping into R. First of all, the estimator for the random intercepts depends on only these things:

• $\psi$, the variance of the random intercept
• $\sigma^2$, the residual variance
• $(y - X\beta)$, the residuals after only accounting for the fixed effects

This means that after fitting the fixed effect part of the model, we can estimate all the random intecepts using just $\psi$! Random intercepts are just the residuals of the fixed effects, weighted and normalized by the ratio of the variances between the random effect and the residuals. Imagine that if we treated this covariate as fixed, we would have to estimate these intercepts directly - particularly problematicif this is a factor with a lot of levels! In addition to shrinking our intercept estimates, the decision to use a random effect has saved us degrees of freedom and simplified our model.

# R

Let’s do a concrete example in R. The cake dataset has 3 cake recipes being baked at 6 temperatures, with 15 replicates at each combination of recipe and temperature. Consider the following mixed effects model with one fixed covariate (temp) and one random covariate (replicate).

The summary output of this mixed effect model is displayed below:

Linear mixed model fit by REML ['lmerMod']
Formula: angle ~ temp + (1 | replicate)
Data: cake

REML criterion at convergence: 1671.7

Scaled residuals:
Min       1Q   Median       3Q      Max
-2.83605 -0.56741 -0.02306  0.54519  2.95841

Random effects:
Groups    Name        Variance Std.Dev.
replicate (Intercept) 39.19    6.260
Residual              23.51    4.849
Number of obs: 270, groups:  replicate, 15

Fixed effects:
Estimate Std. Error t value
(Intercept)  0.51587    3.82650   0.135
temp         0.15803    0.01728   9.146

Correlation of Fixed Effects:
(Intr)
temp -0.903


If we look at the estimated random intercepts from this model, we would see the following:

Now, let’s try to calculate this step-by-step instead of having ranef do all the heavy lifting. The trickiest part is figuring out how to extract the design matrices corresponding to the random effect: $Z$ and $Z^t$.

Luckily, R has a built in function to extract components from a merMod object called getME. This function takes a merMod as its first argument, and a string specifying the component to be extracted as its second argument. Two of those arguments are exactly what we need: “Z” and “Zt”!

To demonstrate that this approach works:

That’s actually the hardest part! Now we just need $\psi$, $\sigma^2$, and $y - X\beta$.

As we saw above in summary(m), there is a table labeled “Random effects” which has the estimated variance for all random effects and the residual error. From this table, we see that $\psi = 39.19$ and $\sigma^2 = 23.51$.

Finally, we can calculate $y - X\beta$ directly or we can use predict(m, re.form = NA). The re.form = NA option is needed here to enforce that predictions are calculated only using the fixed effects. Note that if you just call predict(m), the random intercepts will be included in the prediction calculations which will lead to results that are the opposite sign of what you would expect.

This gives us the final step!

Matches up well with the results from ranef(m)!