My notes on using {emmeans}
Has this happened to you before?
You are (ostensibly) a data professional - perhaps even one with a graduate degree in statistics - and you are given an experimental dataset and asked to evaluate some basic questions e.g. comparing different levels of an experimental condition of interest in the face of covariates.
Your life momentarily flashes before you as you find yourself standing before an abyss filled with nothing but Bonferroni-corrected p-values. Maybe you even think to yourself, “Didn’t I go to school for this? How is it possible that I’ve spent the last few months making plots and running univariate logistic regressions and let my statistics skills decay?”.
You are in good company, friend. In such situations, there’s nothing inherently wrong imo with starting off with some nice visualizations or basic hypothesis tests to get a feel for the data (the latter I know some take issue with - let’s assume here that we’re not writing a formal manuscript). But surely, there is a more robust toolkit for analyzing this type of data, perhaps even one in a well-maintained R package.
There is indeed, and the legend Russ Lenth has us covered.
Enter the {emmeans} package
The package name stands for “estimated marginal means”. What does that mean?
Basically, you provide a model (from lm, glm, lmer, glmer, etc.) and the package calculates the estimated mean outcome for each combination of factor levels, holding all continuous variables at their mean values.
Lenth refers to these combinations of predictor values as a “reference grid” for computing your marginal means.
Computing the mean values is very helpful as a starter, but the {emmeans} package also includes a lot of features for hypothesis testing, p-value adjustments, contrasts, backtransformations, etc. I will emphasize that there are a ton of features in this package. Compound this with the fact that for a lot of common tasks, there is more than one way to do the same thing in {emmeans}. While there is very thorough documentation in the form of several package vignettes, I think because of the two aforementioned issues, the amount of information can be overwhelming.
This post is an attempt for me to condense the parts of {emmeans} that I found most commonly applicable to my work. Most of this is taken from the documentation, but I will cite other sources as needed.
The basics
From the documentation, Lenth really emphasizes the following points:
- Making sure your model is appropriate for your research question. {emmeans} depends on the model, so a bad or inappropriate model will lead to bad or inappropriate conclusions. Check your diagnostics.
- The fundamental function in this package is emmeans::emmeans, and it can do a lot by itself (e.g. tests, CIs, contrasts). However, it is recommended not to do everything in one call to emmeans::emmeans, but rather to break the analysis into several separate function calls, since these should be discrete steps in practice.
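As a rough sketch of that multi-step workflow (using the pigs dataset that ships with {emmeans}; the same model is fit again below):

```r
library(emmeans)

# Step 1: fit (and check!) the model.
mod <- lm(log(conc) ~ source + factor(percent), data = pigs)

# Step 2: build the reference grid and compute the marginal means.
rg <- ref_grid(mod)
emm <- emmeans(rg, specs = "source")

# Step 3: tests, CIs, and contrasts as separate, explicit calls.
test(emm, null = log(40))
confint(emm)
pairs(emm)
```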
Estimated Marginal Means
Let’s make a model first. Here, we are modeling plasma leucine concentration in pigs who are fed different sources of protein in different percentage concentrations.
library(emmeans)
pigs.lm1 <- lm(log(conc) ~ source + factor(percent), data = pigs)
{emmeans} relies on a reference grid, as stated earlier.
There is an emmeans::ref_grid function, but we don’t really need to call it by itself.
These two things are equivalent:
pigs.emm.s1 <- emmeans(pigs.lm1, specs = "source")
pigs.rg <- ref_grid(pigs.lm1)
pigs.emm.s2 <- emmeans(pigs.rg, specs = "source")
Either pigs.emm.s1 or pigs.emm.s2 will return:
source emmean SE df lower.CL upper.CL
fish 3.39 0.0367 23 3.32 3.47
soy 3.67 0.0374 23 3.59 3.74
skim 3.80 0.0394 23 3.72 3.88
Results are averaged over the levels of: percent
Results are given on the log (not the response) scale.
Confidence level used: 0.95
The specs input here names the variable we are interested in calculating the means for.
We see in the output that results are averaged over the levels of “percent”, which is treated as a factor variable, so its levels are kept distinct in the reference grid rather than collapsed into one value.
That’s visible when we look at pigs.rg:
> pigs.rg
'emmGrid' object with variables:
source = fish, soy, skim
percent = 9, 12, 15, 18
Transformation: "log"
as opposed to a model where percent is not declared as a factor:
'emmGrid' object with variables:
source = fish, soy, skim
percent = 12.931
Transformation: "log"
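That second grid comes from a model like the following (my reconstruction; only the output appears above):

```r
library(emmeans)

# percent left as a numeric covariate instead of a factor; the
# reference grid then collapses it to its mean value (~12.93).
pigs.lm.num <- lm(log(conc) ~ source + percent, data = pigs)
ref_grid(pigs.lm.num)
```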
Hypothesis Tests
There is an emmeans::test() function for executing hypothesis tests on an emmeans object.
We can specify a null hypothesis value here, as well as a direction for the hypothesis test using “>”, “<”, or “!=”.
The type input can be set to “response”, indicating that values should be back-transformed.
Note that the back-transformation is done as the last step, so all tests are conducted on the transformed scale.
Adjustments can be specified as well, with possible values including “tukey”, “scheffe”, “sidak”, “mvt” (multivariate t with a Monte Carlo method), and of course “bonferroni”.
> test(pigs.emm.s1, null = log(40), side = ">", type = "response")
source response SE df null t.ratio p.value
fish 29.8 1.09 23 40 -8.026 1.0000
soy 39.1 1.47 23 40 -0.577 0.7153
skim 44.6 1.75 23 40 2.740 0.0058
Results are averaged over the levels of: percent
P values are right-tailed
Tests are performed on the log scale
> test(pigs.emm.s1, null = log(40), side = ">", adjust = "bonferroni")
source emmean SE df null t.ratio p.value
fish 3.39 0.0367 23 3.69 -8.026 1.0000
soy 3.67 0.0374 23 3.69 -0.577 1.0000
skim 3.80 0.0394 23 3.69 2.740 0.0175
Results are averaged over the levels of: percent
Results are given on the log (not the response) scale.
P value adjustment: bonferroni method for 3 tests
P values are right-tailed
Confidence Intervals
emmeans::emmeans output includes confidence intervals by default, but we can also use confint, since {emmeans} includes an S3 method for confint that handles emmGrid objects.
> confint(pigs.emm.s1)
source emmean SE df lower.CL upper.CL
fish 3.39 0.0367 23 3.32 3.47
soy 3.67 0.0374 23 3.59 3.74
skim 3.80 0.0394 23 3.72 3.88
Results are averaged over the levels of: percent
Results are given on the log (not the response) scale.
Confidence level used: 0.95
You can add the sample sizes for each group using the calc input, which appends computed columns to the output.
If we look at pigs.emm.s1@grid, we see a column .wgt. which contains the sample size for each source group.
> confint(pigs.emm.s1, calc = c(n = ~.wgt.))
source emmean SE df n lower.CL upper.CL
fish 3.39 0.0367 23 10 3.32 3.47
soy 3.67 0.0374 23 10 3.59 3.74
skim 3.80 0.0394 23 9 3.72 3.88
Results are averaged over the levels of: percent
Results are given on the log (not the response) scale.
Confidence level used: 0.95
The confint method has a lot of options as well, such as one-sided intervals, the confidence level, and back-transformation.
> confint(pigs.emm.s1, side = ">", level = .90, type = "response")
source response SE df lower.CL upper.CL
fish 29.8 1.09 23 28.4 Inf
soy 39.1 1.47 23 37.3 Inf
skim 44.6 1.75 23 42.3 Inf
Results are averaged over the levels of: percent
Confidence level used: 0.9
Intervals are back-transformed from the log scale
Nested estimated marginal means and confidence intervals
If we want marginal means across two factor variables, we can use the pipe operator | in the specs input.
The variable to the right of the pipe defines the larger groups.
> emmeans(pigs.lm1, ~ percent | source)
source = fish:
percent emmean SE df lower.CL upper.CL
9 3.22 0.0536 23 3.11 3.33
12 3.40 0.0493 23 3.30 3.50
15 3.44 0.0548 23 3.32 3.55
18 3.52 0.0547 23 3.41 3.63
source = soy:
percent emmean SE df lower.CL upper.CL
9 3.49 0.0498 23 3.39 3.60
12 3.67 0.0489 23 3.57 3.77
15 3.71 0.0507 23 3.61 3.82
18 3.79 0.0640 23 3.66 3.93
source = skim:
percent emmean SE df lower.CL upper.CL
9 3.62 0.0501 23 3.52 3.73
12 3.80 0.0494 23 3.70 3.90
15 3.84 0.0549 23 3.73 3.95
18 3.92 0.0646 23 3.79 4.06
Results are given on the log (not the response) scale.
Confidence level used: 0.95
Interestingly, this is equivalent to feeding the reference grid into confint.
> confint(pigs.rg, by = "source")
source = fish:
percent prediction SE df lower.CL upper.CL
9 3.22 0.0536 23 3.11 3.33
12 3.40 0.0493 23 3.30 3.50
15 3.44 0.0548 23 3.32 3.55
18 3.52 0.0547 23 3.41 3.63
source = soy:
percent prediction SE df lower.CL upper.CL
9 3.49 0.0498 23 3.39 3.60
12 3.67 0.0489 23 3.57 3.77
15 3.71 0.0507 23 3.61 3.82
18 3.79 0.0640 23 3.66 3.93
source = skim:
percent prediction SE df lower.CL upper.CL
9 3.62 0.0501 23 3.52 3.73
12 3.80 0.0494 23 3.70 3.90
15 3.84 0.0549 23 3.73 3.95
18 3.92 0.0646 23 3.79 4.06
Results are given on the log (not the response) scale.
Confidence level used: 0.95
Doing pairwise comparisons
Pairwise comparisons?? Now we’re talking about the good stuff!
Here, Lenth switches over to the warpbreaks dataset, modeling the number of breaks in yarn based on the type of wool (A or B) and the tension (L, M, or H).
We can do pairwise comparisons using the pairs function (for which {emmeans} provides a method), which takes an emmGrid object.
> warp.lm <- lm(breaks ~ wool * tension, data = warpbreaks)
> warp.pw <- pairs(emmeans(warp.lm, ~ tension | wool))
> warp.pw
wool = A:
contrast estimate SE df t.ratio p.value
L - M 20.556 5.16 48 3.986 0.0007
L - H 20.000 5.16 48 3.878 0.0009
M - H -0.556 5.16 48 -0.108 0.9936
wool = B:
contrast estimate SE df t.ratio p.value
L - M -0.556 5.16 48 -0.108 0.9936
L - H 9.444 5.16 48 1.831 0.1704
M - H 10.000 5.16 48 1.939 0.1389
P value adjustment: tukey method for comparing a family of 3 estimates
The default p-value adjustment uses the studentized range distribution, à la Tukey’s Honestly Significant Differences. Each wool group is corrected separately from the other, which explains why the output says a “family of 3 estimates”.
We can correct across all pairwise comparisons by calling emmeans::test and specifying by = NULL, e.g. something like:
> test(warp.pw, by = NULL, adjust = "mvt")
contrast wool estimate SE df t.ratio p.value
L - M A 20.556 5.16 48 3.986 0.0013
L - H A 20.000 5.16 48 3.878 0.0018
M - H A -0.556 5.16 48 -0.108 1.0000
L - M B -0.556 5.16 48 -0.108 1.0000
L - H B 9.444 5.16 48 1.831 0.3080
M - H B 10.000 5.16 48 1.939 0.2553
P value adjustment: mvt method for 6 tests
We can also use cross.adjust, which applies an additional (potentially different) p-value adjustment across the groups on top of the within-group adjustment.
> test(warp.pw, adjust = "tukey", cross.adjust = "bonferroni")
wool = A:
contrast estimate SE df t.ratio p.value
L - M 20.556 5.16 48 3.986 0.0013
L - H 20.000 5.16 48 3.878 0.0018
M - H -0.556 5.16 48 -0.108 1.0000
wool = B:
contrast estimate SE df t.ratio p.value
L - M -0.556 5.16 48 -0.108 1.0000
L - H 9.444 5.16 48 1.831 0.3407
M - H 10.000 5.16 48 1.939 0.2777
P value adjustment: tukey method for comparing a family of 3 estimates
Cross-group P-value adjustment: bonferroni
Contrasts
{emmeans} comes with an emmeans::contrast function for looking at more specific comparisons.
The simple input refers to the variable that should be contrasted; everything else basically becomes a by variable.
The method input can define the contrast of interest, for example:
- “consec” compares consecutive levels of the variable
- “dunnett” refers to Dunnett’s test (each level vs. a control level). This can also be entered as “trt.vs.ctrl”
> contrast(pigs.rg,
+ method = "consec", # compare consecutive levels of percent
+ simple = "percent")
source = fish:
contrast estimate SE df t.ratio p.value
percent12 - percent9 0.1796 0.0561 23 3.202 0.0112
percent15 - percent12 0.0378 0.0582 23 0.650 0.8614
percent18 - percent15 0.0825 0.0691 23 1.194 0.5199
source = soy:
contrast estimate SE df t.ratio p.value
percent12 - percent9 0.1796 0.0561 23 3.202 0.0111
percent15 - percent12 0.0378 0.0582 23 0.650 0.8613
percent18 - percent15 0.0825 0.0691 23 1.194 0.5201
source = skim:
contrast estimate SE df t.ratio p.value
percent12 - percent9 0.1796 0.0561 23 3.202 0.0110
percent15 - percent12 0.0378 0.0582 23 0.650 0.8612
percent18 - percent15 0.0825 0.0691 23 1.194 0.5200
Results are given on the log (not the response) scale.
P value adjustment: mvt method for 3 tests
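And as a sketch of the Dunnett-style option mentioned above (comparing each percent level against the first level, 9; output omitted):

```r
library(emmeans)

pigs.lm1 <- lm(log(conc) ~ source + factor(percent), data = pigs)
pigs.rg <- ref_grid(pigs.lm1)

# "trt.vs.ctrl" (alias "dunnett") compares each level of percent
# against the reference level within each source.
contrast(pigs.rg, method = "trt.vs.ctrl", simple = "percent")
```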
Note that we can also do pairwise contrasts. These two lines produce the same output:
> contrast(pigs.rg,
+ method = "pairwise",
+ simple = "percent")
source = fish:
contrast estimate SE df t.ratio p.value
percent9 - percent12 -0.1796 0.0561 23 -3.202 0.0193
percent9 - percent15 -0.2174 0.0597 23 -3.640 0.0070
percent9 - percent18 -0.2998 0.0676 23 -4.434 0.0010
percent12 - percent15 -0.0378 0.0582 23 -0.650 0.9143
percent12 - percent18 -0.1203 0.0654 23 -1.839 0.2814
percent15 - percent18 -0.0825 0.0691 23 -1.194 0.6367
source = soy:
contrast estimate SE df t.ratio p.value
percent9 - percent12 -0.1796 0.0561 23 -3.202 0.0193
percent9 - percent15 -0.2174 0.0597 23 -3.640 0.0070
percent9 - percent18 -0.2998 0.0676 23 -4.434 0.0010
percent12 - percent15 -0.0378 0.0582 23 -0.650 0.9143
percent12 - percent18 -0.1203 0.0654 23 -1.839 0.2814
percent15 - percent18 -0.0825 0.0691 23 -1.194 0.6367
source = skim:
contrast estimate SE df t.ratio p.value
percent9 - percent12 -0.1796 0.0561 23 -3.202 0.0193
percent9 - percent15 -0.2174 0.0597 23 -3.640 0.0070
percent9 - percent18 -0.2998 0.0676 23 -4.434 0.0010
percent12 - percent15 -0.0378 0.0582 23 -0.650 0.9143
percent12 - percent18 -0.1203 0.0654 23 -1.839 0.2814
percent15 - percent18 -0.0825 0.0691 23 -1.194 0.6367
Results are given on the log (not the response) scale.
P value adjustment: tukey method for comparing a family of 4 estimates
> pairs(emmeans(pigs.lm1, ~ percent | source))
source = fish:
contrast estimate SE df t.ratio p.value
percent9 - percent12 -0.1796 0.0561 23 -3.202 0.0193
percent9 - percent15 -0.2174 0.0597 23 -3.640 0.0070
percent9 - percent18 -0.2998 0.0676 23 -4.434 0.0010
percent12 - percent15 -0.0378 0.0582 23 -0.650 0.9143
percent12 - percent18 -0.1203 0.0654 23 -1.839 0.2814
percent15 - percent18 -0.0825 0.0691 23 -1.194 0.6367
source = soy:
contrast estimate SE df t.ratio p.value
percent9 - percent12 -0.1796 0.0561 23 -3.202 0.0193
percent9 - percent15 -0.2174 0.0597 23 -3.640 0.0070
percent9 - percent18 -0.2998 0.0676 23 -4.434 0.0010
percent12 - percent15 -0.0378 0.0582 23 -0.650 0.9143
percent12 - percent18 -0.1203 0.0654 23 -1.839 0.2814
percent15 - percent18 -0.0825 0.0691 23 -1.194 0.6367
source = skim:
contrast estimate SE df t.ratio p.value
percent9 - percent12 -0.1796 0.0561 23 -3.202 0.0193
percent9 - percent15 -0.2174 0.0597 23 -3.640 0.0070
percent9 - percent18 -0.2998 0.0676 23 -4.434 0.0010
percent12 - percent15 -0.0378 0.0582 23 -0.650 0.9143
percent12 - percent18 -0.1203 0.0654 23 -1.839 0.2814
percent15 - percent18 -0.0825 0.0691 23 -1.194 0.6367
Results are given on the log (not the response) scale.
P value adjustment: tukey method for comparing a family of 4 estimates
As for why the Tukey method is only correcting for a family of 4 estimates, and why the pairwise comparisons are identical across sources: the “family” here is the four percent means (4 estimates, hence 6 pairwise comparisons), and because pigs.lm1 is an additive model with no source:percent interaction, the estimated percent effects, and therefore their contrasts, are forced to be the same within every source group.
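One quick check on the role of the additive model (my own sketch, not from the original post): refit with a source:percent interaction, in which case the within-source percent contrasts should no longer be identical.

```r
library(emmeans)

# Hypothetical check: allow the percent effect to vary by source.
pigs.lm.int <- lm(log(conc) ~ source * factor(percent), data = pigs)

# With the interaction term, the pairwise percent contrasts are no
# longer forced to be identical within each source group.
pairs(emmeans(pigs.lm.int, ~ percent | source))
```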