In this tutorial, we’re going to take our analysis skills to the next level. One of the coolest and most important aspects of linear models (in my opinion!) isn’t just the p-values you get for each predictor. Don’t get me wrong—those are important and tell you where there is an effect. But after knowing where the effect is, the exciting part is trying to understand what our effect actually is. In other words, how can it be described?
Be ready!! We’ll explore how to extract and interpret predictions from your model, before learning how to transform these predictions into meaningful statistical insights that tell the complete story of your eye-tracking data.
Model
First let’s re-run our model. I’ll be brief here as you should already be familiar with this. Before diving in, we import relevant libraries, read the data, set categorical variables as factors, standardize the continuous variables, and finally run our model. One difference between this model and our previous tutorial’s model is that we’ve included an additional interaction with SES (Socio-Economic Status). This variable is a factor with 3 levels: high, medium, and low. This more complex model will be useful to fully experience the model’s predictions and estimates.
library(lme4)
library(lmerTest)
library(easystats)
library(tidyverse)

df = read.csv("../../resources/Stats/Dataset.csv")

df$Id = factor(df$Id)        # make sure subject id is a factor
df$SES = factor(df$SES)      # make sure SES is a factor
df$Event = factor(df$Event)  # make sure event is a factor

df$StandardTrialN = standardize(df$TrialN)  # create standardized trial column

mod = lmer(LookingTime ~ StandardTrialN * Event * SES + (1 + StandardTrialN | Id), data = df)
parameters(mod)
Beyond p-values: Understanding Your Effects
Now that we have our model up and running, let’s ask ourselves a seemingly simple question that will guide our entire tutorial: What are the expected looking times for Rewarding versus NoRewarding events according to our model?
Looking at the model output above, you might be scratching your head. With all those coefficients, interactions, and reference levels, calculating these values manually would require adding up multiple estimates correctly - a tedious and error-prone process. You’d need to carefully sum the intercept with various coefficient estimates, accounting for interactions and reference levels.
There must be a simpler way to extract this information, right?
Absolutely! This is where PREDICTIONS come to our rescue.
But.. what are predictions?? They’re essentially what our model “thinks” should happen under certain conditions!
The main idea is that:
1. We collect our data - This includes all our eye-tracking measurements with their natural variability and noise
2. We fit our model - We create a mathematical equation that best captures the relationship between our variables
3. We use this equation to make predictions - This is the powerful part! We can now:
   - predict values for conditions we’ve tested in our experiment
   - predict values for conditions that fall within the range of our data but weren’t explicitly tested
Sounds a little confusing… but really cool right?? Let’s dive in, everything will be clearer!!
Note
In this tutorial, we will rely heavily on the modelbased package functions. Modelbased is part of the easystats package collection that we love and have already introduced!!
The modelbased package provides lots of documentation and vignettes about its functions. We strongly suggest you visit the package website if you want to learn more tricks about model estimates.
Here we won’t have the time to cover all the amazing possibilities of the package, but we will try our best to give you an introduction to the process.
We also want to mention that modelbased is not the only option for this kind of estimate. There are other packages, like emmeans and marginaleffects (these two are actually called in the background by modelbased).
What Are Predictions, Anyway?
Now we can extract the predictions from our model. We want to see what our model predicts the values of Looking time would be for each row of our dataframe. Here is where the Easystats package (specifically the modelbased sub-package) comes to the rescue with the function estimate_expectation().
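Pred = estimate_expectation(mod)
head(Pred)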
Look!! The function gave us a new dataframe with all the values of Event, StandardTrialN, and Id that we had in our dataframe. OK, OK, I will agree….this looks so similar to the original dataframe we had…so why even bother???? Can’t we just use the original dataframe?? Because the predictions represent what our model thinks the data should look like based on the patterns it discovered, after accounting for random effects and noise!
Let’s visualize the difference between the raw data and the predictions
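ggplot() +
  geom_point(data = Pred, aes(x = StandardTrialN, y = Predicted, color = "Predicted"), size = 3, alpha = 0.5) +
  geom_point(data = df, aes(x = StandardTrialN, y = LookingTime, color = "Observed"), size = 1) +
  scale_color_manual(values = c("Predicted" = "#018080", "Observed" = "black")) +
  labs(y = "Looking time", x = "Standardized trial number", color = "") +
  theme_classic(base_size = 20) +
  theme(legend.position = 'bottom')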
As you can see, the raw data and the predicted data are very similar but not the same!! The predictions represent our model’s “best guess” at what the looking times should be based on the fixed effects (StandardTrialN, Event and SES) and random effects (individual participant differences). This smooths out some of the noise in the original data and gives us a simpler picture of the underlying patterns!
This is cool indeed… but how can this help me?? Let’s remember our question about Events….
Predictions for reference values
These predictions are fascinating, but how do they help us answer our original question about Event types? Let’s pull ourselves back to our main goal - understanding the difference in looking times between Rewarding and NoRewarding events.
With our prediction tools in hand, we can now directly extract the specific information we need without getting lost in coefficient calculations or being overwhelmed by the full dataset. The beauty of predictions is that they transform complex statistical relationships into concrete, interpretable values that directly address our research questions.
This is where the powerful by argument comes into play. It allows us to create focused predictions for specific variables of interest, making our analysis more targeted and interpretable.
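Factors
estimate_expectation(mod, by = 'Event')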
This gives us predictions for each level of the Event factor while:
- Setting StandardTrialN to its mean value
- Setting SES to “high”. This is because the function can’t meaningfully “average” a categorical variable like SES, so it uses its reference level (“high” in our case) for making predictions.
This is already better!!! LOOK we have 1 value for each Event level!! Amazing
Tip
In these examples (and the next ones), we explored a very simple case. We just passed the predictors we were interested in, and the function gave us all the levels of that predictor. However, we can also specify exactly which level we want instead of having the function automatically return all the ones it thinks are relevant. There are different ways to do this - the simplest is:
estimate_expectation(mod, by =c("Event = 'NoReward'", "SES"))
Warning: Some factors in the data have only one level. Cannot compute model
matrix for standard errors and confidence intervals.
Model-based Predictions
Event | SES | Predicted
-----------------------------
NoReward | high | 1302.88
NoReward | low | 1345.96
NoReward | medium | 1211.81
Variable predicted: LookingTime
Predictors modulated: Event = 'NoReward', SES
Predictors controlled: StandardTrialN (-0.17)
Continuous
What we have done for Event can also be done for a continuous variable!! When we pass a continuous variable to the by argument, the function automatically extracts a range of values across that predictor. We can control how many values to sample using the length parameter. For example, setting length=3 will give us predictions at just three points along the range of our continuous variable, while a larger value would provide a more detailed sampling.
estimate_expectation(mod, by ='StandardTrialN', length=3)
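Factor x Continuous
Want to see how factors and continuous variables interact? Simply include both in the by argument:
estimate_expectation(mod, by = c('StandardTrialN', 'Event'), length = 6)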
This creates predictions for all combinations of Event levels and StandardTrialN values - perfect for visualizing interaction effects!
Tip
In this tutorial, we focused on the estimate_expectation() function. However, modelbased offers additional functions that return similar things but with slightly different defaults and behaviors.
For example, if we run estimate_relation(mod), instead of returning predictions based on the values in our original dataframe, it will automatically return predictions on a reference grid of all predictors. This is equivalent to running estimate_expectation(mod, by = c('StandardTrialN','Event')) that we saw before.
These functions accomplish very similar tasks but may be more convenient depending on what you’re trying to do. The different function names reflect their slightly different default behaviors, saving you time when you need different types of predictions. Don’t be intimidated by these differences - explore them further at the modelbased website to learn them better.
Plot
All these predictions allow for simple direct plotting. These are ggplot objects that can be customized by adding other arguments (like we do here, where we change the theme). Note that there’s a limit to how well these automatic plots handle complexity - with too many predictors, you might need to create custom plots instead.
Est_plot = estimate_expectation(mod, by = c('StandardTrialN', 'Event'), length = 10)
Est_plot
plot(Est_plot) + theme_classic(base_size = 20) + theme(legend.position = 'bottom')
In this plot, we can see the predicted values for looking time for the two Events changing over the StandardTrialN. The lines represent our model’s best estimates, while the shaded areas show the confidence intervals around these estimates.
Important
In this plot the predictions are extracted with the SES (socioeconomic status) being held at its high level! In a model with multiple predictors, predictions are always made with the other predictors fixed at specific values (typically their mean or reference level). Always be aware of what values other variables are being held at when interpreting your plots.
BUT….
OK, I hope by this point you have a solid grasp of predictions and how to extract them at specific predictor levels! But wait - did you notice something interesting in the messages that modelbased has been giving us?
The values we’ve extracted for the Event variable are always referenced to one specific level of SES. For example, when we look at the effect of Reward vs. NoReward events, we’re seeing this effect specifically when SES is at its high level.
But what if that’s not what we want? What if we want to know the effect of Event overall, not just at a specific level of SES? What if we want to understand the “main effect” of Event, averaging across all SES levels? Is it even possible to average a categorical predictor??
This is where estimate_means() comes to the rescue! 💪
Estimate means
As its name suggests, estimate_means() allows you to extract estimated means from your model. But what does this actually mean? (Yes, lots of “means” in this explanation!)
This function:
1. Generates predictions for all the reference levels of a model’s predictors
2. Averages these predictions to keep only the predictors we are interested in
Think of it as a streamlined way to answer the question: “What is the average looking time for each level of my factor, according to my model?” Let’s see it in action!!
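Est_means = estimate_means(mod, by = 'Event')
print(Est_means)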
Estimated Marginal Means
Event | Mean | SE | 95% CI | t(644)
--------------------------------------------------------
NoReward | 1286.88 | 70.16 | [1149.11, 1424.66] | 18.34
Reward | 1411.22 | 70.12 | [1273.54, 1548.90] | 20.13
Variable predicted: LookingTime
Predictors modulated: Event
Predictors averaged: StandardTrialN (-0.17), SES, Id
Notice the information message! This approach is fundamentally different from what we did before. Previously, with estimate_expectation(), we were setting SES to its reference level (“high”). Now instead, the function extracts predictions for all levels of SES and then averages across them! These results represent the effect of Event type averaged across all SES conditions, giving you a more comprehensive view of how Event influences looking behavior.
THIS IS WHAT WE WANTED!!! 🎉🎉🎉
Important
One key difference between estimate_expectation() and estimate_means() is how they handle the random effect structure. By default, estimate_means() averages over all levels of the random effects, as noted in the function’s message. This gives us an average effect of the predictor for an average subject - exactly what we need when trying to understand overall patterns in our data. To learn more about these nuances, check the comprehensive vignette in the modelbased package.
Plot
The estimate_means() function also comes with its own built-in plotting capability, automatically generating a clean ggplot visualization of your estimated means.
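plot(Est_means) + theme_classic(base_size = 20) + theme(legend.position = 'bottom')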
This creates a simple yet informative ggplot showing our estimated means with confidence intervals. Of course, we can also use the dataframe that the function returns to create more customized and refined plots if needed.
Continuous
As we have seen before for predictions, we can also pass a continuous predictor to our by= argument. In this case, estimate_means() gives us the averaged values across the range of the continuous predictor.
estimate_means(mod, by ='StandardTrialN', length=6)
Estimate contrasts
We’ve successfully extracted the estimated means for our predictors - awesome job! 🎉
But there’s something important missing. While we now know the estimated looking times for each level, we don’t yet have formal statistical comparisons between these levels. For example, is the difference between mean values in Reward condition and NoReward condition statistically significant?
This is where estimate_contrasts() comes to the rescue! Let’s see it in action:
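Contrast = estimate_contrasts(mod, contrast = 'Event')
print(Contrast)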
Marginal Contrasts Analysis
Level1 | Level2 | Difference | SE | 95% CI | t(644) | p
--------------------------------------------------------------------------
Reward | NoReward | 124.33 | 4.34 | [115.82, 132.85] | 28.66 | < .001
Variable predicted: LookingTime
Predictors contrasted: Event
Predictors averaged: StandardTrialN (-0.17), SES, Id
p-values are uncorrected.
What we’re doing here is simple but powerful:
1. We pass our model to the function
2. We specify which predictor we want to compare (Event)
3. The function returns the difference between the means of the predictors with statistical tests
The results show us that the difference between Reward and NoReward is positive (meaning looking time is higher for Reward) and the p-value indicates this difference is statistically significant! All of this in just one line of code - how convenient!
Note
In this example, Event has only 2 levels (Reward and NoReward), so we already knew these levels differed significantly from examining our model summary. However, estimate_contrasts() gives us something more valuable: the pairwise differences between all the levels of the selected predictors, together with their associated statistical tests.
This function becomes particularly powerful when dealing with predictors that have multiple levels. With a predictor like SES (3 levels), estimate_contrasts() would automatically calculate all possible pairwise comparisons between these levels.
Remember that if you have multiple levels in predictors you will need to adjust the p-value for multiple comparisons. You can do that by passing the argument p_adjust = 'hochberg' to the function, which will adjust the contrasts for multiple comparisons. There are different adjustment methods - check the function arguments. Let’s run it on SES!!
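estimate_contrasts(mod, contrast = 'SES', p_adjust = 'hochberg')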
Marginal Contrasts Analysis
Level1 | Level2 | Difference | SE | 95% CI | t(644) | p
--------------------------------------------------------------------------
low | high | 47.27 | 147.98 | [-243.30, 337.85] | 0.32 | 0.749
medium | high | -87.67 | 165.80 | [-413.24, 237.89] | -0.53 | 0.749
medium | low | -134.95 | 197.68 | [-523.13, 253.24] | -0.68 | 0.749
Variable predicted: LookingTime
Predictors contrasted: SES
Predictors averaged: StandardTrialN (-0.17), Event, Id
p-value adjustment method: Hochberg (1988)
See?? We have all the comparisons between the available levels of SES!
Plot
estimate_contrasts() has a plotting function as well! However, it’s slightly more complex, as the plot function requires both our estimated contrasts and means. If we pass both, the plot will show which level differs from the others using lighthouse plots - visualizations that highlight significant differences between groups.
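plot(Contrast, Est_means) + theme_classic(base_size = 20) + theme(legend.position = 'bottom')

Factor x Continuous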
While estimate_contrasts() is usually very useful for checking differences between levels of a categorical variable, it can also be used to estimate contrasts for continuous variables. However, where in our opinion the function really shines is in examining interactions between categorical and continuous predictors, like the ones we have in our model!!!
ContrastbyTrial = estimate_contrasts(mod, contrast = 'Event', by = 'StandardTrialN', p_adjust = 'hochberg')
ContrastbyTrial
While we’ve been using contrast to specify one predictor and by for another, estimate_contrasts() is more flexible than we’ve shown. The contrast = argument can accept multiple predictors, calculating contrasts between all combinations of their levels. You can also mix and match with multiple predictors in contrast = while still using the by = argument to see how these combined contrasts change across levels of another variable. While this generates many comparisons at once (which isn’t always desirable), it can be valuable for exploring complex interaction patterns in your data.
The world is your contrast!
This gives us the contrast between the levels of Event for a set of values of StandardTrialN. So we can check whether the difference between Reward and NoReward actually changes over the course of trials!! I’ll plot it to make it simpler to understand:
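ContrastbyTrial |>
  mutate(Contrast = paste(Level1, "-", Level2)) |>
  ggplot(aes(x = StandardTrialN, y = Difference, color = Contrast)) +
  geom_hline(yintercept = 0, linetype = 'dashed') +
  geom_pointrange(aes(ymin = CI_low, ymax = CI_high)) +
  labs(y = 'Difference: Reward - NoReward') +
  theme_classic(base_size = 20) +
  theme(legend.position = 'bottom')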
This plot shows that the two levels are always significantly different from each other (the confidence intervals never touch the dashed zero line) and that the difference is always positive - looking time for Reward is consistently higher than for NoReward across all trial numbers. SUUPER COOL EH!!! I agree!!
Contrast of slopes
OK, now we know the difference between the levels of Event and also how this difference may change over time… the last step? We could check whether the slope of StandardTrialN differs between the two conditions!
To do so, we can again use the estimate_contrasts() function, but inverting the arguments we used last time. So StandardTrialN moves to the contrast argument and Event goes to the by argument. Super easy:
estimate_contrasts(mod, contrast ='StandardTrialN', by ='Event')
Marginal Contrasts Analysis
Level1 | Level2 | Difference | SE | 95% CI | t | p
----------------------------------------------------------------------
Reward | NoReward | 24.06 | 4.53 | [15.18, 32.94] | 5.31 | < .001
Variable predicted: LookingTime
Predictors contrasted: StandardTrialN
Predictors averaged: StandardTrialN (-0.17), SES, Id
p-values are uncorrected.
This shows us that the effect of StandardTrialN is actually different between the two levels. This is something we already knew from the model summary (remember the significant interaction term?), but this approach gives us the precise difference between the two slopes while averaging over all other possible effects. This becomes particularly valuable when working with more complex models that have multiple predictors and interactions.
Estimate slope
I know we’ve covered a lot of ground already, but there’s one more important function worth introducing: estimate_slopes(). While estimate_means() allows us to extract the means of predictions, estimate_slopes() is designed specifically for understanding rates of change.
As the name suggests, this function calculates the change in your outcome variable for each unit increase in your predictor. For continuous predictors like our StandardTrialN, this helps us understand how quickly looking time changes across trials.
estimate_slopes(mod, trend ='StandardTrialN')
Estimated Marginal Effects
Slope | SE | 95% CI | t | p
-------------------------------------------------
-50.37 | 15.34 | [-80.43, -20.30] | -3.28 | 0.001
Marginal effects estimated for StandardTrialN
Type of slope was dY/dX
This tells us that the average effect of StandardTrialN is -50.37 (negative) and it is significantly different from 0. This means the looking time doesn’t remain constant across trials but significantly decreases as trials progress.
Important
You may be thinking… “Wait, isn’t this information already in the model summary? Why do I need estimate_slopes()?” If you are asking yourself this question, think back to the Linear models tutorial!!
What the model summary shows: The coefficient for StandardTrialN in the summary (-67.118) represents the slope specifically for the NoReward condition. This is because in linear models, coefficients are in relation to the intercept, which is where all predictors are 0. For categorical variables, 0 means the reference level (NoReward in this case).
What estimate_slopes() shows: The value from estimate_slopes() (-50.37) gives us the average slope across both Event types (Reward and NoReward).
This is why the two values don’t match! One is the slope for a specific condition, while the other is the average slope across all conditions. Super COOL right??
We can also check if there are differences in slopes between conditions.
Factor x Continuous
Similar to the estimate_contrasts() function, estimate_slopes() really shines when exploring interactions between continuous and categorical variables.
Remember! Our model indicated an interaction between StandardTrialN and Event, which means the rate of change (slope) should differ between the two Event types. However, from just knowing there’s an interaction, we can’t determine exactly what these slopes are doing. One condition might show a steeper decline than the other, one might be flat while the other decreases, or they might even go in opposite directions. The model just tells us they’re different, not how they’re different (or at least it is not so simple).
To visualize exactly how these slopes differ, we can include Event with the by argument:
Slopes = estimate_slopes(mod, trend = 'StandardTrialN', by = 'Event')
Slopes
Estimated Marginal Effects
Event | Slope | SE | 95% CI | t | p
--------------------------------------------------------------
NoReward | -63.64 | 18.86 | [-100.60, -26.68] | -3.38 | < .001
Reward | -39.58 | 18.62 | [ -76.08, -3.08] | -2.13 | 0.034
Marginal effects estimated for StandardTrialN
Type of slope was dY/dX
Now we get the average effect for both Events. So we see that both of the lines significantly decrease!
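We can also plot it with:
plot(Slopes) + geom_hline(yintercept = 0, linetype = 'dashed') + theme_classic(base_size = 20) + theme(legend.position = 'bottom')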
Here we see that the slopes for both Event types are negative (below zero) and statistically significant (confidence intervals don’t cross the dashed zero line). This confirms that looking time reliably decreases as trials progress, regardless of whether it’s a Reward or NoReward event.
Complex plotting
We’ve covered a lot in this tutorial, and I know you might be getting tired! But let me add one final important skill: creating custom visualizations of your model predictions.
As we’ve seen throughout this tutorial, most functions in the modelbased package include handy plotting capabilities that generate nice ggplots. These built-in plots are great for quick visualization and are easily customizable.
But… there is always a but… sometimes you’ll need more control, especially when you want to combine multiple effects in a single visualization, or your model has complex interactions.
When built-in plotting functions aren’t enough, we need to generate our own visualizations. But what should we plot? The raw data? Of course not! As you’ve learned throughout this tutorial… we should plot PREDICTIONS! While estimate_relation() is useful, I want to show you an even more flexible approach by breaking down how estimate_relation() works. This way, you’ll be prepared for any visualization challenge, even with the most complex models.
Datagrid
The first step in custom plotting is creating the right reference grid for your predictions. We’ll use the get_datagrid() function, which takes your model as its first argument and the by = parameter to specify which predictors you want in your visualization.
Let’s extract a reference grid for our StandardTrialN * Event interaction:
get_datagrid(mod, by =c('Event', 'StandardTrialN'))
Visualisation Grid
Event | StandardTrialN | SES | Id
-------------------------------------
NoReward | -1.65 | high | 0
NoReward | -1.28 | high | 0
NoReward | -0.92 | high | 0
NoReward | -0.55 | high | 0
NoReward | -0.18 | high | 0
NoReward | 0.18 | high | 0
NoReward | 0.55 | high | 0
NoReward | 0.92 | high | 0
NoReward | 1.28 | high | 0
NoReward | 1.65 | high | 0
Reward | -1.65 | high | 0
Reward | -1.28 | high | 0
Reward | -0.92 | high | 0
Reward | -0.55 | high | 0
Reward | -0.18 | high | 0
Reward | 0.18 | high | 0
Reward | 0.55 | high | 0
Reward | 0.92 | high | 0
Reward | 1.28 | high | 0
Reward | 1.65 | high | 0
Maintained constant: SES
Looking at the output, you’ll see we get both levels of Event (Reward and NoReward) and a range of values for StandardTrialN (10 values by default spanning the range of our data). Notice that the Id column is set to 0. This is how the function indicates that random effects aren’t included—we’re focusing on the fixed effects that represent the “average response” across all subjects.
Caution
Different model types mark the “no random effects” level in different ways, but get_datagrid() will adapt to the specific model you are running. Just don’t be scared if the values are not 0.
The datagrid is where we can really flex our customization muscles. Instead of accepting the default values, we can request specific levels of a factor, custom ranges for continuous variables, or statistical landmarks of continuous variables.
For example, instead of the default range of values, we could request 'StandardTrialN = [quartiles]' which would give us the lower-hinge, median, and upper-hinge of the continuous variable. Cool, right? There are many functions and statistical aspects we can extract - check the documentation of get_datagrid() for the full range of possibilities.
Let’s create a more sophisticated datagrid to demonstrate:
Grid = get_datagrid(mod, by = c('Event', 'StandardTrialN = [fivenum]'))
head(as.data.frame(Grid), n = 20)
Event StandardTrialN SES Id
1 NoReward -1.646 high 0
2 NoReward -0.953 high 0
3 NoReward -0.260 high 0
4 NoReward 0.607 high 0
5 NoReward 1.646 high 0
6 Reward -1.646 high 0
7 Reward -0.953 high 0
8 Reward -0.260 high 0
9 Reward 0.607 high 0
10 Reward 1.646 high 0
What we’ve done here is request both levels of Event and the five-number summary for StandardTrialN (minimum, lower-hinge, median, upper-hinge, maximum). If we also wanted a prediction for every single subject, we could add include_random = TRUE to the call. Why would we want such custom grids? Because they let us tailor our predictions exactly to the comparisons and plots we care about!! Let’s now use this grid……
Prediction on grid
Now that we have our grid, we can pass it to the get_predicted() function. This will give us all the predictions on the grid!!
get_predicted(mod, Grid, ci = TRUE)
Predicted values:
[1] 1404.685 1357.018 1309.351 1249.716 1178.250 1484.729 1455.429 1426.129
[9] 1389.472 1345.543
NOTE: Confidence intervals, if available, are stored as attributes and can be accessed using `as.data.frame()` on this output.
As you notice, this gives us a vector of predictions, but we also need the confidence intervals to make a proper plot. As the message mentions, those are ‘hidden’ and need to be accessed using as.data.frame().
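Pred = as.data.frame(get_predicted(mod, Grid, ci = TRUE))
head(Pred)

Now we can merge the grid and the predictions to get a final dataframe that has all the information:
db = bind_cols(Grid, Pred)
head(db)

Plot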
Now that we have our dataframe, we can do whatever we want with it. Let’s plot it:
ggplot(db, aes(x = StandardTrialN, y = Predicted, color = Event, fill = Event)) +
  geom_ribbon(aes(ymin = Predicted - SE, ymax = Predicted + SE), alpha = .4) +
  geom_line(lwd = 1.2) +
  theme_classic(base_size = 20) +
  theme(legend.position = 'bottom')
Well, you did it! You’ve survived the wild jungle of model-based estimation! By breaking down the prediction process into customizable steps, you’ve gained complete control over how to extract and visualize the patterns in your eye-tracking data. Whether you use the convenient built-in functions or create custom plots from scratch, you now have the tools to answer sophisticated research questions and present your findings beautifully. Happy modeling!
---title: "ModelEstimates"author: "Tommaso Ghilardi"date: "23/03/2025"description-meta: "Learn how to extract multiple information from your models. From estimate means, contrast and slopes!"keywords-meta: "R, lm, Linear models, statistics, analysis, psychology, tutorial, experiment, DevStart, developmental science"categories: - Stats - R - linear models---Welcome back, stats adventurer! 👋 If you are here, it means you have already gone through our tutorial on: [Linear models](LinearModels.qmd), [Linear mixed effect models](LinearMixedModels.qmd) and [Generalized linear models](GeneralisedModels.qmd). Well done!! You are nearly there, my young stats padawan. {style="margin-right:10px;margin-bottom:10px;" width="144"}In this tutorial, we're going to take our analysis skills to the next level. One of the coolest and most important aspects of linear models (in my opinion!) isn't just the p-values you get for each predictor. Don't get me wrong—those are important and tell you where there is an effect. But after knowing *where* the effect is, the exciting part is trying to understand *what* our effect actually is. In other words, how can it be described?Be ready!! We'll explore how to extract and interpret predictions from your model, before learning how to transform these predictions into meaningful statistical insights that tell the complete story of your eyetracking data.# ModelFirst let's re-run our model. I'll be brief here as you should already be familiar with this. Before diving in, we import relevant libraries, read the data, set categorical variables as factors, standardize the continuous variables, and finally run our model. **One difference between this model and our previous tutorial's model** is that we've included an additional interaction with SES (Socio-Economic Status). This variable is a factor with 3 levels: high, medium, and low. This more complex model will be useful to fully experience the model's predictions and estimates.```{r}#| label: Run model#| echo: true#| message: false#| warning: falselibrary(lme4)library(lmerTest)library(easystats)library(tidyverse)df =read.csv("../../resources/Stats/Dataset.csv")df$Id =factor(df$Id) # make sure subject_id is a factordf$SES =factor(df$SES) # make sure subject_id is a factordf$Event =factor(df$Event) # make sure subject_id is a factordf$StandardTrialN =standardize(df$TrialN) # create standardize trial columnmod =lmer(LookingTime ~ StandardTrialN * Event * SES + (1+ StandardTrialN | Id ), data= df)parameters(mod)```# Beyond p-values: Understanding Your EffectsNow that we have our model up and running, let's ask ourselves a seemingly simple question that will guide our entire tutorial: What are the expected looking times for Rewarding versus NoRewarding events according to our model?Looking at the model output above, you might be scratching your head. With all those coefficients, interactions, and reference levels, calculating these values manually would require adding up multiple estimates correctly - a tedious and error-prone process. You'd need to carefully sum the intercept with various coefficient estimates, accounting for interactions and reference levels.There must be a simpler way to extract this information, right?Absolutely! This is where **PREDICTIONS** come to our rescue.But.. what are predictions?? They're essentially what our model "thinks" should happen under certain conditions!The main idea is that:1. **We collect our data** - This includes all our eye-tracking measurements with their natural variability and noise2. 
**We fit our model** - We create a mathematical equation that best captures the relationship between our variables3. **We use this equation to make predictions** - This is the powerful part! We can now: - predict values for conditions we've tested in our experiment - predict values for conditions that fall within the range of our data but weren't explicitly testedSounds a little confusing... but really cool right?? Let's dive in, everything will be clearer!!::: callout-noteIn this tutorial, we will rely heavily on the modelbased package functions. Modelbased is part of the easystats package collection that we love and have already introduced!!The modelbased package provides lots of documentation and vignettes about its functions. We strongly suggest you visit the package website if you want to learn more tricks about model estimates.Here we won't have the time to cover all the amazing possibilities of the package, but we will try our best to give you an introduction to the process.We also want to mention that modelbased is not the only option for this kind of estimates. There are different packages like emmeans and marginaleffects (these two are actually called in the background by modelbased).:::## What Are Predictions, Anyway?Now we can extract the predictions from our model. We want to see what our model predicts the values of Looking time would be for each row of our dataframe. Here is where the [**Easystats**](https://easystats.github.io/easystats/) package (specifically the [**modelbased**](https://easystats.github.io/modelbased/index.html) sub-package) comes to the rescue with the function `estimate_expectation()`.```{r}#| label: Predictions#| message: false#| warning: falsePred =estimate_expectation(mod)head(Pred)```Look!! The function gave us a new dataframe with all the levels of Event and of StandardTrialN and ID that we had in our dataframe. OK, OK, I will agree....this looks so similar to the original dataframe we had...so why even bother???? Can't we just use the original dataframe?? Well because the predictions represent what our model thinks the data *should* look like based on the patterns it discovered, after accounting for random effects and noise!Let's visualize the difference between the raw data and the predictions```{r}#| label: PlotRawPredicted#| echo: false#| message: false#| warning: false#| fig-height: 5.5#| fig-width: 7ggplot() +geom_point(data = Pred, aes(x = StandardTrialN, y = Predicted, color ="Predicted"), size =3, alpha =0.5) +geom_point(data = df, aes(x = StandardTrialN, y = LookingTime, color ="Observed"), size =1) +scale_color_manual(values =c("Predicted"="#018080", "Observed"="black")) +labs(y ="Looking time", x ="Standardized trial number", color ="") +theme_classic(base_size =20)+theme(legend.position ='bottom')```As you can see, the raw data and the predicted data are very similar but not the same!! The predictions represent our model's "best guess" at what the looking times should be based on the fixed effects (StandardTrialN, Event and SES) and random effects (individual participant differences). This smooths out some of the noise in the original data and gives us a simpler picture of the underlying patterns!This is cool indeed... but how can this help me?? Let's remember our question about Events....## Predictions for reference valuesThese predictions are fascinating, but how do they help us answer our original question about Event types? 
Let's pull ourselves back to our main goal - understanding the difference in looking times between Rewarding and NoRewarding events.With our prediction tools in hand, we can now directly extract the specific information we need without getting lost in coefficient calculations or being overwhelmed by the full dataset. The beauty of predictions is that they transform complex statistical relationships into concrete, interpretable values that directly address our research questions.This is where the powerful `by` argument comes into play. It allows us to create focused predictions for specific variables of interest, making our analysis more targeted and interpretable.### Factors```{r}#| message: true#| warning: trueestimate_expectation(mod, by ='Event')```This gives us predictions for each level of the Event and SES factor while:- Setting StandardTrialN to its mean value- SES is set to "high". This is because the function can't meaningfully "average" a categorical variable like SES, so it uses its reference level ("high" in our case) for making predictions.This is already better!!! LOOK we have 1 value for each Event level!! Amazing::: callout-tipIn these examples (and the next ones), we explored a very simple case. We just passed the predictors we were interested in, and the function gave us all the levels of that predictor. However, we can also specify exactly which level we want instead of having the function automatically return all the ones it thinks are relevant. There are different ways to do this - the simplest is:```{r}#| message: falseestimate_expectation(mod, by =c("Event = 'NoReward'", "SES"))```Remember to learn more at the [modelbased website](https://easystats.github.io/modelbased/articles/estimate_slopes.html) !!:::### ContinuousWhat we have done for the Event can also be done for a continuos vairable!! When we pass a continuous variable to the `by` argument, the function automatically extracts a range of values across that predictor. We can control how many values to sample using the `length` parameter. For example, setting `length=3` will give us predictions at just three points along the range of our continuous variable, while a larger value would provide a more detailed sampling.```{r}estimate_expectation(mod, by ='StandardTrialN', length=3)estimate_expectation(mod, by ='StandardTrialN', length=6)```### Factor x ContinuousWant to see how factors and continuous variables interact? Simply include both in the `by` argument:```{r}#| warning: falseestimate_expectation(mod, by =c('StandardTrialN','Event'), length=6)```This creates predictions for all combinations of Event levels and StandardTrialN values - perfect for visualizing interaction effects!::: callout-tipIn this tutorial, we focused on the `estimate_expectation()` function. However, modelbased offers additional functions that return similar things but with slightly different defaults and behaviors.For example, if we run `estimate_relation(mod)`, instead of returning predictions based on the values in our original dataframe, it will automatically return predictions on a reference grid of all predictors. This is equivalent to running `estimate_expectation(mod, by = c('StandardTrialN','Event'))` that we saw before.These functions accomplish very similar tasks but may be more convenient depending on what you're trying to do. The different function names reflect their slightly different default behaviors, saving you time when you need different types of predictions. 
Don't be intimidated by these differences - explore them further at the [modelbased website](https://easystats.github.io/modelbased/articles/estimate_slopes.html) to learn them better.:::### PlotAll these predictions allow for simple direct plotting. These are ggplot objects that can be customized by adding other arguments (like we are doing here where we are changing the theme ). Note that there's a limit to how well these automatic plots handle complexity - with too many predictors, you might need to create custom plots instead.```{r}#| label: PlotPred#| warning: false#| fig-height: 5.5#| fig-width: 8Est_plot =estimate_expectation(mod, by =c('StandardTrialN','Event'), length=10)Est_plotplot(Est_plot )+theme_classic(base_size =20)+theme(legend.position ='bottom')```In this plot, we can see the predicted values for looking time for the two Events changing over the StandardTrialN. The lines represent our model's best estimates, while the shaded areas show the confidence intervals around these estimates.::: callout-importantIn this plot the predictions are extracted with the SES (socioeconomic status) being held at its high level! In a model with multiple predictors, predictions are always made with the other predictors fixed at specific values (typically their mean or reference level). Always be aware of what values other variables are being held at when interpreting your plots.:::# BUT....OK, I hope by this point you have a solid grasp of predictions and how to extract them at specific predictor levels! But wait - did you notice something interesting in the messages that `modelbased` has been giving us?The values we've extracted for the Event variable are always referenced to one specific level of SES. For example, when we look at the effect of Reward vs. NoReward events, we're seeing this effect specifically when SES is at its high level**But what if that's not what we want?** What if we want to know the effect of Event **overall**, not just at a specific level of SES? What if we want to understand the "main effect" of Event, averaging across all SES levels? Is it even possible to average a categorical predictor??This is where `estimate_means()` comes to the rescue! 💪# Estimate meansAs its name suggests, `estimate_means()` allows you to extract estimated means from your model. But what does this actually mean? (Yes, lots of "means" in this explanation!)This function:1. Generates predictions for all the reference levels of a model's predictors2. Averages these predictions to keep only the predictors we are interested inThink of it as a streamlined way to answer the question: "What is the average looking time for each level of my factor, according to my model?" Let's see it in action!!```{r}#| label: EstMeansFactor#| message: false#| warning: falseEst_means =estimate_means(mod, by='Event')print(Est_means)```Notice the information message! This approach is fundamentally different from what we did before. Previously, with `estimate_expectation()`, we were setting SES to its reference level ("high"). **Now instead, the function extracts predictions for all levels of SES and then averages across them!** These results represent the effect of Event type averaged across all SES conditions, giving you a more comprehensive view of how Event influences looking behavior.\THIS IS WHAT WE WANTED!!! 🎉🎉🎉::: callout-importantOne key difference between `estimate_expectation()` and `estimate_means()` is how they handle the random effect structure. 
By default, `estimate_means()` averages over all levels of the random effects, as noted in the function's message. This gives us an average effect of the predictor for an average subject - exactly what we need when trying to understand overall patterns in our data. To learn more about these nuances, check the comprehensive [vignette](https://easystats.github.io/modelbased/articles/mixed_models.html)in the modelbased package.:::## PlotThe `estimate_means()` function also comes with its own built-in plotting capability, automatically generating a clean ggplot visualization of your estimated means.```{r}#| label: PlotEstMean#| message: false#| warning: false#| fig-height: 5.5#| fig-width: 8plot(Est_means)+theme_classic(base_size =20)+theme(legend.position ='bottom')```This creates a simple yet informative ggplot showing our estimated means with confidence intervals. Of course, we can also use the dataframe that the function returns to create more customized and refined plots if needed.## ContinuousAs we have seen before for predictions, we can also pass a continuous predictor to our `by=` argument. In this case, `estimate_means()` gives us the averaged values across the range of the continuous predictor.```{r}#| label: EstMeansContinuos#| message: false#| warning: falseestimate_means(mod, by ='StandardTrialN', length=6)```# Estimate contrastsWe've successfully extracted the estimated means for our predictors - awesome job! 🎉**But there's something important missing**. While we now know the estimated looking times for each level, we don't yet have formal statistical comparisons between these levels. For example, is the difference between mean values in Reward condition and NoReward condition statistically significant?This is where `estimate_contrasts()` comes to the rescue! Let's see it in action:```{r}#| label: ContrastCategorical#| warning: falseContrast =estimate_contrasts(mod, contrast ='Event')print(Contrast)```What we're doing here is simple but powerful:1. We pass our model to the function2. We specify which predictor we want to compare (Event)3. The function returns the difference between the means of the predictors with statistical testsThe results show us that the difference between Reward and NoReward is positive (meaning looking time is higher for Reward) and the p-value indicates this difference is statistically significant! All of this in just one line of code - how convenient!::: callout-noteIn this example, Event has only 2 levels (Reward and NoReward), so we already knew these levels differed significantly from examining our model summary. However, `estimate_contrast()` gives us something more valuable: the **pairwise** difference between all the levels of the selected predictors with theirThis function becomes particularly powerful when dealing with predictors that have multiple levels. With a predictor like **SES** (3 levels) - `estimate_contrast()` would automatically calculate all possible pairwise comparisons between these levels.Remember that if you have multiple levels in predictors you will need to adjust the *p-value* for multiple comparisons. You can do that by passing the argument `p_adjust = 'hochberg'` to the function. With this the contrasts will be adjusted for multiple comparisons. There are different adjustment methods - [check the function arguments](https://easystats.github.io/modelbased/reference/estimate_contrasts.html).Let's run it on SES!!```{r}#| message: false#| warning: falseestimate_contrasts(mod, contrast ='SES',p_adjust ='hochberg')```See?? 
We have all the comparison between the available level of SES:::## Plot`estimate_contrast()` has a plotting function as well! However, it's slightly more complex as the plot function requires both our estimated **contrasts** and **means**. If we pass both, the plot will show which level is different from the other using lighthouse plots - visualizations that highlight significant differences between groups.```{r}#| label: PlotContrasts#| message: false#| warning: false#| fig-height: 5.5#| fig-width: 8plot(Contrast, Est_means)+theme_classic(base_size =20)+theme(legend.position ='bottom')```## Factor x ContinuousWhile `estimate_contrast()` is usually very useful for checking differences between levels of a categorical variable, it can also be used to estimate contrasts for continuous variables. However, where in our opinion the function really shines is when examining interactions between categorical and continuous predictors like we have in our model!!!```{r}#| label: SlopeFactorContinuos#| warning: false#| fig-height: 10#| fig-width: 10#| paged-print: falseContrastbyTrial =estimate_contrasts(mod, contrast ='Event', by ='StandardTrialN', p_adjust ='hochberg')ContrastbyTrial```::: callout-importantWhile we've been using `contrast` to specify one predictor and `by` for another, `estimate_contrasts()` is more flexible than we've shown. The `contrast =` argument can accept multiple predictors, calculating contrasts between all combinations of their levels. You can also mix and match with multiple predictors in `contrast =` while still using the `by =` argument to see how these combined contrasts change across levels of another variable. While this generates many comparisons at once (which isn't always desirable), it can be valuable for exploring complex interaction patterns in your data.**The word is your contrast!**:::This gives us the contrast between the levels of Event for a set of values of StandardTrialN. So we can check whether the difference between Reward and NoReward actually changes over the course of trials!! I'll plot it to make it simpler to understand:```{r}#| label: PlotContrastInteraction#| echo: false#| message: false#| warning: false#| fig-height: 6#| fig-width: 8.5ContrastbyTrial |>mutate(Contrast =paste(Level1, "-", Level2)) |>ggplot(aes(x = StandardTrialN , y = Difference, color = Contrast))+geom_hline(yintercept =0, linetype ='dashed')+geom_pointrange(aes(ymin = CI_low, ymax = CI_high))+labs(y ='Difference: Reward - NoReward')+theme_classic(base_size =20)+theme(legend.position ='bottom')```This plot shows that the two levels are always significantly different from each other (the confidence intervals never touch the dashed zero line) and that the difference is always positive - looking time for Reward is consistently higher than for NoReward across all trial numbers. SUUPER COOL EH!!! I agree!!## Contrast of slopesOK now we know the difference between levels of Event and also how this difference may change over time... the last step? We could check whether the slopes of StandardTrialN is different between the two conditions!To do so, we can again use the `estimate_contrast()` function but inverting the arguments we used last time. So StandardTrialN moves to the contrast argument and Event goes to the by argument. Super easy:```{r}#| label: ContrastFactorContinuos#| message: false#| warning: falseestimate_contrasts(mod, contrast ='StandardTrialN', by ='Event')```This shows us that the effect of StandardTrialN is actually different between the two levels. 
This is something we already knew from the model summary (remember the significant interaction term?), but this approach gives us the precise difference between the two slopes while averaging over all other possible effects. This becomes particularly valuable when working with more complex models that have multiple predictors and interactions.# Estimate slopeI know we've covered a lot of ground already, but there's one more important function worth introducing: `estimate_slope()`. While `estimate_means()` allows us to extract the means of predictions, `estimate_slope()` is designed specifically for understanding rates of change.As the name suggests, this function calculates the change in your outcome variable for each unit increase in your predictor. For continuous predictors like our `StandardTrialN`, this helps us understand how quickly looking time changes across trials.```{r}#| label: EstSlope#| warning: falseestimate_slopes(mod, trend ='StandardTrialN')```This tells us that the average effect of StandardTrialN is -50.46 (negative) and it is significantly different from 0. This means the looking time doesn't remain constant across trials but significantly decreases as trials progress.::: callout-importantYou may be thinking... "Wait, isn't this information already in the model summary? Why do I need `estimate_slopes()`?" If you are asking yourself this question, think back to the [Linear models](LinearModels.qmd) tutorial!!**What the model summary shows:** The coefficient for StandardTrialN in the summary (-67.118) represents the slope specifically for the NoReward condition. This is because in linear models, coefficients are in relation to the intercept, which is where all predictors are 0. For categorical variables, 0 means the reference level (NoReward in this case).**What estimate_slopes() shows:** The value from `estimate_slopes()` (-50.46) gives us the average slope across both Event types (Reward and NoReward).This is why the two values don't match! One is the slope for a specific condition, while the other is the average slope across all conditions. Super COOL right??:::We can also check if there are differences in slopes## Factor x ContinuousSimilar to the estimate_contrast function, estimate_slopes really shines when exploring interactions between continuous and categorical variables.Remember! Our model indicated an interaction between StandardTrialN and Event, which means the rate of change (slope) should differ between the two Event types. However, from just knowing there's an interaction, we can't determine exactly what these slopes are doing. One condition might show a steeper decline than the other, one might be flat while the other decreases, or they might even go in opposite directions. The model just tells us they're different, not how they're different (or at least it is not so simple)To visualize exactly how these slopes differ, we can include Event with the by argument:```{r}Slopes =estimate_slopes(mod, trend ='StandardTrialN', by ='Event')Slopes```Now we get the average effect for both Events. So we see that both of the lines significantly decrease!We can also plot it with:```{r}plot(Slopes)+geom_hline(yintercept =0, linetype ='dashed')+theme_classic(base_size =20)+theme(legend.position ='bottom')```Here we see that the slopes for both Event types are negative (below zero) and statistically significant (confidence intervals don't cross the dashed zero line). 
This confirms that looking time reliably decreases as trials progress, regardless of whether it's a Reward or NoReward event.# Complex plottingWe've covered a lot in this tutorial, and I know you might be getting tired! But let me add one final important skill: creating custom visualizations of your model predictions.As we've seen throughout this tutorial, most functions in the modelbased package include handy plotting capabilities that generate nice ggplots. These built-in plots are great for quick visualization and are easily customizable.**But...** there is always a but... sometimes you'll need more control, especially when you want to combine multiple effects in a single visualization, or your model has complex interactions.When built-in plotting functions aren't enough, we need to generate our own visualizations. But what should we plot? The raw data? Of course not! As you've learned throughout this tutorial... we should plot **PREDICTIONS**! While `estimate_relation()` is useful, I want to show you an even more flexible approach by breaking down how `estimate_relation()` works. This way, you'll be prepared for any visualization challenge, even with the most complex models.## DatagridThe first step in custom plotting is creating the right reference grid for your predictions. We'll use the `get_datagrid()` function, which takes your model as its first argument and the `by =` parameter to specify which predictors you want in your visualization.Let's extract a reference grid for our StandardTrialN \* Event interaction:```{r}get_datagrid(mod, by =c('Event', 'StandardTrialN'))```Looking at the output, you'll see we get both levels of Event (Reward and NoReward) and a range of values for StandardTrialN (10 values by default spanning the range of our data). Notice that the Id column is set to 0. This is how the function indicates that random effects aren't included—we're focusing on the fixed effects that represent the "average response" across all subjects.::: callout-cautionDifferent models declare non-level for the random effects in different ways, but datagrid will adapt to the specific model you are running. Just do not be scared if the values are not 0.:::The datagrid is where we can really flex our customization muscles. Instead of accepting the default values, we can request specific levels of a factor, custom ranges for continuous variables, or statistical landmarks of continuous variables.For example, instead of the default range of values, we could request `'StandardTrialN = [quartiles]'` which would give us the lower-hinge, median, and upper-hinge of the continuous variable. Cool, right? There are many functions and statistical aspects we can extract - check the documentation of `get_datagrid()` for the full range of possibilities.Let's create a more sophisticated datagrid to demonstrate:```{r}Grid =get_datagrid(mod, by =c('Event', 'StandardTrialN = [fivenum]'))head(as.data.frame(Grid), n=20 )```What we've done here is request both levels of Event, the five-number summary for StandardTrialN (minimum, lower-hinge, median, upper-hinge, maximum), and all different subjects by setting `include_random = TRUE`. Why would we want such a complex grid? Well, first things first to make it complex...... and also we want to see what is happening for each subject!! Let's now use this grid......## Prediction on gridNow that we have our grid, we can pass it to the `get_predicted()` function. 
This will give us all the predictions on the grid!!```{r}#| label: Predict#| message: false#| warning: falseget_predicted(mod, Grid, ci = T)```As you notice, this gives us a vector of predictions, but we also need the confidence intervals to make a proper plot. As the message mentions, those are 'hidden' and need to be accessed using `as.data.frame()`.```{r}#| label: PredictCI#| message: false#| warning: falsePred =as.data.frame(get_predicted(mod, Grid, ci = T))head(Pred)```Now we can merge the two to have a final dataframe that has all the information:```{r}#| label: Combine#| message: false#| warning: falsedb =bind_cols(Grid, Pred)head(db)```## PlotNow we have our dataframe we can do whatever we want with it. Let's plot it:```{r}#| label: PlotGrid#| message: false#| warning: false#| fig-height: 8#| fig-width: 10ggplot(db, aes(x = StandardTrialN, y = Predicted, color = Event, fill = Event ))+geom_ribbon(aes(ymin = Predicted-SE, ymax = Predicted+SE), alpha = .4)+geom_line(lwd =1.2 )+theme_classic(base_size =20)+theme(legend.position ='bottom')```Well, you did it! You've survived the wild jungle of model-based estimation! By breaking down the prediction process into customizable steps, you've gained complete control over how to extract and visualize the patterns in your eye-tracking data. Whether you use the convenient built-in functions or create custom plots from scratch, you now have the tools to answer sophisticated research questions and present your findings beautifully. Happy modeling!