Draft

Pupil data pre-processing

Pupillometry
R
Pre-processing
Keywords

pupil dilation, pupillometry, eye-tracking, experiments, infant research, DevStart, developmental science

Welcome to your first step into the world of pupillometry! In this tutorial, we’ll walk you through how to preprocess pupil data using a handy R package called PupillometryR. This package makes it simple to clean and even analyze your pupil data with just a few lines of R code.

To keep things straightforward, we’ll be working with a simulated dataset that you can download right here (insert link). This dataset is based on the experimental design we introduced earlier in our eye-tracking series. If you’re not familiar with it or need a quick refresher, we recommend checking out the “Introduction to eye-tracking” guide before diving in.

This tutorial serves as a foundation for understanding how to preprocess pupil data. Once you’ve grasped the essentials, we encourage you to explore the full range of functions and features PupillometryR has to offer.

Read the data

Let’s begin by importing the necessary libraries and loading the downloaded dataframe.

Perfect now let’s read our dataframe

Raw_data = read.csv("..\\..\\resources\\Pupillometry\\PupilData.csv")
head(Raw_data) # database peak
  X   Subject      Time   PupilL   PupilR   Events TrialN
1 1 Subject_1  1.000000 3.704922 3.298884 Fixation      1
2 2 Subject_1  4.333526 3.697448 3.292230     <NA>     NA
3 3 Subject_1  7.667052 3.679314 3.276083     <NA>     NA
4 4 Subject_1 11.000578 3.702009 3.296291     <NA>     NA
5 5 Subject_1 14.334104 3.686186 3.282202     <NA>     NA
6 6 Subject_1 17.667630 3.712768 3.305871     <NA>     NA

Our dataframe consists of several easily interpretable columns. Time represents elapsed time in milliseconds, Subject identifies the participant, and Events indicates when and which stimuli were presented. TrialN tracks the trial number, while PupilL and PupilR measure pupil dilation for the left and right eyes, respectively, in millimeters.

Let’s plot the data! Visualizing it first is always a crucial step as it provides an initial understanding of its structure and key patterns.

ggplot(Raw_data, aes(x = Time, y = PupilR))+
  geom_line(aes(y = PupilR, color = 'Pupil Right'), lwd = 1.2) +
  geom_line(aes(y = PupilL, color = 'Pupil Left'), lwd = 1.2) +
  geom_vline(data = Raw_data |> dplyr::filter(!is.na(Events)) , aes(xintercept = Time, color = Events), linetype = 'dashed') +

  facet_wrap(~Subject)+
  theme_bw(base_size = 35)+
  ylim(1,6)+
  scale_color_manual(values = c('Pupil Right' = '#4A6274', 'Pupil Left' = '#E2725A'))+
  theme(legend.position = 'bottom') +
  guides(color = guide_legend(override.aes = list(lwd = 20)))  # Make legend lines thicker

Prepare the data

Now that we’ve visualized the data, you’ve probably noticed two things:

  1. So many events! That’s intentional — it’s better to have too many triggers than miss something important. But don’t worry, for our pupil dilation analysis, we only care about two key events: Circle and Square (check the paradigm intro if you need a refresher on why that is).

  2. Single-sample events! Like in most studies, events are marked at a single time point (when the stimulus is presented). But PupilometryR needs a different structure — it expects the event value to be repeated for every row while the event is happening.

So, how do we fix this? First, let’s isolate the rows in our dataframe where the events are Circle or Square. We start by creating a list of the events we care about, then use it to filter our dataframe and keep only the rows related to those events in a new dataframe called Events

Events_to_keep = c('Circle','Square')
Events = filter(Raw_data, Events %in% Events_to_keep) # filter data
head(Events) # database peak
      X   Subject       Time   PupilL   PupilR Events TrialN
1   221 Subject_1   734.3757       NA       NA Circle      1
2  2552 Subject_1  8504.8248 3.288117 2.927758 Square      2
3  4883 Subject_1 16275.2739 2.926076 2.605395 Circle      3
4  7215 Subject_1 24049.0565 3.463782 3.084172 Circle      4
5  9546 Subject_1 31819.5055 3.433078 3.056833 Square      5
6 11877 Subject_1 39589.9546 3.890758 3.464353 Circle      6

Perfect! Now onto the second point: we need to repeat the events we just selected for the entire duration we want to analyze. But what’s this duration? We want to cover the full cue presentation (2 seconds), plus an extra 0.1 seconds before the stimulus appears. Why? This pre-stimulus period will serve as our baseline, which we’ll use later in the analysis.

So, let’s define how much time to include before and after the stimulus. We’ll also set the framerate of our data (300Hz) and create a time vector that starts from the pre-stimulus period and continues in steps of 1/Hz, with a total length equal to Pre_stim + Post_stim. In addition we also create a copy of our raw data called Trial_data. We do this so the original data stays intact, and all our edits happen on the copy

# Settings to cut events
Pre_stim = 100 # pre stimulus time
Post_stim = 2000 # post stimulus time
Fs = 300 # framerate

Time = seq(from = -Pre_stim, by=1000/Fs, length.out = (Pre_stim+Post_stim)/1000*300-1) # time vector

# create copy of the raw data
Trial_data = Raw_data 

Here’s where the magic happens. We loop through each event listed in our Events dataframe. Each row in Events corresponds to a specific event (like a “Circle” or “Square” cue) that occurred for a specific subject during a specific trial.

For each event, we extract three key details:

  • Subject (to know which participant the event is for)

  • Event (to know if it’s a Circle or Square cue)

  • TrialN (to know which trial this event is part of)

Next, we identify the rows of interest in our dataframe. We look for rows where the Subject matches the current participant, and the Time falls within the window from Pre_stim to Post_stim relative to the event’s start.

Finally, we use these identified rows to add the event information. The Time, Event, and TrialN values are repeated across all the rows in this window, ensuring every row in the event window is properly labeled.

# Loop for each event of each subject
for (trial in 1:nrow(Events)){

    # Extract the information
    Sub = Events[trial, 'Subject']
    Event = Events[trial, 'Events']
    TrialN = Events[trial, 'TrialN']
    Onset = Events[trial, 'Time']

    
    # Find the rows to update based on subject and time
    rows_to_update = which(Trial_data$Subject == Sub &
                           Trial_data$Time >= (Onset - Pre_stim) &
                           Trial_data$Time < (Onset + Post_stim))
   
    # Repeat the values of interest for all the rows
    Trial_data[rows_to_update, 'Time'] = Time
    Trial_data[rows_to_update, 'Events'] = Event
    Trial_data[rows_to_update, 'TrialN'] = TrialN
}

Perfect! We’ve successfully extended the event information backward and forward based on our Pre_stim and Post_stim windows. Now, it’s time to clean things up.

Since we only care about the rows that are part of our trial of interest, we’ll remove all the rows that don’t belong to these event windows. This will leave us with a clean, focused dataset that only contains the data relevant to our analysis.

Trial_data = Trial_data %>% 
    filter(Events %in% Events_to_keep)

Very good job reaching to this point!! if you believe it or not this was the hardest part!! We have now our different dataset!! let’s take a look:

Code
ggplot(Trial_data, aes(x = Time, y = PupilR, group = TrialN)) +
  geom_line(aes(y = PupilR, color = 'Pupil Right'), lwd = 1.2) +
  geom_line(aes(y = PupilL, color = 'Pupil Left'), lwd = 1.2) +
  geom_vline(aes(xintercept = 0), linetype = 'dashed', color = 'black', lwd = 1.2) +
  facet_wrap(~Subject) +
  
  ylim(1, 6) +
  scale_color_manual(values = c('Pupil Right' = '#4A6274', 'Pupil Left' = '#E2725A')) +
  
  theme_bw(base_size = 35) +
  theme(
    legend.position = 'bottom', 
    legend.title = element_blank()) +
  guides(color = guide_legend(override.aes = list(lwd = 20))) 

Make pupillometryR data

Ok, now it’s time to start working with PupillometryR! 🎉

In the previous steps, we changed our event structure, and you might be wondering — why all that effort? Well, it’s because PupillometryR needs the data in this specific format to do its magic. To get started, we’ll pass our dataframe to the make_pupillometryr_data() function. If you’re already thinking, “Oh no, not another weird object type that’s hard to work with!” — don’t worry! The good news is that the main object it creates is just a regular dataframe. That means we can still interact with like we’re used to. This makes the pre-processing steps much less frustrating. Let’s get started!

# Trial_data[Trial_data$Subject == 'Subject_1' & Trial_data$TrialN==1,]$PupilL = NA
# Trial_data[Trial_data$Subject == 'Subject_1' & Trial_data$TrialN==1,]$PupilR = NA
# Trial_data[Trial_data$Subject == 'Subject_4' & Trial_data$TrialN==1,]$PupilL = NA
# Trial_data[Trial_data$Subject == 'Subject_4' & Trial_data$TrialN==1,]$PupilR = NA

PupilR_data = make_pupillometryr_data(data = Trial_data,
                                 subject = Subject,
                                 trial = TrialN,
                                 time = Time,
                                 condition = Events)

Here, we’re simply using the make_pupillometryr_data() function to pass in our data and specify which columns contain the key information. This tells PupillometryR where to find the crucial details, like subject IDs, events, and pupil measurements, so it knows how to structure and process the data properly.

Tip

If you have extra columns that you want to keep in your PupillometryR data during preprocessing, you can pass them as a list using the other = c(OtherColumn1, OtherColumn2) argument. This allows you to keep these additional columns alongside your main data throughout most of the preprocessing steps.

But here’s a heads-up — not all functions can keep these extra columns every time. For example, downsampling may not retain them since it reduces the number of rows, and it’s not always clear how to summarize extra columns. So, keep that in mind as you plan your analysis!

Plot

One cool feature of the data created using make_pupillometryr_data() is that it comes with a simple, built-in plot function. This makes it super easy to visualize your data without needing to write tons of code. The plot function works by averaging the data over the group variable. So we can group over subject or condition. Here we use the group variable to focus on the condition and average over the subjects.

In this example, we’re plotting the PupilL (left pupil) data, grouped by condition. The plot() function is actually just a ggplot2 wrapper, which means you can customize to a certain extent like any other ggplot. That’s why we can add elements to it, like theme_bw(), which gives the plot a cleaner, black-and-white look. Give it a go without adding anything and then learn to customize it!!

Tip

Pro tip! If you want more control over your plots, you can always use ggplot2. Remember, the Pupil data is just a regular dataframe, so you can plot it in any way you like!

plot(PupilR_data, pupil = PupilL, group = 'condition', geom = 'line') +
  theme_bw(base_size = 45) +
  theme(
    legend.position = 'bottom', 
    legend.title = element_blank()) +
  guides(color = guide_legend(override.aes = list(lwd = 20))) 

Note

In this tutorial, we’ll use two methods to plot our data. We’ll use the PupillometryR plot to visualize the average pupil response by condition, and we’ll also use ggplot to manually plot our data. Both approaches are valid and offer unique benefits.

The PupillometryR plot provides a quick overview by automatically averaging pupil responses across condition levels, making it ideal for high-level trend visualization. On the other hand, ggplot gives you full control to visualize specific details or customize every aspect of the plot, allowing for deeper insights and flexibility.

Pre-processing

Now that we have our pupillometry data in the required format we can actually start the pre-processing!!

Regress

The first step is to regress PupilL against PupilR (and vice versa) using a simple linear regression. This corrects small inconsistencies in pupil data caused by noise. The regression is done separately for each participant, trial, and time point, ensuring smoother and more consistent pupil dilation measurements.

Regressed_data = regress_data(data = PupilR_data,
                                pupil1 = PupilL,
                                pupil2 = PupilR)
Error in `mutate()`:
ℹ In argument: `pupil1newkk = .predict_right(PupilL, PupilR)`.
ℹ In group 1: `Subject = "Subject_1"`, `TrialN = 1`, `Events = "Circle"`.
Caused by error in `lm.fit()`:
! 0 (non-NA) cases

Pwa pwa pwaaaaa…!!🤦‍♂️ We got an error!

What’s it saying? It’s telling us that one of the trials is completely full of NAs, and since there’s no data to work with, the function fails. This happens a lot when testing infants — they don’t always do what we expect, like watching the screen. Instead, they move around or look away.

We’ll deal with missing data properly later, but for now, we need a quick fix. What can we do? We can simply drop any trials where both pupils (PupilL and PupilR) are entirely NA. This way, we avoid errors and keep the analysis moving.

So let’s filter our data and then redo the last two steps (make PupilR_data and then regress data)

# Filter the trial data
Trial_data = Trial_data %>%
    group_by(Subject, TrialN) %>%  # group by Subject and TrialN
    filter(!all(is.na(PupilL) & is.na( PupilR))) %>% # filter out if both PupilR and PupilL are all NA
    ungroup()  # Remove grouping

# Make pupilloemtryR data
PupilR_data = make_pupillometryr_data(data = Trial_data,
                                 subject = Subject,
                                 trial = TrialN,
                                 time = Time,
                                 condition = Events)
# Regress data
Regressed_data = regress_data(data = PupilR_data,
                               pupil1 = PupilL,
                                pupil2 = PupilR)

And now everything worked!! Perfect!

Mean pupil

As the next steps we will average the two pupil signals. This will create a new variable called mean_pupil

Mean_data = calculate_mean_pupil_size(data = Regressed_data, 
                                       pupil1 = PupilL, 
                                       pupil2 = PupilR)

plot(Mean_data, pupil = mean_pupil, group = 'condition', geom = 'line')+
  theme_bw(base_size = 45) +
  theme(
    legend.position = 'bottom', 
    legend.title = element_blank()) +
  guides(color = guide_legend(override.aes = list(lwd = 20))) 

Lowpass

Now that we have a single pupil signal, we can move on to filtering it. The goal here is to remove fast noise and fluctuations that aren’t relevant to our analysis. Why? Because we know that pupil dilation is a slow physiological signal, and those rapid changes are likely just noise from blinks, eye movements, or measurement errors. By filtering out these fast fluctuations, we can focus on the meaningful changes in pupil dilation that matter for our analysis.

filtered_data = filter_data(data = Mean_data,
                             pupil = mean_pupil,
                             filter = 'median',
                             degree = 11)
plot(filtered_data, pupil = mean_pupil, group = 'condition', geom = 'line')+
  theme_bw(base_size = 45) +
  theme(
    legend.position = 'bottom', 
    legend.title = element_blank()) +
  guides(color = guide_legend(override.aes = list(lwd = 20))) 

There are different ways to filter the data in PupillometryR we suggest you check the actual package website and make decision based on your data (filter_data). Here we use a median filter based on a 11 sample window.

Downsample

As mentioned above, Pupil dilation is a slow signal, so 20Hz is enough — no need for 300Hz. Downsampling reduces file size, speeds up processing, and naturally smooths the signal by filtering out high-frequency noise, all while preserving the key information we need for analysis. To downsample to 20Hz, we’ll set the timebin size to 50 ms (since 1/20 = 0.05 seconds = 50 ms) and calculate the median for each time bin.

NewHz = 20
timebinSize = 1/NewHz

Downsampled_data = downsample_time_data(data = filtered_data,
                              pupil = mean_pupil,
                              timebin_size = timebinSize,
                              option = 'median')
plot(Downsampled_data, pupil = mean_pupil, group = 'condition', geom = 'line') +
  theme_bw(base_size = 45) +
  theme(
    legend.position = 'bottom', 
    legend.title = element_blank()) +
  guides(color = guide_legend(override.aes = list(lwd = 20))) 

Trial Rejection

Now that our data is smaller and smoother, it’s a good time to take a look at it. It doesn’t make sense to keep trials that are mostly missing values, nor does it make sense to keep participants with very few good trials.

While you might already have info on trial counts and participant performance from other sources (like video coding), PupillometryR has a super handy function to check this directly. This way, you can quickly see how many valid trials each participant has and decide which ones to keep or drop.

Missing_data = calculate_missing_data(Downsampled_data,
                                       mean_pupil)
head(Missing_data, n=20)
# A tibble: 20 × 3
   Subject   TrialN Missing
   <chr>      <int>   <dbl>
 1 Subject_1      2  0.111 
 2 Subject_1      3  0.0159
 3 Subject_1      4  0.130 
 4 Subject_1      5  0.0541
 5 Subject_1      6  0     
 6 Subject_1      7  0.141 
 7 Subject_1      8  0.132 
 8 Subject_1      9  0     
 9 Subject_1     10  0     
10 Subject_2      1  0     
11 Subject_2      2  0     
12 Subject_2      3  0.107 
13 Subject_2      4  0.124 
14 Subject_2      5  0     
15 Subject_2      6  0.0954
16 Subject_2      7  0     
17 Subject_2      8  0.141 
18 Subject_2      9  0.0922
19 Subject_2     10  0.119 
20 Subject_3      1  0.107 

This gives us a new dataframe that shows the amount of missing data for each subject and each trial. While we could manually decide which trials and subjects to keep or remove, PupillometryR makes it easier with the clean_missing_data() function.

This function lets you set two % thresholds — one for trials and one for subjects. Here, we’ll set it to reject trials with more than 25% missing data (keep at least 75% of the data) and reject subjects with more than 25% missing data. This way, we ensure our analysis is based on clean, high-quality data.

Clean_data = clean_missing_data(Downsampled_data,
                                 pupil = mean_pupil,
                                 trial_threshold = .75,
                                 subject_trial_threshold = .75)
Warning

Note that this function calculates the percentage of missing trials based only on the trials present in the dataframe. For example, if a participant only completed one trial (and watched it perfectly) before the session had to stop, the percentage would be calculated on that single trial, and the participant wouldn’t be rejected.

If you have more complex conditions for excluding participants (e.g., based on total expected trials or additional criteria), you’ll need to handle this manually to ensure subjects are dropped appropriately.

Fill the signal

Now our data is clean, but… while the average signal for each condition looks smooth (as seen in our plots), the data for each individual participant is still noisy. We can still spot blinks and missing data in the signal.

To handle this, we’ll use interpolation to fill in the missing points. Interpolation “connects the dots” between gaps, creating a more continuous and cleaner signal. This step is crucial because large chunks of missing data can distort our analysis, and interpolation allows us to retain more usable data from each participant.

ggplot(Clean_data, aes(x = Time, y = mean_pupil, group = TrialN, color= Events))+
  geom_line( lwd =1.2)+
  geom_vline(aes(xintercept = 0), linetype= 'dashed', color = 'black', lwd =1.2)+

  facet_wrap(~Subject)+
  ylim(1,6)+
  
  theme_bw(base_size = 45) +
  theme(
    legend.position = 'bottom', 
    legend.title = element_blank()) +
  labs(y = 'Pupil Size')+
  guides(color = guide_legend(override.aes = list(lwd = 20))) 

So to remove these missing values we can interpolate our data. Interpolating is easy with pupillometryR we can simply:

Int_data = interpolate_data(data = Clean_data,
                             pupil = mean_pupil,
                             type = 'linear')

ggplot(Int_data, aes(x = Time, y = mean_pupil, group = TrialN, color = Events))+
  geom_line(lwd =1.2)+
  geom_vline(aes(xintercept = 0), linetype= 'dashed', color = 'black', lwd =1.2)+

  facet_wrap(~Subject)+
  ylim(1,6)+
  
  theme_bw(base_size = 45) +
  theme(
    legend.position = 'bottom', 
    legend.title = element_blank()) +
  labs(y = 'Pupil Size')+
  guides(color = guide_legend(override.aes = list(lwd = 20))) 

Done!! Well, you’ve probably noticed something strange… When there’s a blink, the pupil signal can rapidly decrease until it’s completely missing. Right now, this drop gets interpolated, and the result is a weird, unrealistic curve where the signal dips sharply and then smoothly recovers. This makes our data look horrible! 😩

Let’s fix it!

To do this, we’ll use PupillometryR’s blink detection functions. There are two main ways to detect blinks:

  1. Based on size — detects pupil size.

  2. Based on velocity — detects rapid changes in pupil size (which happens during blinks).

Here, we’ll use detection by velocity. We set a velocity threshold to detect when the pupil size changes too quickly. To ensure we capture the full blink, we use extend_forward and extend_back to expand the blink window, including the fast decrease in pupil size. The key idea is to make the entire blink period NA, not just the moment the pupil disappears. This prevents interpolation from creating unrealistic artifacts. When we interpolate, the process skips over the entire blink period, resulting in a cleaner, more natural signal.

Blink_data = detect_blinks_by_velocity(
    Clean_data,
    mean_pupil,
    threshold = 0.1,
    extend_forward = 50,
    extend_back = 50)

ggplot(Blink_data, aes(x = Time, y = mean_pupil, group = TrialN, color=Events))+
  geom_line(lwd =1.2)+
  geom_vline(aes(xintercept = 0), linetype= 'dashed', color = 'black', lwd =1.2)+

  facet_wrap(~Subject)+
  ylim(1,6)+
  
  theme_bw(base_size = 45) +
  theme(
    legend.position = 'bottom', 
    legend.title = element_blank()) +
  labs(y = 'Pupil Size')+
  guides(color = guide_legend(override.aes = list(lwd = 20))) 

See !! now the rapid shrinking disappeared and we can now interpolate

Int_data = interpolate_data(data = Blink_data,
                             pupil = mean_pupil,
                             type = 'linear')

ggplot(Int_data, aes(x = Time, y = mean_pupil, group = TrialN, color=Events))+
  geom_line(lwd =1.2)+
  geom_vline(aes(xintercept = 0), linetype= 'dashed', color = 'black', lwd =1.2)+

  facet_wrap(~Subject)+
  ylim(1,6)+
  
  theme_bw(base_size = 45) +
  theme(
    legend.position = 'bottom', 
    legend.title = element_blank()) +
  labs(y = 'Pupil Size')+
  guides(color = guide_legend(override.aes = list(lwd = 20))) 

Look how beautiful our signal is now!! 😍 Good job!!!

Caution

It’s not always the case that blinks will have a rapid drop in pupil size followed by missing values. Sometimes, you’ll just get missing values straight away. This can depend on the eye tracker, sampling rate, and even the population you’re testing (like infants who might look away suddenly). Because of this, blink detection may not always be necessary. The best approach is to check your data before deciding. And how do you check it? Plotting! Plotting your signal is the best way to see if blinks are causing rapid drops or if you’re just dealing with missing data. Let the data guide your decisions.

Baseline Correction

Good job getting this far!! We’re now at the final step of our pre-processing: baseline correction.

Baseline correction helps remove variability between trials and participants, like differences in baseline pupil size caused by individual differences, fatigue, or random fluctuations. By doing this, we can focus only on the variability caused by our paradigm. This step ensures that any changes we see in pupil size are truly driven by the experimental conditions, not irrelevant noise. To further adjust the data, we’ll subtract the calculated baseline, ensuring the values start at 0 instead of -100.

Let’s get it done!

Base_data = baseline_data(data = Int_data,
                           pupil = mean_pupil,
                           start = -100,
                           stop = 0)

# Remove the baseline
Final_data = subset_data(Base_data, start = 0)

Let’s plot it to see what baseline correction and removal are actually doing!! We will plot both the average signal using the plot function (with some addition information about color and theme) and using ggplot to plot the data for each subject separately.

One = plot(Final_data, pupil = mean_pupil, group = 'condition')+
  theme_bw(base_size = 45) +
  theme(legend.position = 'none')


Two = ggplot(Final_data, aes(x = Time, y = mean_pupil, group = TrialN, color = Events))+
  geom_line(lwd =1.2)+
  geom_vline(aes(xintercept = 0), linetype= 'dashed', color = 'black', lwd =1.2)+

  facet_wrap(~Subject)+

  theme_bw(base_size = 45) +
  theme(
    legend.position = 'bottom', 
    legend.title = element_blank()) +
  labs(y = 'Pupil Size')+
  guides(color = guide_legend(override.aes = list(lwd = 20))) 

# Using patchwork to put the plot together
One / Two

Save and analysis

This tutorial will not cover the analysis of pupil dilation. We’ll stop here since, after baseline correction, the data is ready to be explored and analyzed. From this point on, we’ll shift from pre-processing to analysis, so it’s a good idea to save the data as a simple .csv file for easy access and future use.

write.csv(Final_data, "..\\..\\resources\\Pupillometry\\ProcessedPupilData.csv")

There are multiple ways to analyze pupil data, and we’ll show you some of our favorite methods in a dedicated tutorial: Analyze Pupil Dilation.

All code

Here below we report the whole code we went trough this tutorial as an unique scrit to make it easier for you to copy and explore it in it’s entirety. We

# Setting and Libraries -----------------------------------------------------------------

library(PupillometryR)
library(tidyverse)
library(patchwork)


# Read Data -----------------------------------------------------------------

Raw_data = read.csv("..\\..\\resources\\Pupillometry\\PupilData.csv")

head(Raw_data)

## Plot Raw Data -----------------------------------------------------------------

ggplot(Raw_data, aes(x = Time, y = PupilR)) +
  geom_line(aes(y = PupilR, color = 'Pupil Right'), lwd = 1.2) +
  geom_line(aes(y = PupilL, color = 'Pupil Left'), lwd = 1.2) +
  geom_vline(data = Raw_data |> dplyr::filter(!is.na(Events)), aes(xintercept = Time, color = Events), linetype = 'dashed') +
  facet_wrap(~Subject) +
  theme_bw(base_size = 35) +
  ylim(1, 6) +
  scale_color_manual(values = c('Pupil Right' = '#4A6274', 'Pupil Left' = '#E2725A')) +
  theme(legend.position = 'bottom') +
  guides(color = guide_legend(override.aes = list(lwd = 20)))


# Prepare Data -----------------------------------------------------------------

## Extract Events -----------------------------------------------------------------

Events_to_keep = c('Circle','Square')

Events = filter(Raw_data, Events %in% Events_to_keep)

head(Events)


## Event Settings -----------------------------------------------------------------

Pre_stim = 100

Post_stim = 2000

Fs = 300

Time = seq(from = -Pre_stim, by = 1000 / Fs, length.out = (Pre_stim + Post_stim) / 1000 * 300 - 1)

Trial_data = Raw_data


## Extend Events -----------------------------------------------------------------

for (trial in 1:nrow(Events)) {
  Sub = Events[trial, 'Subject']
  Event = Events[trial, 'Events']
  TrialN = Events[trial, 'TrialN']
  Onset = Events[trial, 'Time']
  rows_to_update = which(Trial_data$Subject == Sub &
                           Trial_data$Time >= (Onset - Pre_stim) &
                           Trial_data$Time < (Onset + Post_stim))
  Trial_data[rows_to_update, 'Time'] = Time
  Trial_data[rows_to_update, 'Events'] = Event
  Trial_data[rows_to_update, 'TrialN'] = TrialN
}


## Filter Events -----------------------------------------------------------------

Trial_data = Trial_data %>% 
  filter(Events %in% Events_to_keep)


### Plot Events -----------------------------------------------------------------

ggplot(Trial_data, aes(x = Time, y = PupilR, group = TrialN)) +
  geom_line(aes(y = PupilR, color = 'Pupil Right'), lwd = 1.2) +
  geom_line(aes(y = PupilL, color = 'Pupil Left'), lwd = 1.2) +
  geom_vline(aes(xintercept = 0), linetype = 'dashed', color = 'black', lwd = 1.2) +
  facet_wrap(~Subject) +
  ylim(1, 6) +
  scale_color_manual(values = c('Pupil Right' = '#4A6274', 'Pupil Left' = '#E2725A')) +
  theme_bw(base_size = 35) +
  theme(legend.position = 'bottom', legend.title = element_blank()) +
  guides(color = guide_legend(override.aes = list(lwd = 20)))


# Pre-processing -----------------------------------------------------------------

## Make PupillometryR Data -----------------------------------------------------------------

PupilR_data = make_pupillometryr_data(data = Trial_data,
                                      subject = Subject,
                                      trial = TrialN,
                                      time = Time,
                                      condition = Events)

### Plot ------------------------------------------------------------------
plot(PupilR_data, pupil = PupilL, group = 'condition', geom = 'line') +
  theme_bw(base_size = 45) +
  theme(
    legend.position = 'bottom', 
    legend.title = element_blank()) +
  guides(color = guide_legend(override.aes = list(lwd = 20))) 



## Regress Data -----------------------------------------------------------------

Regressed_data = regress_data(data = PupilR_data,
                               pupil1 = PupilL,
                               pupil2 = PupilR)


## Filter Out Trials with NA -----------------------------------------------------------------

Trial_data = Trial_data %>%
  group_by(Subject, TrialN) %>%
  filter(!all(is.na(PupilL) & is.na(PupilR))) %>%
  ungroup()

PupilR_data = make_pupillometryr_data(data = Trial_data,
                                      subject = Subject,
                                      trial = TrialN,
                                      time = Time,
                                      condition = Events)

Regressed_data = regress_data(data = PupilR_data,
                               pupil1 = PupilL,
                               pupil2 = PupilR)


## Calculate Mean Pupil -----------------------------------------------------------------

Mean_data = calculate_mean_pupil_size(data = Regressed_data, 
                                       pupil1 = PupilL, 
                                       pupil2 = PupilR)

### Plot --------------------------------------------------------------------

plot(Mean_data, pupil = mean_pupil, group = 'condition', geom = 'line')+
  theme_bw(base_size = 45) +
  theme(
    legend.position = 'bottom', 
    legend.title = element_blank()) +
  guides(color = guide_legend(override.aes = list(lwd = 20))) 


## Lowpass Filter -----------------------------------------------------------------

filtered_data = filter_data(data = Mean_data,
                             pupil = mean_pupil,
                             filter = 'median',
                             degree = 11)

### Plot --------------------------------------------------------------------

plot(filtered_data, pupil = mean_pupil, group = 'condition', geom = 'line')+
  theme_bw(base_size = 45) +
  theme(
    legend.position = 'bottom', 
    legend.title = element_blank()) +
  guides(color = guide_legend(override.aes = list(lwd = 20))) 


## Downsample -----------------------------------------------------------------

NewHz = 20

timebinSize = 1 / NewHz

Downsampled_data = downsample_time_data(data = filtered_data,
                                         pupil = mean_pupil,
                                         timebin_size = timebinSize,
                                         option = 'median')

# Plot --------------------------------------------------------------------

plot(Downsampled_data, pupil = mean_pupil, group = 'condition', geom = 'line') +
  theme_bw(base_size = 45) +
  theme(
    legend.position = 'bottom', 
    legend.title = element_blank()) +
  guides(color = guide_legend(override.aes = list(lwd = 20))) 


## Calculate Missing Data -----------------------------------------------------------------

Missing_data = calculate_missing_data(Downsampled_data, mean_pupil)


## Clean Missing Data -----------------------------------------------------------------

Clean_data = clean_missing_data(Downsampled_data,
                                 pupil = mean_pupil,
                                 trial_threshold = .75,
                                 subject_trial_threshold = .75)

### Plot --------------------------------------------------------------------

ggplot(Clean_data, aes(x = Time, y = mean_pupil, group = TrialN, color= Events))+
  geom_line( lwd =1.2)+
  geom_vline(aes(xintercept = 0), linetype= 'dashed', color = 'black', lwd =1.2)+
  
  facet_wrap(~Subject)+
  ylim(1,6)+
  
  theme_bw(base_size = 45) +
  theme(
    legend.position = 'bottom', 
    legend.title = element_blank()) +
  guides(color = guide_legend(override.aes = list(lwd = 20))) 


## Detect Blinks -----------------------------------------------------------------

Blink_data = detect_blinks_by_velocity(
  Clean_data,
  mean_pupil,
  threshold = 0.1,
  extend_forward = 50,
  extend_back = 50)

### Plot --------------------------------------------------------------------

ggplot(Blink_data, aes(x = Time, y = mean_pupil, group = TrialN, color=Events))+
  geom_line(lwd =1.2)+
  geom_vline(aes(xintercept = 0), linetype= 'dashed', color = 'black', lwd =1.2)+
  
  facet_wrap(~Subject)+
  ylim(1,6)+
  
  theme_bw(base_size = 45) +
  theme(
    legend.position = 'bottom', 
    legend.title = element_blank()) +
  guides(color = guide_legend(override.aes = list(lwd = 20))) 


## Interpolate Data -----------------------------------------------------------------

Int_data = interpolate_data(data = Clean_data,
                             pupil = mean_pupil,
                             type = 'linear')

### Plot --------------------------------------------------------------------

ggplot(Int_data, aes(x = Time, y = mean_pupil, group = TrialN, color = Events))+
  geom_line(lwd =1.2)+
  geom_vline(aes(xintercept = 0), linetype= 'dashed', color = 'black', lwd =1.2)+

  facet_wrap(~Subject)+
  ylim(1,6)+
  
  theme_bw(base_size = 45) +
  theme(
    legend.position = 'bottom', 
    legend.title = element_blank()) +
  guides(color = guide_legend(override.aes = list(lwd = 20)))


# Baseline correction -----------------------------------------------------

Base_data = baseline_data(data = Int_data,
                           pupil = mean_pupil,
                           start = -100,
                           stop = 0)

# Remove the baseline
Final_data = subset_data(Base_data, start = 0)


### Final plot --------------------------------------------------------------

One = plot(Final_data, pupil = mean_pupil, group = 'condition')+
  theme_bw(base_size = 45) +
  theme(legend.position = 'none')


Two = ggplot(Final_data, aes(x = Time, y = mean_pupil, group = TrialN, color = Events))+
  geom_line(lwd =1.2)+
  geom_vline(aes(xintercept = 0), linetype= 'dashed', color = 'black', lwd =1.2)+
  
  facet_wrap(~Subject)+
  
  theme_bw(base_size = 45) +
  theme(
    legend.position = 'bottom', 
    legend.title = element_blank()) +
  guides(color = guide_legend(override.aes = list(lwd = 20))) 

One / Two



# Save data ---------------------------------------------------------------

write.csv(Final_data, "..\\..\\resources\\Pupillometry\\ProcessedPupilData.csv")
Back to top