Selective inference across age groups: psel_retro Example
psel_retro.RmdData used in this analysis was extracted from Table 1 of the
paper “Longitudinal
Sleep Patterns and Cognitive Impairment in Older Adults”. This
table reported means, standard deviations, and sample sizes for four
outcome variables (CES-D, Sleep, MMSES, MDRS) across three age groups,
along with significant overall p-values for all outcomes (i.e., the
overall null hypothesis of no difference between age groups was
rejected, for each of the four outcome variables). Therefore, for each
outcome variable, we use the psel_retro() function from the
lmFScreen package to compute selective p-values for
each pairwise comparison across age groups, conditional on rejection of
the overall null hypothesis (i.e., the overall ANOVA). The results of
this analysis can be found in the 2025 paper “Valid F-screening in
linear regression” by McGough, Witten, and Kessler (arxiv preprint: https://arxiv.org/abs/2505.23113).
# Function to calculate SE, t-statistic, and p-value for pairwise comparisons
calculate_t_and_p <- function(mean1, mean2, n1, n2, mse, df_within) {
se <- sqrt(mse * (1/n1 + 1/n2))
t_stat <- (mean1 - mean2) / se
p_val <- 2 * pt(-abs(t_stat), df_within) # two-tailed p-value
sidak <- 1 - (1 - p_val)^3
return(list(t_stat = t_stat, p_val = p_val, sidak = sidak))
}CES-D Analysis
set.seed(826)
means <- c(9.33, 7.60, 8.79)
sds <- c(7.16, 5.84, 5.68)
ns <- c(150, 449, 227)
n <- sum(ns); p <- 3
overall_mean <- sum(means * ns) / sum(ns)
r_squared <- sum(ns * (means - overall_mean)^2) / (sum(ns * (means - overall_mean)^2) + sum((ns - 1) * sds^2))
mse <- sum((ns - 1) * sds^2) / (n - p)
rse <- sqrt(mse)
df_within <- n - p
results_1_2 <- calculate_t_and_p(means[1], means[2], ns[1], ns[2], mse, df_within)
results_1_3 <- calculate_t_and_p(means[1], means[3], ns[1], ns[3], mse, df_within)
results_2_3 <- calculate_t_and_p(means[2], means[3], ns[2], ns[3], mse, df_within)
cat("\nCES-D Naive p-values (with Sidak correction):\n")##
## CES-D Naive p-values (with Sidak correction):
cat("<65 vs 65–84: p =", results_1_2$p_val, ", sidak =", results_1_2$sidak, "\n")## <65 vs 65–84: p = 0.00254 , sidak = 0.007600662
cat("<65 vs ≥85: p =", results_1_3$p_val, ", sidak =", results_1_3$sidak, "\n")## <65 vs ≥85: p = 0.3972102 , sidak = 0.780973
cat("65–84 vs ≥85: p =", results_2_3$p_val, ", sidak =", results_2_3$sidak, "\n")## 65–84 vs ≥85: p = 0.01609221 , sidak = 0.04750392
cat("\nCES-D Selective p-values:\n")##
## CES-D Selective p-values:
cat("<65 vs 65–84:", psel_retro(n, p, r_squared, rse, (results_1_2$t_stat)), "\n")## <65 vs 65–84: 0.1165976
cat("<65 vs ≥85:", psel_retro(n, p, r_squared, rse, (results_1_3$t_stat)), "\n")## <65 vs ≥85: 0.397351
cat("65–84 vs ≥85:", psel_retro(n, p, r_squared, rse, (results_2_3$t_stat)), "\n")## 65–84 vs ≥85: 0.09905023
Sleep Analysis
set.seed(826)
means <- c(6.65, 6.93, 7.07)
sds <- c(0.87, 0.87, 1.02)
ns <- c(150, 449, 227)
n <- sum(ns); p <- 3
overall_mean <- sum(means * ns) / sum(ns)
r_squared <- sum(ns * (means - overall_mean)^2) / (sum(ns * (means - overall_mean)^2) + sum((ns - 1) * sds^2))
mse <- sum((ns - 1) * sds^2) / (n - p)
rse <- sqrt(mse)
df_within <- n - p
results_1_2 <- calculate_t_and_p(means[1], means[2], ns[1], ns[2], mse, df_within)
results_1_3 <- calculate_t_and_p(means[1], means[3], ns[1], ns[3], mse, df_within)
results_2_3 <- calculate_t_and_p(means[2], means[3], ns[2], ns[3], mse, df_within)
cat("\nSleep Naive p-values (with Sidak correction):\n")##
## Sleep Naive p-values (with Sidak correction):
cat("<65 vs 65–84: p =", results_1_2$p_val, ", sidak =", results_1_2$sidak, "\n")## <65 vs 65–84: p = 0.001202429 , sidak = 0.00360295
cat("<65 vs ≥85: p =", results_1_3$p_val, ", sidak =", results_1_3$sidak, "\n")## <65 vs ≥85: p = 1.409129e-05 , sidak = 4.227326e-05
cat("65–84 vs ≥85: p =", results_2_3$p_val, ", sidak =", results_2_3$sidak, "\n")## 65–84 vs ≥85: p = 0.06025183 , sidak = 0.1700834
cat("\nSleep Selective p-values:\n")##
## Sleep Selective p-values:
cat("<65 vs 65–84:", psel_retro(n, p, r_squared, rse, (results_1_2$t_stat)), "\n")## <65 vs 65–84: 0.001254953
cat("<65 vs ≥85:", psel_retro(n, p, r_squared, rse, (results_1_3$t_stat)), "\n")## <65 vs ≥85: 0.00284495
cat("65–84 vs ≥85:", psel_retro(n, p, r_squared, rse, (results_2_3$t_stat)), "\n")## 65–84 vs ≥85: 0.06058
MMSES Analysis
set.seed(826)
means <- c(29.31, 28.12, 26.14)
sds <- c(1.19, 2.42, 4.15)
ns <- c(150, 449, 227)
n <- sum(ns); p <- 3
overall_mean <- sum(means * ns) / sum(ns)
r_squared <- sum(ns * (means - overall_mean)^2) / (sum(ns * (means - overall_mean)^2) + sum((ns - 1) * sds^2))
mse <- sum((ns - 1) * sds^2) / (n - p)
rse <- sqrt(mse)
df_within <- n - p
results_1_2 <- calculate_t_and_p(means[1], means[2], ns[1], ns[2], mse, df_within)
results_1_3 <- calculate_t_and_p(means[1], means[3], ns[1], ns[3], mse, df_within)
results_2_3 <- calculate_t_and_p(means[2], means[3], ns[2], ns[3], mse, df_within)
cat("\nMMSES Naive p-values (with Sidak correction):\n")##
## MMSES Naive p-values (with Sidak correction):
cat("<65 vs 65–84: p =", results_1_2$p_val, ", sidak =", results_1_2$sidak, "\n")## <65 vs 65–84: p = 1.151974e-05 , sidak = 3.455883e-05
cat("<65 vs ≥85: p =", results_1_3$p_val, ", sidak =", results_1_3$sidak, "\n")## <65 vs ≥85: p = 1.925529e-24 , sidak = 0
cat("65–84 vs ≥85: p =", results_2_3$p_val, ", sidak =", results_2_3$sidak, "\n")## 65–84 vs ≥85: p = 8.577183e-17 , sidak = 3.330669e-16
cat("\nMMSES Selective p-values:\n")##
## MMSES Selective p-values:
cat("<65 vs 65–84:", psel_retro(n, p, r_squared, rse, results_1_2$t_stat), "\n")## <65 vs 65–84: 9e-06
cat("<65 vs ≥85:", psel_retro(n, p, r_squared, rse, results_1_3$t_stat), "\n")## <65 vs ≥85: 0
cat("65–84 vs ≥85:", psel_retro(n, p, r_squared, rse, results_2_3$t_stat), "\n")## 65–84 vs ≥85: 0
MDRS Analysis
set.seed(826)
means <- c(141.5, 139.0, 131.6)
sds <- c(2.57, 5.85, 14.89)
ns <- c(150, 449, 227)
n <- sum(ns); p <- 3
overall_mean <- sum(means * ns) / sum(ns)
r_squared <- sum(ns * (means - overall_mean)^2) / (sum(ns * (means - overall_mean)^2) + sum((ns - 1) * sds^2))
mse <- sum((ns - 1) * sds^2) / (n - p)
rse <- sqrt(mse)
df_within <- n - p
results_1_2 <- calculate_t_and_p(means[1], means[2], ns[1], ns[2], mse, df_within)
results_1_3 <- calculate_t_and_p(means[1], means[3], ns[1], ns[3], mse, df_within)
results_2_3 <- calculate_t_and_p(means[2], means[3], ns[2], ns[3], mse, df_within)
cat("\nMDRS Naive p-values (with Sidak correction):\n")##
## MDRS Naive p-values (with Sidak correction):
cat("<65 vs 65–84: p =", results_1_2$p_val, ", sidak =", results_1_2$sidak, "\n")## <65 vs 65–84: p = 0.00325966 , sidak = 0.009747137
cat("<65 vs ≥85: p =", results_1_3$p_val, ", sidak =", results_1_3$sidak, "\n")## <65 vs ≥85: p = 3.523733e-24 , sidak = 0
cat("65–84 vs ≥85: p =", results_2_3$p_val, ", sidak =", results_2_3$sidak, "\n")## 65–84 vs ≥85: p = 9.552161e-23 , sidak = 0
cat("\nMDRS Selective p-values:\n")##
## MDRS Selective p-values:
cat("<65 vs 65–84:", psel_retro(n, p, r_squared, rse, results_1_2$t_stat), "\n")## <65 vs 65–84: 0.003283
cat("<65 vs ≥85:", psel_retro(n, p, r_squared, rse, results_1_3$t_stat), "\n")## <65 vs ≥85: 0
cat("65–84 vs ≥85:", psel_retro(n, p, r_squared, rse, results_2_3$t_stat), "\n")## 65–84 vs ≥85: 0
Summary
In this analysis, we revisited pairwise comparisons across age groups for four outcomes reported in the JAMA Network Open paper “Longitudinal Sleep Patterns and Cognitive Impairment in Older Adults”, by accounting for the fact that these pairwise comparisons would not have been conducted if the overall null hypothesis of no difference across age groups had not been conducted. While some selective p-values differ meaningfully from their standard/naive counterparts (in the case of the CES-D outcome), others remain similar. The Sidak-corrected p-values reported in the original paper align with the Sidak-corrected naive p-values shown here.
By conditioning on rejection of the overall null hypothesis, the selective p-values provide a more principled post-hoc analysis by guarding against inflated Type I error.