Retrospective Selective P-Value Based on Summary Statistics
This function conducts valid retrospective F-screening as defined in the 2025 paper "Valid F-screening in linear regression" by McGough, Witten, and Kessler (arXiv preprint: https://arxiv.org/abs/2505.23113). Suppose that we have access to the outputs of an "overall" least squares linear regression model, such as the output of summary(lm(y~X)), and we want to test the significance of a single regression coefficient (beta_j) in a way that accounts for the rejection of the "overall" F-test. This function then provides a selective p-value for beta_j based on only a few summary statistics. Its arguments include the R-squared and residual standard error (RSE) from the overall model (e.g. from summary(lm(y~X))) and a t-statistic for the test of H_0: beta_j=0. It is especially useful in settings where the raw data are unavailable, such as published studies.
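When working from a published regression table, the t-statistic is often not printed, but it can be recovered as the coefficient estimate divided by its standard error. The sketch below checks that identity on a fitted model; the variable names are illustrative and are not part of this package.

```r
# Illustrative check (not part of the package): a t-statistic for
# H_0: beta_j = 0 equals the coefficient estimate divided by its standard error.
mod <- lm(mpg ~ wt + hp, data = mtcars)
coefs <- summary(mod)$coefficients

# Reconstruct the t-statistic as one would from a published table
t_from_table <- coefs["hp", "Estimate"] / coefs["hp", "Std. Error"]

# Compare with the t-statistic reported by summary()
t_reported <- coefs["hp", "t value"]
same_t <- isTRUE(all.equal(t_from_table, t_reported))
```

The reconstructed value can then be supplied as the tstat argument. Estimates and standard errors in published tables are usually rounded, so a t-statistic recovered this way is only approximate.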
Usage
psel_retro(
n,
p,
R_squared,
RSE,
tstat,
sigma_sq = NULL,
alpha_ov = 0.05,
B = 1e+06,
min_select = 1000,
max_attempts = 100
)
Arguments
- n
Sample size (number of observations).
- p
Number of predictors used in the "overall" least squares linear model (excluding the intercept).
- R_squared
R-squared from the "overall" fitted least squares linear model (e.g. from summary(lm(y~X))).
- RSE
Residual standard error from the "overall" fitted least squares linear model (e.g. from summary(lm(y~X))).
- tstat
Observed t-statistic for the post hoc hypothesis test of beta_j.
- sigma_sq
Optional estimate of the noise variance. If NULL, a debiased estimate that accounts for selection is used.
- alpha_ov
Significance level for the overall F-test. Default is 0.05.
- B
Number of Monte Carlo samples per iteration. Default is 1,000,000.
- min_select
Minimum number of samples satisfying the selection condition. Default is 1,000.
- max_attempts
Maximum number of iterations to attempt in order to satisfy the selection criterion before giving up. Default is 100.
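The selective p-value is conditional on the overall F-test having rejected at level alpha_ov, so it can be worth confirming that the reported summary statistics imply a rejection. The sketch below uses the standard identity between R-squared and the overall F-statistic; the helper names f_from_r2 and overall_f_rejects are hypothetical and do not describe this function's internals.

```r
# Hypothetical helpers (not part of the package).
# Overall F-statistic from R-squared, with p predictors excluding the intercept:
#   F = (R^2 / p) / ((1 - R^2) / (n - p - 1))
f_from_r2 <- function(n, p, R_squared) {
  (R_squared / p) / ((1 - R_squared) / (n - p - 1))
}

# TRUE if the overall F-test rejects at level alpha_ov
overall_f_rejects <- function(n, p, R_squared, alpha_ov = 0.05) {
  f_stat <- f_from_r2(n, p, R_squared)
  p_overall <- pf(f_stat, df1 = p, df2 = n - p - 1, lower.tail = FALSE)
  p_overall < alpha_ov
}

# Check the identity against the F-statistic reported by summary(lm())
mod <- lm(mpg ~ wt + hp, data = mtcars)
s <- summary(mod)
f_check <- f_from_r2(n = nrow(mtcars), p = 2, R_squared = s$r.squared)
f_matches <- isTRUE(all.equal(f_check, unname(s$fstatistic["value"])))
rejects <- overall_f_rejects(n = nrow(mtcars), p = 2, R_squared = s$r.squared)
```

If overall_f_rejects() returns FALSE for a given study, the selection event that this function conditions on did not occur at that alpha_ov.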
Value
A numeric value representing the estimated selective p-value. If no selected samples are obtained after max_attempts, the function returns NA and issues a warning.
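Because the return value may be NA, downstream code should check for that case before comparing against a threshold. A minimal sketch follows, in which report_psel and its alpha argument are hypothetical and p_sel stands in for a value returned by psel_retro():

```r
# Hypothetical helper (not part of the package): interpret a selective p-value,
# treating NA (no selected samples were obtained) as "unavailable".
report_psel <- function(p_sel, alpha = 0.05) {
  if (is.na(p_sel)) {
    return("selective p-value unavailable")
  }
  if (p_sel < alpha) {
    "reject H_0: beta_j = 0"
  } else {
    "fail to reject H_0: beta_j = 0"
  }
}

msg_na <- report_psel(NA_real_)
msg_small <- report_psel(0.001)
msg_large <- report_psel(0.5)
```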
Examples
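data(mtcars)
# fit the "overall" model and extract the summary statistics that psel_retro() needs
mod <- lm(mpg ~ wt + hp, data = mtcars)
rse <- summary(mod)$sigma
r2 <- summary(mod)$r.squared
t_hp <- summary(mod)$coefficients["hp", "t value"]
# retrospective selective p-value for hp, computed from summary statistics only
psel_retro(n=nrow(mtcars), p=2, R_squared=r2, RSE=rse, tstat=t_hp)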
#> [1] 0.001473
result <- lmFScreen(mpg ~ wt + hp, data = mtcars)
result[["selective pvalues"]][2]
#> hp
#> 0.001377
# the retrospective and prospective p-values coincide (up to Monte Carlo error)