
This function is useful for conducting valid retrospective F-screening, as defined in the 2025 paper "Valid F-screening in linear regression" by McGough, Witten, and Kessler (arXiv preprint: https://arxiv.org/abs/2505.23113). Suppose that we have access to the output of an "overall" least squares linear regression model, such as summary(lm(y~X)), and we want to test the significance of a single regression coefficient (beta_j) in a way that accounts for the fact that the "overall" F-test rejected. This function then provides a selective p-value for beta_j based on only a few summary statistics. Its arguments include the R-squared and residual standard error (RSE) from the overall model (e.g. from summary(lm(y~X))) and the t-statistic for the test of H_0: beta_j=0. It is especially useful in settings where the raw data are unavailable, such as published studies.
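Part of why a few summary statistics can suffice is that, for a least squares fit with an intercept, the overall F-statistic and the residual sum of squares are fully determined by R_squared, RSE, n, and p. The sketch below illustrates these standard identities on the mtcars model used in the Examples; it is not necessarily how psel_retro computes anything internally. The critical value uses alpha_ov = 0.05 to mirror the default.

```r
# Sketch: standard least squares identities (not psel_retro's internals).
data(mtcars)
mod <- lm(mpg ~ wt + hp, data = mtcars)
s <- summary(mod)

n <- nrow(mtcars)
p <- 2                 # number of predictors, excluding the intercept
r2 <- s$r.squared
rse <- s$sigma
alpha_ov <- 0.05

# Overall F-statistic recovered from R-squared alone
f_from_r2 <- (r2 / p) / ((1 - r2) / (n - p - 1))

# Residual sum of squares recovered from the RSE
rss <- rse^2 * (n - p - 1)

# Critical value of the overall F-test at level alpha_ov
f_crit <- qf(1 - alpha_ov, df1 = p, df2 = n - p - 1)

f_from_r2
s$fstatistic[["value"]]   # the same quantity, as reported by summary.lm
f_from_r2 > f_crit        # TRUE means the overall F-test rejects
```

When the last line is TRUE, the overall F-test has rejected, which is the selection event that the selective p-value conditions on.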

Usage

psel_retro(
  n,
  p,
  R_squared,
  RSE,
  tstat,
  sigma_sq = NULL,
  alpha_ov = 0.05,
  B = 1e+06,
  min_select = 1000,
  max_attempts = 100
)

Arguments

n

Sample size (number of observations).

p

Number of predictors used in the "overall" least squares linear model (excluding the intercept).

R_squared

R-squared from the "overall" fitted least squares linear model (e.g. from summary(lm(y~X))).

RSE

Residual standard error from the "overall" fitted least squares linear model (e.g. from summary(lm(y~X))).

tstat

Observed t-statistic for the post hoc test of H_0: beta_j=0 (e.g. the "t value" entry for beta_j in summary(lm(y~X))$coefficients).

sigma_sq

Optional estimate of the noise variance. If NULL, a debiased estimate that accounts for selection is used.

alpha_ov

Significance level for the overall F-test. Default is 0.05.

B

Number of Monte Carlo samples per iteration. Default is 1,000,000.

min_select

Minimum number of Monte Carlo samples that must satisfy the selection condition. Default is 1,000.

max_attempts

Maximum number of iterations to attempt in obtaining samples that satisfy the selection condition before giving up. Default is 100.

Value

A numeric value representing the estimated selective p-value. If no selected samples are obtained after max_attempts iterations, the function returns NA and issues a warning.

Examples

data(mtcars)
mod <- lm(mpg ~ wt + hp, data = mtcars)
rse <- summary(mod)$sigma
r2 <- summary(mod)$r.squared
t_hp <- summary(mod)$coefficients["hp", "t value"]
psel_retro(n=nrow(mtcars), p=2, R_squared=r2, RSE=rse, tstat=t_hp)
#> [1] 0.001473
result <- lmFScreen(mpg ~ wt + hp, data = mtcars)
result[["selective pvalues"]][2]
#>       hp 
#> 0.001377 
# the retrospective and prospective p-values coincide (up to Monte Carlo error)