lmFScreen: Valid F-screening

This function takes a model formula (and, optionally, a data frame), builds the design matrix X and response vector y from it, and fits a least squares linear regression model. If an intercept is present in the model, the data are projected to remove the intercept before conducting inference. It then conducts F-screening (via the function lmFScreen.fit) as defined in the 2025 paper "Valid F-screening in linear regression" by McGough, Witten, and Kessler (arXiv preprint: https://arxiv.org/abs/2505.23113) as follows:

First, it tests the overall hypothesis that all coefficients (excluding the intercept) in the linear regression are zero using an F-test.
If (and only if) this overall test is rejected, it outputs selective p-values, confidence intervals, and point estimates for the coefficients in the linear regression model, all of which condition on the rejection of the overall F-test. If the overall test is not rejected, this function returns the overall F-statistic and p-value and indicates that the test is not significant.
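
The screening step described above is an ordinary overall F-test. The following is a minimal base-R sketch of that step, using lm rather than the package's own code; the names fit, p_overall, and passes_screen are illustrative and are not part of lmFScreen.

```r
# Overall F-test for H0: all non-intercept coefficients are zero.
# Base-R illustration only; this is not the internals of lmFScreen.fit.
fit <- lm(mpg ~ wt + hp, data = mtcars)
fstat <- summary(fit)$fstatistic  # named vector: value, numdf, dendf

# The upper tail of the F distribution gives the overall p-value.
p_overall <- pf(fstat[["value"]], fstat[["numdf"]], fstat[["dendf"]],
                lower.tail = FALSE)

alpha_ov <- 0.05                       # screening level (the documented default)
passes_screen <- p_overall < alpha_ov  # selective results are reported only if TRUE
```

For this model, the statistic agrees with the overall F-statistic that lmFScreen reports for the same formula in Example 1.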

Usage

lmFScreen(
  formula,
  data,
  alpha = 0.05,
  alpha_ov = 0.05,
  test_cols = 1:ncol(X),
  sigma_sq = NULL,
  compute_CI = TRUE,
  compute_est = TRUE,
  B = 1e+05
)

Arguments

formula: A formula specifying the linear model (e.g., y ~ x1 + x2).
data: An optional data frame containing the variables in the model.
alpha: Significance level for confidence intervals and hypothesis tests (default: 0.05).
alpha_ov: Significance level for the overall F-test used for screening (default: 0.05).
test_cols: Indices of predictors to test (default: all columns of X).
sigma_sq: Optional noise variance. If NULL, it is estimated using a corrected residual variance.
compute_CI: Logical; whether to compute selective confidence intervals (default: TRUE).
compute_est: Logical; whether to compute selective point estimates (default: TRUE).
B: Number of Monte Carlo samples used for selective inference (default: 100000).
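
As a hedged illustration of how these arguments fit together, the call below overrides a few defaults. It assumes the lmFScreen package is installed and is skipped otherwise; the particular values (alpha = 0.10, B = 1e4) are arbitrary choices for demonstration, not recommendations.

```r
# Usage sketch: runs only if the lmFScreen package is available.
result <- if (requireNamespace("lmFScreen", quietly = TRUE)) {
  lmFScreen::lmFScreen(
    mpg ~ wt + hp,
    data = mtcars,
    alpha = 0.10,         # level for the per-coefficient tests and intervals
    alpha_ov = 0.05,      # level for the overall F-test used for screening
    compute_est = FALSE,  # skip the selective point estimates
    B = 1e4               # fewer Monte Carlo samples than the default 1e5
  )
} else {
  NULL  # package not installed; nothing is computed
}
```

A smaller B should shorten the run time at the cost of more Monte Carlo noise in the selective quantities.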

Value

An object of class lmFScreen, which includes:

Selective coefficients, confidence intervals, and p-values
Standard (unadjusted) estimates, confidence intervals, and p-values
Model-level settings such as alpha and alpha_ov

Details

This function performs the following steps:

Converts the formula into a model matrix and response vector.
Projects out the intercept if one is included in the formula.
Calls lmFScreen.fit to compute selective inference results for all predictors in test_cols.
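
The steps above can be sketched in base R as follows. This is an assumption-laden illustration, not the package source: it treats "projecting out the intercept" as centering y and each column of X, and it uses lm.fit as a stand-in for lmFScreen.fit, so only the ordinary least squares part is reproduced.

```r
# Step 1: build the response vector and model matrix from a formula.
mf <- model.frame(mpg ~ wt + hp, data = mtcars)
y <- model.response(mf)
X_full <- model.matrix(terms(mf), mf)  # includes an "(Intercept)" column

# Step 2: project out the intercept. Centering y and every predictor column
# is equivalent to applying the projection I - 11'/n to both.
has_intercept <- "(Intercept)" %in% colnames(X_full)
X <- X_full[, colnames(X_full) != "(Intercept)", drop = FALSE]
if (has_intercept) {
  X <- scale(X, center = TRUE, scale = FALSE)
  y <- y - mean(y)
}

# Step 3 (stand-in): intercept-free least squares fit on the projected data.
# The package calls lmFScreen.fit at this point; lm.fit is illustrative only.
beta_hat <- lm.fit(X, y)$coefficients
```

The slopes in beta_hat coincide with the non-intercept coefficients of lm(mpg ~ wt + hp, data = mtcars), which is why removing the intercept first does not change the standard estimates.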

Examples


# EXAMPLE 1
data(mtcars)
result <- lmFScreen(mpg ~ wt + hp, data = mtcars)
summary(result)
#> lmFScreen Model Summary 
#> --------------------------------------
#> Overall F-statistic:    69.2112
#> --------------------------------------
#> 
#> Number of post hoc tests: 2
#> --------------------------------------
#> 
#> Selective Estimates:
#> Predictor       Estimate     Lower.CI    Upper.CI    P-value
#> -------------------------------------------------------------
#>  wt            -3.877831     -5.1727     -2.5851      0.0000***
#>  hp            -0.031773     -0.0502     -0.0133      0.0014**
#> 
#> Standard Estimates:
#> Predictor       Estimate     Lower.CI    Upper.CI    P-value
#> -------------------------------------------------------------
#>  wt            -3.877831     -5.1180     -2.6377      0.0000***
#>  hp            -0.031773     -0.0495     -0.0141      0.0015**
#> 
#> 
#> Significance levels: * < 0.05  ** < 0.01  *** < 0.001
coef(result)
#> 
#> Coefficients from lmFScreen
#> ----------------------------
#>   Predictor Selective.Est   Naive.Est
#> 1        wt   -3.87783074 -3.87783074
#> 2        hp   -0.03177295 -0.03177295
confint(result)
#> 
#> lmFScreen Model Confidence Intervals
#> ------------------------------------------------------
#> Confidence Level: 95.00%
#> Number of predictors: 2
#> ------------------------------------------------------
#> 
#> Selective Confidence Intervals:
#> Predictor       Lower.CI     Upper.CI
#> ---------------------------------------------
#>  wt                -5.1727       -2.5851
#>  hp                -0.0502       -0.0133
#> 
#> Standard Confidence Intervals:
#> Predictor       Lower.CI     Upper.CI
#> ---------------------------------------------
#>  wt                -5.1180       -2.6377
#>  hp                -0.0495       -0.0141
#> 
# In Example 1, the overall F-test has a p-value close to zero, so there is essentially no need to account for selection

# EXAMPLE 2
set.seed(50)
X <- matrix(rnorm(100), ncol = 5)
y <- rnorm(20)
result <- lmFScreen(y ~ X)
#> Did not pass Fscreening!
#> overall F-statistic:  0.8874302  on  5  and  14  degrees of freedom
#> p-value:  0.6090876 
# In Example 2, the overall F-test is not rejected, so no selective inference is reported

# EXAMPLE 3
set.seed(100)
X <- matrix(rnorm(100), ncol = 5)
beta <- c(.5,.4,.3,.2,.1)
y <- X %*% beta + rnorm(20)
result <- lmFScreen(y ~ X)
summary(result)
#> lmFScreen Model Summary 
#> --------------------------------------
#> Overall F-statistic:     4.2599
#> --------------------------------------
#> 
#> Number of post hoc tests: 5
#> --------------------------------------
#> 
#> Selective Estimates:
#> Predictor       Estimate     Lower.CI    Upper.CI    P-value
#> -------------------------------------------------------------
#>  X1             0.415839     -0.2833      1.3457      0.2362
#>  X2             0.532707     -0.1291      1.4702      0.1178
#>  X3             0.188660     -0.2351      0.7115      0.3690
#>  X4             0.034547     -0.5122      0.6010      0.8900
#>  X5             0.133741     -0.3295      0.6239      0.5835
#> 
#> Standard Estimates:
#> Predictor       Estimate     Lower.CI    Upper.CI    P-value
#> -------------------------------------------------------------
#>  X1             0.629408     -0.0184      1.2772      0.0776
#>  X2             0.823721      0.2247      1.4228      0.0174*
#>  X3             0.277856     -0.1075      0.6633      0.1795
#>  X4             0.050258     -0.4140      0.5145      0.8350
#>  X5             0.170696     -0.2236      0.5650      0.4104
#> 
#> 
#> Significance levels: * < 0.05  ** < 0.01  *** < 0.001
coef(result)
#> 
#> Coefficients from lmFScreen
#> ----------------------------
#>   Predictor Selective.Est  Naive.Est
#> 1        X1    0.41583884 0.62940775
#> 2        X2    0.53270730 0.82372146
#> 3        X3    0.18866028 0.27785611
#> 4        X4    0.03454731 0.05025831
#> 5        X5    0.13374114 0.17069581
confint(result)
#> 
#> lmFScreen Model Confidence Intervals
#> ------------------------------------------------------
#> Confidence Level: 95.00%
#> Number of predictors: 5
#> ------------------------------------------------------
#> 
#> Selective Confidence Intervals:
#> Predictor       Lower.CI     Upper.CI
#> ---------------------------------------------
#>  X1                -0.2833        1.3457
#>  X2                -0.1291        1.4702
#>  X3                -0.2351        0.7115
#>  X4                -0.5122        0.6010
#>  X5                -0.3295        0.6239
#> 
#> Standard Confidence Intervals:
#> Predictor       Lower.CI     Upper.CI
#> ---------------------------------------------
#>  X1                -0.0184        1.2772
#>  X2                 0.2247        1.4228
#>  X3                -0.1075        0.6633
#>  X4                -0.4140        0.5145
#>  X5                -0.2236        0.5650
#> 
# In Example 3, the selective p-values differ substantially from the standard p-values (e.g., X2: 0.1178 selective vs. 0.0174 standard)
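
To see why conditioning on the overall F-test matters in a setting like Example 3, the simulation below is a rough base-R sketch under the global null (all true coefficients zero). It does not implement the selective method; it only compares how often the standard t-test for the first predictor rejects across all simulated data sets versus only those that pass the screen. The sizes n, p, and n_sim and the helper sim_one are arbitrary, hypothetical choices.

```r
# Simulation under the global null: every true coefficient is zero.
# Illustrates the problem that selective inference addresses; it is not
# an implementation of the selective p-values themselves.
set.seed(1)
n <- 20; p <- 5; n_sim <- 2000
alpha <- 0.05; alpha_ov <- 0.05

sim_one <- function() {
  X <- matrix(rnorm(n * p), ncol = p)
  y <- rnorm(n)
  s <- summary(lm(y ~ X))
  f <- s$fstatistic
  c(p_overall = pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE),
    p_x1 = s$coefficients[2, "Pr(>|t|)"])  # row 2 is X1; row 1 is the intercept
}

res <- t(replicate(n_sim, sim_one()))  # n_sim rows, columns p_overall and p_x1
passed <- res[, "p_overall"] < alpha_ov

# Rejection rate of the standard t-test for X1: overall vs. after screening.
rate_all <- mean(res[, "p_x1"] < alpha)
rate_screened <- mean(res[passed, "p_x1"] < alpha)
```

One would expect rate_all to sit near alpha while rate_screened is noticeably larger, which is the inflation that the selective p-values are designed to correct.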