lmFScreen: Valid F-screening
This function takes a model formula and, optionally, a data frame, constructs the design matrix X and response vector y, and fits a least squares linear regression model. If an intercept is present in the model, the data are projected to remove the intercept before conducting inference. It then conducts F-screening (via the function lmFScreen.fit) as defined in the 2025 paper "Valid F-screening in linear regression" by McGough, Witten, and Kessler (arXiv preprint: https://arxiv.org/abs/2505.23113) as follows:
First, it tests the overall hypothesis that all coefficients (excluding the intercept) in the linear regression are zero using an F-test.
If (and only if) this overall hypothesis is rejected, it outputs selective p-values, confidence intervals, and point estimates for the coefficients in the linear regression model, all of which condition on the rejection of the overall F-test. If the overall hypothesis is not rejected, this function returns the overall F-statistic and p-value and indicates that the test is not significant.
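As a rough illustration of the screening step only, the overall F-test can be reproduced with base R's lm(). This sketch does not call lmFScreen and does not compute any selective quantities; the variable names and the 0.05 threshold (chosen to mirror the default alpha_ov) are assumptions made for the example.

```r
# Overall F-test with base R (illustration only; not the package's internals)
fit <- lm(mpg ~ wt + hp, data = mtcars)
fstat <- summary(fit)$fstatistic   # named vector: value, numdf, dendf

# Upper-tail p-value of the F-statistic under the global null
p_overall <- pf(fstat[["value"]], fstat[["numdf"]], fstat[["dendf"]],
                lower.tail = FALSE)

alpha_ov <- 0.05                   # mirrors the documented default
passes_screen <- p_overall < alpha_ov
passes_screen
```

Only when passes_screen is TRUE would the selective p-values, confidence intervals, and point estimates described above be reported.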
Usage
lmFScreen(
formula,
data,
alpha = 0.05,
alpha_ov = 0.05,
test_cols = 1:ncol(X),
sigma_sq = NULL,
compute_CI = TRUE,
compute_est = TRUE,
B = 1e+05
)
Arguments
- formula
A formula specifying the linear model (e.g., y ~ x1 + x2).
- data
An optional data frame containing the variables in the model.
- alpha
Significance level for confidence intervals and hypothesis tests (default: 0.05).
- alpha_ov
Significance level for the overall F-test used for screening (default: 0.05).
- test_cols
Indices of predictors to test (default: all columns of X).
- sigma_sq
Optional noise variance. If NULL, it is estimated using a corrected residual variance.
- compute_CI
Logical; whether to compute selective confidence intervals (default: TRUE).
- compute_est
Logical; whether to compute selective point estimates (default: TRUE).
- B
Number of Monte Carlo samples used for selective inference (default: 100000).
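A call that overrides some of these defaults might look like the sketch below. It uses only the arguments documented above, but the package name "lmFScreen" passed to requireNamespace() is an assumption, and the specific values are arbitrary choices for illustration.

```r
# Hypothetical call using only the documented arguments. The package name
# "lmFScreen" is an assumption; the block is skipped if it is not installed.
pkg_available <- requireNamespace("lmFScreen", quietly = TRUE)

if (pkg_available) {
  result <- lmFScreen::lmFScreen(
    mpg ~ wt + hp,
    data      = mtcars,
    alpha     = 0.10,  # significance level for the selective tests and intervals
    alpha_ov  = 0.05,  # significance level for the overall F-test
    test_cols = 1,     # test only the first predictor (wt)
    B         = 1e4    # fewer Monte Carlo samples than the default 1e5
  )
  print(summary(result))
}
```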
Value
An object of class lmFScreen, which includes:
Selective coefficients, confidence intervals, and p-values
Standard (unadjusted) estimates, confidence intervals, and p-values
Model-level settings such as alpha and alpha_ov
Details
This function performs the following steps:
Converts the formula into a model matrix and response vector.
Projects out the intercept if one is included in the formula.
Calls lmFScreen.fit to compute selective inference results for all predictors in test_cols.
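The first two steps can be sketched with base R alone. Centering the response and the predictors is one standard way to project out an intercept; whether lmFScreen does exactly this internally is an assumption, and the variable names below are illustrative.

```r
# Step 1: formula -> model matrix and response (base R only)
mf <- model.frame(mpg ~ wt + hp, data = mtcars)
X  <- model.matrix(attr(mf, "terms"), data = mf)
y  <- model.response(mf)

# Step 2: project out the intercept by centering y and the remaining columns
has_intercept <- "(Intercept)" %in% colnames(X)
if (has_intercept) {
  X <- X[, colnames(X) != "(Intercept)", drop = FALSE]
  X <- scale(X, center = TRUE, scale = FALSE)
  y <- y - mean(y)
}

# The slopes from the centered, intercept-free fit match the usual lm() slopes
beta_centered <- drop(solve(crossprod(X), crossprod(X, y)))
beta_lm <- coef(lm(mpg ~ wt + hp, data = mtcars))[-1]
all.equal(unname(beta_centered), unname(beta_lm))
```

After this projection the model has no intercept column, so the overall F-test concerns only the remaining coefficients.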
Examples
# EXAMPLE 1
data(mtcars)
result <- lmFScreen(mpg ~ wt + hp, data = mtcars)
summary(result)
#> lmFScreen Model Summary
#> --------------------------------------
#> Overall F-statistic: 69.2112
#> --------------------------------------
#>
#> Number of post hoc tests: 2
#> --------------------------------------
#>
#> Selective Estimates:
#> Predictor Estimate Lower.CI Upper.CI P-value
#> -------------------------------------------------------------
#> wt -3.877831 -5.1727 -2.5851 0.0000***
#> hp -0.031773 -0.0502 -0.0133 0.0014**
#>
#> Standard Estimates:
#> Predictor Estimate Lower.CI Upper.CI P-value
#> -------------------------------------------------------------
#> wt -3.877831 -5.1180 -2.6377 0.0000***
#> hp -0.031773 -0.0495 -0.0141 0.0015**
#>
#>
#> Significance levels: * < 0.05 ** < 0.01 *** < 0.001
coef(result)
#>
#> Coefficients from lmFScreen
#> ----------------------------
#> Predictor Selective.Est Naive.Est
#> 1 wt -3.87783074 -3.87783074
#> 2 hp -0.03177295 -0.03177295
confint(result)
#>
#> lmFScreen Model Confidence Intervals
#> ------------------------------------------------------
#> Confidence Level: 95.00%
#> Number of predictors: 2
#> ------------------------------------------------------
#>
#> Selective Confidence Intervals:
#> Predictor Lower.CI Upper.CI
#> ---------------------------------------------
#> wt -5.1727 -2.5851
#> hp -0.0502 -0.0133
#>
#> Standard Confidence Intervals:
#> Predictor Lower.CI Upper.CI
#> ---------------------------------------------
#> wt -5.1180 -2.6377
#> hp -0.0495 -0.0141
#>
# In Example 1, the overall F-test has a p-value close to zero, so there is essentially no need to account for selection
# EXAMPLE 2
set.seed(50)
X <- matrix(rnorm(100), ncol = 5)
y <- rnorm(20)
result <- lmFScreen(y ~ X)
#> Did not pass Fscreening!
#> overall F-statistic: 0.8874302 on 5 and 14 degrees of freedom
#> p-value: 0.6090876
# In Example 2, the overall F-test is not rejected
# EXAMPLE 3
set.seed(100)
X <- matrix(rnorm(100), ncol = 5)
beta <- c(.5,.4,.3,.2,.1)
y <- X %*% beta + rnorm(20)
result <- lmFScreen(y ~ X)
summary(result)
#> lmFScreen Model Summary
#> --------------------------------------
#> Overall F-statistic: 4.2599
#> --------------------------------------
#>
#> Number of post hoc tests: 5
#> --------------------------------------
#>
#> Selective Estimates:
#> Predictor Estimate Lower.CI Upper.CI P-value
#> -------------------------------------------------------------
#> X1 0.415839 -0.2833 1.3457 0.2362
#> X2 0.532707 -0.1291 1.4702 0.1178
#> X3 0.188660 -0.2351 0.7115 0.3690
#> X4 0.034547 -0.5122 0.6010 0.8900
#> X5 0.133741 -0.3295 0.6239 0.5835
#>
#> Standard Estimates:
#> Predictor Estimate Lower.CI Upper.CI P-value
#> -------------------------------------------------------------
#> X1 0.629408 -0.0184 1.2772 0.0776
#> X2 0.823721 0.2247 1.4228 0.0174*
#> X3 0.277856 -0.1075 0.6633 0.1795
#> X4 0.050258 -0.4140 0.5145 0.8350
#> X5 0.170696 -0.2236 0.5650 0.4104
#>
#>
#> Significance levels: * < 0.05 ** < 0.01 *** < 0.001
coef(result)
#>
#> Coefficients from lmFScreen
#> ----------------------------
#> Predictor Selective.Est Naive.Est
#> 1 X1 0.41583884 0.62940775
#> 2 X2 0.53270730 0.82372146
#> 3 X3 0.18866028 0.27785611
#> 4 X4 0.03454731 0.05025831
#> 5 X5 0.13374114 0.17069581
confint(result)
#>
#> lmFScreen Model Confidence Intervals
#> ------------------------------------------------------
#> Confidence Level: 95.00%
#> Number of predictors: 5
#> ------------------------------------------------------
#>
#> Selective Confidence Intervals:
#> Predictor Lower.CI Upper.CI
#> ---------------------------------------------
#> X1 -0.2833 1.3457
#> X2 -0.1291 1.4702
#> X3 -0.2351 0.7115
#> X4 -0.5122 0.6010
#> X5 -0.3295 0.6239
#>
#> Standard Confidence Intervals:
#> Predictor Lower.CI Upper.CI
#> ---------------------------------------------
#> X1 -0.0184 1.2772
#> X2 0.2247 1.4228
#> X3 -0.1075 0.6633
#> X4 -0.4140 0.5145
#> X5 -0.2236 0.5650
#>
# In Example 3, the selective p-values differ substantially from the standard p-values
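Selective and standard p-values can differ because, once attention is restricted to datasets that passed the overall F-test, standard p-values are no longer calibrated. The base R simulation below is a sketch of that effect under assumed settings (n = 20, p = 5, 2000 replicates, and seed 1 are arbitrary choices); it does not use lmFScreen and does not compute selective p-values.

```r
# Simulation under the global null: all true coefficients are zero.
set.seed(1)
n <- 20; p <- 5; n_rep <- 2000; alpha <- 0.05

naive_p <- rep(NA_real_, n_rep)   # standard p-value for the first coefficient
passed  <- logical(n_rep)         # did the overall F-test reject?

for (r in seq_len(n_rep)) {
  X <- matrix(rnorm(n * p), ncol = p)
  y <- rnorm(n)                   # y is independent of X
  s <- summary(lm(y ~ X))
  f <- s$fstatistic
  passed[r]  <- pf(f[["value"]], f[["numdf"]], f[["dendf"]],
                   lower.tail = FALSE) < alpha
  naive_p[r] <- s$coefficients[2, "Pr(>|t|)"]  # row 2 is the first predictor
}

# Rejection rate of the standard t-test, overall and among screened datasets
rate_all      <- mean(naive_p < alpha)
rate_screened <- mean(naive_p[passed] < alpha)
c(all = rate_all, screened = rate_screened)
```

Across all simulated datasets the standard test should reject at roughly the nominal rate, while among the screened datasets it typically rejects far more often. That inflation is what conditioning on the rejection of the overall F-test is meant to correct.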