This function takes a model formula (and, optionally, a data frame), builds the design matrix X and response vector y, and fits a least squares linear regression model. If the model includes an intercept, the data are projected to remove it before inference is conducted. The function then conducts F-screening (via the function lmFScreen.fit) as defined in the 2025 paper "Valid F-screening in linear regression" by McGough, Witten, and Kessler (arXiv preprint: https://arxiv.org/abs/2505.23113) as follows:

  1. First, it tests the overall hypothesis that all coefficients (excluding the intercept) in the linear regression are zero using an F-test.

  2. If (and only if) the overall test rejects, it outputs selective p-values, confidence intervals, and point estimates for the coefficients in the linear regression model, all of which condition on the rejection of the overall F-test. If the overall test is not rejected, the function returns the overall F-statistic and p-value and indicates that the test is not significant.
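
Step 1 is the usual overall F-test. As a minimal base-R sketch (it uses lm directly and is not the package's internal code), the screening decision for the mtcars model from the examples can be reproduced as follows. Here alpha_ov is only a local variable that mirrors the argument of the same name.

```r
# Overall F-test for mpg ~ wt + hp using base R only.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# summary()$fstatistic holds the F value, numerator df, and denominator df.
fstat <- summary(fit)$fstatistic
p_overall <- pf(fstat[["value"]], fstat[["numdf"]], fstat[["dendf"]],
                lower.tail = FALSE)

alpha_ov <- 0.05                      # screening level (local variable)
passes_screening <- p_overall < alpha_ov

fstat[["value"]]      # overall F-statistic
passes_screening      # TRUE means step 2 would be carried out
```

When passes_screening is FALSE, only the overall F-statistic and its p-value would be reported, as described in step 2.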

Usage

lmFScreen(
  formula,
  data,
  alpha = 0.05,
  alpha_ov = 0.05,
  test_cols = 1:ncol(X),
  compute_CI = TRUE,
  compute_est = TRUE,
  B = 1e+05
)

Arguments

formula

A formula specifying the linear model (e.g., y ~ x1 + x2).

data

An optional data frame containing the variables in the model.

alpha

Significance level for confidence intervals and hypothesis tests (default: 0.05).

alpha_ov

Significance level for the overall F-test used for screening (default: 0.05).

test_cols

Indices of predictors to test (default: all columns of X).

compute_CI

Logical; whether to compute selective confidence intervals (default: TRUE).

compute_est

Logical; whether to compute selective point estimates (default: TRUE).

B

Number of Monte Carlo samples used for selective inference (default: 100000).
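
As a rough guide to choosing B, the sketch below computes the Monte Carlo standard error of a probability estimated as a simple proportion of B independent draws. That the selective p-values behave like such a proportion is an assumption made here for illustration; the package may use a different estimator, in which case these numbers are only indicative. The helper mc_se is hypothetical and not part of the package.

```r
# Standard error of a proportion estimated from B independent draws:
# sqrt(p * (1 - p) / B). `mc_se` is an illustrative helper only.
mc_se <- function(p, B) sqrt(p * (1 - p) / B)

B_values <- c(1e3, 1e4, 1e5)
se_at_05 <- mc_se(0.05, B_values)

# Each tenfold increase in B shrinks the error by a factor of sqrt(10).
data.frame(B = B_values, se = se_at_05)
```

Under this assumption, a smaller B shortens run time at the cost of noisier p-values near the significance threshold.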

Value

An object of class lmFScreen, which includes:

  • Selective coefficients, confidence intervals, and p-values

  • Standard (unadjusted) estimates, confidence intervals, and p-values

  • Model-level settings such as alpha and alpha_ov

Details

This function performs the following steps:

  1. Converts the formula into a model matrix and response vector.

  2. Projects out the intercept if one is included in the formula.

  3. Calls lmFScreen.fit to compute selective inference results for all predictors in test_cols.
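
The first two steps can be sketched in base R. This illustration assumes that projecting out the intercept amounts to centering the response and every column of the design matrix (the projection onto the orthogonal complement of the constant vector). The package's own code may differ in detail, and the variable names below are illustrative.

```r
# Step 1: formula -> response vector and model matrix.
mf <- model.frame(mpg ~ wt + hp, data = mtcars)
y  <- model.response(mf)
X  <- model.matrix(attr(mf, "terms"), data = mf)

# Step 2: if an intercept column is present, drop it and center y and X.
has_intercept <- attr(attr(mf, "terms"), "intercept") == 1
if (has_intercept) {
  X <- X[, colnames(X) != "(Intercept)", drop = FALSE]
  X <- scale(X, center = TRUE, scale = FALSE)
  y <- y - mean(y)
}

# The no-intercept fit on the centered data gives the same slopes
# as the ordinary fit with an intercept.
slopes_centered <- lm.fit(X, y)$coefficients
slopes_lm <- coef(lm(mpg ~ wt + hp, data = mtcars))[c("wt", "hp")]
all.equal(unname(slopes_centered), unname(slopes_lm))
```

The centered X and y are the kind of inputs that step 3 would then pass on for selective inference.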

Examples


# EXAMPLE 1
data(mtcars)
result <- lmFScreen(mpg ~ wt + hp, data = mtcars)
summary(result)
#> lmFScreen Model Summary 
#> --------------------------------------
#> Overall F-statistic:    69.2112
#> --------------------------------------
#> 
#> Number of post hoc tests: 2
#> --------------------------------------
#> 
#> Selective Estimates:
#> Predictor       Estimate     Lower.CI    Upper.CI    P-value
#> -------------------------------------------------------------
#>  wt            -3.878217     -5.1699     -2.5830      0.0000***
#>  hp            -0.031778     -0.0503     -0.0133      0.0014**
#> 
#> Standard Estimates:
#> Predictor       Estimate     Lower.CI    Upper.CI    P-value
#> -------------------------------------------------------------
#>  wt            -3.877831     -5.1180     -2.6377      0.0000***
#>  hp            -0.031773     -0.0495     -0.0141      0.0015**
#> 
#> 
#> Significance levels: * < 0.05  ** < 0.01  *** < 0.001
coef(result)
#> 
#> Coefficients from lmFScreen
#> ----------------------------
#>   Predictor Selective.Est   Naive.Est
#> 1        wt   -3.87821657 -3.87783074
#> 2        hp   -0.03177807 -0.03177295
confint(result)
#> 
#> lmFScreen Model Confidence Intervals
#> ------------------------------------------------------
#> Confidence Level: 95.00%
#> Number of predictors: 2
#> ------------------------------------------------------
#> 
#> Selective Confidence Intervals:
#> Predictor       Lower.CI     Upper.CI
#> ---------------------------------------------
#>  wt                -5.1699       -2.5830
#>  hp                -0.0503       -0.0133
#> 
#> Standard Confidence Intervals:
#> Predictor       Lower.CI     Upper.CI
#> ---------------------------------------------
#>  wt                -5.1180       -2.6377
#>  hp                -0.0495       -0.0141
#> 
# In Example 1, the overall F-test has a p-value close to zero, so accounting for selection changes very little: the selective and standard results nearly coincide

# EXAMPLE 2
set.seed(50)
X <- matrix(rnorm(100), ncol = 5)
y <- rnorm(20)
result <- lmFScreen(y ~ X)
#> Did not pass Fscreening!
#> overall F-statistic:  0.8874302  on  5  and  14  degrees of freedom
#> p-value:  0.6090876 
# In Example 2, the overall F-test is not rejected, so no selective inference is reported

# EXAMPLE 3
set.seed(100)
X <- matrix(rnorm(100), ncol = 5)
beta <- c(.5,.4,.3,.2,.1)
y <- X %*% beta + rnorm(20)
result <- lmFScreen(y ~ X)
summary(result)
#> lmFScreen Model Summary 
#> --------------------------------------
#> Overall F-statistic:     4.2599
#> --------------------------------------
#> 
#> Number of post hoc tests: 5
#> --------------------------------------
#> 
#> Selective Estimates:
#> Predictor       Estimate     Lower.CI    Upper.CI    P-value
#> -------------------------------------------------------------
#>  X1             0.586498     -0.3326      1.3361      0.1401
#>  X2             0.761575     -0.2647      1.4804      0.1971
#>  X3             0.266320     -0.2108      0.6987      0.1802
#>  X4             0.047620     -0.4588      0.5580      0.8351
#>  X5             0.162720     -0.2598      0.6015      0.4110
#> 
#> Standard Estimates:
#> Predictor       Estimate     Lower.CI    Upper.CI    P-value
#> -------------------------------------------------------------
#>  X1             0.629408     -0.0184      1.2772      0.0776
#>  X2             0.823721      0.2247      1.4228      0.0174*
#>  X3             0.277856     -0.1075      0.6633      0.1795
#>  X4             0.050258     -0.4140      0.5145      0.8350
#>  X5             0.170696     -0.2236      0.5650      0.4104
#> 
#> 
#> Significance levels: * < 0.05  ** < 0.01  *** < 0.001
coef(result)
#> 
#> Coefficients from lmFScreen
#> ----------------------------
#>   Predictor Selective.Est  Naive.Est
#> 1        X1    0.58649815 0.62940775
#> 2        X2    0.76157451 0.82372146
#> 3        X3    0.26631993 0.27785611
#> 4        X4    0.04762001 0.05025831
#> 5        X5    0.16272036 0.17069581
confint(result)
#> 
#> lmFScreen Model Confidence Intervals
#> ------------------------------------------------------
#> Confidence Level: 95.00%
#> Number of predictors: 5
#> ------------------------------------------------------
#> 
#> Selective Confidence Intervals:
#> Predictor       Lower.CI     Upper.CI
#> ---------------------------------------------
#>  X1                -0.3326        1.3361
#>  X2                -0.2647        1.4804
#>  X3                -0.2108        0.6987
#>  X4                -0.4588        0.5580
#>  X5                -0.2598        0.6015
#> 
#> Standard Confidence Intervals:
#> Predictor       Lower.CI     Upper.CI
#> ---------------------------------------------
#>  X1                -0.0184        1.2772
#>  X2                 0.2247        1.4228
#>  X3                -0.1075        0.6633
#>  X4                -0.4140        0.5145
#>  X5                -0.2236        0.5650
#> 
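# In Example 3, the selective p-values for X1 and X2 are noticeably larger than the standard ones
# (X2: 0.1971 versus 0.0174, so it is no longer significant at the 0.05 level),
# while those for X3, X4, and X5 are almost unchanged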
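
To see why conditioning on the overall F-test can matter, the following base-R simulation (a sketch that does not use the package; all names and settings are illustrative) generates data under the global null and compares how often the standard t-test for one coefficient rejects, first over all data sets and then only among those that pass F-screening.

```r
# Under the global null (all coefficients zero), compare how often the
# standard t-test for the first predictor rejects at level 0.05
# (a) over all data sets and (b) only among data sets that pass the
# overall F-test at level 0.05.
set.seed(1)
n_rep <- 2000; n <- 20; p <- 5
passed <- logical(n_rep)
reject_x1 <- logical(n_rep)

for (r in seq_len(n_rep)) {
  X <- matrix(rnorm(n * p), ncol = p)
  y <- rnorm(n)                       # y is independent of X
  s <- summary(lm(y ~ X))
  f <- s$fstatistic
  p_overall <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)
  passed[r] <- p_overall < 0.05
  reject_x1[r] <- s$coefficients[2, "Pr(>|t|)"] < 0.05  # row 2 = first predictor
}

mean(reject_x1)            # unconditional rejection rate, close to 0.05
mean(reject_x1[passed])    # rejection rate after screening, well above 0.05
```

The gap between the two rates is the selection effect that the selective p-values and confidence intervals are designed to correct.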