
This function fits a linear regression of a response vector y on a design matrix X without an intercept, so X and y should be centered beforehand. It then conducts F-screening, as defined in "Valid F-screening in linear regression", by

  1. testing the overall hypothesis that all coefficients in the linear regression are zero using an F-test, and

  2. if the overall test is rejected, reporting selective p-values, confidence intervals, and point estimates for the coefficients in the linear regression model, all of which condition on the rejection of the overall F-test. If the overall test is not rejected, the function returns the overall F-statistic and indicates that it is not significant.
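The first step, the overall F-test for a no-intercept model, can be sketched in base R. This is an illustrative sketch only and not the package's internal code: the helper name `overall_F_test` and the simulated data are assumptions made for the example.

```r
# Hypothetical helper (not part of lmFScreen): overall F-test of
# H0: all coefficients are zero in a regression of y on X without intercept.
overall_F_test <- function(X, y, alpha_ov = 0.05) {
  n <- nrow(X)
  p <- ncol(X)
  fit <- qr.fitted(qr(X), y)            # projection of y onto the columns of X
  ss_model <- sum(fit^2)                # explained sum of squares (no intercept)
  ss_resid <- sum((y - fit)^2)          # residual sum of squares
  F_stat <- (ss_model / p) / (ss_resid / (n - p))
  list(
    F_stat = F_stat,
    df = c(p, n - p),
    reject = F_stat > qf(1 - alpha_ov, p, n - p)
  )
}

# Simulated, mean-zero data purely for illustration
set.seed(1)
n <- 50
p <- 3
X <- matrix(rnorm(n * p), n, p)
y <- drop(X %*% c(1, 0, -0.5) + rnorm(n))

res <- overall_F_test(X, y)
res$F_stat
res$reject
```

Only when `reject` is TRUE would the second step (selective inference for individual coefficients) apply.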

Usage

lmFScreen.fit(
  X,
  y,
  alpha = 0.05,
  alpha_ov = 0.05,
  test_cols = 1:ncol(X),
  compute_CI = TRUE,
  compute_est = TRUE,
  B = 10000
)

Arguments

X

A numeric matrix of predictors.

y

A numeric response vector.

alpha

Significance level for confidence intervals and hypothesis tests (default: 0.05).

alpha_ov

Significance level for the overall F-test used for screening (default: 0.05).

test_cols

Indices of predictors to test (default: all columns of X).

compute_CI

Logical; whether to compute selective confidence intervals (default: TRUE).

compute_est

Logical; whether to compute selective point estimates (default: TRUE).

B

Number of Monte Carlo samples used for selective inference (default: 10000).

Value

A list of class lmFScreen containing:

  • Selective coefficients, confidence intervals, and p-values

  • Standard (OLS) coefficients, confidence intervals, and p-values

  • Model settings: alpha and alpha_ov

Examples

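data(mtcars)
# Predictors (weight, horsepower) and response (miles per gallon)
X <- cbind(mtcars$wt, mtcars$hp)
y <- mtcars$mpg
# Full SVD of the all-ones vector; the left singular vectors beyond its
# numerical rank span the orthogonal complement of the intercept
svdP <- svd(rep(1, nrow(mtcars)), nu = nrow(mtcars))
tol <- nrow(mtcars) * max(svdP$d) * .Machine$double.eps
r <- sum(svdP$d > tol)
U_full <- svdP$u
U_perp <- U_full[, (r + 1):ncol(U_full)]
# Project out the intercept so that X and y are centered
X <- t(U_perp) %*% X
y <- t(U_perp) %*% y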
result <- lmFScreen:::lmFScreen.fit(X,y)
summary(result)
#> lmFScreen Model Summary 
#> --------------------------------------
#> Overall F-statistic:    69.2112
#> --------------------------------------
#> 
#> Number of post hoc tests: 2
#> --------------------------------------
#> 
#> Selective Estimates:
#> Predictor       Estimate     Lower.CI    Upper.CI    P-value
#> -------------------------------------------------------------
#>  X1            -3.878217     -5.1687     -2.5450      0.0001***
#>  X2            -0.031778     -0.0503     -0.0134      0.0011**
#> 
#> Standard Estimates:
#> Predictor       Estimate     Lower.CI    Upper.CI    P-value
#> -------------------------------------------------------------
#>  X1            -3.877831     -5.1180     -2.6377      0.0000***
#>  X2            -0.031773     -0.0495     -0.0141      0.0015**
#> 
#> 
#> Significance levels: * < 0.05  ** < 0.01  *** < 0.001
coef(result)
#> 
#> Coefficients from lmFScreen
#> ----------------------------
#>   Predictor Selective.Est   Naive.Est
#> 1        X1   -3.87821657 -3.87783074
#> 2        X2   -0.03177807 -0.03177295
confint(result)
#> 
#> lmFScreen Model Confidence Intervals
#> ------------------------------------------------------
#> Confidence Level: 95.00%
#> Number of predictors: 2
#> ------------------------------------------------------
#> 
#> Selective Confidence Intervals:
#> Predictor       Lower.CI     Upper.CI
#> ---------------------------------------------
#>  X1                -5.1687       -2.5450
#>  X2                -0.0503       -0.0134
#> 
#> Standard Confidence Intervals:
#> Predictor       Lower.CI     Upper.CI
#> ---------------------------------------------
#>  X1                -5.1180       -2.6377
#>  X2                -0.0495       -0.0141
#>
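
The projection step in the example removes the intercept while reducing the data from 32 to 31 rows. As a sanity check that does not require lmFScreen, the sketch below repeats that projection and compares the resulting no-intercept least-squares coefficients with the slopes from an ordinary `lm()` fit that includes an intercept. The variable names (`X_raw`, `X_proj`, `beta_proj`, and so on) are illustrative and not part of the package.

```r
data(mtcars)
X_raw <- cbind(wt = mtcars$wt, hp = mtcars$hp)
y_raw <- mtcars$mpg
n <- nrow(mtcars)

# Orthonormal basis for the orthogonal complement of the all-ones vector:
# the first left singular vector is proportional to the ones vector,
# so the remaining n - 1 columns are orthogonal to it.
svdP <- svd(rep(1, n), nu = n)
U_perp <- svdP$u[, -1]

# Projected (centered) data with n - 1 rows
X_proj <- crossprod(U_perp, X_raw)   # same as t(U_perp) %*% X_raw
y_proj <- crossprod(U_perp, y_raw)

# Least squares without an intercept on the projected data
beta_proj <- drop(qr.solve(X_proj, y_proj))

# Slopes from the usual regression with an intercept
beta_lm <- coef(lm(mpg ~ wt + hp, data = mtcars))[c("wt", "hp")]

all.equal(unname(beta_proj), unname(beta_lm))
```

The agreement of the two sets of slopes is consistent with the "Standard Estimates" reported in the summary output above.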