lmFScreen.fit: Valid F-screening
This function takes a design matrix X and a response vector y and fits a linear regression model without an intercept, so X and y should be centered. It then conducts F-screening as defined in "Valid F-screening in linear regression" by
testing the overall hypothesis that all coefficients in the linear regression are zero using an F-test, and
if this overall test is rejected, it outputs selective p-values, confidence intervals, and point estimates for the coefficients in the linear regression model that condition on the rejection of the overall F-test. If the overall test is not rejected, it returns the overall F-statistic and indicates that it is not significant.
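The screening step described above can be illustrated in base R. This is a minimal sketch of the classical overall F-test for a no-intercept model, not the package's internal code: the helper name overall_f_test and the simulated data are assumptions made for illustration, and the selective inference that follows a rejection is not reproduced here.

```r
# Hypothetical helper (not part of lmFScreen): overall F-test of
# H0: all coefficients are zero in the no-intercept model y = X %*% beta + noise.
# Assumes X has full column rank.
overall_f_test <- function(X, y, alpha_ov = 0.05) {
  X <- as.matrix(X)
  y <- as.numeric(y)
  n <- nrow(X)
  p <- ncol(X)
  fit <- lm.fit(X, y)
  rss <- sum(fit$residuals^2)      # residual sum of squares
  ess <- sum(fit$fitted.values^2)  # explained sum of squares (no intercept)
  f_stat <- (ess / p) / (rss / (n - p))
  cutoff <- qf(1 - alpha_ov, df1 = p, df2 = n - p)
  list(f_stat = f_stat, cutoff = cutoff, reject = f_stat > cutoff)
}

# Simulated data, for illustration only
set.seed(1)
X <- matrix(rnorm(50 * 3), nrow = 50)
y <- as.numeric(X %*% c(1, 0, -1) + rnorm(50))
res <- overall_f_test(X, y)
res$reject
```

When reject is FALSE, the procedure described above stops and reports only the overall F-statistic; when it is TRUE, coefficient-level inference has to account for having passed this screen.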
Usage
lmFScreen.fit(
X,
y,
alpha = 0.05,
alpha_ov = 0.05,
test_cols = 1:ncol(X),
compute_CI = TRUE,
compute_est = TRUE,
B = 10000
)
Arguments
- X
A numeric matrix of predictors.
- y
A numeric response vector.
- alpha
Significance level for confidence intervals and hypothesis tests (default: 0.05).
- alpha_ov
Significance level for the overall F-test used for screening (default: 0.05).
- test_cols
Indices of predictors to test (default: all columns of X).
- compute_CI
Logical; whether to compute selective confidence intervals (default: TRUE).
- compute_est
Logical; whether to compute selective point estimates (default: TRUE).
- B
Number of Monte Carlo samples used for selective inference (default: 10000).
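Because the model is fit without an intercept, X and y are expected to be centered before they are passed in. One simple way to do this in base R is sketched below; the variable names are illustrative. The example on this page instead projects out the constant vector, which also drops one row, and whether plain centering leads to identical inference from lmFScreen.fit is not stated in this documentation.

```r
data(mtcars)
X_raw <- cbind(wt = mtcars$wt, hp = mtcars$hp)
y_raw <- mtcars$mpg

# Subtract each column mean from X and the mean from y
X_centered <- scale(X_raw, center = TRUE, scale = FALSE)
y_centered <- y_raw - mean(y_raw)

# Column means should now be numerically zero
colMeans(X_centered)
mean(y_centered)
```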
Value
A list of class lmFScreen containing:
Selective coefficients, confidence intervals, and p-values
Standard (OLS) coefficients, confidence intervals, and p-values
Model settings: alpha and alpha_ov
Examples
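data(mtcars)
X <- cbind(mtcars$wt, mtcars$hp)
y <- mtcars$mpg
# Build an orthonormal basis for the orthogonal complement of the constant
# vector; projecting onto it removes the intercept from X and y.
svdP <- svd(rep(1, nrow(mtcars)), nu = nrow(mtcars))
tol <- nrow(mtcars) * max(svdP$d) * .Machine$double.eps
r <- sum(svdP$d > tol)  # numerical rank of the constant vector
U_full <- svdP$u
U_perp <- U_full[, (r + 1):ncol(U_full)]
# Project out the intercept; X and y now have nrow(mtcars) - 1 rows
X <- t(U_perp) %*% X
y <- t(U_perp) %*% y
result <- lmFScreen:::lmFScreen.fit(X, y)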
summary(result)
#> lmFScreen Model Summary
#> --------------------------------------
#> Overall F-statistic: 69.2112
#> --------------------------------------
#>
#> Number of post hoc tests: 2
#> --------------------------------------
#>
#> Selective Estimates:
#> Predictor Estimate Lower.CI Upper.CI P-value
#> -------------------------------------------------------------
#> X1 -3.878217 -5.1687 -2.5450 0.0001***
#> X2 -0.031778 -0.0503 -0.0134 0.0011**
#>
#> Standard Estimates:
#> Predictor Estimate Lower.CI Upper.CI P-value
#> -------------------------------------------------------------
#> X1 -3.877831 -5.1180 -2.6377 0.0000***
#> X2 -0.031773 -0.0495 -0.0141 0.0015**
#>
#>
#> Significance levels: * < 0.05 ** < 0.01 *** < 0.001
coef(result)
#>
#> Coefficients from lmFScreen
#> ----------------------------
#> Predictor Selective.Est Naive.Est
#> 1 X1 -3.87821657 -3.87783074
#> 2 X2 -0.03177807 -0.03177295
confint(result)
#>
#> lmFScreen Model Confidence Intervals
#> ------------------------------------------------------
#> Confidence Level: 95.00%
#> Number of predictors: 2
#> ------------------------------------------------------
#>
#> Selective Confidence Intervals:
#> Predictor Lower.CI Upper.CI
#> ---------------------------------------------
#> X1 -5.1687 -2.5450
#> X2 -0.0503 -0.0134
#>
#> Standard Confidence Intervals:
#> Predictor Lower.CI Upper.CI
#> ---------------------------------------------
#> X1 -5.1180 -2.6377
#> X2 -0.0495 -0.0141
#>
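As a sanity check on the preprocessing used in this example, the projected data can be compared with an ordinary lm fit that includes an intercept. The sketch below uses only base R and does not call lmFScreen; the object names are illustrative. Under the usual least-squares algebra, the slopes and the overall F-statistic from the projected no-intercept fit should agree with those from lm(mpg ~ wt + hp), and so with the standard estimates reported above.

```r
data(mtcars)
X <- cbind(mtcars$wt, mtcars$hp)
y <- mtcars$mpg
n <- nrow(mtcars)

# Orthonormal basis for the complement of the constant vector, as in the example
svdP <- svd(rep(1, n), nu = n)
U_perp <- svdP$u[, 2:n]
X_proj <- t(U_perp) %*% X
y_proj <- as.numeric(t(U_perp) %*% y)

# No-intercept fit on the projected data vs. intercept fit on the raw data
fit_proj <- lm.fit(X_proj, y_proj)
fit_int <- lm(mpg ~ wt + hp, data = mtcars)

slopes_proj <- unname(fit_proj$coefficients)
slopes_int <- unname(coef(fit_int)[c("wt", "hp")])

# Overall F-statistic from the projected fit (p = 2 predictors, n - 1 rows)
p <- ncol(X_proj)
f_proj <- (sum(fit_proj$fitted.values^2) / p) /
  (sum(fit_proj$residuals^2) / (nrow(X_proj) - p))
f_int <- unname(summary(fit_int)$fstatistic["value"])

all.equal(slopes_proj, slopes_int)
all.equal(f_proj, f_int)
```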