** If you are viewing this on GitHub, see the following website for a more readable version **
What is F-screening?
Suppose that we have an $n$-vector $Y$ containing a quantitative response for $n$ observations and an $n \times p$ design matrix $X$ containing $p$ covariates/features for each of $n$ observations. Our interest lies in the linear model $Y = X\beta + \epsilon$.
A common data analysis pipeline, which we refer to as “F-screening,” is as follows:
1. Test the “overall” null hypothesis, $H_0: \beta_1 = \dots = \beta_p = 0$, using an F-test.
2. If (and only if) this test is rejected, conduct inference on $\beta_j$ for some coefficient $j$.
Typically, Steps 1 and 2 are carried out using, e.g., the command `lm(Y~X)`. (However, as explained in the next paragraph, this analysis pipeline is problematic.)
While F-screening is intuitive and widely used, carrying out Step 2 using a “standard” approach (for instance, testing $H_0: \beta_j = 0$ using the t-test output by `lm(Y~X)`) is problematic, because it does not account for the fact that we conduct inference on $\beta_j$ only if we rejected the “overall” null hypothesis in Step 1. Consequently, the standard t-test for $\beta_j$ will not control the Type 1 error, standard confidence intervals for $\beta_j$ will not attain the nominal coverage, and even the point estimate of $\beta_j$ output by `lm(Y~X)` will be biased.
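The loss of Type 1 error control can be checked with a small simulation. The sketch below uses only base R; the sample size, number of covariates, number of replicates, and significance level are arbitrary choices made for illustration. Data are generated under the overall null (all coefficients zero), and the standard t-test for the first coefficient is examined only in the replicates where the overall F-test rejects.

```r
set.seed(1)
n <- 50       # observations (illustrative choice)
p <- 3        # covariates (illustrative choice)
alpha <- 0.05 # level used for both the F-test and the t-test
n_sim <- 2000 # simulation replicates

screened <- logical(n_sim) # did the overall F-test reject?
t_reject <- logical(n_sim) # did the standard t-test for beta_1 reject?

for (s in seq_len(n_sim)) {
  X <- matrix(rnorm(n * p), n, p)
  Y <- rnorm(n) # overall null holds: Y is unrelated to X
  fit <- summary(lm(Y ~ X))

  # Step 1: p-value of the overall F-test
  f <- fit$fstatistic
  p_overall <- pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)
  screened[s] <- p_overall < alpha

  # Step 2 (standard approach): t-test for the first covariate (row 2;
  # row 1 of the coefficient table is the intercept)
  t_reject[s] <- fit$coefficients[2, "Pr(>|t|)"] < alpha
}

screen_rate <- mean(screened)         # should be close to alpha
cond_rate <- mean(t_reject[screened]) # rejection rate of the t-test after screening
print(c(screen_rate = screen_rate, cond_rate = cond_rate))
```

If the standard t-test were valid after screening, `cond_rate` would be close to `alpha`. In this setup it should come out far larger, which is the problem the selective approach is designed to address.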
The `lmFScreen` package provides a valid inferential toolbox for conducting Step 2 in the F-screening procedure, by accounting for the fact that we conduct inference on $\beta_j$ in Step 2 only if we reject the overall null hypothesis in Step 1. This inferential toolbox falls under the larger umbrella of “conditional selective inference”, where the “selection event” that we are conditioning on is rejection of the overall null hypothesis in Step 1.
This package is based on the 2025 paper “Valid F-screening in linear regression” by McGough, Witten, and Kessler (arXiv preprint: https://arxiv.org/abs/2505.23113). See https://github.com/mcgougho/lmFScreen-paper for code to replicate the figures in the paper.
Installation
To install the `lmFScreen` package from GitHub, run the following in your R console: `devtools::install_github("mcgougho/lmFScreen")`.
Prospective inference with the `lmFScreen` function
The `lmFScreen` function enables valid inference conditional on rejection of an overall F-test in linear regression, using tools from conditional selective inference.
Given a design matrix `X` and a response vector `y`, the function:
- Conducts an F-test of the overall null hypothesis using `lm(y~X)`.
- If and only if the overall test is rejected, conducts inference on specified coefficients (`test_cols`).
- Returns (if the overall test is rejected):
  - Selective p-values for $H_0: \beta_j = 0$
  - Selective confidence intervals for $\beta_j$
  - Selective point estimates for $\beta_j$
  - Standard (unadjusted) counterparts for comparison, arising from `lm(y~X)`
If the overall test is not rejected, the function returns the overall F-statistic and p-value.
Unlike the standard output arising from `lm(y~X)`, the selective p-values, confidence intervals, and point estimates are valid conditional on rejection of the overall null hypothesis.
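To make the control flow described above concrete, here is a minimal base-R sketch of the same two-step logic that reports only the standard (unadjusted) quantities. The helper `naive_f_screen` and its arguments are hypothetical names introduced for illustration; it is not part of `lmFScreen` and does not compute selective p-values, intervals, or estimates. Consult the package documentation for the actual interface of the `lmFScreen` function.

```r
# Hypothetical helper (NOT part of lmFScreen): F-screening with standard,
# unadjusted inference in Step 2, i.e. the quantities that are not valid
# conditional on rejection of the overall test.
naive_f_screen <- function(X, y, test_cols = 1, alpha = 0.05) {
  fit <- lm(y ~ X)
  s <- summary(fit)

  # Step 1: overall F-test
  f <- s$fstatistic
  p_overall <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))
  out <- list(
    f_statistic = unname(f["value"]),
    p_overall = p_overall,
    rejected = p_overall < alpha
  )

  # If the overall test is not rejected, report only the F-statistic and p-value
  if (!out$rejected) return(out)

  # Step 2: standard inference on the requested coefficients
  rows <- test_cols + 1 # row 1 of the coefficient table is the intercept
  out$estimates <- s$coefficients[rows, "Estimate"]
  out$p_values <- s$coefficients[rows, "Pr(>|t|)"]
  out$conf_int <- confint(fit, level = 1 - alpha)[rows, , drop = FALSE]
  out
}

# Simulated example with a strong effect of the first covariate
set.seed(2)
X <- matrix(rnorm(100 * 2), 100, 2)
y <- 1.5 * X[, 1] + rnorm(100)
res <- naive_f_screen(X, y, test_cols = 1)
print(res)
```

The output of this helper corresponds only to the unadjusted comparison values; the selective counterparts require the conditional machinery implemented in the package.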
A tutorial for how to use this function can be found on the FScreen website in the Using lmFScreen tab.
Retrospective inference with the `psel_retro` function
The `psel_retro()` function computes valid selective p-values in retrospective settings, where only the output of `summary(lm(y~x))` (not the raw data `x` and `y`) is available. It requires:
- Sample size `n`
- Number of predictors `p`
- R-squared (i.e., `summary(lm(y~x))$r.squared`) from the overall linear model
- Residual standard error (RSE) (i.e., `summary(lm(y~x))$sigma`) from the overall linear model
- F-statistic for the coefficient of interest (i.e., can be obtained by squaring `summary(lm(y~x))$coefficients[j, "t value"]`, where `j-1` is the index of the coefficient of interest, since row 1 of the coefficient table is the intercept)
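As a sketch of how these summary quantities can be read off a fitted model, the base-R snippet below uses the built-in `mtcars` data and an arbitrary model chosen for illustration. It only collects the inputs listed above; it does not call `psel_retro()`, whose argument names should be taken from the package documentation. The last lines check the standard identity linking the overall F-statistic to `n`, `p`, and R-squared, which is why the overall test can be reconstructed without raw data.

```r
# Illustrative model on a built-in dataset (arbitrary choice of variables)
fit <- summary(lm(mpg ~ wt + hp, data = mtcars))

j <- 2 # row of the coefficient table for `wt` (row 1 is the intercept),
       # so the coefficient of interest has index j - 1 = 1

n <- length(fit$residuals)          # sample size
p <- nrow(fit$coefficients) - 1     # number of predictors (excluding intercept)
r_squared <- fit$r.squared          # R-squared of the overall model
rse <- fit$sigma                    # residual standard error
f_stat_j <- fit$coefficients[j, "t value"]^2 # F-statistic = squared t-statistic

# Overall F-statistic recovered from n, p, and R-squared alone
f_overall <- (r_squared / p) / ((1 - r_squared) / (n - p - 1))

print(c(n = n, p = p, r_squared = r_squared, rse = rse,
        f_stat_j = f_stat_j, f_overall = f_overall))
```

When working from a published table, the same five quantities would be typed in by hand rather than extracted from a fitted object.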
This is particularly useful when analyzing published results (e.g., from an applied paper’s Table 1) where individual-level data is unavailable.
A tutorial for how to use this function can be found on the FScreen website in the Using psel_retro tab.