Alpha_spending.Rd
Implements online FWER control using a Bonferroni-like test.
Alpha_spending(
d,
alpha = 0.05,
gammai,
random = TRUE,
date.format = "%Y-%m-%d"
)
Either a vector of p-values, or a dataframe with three columns: an identifier (`id'), date (`date') and p-value (`pval'). If no column of dates is provided, then the p-values are treated as being ordered in sequence, arriving one at a time.
Overall significance level of the FDR procedure, the default is 0.05.
Optional vector of \(\gamma_i\), where hypothesis \(i\) is rejected if the \(i\)-th p-value is less than or equal to \(\alpha \gamma_i\). A default is provided as proposed by Javanmard and Montanari (2018), equation 31.
Logical. If TRUE
(the default), then the order of the
p-values in each batch (i.e. those that have exactly the same date) is
randomised.
Optional string giving the format that is used for dates.
A dataframe with the original data d
(which will
be reordered if there are batches and random = TRUE
), the adjusted
signifcance thresholds alphai
and the indicator function of
discoveries R
, where R[i] = 1
corresponds to hypothesis
\(i\) being rejected (otherwise R[i] = 0
).
The function takes as its input either a vector of p-values, or a dataframe with three columns: an identifier (`id'), date (`date') and p-value (`pval'). The case where p-values arrive in batches corresponds to multiple instances of the same date. If no column of dates is provided, then the p-values are treated as being ordered in sequence, arriving one at a time.
Alpha-spending provides strong FWER control for a potentially infinite stream of p-values by using a Bonferroni-like test. Given an overall significance level \(\alpha\), we choose a (potentially infinite) sequence of non-negative numbers \(\gamma_i\) such that they sum to 1. Hypothesis \(i\) is rejected if the \(i\)-th p-value is less than or equal to \(\alpha \gamma_i\).
Note that the procedure controls the generalised familywise error rate (k-FWER) for \(k > 1\) if \(\alpha\) is replaced by min(\(1, k\alpha\)).
Javanmard, A. and Montanari, A. (2018) Online Rules for Control of False Discovery Rate and False Discovery Exceedance. Annals of Statistics, 46(2):526-554.
Tian, J. and Ramdas, A. (2021). Online control of the familywise error rate. Statistical Methods for Medical Research (to appear), https://arxiv.org/abs/1910.04900.
sample.df <- data.frame(
id = c('A15432', 'B90969', 'C18705', 'B49731', 'E99902',
'C38292', 'A30619', 'D46627', 'E29198', 'A41418',
'D51456', 'C88669', 'E03673', 'A63155', 'B66033'),
date = as.Date(c(rep('2014-12-01',3),
rep('2015-09-21',5),
rep('2016-05-19',2),
'2016-11-12',
rep('2017-03-27',4))),
pval = c(2.90e-17, 0.06743, 0.01514, 0.08174, 0.00171,
3.60e-05, 0.79149, 0.27201, 0.28295, 7.59e-08,
0.69274, 0.30443, 0.00136, 0.72342, 0.54757))
set.seed(1); Alpha_spending(sample.df)
#> id date pval alphai R
#> 1 A15432 2014-12-01 2.9000e-17 0.0026758385 1
#> 2 B90969 2014-12-01 6.7430e-02 0.0005819103 0
#> 3 C18705 2014-12-01 1.5140e-02 0.0004956249 0
#> 4 B49731 2015-09-21 8.1740e-02 0.0004121803 0
#> 5 E99902 2015-09-21 1.7100e-03 0.0003494435 0
#> 6 D46627 2015-09-21 2.7201e-01 0.0003022950 0
#> 7 C38292 2015-09-21 3.6000e-05 0.0002659722 1
#> 8 A30619 2015-09-21 7.9149e-01 0.0002372613 0
#> 9 A41418 2016-05-19 7.5900e-08 0.0002140474 1
#> 10 E29198 2016-05-19 2.8295e-01 0.0001949126 0
#> 11 D51456 2016-11-12 6.9274e-01 0.0001788796 0
#> 12 A63155 2017-03-27 7.2342e-01 0.0001652568 0
#> 13 C88669 2017-03-27 3.0443e-01 0.0001535420 0
#> 14 B66033 2017-03-27 5.4757e-01 0.0001433627 0
#> 15 E03673 2017-03-27 1.3600e-03 0.0001344368 0
Alpha_spending(sample.df, random=FALSE)
#> id date pval alphai R
#> 1 A15432 2014-12-01 2.9000e-17 0.0026758385 1
#> 2 B90969 2014-12-01 6.7430e-02 0.0005819103 0
#> 3 C18705 2014-12-01 1.5140e-02 0.0004956249 0
#> 4 B49731 2015-09-21 8.1740e-02 0.0004121803 0
#> 5 E99902 2015-09-21 1.7100e-03 0.0003494435 0
#> 6 C38292 2015-09-21 3.6000e-05 0.0003022950 1
#> 7 A30619 2015-09-21 7.9149e-01 0.0002659722 0
#> 8 D46627 2015-09-21 2.7201e-01 0.0002372613 0
#> 9 E29198 2016-05-19 2.8295e-01 0.0002140474 0
#> 10 A41418 2016-05-19 7.5900e-08 0.0001949126 1
#> 11 D51456 2016-11-12 6.9274e-01 0.0001788796 0
#> 12 C88669 2017-03-27 3.0443e-01 0.0001652568 0
#> 13 E03673 2017-03-27 1.3600e-03 0.0001535420 0
#> 14 A63155 2017-03-27 7.2342e-01 0.0001433627 0
#> 15 B66033 2017-03-27 5.4757e-01 0.0001344368 0
set.seed(1); Alpha_spending(sample.df, alpha=0.1)
#> id date pval alphai R
#> 1 A15432 2014-12-01 2.9000e-17 0.0053516771 1
#> 2 B90969 2014-12-01 6.7430e-02 0.0011638206 0
#> 3 C18705 2014-12-01 1.5140e-02 0.0009912499 0
#> 4 B49731 2015-09-21 8.1740e-02 0.0008243606 0
#> 5 E99902 2015-09-21 1.7100e-03 0.0006988870 0
#> 6 D46627 2015-09-21 2.7201e-01 0.0006045900 0
#> 7 C38292 2015-09-21 3.6000e-05 0.0005319444 1
#> 8 A30619 2015-09-21 7.9149e-01 0.0004745225 0
#> 9 A41418 2016-05-19 7.5900e-08 0.0004280949 1
#> 10 E29198 2016-05-19 2.8295e-01 0.0003898252 0
#> 11 D51456 2016-11-12 6.9274e-01 0.0003577593 0
#> 12 A63155 2017-03-27 7.2342e-01 0.0003305137 0
#> 13 C88669 2017-03-27 3.0443e-01 0.0003070841 0
#> 14 B66033 2017-03-27 5.4757e-01 0.0002867254 0
#> 15 E03673 2017-03-27 1.3600e-03 0.0002688736 0