12.3 Resampling Methods

How do we determine the variability of a regression fit? One way is to draw different samples from the available data, fit a regression model to each sample, and then examine the extent to which the various fits differ from one another.

Resampling methods provide additional information about a fitted model, by applying the same fitting approach to various sub-samples of the training set \(\text{Tr}\). We will consider three such methods:

  • cross-validation, which is used to estimate the test error associated with a modeling approach in order to evaluate model performance;

  • the bootstrap, which is used to provide measures of accuracy (standard error, bias, etc.) for various model parameter estimates; and

  • the jackknife, which is a simpler approach with the same aims as the bootstrap.

 
For quantitative responses, the test error associated with a statistical learning model is the average error arising when predicting the response for observations that were not used to train the model.

The training error, on the other hand, is computed directly by comparing the model’s predictions to the actual responses in \(\text{Tr}\). In general, the training error underestimates the test error, dramatically so as the model complexity increases (i.e., the variance-bias trade-off, see Figure 12.6).
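Recall that, for a quantitative response \(Y=f(\mathbf{x})+\varepsilon\) with \(\text{E}[\varepsilon]=0\) and \(\text{Var}(\varepsilon)=\sigma^2\), the expected test error of a fit \(\hat{f}\) at a point \(\mathbf{x}_0\) decomposes as \[\text{E}\left[\left(Y_0-\hat{f}(\mathbf{x}_0)\right)^2\right]=\underbrace{\text{Var}\left(\hat{f}(\mathbf{x}_0)\right)}_{\text{variance}}+\underbrace{\left[\text{E}\,\hat{f}(\mathbf{x}_0)-f(\mathbf{x}_0)\right]^2}_{\text{squared bias}}+\underbrace{\sigma^2}_{\text{irreducible error}};\] increasing the model's flexibility typically decreases the bias but increases the variance, which is why the training error alone cannot be trusted to track the test error.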

A possible solution to this conundrum is to set aside a large-enough testing set \(\text{Te}\), but that’s not always possible if the original dataset is not that large in the first place.205

In the statistical learning framework, we estimate the test error by holding out a subset \(\text{Va}\subseteq \text{Tr}\) from the fitting process (which takes place on \(\text{Tr}\setminus\text{Va}\)). This validation set approach is a simple strategy to estimate the test error associated with a particular statistical model.

Formally, \(\text{Tr}\) is split into a (smaller) training set \(\text{Tr}\setminus\text{Va}\) and a validation set \(\text{Va}\) (the hold-out set). The model is fit on \(\text{Tr}\setminus\text{Va}\); the fitted model is then used to make predictions on \(\text{Va}\). The resulting validation set error provides an estimate of the test error.
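For concreteness, here is a minimal sketch of the validation set approach for a simple linear regression; the data frame dat, the response y, the predictor x, and the 80/20 split are all placeholders (this block of code is not compiled):

set.seed(0)                                # for replicability
n <- nrow(dat)
va <- sample.int(n, size = floor(0.2*n))   # hold out roughly 20% of the observations
fit <- lm(y ~ x, data = dat[-va,])         # fit on the remaining observations
pred <- predict(fit, newdata = dat[va,])   # predict on the validation set
(MSE.va <- mean((dat$y[va] - pred)^2))     # validation set error: an estimate of the test error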

This approach is easy to implement and interpret, but it has a number of drawbacks, most importantly:

  • the validation error is highly dependent on the choice of the validation set, and is thus quite volatile;

  • the model is fitted on a proper subset of the available observations, and we might expect that this would lead to the validation error being larger than the test error, in general;

  • a number of classical statistical models can provide test error estimates without having to resort to the validation set approach.

12.3.1 Cross-Validation

\(K\)-fold cross-validation is a widely-used approach for estimating the test error without sacrificing observations to a hold-out set.206

The procedure is simple:

  1. Divide the dataset randomly into \(K\) (roughly) equal-sized folds (typically, \(K=4, 5, 10\)).

  2. Each fold plays, in succession, the role of the validation set. If there are \(N\) observations in the dataset, partition \[\{1,\ldots, N\}=\underbrace{\mathcal{C}_1}_{\text{fold }1} \sqcup \cdots \sqcup \underbrace{\mathcal{C}_K}_{\text{fold }K}.\] If \(|\mathcal{C}_k|=n_k\), we expect \(n_k\approx \frac{N}{K}\) for all \(k=1,\ldots, K\).

  3. For all \(k=1,\ldots, K\), fit a model on \(\{1,\ldots,N\}\setminus \mathcal{C}_k\) and denote the error on \(\mathcal{C}_k\) by \(E_k\).207

  4. Write \(\overline{E}\) for the average of the \(E_k\).

  5. The cross-validation estimate of the test error is \[\text{CV}_{(K)}=\sum_{k=1}^K\frac{n_k}{N}E_k,\] with standard error \[\widehat{\text{se}}\left(\text{CV}_{(K)}\right)=\sqrt{\frac{1}{K-1}\sum_{k=1}^K(E_k-\overline{E})^2}.\]
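The following sketch implements these steps for a simple linear regression, with the same placeholders as above (a data frame dat, response y, predictor x; this block of code is not compiled):

K <- 5
set.seed(0)                                        # for replicability
fold <- sample(rep(1:K, length.out = nrow(dat)))   # random fold assignment, (roughly) equal sizes
E <- n_k <- numeric(K)
for(k in 1:K){
  fit <- lm(y ~ x, data = dat[fold != k,])         # fit on all folds except the kth
  pred <- predict(fit, newdata = dat[fold == k,])
  n_k[k] <- sum(fold == k)                         # fold size
  E[k] <- mean((dat$y[fold == k] - pred)^2)        # error on the kth fold
}
(CV.K <- sum(n_k/nrow(dat)*E))                     # cross-validation estimate of the test error
(se.CV.K <- sqrt(sum((E - mean(E))^2)/(K - 1)))    # its standard error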

 
These steps can also be replicated \(n\) times to generate a distribution of the performance metric (in practice, some combination of its mean/median and standard deviation is reported); see Figure 11.39 for an illustration.

Figure 11.39: Schematic illustration of cross-fold validation, for 8 replicates and 4 folds; 32 models from a given family are built on various training sets (consisting of 3/4 of the available data – the training folds). Model family performance is evaluated on the respective holdout folds; the distribution of the performance metric values (in practice, some combination of the mean/median and standard deviation) can be used to compare various model families.

The resulting mean can prove useful in order to determine how well a statistical learning procedure will perform on unseen data. If, however, we are interested in selecting a method from a list of methods, or a flexibility level among a family of approaches, we do not care about the specific value of \(\text{CV}_{(K)}\) so much as where it is minimized.208

From the perspective of bias reduction (in the estimate for the test error), the best choice is \(K=N\), but this is mitigated by the variance-bias trade-off: with \(K=N\), we have \(N\) models and \(N\) estimates for the test error, but these estimates are highly correlated, and the mean of highly correlated estimates has a higher variance than the mean of estimates that are less correlated (see Section 12.3.3 for details).
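Indeed, if the fold errors \(E_1,\ldots,E_K\) each have variance \(\sigma^2\) and common pairwise correlation \(\rho\), then \[\text{Var}\left(\frac{1}{K}\sum_{k=1}^KE_k\right)=\frac{\sigma^2}{K}+\frac{K-1}{K}\,\rho\sigma^2,\] which remains bounded away from \(0\) as \(K\) grows whenever \(\rho>0\); the variance reduction obtained by averaging is thus limited when the fold estimates are strongly correlated.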

Gapminder Example

We use cross-validation in the Gapminder dataset to estimate the test error \(\text{MSE}_{\text{Te}}\) when predicting life expectancy as a regression against the logarithm of the GDP per capita for the 2011 data.

gapminder.2011.cv <- gapminder.2011 |> 
  dplyr::mutate(lgdppc = log(gdp/population)) |> 
  dplyr::select(life_expectancy,lgdppc)

ggpubr::ggscatter(gapminder.2011.cv, x="lgdppc",  y="life_expectancy", palette="jco", 
                  size = 2, xlab="GDP per capita (log-scale)", ylab = "Life Expectancy", 
                  title = "Gapminder 2011 Data", xlim=c(0,12), ylim=c(0,85)) 

We split the dataset into \(K=10\) random folds, each containing 16 or 17 observations, and fit 10 linear regression models using the 149 or 150 remaining observations.209

The indices for each of the folds are computed below:

set.seed(0) # for replicability
true.order=sample.int(nrow(gapminder.2011.cv),
                      nrow(gapminder.2011.cv),replace=FALSE) # random permutation of the 166 observations

index=list()
for(k in 1:6){   # the first 6 folds contain 17 observations each
  index[[k]] = true.order[((k-1)*17+1):(k*17)]
}
for(k in 7:10){  # the remaining 4 folds contain 16 observations each
  index[[k]] = true.order[(102+(k-6-1)*16+1):(k*16+6)]
}

Each fold is used, in turn, as a testing set while the remaining folds form the training set. We fit an OLS model on each training set, and evaluate the MSE performance of the model on the appropriate fold testing set.

training.gap = list()
testing.gap = list()
model.lm.gap = list()
pred.lm.gap = list()
beta_0 = c()
beta_1 = c()
MSE.cv = c()
n.row = c()

for(k in 1:10){
  n.row[k]=length(index[[k]])                          # number of observations in the kth fold
  training.gap[[k]] = gapminder.2011.cv[-index[[k]],]  # training set: all folds except the kth
  testing.gap[[k]] = gapminder.2011.cv[index[[k]],]    # testing set: the kth fold
  model.lm.gap[[k]] = lm(life_expectancy~lgdppc, data=training.gap[[k]])
  beta_0[k] = model.lm.gap[[k]][[1]][1]                # fitted intercept
  beta_1[k] = model.lm.gap[[k]][[1]][2]                # fitted slope
  pred.lm.gap[[k]] = predict(model.lm.gap[[k]], newdata=testing.gap[[k]])
  tmp=data.frame(pred.lm.gap[[k]],testing.gap[[k]][1])
  MSE.cv[k]=1/nrow(tmp)*sum((tmp[,1]-tmp[,2])^2)       # MSE on the kth fold
}

The number of observations in each fold, as well as the regression parameters and the MSE on each fold testing set are shown below:

results = data.frame(n.row,beta_0,beta_1,MSE.cv)
n.row beta_0 beta_1 MSE.cv
17 37.69812 4.254105 33.859051
17 36.22257 4.442061 21.415376
17 37.59386 4.247255 45.620933
17 36.66761 4.345484 29.584469
17 37.49685 4.268917 24.127300
17 36.49849 4.386124 19.398769
16 36.78991 4.380887 48.157391
16 36.91113 4.331365 23.142625
16 37.41767 4.274771 7.837172
16 37.68955 4.254600 19.743046

The \(10-\)fold cross-validation estimate of \(\text{MSE}_{\text{Te}}\) is thus \[\begin{aligned} \overline{\text{MSE}_{\text{Te}}}&=\frac{1}{10}\sum_{k=1}^{10}\text{MSE}_{\text{Te}_k}= 27.29\\ \text{CV}_{(K)}&=\sum_{k=1}^{10}\frac{n_k}{166}\text{MSE}_{\text{Te}_k}=27.35\\ \widehat{\text{se}}\left(\text{CV}_{(K)}\right)&=\sqrt{\frac{1}{10-1}\sum_{k=1}^{10}(\text{MSE}_{\text{Te}_k}-\overline{\text{MSE}_\text{Te}})^2}=12.38, \end{aligned}\] as can be seen below.

(mean.MSE=mean(results$MSE.cv))
(cv.k=sum(results$n.row*results$MSE.cv/sum(results$n.row)))
(se.cv.k=sqrt(1/(nrow(results)-1)*sum((results$MSE.cv-mean.MSE)^2)))
[1] 27.28861
[1] 27.35051
[1] 12.37939

Thus, \(27.35 \pm 2(12.38)\equiv (2.59,52.11)\) is an approximate 95% \(\text{C.I.}\) for the test error \(\text{MSE}_{\text{Te}}\).

We can also get \(10-\)fold cross-validation estimates of the true \(\beta_0\) and \(\beta_1\): we have \[\begin{aligned} \beta_{0_{(K)}}&=\sum_{k=1}^{10}\frac{n_k}{166}\beta_{0;k}= 37.10\\ \widehat{\text{se}}\left(\beta_{0_{(K)}}\right)&=\sqrt{\frac{1}{10-1}\sum_{k=1}^{10}(\beta_{0;k}-\overline{\beta_0})^2}=0.54, \end{aligned}\] with \(37.10 \pm 2(0.54) \equiv (36.00,38.18)\) as a 95% \(\text{C.I.}\) for \(\beta_0\), and \[\begin{aligned}\beta_{1_{(K)}}&=\sum_{k=1}^{10}\frac{n_k}{166}\beta_{1;k}= 4.32\\ \widehat{\text{se}}\left(\beta_{1_{(K)}}\right)&=\sqrt{\frac{1}{10-1}\sum_{k=1}^{10}(\beta_{1;k}-\overline{\beta_1})^2}= 0.07, \end{aligned}\] with \(4.32 \pm 2(0.07)\equiv (4.18,4.46)\) as a 95% \(\text{C.I.}\) for \(\beta_1\), as seen below:

mean.beta_0=mean(results$beta_0)
cv.beta_0.k=sum(results$n.row*results$beta_0/sum(results$n.row))
se.cv.beta_0.k=sqrt(1/(nrow(results)-1)*sum((results$beta_0-mean.beta_0)^2))

mean.beta_1=mean(results$beta_1)
cv.beta_1.k=sum(results$n.row*results$beta_1/sum(results$n.row))
se.cv.beta_1.k=sqrt(1/(nrow(results)-1)*sum((results$beta_1-mean.beta_1)^2))
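The approximate 95% confidence intervals reported above can be recovered directly from these quantities:

(CI.beta_0 = cv.beta_0.k + c(-2,2)*se.cv.beta_0.k)
(CI.beta_1 = cv.beta_1.k + c(-2,2)*se.cv.beta_1.k)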

LASSO and Ridge Regression Revisited

How would we pick the optimal hyperparameter \(\lambda\) in shrinkage regressions? Let us revisit the example from Section 12.2.4.

As before, we are interested in modeling life expectancy \(Y\) in the 2011 Gapminder dataset as a function of population, infant mortality, fertility, gdp, and continental membership.

We again create dummy variables for the continents, and select the response variable and the predictors (the latter will be centered and scaled):

gapminder.2011.f <- fastDummies::dummy_cols(gapminder.2011, select_columns = 'continent')
y <- gapminder.2011.f |> dplyr::select(life_expectancy) |> as.matrix()
x <- gapminder.2011.f |> dplyr::select(c("population","infant_mortality","fertility","gdp",
      "continent_Africa","continent_Americas","continent_Asia",
      "continent_Europe","continent_Oceania")) |> 
      scale(center = TRUE, scale = TRUE) |> as.matrix()

We run a 5-fold cross-validation LASSO regression for a variety of hyperparameter values \(\lambda\), and evaluate the CV test error for each \(\lambda\) using \(\text{MSE}\). The optimal \(\lambda\) is the one that minimizes the CV test error.

Let us start with the LASSO (alpha=1):

glmnet1<-glmnet::cv.glmnet(x=x,y=y, type.measure='mse',nfolds=5,alpha=1)
(c1<-coef(glmnet1,s='lambda.min',exact=TRUE))
10 x 1 sparse Matrix of class "dgCMatrix"
                           s1
(Intercept)        70.8234940
population          .        
infant_mortality   -5.7375945
fertility           .        
gdp                 0.1616446
continent_Africa   -1.7592037
continent_Americas  .        
continent_Asia      .        
continent_Europe    0.1219114
continent_Oceania  -0.7977736

The optimal \(\lambda\) in this case is

(lambda1 = glmnet1$lambda.min)
[1] 0.3118295

We repeat the process for ridge regression (alpha = 0):

glmnet0<-glmnet::cv.glmnet(x=x,y=y, type.measure='mse',nfolds=5,alpha=0)
(c0<-coef(glmnet0,s='lambda.min',exact=TRUE))
10 x 1 sparse Matrix of class "dgCMatrix"
                           s1
(Intercept)        70.8234940
population         -0.3466483
infant_mortality   -4.6968992
fertility          -0.4240814
gdp                 0.5813385
continent_Africa   -1.6192452
continent_Americas  0.5467797
continent_Asia      0.6295896
continent_Europe    1.0091460
continent_Oceania  -0.7207190

The optimal \(\lambda\) in this case is

(lambda0 = glmnet0$lambda.min)
[1] 0.7373175

How do these results compare with those from the preceding example? Do not hesitate to consult the glmnet package documentation to learn more about shrinkage methods in R.
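For instance, the cross-validation curves (estimated MSE, with error bars, as a function of \(\log\lambda\)) can be displayed with the plot() method for cv.glmnet objects:

plot(glmnet1)   # LASSO cross-validation curve
plot(glmnet0)   # ridge regression cross-validation curve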

Cross-Validation with Python

Let us take a look at how we could estimate the test error via cross-validation manually in Python.

The following modules will be necessary: statsmodels to run linear models (in particular to define formulas for linear regression), numpy for numerical operations, and pandas for data frame manipulations.

import statsmodels.formula.api as smf 
import numpy as np
import pandas as pd
np.random.seed(0) # for replicability (np.random.permutation is used below)

We use the calculus.csv dataset from Section 1.6.1, whose structure is as shown below. We will try to predict students’ grades in terms of the other predictors, using linear regression. In particular, we are interested in determining which of several candidate models does the best job of predicting the grades.

df = pd.read_csv('data/calculus.csv')
df.head()
      ID Sex  Grade   GPA  Year
0  10001   F     47  5.02     2
1  10002   M     57  3.82     1
2  10003   M     91  7.70     1
3  10004   M     71  4.82     1
4  10005   F     83  7.91     1

We start by obtaining a random permutation of the observations (the pandas indexer iloc[] selects rows by position).

nrows = len(df)
permuted = df.iloc[np.random.permutation(nrows)]
permuted.head()
       ID Sex  Grade    GPA  Year
60  10061   F     97  11.45     2
61  10062   M     70   3.65     1
28  10029   M     98  11.90     1
49  10050   F     92  11.05     1
50  10051   M     79   6.87     2

In this example, we separate the sample indices into \(k=5\) folds for cross-validation using the numpy function array_split().

k = 5
chunks = np.array_split(range(nrows), k) 
print(chunks)
[array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19]), array([20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36,
       37, 38, 39]), array([40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56,
       57, 58, 59]), array([60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76,
       77, 78, 79]), array([80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96,
       97, 98, 99])]

We iterate over each fold as a test set while using the remaining folds as a training set.

Say chunks[i] is the current test set; we can obtain the corresponding training set as follows (this block of code is not compiled):

training = permuted.iloc[ np.concatenate( [ chunks[j] for j in range(k) if j != i]) ]

We then perform a linear regression over this training set (with the statsmodels methods ols() and fit()) and compute the MSE over the test set using the predicted values. Remember, this is for a single fold (this block of code is not compiled):

fit = smf.ols(formula=m, data = training).fit()
test = permuted.iloc[chunks[i]]
pred = fit.predict( test )
testerror = ((pred - test['Grade'])**2).mean()

In the chunk of code above, formula=m is an R-style formula. In the following, we go through a number of possible formulas, for all folds.

f = ['Grade ~ GPA + C(Year) + C(Sex)', 'Grade ~ GPA + C(Year)', 'Grade ~ GPA + C(Sex)', 'Grade ~ GPA' ]

for m in f:
    testerror = 0.0
    for i in range(k):
        training = permuted.iloc[ np.concatenate( [ chunks[j] for j in range(k) if j != i]) ]
        fit = smf.ols(formula=m, data = training).fit()
        test = permuted.iloc[chunks[i]]
        pred = fit.predict( test )
        testerror += ((pred - test['Grade'])**2).mean()
    testerror /= k
    print(testerror, m)
118.2165188650409 Grade ~ GPA + C(Year) + C(Sex)
117.4815061224269 Grade ~ GPA + C(Year)
115.77980266850878 Grade ~ GPA + C(Sex)
114.87270405373037 Grade ~ GPA

The results show that, of the four candidates, the model with the smallest estimated test error is the one given by the formula Grade ~ GPA.

12.3.2 Bootstrap

The bootstrap procedure uses re-sampling of the available data to mimic the process of obtaining new replicates, which allows us to estimate the variability of a statistical model parameter of interest without the need to generate new observations.

Replicates are obtained by repeatedly sampling observations from the original dataset with replacement. A bootstrap dataset \(\text{Tr}^*\) for a training set \(\text{Tr}\) with \(N\) observations is a sample of \(N\) such observations, drawn with replacement.

The process is repeated \(M\) times to obtain bootstrap samples \(\text{Tr}_i^*\) and parameter estimates \(\hat{\alpha}^*_i\), for \(i=1,\ldots, M\), from which we derive a bootstrap estimate \[\hat{\alpha}^*=\frac{1}{M}\sum_{i=1}^M\hat{\alpha}^*_i,\] with standard error \[\widehat{\text{se}}\left(\hat{\alpha}^*\right)=\sqrt{\frac{1}{M-1}\sum_{i=1}^M(\hat{\alpha}^*_i-\hat{\alpha}^*)^2}.\] The bootstrap can also be used to build approximate frequentist confidence intervals for the parameter \(\alpha\).210 We can even construct a covariance structure for the parameters, given enough replicates.
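In R, much of this bookkeeping is handled by the boot package; a minimal sketch for a generic statistic (here, hypothetically, the median of a sample x) could look like the following (this block of code is not compiled):

library(boot)
x <- rexp(100)                                              # a sample, for illustration only
med.boot <- boot(x, function(d, i) median(d[i]), R = 1000)  # M = 1000 bootstrap replicates
med.boot                                                    # bootstrap estimate, bias, and standard error
boot.ci(med.boot, type = "perc")                            # approximate 95% percentile confidence interval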

Finally, it should be noted that in more complex scenarios, the appropriate bootstrap procedure might be more sophisticated than what has been described here.211

Gapminder Example

We use the bootstrap procedure for the regression problem with life expectancy and the log of the GDP per capita in the 2011 Gapminder data.

We draw, with replacement, \(M=200\) bootstrap samples of size \(N=166\) from the original dataset. For sample \(1\leq i\leq M\), we find the OLS fit and retain the intercept \(\beta_{0,i}\) and slope \(\beta_{1,i}\).

beta_0 = c()
beta_1 = c()

set.seed(0) # for replicability
for(k in 1:200){
  index = sample.int(nrow(gapminder.2011.cv),nrow(gapminder.2011.cv),replace=TRUE) # bootstrap indices
  training.gap = gapminder.2011.cv[index,]   # the kth bootstrap sample (drawn with replacement)
  model.lm.gap = lm(life_expectancy~lgdppc, data=training.gap)
  beta_0[k] = model.lm.gap[[1]][1]           # fitted intercept
  beta_1[k] = model.lm.gap[[1]][2]           # fitted slope
}

results.boot = data.frame(beta_0,beta_1)

We display the joint distribution of \(\boldsymbol{\beta}=(\beta_0,\beta_1)^{\!\top}\), together with the marginal distributions for each parameter.

p <- ggplot2::ggplot(results.boot, ggplot2::aes(x=beta_0, y=beta_1)) +
  ggplot2::geom_point() +
  ggplot2::theme(legend.position="none") 
ggExtra::ggMarginal(p, type="density", fill = "slateblue")

It would appear that \(\boldsymbol{\beta}\) approximately follows a multivariate normal \(\mathcal{N}(\mathbf{\mu},\mathbf{\Sigma})\), with \[\mathbf{\mu}\approx \mathbf{\hat{\mu}}^*=\begin{pmatrix}37.22 \\ 4.31\end{pmatrix},\ \mathbf{\Sigma}\approx \mathbf{\hat{\Sigma}}^*=\begin{pmatrix}6.32 & -0.72 \\ -0.72 & 0.08\end{pmatrix},\] as seen below:

(boot.beta_0 = mean(results.boot$beta_0))
(boot.beta_1 = mean(results.boot$beta_1))
(cov(results.boot))
[1] 37.22167
[1] 4.308113
           beta_0     beta_1
beta_0  6.3281004 -0.7168375
beta_1 -0.7168375  0.0845396

The vector \(\mathbf{\hat{\mu}}^*\) provides the bootstrap estimates; the corresponding estimates for the standard errors are \(\hat{\text{se}}(\mathbf{\hat{\mu}}^*)=(2.51,0.29)^{\!\top}\), and the 95% \(\text{CI}\) are \[\begin{aligned} \text{CI}(\beta_0;0.95)&=37.22\pm 2(2.51)\equiv(32.19,42.25) \\ \text{CI}(\beta_1;0.95)&=4.31\pm 2(0.29)\equiv(3.73,4.89),\end{aligned}\] as seen below:

(se.boot.beta_0 = sqrt(1/(nrow(results.boot)-1)*sum((results.boot$beta_0-mean(results.boot$beta_0))^2)))
(se.boot.beta_1 = sqrt(1/(nrow(results.boot)-1)*sum((results.boot$beta_1-mean(results.boot$beta_1))^2)))
[1] 2.515572
[1] 0.2907569
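Percentile-based intervals can also be read directly off the empirical distribution of the bootstrap replicates, without appealing to normality:

quantile(results.boot$beta_0, probs = c(0.025, 0.975))   # 95% percentile interval for beta_0
quantile(results.boot$beta_1, probs = c(0.025, 0.975))   # 95% percentile interval for beta_1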

Note that the bootstrap confidence intervals are wider than the corresponding cross-validation intervals.

12.3.3 Jackknife

The jackknife estimator arises from cross-validation when \(K=N\); the sole difference lies in the variance estimate \[\text{Var}(\hat{\alpha}^*)=\frac{1}{N(N-1)}\sum_{i=1}^N(\hat{\alpha}_i^*-\overline{\hat{\alpha}^*})^2.\] The jackknife procedure is also known as leave-one-out validation.

Gapminder Example

We use the jackknife procedure for the regression problem with life expectancy and the log of the GDP per capita in the 2011 Gapminder data.

For each \(k=1,\ldots, N\), we find the OLS fit on the dataset from which the \(k\)th observation has been removed, and retain the intercept \(\beta_{0,k}\) and slope \(\beta_{1,k}\).

beta_0 = c()
beta_1 = c()

set.seed(0) # not needed here (the jackknife is deterministic); kept for consistency
for(k in 1:nrow(gapminder.2011.cv)){
  training.gap = gapminder.2011.cv[-k,]   # all observations except the kth
  model.lm.gap = lm(life_expectancy~lgdppc, data=training.gap)
  beta_0[k] = model.lm.gap[[1]][1]        # fitted intercept
  beta_1[k] = model.lm.gap[[1]][2]        # fitted slope
}

results.jack = data.frame(beta_0,beta_1)

We display the joint distribution of \(\boldsymbol{\beta}=(\beta_0,\beta_1)^{\!\top}\), together with the marginal distributions for each parameter.

p <- ggplot2::ggplot(results.jack, ggplot2::aes(x=beta_0, y=beta_1)) +
  ggplot2::geom_point() +
  ggplot2::theme(legend.position="none") 
ggExtra::ggMarginal(p, type="density", fill = "slateblue")

It would appear that \(\boldsymbol{\beta}\) approximately follows a multivariate normal \(\mathcal{N}(\mathbf{\mu},\mathbf{\Sigma})\), with \[\mathbf{\mu}\approx \mathbf{\hat{\mu}}^*=\begin{pmatrix}37.11 \\ 4.32\end{pmatrix},\ \mathbf{\Sigma}\approx \mathbf{\hat{\Sigma}}^*=\begin{pmatrix}0.021 & -0.002 \\ -0.002 & 0.0003\end{pmatrix},\] as seen below:

(jack.beta_0 = mean(results.jack$beta_0))
(jack.beta_1 = mean(results.jack$beta_1))
(cov(results.jack))
[1] 37.11005
[1] 4.316852
             beta_0        beta_1
beta_0  0.021108102 -0.0023729712
beta_1 -0.002372971  0.0002780615

The vector \(\mathbf{\hat{\mu}}^*\) provides the jackknife estimates; the corresponding estimates for the standard errors are \(\hat{\text{se}}(\mathbf{\hat{\mu}}^*)=(0.011,0.001)^{\!\top}\), and the 95% \(\text{CI}\) are \[\begin{aligned} \text{CI}(\beta_0;0.95)&=37.11\pm 2(0.011)\equiv(37.09,37.13) \\ \text{CI}(\beta_1;0.95)&=4.32\pm 2(0.001)\equiv(4.318,4.322),\end{aligned}\] as seen below:

(se.jack.beta_0 = sqrt(1/(nrow(results.jack)*(nrow(results.jack)-1))*sum((results.jack$beta_0-mean(results.jack$beta_0))^2)))
(se.jack.beta_1 = sqrt(1/(nrow(results.jack)*(nrow(results.jack)-1))*sum((results.jack$beta_1-mean(results.jack$beta_1))^2)))
[1] 0.0112764
[1] 0.001294245

Note that the jackknife confidence intervals are much tighter than the corresponding bootstrap intervals. Will this always be the case?