Let us keep the gist of the preceding diary entry and show how to compute a correct
confidence interval for the sample variance $s^2$ of a one-dimensional sample $x_1, \dots, x_n$. This is of
obvious immediate interest in statistics outside of supervised learning as well.
To keep things simple, let us use the sample variance in its usual form $s^2 = \frac{1}{n-1}\sum (x_i - \bar{x})^2$. The nomenclature with the square is a little confusing, so let us abbreviate $v = s^2$. A standard computation lets us write this as $$ v = \frac{1}{n}\sum x_i^2 - \frac{2}{n(n-1)}\sum_{i < j}x_i x_j $$ which could be derived independently by viewing the true variance as the expectation of a squared observation minus the square of the expectation. The first term estimates the former, and the second term is the "good-friend" estimator for the latter.
Now, $v$ can also be written as $$ v = \frac{2}{n(n-1)}\sum_{i < j}\frac{(x_i - x_j)^2}{2} $$ To get a confidence interval for the variance, we need to be able to estimate the variance of $v$ itself.
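As a quick sanity check (a throwaway R snippet, not part of the package linked below), all of these expressions agree with R's built-in `var`:

```r
# quick numeric check that the three expressions for v coincide with var(x)
set.seed(1)
x <- rnorm(10)
n <- length(x)

v1 <- sum((x - mean(x))^2) / (n - 1)                       # textbook form
v2 <- sum(x^2) / n -
  2 * sum(combn(n, 2, function(ij) x[ij[1]] * x[ij[2]])) / (n * (n - 1))  # raw-moment form
v3 <- 2 * sum(combn(n, 2, function(ij) (x[ij[1]] - x[ij[2]])^2 / 2)) / (n * (n - 1))  # pairwise form

all.equal(c(v1, v2, v3), rep(var(x), 3))  # TRUE
```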
Without twisting our heads too much, we can apply the same strategy once again. The variance of $v$ is the expectation of $v^2$ (which is estimated by $v^2$ itself - easy) minus the square of its expectation, which requires a little more care. Applying the strategy of the preceding diary entry, we see that the expression $$ \frac{1}{n(n-1)(n-2)(n-3)}\sum_{i\ne j\ne k \ne l}\frac{(x_i-x_j)^2}{2}\cdot \frac{(x_k-x_l)^2}{2} $$ is the "good-friend" estimator for the square of the expectation $\bigl(\mathbb{E}v\bigr)^2 = \bigl(\mathbb{V}X\bigr)^2$. At this point, we could stop and content ourselves with code that directly implements that formula. However, that would be computationally quite inefficient.
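Just to make the point concrete, here is what a direct implementation of that formula might look like - a naive sketch only (the function name `naive_sq_var` is made up here), whose quadruple loop is $O(n^4)$ and therefore useless for all but tiny samples:

```r
# naive estimator of (E v)^2 = (Var X)^2, looping over all quadruples of distinct indices
naive_sq_var <- function(x) {
  n <- length(x)
  total <- 0
  for (i in 1:n) for (j in 1:n) for (k in 1:n) for (l in 1:n) {
    if (length(unique(c(i, j, k, l))) == 4) {  # i, j, k, l pairwise distinct
      total <- total + (x[i] - x[j])^2 / 2 * (x[k] - x[l])^2 / 2
    }
  }
  total / (n * (n - 1) * (n - 2) * (n - 3))
}
```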
Let us expand the terms:
$$
\frac{1}{4n(n-1)(n-2)(n-3)}\sum_{i\ne j\ne k \ne l} \Bigl( x_i^2 x_k^2 + x_i^2 x_l^2 + x_j^2x_k^2 + x_j^2 x_l^2 - 2x_k x_l x_i^2 - 2 x_k x_l x_j^2 - 2x_i x_j x_k^2 - 2x_i x_j x_l^2 + 4 x_i x_j x_k x_l \Bigr)
$$
and regroup into three summands, the first one being:
$$
\frac{1}{4n(n-1)(n-2)(n-3)}\sum_{i\ne j\ne k \ne l} \Bigl( x_i^2 x_k^2 + x_i^2 x_l^2 + x_j^2x_k^2 + x_j^2 x_l^2 \Bigr)
$$
the second one being
$$
- \frac{2}{4n(n-1)(n-2)(n-3)}\sum_{i\ne j\ne k \ne l} \Bigl( x_k x_l x_i^2 + x_k x_l x_j^2 + x_i x_j x_k^2 + x_i x_j x_l^2 \Bigr)
$$
and the third one being
$$ \frac{4}{4n(n-1)(n-2)(n-3)}\sum_{i\ne j\ne k \ne l} x_i x_j x_k x_l
$$
What is the meaning of these three terms? We may note in passing (if it happens to interest you) that, writing $m = \mathbb{E}X$ for the mean, the first one estimates the square of the second raw moment of $X$, the second one estimates minus twice the square of the mean times the second raw moment, and the third one estimates the fourth power of the mean. So in total, we estimate the expression
$$
(\mathbb{E}X^2)^2 - 2m^2\mathbb{E}X^2 + m^4 = (\mathbb{E}X^2 - m^2)^2
$$
which makes perfect sense: since $\mathbb{V}X = \mathbb{E}X^2 - m^2$, this is just another way of writing the square of the variance of $X$. So, everything is in perfect order so far.
As we will see, each of these three summands can be computed on its own quite easily. Obviously, the first just boils down to the average of terms $x_i^2x_j^2$: $$ \frac{2}{n(n-1)}\sum_{i< j} x_i^2 x_j^2 $$ which can be computed without too much hassle.
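Concretely, since $\sum_{i\ne j} x_i^2 x_j^2 = \bigl(\sum_i x_i^2\bigr)^2 - \sum_i x_i^4$, the first term needs only two power sums. A minimal R sketch (the name `first_term` is just for illustration):

```r
# first summand: 2/(n(n-1)) * sum_{i<j} x_i^2 x_j^2, computed in O(n) via power sums
first_term <- function(x) {
  n <- length(x)
  (sum(x^2)^2 - sum(x^4)) / (n * (n - 1))
}
```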
As we look at the second term, we see that each of the four summands contributes the same, due to symmetry: $$ - \frac{2}{4n(n-1)(n-2)(n-3)}\sum_{i\ne j\ne k \ne l} 4x_k x_l x_i^2 = - \frac{2}{n(n-1)(n-2)(n-3)}\sum_{i\ne j\ne k \ne l} x_k x_l x_i^2 $$ We can now get rid of the sum over all $j$, and simplify the leading factor by cancelling $n-3$: $$ -\frac{2}{n(n-1)(n-2)}\sum_{i\ne k \ne l} x_k x_l x_i^2 $$ Note that the number of summands on the right hand side is $n(n-1)(n-2)$, so that this term is minus twice the average such summand. Unfortunately, these are too many summands to be computed with straightforward for-loops for large $n$!
However, there is a neat trick which ultimately comes from the fundamental theorem on elementary symmetric polynomials: $$ \sum_{i\ne k \ne l}x_i^2 x_k x_l = \frac{1}{3} \biggl(\sum_{i\ne k \ne l}x_i x_k x_l \sum_j x_j - \sum_{i\ne j \ne k \ne l} x_i x_j x_k x_l\biggr) $$ and note that all three sums appearing on the right hand side are closely related to elementary symmetric polynomials! The key point is that there is an $O(n)$ algorithm to compute them, so once we have reduced the computation to an expression involving only elementary symmetric polynomials, we are done. Once this identity is written down it is easy to check - expanding the product $\sum_{i\ne k \ne l}x_i x_k x_l \sum_j x_j$, the terms where $j$ coincides with one of $i$, $k$, $l$ contribute $3\sum_{i\ne k \ne l}x_i^2 x_k x_l$ by symmetry, and the remaining terms give the four-index sum - but it took me a while to find it.
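One standard $O(n)$ way to get the elementary symmetric polynomials $e_1, \dots, e_4$ is via Newton's identities, which express them through the power sums $p_k = \sum_i x_i^k$. A sketch (the helper name `elem_sym` is mine, not the package's):

```r
# elementary symmetric polynomials e_1, ..., e_4 from the power sums p_1, ..., p_4
# via Newton's identities: k * e_k = sum_{i=1}^{k} (-1)^(i-1) * e_{k-i} * p_i, with e_0 = 1
elem_sym <- function(x) {
  p <- sapply(1:4, function(k) sum(x^k))
  e1 <- p[1]
  e2 <- (e1 * p[1] - p[2]) / 2
  e3 <- (e2 * p[1] - e1 * p[2] + p[3]) / 3
  e4 <- (e3 * p[1] - e2 * p[2] + e1 * p[3] - p[4]) / 4
  c(e1 = e1, e2 = e2, e3 = e3, e4 = e4)
}
```

Here $e_3 = \sum_{i < k < l}x_i x_k x_l$ and $e_4 = \sum_{i < j < k < l} x_i x_j x_k x_l$ are exactly the ordered sums we will need below.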
From here on, the rest is downhill. Indeed, $$ \sum_{i\ne k \ne l}x_i x_k x_l = 6\sum_{i < k < l}x_i x_k x_l $$ and $$ \sum_{i\ne j \ne k \ne l} x_i x_j x_k x_l=24\sum_{i < j < k < l} x_i x_j x_k x_l $$ so we have reduced the second term to elementary symmetric polynomials. In total, the second term can be computed as $$ -\frac{2}{n(n-1)(n-2)}\sum_{i\ne k \ne l} x_k x_l x_i^2=-\frac{2}{3n(n-1)(n-2)}\biggl(6\sum_{i < k < l}x_i x_k x_l \sum_j x_j - 24\sum_{i < j < k < l} x_i x_j x_k x_l \biggr) $$ And there we are!
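With `e3` and `e4` being exactly the two ordered sums on the right hand side, the second summand - and, by the very same symmetry argument, the third one, which equals $24 e_4 / (n(n-1)(n-2)(n-3))$ - drops out in $O(n)$. A sketch building on the `elem_sym` helper above (again, the function names are just for illustration):

```r
# second and third summands, each in O(n), using elem_sym() from above
second_term <- function(x) {
  n <- length(x)
  e <- elem_sym(x)
  -2 / (3 * n * (n - 1) * (n - 2)) * (6 * e[["e3"]] * sum(x) - 24 * e[["e4"]])
}

third_term <- function(x) {
  n <- length(x)
  24 * elem_sym(x)[["e4"]] / (n * (n - 1) * (n - 2) * (n - 3))
}
```

Putting the pieces together, the variance of $v$ is estimated by $v^2$ minus the sum of these three summands, from which a confidence interval (for instance via a normal approximation) can be built.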
For the full code, see https://github.com/Mathias-Fuchs/ConfIntVariance/blob/master/confIntVariance/R/main.R, where all terms have been put together.
Also, the examples given there confirm that the coverage probability for the true variance is very close to the target of 95 percent in a simulation study where we roll a die and check in what fraction of cases the confidence interval thus computed does indeed contain the true value $91/6 - 49/4 = 35/12$ of the die's variance. Yayy!!
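For reference, here is a rough sketch of such a coverage simulation, built from the helper sketches above rather than from the package itself (the actual simulation in the linked code may differ in its details, e.g. the sample size used):

```r
# coverage simulation: roll a fair die 30 times, build a 95% normal-approximation
# confidence interval for the variance, and see how often it covers 35/12
set.seed(42)
true_var <- 91 / 6 - 49 / 4  # = 35/12
covered <- replicate(2000, {
  x <- sample(1:6, 30, replace = TRUE)
  v <- var(x)
  # estimated variance of v: E[v^2] estimated by v^2, minus the estimator of (E v)^2
  est_sq <- first_term(x) + second_term(x) + third_term(x)
  se <- sqrt(max(v^2 - est_sq, 0))
  ci <- v + c(-1, 1) * qnorm(0.975) * se
  ci[1] <= true_var && true_var <= ci[2]
})
mean(covered)  # empirical coverage, ideally close to 0.95
```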