Calculation Of The Confidence Interval
For an accuracy test assigned a given Level of Confidence (LOC) and given an assumed accuracy (p) of the system to be tested, calculate the Confidence Interval (µ):
µ = z * [p * (1 - p) / n]^(1/2)
where:
p = the assumed (1st guess) accuracy of the system, expressed as a decimal;
n = the number of samples (number of test records used); and
z = a function of the test's Level of Confidence (LOC)
The value of z is derived from the Normal Curve. The table below lists the z value corresponding to several common LOC values:
LOC      z
99.9%    3.3
99.0%    2.577
98.5%    2.43
97.5%    2.243
95.0%    1.96
90.0%    1.645
85.0%    1.439
75.0%    1.151
From the equation for µ, above, we can see that, as the assumed accuracy of the system to be tested approaches 100% (p = 1.0), the value of µ approaches zero. From a practical point of view, this makes a good deal of sense; if the accuracy of the system truly is 100%, the test will always result in a perfect score -- with no variation in the result -- even if it is performed time after time.
A plot of the term p*(1 - p) as a function of p (the assumed accuracy of the system), from 0.0 (i.e., the system is never expected to be successful) to 1.0 (i.e., the system is expected to always be successful), shows that the term reaches its maximum when p = 0.5.
The value of µ grows with the term p*(1 - p) (µ is proportional to the square root of that term); therefore, for a given LOC and sample size, the Confidence Interval, µ, reaches a maximum when the assumed accuracy of the system is 50%. Using a 50% assumed accuracy (p) is equivalent to saying that you have very little or no knowledge of the accuracy of the system to be tested.
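This behavior is easy to verify numerically; here is a quick sketch (our own illustration) that scans p in steps of 0.01 and reports where p * (1 - p) peaks:

```python
# Scan p from 0.0 to 1.0 in steps of 0.01 and find where p * (1 - p) peaks
best_p = max((i / 100 for i in range(101)), key=lambda p: p * (1 - p))
print(best_p, best_p * (1 - best_p))  # -> 0.5 0.25
```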
What factors have the most influence on the value of µ? Recall that
µ = z * [p*(1 - p) / n]^(1/2)
► z ranges from 1.151 to 3.3 (75% to 99.9% LOC). As z increases, the value of µ increases proportionally.
► The term p*(1 - p) ranges from 0.0 to 0.25. As this term increases, the value of µ increases (in proportion to its square root).
► As the value of n increases, the value of µ decreases in proportion to 1/n^(1/2); for example, quadrupling the number of samples halves the Confidence Interval.
Therefore, for given values of p and n, increasing the LOC (z) of the test widens the Confidence Interval (µ). This makes practical sense: if we say that we are 99.9% confident that the true accuracy of the system is within ±µ of the measured accuracy, then µ must be large enough to contain nearly the whole range of measured accuracy scores that we would find during repeated tests. If we say that we are only 75% sure of the actual accuracy of the system, then we are not taking an unreasonable risk by claiming that, in 75 of 100 tests, the true system accuracy will lie within a much smaller interval (±µ) of the measured accuracy score.
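To make this concrete, the following sketch (our own, using the z values from the table above) holds p = 0.5 and n = 100 fixed and varies only the LOC; the printed interval widens as the LOC rises:

```python
import math

mus = []
# z values from the table above; p = 0.5, n = 100 in every case
for loc, z in [(75.0, 1.151), (95.0, 1.96), (99.9, 3.3)]:
    mu = z * math.sqrt(0.5 * 0.5 / 100)
    mus.append(mu)
    print(f"LOC {loc}%: mu = +/-{mu:.3f}")
```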
As stated before, if
we do not have any guess as to what the accuracy of the system may be, then the
value of p must be set to 0.5 (i.e., the assumed accuracy is 50%). With p set to 0.5, the term p*(1 - p) is at its maximum; hence, the Confidence Interval (µ) is at its maximum. However, if we have good reason to believe that the true accuracy of the system lies near one of the extremes (p toward 0.0 or 1.0 -- i.e., 0% or 100%), then, for a fixed LOC and sample size n, the Confidence Interval (µ) will become smaller. To summarize, all other factors remaining
constant, if we have some sort of knowledge of the system’s accuracy, the
Confidence Interval of the test becomes smaller.
For a given LOC (z) and estimated accuracy (p), increasing the number of samples (n) in the test set results in a smaller Confidence Interval (µ). The measured accuracy result of an accuracy test designed with, for instance, 6,000 test samples would be within about ±1.3 percentage points of the actual accuracy level that the system is capable of achieving (at a 95% LOC with p = 0.5).
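As a rough check of that 6,000-sample claim (our own arithmetic, assuming the worst case p = 0.5 and a 95% LOC):

```python
import math

# 95% LOC (z = 1.96), worst-case assumed accuracy p = 0.5, n = 6000 samples
mu = 1.96 * math.sqrt(0.5 * (1 - 0.5) / 6000)
print(round(mu, 4))  # -> 0.0127
```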
The following links point to examples of calculating Confidence Intervals (µ) for given values of Level of Confidence (LOC), assumed accuracy (p) and number of samples (n):
Example One -- LOC = 95%, assumed accuracy = 50%;
Example Two -- Calculate the Confidence Interval (µ) with LOC, n and p values that you specify.