Considerations
in the design of an Accuracy Test (a visual demonstration of the inter-related effects of the desired Level of
Confidence in the test, the sample size, and the resulting Confidence Interval)
Suppose a very large number of tests (for example, one million) are run on a system and the measured accuracy
score for each test is recorded.
Every test uses the same number of test sets (or samples), but each test is composed of a different set of
data. After all the tests are completed, the mean of all the measured accuracies is computed; we will call this the “true
mean” accuracy of the system.
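The averaging procedure described above can be sketched as a small Python simulation. This is an illustration under stated assumptions, not the applet's code: the population mean (`POP_MEAN`), standard deviation (`POP_SD`), the sample size of 30, and the function names `run_accuracy_test` and `estimate_true_mean` are all hypothetical. Each test's measured accuracy is modeled as the mean of `n_samples` draws from a normal population, and the "true mean" is the average of many such scores.

```python
import random
import statistics

# Hypothetical population parameters; the applet's actual values are not given in the text.
POP_MEAN = 0.90  # mean of the normal population of test data
POP_SD = 0.05    # standard deviation of that population


def run_accuracy_test(n_samples: int, rng: random.Random) -> float:
    """One simulated accuracy test: the mean of n_samples draws from the population."""
    return statistics.fmean(rng.gauss(POP_MEAN, POP_SD) for _ in range(n_samples))


def estimate_true_mean(n_tests: int, n_samples: int, seed: int = 1) -> float:
    """Run n_tests tests (same sample size, fresh data each time) and average their scores."""
    rng = random.Random(seed)
    scores = [run_accuracy_test(n_samples, rng) for _ in range(n_tests)]
    return statistics.fmean(scores)


if __name__ == "__main__":
    # 10,000 tests rather than one million, to keep the pure-Python run short.
    print(f"{estimate_true_mean(n_tests=10_000, n_samples=30):.4f}")
```

With many tests, the printed value should settle close to the hypothetical `POP_MEAN`, which is why the average over a very large number of tests can stand in for the system's true mean accuracy.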
In the diagram above, the true mean accuracy of the system is shown as
the dark, middle vertical line.
The lighter-colored vertical lines distributed on either side
of the middle line represent the distribution of the measured accuracies of
all the tests performed.

This Java applet simulates repeated accuracy tests using test
sets picked at random from a normally distributed population of test data. As each simulated accuracy test is
completed, a red bar with a red dot at its center is drawn. The red dot
represents the measured accuracy of the test, and the red lines on either side of
the dot represent a confidence interval around that measured accuracy
score. The sample size and the confidence level used to calculate the
confidence interval are shown in the upper right-hand corner of the display.

Sometimes the true mean accuracy of the system
being tested falls within the confidence interval around the measured
accuracy score for a particular test, and sometimes it does not. With repeated testing, the percentage of
confidence intervals that actually capture the true mean accuracy
(the dark, middle line in the applet shown above) should be approximately
equal to the Level of Confidence (LOC), or confidence level, of the
test. These calculations are
shown in the upper left-hand corner of the applet.

You will notice, for instance, that for a test with a small number of test sets and a very high Level of Confidence (for example, 99.9%), the Confidence Interval for each test (red line) is very wide; also, the measured accuracy (red dot) of successive tests tends to vary over a wide range of values. If the LOC is kept constant but the number of samples used for each test is increased, the Confidence Interval becomes narrower and the measured accuracy tends to vary within a smaller interval.
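The interplay between LOC, sample size, and interval width can be illustrated with a short Python sketch. It assumes a z-based interval with a known population standard deviation, measured accuracy ± z·σ/√n, where z is the two-sided critical value for the chosen LOC. The applet's exact formula is not stated, so treat that choice as an assumption, along with the hypothetical population parameters and the function names `half_width` and `coverage`.

```python
import math
import random
import statistics

# Hypothetical population parameters (not given in the original text).
POP_MEAN = 0.90
POP_SD = 0.05


def half_width(loc: float, n_samples: int, sd: float = POP_SD) -> float:
    """Half-width of a z-based confidence interval, assuming the population SD is known."""
    z = statistics.NormalDist().inv_cdf((1 + loc) / 2)  # two-sided critical value for the LOC
    return z * sd / math.sqrt(n_samples)


def coverage(loc: float, n_samples: int, n_tests: int, seed: int = 7) -> float:
    """Fraction of simulated tests whose confidence interval contains the true mean."""
    rng = random.Random(seed)
    h = half_width(loc, n_samples)
    hits = 0
    for _ in range(n_tests):
        # Measured accuracy of one test: mean of n_samples draws from the population.
        measured = statistics.fmean(rng.gauss(POP_MEAN, POP_SD) for _ in range(n_samples))
        if measured - h <= POP_MEAN <= measured + h:
            hits += 1
    return hits / n_tests


if __name__ == "__main__":
    for loc, n in [(0.999, 5), (0.999, 100), (0.90, 5)]:
        print(f"LOC={loc:.3f} n={n:3d} half-width={half_width(loc, n):.4f} "
              f"coverage={coverage(loc, n, 5_000):.3f}")
```

With the hypothetical σ = 0.05, the half-width at a 99.9% LOC drops from about 0.074 at n = 5 to about 0.016 at n = 100, because the width scales with 1/√n; lowering the LOC to 90% at n = 5 also narrows it, to about 0.037. In each case the simulated coverage should stay close to the chosen LOC.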
Java Applet written by Lesley Robinson
Modified by Thomas Ruggles