Considerations in the Design of an Accuracy Test

(using a visual demonstration of the interrelated effects of the desired Level of Confidence of the test, the sample size, and the resulting Confidence Interval)

Instructions For Using This Demo

Check The Demo Calculations

Explanation of Level of Confidence (or confidence level)

How Should This Information Be Used To Design An Accuracy Test?

Suppose a very large number of tests (for example, one million) are run on a system and the measured accuracy score of each test is recorded.  Every test uses the same number of test sets (or samples), but each test is composed of a different set of data.  After all the tests are completed, the mean of all the measured accuracies is computed; we will call this the “true mean” accuracy of the system.  In the diagram above, the true mean accuracy of the system is shown as the dark vertical line in the middle.  The lighter-colored vertical lines on either side of it represent the distribution of the measured accuracy scores of all the tests performed.
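
The thought experiment above can be sketched in a few lines of Python. This is a minimal illustration, not the applet's code: it assumes each test set yields an accuracy score drawn from a normal distribution with a hypothetical mean TRUE_MEAN and standard deviation SIGMA, and that a test's measured accuracy is the average score over its n_test_sets test sets. All names and values are illustrative assumptions.

```python
import random
import statistics

# Hypothetical population parameters (not taken from the original demo).
TRUE_MEAN = 0.90  # mean accuracy score of the population of test sets
SIGMA = 0.05      # standard deviation of individual test-set scores


def measured_accuracy(n_test_sets, rng):
    """One simulated accuracy test: the average score over n random test sets."""
    return statistics.fmean(rng.gauss(TRUE_MEAN, SIGMA) for _ in range(n_test_sets))


def run_many_tests(n_test_sets, n_tests=5000, seed=1):
    """Repeat the test many times; return (mean, spread) of the measured accuracies."""
    rng = random.Random(seed)
    scores = [measured_accuracy(n_test_sets, rng) for _ in range(n_tests)]
    # The mean of all measured accuracies approximates the "true mean";
    # under these assumptions the spread is close to SIGMA / sqrt(n_test_sets).
    return statistics.fmean(scores), statistics.stdev(scores)


if __name__ == "__main__":
    for n in (10, 100):
        mean, spread = run_many_tests(n)
        print(f"n={n:4d}  mean of measured accuracies={mean:.4f}  spread={spread:.4f}")
```

Running it with more test sets per test shows the measured accuracies clustering more tightly around the same mean, which is the narrowing of the lighter lines in the display.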

This Java applet simulates repeated accuracy tests using test sets picked at random from a normally distributed population of test data.  As each simulated accuracy test is completed, a red bar with a red dot at its center is drawn.  The red dot represents the measured accuracy of the test, and the red bar extending on either side of the dot represents a confidence interval around the measured accuracy score.  The sample size and confidence level used to calculate the confidence interval are shown in the upper right-hand corner of the display.
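
A single simulated test, producing the measured accuracy (red dot) and its confidence interval (red bar), might be sketched as follows. The applet's exact formula is not stated here, so this sketch assumes a normal population whose standard deviation SIGMA is known and uses the textbook z-interval, measured mean ± z·SIGMA/√n. TRUE_MEAN, SIGMA, and the function names are hypothetical.

```python
import math
import random
import statistics
from statistics import NormalDist

# Hypothetical population parameters (illustrative only).
TRUE_MEAN = 0.90  # mean accuracy score of the test-set population
SIGMA = 0.05      # standard deviation of test-set scores, assumed known


def z_critical(confidence):
    """Two-sided standard-normal critical value (about 1.96 for 0.95)."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def simulate_test(n_test_sets, confidence, rng):
    """Run one simulated accuracy test.

    Returns (lower, measured, upper): `measured` is the red dot and
    [lower, upper] is the red bar.
    """
    scores = [rng.gauss(TRUE_MEAN, SIGMA) for _ in range(n_test_sets)]
    measured = statistics.fmean(scores)
    half_width = z_critical(confidence) * SIGMA / math.sqrt(n_test_sets)
    return measured - half_width, measured, measured + half_width


if __name__ == "__main__":
    rng = random.Random(42)
    for _ in range(5):
        lower, measured, upper = simulate_test(25, 0.95, rng)
        hit = "captures" if lower <= TRUE_MEAN <= upper else "misses"
        print(f"{measured:.4f}  [{lower:.4f}, {upper:.4f}]  {hit} the true mean")
```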

Sometimes the true mean accuracy of the system being tested is contained within the confidence interval around the measured accuracy score of a particular test, and sometimes it is not.  With repeated testing, the percentage of confidence intervals that actually capture the true mean accuracy (the dark middle line in the applet shown above) should be approximately equal to the Level of Confidence (LOC), or confidence level, of the test.  These calculations are shown in the upper left-hand corner of the applet.
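
The coverage calculation can be reproduced with a short simulation: run many tests, count how many intervals contain the true mean, and compare the percentage with the LOC. This is a sketch under the same kind of assumptions as a z-interval demo (normal test-set scores with a known, hypothetical SIGMA); TRUE_MEAN, SIGMA, and coverage_percent are illustrative names, not part of the applet.

```python
import math
import random
import statistics
from statistics import NormalDist

TRUE_MEAN = 0.90  # hypothetical true mean accuracy
SIGMA = 0.05      # hypothetical standard deviation of test-set scores, assumed known


def coverage_percent(n_test_sets, confidence, n_tests=5000, seed=7):
    """Percentage of simulated confidence intervals that contain TRUE_MEAN."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # With a known SIGMA, every interval has the same half-width.
    half_width = z * SIGMA / math.sqrt(n_test_sets)
    hits = 0
    for _ in range(n_tests):
        measured = statistics.fmean(rng.gauss(TRUE_MEAN, SIGMA) for _ in range(n_test_sets))
        if measured - half_width <= TRUE_MEAN <= measured + half_width:
            hits += 1
    return 100.0 * hits / n_tests


if __name__ == "__main__":
    for loc in (0.80, 0.95, 0.999):
        pct = coverage_percent(20, loc)
        print(f"LOC {loc:.1%}: {pct:.1f}% of intervals capture the true mean")
```

The simulated percentage only approximates the LOC; it fluctuates from run to run and settles closer to the LOC as the number of simulated tests grows.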

You will notice, for instance, that for a test with a small number of test sets and a very high Level of Confidence (for example, 99.9%), the Confidence Interval of each test (red bar) is very wide; also, the measured accuracy (red dot) of successive tests tends to vary over a wide range of values.  If the LOC is kept constant but the number of samples used for each test is increased, the Confidence Interval becomes narrower and the measured accuracy tends to vary within a smaller interval.
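
Under the z-interval assumption (normal test-set scores with a known, hypothetical SIGMA), the interval width is 2·z·SIGMA/√n, which makes both effects explicit: a higher LOC raises z, and more test sets raise √n. The sketch below tabulates widths and also inverts the formula to estimate how many test sets are needed for a target half-width; interval_width and required_test_sets are illustrative helpers, not part of the applet.

```python
import math
from statistics import NormalDist

SIGMA = 0.05  # hypothetical standard deviation of test-set scores, assumed known


def z_critical(confidence):
    """Two-sided standard-normal critical value for the given LOC."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def interval_width(n_test_sets, confidence, sigma=SIGMA):
    """Full width of the z-interval: 2 * z * sigma / sqrt(n)."""
    return 2 * z_critical(confidence) * sigma / math.sqrt(n_test_sets)


def required_test_sets(confidence, max_half_width, sigma=SIGMA):
    """Smallest n whose interval half-width does not exceed max_half_width."""
    return math.ceil((z_critical(confidence) * sigma / max_half_width) ** 2)


if __name__ == "__main__":
    for loc in (0.90, 0.95, 0.999):
        widths = ", ".join(f"n={n}: {interval_width(n, loc):.4f}" for n in (10, 100, 1000))
        print(f"LOC {loc:.1%} -> {widths}")
    print("test sets needed for +/-0.01 at 95%:", required_test_sets(0.95, 0.01))
```

Because the width shrinks with the square root of n, quadrupling the number of test sets halves the interval, while raising the LOC from 95% to 99.9% at a fixed n widens it by roughly 3.29/1.96 ≈ 1.68.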

Java Applet written by Lesley Robinson

Modified by Thomas Ruggles