Over the last 15 years, there has been a great deal of interest in developing teacher evaluation systems that use a class of statistical models, based on test scores, known as "Value-Added Methods" or VAM. Like many in the statistical community, I viewed claims about the capabilities of these models with skepticism, but VAM had so much momentum that it seemed unstoppable. As more experience accumulated, however, the shortcomings became apparent. In 2014, the American Statistical Association issued a cautionary statement on VAM.
While not a VAM measure, the Student Growth Percentile (SGP), also known as the Colorado Growth Model (or even the Rhode Island Growth Model), experienced a similar trajectory, going quickly from a pilot program to automation and implementation by more than 20 states as one of the primary measures of a student's educational progress.
The rapid adoption of SGP was partly a result of the requirements of the "Race to the Top" program, and partly because the developers released the SGP software as a freely available open-source package for the R statistical computing environment.
In the summer of 2015, I carried out a research project with three students working towards teacher certification to investigate the operational characteristics of the SGP. We developed a mathematically concise definition of the SGP measure as a conditional probability. We then used the actual SGP software to model data generated using the Item Response Theory parameters of the Massachusetts Comprehensive Assessment System, and ran a Bayesian analysis based on our definition. We found good agreement between our definition and the simulated data, and were able to produce interval estimates for the SGP scores. These indicate that the SGP is a noisy measure, with a 95% credible interval 80 percentage points wide. These results were presented at the New England Statistics Symposium.
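To give a rough sense of how a growth percentile defined as a conditional probability can be this noisy, here is a minimal Python sketch. It is not the SGP package's method and not the Bayesian analysis described above. It assumes standardized prior and current scores that are bivariate normal, and the function names, the correlation of 0.8, and the test reliability of 0.9 are all hypothetical values chosen only for illustration.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def sgp(prior, current, rho):
    """Growth percentile as a conditional probability:
    100 * P(Y <= current | X = prior), ASSUMING standardized scores
    (X, Y) are bivariate normal with correlation rho."""
    z = (current - rho * prior) / (1.0 - rho ** 2) ** 0.5
    return 100.0 * STD_NORMAL.cdf(z)


def observed_sgp_interval(true_prior, true_current, rho, reliability,
                          n=10_000, seed=0):
    """Central 95% range of the SGPs one student would receive over
    repeated testing, when each observed score is the true score plus
    independent normal measurement error (an illustrative assumption)."""
    rng = random.Random(seed)
    # Observed scores have unit variance, so error variance is 1 - reliability.
    error_sd = (1.0 - reliability) ** 0.5
    draws = sorted(
        sgp(true_prior + rng.gauss(0.0, error_sd),
            true_current + rng.gauss(0.0, error_sd),
            rho)
        for _ in range(n)
    )
    return draws[int(0.025 * n)], draws[int(0.975 * n) - 1]


if __name__ == "__main__":
    # A student whose true scores sit exactly at the mean both years.
    low, high = observed_sgp_interval(0.0, 0.0, rho=0.8, reliability=0.9)
    print(f"95% of simulated SGPs fall between {low:.0f} and {high:.0f}")
```

Under these assumed values, a student whose true growth is exactly typical still receives observed SGPs spread over a wide range, because the measurement error in both test scores is amplified when the current score is compared only against students with the same prior score.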
At the 2014 Joint Statistical Meetings, I spoke with Dan McCaffrey of ETS, who has done extensive work on VAM. He told me that every independent researcher who has investigated the precision of the SGP has concluded that, for an individual student's score, the 95% confidence interval runs from 0 to 100.