Using Bayesian techniques with item response theory to analyze mathematics tests

University of Alabama Libraries

Because of the cost of a college education, final exams in college-level courses qualify as "high-stakes" tests. An inaccurate assessment may force students to pay thousands of dollars to retake a course, cause scholarships to be rescinded, or leave universities with students enrolled in courses for which they are not prepared, among other undesirable consequences. Faculty at colleges and universities must therefore understand how reliably these tests measure student knowledge. Traditionally, faculty teaching large general education courses use common exams and measure their reliability with Classical Test Theory (CTT). However, the cutoff scores are chosen arbitrarily, and little is known about the accuracy of measurement at these critical scores. One solution is to use Item Response Theory (IRT) models, which describe the instrument's reliability at different points along the student ability spectrum. Because cost is always a concern for faculty and administrators at these schools, we compare free software (Item Response Theory Command Language) with generally accepted commercial software (Xcalibre) in the analysis of College Algebra final exams. With both programs, a Bayesian approach was used: Bayes modal estimates were obtained for item parameters, and EAP (expected a posteriori) estimates were obtained for ability parameters. Model-data fit was assessed with two well-known chi-square fit statistics, and no significant difference in model-data fit was found between the two programs. Parameter estimates were compared directly, and Item Response Functions were compared using a weighted root mean square error (RMSE) that accounts for the ability distribution of examinees; by this measure, the two programs produced comparable item response functions. Furthermore, ability estimates from the two programs were nearly identical.
Thus, when the assumptions of IRT are met for the two- and three-parameter logistic and the generalized partial credit models, the freely available software program is an appropriate choice for the analysis of College Algebra final exams.
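The abstract names the computations without giving formulas, so the following is only a minimal Python sketch of two of them under stated assumptions: comparing two programs' item response functions with an ability-weighted RMSE, and computing an EAP ability estimate. It assumes the three-parameter logistic (3PL) model with scaling constant D = 1.7, a standard normal ability distribution discretized on a 41-point grid from -4 to 4, and a weighted RMSE defined as the square root of the density-weighted mean squared difference between the two curves. All function names (`p3pl`, `normal_grid`, `weighted_rmse`, `eap`) and the item parameters in the usage lines are hypothetical. They do not reproduce the dissertation's exact definitions or the internals of Item Response Theory Command Language or Xcalibre, and Bayes modal item estimation is not shown.

```python
import math

def p3pl(theta, a, b, c, D=1.7):
    """Probability of a correct response under the 3PL model.
    Setting c = 0 gives the 2PL model. D = 1.7 is an assumed scaling
    constant; conventions differ between programs."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def normal_grid(n=41, lo=-4.0, hi=4.0):
    """Evenly spaced ability points with standard-normal weights normalized
    to sum to 1 (an assumed ability distribution / prior)."""
    step = (hi - lo) / (n - 1)
    points = [lo + i * step for i in range(n)]
    dens = [math.exp(-0.5 * t * t) for t in points]
    total = sum(dens)
    return points, [d / total for d in dens]

def weighted_rmse(item1, item2, points, weights):
    """RMSE between two item response functions, with squared differences
    weighted by the ability density at each grid point."""
    sq = sum(w * (p3pl(t, *item1) - p3pl(t, *item2)) ** 2
             for t, w in zip(points, weights))
    return math.sqrt(sq / sum(weights))

def eap(responses, items, points, weights):
    """EAP ability estimate: the posterior mean of theta over the grid,
    with prior weights times the likelihood of the 0/1 response pattern."""
    num = den = 0.0
    for t, w in zip(points, weights):
        like = 1.0
        for u, item in zip(responses, items):
            p = p3pl(t, *item)
            like *= p if u == 1 else 1.0 - p
        num += t * like * w
        den += like * w
    return num / den

# Usage with hypothetical (a, b, c) estimates for one item from two programs.
points, weights = normal_grid()
program_a_item = (1.20, 0.35, 0.18)
program_b_item = (1.15, 0.40, 0.20)
print(round(weighted_rmse(program_a_item, program_b_item, points, weights), 4))

# Hypothetical three-item test and one examinee's response pattern.
items = [(1.0, -1.0, 0.2), (1.3, 0.0, 0.15), (0.9, 1.2, 0.25)]
print(round(eap([1, 1, 0], items, points, weights), 3))
```

Weighting by the ability density means that disagreements between the two curves count most where most examinees are located and count little in the sparsely populated tails.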

Electronic Thesis or Dissertation