Browsing by Author "Guo, Wenjing"
Item: Academic performance under COVID-19: The role of online learning readiness and emotional competence (Springer, 2022)
Wang, Yurou; Xia, Mengya; Guo, Wenjing; Xu, Fangjie; Zhao, Yadan; University of Alabama Tuscaloosa; Beijing Normal University

The COVID-19 pandemic caused school closures and social isolation, which created both learning and emotional challenges for adolescents. Schools worked hard to move classes online, but less attention was paid to whether students were cognitively and emotionally ready to learn effectively in a virtual environment. This study focused on online learning readiness and emotional competence as key constructs and investigated their implications for students' academic performance during the COVID-19 period. Two groups of students participated: 1,316 high school students (mean age = 16.32, SD = 0.63) representing adolescents and 668 college students (mean age = 20.20, SD = 1.43) representing young adults. Structural equation modeling was conducted to explore the associations among online learning readiness, emotional competence, and online academic performance during COVID-19, controlling for pre-COVID-19 academic performance. The results showed that, for high school students, both online learning readiness and emotional competence were positively associated with online academic performance during COVID-19. For college students, however, only online learning readiness showed a significant positive relationship with online academic performance during COVID-19. These results demonstrated that being ready to study online and having high emotional competence could make adolescents more resilient to COVID-19-related challenges and help them learn more effectively online. This study also highlighted different patterns of association among cognitive factors, emotional factors, and online academic performance during COVID-19 in adolescence and young adulthood. Developmental implications are also discussed.

Item: Exploring Rating Quality in the Context of High-Stakes Rater-Mediated Educational Assessments (University of Alabama Libraries, 2021)
Guo, Wenjing; Wind, Stefanie; University of Alabama Tuscaloosa

Constructed response (CR) items are widely used in large-scale testing programs, including the National Assessment of Educational Progress (NAEP) and many district- and state-level assessments in the United States. One unique feature of CR items is that they depend on human raters to assess the quality of examinees' work. The judgment of human raters is a relatively subjective process because it is based on raters' own understanding of the assessment context, interpretations of rubrics, expectations of performance, and professional experiences. As a result, the process of human rating may introduce random error or bias, which may unfairly affect the assignment of ratings. The main purpose of this dissertation is to provide insight into methodological issues that arise due to the role of rater judgments in performance assessments. The dissertation includes three independent but related studies. The first study systematically explores the impact of ignoring rater effects, when they are present, on estimates of student ability. Results suggest that in simulation conditions that reflect many large-scale mixed-format assessments, directly modeling rater effects yields more accurate student achievement estimates than estimation procedures that do not incorporate raters.
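The comparison described in the first study can be illustrated with a deliberately simplified sketch. The toy model below (plain NumPy, not the Rasch-based estimation used in the dissertation) assumes an incomplete rating design in which each simulated student is scored by a small random subset of raters, and compares ability recovery when rater severity is ignored versus crudely adjusted for; all variable names and parameter values are illustrative.

```python
# A minimal numeric sketch (toy linear rating model, not the dissertation's MFRM
# analysis): with an incomplete rating design, ignoring rater severity biases
# student scores, while a simple severity adjustment largely removes that bias.
import numpy as np

rng = np.random.default_rng(42)
n_students, n_raters, raters_per_student = 1000, 20, 2

theta = rng.normal(0.0, 1.0, n_students)      # true student achievement
severity = rng.normal(0.0, 0.5, n_raters)     # true rater severity (harsh raters score lower)

# Each student is scored by a random subset of raters (incomplete design).
rows, cols, scores = [], [], []
for i in range(n_students):
    for j in rng.choice(n_raters, raters_per_student, replace=False):
        rows.append(i)
        cols.append(j)
        scores.append(theta[i] - severity[j] + rng.normal(0.0, 0.3))
rows, cols, scores = map(np.array, (rows, cols, scores))

# (a) Ignore raters: a student's estimate is the mean of their raw scores.
est_ignore = np.array([scores[rows == i].mean() for i in range(n_students)])

# (b) Adjust for raters: estimate each rater's severity from that rater's mean
# score, add it back to the ratings, then average per student.
sev_hat = np.array([scores.mean() - scores[cols == j].mean() for j in range(n_raters)])
adjusted = scores + sev_hat[cols]
est_adjust = np.array([adjusted[rows == i].mean() for i in range(n_students)])

rmse = lambda est: np.sqrt(np.mean((est - theta) ** 2))
print(f"RMSE, raters ignored:  {rmse(est_ignore):.3f}")
print(f"RMSE, raters adjusted: {rmse(est_adjust):.3f}")
```

In this sketch the adjusted estimates recover the true scores more accurately because the per-rater bias is removed before averaging, which mirrors (in a much cruder way) the advantage the study reports for directly modeling rater effects.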
The second study proposes an iterative parametric bootstrap procedure to help researchers and practitioners more accurately evaluate rater fit. The results indicate that the proposed iterative procedure performs best, with well-controlled false-positive rates, high true-positive rates, and high overall accuracy compared to the traditional parametric bootstrap procedure and rule-of-thumb critical values. The third study examines the quality of ratings in the Georgia Middle Grades Writing Assessment using both the Partial Credit model formulation of the Many-Facet Rasch model (PC-MFR) and a Hierarchical Rater Model based on a signal detection model (HRM-SDT). Major findings suggest that rating quality varies across the four writing domains, that rating quality varies across rating scale categories within each domain, that raters use the rating scale categories in a psychometrically sound way, and that there is some correspondence between rating quality indices based on the PC-MFR model and the HRM-SDT.

Item: Exploring the Combined Effects of Rater Misfit and Differential Rater Functioning in Performance Assessments (Sage, 2019)
Wind, Stefanie A.; Guo, Wenjing; University of Alabama Tuscaloosa

Rater effects, or raters' tendencies to assign ratings that differ from the ratings the performances warranted, are well documented in rater-mediated assessments across a variety of disciplines. In many real-data studies of rater effects, researchers have reported that raters exhibit more than one effect, such as a combination of misfit and systematic biases related to student subgroups (i.e., differential rater functioning [DRF]). However, researchers who conduct simulation studies of rater effects usually focus on these effects in isolation. The purpose of this study was to explore the degree to which rater effect indicators are sensitive to rater effects when raters exhibit more than one type of effect, and the degree to which this sensitivity changes under different data collection designs. We used a simulation study to explore combinations of DRF and rater misfit. Overall, our findings suggested that it is possible to use common numeric and graphical indicators of DRF and rater misfit when raters exhibit both effects, but that these effects may be difficult to distinguish using only numeric indicators. We also observed that combinations of rater effects are easier to identify when complete rating designs are used. We discuss implications of our findings as they relate to research and practice.

Item: Identifying and Understanding Examinee Behaviors in Item Response Data that Compromise Psychometric Quality (University of Alabama Libraries, 2022)
Ge, Yuan; Wind, Stefanie A.; University of Alabama Tuscaloosa

My dissertation research explored responder behaviors (e.g., demonstrating response styles, carelessness, and possessing misconceptions) that compromise psychometric quality and affect the interpretation and use of assessment results. Identifying these behaviors can help researchers understand and minimize their potentially construct-irrelevant impact on the psychometric quality of measurement procedures.
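As a concrete point of reference for the kind of careless-responding behavior mentioned above, the sketch below computes the "longstring" index, a common generic screen defined as the longest run of identical consecutive responses; it is offered only as an illustration and is not one of the indicators examined in the dissertation, and the simulated response vectors are made up.

```python
# A minimal sketch of one common carelessness screen, the longstring index
# (longest run of identical consecutive responses). Generic illustration only;
# not an indicator from the dissertation.
import numpy as np

def longstring(responses):
    """Length of the longest run of identical consecutive item responses."""
    longest = current = 1
    for prev, curr in zip(responses, responses[1:]):
        current = current + 1 if curr == prev else 1
        longest = max(longest, current)
    return longest

rng = np.random.default_rng(7)
attentive = rng.integers(1, 6, size=40).tolist()            # varied 5-point responses
careless = [3] * 25 + rng.integers(1, 6, size=15).tolist()  # long identical run, then varied

for label, resp in [("attentive", attentive), ("careless", careless)]:
    print(f"{label:>9}: longstring = {longstring(resp)}")
```

Unusually large longstring values flag respondents who may be answering without attending to item content, which is one way such construct-irrelevant behavior can be surfaced before psychometric modeling.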
In three studies related to response quality, we analyzed item response data in terms of subgroups and in terms of missingness levels, respectively, to evaluate the sensitivity of item response theory (IRT) indicators to response characteristics that can compromise psychometric quality; we also validated a diagnostic classification model (DCM) aimed at diagnosing the presence of misconceptions.

Item: An Iterative Parametric Bootstrap Approach to Evaluating Rater Fit (Sage, 2021)
Guo, Wenjing; Wind, Stefanie A.; University of Alabama Tuscaloosa

When analysts evaluate performance assessments, they often use modern measurement theory models to identify raters who frequently give ratings that are different from what would be expected given the quality of the performance. To detect problematic scoring patterns, two rater fit statistics, the infit and outfit mean square error (MSE) statistics, are routinely used. However, the interpretation of these statistics is not straightforward. A common practice is for researchers to apply established rule-of-thumb critical values to interpret infit and outfit MSE statistics. Unfortunately, prior studies have shown that these rule-of-thumb values may not be appropriate in many empirical situations. Parametric bootstrapped critical values for infit and outfit MSE statistics provide a promising alternative approach to identifying item and person misfit in item response theory (IRT) analyses. However, researchers have not examined the performance of this approach for detecting rater misfit. In this study, we illustrate a bootstrap procedure that researchers can use to identify critical values for infit and outfit MSE statistics, and we use a simulation study to assess the false-positive and true-positive rates of these two statistics. We observed that the false-positive rates were highly inflated and the true-positive rates were relatively low. Thus, we proposed an iterative parametric bootstrap procedure to overcome these limitations. The results indicated that using the iterative procedure to establish 95% critical values for infit and outfit MSE statistics yielded better-controlled false-positive rates and higher true-positive rates compared to using the traditional parametric bootstrap procedure and rule-of-thumb critical values.
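The core of the bootstrap idea described in this last abstract can be sketched briefly. The toy example below (plain NumPy, a dichotomous Rasch-type rater model with made-up parameter estimates) shows only the basic, non-iterative step: simulate data that fit the estimated model, collect each rater's outfit MSE distribution across replications, and take the 95th percentile as that rater's critical value. The iterative refinement proposed in the article additionally re-estimates parameters after removing flagged raters, which is omitted here; the function names and values are illustrative, not the authors' implementation, and infit is omitted for brevity.

```python
# Simplified sketch of parametric bootstrap critical values for rater outfit MSE,
# under an assumed dichotomous Rasch-type rater model. Illustrative only.
import numpy as np

rng = np.random.default_rng(1)
n_students, n_raters, n_boot = 300, 8, 500

# Treat these as the parameter estimates from the operational calibration.
theta_hat = rng.normal(0.0, 1.0, n_students)    # student ability estimates
severity_hat = rng.normal(0.0, 0.5, n_raters)   # rater severity estimates

def simulate(theta, severity, rng):
    """Simulate dichotomous ratings from a Rasch-type rater model."""
    logit = theta[:, None] - severity[None, :]
    p = 1.0 / (1.0 + np.exp(-logit))
    return (rng.random(p.shape) < p).astype(float), p

def outfit_mse(x, p):
    """Outfit MSE per rater: mean squared standardized residual over students."""
    z2 = (x - p) ** 2 / (p * (1.0 - p))
    return z2.mean(axis=0)

# Parametric bootstrap: simulate model-fitting data, collect the outfit
# distribution for each rater, and use its 95th percentile as the critical value.
boot_outfit = np.empty((n_boot, n_raters))
for b in range(n_boot):
    x_sim, p_sim = simulate(theta_hat, severity_hat, rng)
    boot_outfit[b] = outfit_mse(x_sim, p_sim)
critical_95 = np.percentile(boot_outfit, 95, axis=0)

# Flag raters whose observed outfit exceeds their bootstrapped critical value.
x_obs, p_obs = simulate(theta_hat, severity_hat, rng)  # stand-in for real rating data
observed = outfit_mse(x_obs, p_obs)
print("Flagged raters:", np.flatnonzero(observed > critical_95))
```

Because the bootstrapped distribution is generated from the fitted model itself, the resulting cutoffs reflect the specific design and sample rather than a fixed rule of thumb, which is the motivation the abstract gives for moving beyond conventional critical values.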