Enhancing Content Validity Assessment With Item Response Theory Modeling

Keywords

Content validity
Subject matter experts
Item response theory
Validity
Test development

How to cite

Schames Kreitchmann, R., Nájera, P., Sanz, S., & Sorrel, M. Á. (2024). Enhancing content validity assessment with item response theory modeling. Psicothema, 36(2), 145–153. Retrieved from https://reunido.uniovi.es/index.php/PST/article/view/21248

Abstract

Background: Ensuring the validity of assessments requires a thorough examination of test content. It is common to employ subject matter experts (SMEs) to evaluate the relevance, representativeness, and appropriateness of items. This article proposes integrating item response theory (IRT) into SME evaluations. IRT yields discrimination and threshold parameters for the SMEs, revealing how well each expert differentiates relevant from irrelevant items, detecting suboptimal performance, and also improving the estimation of item relevance. Method: The use of IRT was compared with traditional indices (the content validity index and Aiken's V) on conscientiousness items. SMEs' accuracy in discriminating whether or not items measured conscientiousness was assessed, as well as whether their ratings predicted the items' factor loadings. Results: IRT scores identified the conscientiousness items well (R² = 0.57) and predicted their factor loadings (R² = 0.45). They also showed incremental validity, explaining between 11% and 17% more variance than the traditional indices. Conclusions: Incorporating IRT into SME evaluations improves item alignment and better predicts factor loadings, strengthening the content validity of instruments.
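The two traditional indices the article benchmarks against can be computed directly from SME relevance ratings. As a minimal Python sketch (the ratings, scale bounds, and function names below are illustrative assumptions, not data from the study): Aiken's V rescales the summed ratings to [0, 1], and the item-level content validity index (I-CVI) is the proportion of experts rating the item as relevant.

```python
import numpy as np

def aikens_v(ratings, lo=1, hi=5):
    """Aiken's V for one item: sum of (rating - lo) over n * (hi - lo)."""
    s = np.asarray(ratings) - lo
    return s.sum() / (len(s) * (hi - lo))

def item_cvi(ratings, relevant_min=4):
    """I-CVI: proportion of experts whose rating meets the relevance cutoff."""
    return float(np.mean(np.asarray(ratings) >= relevant_min))

# Hypothetical ratings from five SMEs on a 1-5 relevance scale
ratings = [5, 4, 4, 5, 3]
print(aikens_v(ratings))  # 0.8
print(item_cvi(ratings))  # 0.8
```

Unlike the IRT approach proposed in the article, these indices weight every expert equally; the IRT model instead estimates per-expert discrimination and threshold parameters, so ratings from more accurate SMEs carry more weight in the item-relevance estimate.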


References

Abad, F. J., Sorrel, M. A., Garcia, L. F., & Aluja, A. (2018). Modeling general, specific, and method variance in personality measures: Results for ZKA-PQ and NEO-PI-R. Assessment, 25(8), 959–977. https://doi.org/10.1177/1073191116667547

Aiken, L. R. (1980). Content validity and reliability of single items or questionnaires. Educational and Psychological Measurement, 40(4), 955–959. https://doi.org/10.1177/001316448004000419

Almanasreh, E., Moles, R., & Chen, T. F. (2019). Evaluation of methods used for estimating content validity. Research in Social and Administrative Pharmacy, 15(2), 214–221. https://doi.org/10.1016/j.sapharm.2018.03.066

American Educational Research Association [AERA], American Psychological Association [APA], & National Council on Measurement in Education [NCME] (Eds.). (2014). Standards for educational and psychological testing (14th ed.). American Educational Research Association.

Bhola, D. S., Impara, J. C., & Buckendahl, C. W. (2003). Aligning tests with states' content standards: Methods and issues. Educational Measurement: Issues and Practice, 22(3), 21–29. https://doi.org/10.1111/j.1745-3992.2003.tb00134.x

Chalmers, R. P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1–29. https://doi.org/10.18637/jss.v048.i06

Collado, S., Corraliza, J. A., & Sorrel, M. A. (2015). Spanish version of the Children’s Ecological Behavior (CEB) scale. Psicothema, 27(1), 82–87. https://doi.org/10.7334/psicothema2014.117

Costa, P. T., Jr., & McCrae, R. R. (1992). Revised NEO Personality Inventory (NEO-PI-R) and NEO Five-Factor Inventory (NEO-FFI) professional manual. Psychological Assessment Resources.

Fitzpatrick, A. R. (1983). The meaning of content validity. Applied Psychological Measurement, 7(1), 3–13. https://doi.org/10.1177/014662168300700102

García, P. E., Díaz, J. O., & de la Torre, J. (2014). Application of cognitive diagnosis models to competency-based situational judgment tests. Psicothema, 26(3), 372–377. https://doi.org/10.7334/psicothema2013.322

Gómez-Benito, J., Sireci, S., & Padilla, J.-L. (2018). Differential item functioning: Beyond validity evidence based on internal structure. Psicothema, 30, 104–109. https://doi.org/10.7334/psicothema2017.183

Jennrich, R. I., & Bentler, P. M. (2011). Exploratory bi-factor analysis. Psychometrika, 76(4), 537–549. https://doi.org/10.1007/s11336-011-9218-4

Kreitchmann, R. S., Abad, F. J., Ponsoda, V., Nieto, M. D., & Morillo, D. (2019). Controlling for response biases in self-report scales: Forced-choice vs. psychometric modeling of likert items. Frontiers in Psychology, 10, Article 2309. https://doi.org/10.3389/fpsyg.2019.02309

Li, X., & Sireci, S. G. (2013). A new method for analyzing content validity data using multidimensional scaling. Educational and Psychological Measurement, 73(3), 365–385. https://doi.org/10.1177/0013164412473825

Lunz, M. E., Stahl, J. A., & Wright, B. D. (1994). Interjudge reliability and decision reproducibility. Educational and Psychological Measurement, 54(4), 913–925. https://doi.org/10.1177/0013164494054004007

Lunz, M. E., Wright, B. D., & Linacre, J. M. (1990). Measuring the impact of judge severity on examination scores. Applied Measurement in Education, 3(4), 331–345. https://doi.org/10.1207/s15324818ame0304_3

Martone, A., & Sireci, S. G. (2009). Evaluating alignment between curriculum, assessment, and instruction. Review of Educational Research, 79(4), 1332– 1361. https://doi.org/10.3102/0034654309341375

Martuza, V. R. (1977). Applying norm-referenced and criterion-referenced measurement in education. Allyn and Bacon.

Mastaglia, B., Toye, C., & Kristjanson, L. J. (2003). Ensuring content validity in instrument development: Challenges and innovative approaches. Contemporary Nurse, 14(3), 281–291. https://doi.org/10.5172/conu.14.3.281

McCoach, D. B., Gable, R. K., & Madura, J. P. (2013). Instrument development in the affective domain: School and corporate applications. Springer. https://doi.org/10.1007/978-1-4614-7135-6

Nájera, P., Abad, F. J., & Sorrel, M. A. (2021). Determining the number of attributes in cognitive diagnosis modeling. Frontiers in Psychology, 12, Article 614470.

Nieto, M. D., Abad, F. J., Hernández-Camacho, A., Garrido, L. E., Barrada, J. R., Aguado, D., & Olea, J. (2017). Calibrating a new item pool to adaptively assess the Big Five. Psicothema, 29(3), 390–395. https://doi.org/10.7334/psicothema2016.391

Oltmanns, J. R., & Widiger, T. A. (2020). The five-factor personality inventory for ICD-11: A facet-level assessment of the ICD-11 trait model. Psychological Assessment, 32(1), 60–71. https://doi.org/10.1037/pas0000763

Penfield, R. D., & Giacobbi, P. R., Jr. (2004). Applying a score confidence interval to Aiken's item content-relevance index. Measurement in Physical Education and Exercise Science, 8(4), 213–225. https://doi.org/10.1207/s15327841mpee0804_3

Polit, D. F., & Beck, C. T. (2006). The content validity index: Are you sure you know what's being reported? Critique and recommendations. Research in Nursing & Health, 29(5), 489–497. https://doi.org/10.1002/nur.20147

Porter, A. C. (2002). Measuring the content of instruction: Uses in research and practice. Educational Researcher, 31(7), 3–14. https://doi.org/10.3102/0013189X031007003

R Core Team. (2023). R: A language and environment for statistical computing [Computer software]. R Foundation for Statistical Computing. https://www.R-project.org/

Rios, J., & Wells, C. (2014). Validity evidence based on internal structure. Psicothema, 26(1), 108–116. https://doi.org/10.7334/psicothema2013.260

Robitzsch, A., & Steinfeld, J. (2018). Item response models for human ratings: Overview, estimation methods, and implementation in R. Psychological Test and Assessment Modeling, 60(1), 101–138.

Rovinelli, R. J., & Hambleton, R. K. (1977). On the use of content specialists in the assessment of criterion-referenced test item validity. Dutch Journal of Educational Research, 2, 49–60.

Rubio, D. M., Berg-Weger, M., Tebb, S. S., Lee, E. S., & Rauch, S. (2003). Objectifying content validity: Conducting a content validity study in social work research. Social Work Research, 27(2), 94–104. https://doi.org/10.1093/swr/27.2.94

Samejima, F. (1968). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph Supplement, 34(4, Pt. 2), 100.

Sireci, S. G. (1998a). Gathering and analyzing content validity data. Educational Assessment, 5(4), 299–321. https://doi.org/10.1207/s15326977ea0504_2

Sireci, S. G. (1998b). The construct of content validity. Social Indicators Research, 45(1/3), 83–117.

Sireci, S., & Benítez, I. (2023). Evidence for test validation: A guide for practitioners. Psicothema, 35(3), 217–226. https://doi.org/10.7334/psicothema2022.477

Sireci, S. G., & Faulkner-Bond, M. (2014). Validity evidence based on test content. Psicothema, 26(1), 100–107. https://doi.org/10.7334/psicothema2013.256

Tatsuoka, K. K. (1983). Rule space: An approach for dealing with misconceptions based on item response theory. Journal of Educational Measurement, 20(4), 345–354.

Thissen, D., & Wainer, H. (1982). Some standard errors in item response theory. Psychometrika, 47(4), 397–412. https://doi.org/10.1007/BF02293705

Waugh, M. H., McClain, C. M., Mariotti, E. C., Mulay, A. L., DeVore, E. N., Lenger, K. A., Russell, A. N., Florimbio, A. R., Lewis, K. C., Ridenour, J. M., & Beevers, L. G. (2021). Comparative content analysis of self-report scales for level of personality functioning. Journal of Personality Assessment, 103, 161–173. https://doi.org/10.1080/00223891.2019.1705464

Webb, N. L. (2007). Issues related to judging the alignment of curriculum standards and assessments. Applied Measurement in Education, 20(1), 7–25. https://doi.org/10.1080/08957340709336728

Wu, M. (2017). Some IRT-based analyses for interpreting rater effects. Psychological Test and Assessment Modeling, 59(4), 453–470.