Abstract
Background: Likert-type scales were proposed by Rensis Likert in 1932. Given their simplicity and effectiveness, they are among the most widely used assessment instruments across many scientific and professional fields. The aim of this paper is to review their use and to propose practical guidelines for their proper construction, analysis, and application. Method: A critical and systematic review was conducted of the published studies and guidelines on the construction, analysis, scoring, use, and interpretation of Likert scales. Results: Several aspects of the construction and use of Likert-type scales were identified as open to improvement, including the definition of the constructs to be measured, the wording of the items, the number of response categories, the analysis of the responses, the validity evidence provided, the calibration of the items, and the interpretation of the results. Conclusions: The findings are synthesized into a practical guide for researchers and practitioners comprising fifteen recommendations: ten focused on the proper design, construction, and analysis of scales, and five aimed at guiding users in the appropriate use of existing scales.
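The scoring and response-analysis issues the guide addresses can be illustrated with a minimal sketch in pure Python. The data, the item keying, and the function names below are entirely hypothetical and for illustration only; the sketch reverse-keys a negatively worded item, computes sum scores, and estimates coefficient alpha for a summated rating scale.

```python
from statistics import variance

# Hypothetical responses of 5 people to 3 five-point Likert items
# (1 = strongly disagree ... 5 = strongly agree).
responses = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 3, 3],
    [4, 4, 5],
    [5, 4, 5],
]

def reverse_key(item_scores, n_categories=5):
    """Recode a negatively worded item: 1<->5, 2<->4, and so on."""
    return [n_categories + 1 - x for x in item_scores]

def cronbach_alpha(data):
    """Coefficient alpha from a respondents-by-items score matrix."""
    k = len(data[0])                        # number of items
    items = list(zip(*data))                # columns = items
    sum_item_vars = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in data])
    return k / (k - 1) * (1 - sum_item_vars / total_var)

print(reverse_key([1, 5, 2]))               # [5, 1, 4]
print([sum(row) for row in responses])      # sum scores: [4, 7, 9, 13, 14]
print(round(cronbach_alpha(responses), 3))  # 0.954
```

Sum scoring is only one of the options the guidelines weigh against model-based (e.g., factor-analytic or IRT) scoring; the point of the sketch is merely that reversed items must be recoded before any summation or reliability estimate.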
References
Addams, C. (1982, November 29). Would you say Attila is doing an excellent job, a good job, a fair job, or a poor job? The New Yorker. https://www.newyorker.com/magazine/1982/11/29/a-normal-tuesday
American Educational Research Association [AERA] (2014). Standards for educational and psychological testing. American Educational Research Association, American Psychological Association, and the National Council on Measurement in Education.
Bagozzi, R. P., & Edwards, J. R. (1998). A general approach for representing constructs in organizational research. Organizational Research Methods, 1(1), 45-87. https://doi.org/10.1177/109442819800100104
Bass, B. M., Cascio, W. F., & O’Connor, E. J. (1974). Magnitude estimations of expressions of frequency and amount. Journal of Applied Psychology, 59(3), 313-320. https://doi.org/10.1037/h0036653
Benson, N., Kranzler, J. H., & Floyd, R. G. (2020). Exploratory and confirmatory factor analysis of the Universal Nonverbal Intelligence Test - Second Edition: Testing dimensionality and invariance across age, gender, race, and ethnicity. Assessment, 27(5), 996-1006. https://doi.org/10.1177/1073191118786584
Bollen, K. A. (1989). Structural equations with latent variables. John Wiley & Sons. https://doi.org/10.1002/9781118619179
Bollen, K., & Lennox, R. (1991). Conventional wisdom on measurement: A structural equation perspective. Psychological Bulletin, 110(2), 305-314. https://doi.org/10.1037//0033-2909.110.2.305
Bollen, K. A., & Long. J. S. (1993). Testing structural equation models. Sage.
Borgatti, S. P., Mehra, A., Brass, D. J., & Labianca, G. (2009). Network analysis in the social sciences. Science, 323(5916), 892-895. https://doi.org/10.1126/science.1165821
Borsboom, D. (2017). A network theory of mental disorders. World Psychiatry, 16(1), 5-13. https://doi.org/10.1002/wps.20375
Borsboom, D. (2022). Possible futures for network psychometrics. Psychometrika, 87(1), 253-265. https://doi.org/10.1007/s11336-022-09851-z
Box, G. E. (1976). Science and statistics. Journal of the American Statistical Association, 71(356), 791-799. https://doi.org/10.2307/2286841
Brown, G. T. L., & Zhao, A. (2023). In defense of psychometric measurement: A systematic review of contemporary self-report feedback inventories. Educational Psychologist, 58(3), 178-192. https://doi.org/10.1080/00461520.2023.2208670
Browne, M. W. (2000). Cross-validation methods. Journal of Mathematical Psychology, 44(1), 108-132. https://doi.org/10.1006/jmps.1999.1279
Burisch, M. (1984). Approaches to personality inventory construction: A comparison of merits. American Psychologist, 39(3), 214-227. https://doi.org/10.1037//0003-066x.39.3.214
Buss, D. M., & Craik, K. H. (1981). The act frequency analysis of interpersonal dispositions: Aloofness, gregariousness, dominance and submissiveness. Journal of Personality, 49(2), 175-192. https://doi.org/10.1111/j.1467-6494.1981.tb00736.x
Carifio, J., & Perla, R. J. (2007). Ten common misunderstandings, misconceptions, persistent myths and urban legends about Likert scales and Likert response formats and their antidotes. Journal of Social Sciences, 3(3), 106-116. https://doi.org/10.3844/jssp.2007.106.116
Casas, J. M., Dueñas, J.-M., Ferrando, P. J., Castarlenas, E., Vigil-Colet, A., Hernández-Navarro, J. C., & Morales-Vives, F. (2025). Measuring the callous-unemotional traits in juvenile offenders: Properties and functioning of the INCA questionnaire in this population. Psychiatry, Psychology and Law, 1-22. https://doi.org/10.1080/13218719.2025.2497785
Casper, W. C., Edwards, B. D., Wallace, J. C., Landis, R. S., & Fife, D. A. (2020). Selecting response anchors with equal intervals for summated rating scales. Journal of Applied Psychology, 105(4), 390-409. https://doi.org/10.1037/apl0000444
Clark, L. A., & Watson, D. (2019). Constructing validity: New developments in creating objective measuring instruments. Psychological Assessment, 31(12), 1412-1427. https://doi.org/10.1037/pas0000626
Comrey, A. L. (1988). Factor-analytic methods of scale development in personality and clinical psychology. Journal of Consulting and Clinical Psychology, 56(5), 754-761. https://doi.org/10.1037//0022-006x.56.5.754
Comrey, A. L., & Lee, H. B. (1992). A first course in factor analysis. Psychology Press.
Cooper, C. (2019). Pitfalls of personality theory. Personality and Individual Differences, 151, 109551. https://doi.org/10.1016/j.paid.2019.109551
Cronbach, L. J. (1950). Further evidence on response sets and test design. Educational and Psychological Measurement, 10(1), 3-31. https://doi.org/10.1177/001316445001000101
Cronbach, L. J., & Gleser, G. C. (1964). The signal/noise ratio in the comparison of reliability coefficients. Educational and Psychological Measurement, 24(3), 467-480. https://doi.org/10.1177/001316446402400303
Cronbach, L.J., & Warrington, W. G. (1952). Efficiency of multiple-choice tests as a function of spread of item difficulties. Psychometrika, 17(2), 127–147. https://doi.org/10.1007/bf02288778
De la Fuente, D., & Armayones, M. (2025). AI in psychological practice: What tools are available and how can they help in clinical psychology? Psychologist Papers, 46(1), 18-24. https://doi.org/10.70478/pap.psicol.2025.46.03
DeVellis, R. F. (2003). Scale development: Theory and applications. Sage Publications.
DuBois, B., & Burns, J. A. (1975). An analysis of the meaning of the question mark response category in attitude scales. Educational and Psychological Measurement, 35(4), 869-884. https://doi.org/10.1177/001316447503500414
Elosua, P., Aguado, D., Fonseca-Pedrero, E., Abad, F. J., & Santamaría, P. (2023). New trends in digital technology-based psychological and educational assessment. Psicothema, 35, 50-57. https://doi.org/10.7334/psicothema2022.241
Ferrando, P. J. (2003). A kernel density analysis of continuous typical-response scales. Educational and Psychological Measurement, 63(5), 809-824. https://doi.org/10.1177/0013164403251323
Ferrando, P. J. (2021). Seven decades of factor analysis: From Yela to the present day. Psicothema, 33(3), 378-385. https://doi.org/10.7334/psicothema2021.24
Ferrando, P. J., Lorenzo-Seva, U., & Chico, E. (2009). A general factor- analytic procedure for assessing response bias in questionnaire measures. Structural Equation Modeling: A Multidisciplinary Journal, 16(2), 364-381. https://doi.org/10.1080/1070551090275137
Ferrando P. J., Lorenzo-Seva, U., Hernández-Dorado A., & Muñiz, J. (2022). Decalogue for the factor analysis of test items. Psicothema, 34(1), 7-17. https://doi.org/10.7334/psicothema2021.456
Ferrando, P. J., & Morales-Vives, F. (2023). Is it quality, is it redundancy, or is model inadequacy? Some strategies for judging the appropriateness of high-discrimination items. Anales de Psicología, 39(3), 517-527. https://doi.org/10.6018/analesps.535781
Ferrando, P. J., Navarro-González, D., & Lorenzo-Seva, U. (2024). A relative normed effect-size difference index for determining the number of common factors in exploratory solutions. Educational and Psychological Measurement, 84(4), 736-752. https://doi.org/10.1177/00131644231196482
Fink, A. (2003). The survey handbook. Sage. https://doi.org/10.4135/9781412986328
Fiske, S. T., & Taylor, S. E. (2020). Social cognition evolves: Illustrations from our work on intergroup bias and on healthy adaptation. Psicothema, 32(3), 291-297. https://doi.org/10.7334/psicothema2020.197
Fonseca, E. (2018). Network analysis in psychology. Papeles del Psicólogo, 39(1), 1-12. https://doi.org/10.23923/pap.psicol2018.2852
Fonseca, E., Falcó, R., Al-Halabí, S., & Muñiz, J. (2025). Evaluación de la salud mental en contextos educativos [Mental health assessment in educational settings]. In E. Fonseca & S. Al-Halabí (Eds.), Salud mental en contextos educativos (pp. 181-235). Editorial Pirámide.
Fonseca, E., & Muñiz, J. (2025). Análisis de Redes en la Medición Psicológica: Fundamentos [Network Analysis in Psychological Measurement: Fundamentals]. Acción Psicológica, 22(1), 87-100. https://doi.org/10.5944/ap.22.1.43296
Frary, R. B. (2003). A brief guide to questionnaire development. Virginia Polytechnic Institute & State University. https://medrescon.tripod.com/questionnaire.pdf
Furnham, A. (1990). The development of single trait personality theories. Personality and Individual Differences, 11(9), 923-929. https://doi.org/10.1016/0191-8869(90)90273-t
García-Pérez, M. A., & Alcalá-Quintana, R. (2023). Accuracy and precision of responses to visual analog scales: Inter- and intra-individual variability. Behavior Research Methods, 55(8), 4369-4381. https://doi.org/10.3758/s13428-022-02021-0
Goretzko, D., Pargent, F., Sust, L. N., & Bühner, M. (2019). Not very powerful. European Journal of Psychological Assessment, 36(4), 563-572. https://doi.org/10.1027/1015-5759/a000539
Goyal, S. (2023). Networks: An economics approach. The MIT Press.
Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4), 430-450. https://doi.org/10.1037/1082-989X.6.4.430
Hancock, G. R., & Mueller, R. O. (2001). Rethinking construct reliability within latent variable systems. In R. Cudeck, S. du Toit, & D. Sörbom (Eds.), Structural equation modeling: Present and future (pp. 195-216). Scientific Software International.
Hao, J., von Davier, A., Yaneva, V., Lottridge, S., von Davier, M., & Harris, D. (2024). Transforming assessment: The impacts and implications of large language models and generative AI. Educational Measurement: Issues and Practice. 43(2), 16-29. https://doi.org/10.1111/emip.12602
Henrysson, S. (1971). Gathering, analyzing and using data on test items. In R. L. Thorndike (Ed.), Educational measurement (pp. 130-159). American Council on Education.
Hernández, A., Drasgow, F., & González-Romá, V. (2004). Investigating the functioning of a middle category by means of a mixed-measurement model. Journal of Applied Psychology, 89(4), 687-699. https://doi.org/10.1037/0021-9010.89.4.687
Hernández-Dorado, A., Ferrando, P. J., & Vigil-Colet, A. (2025). The impact and consequences of correcting for acquiescence when correlated residuals are present. Psicothema, 37(1), 11-20. https://doi.org/10.70478/psicothema.2025.37.02
Höhne, J. K., & Krebs, D. (2018). Scale direction effects in agree/disagree and item-specific questions: A comparison of question formats. International Journal of Social Research Methodology, 21(1), 91-103. https://doi.org/10.1080/13645579.2017.1325566
Hubatka, P., Cígler, H., Elek, D., & Tancoš, M. (2024). The length and verbal anchors do not matter: The influence of various Likert-like response formats on scales’ psychometric properties. PsyArXiv. https://doi.org/10.31234/osf.io/bjs2c
Jebb, A. T., Ng, V., & Tay, L. (2021). A review of key Likert scale development advances: 1995-2019. Frontiers in Psychology, 12, 637547. https://doi.org/10.3389/fpsyg.2021.637547
John, O. P., & Soto, C. J. (2007). The importance of being valid: Reliability and the process of construct validation. In R. W. Robins, R. C. Fraley, & R. F. Krueger (Eds.), Handbook of research methods in personality psychology (pp. 461–494). The Guilford Press.
Johnson, R. L., & Morgan, G. B. (2016). Survey scales: A guide to development, analysis, and reporting. Guilford Publications.
Krosnick, J.A. (1999). Survey research. Annual Review of Psychology, 50(1), 537-567. https://doi.org/10.1146/annurev.psych.50.1.537
Lee, J., & Paek, I. (2014). In search of the optimal number of response categories in a rating scale. Journal of Psychoeducational Assessment, 32(7), 663-673. https://doi.org/10.1177/0734282914522200
Lee, S., Whittaker, T., & Stapleton, L. (2023). GRShiny: Graded Response Model (R package version 1.0.0) [Computer software]. https://doi.org/10.32614/CRAN.package.GRShiny
Levitt, H. M., Bamberg, M., Creswell, J. W., Frost, D. M., Josselson, R., & Suárez-Orozco, C. (2018). Journal article reporting standards for qualitative primary, qualitative meta-analytic, and mixed methods research in psychology: The APA publications and communications board task force report. American Psychologist, 73, 26-46. https://doi.org/10.1037/amp0000151
Likert, R. (1932). A technique for the measurement of attitudes. Archives of Psychology, 22(140), 1-55.
Lissitz, R. W., & Green, S. B. (1975). Effect of the number of scale points on reliability: A Monte Carlo approach. Journal of Applied Psychology, 60(1), 10-13. https://doi.org/10.1037/h0076268
Loevinger, J. (1957). Objective tests as instruments of psychological theory. Psychological Reports, 3(3), 635-694. https://doi.org/10.2466/pr0.1957.3.3.635
Lorenzo-Seva, U., & Ferrando, P. J. (2013). FACTOR 9.2: A comprehensive program for fitting exploratory and semiconfirmatory factor analysis and IRT models. Applied Psychological Measurement, 37(6), 497-498. https://doi.org/10.1177/0146621613487794
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. IAP.
Lozano, L. M., García-Cueto, E., & Muñiz, J. (2008). Effect of the number of response categories on the reliability and validity of rating scales. Methodology, 4(2), 73-79. https://doi.org/10.1027/1614-2241.4.2.73
Lucke, J. F. (2005). The α and the ω of congeneric test theory: An extension of reliability and internal consistency to heterogeneous tests. Applied Psychological Measurement, 29(1), 65-81. https://doi.org/10.1177/0146621604270882
Malhotra, N., Krosnick, J. A., & Thomas, R. K. (2009). Optimal design of branching questions to measure bipolar constructs. Public Opinion Quarterly, 73(2), 304-324. https://doi.org/10.1093/poq/nfp023
Mariano, L. T., Phillips, A., Estes, K., & Kilburn, M. R. (2024). Should survey Likert scales include neutral response categories? Evidence from a randomized school climate survey. RAND Corporation. https://doi.org/10.7249/WRA3135-2
Maydeu-Olivares, A., Fairchild, A. J., & Hall, A. G. (2017). Goodness of fit in item factor analysis: Effect of the number of response alternatives. Structural Equation Modeling: A Multidisciplinary Journal, 24(4), 495-505. https://doi.org/10.1080/10705511.2017.1289816
McCrae, R. R., Costa Jr., P. T., & Piedmont, R. L. (1993). Folk concepts, natural language, and psychological constructs: The California Psychological Inventory and the five-factor model. Journal of Personality, 61(1), 1-26. https://doi.org/10.1111/j.1467-6494.1993.tb00276.x
McIver, J., & Carmines, E. G. (1981). Unidimensional scaling. Sage Publications. https://doi.org/10.4135/9781412986441
McDonald, R. P. (2000). A basis for multidimensional item response theory. Applied Psychological Measurement, 24(2), 99-114. https://doi.org/10.1177/01466210022031552
Mellenbergh, G. J. (2011). A conceptual introduction to psychometrics: Development, analysis and application of psychological and educational tests. Eleven International Publishing.
Morales-Vives, F., Ferrando, P. J., & Dueñas, J.-M. (2023). Should suicidal ideation be regarded as a dimension, a unipolar trait or a mixture? A model-based analysis at the score level. Current Psychology, 42(25), 21397-21411. https://doi.org/10.1007/s12144-022-03224-6
Munshi, J. (1990). A method for constructing Likert scales. Sonoma State University. http://munshi.sonoma.edu/likert.html
Muñiz, J. (2018). Introducción a la psicometría [An introduction to psychometrics]. Pirámide.
Muñiz, J., & Fonseca-Pedrero, E. (2019). Ten steps for test development. Psicothema, 31(1), 7-16. https://doi.org/10.7334/psicothema2018.291
Muñiz, J., García-Cueto, E., & Lozano, L. M. (2005). Item format and the psychometric properties of the Eysenck Personality Questionnaire. Personality and Individual Differences, 38(1), 61-69. https://doi.org/10.1016/j.paid.2004.03.021
Muthén, B. (1984). A general structural equation model with dichotomous, ordered categorical, and continuous latent variable indicators. Psychometrika, 49(1), 115-132. https://doi.org/10.1007/bf02294210
Muthén, B., & Kaplan, D. (1985). A comparison of some methodologies for the factor analysis of non‐normal Likert variables. British Journal of Mathematical and Statistical Psychology, 38(2), 171-189. https://doi.org/10.1111/j.2044-8317.1985.tb00832.x
Newman, M. E. J. (2010). Networks: An Introduction. Oxford University Press.
Nunnally, J. C. (1978). Psychometric theory. McGraw-Hill.
Preston, C. C., & Colman, A. M. (2000). Optimal number of response categories in rating scales: Reliability, validity, discriminating power, and respondent preferences. Acta Psychologica, 104(1), 1-15. https://doi.org/10.1016/s0001-6918(99)00050-5
Reise, S. P., Waller, N. G., & Comrey, A. L. (2000). Factor analysis and scale revision. Psychological Assessment, 12(3), 287-297. https://doi.org/10.1037//1040-3590.12.3.287
Sancerni, M. D., Meliá, J. L., & González Romá, V. (1990). Formato de respuesta, fiabilidad y validez, en la medición del conflicto de rol [Response format, reliability, and validity in the measurement of role conflict]. Psicológica, 11(2), 167-175.
Santamaría, P., & Sánchez, F. (2022). Open questions in the use of new technologies in psychological assessment. Psychological Papers, 43(1), 48-54. https://doi.org/10.23923/pap.psicol.2984
Sireci, S., & Benítez, I. (2023). Evidence for test validation: A guide for practitioners. Psicothema, 35(3), 217-226. https://doi.org/10.7334/psicothema2022.477
Sideridis, G., Tsaousis, I., & Ghamdi, H. (2023). Equidistant response options on Likert-type instruments: Testing the interval scaling assumption using Mplus. Educational and Psychological Measurement, 83(5), 885-906. https://doi.org/10.1177/00131644221130482
Sijtsma, K., Ellis, J. L., & Borsboom, D. (2024). Recognize the value of the sum score, psychometrics’ greatest accomplishment. Psychometrika, 89(1), 84-117. https://doi.org/10.1007/s11336-024-09964-7
Spector, P. E. (1992). Summated rating scale construction: An introduction. Sage Publications. https://doi.org/10.4135/9781412986038
Speer, A. B., Robie, C., & Christiansen, N. D. (2016). Effects of item type and estimation method on the accuracy of estimated personality trait scores: Polytomous item response theory models versus summated scoring. Personality and Individual Differences, 102, 41-45. https://doi.org/10.1016/j.paid.2016.06.058
Suárez-Álvarez, J., Pedrosa, I., Lozano, L. M., García-Cueto, E., Cuesta Izquierdo, M., & Muñiz, J. (2018). Using reversed items in Likert scales: A questionable practice. Psicothema, 30(2), 149-158. https://doi.org/10.7334/psicothema2018.33
Tay, L., & Jebb, A. T. (2018). Establishing construct continua in construct validation: The process of continuum specification. Advances in Methods and Practices in Psychological Science, 1(3), 375-388. https://doi.org/10.1177/2515245918775707
Tomás, J. M., & Oliver, A. (1998). Response format and method of estimation effects on confirmatory factor analysis. Psicothema, 10(1), 197-208.
Torgerson, W. S. (1958). Theory and methods of scaling. Wiley.
Uebersax, J. S. (2006). Likert scales: Dispelling the confusion. Statistical Methods for Rater Agreement. https://john-uebersax.com/stat/likert.htm
Vigil-Colet, A., Navarro-González, D., & Morales-Vives, F. (2020). To reverse or to not reverse Likert-type items: That is the question. Psicothema, 32(1), 108-114. https://doi.org/10.7334/psicothema2019.286
Wainer, H. (1976). Estimating coefficients in linear models: It don’t make no nevermind. Psychological Bulletin, 83(2), 213-217. https://doi.org/10.1037//0033-2909.83.2.213
Wainer, H. (1993). Measurement problems. Journal of Educational Measurement, 30(1), 1-21. https://doi.org/10.1111/j.1745-3984.1993.tb00419.x