Abstract
Background: Likert-type scales, first introduced by Rensis Likert in 1932, have become one of the most widely used assessment tools across a range of scientific and professional domains, owing to their simplicity and effectiveness. The purpose of the present study is to critically review their use and to propose a set of practical guidelines aimed at optimizing their construction, analysis, and application. Method: A systematic literature review of guidelines focused on the development, analysis, scoring, use, and interpretation of Likert scales was carried out. Results: Several key areas for improvement in the construction and use of Likert-type scales were identified, including the operational definition of constructs, item formulation, selection of the number of response categories, response analysis, collection of validity evidence, item calibration, and score interpretation. Conclusions: Based on the findings, a practical guide comprising fifteen recommendations is proposed: ten focused on the appropriate design, construction, and analysis of Likert scales, and five aimed at guiding appropriate use of pre-existing scales by researchers and practitioners.
References
Addams, C. (1982, November 29). Would you say Attila is doing an excellent job, a good job, a fair job, or a poor job? The New Yorker. https://www.newyorker.com/magazine/1982/11/29/a-normal-tuesday
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education (2014). Standards for educational and psychological testing. American Educational Research Association.
Bagozzi, R. P., & Edwards, J. R. (1998). A general approach for representing constructs in organizational research. Organizational Research Methods, 1(1), 45-87. https://doi.org/10.1177/109442819800100104
Bass, B. M., Cascio, W. F., & O’Connor, E. J. (1974). Magnitude estimations of expressions of frequency and amount. Journal of Applied Psychology, 59(3), 313-320. https://doi.org/10.1037/h0036653
Benson, N., Kranzler, J. H., & Floyd, R. G. (2020). Exploratory and confirmatory factor analysis of the Universal Nonverbal Intelligence Test - Second Edition: Testing dimensionality and invariance across age, gender, race, and ethnicity. Assessment, 27(5), 996-1006. https://doi.org/10.1177/1073191118786584
Bollen, K. A. (1989). Structural equations with latent variables. John Wiley & Sons. https://doi.org/10.1002/9781118619179
Bollen, K., & Lennox, R. (1991). Conventional wisdom on measurement: A structural equation perspective. Psychological Bulletin, 110(2), 305-314. https://doi.org/10.1037//0033-2909.110.2.305
Bollen, K. A., & Long, J. S. (1993). Testing structural equation models. Sage.
Borgatti, S. P., Mehra, A., Brass, D. J., & Labianca, G. (2009). Network analysis in the social sciences. Science, 323(5916), 892-895. https://doi.org/10.1126/science.1165821
Borsboom, D. (2017). A network theory of mental disorders. World Psychiatry, 16(1), 5-13. https://doi.org/10.1002/wps.20375
Borsboom, D. (2022). Possible futures for network psychometrics. Psychometrika, 87(1), 253-265. https://doi.org/10.1007/S11336-022-09851-Z
Box, G. E. (1976). Science and statistics. Journal of the American Statistical Association, 71(356), 791-799. https://doi.org/10.2307/2286841
Brown, G. T. L., & Zhao, A. (2023). In defense of psychometric measurement: A systematic review of contemporary self-report feedback inventories. Educational Psychologist, 58(3), 178-192. https://doi.org/10.1080/00461520.2023.2208670
Browne, M. W. (2000). Cross-validation methods. Journal of Mathematical Psychology, 44(1), 108-132. https://doi.org/10.1006/jmps.1999.1279
Burisch, M. (1984). Approaches to personality inventory construction: A comparison of merits. American Psychologist, 39(3), 214-227. https://doi.org/10.1037//0003-066x.39.3.214
Buss, D. M., & Craik, K. H. (1981). The act frequency analysis of interpersonal dispositions: Aloofness, gregariousness, dominance and submissiveness. Journal of Personality, 49(2), 175-192. https://doi.org/10.1111/j.1467-6494.1981.tb00736.x
Carifio, J., & Perla, R. J. (2007). Ten common misunderstandings, misconceptions, persistent myths and urban legends about Likert scales and Likert response formats and their antidotes. Journal of Social Sciences, 3(3), 106-116. https://doi.org/10.3844/jssp.2007.106.116
Casas, J. M., Dueñas, J.-M., Ferrando, P. J., Castarlenas, E., Vigil-Colet, A., Hernández-Navarro, J. C., & Morales-Vives, F. (2025). Measuring the callous-unemotional traits in juvenile offenders: Properties and functioning of the INCA Questionnaire in this population. Psychiatry, Psychology and Law, 1-22. https://doi.org/10.1080/13218719.2025.2497785
Casper, W. C., Edwards, B. D., Wallace, J. C., Landis, R. S., & Fife, D. A. (2020). Selecting response anchors with equal intervals for summated rating scales. Journal of Applied Psychology, 105(4), 390-409. https://doi.org/10.1037/apl0000444
Clark, L. A., & Watson, D. (2019). Constructing validity: New developments in creating objective measuring instruments. Psychological Assessment, 31(12), 1412-1427. https://doi.org/10.1037/pas0000626
Comrey, A. L. (1988). Factor-analytic methods of scale development in personality and clinical psychology. Journal of Consulting and Clinical Psychology, 56(5), 754-761. https://doi.org/10.1037//0022-006x.56.5.754
Comrey, A. L., & Lee, H. B. (1992). A first course in factor analysis. Psychology Press.
Cooper, C. (2019). Pitfalls of personality theory. Personality and Individual Differences, 151, 109551. https://doi.org/10.1016/j.paid.2019.109551
Cronbach, L. J. (1950). Further evidence on response sets and test design. Educational and Psychological Measurement, 10(1), 3-31. https://doi.org/10.1177/001316445001000101
Cronbach, L. J., & Gleser, G. C. (1964). The signal/noise ratio in the comparison of reliability coefficients. Educational and Psychological Measurement, 24(3), 467-480. https://doi.org/10.1177/001316446402400303
Cronbach, L. J., & Warrington, W. G. (1952). Efficiency of multiple-choice tests as a function of spread of item difficulties. Psychometrika, 17(2), 127-147. https://doi.org/10.1007/bf02288778
De la Fuente, D., & Armayones, M. (2025). AI in psychological practice: what tools are available and how can they help in clinical psychology? Psychologist Papers, 46(1), 18-24. https://doi.org/10.70478/pap.psicol.2025.46.03
DeVellis, R. F. (2003). Scale development: Theory and applications. Sage Publications.
DuBois, B., & Burns, J. A. (1975). An analysis of the meaning of the question mark response category in attitude scales. Educational and Psychological Measurement, 35(4), 869-884. https://doi.org/10.1177/001316447503500414
Elosua, P., Aguado, D., Fonseca-Pedrero, E., Abad, F. J., & Santamaría, P. (2023). New trends in digital technology-based psychological and educational assessment. Psicothema, 35, 50-57. https://doi.org/10.7334/psicothema2022.241
Ferrando, P. J. (2003). A kernel density analysis of continuous typical-response scales. Educational and Psychological Measurement, 63(5), 809-824. https://doi.org/10.1177/0013164403251323
Ferrando, P. J. (2021). Seven decades of factor analysis: From Yela to the present day. Psicothema, 33(3), 378-385. https://doi.org/10.7334/psicothema2021.24
Ferrando, P. J., Lorenzo-Seva, U., & Chico, E. (2009). A general factor- analytic procedure for assessing response bias in questionnaire measures. Structural Equation Modeling: A Multidisciplinary Journal, 16(2), 364-381. https://doi.org/10.1080/1070551090275137
Ferrando P. J., Lorenzo-Seva, U., Hernández-Dorado A., & Muñiz, J. (2022). Decalogue for the factor analysis of test items. Psicothema, 34(1), 7-17. https://doi.org/10.7334/psicothema2021.456
Ferrando, P. J., & Morales-Vives, F. (2023). Is it quality, is it redundancy, or is model inadequacy? Some strategies for judging the appropriateness of high-discrimination items. Anales de Psicología, 39(3), 517-527. https://doi.org/10.6018/analesps.535781
Ferrando, P. J., Navarro-González, D., & Lorenzo-Seva, U. (2024). A relative normed effect-size difference index for determining the number of common factors in exploratory solutions. Educational and Psychological Measurement, 84(4), 736-752. https://doi.org/10.1177/00131644231196482
Fink, A. (2003). The survey handbook. Sage. https://doi.org/10.4135/9781412986328
Fiske, S. T., & Taylor, S. E. (2020). Social cognition evolves: Illustrations from our work on intergroup bias and on healthy adaptation. Psicothema, 32(3), 291-297. https://doi.org/10.7334/psicothema2020.197
Fonseca, E. (2018). Network analysis in psychology. Papeles del Psicólogo, 39(1), 1-12. https://doi.org/10.23923/pap.psicol2018.2852
Fonseca, E., Falcó, R., Al-Halabí, S., & Muñiz, J. (2025). Evaluación de la salud mental en contextos educativos [Mental health assessment in educational settings]. In E. Fonseca, & S. Al-Halabí (Eds.), Salud mental en contextos educativos (pp. 181-235). Editorial Pirámide.
Fonseca, E., & Muñiz, J. (2025). Análisis de Redes en la Medición Psicológica: Fundamentos [Network Analysis in Psychological Measurement: Fundamentals]. Acción Psicológica, 22(1), 87-100. https://doi.org/10.5944/ap.22.1.43296
Frary, R. B. (2003). A brief guide to questionnaire development. Virginia Polytechnic Institute & State University. https://medrescon.tripod.com/ questionnaire.pdf
Furnham, A. (1990). The development of single trait personality theories. Personality and Individual Differences, 11(9), 923-929. https://doi.org/10.1016/0191-8869(90)90273-t
García-Pérez, M. A., & Alcalá-Quintana, R. (2023). Accuracy and precision of responses to visual analog scales: Inter- and intra-individual variability. Behavior Research Methods, 55(8), 4369-4381. https://doi.org/10.3758/s13428-022-02021-0
Goretzko, D., Pargent, F., Sust, L. N., & Bühner, M. (2019). Not very powerful. European Journal of Psychological Assessment, 36(4), 563-572. https://doi.org/10.1027/1015-5759/a000539
Goyal, S. (2023). Networks: An economics approach. The MIT Press.
Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4), 430-450. https://doi.org/10.1037/1082-989X.6.4.430
Hancock, G. R., & Mueller, R. O. (2001). Rethinking construct reliability within latent variable systems. In R. Cudeck, S. du Toit, & D. Sörbom (Eds.), Structural equation modeling: Present and future (pp. 195-216). Scientific Software International.
Hao, J., von Davier, A., Yaneva, V., Lottridge, S., von Davier, M., & Harris, D. (2024). Transforming assessment: The impacts and implications of large language models and generative AI. Educational Measurement: Issues and Practice, 43(2), 16-29. https://doi.org/10.1111/emip.12602
Henrysson, S. (1971). Gathering, analyzing and using data on test items. In R. L. Thorndike (Ed.), Educational measurement (pp. 130-159). American Council on Education.
Hernández, A., Drasgow, F., & González-Romá, V. (2004). Investigating the functioning of a middle category by means of a mixed-measurement model. Journal of Applied Psychology, 89(4), 687-699. https://doi.org/10.1037/0021-9010.89.4.687
Hernández-Dorado, A., Ferrando, P. J., & Vigil-Colet, A. (2025). The impact and consequences of correcting for acquiescence when correlated residuals are present. Psicothema, 37(1), 11-20. https://doi.org/10.70478/psicothema.2025.37.02
Höhne, J. K., & Krebs, D. (2018). Scale direction effects in agree/ disagree and item-specific questions: A comparison of question formats. International Journal of Social Research Methodology, 21(1), 91-103. https://doi.org/10.1080/13645579.2017.1325566
Hubatka, P., Cígler, H., Elek, D., & Tancoš, M. (2024). The length and verbal anchors do not matter: The influence of various Likert-like response formats on scales’ psychometric properties. PsyArXiv. https://doi.org/10.31234/osf.io/bjs2c
Jebb, A. T., Ng, V., & Tay, L. (2021). A review of key Likert scale development advances: 1995-2019. Frontiers in Psychology, 12, 637547. https://doi.org/10.3389/fpsyg.2021.637547
John, O. P., & Soto, C. J. (2007). The importance of being valid: Reliability and the process of construct validation. In R. W. Robins, R. C. Fraley, & R. F. Krueger (Eds.), Handbook of research methods in personality psychology (pp. 461–494). The Guilford Press.
Johnson, R. L., & Morgan, G. B. (2016). Survey scales: A guide to development, analysis, and reporting. Guilford Publications.
Krosnick, J. A. (1999). Survey research. Annual Review of Psychology, 50(1), 537-567. https://doi.org/10.1146/annurev.psych.50.1.537
Lee, J., & Paek, I. (2014). In search of the optimal number of response categories in a rating scale. Journal of Psychoeducational Assessment, 32(7), 663-673. https://doi.org/10.1177/0734282914522200
Lee, S., Whittaker, T., & Stapleton, L. (2023). GRShiny: Graded response model (R package version 1.0.0). https://doi.org/10.32614/CRAN.package.GRShiny
Levitt, H. M., Bamberg, M., Creswell, J. W., Frost, D. M., Josselson, R., & Suárez-Orozco, C. (2018). Journal article reporting standards for qualitative primary, qualitative meta-analytic, and mixed methods research in psychology: The APA publications and communications board task force report. American Psychologist, 73, 26-46. https://doi.org/10.1037/amp0000151
Likert, R. (1932). A technique for the measurement of attitudes. Archives of Psychology, 22(140), 1-55.
Lissitz, R. W., & Green, S. B. (1975). Effect of the number of scale points on reliability: A Monte Carlo approach. Journal of Applied Psychology, 60(1), 10-13. https://doi.org/10.1037/h0076268
Loevinger, J. (1957). Objective tests as instruments of psychological theory. Psychological Reports, 3(3), 635-694. https://doi.org/10.2466/pr0.1957.3.3.635
Lorenzo-Seva, U., & Ferrando, P. J. (2013). FACTOR 9.2: A comprehensive program for fitting exploratory and semiconfirmatory factor analysis and IRT models. Applied Psychological Measurement, 37(6), 497-498. https://doi.org/10.1177/0146621613487794
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. IAP.
Lozano, L. M., García-Cueto, E., & Muñiz, J. (2008). Effect of the number of response categories on the reliability and validity of rating scales. Methodology, 4(2), 73-79. https://doi.org/10.1027/1614-2241.4.2.73
Lucke, J. F. (2005). The α and the ω of congeneric test theory: An extension of reliability and internal consistency to heterogeneous tests. Applied Psychological Measurement, 29(1), 65-81. https://doi.org/10.1177/0146621604270882
Malhotra, N., Krosnick, J. A., & Thomas, R. K. (2009). Optimal design of branching questions to measure bipolar constructs. Public Opinion Quarterly, 73(2), 304-324. https://doi.org/10.1093/poq/nfp023
Mariano, L. T., Phillips, A., Estes, K., & Kilburn, M. R. (2024). Should survey Likert scales include neutral response categories? Evidence from a randomized school climate survey. RAND Corporation. https://doi.org/10.7249/WRA3135-2
Maydeu-Olivares, A., Fairchild, A. J., & Hall, A. G. (2017). Goodness of fit in item factor analysis: Effect of the number of response alternatives. Structural Equation Modeling: A Multidisciplinary Journal, 24(4), 495-505. https://doi.org/10.1080/10705511.2017.1289816
McCrae, R. R., Costa Jr., P. T., & Piedmont, R. L. (1993). Folk concepts, natural language, and psychological constructs: The California Psychological Inventory and the five‐factor model. Journal of Personality, 61(1), 1-26. https://doi.org/10.1111/j.1467-6494.1993.tb00276.x
McIver, J., & Carmines, E. G. (1981). Unidimensional scaling. Sage Publications. https://doi.org/10.4135/9781412986441
McDonald, R. P. (2000). A basis for multidimensional item response theory. Applied Psychological Measurement, 24(2), 99-114. https://doi.org/10.1177/01466210022031552
Mellenbergh, G. J. (2011). A conceptual introduction to psychometrics: Development, analysis and application of psychological and educational tests. Eleven International Publishing.
Morales-Vives, F., Ferrando, P. J., & Dueñas, J.-M. (2023). Should suicidal ideation be regarded as a dimension, a unipolar trait or a mixture? A model-based analysis at the score level. Current Psychology, 42(25), 21397-21411. https://doi.org/10.1007/s12144-022-03224-6
Munshi, J. (1990). A method for constructing Likert scales. Sonoma State University. http://munshi.sonoma.edu/likert.html
Muñiz, J. (2018). Introducción a la psicometría [An introduction to psychometrics]. Pirámide.
Muñiz, J., & Fonseca-Pedrero, E. (2019). Ten steps for test development. Psicothema, 31(1), 7-16. https://doi.org/10.7334/psicothema2018.291
Muñiz, J., García-Cueto, E., & Lozano, L. M. (2005). Item format and the psychometric properties of the Eysenck Personality Questionnaire. Personality and Individual Differences, 38(1), 61-69. https://doi.org/10.1016/j.paid.2004.03.021
Muthén, B. (1984). A general structural equation model with dichotomous, ordered categorical, and continuous latent variable indicators. Psychometrika, 49(1), 115-132. https://doi.org/10.1007/bf02294210
Muthén, B., & Kaplan, D. (1985). A comparison of some methodologies for the factor analysis of non‐normal Likert variables. British Journal of Mathematical and Statistical Psychology, 38(2), 171-189. https://doi.org/10.1111/j.2044-8317.1985.tb00832.x
Newman, M. E. J. (2010). Networks: An Introduction. Oxford University Press.
Nunnally, J. C. (1978). Psychometric theory. McGraw-Hill.
Preston, C. C., & Colman, A. M. (2000). Optimal number of response categories in rating scales: reliability, validity, discriminating power, and respondent preferences. Acta Psychologica, 104(1), 1-15. https://doi.org/10.1016/s0001-6918(99)00050-5
Reise, S. P., Waller, N. G., & Comrey, A. L. (2000). Factor analysis and scale revision. Psychological Assessment, 12(3), 287-297. https://doi.org/10.1037//1040-3590.12.3.287
Sancerni, M. D., Meliá, J. L., & González Romá, V. (1990). Formato de respuesta, fiabilidad y validez, en la medición del conflicto de rol [Response format, reliability, and validity in the measurement of role conflict]. Psicológica, 11(2), 167-175.
Santamaría, P., & Sánchez, F. (2022). Open questions in the use of new technologies in psychological assessment. Psychologist Papers, 43(1), 48-54. https://doi.org/10.23923/pap.psicol.2984
Sireci, S., & Benítez, I. (2023). Evidence for test validation: A guide for practitioners. Psicothema, 35(3), 217-226. https://doi.org/10.7334/psicothema2022.477
Sideridis, G., Tsaousis, I., & Ghamdi, H. (2023). Equidistant response options on Likert-type instruments: Testing the interval scaling assumption using Mplus. Educational and Psychological Measurement, 83(5), 885-906. https://doi.org/10.1177/00131644221130482
Sijtsma, K., Ellis, J. L., & Borsboom, D. (2024). Recognize the value of the sum score, psychometrics’ greatest accomplishment. Psychometrika, 89(1), 84-117. https://doi.org/10.1007/s11336-024-09964-7
Spector, P. E. (1992). Summated rating scale construction: An introduction. Sage Publications. https://doi.org/10.4135/9781412986038
Speer, A. B., Robie, C., & Christiansen, N. D. (2016). Effects of item type and estimation method on the accuracy of estimated personality trait scores: Polytomous item response theory models versus summated scoring. Personality and Individual Differences, 102, 41-45. https://doi.org/10.1016/j.paid.2016.06.058
Suárez-Álvarez, J., Pedrosa, I., Lozano, L. M., García-Cueto, E., Cuesta Izquierdo, M., & Muñiz, J. (2018). Using reversed items in Likert scales: A questionable practice. Psicothema, 30(2), 149-158. https://doi.org/10.7334/psicothema2018.33
Tay, L., & Jebb, A. T. (2018). Establishing construct continua in construct validation: The process of continuum specification. Advances in Methods and Practices in Psychological Science, 1(3), 375-388. https://doi.org/10.1177/2515245918775707
Tomás, J. M., & Oliver, A. (1998). Response format and method of estimation effects on confirmatory factor analysis. Psicothema, 10(1), 197-208.
Torgerson, W. S. (1958). Theory and methods of scaling. Wiley.
Uebersax, J. S. (2006). Likert scales: Dispelling the confusion. Statistical Methods for Rater Agreement. https://john-uebersax.com/stat/likert.htm
Vigil-Colet, A., Navarro-González, D., & Morales-Vives, F. (2020). To reverse or to not reverse Likert-type items: That is the question. Psicothema, 32(1), 108-114. https://doi.org/10.7334/psicothema2019.286
Wainer, H. (1976). Estimating coefficients in linear models: It don’t make no nevermind. Psychological Bulletin, 83(2), 213-217. https://doi.org/10.1037//0033-2909.83.2.213
Wainer, H. (1993). Measurement problems. Journal of Educational Measurement, 30(1), 1-21. https://doi.org/10.1111/j.1745-3984.1993.tb00419.x