Likert Scales: A Practical Guide to Design, Construction and Use

Keywords

Likert
Guidelines
Scales
Psychometrics

How to Cite

Ferrando, P. J., Morales-Vives, F., Casas, J. M., & Muñiz, J. (2025). Likert Scales: A Practical Guide to Design, Construction and Use. Psicothema, 37(4), 1–15. Retrieved from https://reunido.uniovi.es/index.php/PST/article/view/23856

Abstract

Background: Likert-type scales, first introduced by Rensis Likert in 1932, have become one of the most widely used assessment tools across a range of scientific and professional domains, owing to their simplicity and effectiveness. The purpose of the present study is to critically review their use and to propose a set of practical guidelines aimed at optimizing their construction, analysis, and application. Method: A systematic literature review of guidelines focused on the development, analysis, scoring, use, and interpretation of Likert scales was carried out. Results: Several key areas for improvement in the construction and use of Likert-type scales were identified, including the operational definition of constructs, item formulation, selection of the number of response categories, response analysis, collection of validity evidence, item calibration, and score interpretation. Conclusions: Based on the findings, a practical guide comprising fifteen recommendations is proposed: ten focused on the appropriate design, construction, and analysis of Likert scales, and five aimed at guiding appropriate use of pre-existing scales by researchers and practitioners.
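To illustrate the scoring step named in the abstract, the following is a minimal Python sketch. It is not the authors' procedure. It computes summated (sum) scores for a short Likert scale after reverse-keying a negatively worded item, then estimates internal consistency with Cronbach's coefficient alpha. The response matrix, the 1-to-5 coding, the choice of reversed item, and all function names are hypothetical. Summing category codes assumes the categories are roughly equally spaced, which is an assumption the cited literature examines (e.g., Casper et al., 2020; Sideridis et al., 2023). Alpha is also only one reliability estimate; the reference list includes work on omega-type coefficients for congeneric items (e.g., Lucke, 2005).

```python
from statistics import pvariance

# Hypothetical data: 4 respondents x 3 items, 5-point format coded 1..5.
# Item index 2 is assumed to be negatively worded (reverse-keyed).
RESPONSES = [
    [5, 4, 1],
    [4, 4, 2],
    [2, 1, 4],
    [1, 2, 5],
]
REVERSED_ITEMS = {2}
N_CATEGORIES = 5


def reverse_key(response, n_categories):
    """Map x on a 1..n_categories scale to n_categories + 1 - x (e.g. 1 <-> 5)."""
    return n_categories + 1 - response


def keyed_matrix(responses, reversed_items, n_categories):
    """Return a copy of the responses with the reverse-keyed items recoded."""
    return [
        [reverse_key(x, n_categories) if j in reversed_items else x
         for j, x in enumerate(row)]
        for row in responses
    ]


def sum_scores(responses, reversed_items, n_categories):
    """Summated score per respondent: the sum of the keyed item responses."""
    return [sum(row) for row in keyed_matrix(responses, reversed_items, n_categories)]


def cronbach_alpha(responses, reversed_items, n_categories):
    """Alpha = k/(k-1) * (1 - sum of item variances / variance of the sum score)."""
    keyed = keyed_matrix(responses, reversed_items, n_categories)
    k = len(keyed[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [pvariance([row[j] for row in keyed]) for j in range(k)]
    total_var = pvariance([sum(row) for row in keyed])
    if total_var == 0:
        raise ValueError("sum scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    print(sum_scores(RESPONSES, REVERSED_ITEMS, N_CATEGORIES))  # [14, 12, 5, 4]
    print(round(cronbach_alpha(RESPONSES, REVERSED_ITEMS, N_CATEGORIES), 3))  # 0.963
```

Because alpha is a ratio of variances, using population variances (`pvariance`) or sample variances (`variance`) gives the same value as long as both the item and total variances are computed the same way.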


References

Addams, C. (1982, November 29). Would you say Attila is doing an excellent job, a good job, a fair job, or a poor job? The New Yorker. https://www.newyorker.com/magazine/1982/11/29/a-normal-tuesday

American Educational Research Association [AERA] (2014). Standards for educational and psychological testing. American Educational Research Association, American Psychological Association, and the National Council on Measurement in Education.

Bagozzi, R. P., & Edwards, J. R. (1998). A general approach for representing constructs in organizational research. Organizational Research Methods, 1(1), 45-87. https://doi.org/10.1177/109442819800100104

Bass, B. M., Cascio, W. F., & O’Connor, E. J. (1974). Magnitude estimations of expressions of frequency and amount. Journal of Applied Psychology, 59(3), 313-320. https://doi.org/10.1037/h0036653

Benson, N., Kranzler, J. H., & Floyd, R. G. (2020). Exploratory and confirmatory factor analysis of the Universal Nonverbal Intelligence Test - Second Edition: Testing dimensionality and invariance across age, gender, race, and ethnicity. Assessment, 27(5), 996-1006. https://doi.org/10.1177/1073191118786584

Bollen, K. A. (1989). Structural equations with latent variables. John Wiley & Sons. https://doi.org/10.1002/9781118619179

Bollen, K., & Lennox, R. (1991). Conventional wisdom on measurement: A structural equation perspective. Psychological Bulletin, 110(2), 305-314. https://doi.org/10.1037//0033-2909.110.2.305

Bollen, K. A., & Long, J. S. (1993). Testing structural equation models. Sage.

Borgatti, S. P., Mehra, A., Brass, D. J., & Labianca, G. (2009). Network analysis in the social sciences. Science, 323(5916), 892-895. https://doi.org/10.1126/science.1165821

Borsboom, D. (2017). A network theory of mental disorders. World Psychiatry, 16(1), 5-13. https://doi.org/10.1002/wps.20375

Borsboom, D. (2022). Possible futures for network psychometrics. Psychometrika, 87(1), 253-265. https://doi.org/10.1007/S11336-022-09851-Z

Box, G. E. (1976). Science and statistics. Journal of the American Statistical Association, 71(356), 791-799. https://doi.org/10.2307/2286841

Brown, G. T. L., & Zhao, A. (2023). In defense of psychometric measurement: A systematic review of contemporary self-report feedback inventories. Educational Psychologist, 58(3), 178-192. https://doi.org/10.1080/00461520.2023.2208670

Browne, M. W. (2000). Cross-validation methods. Journal of Mathematical Psychology, 44(1), 108-132. https://doi.org/10.1006/jmps.1999.1279

Burisch, M. (1984). Approaches to personality inventory construction: A comparison of merits. American Psychologist, 39(3), 214-227. https://doi.org/10.1037//0003-066x.39.3.214

Buss, D. M., & Craik, K. H. (1981). The act frequency analysis of interpersonal dispositions: Aloofness, gregariousness, dominance and submissiveness. Journal of Personality, 49(2), 175-192. https://doi.org/10.1111/j.1467-6494.1981.tb00736.x

Carifio, J., & Perla, R. J. (2007). Ten common misunderstandings, misconceptions, persistent myths and urban legends about Likert scales and Likert response formats and their antidotes. Journal of Social Sciences, 3(3), 106-116. https://doi.org/10.3844/jssp.2007.106.116

Casas, J. M., Dueñas, J.-M., Ferrando, P. J., Castarlenas, E., Vigil-Colet, A., Hernández-Navarro, J. C., & Morales-Vives, F. (2025). Measuring the callous-unemotional traits in juvenile offenders: Properties and functioning of the INCA Questionnaire in this population. Psychiatry, Psychology and Law, 1-22. https://doi.org/10.1080/13218719.2025.2497785

Casper, W. C., Edwards, B. D., Wallace, J. C., Landis, R. S., & Fife, D. A. (2020). Selecting response anchors with equal intervals for summated rating scales. Journal of Applied Psychology, 105(4), 390-409. https://doi.org/10.1037/apl0000444

Clark, L. A., & Watson, D. (2019). Constructing validity: New developments in creating objective measuring instruments. Psychological Assessment, 31(12), 1412-1427. https://doi.org/10.1037/pas0000626

Comrey, A. L. (1988). Factor-analytic methods of scale development in personality and clinical psychology. Journal of Consulting and Clinical Psychology, 56(5), 754-761. https://doi.org/10.1037//0022-006x.56.5.754

Comrey, A. L., & Lee, H. B. (1992). A first course in factor analysis. Psychology Press.

Cooper, C. (2019). Pitfalls of personality theory. Personality and Individual Differences, 151, 109551. https://doi.org/10.1016/j.paid.2019.109551

Cronbach, L. J. (1950). Further evidence on response sets and test design. Educational and Psychological Measurement, 10(1), 3-31. https://doi.org/10.1177/001316445001000101

Cronbach, L. J., & Gleser, G. C. (1964). The signal/noise ratio in the comparison of reliability coefficients. Educational and Psychological Measurement, 24(3), 467-480. https://doi.org/10.1177/001316446402400303

Cronbach, L. J., & Warrington, W. G. (1952). Efficiency of multiple-choice tests as a function of spread of item difficulties. Psychometrika, 17(2), 127–147. https://doi.org/10.1007/bf02288778

De la Fuente, D., & Armayones, M. (2025). AI in psychological practice: What tools are available and how can they help in clinical psychology? Psychologist Papers, 46(1), 18-24. https://doi.org/10.70478/pap.psicol.2025.46.03

DeVellis, R. F. (2003). Scale development: Theory and applications. Sage Publications.

DuBois, B., & Burns, J. A. (1975). An analysis of the meaning of the question mark response category in attitude scales. Educational and Psychological Measurement, 35(4), 869-884. https://doi.org/10.1177/001316447503500414

Elosua, P., Aguado, D., Fonseca-Pedrero, E., Abad, F. J., & Santamaría, P. (2023). New trends in digital technology-based psychological and educational assessment. Psicothema, 35, 50-57. https://doi.org/10.7334/psicothema2022.241

Ferrando, P. J. (2003). A Kernel density analysis of continuous typical-response scales. Educational and Psychological Measurement, 63(5), 809-824. https://doi.org/10.1177/0013164403251323

Ferrando, P. J. (2021). Seven decades of factor analysis: From Yela to the present day. Psicothema, 33(3), 378-385. https://doi.org/10.7334/psicothema2021.24

Ferrando, P. J., Lorenzo-Seva, U., & Chico, E. (2009). A general factor-analytic procedure for assessing response bias in questionnaire measures. Structural Equation Modeling: A Multidisciplinary Journal, 16(2), 364-381. https://doi.org/10.1080/1070551090275137

Ferrando, P. J., Lorenzo-Seva, U., Hernández-Dorado, A., & Muñiz, J. (2022). Decalogue for the factor analysis of test items. Psicothema, 34(1), 7-17. https://doi.org/10.7334/psicothema2021.456

Ferrando, P. J., & Morales-Vives, F. (2023). Is it quality, is it redundancy, or is model inadequacy? Some strategies for judging the appropriateness of high-discrimination items. Anales de Psicología, 39(3), 517-527. https://doi.org/10.6018/analesps.535781

Ferrando, P. J., Navarro-González, D., & Lorenzo-Seva, U. (2024). A relative normed effect-size difference index for determining the number of common factors in exploratory solutions. Educational and Psychological Measurement, 84(4), 736-752. https://doi.org/10.1177/00131644231196482

Fink, A. (2003). The survey handbook. Sage. https://doi.org/10.4135/9781412986328

Fiske, S. T., & Taylor, S. E. (2020). Social cognition evolves: Illustrations from our work on intergroup bias and on healthy adaptation. Psicothema, 32(3), 291-297. https://doi.org/10.7334/psicothema2020.197

Fonseca, E. (2018). Network analysis in psychology. Papeles del Psicólogo, 39(1), 1-12. https://doi.org/10.23923/pap.psicol2018.2852

Fonseca, E., Falcó, R., Al-Halabí, S., & Muñiz, J. (2025). Evaluación de la salud mental en contextos educativos [Mental health assessment in educational settings]. In E. Fonseca & S. Al-Halabí (Eds.), Salud mental en contextos educativos [Mental health in educational settings] (pp. 181-235). Editorial Pirámide.

Fonseca, E., & Muñiz, J. (2025). Análisis de Redes en la Medición Psicológica: Fundamentos [Network Analysis in Psychological Measurement: Fundamentals]. Acción Psicológica, 22(1), 87-100. https://doi.org/10.5944/ap.22.1.43296

Frary, R. B. (2003). A brief guide to questionnaire development. Virginia Polytechnic Institute & State University. https://medrescon.tripod.com/questionnaire.pdf

Furnham, A. (1990). The development of single trait personality theories. Personality and Individual Differences, 11(9), 923-929. https://doi.org/10.1016/0191-8869(90)90273-t

García-Pérez, M. A., & Alcalá-Quintana, R. (2023). Accuracy and precision of responses to visual analog scales: Inter- and intra-individual variability. Behavior Research Methods, 55(8), 4369-4381. https://doi.org/10.3758/s13428-022-02021-0

Goretzko, D., Pargent, F., Sust, L. N., & Bühner, M. (2019). Not very powerful. European Journal of Psychological Assessment, 36(4), 563-572. https://doi.org/10.1027/1015-5759/a000539

Goyal, S. (2023). Networks: An economics approach. The MIT Press.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4), 430-450. https://doi.org/10.1037/1082-989X.6.4.430

Hancock, G. R., & Mueller, R. O. (2001). Rethinking construct reliability within latent variable systems. Structural Equation Modeling: Present and Future, 195(216), 60-70.

Hao, J., von Davier, A., Yaneva, V., Lottridge, S., von Davier, M., & Harris, D. (2024). Transforming assessment: The impacts and implications of large language models and generative AI. Educational Measurement: Issues and Practice, 43(2), 16-29. https://doi.org/10.1111/emip.12602

Henrysson, S. (1971). Gathering, analyzing and using data on test items. In R. L. Thorndike (Ed.), Educational measurement (pp. 130-159). American Council on Education.

Hernández, A., Drasgow, F., & González-Romá, V. (2004). Investigating the functioning of a middle category by means of a mixed-measurement model. Journal of Applied Psychology, 89(4), 687-699. https://doi.org/10.1037/0021-9010.89.4.687

Hernández-Dorado, A., Ferrando, P. J., & Vigil-Colet, A. (2025). The impact and consequences of correcting for acquiescence when correlated residuals are present. Psicothema, 37(1), 11-20. https://doi.org/10.70478/psicothema.2025.37.02

Höhne, J. K., & Krebs, D. (2018). Scale direction effects in agree/disagree and item-specific questions: A comparison of question formats. International Journal of Social Research Methodology, 21(1), 91-103. https://doi.org/10.1080/13645579.2017.1325566

Hubatka, P., Cígler, H., Elek, D., & Tancoš, M. (2024). The length and verbal anchors do not matter: The influence of various Likert-like response formats on scales’ psychometric properties. PsyArXiv. https://doi.org/10.31234/osf.io/bjs2c

Jebb, A. T., Ng, V., & Tay, L. (2021). A review of key Likert scale development advances: 1995-2019. Frontiers in Psychology, 12, 637547. https://doi.org/10.3389/fpsyg.2021.637547

John, O. P., & Soto, C. J. (2007). The importance of being valid: Reliability and the process of construct validation. In R. W. Robins, R. C. Fraley, & R. F. Krueger (Eds.), Handbook of research methods in personality psychology (pp. 461–494). The Guilford Press.

Johnson, R. L., & Morgan, G. B. (2016). Survey scales: A guide to development, analysis, and reporting. Guilford Publications.

Krosnick, J. A. (1999). Survey research. Annual Review of Psychology, 50(1), 537-567. https://doi.org/10.1146/annurev.psych.50.1.537

Lee, J., & Paek, I. (2014). In search of the optimal number of response categories in a rating scale. Journal of Psychoeducational Assessment, 32(7), 663-673. https://doi.org/10.1177/0734282914522200

Lee, S., Whittaker, T., & Stapleton, L. (2023). GRShiny: Graded Response Model. R package version 1.0.0. cran.r-project.org. https://doi.org/10.32614/CRAN.package.GRShiny

Levitt, H. M., Bamberg, M., Creswell, J. W., Frost, D. M., Josselson, R., & Suárez-Orozco, C. (2018). Journal article reporting standards for qualitative primary, qualitative meta-analytic, and mixed methods research in psychology: The APA publications and communications board task force report. American Psychologist, 73, 26-46. https://doi.org/10.1037/amp0000151

Likert, R. (1932). A technique for the measurement of attitudes. Archives of Psychology, 22(140), 1-55.

Lissitz, R. W., & Green, S. B. (1975). Effect of the number of scale points on reliability: A Monte Carlo approach. Journal of Applied Psychology, 60(1), 10-13. https://doi.org/10.1037/h0076268

Loevinger, J. (1957). Objective tests as instruments of psychological theory. Psychological Reports, 3(3), 635-694. https://doi.org/10.2466/pr0.1957.3.3.635

Lorenzo-Seva, U., & Ferrando, P. J. (2013). FACTOR 9.2: A comprehensive program for fitting exploratory and semiconfirmatory factor analysis and IRT models. Applied Psychological Measurement, 37(6), 497-498. https://doi.org/10.1177/0146621613487794

Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. IAP.

Lozano, L. M., García-Cueto, E., & Muñiz, J. (2008). Effect of the number of response categories on the reliability and validity of rating scales. Methodology, 4(2), 73-79. https://doi.org/10.1027/1614-2241.4.2.73

Lucke, J. F. (2005). The α and the ω of congeneric test theory: An extension of reliability and internal consistency to heterogeneous tests. Applied Psychological Measurement, 29(1), 65-81. https://doi.org/10.1177/0146621604270882

Malhotra, N., Krosnick, J. A., & Thomas, R. K. (2009). Optimal design of branching questions to measure bipolar constructs. Public Opinion Quarterly, 73(2), 304-324. https://doi.org/10.1093/poq/nfp023

Mariano, L. T., Phillips, A., Estes, K., & Kilburn, M. R. (2024). Should survey Likert scales include neutral response categories? Evidence from a randomized school climate survey. RAND Corporation. https://doi.org/10.7249/WRA3135-2

Maydeu-Olivares, A., Fairchild, A. J., & Hall, A. G. (2017). Goodness of fit in item factor analysis: Effect of the number of response alternatives. Structural Equation Modeling: A Multidisciplinary Journal, 24(4), 495-505. https://doi.org/10.1080/10705511.2017.1289816

McCrae, R. R., Costa Jr., P. T., & Piedmont, R. L. (1993). Folk concepts, natural language, and psychological constructs: The California Psychological Inventory and the five‐factor model. Journal of Personality, 61(1), 1-26. https://doi.org/10.1111/j.1467-6494.1993.tb00276.x

McDonald, R. P. (2000). A basis for multidimensional item response theory. Applied Psychological Measurement, 24(2), 99-114. https://doi.org/10.1177/01466210022031552

McIver, J., & Carmines, E. G. (1981). Unidimensional scaling. Sage Publications. https://doi.org/10.4135/9781412986441

Mellenbergh, G. J. (2011). A conceptual introduction to psychometrics: Development, analysis and application of psychological and educational tests. Eleven International Publishing.

Morales-Vives, F., Ferrando, P. J., & Dueñas, J.-M. (2023). Should suicidal ideation be regarded as a dimension, a unipolar trait or a mixture? A model-based analysis at the score level. Current Psychology, 42(25), 21397-21411. https://doi.org/10.1007/s12144-022-03224-6

Munshi, J. (1990). A method for constructing Likert scales. Sonoma State University. http://munshi.sonoma.edu/likert.html

Muñiz, J. (2018). Introducción a la psicometría [An introduction to psychometrics]. Pirámide.

Muñiz, J., & Fonseca-Pedrero, E. (2019). Ten steps for test development. Psicothema, 31(1), 7-16. https://doi.org/10.7334/psicothema2018.291

Muñiz, J., García-Cueto, E., & Lozano, L. M. (2005). Item format and the psychometric properties of the Eysenck Personality Questionnaire. Personality and Individual Differences, 38(1), 61-69. https://doi.org/10.1016/j.paid.2004.03.021

Muthén, B. (1984). A general structural equation model with dichotomous, ordered categorical, and continuous latent variable indicators. Psychometrika, 49(1), 115-132. https://doi.org/10.1007/bf02294210

Muthén, B., & Kaplan, D. (1985). A comparison of some methodologies for the factor analysis of non‐normal Likert variables. British Journal of Mathematical and Statistical Psychology, 38(2), 171-189. https://doi.org/10.1111/j.2044-8317.1985.tb00832.x

Newman, M. E. J. (2010). Networks: An Introduction. Oxford University Press.

Nunnally, J. C. (1978). Psychometric theory. McGraw-Hill.

Preston, C. C., & Colman, A. M. (2000). Optimal number of response categories in rating scales: reliability, validity, discriminating power, and respondent preferences. Acta Psychologica, 104(1), 1-15. https://doi.org/10.1016/s0001-6918(99)00050-5

Reise, S. P., Waller, N. G., & Comrey, A. L. (2000). Factor analysis and scale revision. Psychological Assessment, 12(3), 287-297. https://doi.org/10.1037//1040-3590.12.3.287

Sancerni, M. D., Meliá, J. L., & González Romá, V. (1990). Formato de respuesta, fiabilidad y validez, en la medición del conflicto de rol [Response format, reliability, and validity in the measurement of role conflict]. Psicológica, 11(2), 167-175.

Santamaría, P., & Sánchez, F. (2022). Open questions in the use of new technologies in psychological assessment. Psychological Papers, 43(1), 48-54. https://doi.org/10.23923/pap.psicol.2984

Sideridis, G., Tsaousis, I., & Ghamdi, H. (2023). Equidistant response options on Likert-type instruments: Testing the interval scaling assumption using Mplus. Educational and Psychological Measurement, 83(5), 885-906. https://doi.org/10.1177/00131644221130482

Sijtsma, K., Ellis, J. L., & Borsboom, D. (2024). Recognize the value of the sum score, psychometrics’ greatest accomplishment. Psychometrika, 89(1), 84-117. https://doi.org/10.1007/s11336-024-09964-7

Sireci, S., & Benítez, I. (2023). Evidence for test validation: A guide for practitioners. Psicothema, 35(3), 217-226. https://doi.org/10.7334/psicothema2022.477

Spector, P. E. (1992). Summated rating scale construction: An introduction. Sage Publications. https://doi.org/10.4135/9781412986038

Speer, A. B., Robie, C., & Christiansen, N. D. (2016). Effects of item type and estimation method on the accuracy of estimated personality trait scores: Polytomous item response theory models versus summated scoring. Personality and Individual Differences, 102, 41-45. https://doi.org/10.1016/j.paid.2016.06.058

Suárez-Álvarez, J., Pedrosa, I., Lozano, L. M., García-Cueto, E., Cuesta Izquierdo, M., & Muñiz, J. (2018). Using reversed items in Likert scales: A questionable practice. Psicothema, 30(2), 149-158. https://doi.org/10.7334/psicothema2018.33

Tay, L., & Jebb, A. T. (2018). Establishing construct continua in construct validation: The process of continuum specification. Advances in Methods and Practices in Psychological Science, 1(3), 375-388. https://doi.org/10.1177/2515245918775707

Tomás, J. M., & Oliver, A. (1998). Response format and method of estimation effects on confirmatory factor analysis. Psicothema, 10(1), 197-208.

Torgerson, W. S. (1958). Theory and methods of scaling. Wiley.

Uebersax, J. S. (2006). Likert scales: Dispelling the confusion. Statistical Methods for Rater Agreement. https://john-uebersax.com/stat/likert.htm

Vigil-Colet, A., Navarro-González, D., & Morales-Vives, F. (2020). To reverse or to not reverse Likert-type items: That is the question. Psicothema, 32(1), 108-114. https://doi.org/10.7334/psicothema2019.286

Wainer, H. (1976). Estimating coefficients in linear models: It don’t make no nevermind. Psychological Bulletin, 83(2), 213-217. https://doi.org/10.1037//0033-2909.83.2.213

Wainer, H. (1993). Measurement problems. Journal of Educational Measurement, 30(1), 1-21. https://doi.org/10.1111/j.1745-3984.1993.tb00419.x
