Missing data in educational assessment: a comparison of data treatment methods

Authors

  • Luis Gustavo do Amaral Vinha Universidade de Brasília (UnB), Brasília, Distrito Federal, Brasil
  • Jacob Arie Laros Universidade de Brasília (UnB), Brasília, Distrito Federal, Brasil

DOI:

https://doi.org/10.18222/eae.v0ix.3916

Keywords:

Treatment of Missing Data, Education Assessment, Academic Performance, Simulation.

Abstract

Missing data are common in educational assessments, so choosing appropriate methods is essential to reduce the impact of the lost information. This study compares the performance of four methods for handling missing data (mean imputation, listwise deletion, maximum likelihood, and multiple imputation), all evaluated using regression models applied to educational assessment data collected in the State of Ceará. Data on 7,000 students were used to simulate scenarios that varied in the percentage and type of missing data. Mean imputation performed worst in every simulated scenario, while the other three methods produced similar results. Moreover, including auxiliary variables in maximum likelihood estimation and multiple imputation reduced the bias in estimates of some important model parameters when the simulated missingness was not random.
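
The contrast between mean imputation and listwise deletion can be illustrated with a minimal sketch. It does not reproduce the study's design or data: the sample is synthetic, the regression model (y = 1 + 2x + noise), the 30% missing rate, the completely-at-random missingness mechanism, and the function name `simulate` are all assumptions chosen for illustration. Only the sample size of 7,000 echoes the abstract.

```python
import numpy as np


def simulate(n=7000, missing_rate=0.3, seed=0):
    """Compare listwise deletion and mean imputation on synthetic data."""
    rng = np.random.default_rng(seed)

    # Synthetic data (assumption): true model is y = 1 + 2*x + noise.
    x = rng.normal(size=n)
    y = 1.0 + 2.0 * x + rng.normal(size=n)

    # Missing completely at random: every outcome has the same chance
    # of being unobserved, independent of x and y.
    missing = rng.random(n) < missing_rate

    # Listwise deletion: fit only on cases with an observed outcome.
    slope_listwise = np.polyfit(x[~missing], y[~missing], 1)[0]

    # Mean imputation: replace each missing outcome with the observed mean.
    y_imputed = np.where(missing, y[~missing].mean(), y)
    slope_mean = np.polyfit(x, y_imputed, 1)[0]

    return {"listwise": slope_listwise, "mean_imputation": slope_mean}


if __name__ == "__main__":
    for method, slope in simulate().items():
        print(f"{method}: estimated slope = {slope:.3f} (true slope = 2.0)")
```

In this toy setup, listwise deletion should recover a slope close to the true value, whereas mean imputation of the outcome pulls the slope toward zero by roughly the proportion of imputed cases, because each imputed case contributes nothing to the covariance between x and y. Maximum likelihood and multiple imputation are not shown here.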

Author Biographies

Luis Gustavo do Amaral Vinha, Universidade de Brasília (UnB), Brasília, Distrito Federal, Brasil

PhD in Social Psychology from the Universidade de Brasília and Master's degree in Statistics from the Universidade de São Paulo.

Adjunct Professor (Professor Adjunto) in the Department of Statistics, Universidade de Brasília.

Jacob Arie Laros, Universidade de Brasília (UnB), Brasília, Distrito Federal, Brasil

PhD in Psychology from the University of Groningen.

Associate Professor (Professor Associado) at the Institute of Psychology, Universidade de Brasília.

Published

2018-04-23

How to Cite

Vinha, L. G. do A., & Laros, J. A. (2018). Missing data in educational assessment: a comparison of data treatment methods. Estudos em Avaliação Educacional, 29(70), 156–187. https://doi.org/10.18222/eae.v0ix.3916