AN APPLICATION OF HOT DECK IMPUTATION AND SUBSTITUTION METHODS IN THE ESTIMATION OF MISSING DATA

Obtaining data accurately and completely is essential for drawing sound conclusions from an investigation. For various reasons, parts of an investigation may go unobserved, leaving the data incomplete. Missing values may occur in a single variable or in several variables at once. In this study, data sets were generated with missing values in different proportions and in more than one variable. Under the assumption that the data were missing completely at random (MCAR), Hot Deck imputation, random Hot Deck imputation and substitution methods (mean, median) were compared in the estimation of missing values. The analysis found Hot Deck imputation to be more effective than the other methods compared in estimating missing values.

Key Words: Missing data, Hot Deck imputation, Substitution methods.
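The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' program (the appendix states that theirs was written in C#), and every name in it (`make_mcar`, `sequential_hot_deck`, `rmse`, and so on) is hypothetical. The abstract does not say which Hot Deck variant was used, so the sketch assumes a sequential Hot Deck, in which the most recent observed value in file order serves as donor. It also assumes a single numeric variable and root mean squared error as the comparison criterion.

```python
import random
import statistics

MISSING = None  # marker for a missing value (an assumption of this sketch)


def make_mcar(values, proportion, rng):
    """Copy `values`, setting a given proportion to MISSING completely at random."""
    n_missing = round(len(values) * proportion)
    chosen = set(rng.sample(range(len(values)), n_missing))
    return [MISSING if i in chosen else v for i, v in enumerate(values)]


def observed(values):
    """Return only the non-missing values."""
    return [v for v in values if v is not MISSING]


def mean_substitution(values):
    """Replace every missing value with the mean of the observed values."""
    fill = statistics.mean(observed(values))
    return [fill if v is MISSING else v for v in values]


def median_substitution(values):
    """Replace every missing value with the median of the observed values."""
    fill = statistics.median(observed(values))
    return [fill if v is MISSING else v for v in values]


def random_hot_deck(values, rng):
    """Replace each missing value with a randomly drawn observed value (donor)."""
    donors = observed(values)
    return [rng.choice(donors) if v is MISSING else v for v in values]


def sequential_hot_deck(values):
    """Replace each missing value with the most recent observed value in file order.

    Missing values at the start of the file take the first observed value.
    """
    last = observed(values)[0]
    result = []
    for v in values:
        if v is not MISSING:
            last = v
        result.append(last)
    return result


def rmse(true_values, imputed, masked):
    """Root mean squared error over the positions that were missing."""
    errors = [(t - i) ** 2
              for t, i, m in zip(true_values, imputed, masked) if m is MISSING]
    return (sum(errors) / len(errors)) ** 0.5


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulated complete data; the distribution is an arbitrary choice.
    complete = [rng.gauss(50, 10) for _ in range(200)]
    for proportion in (0.05, 0.10, 0.20):
        masked = make_mcar(complete, proportion, rng)
        results = {
            "mean": mean_substitution(masked),
            "median": median_substitution(masked),
            "random hot deck": random_hot_deck(masked, rng),
            "sequential hot deck": sequential_hot_deck(masked),
        }
        for name, imputed in results.items():
            print(f"{proportion:.0%} missing, {name}: "
                  f"RMSE = {rmse(complete, imputed, masked):.3f}")
```

Because the simulated values here are independent draws in arbitrary order, this sketch should not be expected to reproduce the study's ranking of the methods. Hot Deck procedures tend to benefit when the file is sorted or donors are matched on related variables, which this single-variable example does not model.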

___

APPENDIX
  • The software program was developed in the C# programming language.