What Now? Some Brief Reflections on Model-Free Data Analysis

David Freedman's critique of causal modeling in the social and biomedical sciences was fundamental. In his view, the enterprise was misguided, and there was no technical fix. Far too often, there was a disconnect between what the statistical methods required and the substantive information that could be brought to bear. In this paper, I briefly consider some alternatives to causal modeling, assuming that Freedman's perspective on modeling is correct. In addition to randomized experiments and strong quasi-experiments, I discuss multivariate statistical analysis, exploratory data analysis, dynamic graphics, machine learning, and knowledge discovery.
