Numerical Data Modelling and Classification in Marine Geology by the SPSS Statistics

The paper focuses on the geostatistical analysis of a data set covering the Philippine archipelago. The research question is how several geospatial parameters (geology, geomorphology, tectonics and bathymetry) vary across different segments of the study area. The initial data set was generated in QGIS by digitizing 25 cross-section profiles and contained the values of the geospatial parameters sampled along each profile. Modelling and statistical analysis were performed in IBM SPSS Statistics. The analysis of the topography shows strong variability of elevations across the samples, with the extreme depths in the central part of the study area (profile 13, -9,400 m) and the highest elevations in its south-western part (profile 17, 1,950 m). The analysis of the geological classes and lithology shows that basic volcanic rocks account for the largest share of samples (40.40%), followed by mixed sedimentary consolidated rocks (31.90%). Pairwise analysis of sediment thickness and slope aspect demonstrates a correlation between these two variables, with the thickest sediment layer in profiles 1-4 crossing the Philippines. Hierarchical dendrogram clustering of the bathymetry by three approaches showed the strongest correlation within five clusters of profiles: 12-18 (centre), 22-25 (south-west), 1-2 (north), 7-8 (north-east) and 19-21 (south-west). The other profiles show weaker similarities in their bathymetric patterns. Forecasting models computed for the geospatial variables show a gradual southward increase in the gradient angles and increased sediment thickness in the north. Technically, the results demonstrate the effectiveness of SPSS for geological data modelling.
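To make the clustering step more concrete, the idea of grouping profiles by the similarity of their bathymetry can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SPSS workflow used in the paper: the abstract does not name the three clustering approaches, so single, complete and average linkage are assumed here, and the depth values are synthetic. The function names (`euclidean`, `agglomerate`, `synthetic_profile`) are hypothetical helpers introduced for this sketch.

```python
import random
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equally long depth profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def agglomerate(profiles, n_clusters, linkage="average"):
    """Agglomerative clustering: merge the two closest clusters until
    n_clusters remain. Returns sorted lists of profile indices."""
    if not 1 <= n_clusters <= len(profiles):
        raise ValueError("n_clusters must be between 1 and len(profiles)")

    def cluster_distance(c1, c2):
        # All pairwise distances between members of the two clusters.
        dists = [euclidean(profiles[i], profiles[j]) for i in c1 for j in c2]
        if linkage == "single":      # nearest members
            return min(dists)
        if linkage == "complete":    # farthest members
            return max(dists)
        if linkage == "average":     # mean over all member pairs
            return sum(dists) / len(dists)
        raise ValueError(f"unknown linkage: {linkage}")

    # Start with every profile in its own cluster.
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        # combinations() yields a < b, so deleting index b is safe.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: cluster_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a].extend(clusters[b])
        del clusters[b]
    return sorted(sorted(c) for c in clusters)


# Synthetic depths in metres (assumption: NOT data from the paper).
rng = random.Random(0)


def synthetic_profile(mean_depth_m, n_samples=10):
    """One fake profile: depth samples scattered around a mean depth."""
    return [mean_depth_m + rng.uniform(-200, 200) for _ in range(n_samples)]


# Profiles 0-2 imitate a deep trench, profiles 3-5 a shallow shelf.
profiles = [synthetic_profile(-8000) for _ in range(3)]
profiles += [synthetic_profile(-1000) for _ in range(3)]

results = {
    method: agglomerate(profiles, n_clusters=2, linkage=method)
    for method in ("single", "complete", "average")
}
for method, groups in results.items():
    print(method, groups)
```

Because the two synthetic groups are far apart relative to their internal scatter, all three linkage rules recover the same two groups. On real profiles the rules can disagree, which is one reason to compare several approaches before interpreting the clusters.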
