Predictive Analysis Using Web Scraping for the Real Estate Market in Gaziantep

Real estate is a crucial industry for investors and for people who want to own property. It includes land and any permanent structure on it, whether natural or artificial, such as houses, residences, apartments, and commercial buildings. In Turkey, it is widely believed that owning property brings a comfortable life, so home ownership is a common aspiration among Turkish families. However, the real estate market is affected by a variety of factors, such as a country's economic structure, inflation, world events, and politics. In addition, the location, neighborhood, size, and number of rooms of a house can all affect how much it costs to live there. This study analyzes the city of Gaziantep. Its goal is to predict which neighborhood a prospective buyer can afford to live in, given the buyer's financial status and specific property attributes. To this end, web scraping is used to collect real estate data from a real estate website. Once the data has been gathered, the neighborhood of a house is predicted using machine learning algorithms: decision trees, random forest, and extra trees. The results show that all three algorithms achieve an accuracy of over 80%, with decision tree classification performing best.
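As a rough illustration of the data collection step, the sketch below parses listing records out of an HTML page and arranges them as features and neighborhood labels. It is only a sketch under stated assumptions: the abstract does not name the website or describe its markup, so the HTML snippet, the class names (`listing`, `neighborhood`, `price`, `size`, `rooms`), the sample values, and the helper names `parse_listings` and `to_features_and_labels` are all hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical markup and made-up values: the study does not describe
# the website's HTML structure, so this snippet is purely illustrative.
SAMPLE_HTML = """
<ul>
  <li class="listing">
    <span class="neighborhood">Neighborhood A</span>
    <span class="price">1250000</span>
    <span class="size">140</span>
    <span class="rooms">3</span>
  </li>
  <li class="listing">
    <span class="neighborhood">Neighborhood B</span>
    <span class="price">890000</span>
    <span class="size">110</span>
    <span class="rooms">2</span>
  </li>
</ul>
"""

# Assumed field names, matched against the class attribute of each <span>.
FIELDS = {"neighborhood", "price", "size", "rooms"}


class ListingParser(HTMLParser):
    """Collects one dict of raw strings per <li class="listing"> element."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._current = None  # listing currently being filled, if any
        self._field = None    # field whose text is currently being read

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "li" and css_class == "listing":
            self._current = {}
        elif tag == "span" and self._current is not None and css_class in FIELDS:
            self._field = css_class

    def handle_data(self, data):
        if self._field is not None:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._current is not None:
            self.records.append(self._current)
            self._current = None


def parse_listings(html):
    """Parse the HTML and convert numeric fields from text to numbers."""
    parser = ListingParser()
    parser.feed(html)
    parser.close()
    return [
        {
            "neighborhood": rec["neighborhood"],
            "price": float(rec["price"]),
            "size": float(rec["size"]),
            "rooms": int(rec["rooms"]),
        }
        for rec in parser.records
    ]


def to_features_and_labels(rows):
    """Split records into a feature matrix X and neighborhood labels y."""
    X = [[r["price"], r["size"], r["rooms"]] for r in rows]
    y = [r["neighborhood"] for r in rows]
    return X, y


listings = parse_listings(SAMPLE_HTML)
X, y = to_features_and_labels(listings)
print(X)
print(y)
```

Lists shaped like `X` and `y` could then be passed to a classifier, for example scikit-learn's `DecisionTreeClassifier`, `RandomForestClassifier`, or `ExtraTreesClassifier`, to predict the neighborhood from price and property attributes. The feature set actually used in the study may differ.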
