Generalised linear model-based algorithm for detection of outliers in environmental data and comparison with semi-parametric outlier detection methods

Outliers are often present in large datasets of air pollutant concentrations. Existing methods for detecting outliers in environmental data fall into three groups, depending on the character of the data: methods for time series, methods for time series measured simultaneously with accompanying variables, and methods for spatial data. Many methods proposed for the automatic detection of outliers in time series assume that the distribution of the analysed variable is known. Because environmental variables are often influenced by accompanying factors, their distribution is difficult to estimate. Taking the available information on accompanying variables into account, and using methods designed for time series measured simultaneously with such variables, can therefore substantially improve outlier detection. This paper presents a method for the automatic detection of outliers in PM10 aerosol concentrations measured simultaneously with accompanying variables. The method is based on a generalised linear model and subsequent analysis of its residuals, and so exploits the additional information carried by the accompanying variables. Its results are compared with those of two distribution-free outlier detection methods for time series previously proposed by the authors. A simulation-based comparison of all three procedures showed that the proposed procedure effectively detects outliers that deviate by at least 5 standard deviations from the mean of the neighbouring observations, and that it outperforms both distribution-free methods.
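The abstract describes the method only at a high level: fit a generalised linear model to PM10 using accompanying variables, then analyse the residuals. As a hedged illustration of that general idea (not the authors' algorithm), the Python sketch below fits a gamma GLM with log link by iteratively reweighted least squares and flags observations whose robustly standardised Pearson residuals exceed a threshold. The gamma family, the log link, the MAD-based scaling, the threshold of 3, the function names `fit_gamma_log_glm` and `flag_outliers`, and the temperature and wind-speed covariates are all assumptions chosen for illustration.

```python
import numpy as np


def fit_gamma_log_glm(X, y, max_iter=100, tol=1e-10):
    """Fit a gamma GLM with log link by iteratively reweighted least squares.

    For the gamma family with a log link the IRLS weights are all equal to 1,
    so each Fisher-scoring step is an ordinary least-squares fit of the
    working response z = eta + (y - mu) / mu on the design matrix.
    Returns the coefficients (intercept first) and the fitted means mu.
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    beta = np.linalg.lstsq(X, np.log(y), rcond=None)[0]  # start from OLS on log(y)
    for _ in range(max_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu  # working response for the log link
        beta_new = np.linalg.lstsq(X, z, rcond=None)[0]
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta, np.exp(X @ beta)


def flag_outliers(X, y, threshold=3.0):
    """Flag observations whose robustly standardised Pearson residual exceeds threshold.

    The gamma variance function is V(mu) = mu**2, so the Pearson residual is
    (y - mu) / mu up to the dispersion factor. Here the dispersion is replaced
    by a MAD-based robust scale (an assumption of this sketch) so that the
    outliers themselves inflate it less.
    """
    y = np.asarray(y, dtype=float)
    _, mu = fit_gamma_log_glm(X, y)
    pearson = (y - mu) / mu
    centre = np.median(pearson)
    scale = 1.4826 * np.median(np.abs(pearson - centre))  # MAD, consistent for normal data
    score = (pearson - centre) / scale
    return np.flatnonzero(np.abs(score) > threshold), score


if __name__ == "__main__":
    # Simulated daily series with two hypothetical covariates (temperature, wind speed).
    rng = np.random.default_rng(0)
    n = 500
    temp = rng.normal(10.0, 8.0, n)
    wind = rng.uniform(0.0, 8.0, n)
    mu_true = np.exp(3.0 - 0.02 * temp - 0.15 * wind)
    pm10 = rng.gamma(shape=50.0, scale=mu_true / 50.0)
    pm10[[50, 150, 250, 350, 450]] *= 3.0  # inject five gross outliers
    idx, _ = flag_outliers(np.column_stack([temp, wind]), pm10)
    print(idx)
```

The robust scale is used because a dispersion estimate computed from all residuals would be inflated by the very outliers being sought. The GLM fit itself is not robust, however, so a large share of outliers would still bias the fitted means.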

References

Abrutzky, R., Dawidowski, L., Matus, P., Lankao, P., 2012. Health effects of climate and air pollution in Buenos Aires: a first time series analysis. J. Environ. Protect. 3, 262–271.

Agresti, A., 2002. Categorical Data Analysis. University of Florida.

Akaike, H., 1974. A new look at the statistical model identification. IEEE Trans. Automat. Contr. 19 (6), 716–723.

Araki, S., Shimadera, H., Yamamoto, K., Kondo, A., 2017. Effect of spatial outliers on the regression modelling of air pollutant concentrations: a case study in Japan. Atmos. Environ. 153, 83–93.

Baffi, G., Martin, E.B., Morris, A.J., 1999. Non-linear projection to latent structures revisited (the neural network PLS algorithm). Comput. Chem. Eng. 23, 1293–1307.

Bao, X., Dai, L., 2009. Partial least squares with outlier detection in spectral analysis: a tool to predict gasoline properties. Fuel 88, 1216–1222.

Barnett, V., 2004. Environmental Statistics: Methods and Applications, first ed. Wiley Series in Probability and Statistics, Wiley.

Barnett, V., Lewis, T., 1978. Outliers in Statistical Data. John Wiley, Chichester, New York.

Beirlant, J., Goegebeur, Y., Segers, J., Teugels, J., de Waal, D., Ferro, C., 2004. Statistics of Extremes: Theory and Applications, first ed. Wiley, New York.

Ben-Gal, I., 2005. Outlier detection. In: Maimon, O., Rokach, L. (Eds.), Data Mining and Knowledge Discovery Handbook, second ed. Springer, Boston, MA, pp. 117–130.

Bobbia, M., Misiti, M., Misiti, Y., Poggi, J.-M., Portier, B., 2015. Spatial outlier detection in the PM10 monitoring network of Normandy (France). Atmos. Pollut. Res. 6, 476–483.

Brněnské komunikace, a.s., 2010. Ročenka Dopravy Brno 2009. Brno Municipality. Available online at: https://www.bkom.cz/informacni-centrum/rocenky-dopravybrno-15/rocenka-dopravy-brno-2009-pdf-55 (in Czech).

Brněnské komunikace, a.s., 2011. Ročenka Dopravy Brno 2010. Brno Municipality. Available online at: https://www.bkom.cz/informacni-centrum/rocenky-dopravybrno-15/rocenka-dopravy-brno-2010-pdf-54 (in Czech).

Brněnské komunikace, a.s., 2012. Ročenka Dopravy Brno 2011. Brno Municipality. Available online at: https://www.bkom.cz/informacni-centrum/rocenky-dopravybrno-15/rocenka-dopravy-brno-2011-pdf-53 (in Czech).

Brněnské komunikace, a.s., 2013. Ročenka Dopravy Brno 2012. Brno Municipality. Available online at: https://www.bkom.cz/informacni-centrum/rocenky-dopravybrno-15/rocenka-dopravy-brno-2012-pdf-52 (in Czech).

Brněnské komunikace, a.s., 2014. Ročenka Dopravy Brno 2013. Brno Municipality. Available online at: https://www.bkom.cz/uploads/informacni_centrum/11/rocenka_dopravy_2013.pdf (in Czech).

Brněnské komunikace, a.s., 2015. Ročenka Dopravy Brno 2014. Brno Municipality. Available online at: https://www.bkom.cz/uploads/informacni_centrum/12/rocenka_dopravy_2014.pdf (in Czech).

Brněnské komunikace, a.s., 2016. Ročenka Dopravy Brno 2015. Brno Municipality. Available online at: https://www.bkom.cz/uploads/informacni_centrum/101/rocenka_dopravy_2015.pdf (in Czech).

Burman, J., Otto, M., 1988. Outliers in Time Series. Statistical Research Division Report Series CENSUS/SRD/RR-88/14. Bureau of the Census.

Čampulová, M., 2018. Comparison of methods for smoothing environmental data with an application to particulate matter PM10. Acta Univ. Agric. Silvic. Mendelianae Brunensis 66 (2), 453–463.

Čampulová, M., Veselík, P., Michálek, J., 2017. Control chart and six sigma based algorithms for identification of outliers in experimental data, with an application to particulate matter PM10. Atmos. Pollut. Res. 8 (4), 700–708.

Čampulová, M., Issever Grochová, L., Michálek, J., 2018a. Outlier detection in PM10 aerosols by generalised linear model. In: Proceedings of the International Conference of Numerical Analysis and Applied Mathematics, September 2017. American Institute of Physics (AIP), Melville.

Čampulová, M., Michálek, J., Mikuška, P., Bokal, D., 2018b. Algorithm for identification of outliers in environmental data. J. Chemometr. 32 (5), 1–17.

Chaloulakou, A., Kassomenos, P., Spyrellis, N., Demokritou, P., Koutrakis, P., 2003. Measurements of PM10 and PM2.5 particle concentrations in Athens, Greece. Atmos. Environ. 37 (5), 649–660.

Chandola, V., Banerjee, A., Kumar, V., 2009. Anomaly detection: a survey. ACM Comput. Surv. 41 (3), Article 15, 58 pp.

EEA (European Environment Agency), 2017. Air Quality in Europe. EEA Report No 13/2017.

EEA (European Environment Agency), 2018. Air Quality in Europe. EEA Report No 12/2018.

EU, 2008. Directive 2008/50/EC of the European Parliament and of the Council of 21 May 2008 on ambient air quality and cleaner air for Europe. Off. J. Eur. Commun. L 152, 1–44.

Fawcett, L., Walshaw, D., 2016. Sea-surge and wind speed extremes: optimal estimation strategies for planners and engineers. Stoch. Environ. Res. Risk Assess. 30, 463–480.

Filzmoser, P., 2005. Identification of multivariate outliers: a performance study. Aust. J. Stat. 34, 127–138.

Fox, A., 1972. Outliers in time series. J. Roy. Stat. Soc. Ser. B 34 (3), 350–363.

Garces, H., Sbarbaro, D., 2011. Outliers detection in environmental monitoring databases. Eng. Appl. Artif. Intell. 24, 341–349.

Gomes, M., 1993. On the estimation of parameter of rare events in environmental time series. In: Stat. Environ., Vol. 2 of Water Related Issues. Wiley, pp. 225–241.

Gupta, M., Gao, J., Aggarwal, C., 2014. Outlier detection for temporal data: a survey. IEEE T. Knowl. Data Eng. 26 (9), 2250–2267.

Hartigan, J., Wong, M., 1979. A k-means clustering algorithm. Appl. Stat. 28, 100–108.

Holešovský, J., Čampulová, M., Michálek, J., 2018. Semiparametric outlier detection in nonstationary times series: case study for atmospheric pollution in Brno, Czech Republic. Atmos. Pollut. Res. 9 (1), 27–36.

Hörmann, S., Pfeiler, B., Stadlober, E., 2005. Analysis and prediction of particulate matter PM10 for the winter season in Graz. Aust. J. Stat. 34 (4), 307–326.

Hrdličková, Z., Michálek, J., Kolář, M., Veselý, V., 2008. Identification of factors affecting air pollution by dust aerosol PM10 in Brno City, Czech Republic. Atmos. Environ. 42 (37), 8661–8673.

Hübnerová, Z., Michálek, J., 2014. Analysis of daily average PM10 predictions by generalized linear models in Brno, Czech Republic. Atmos. Pollut. Res. 5 (3), 471–476.

Iglewicz, B., Hoaglin, D., 1993. The ASQC basic references in quality control: statistical techniques. In: Mykytka, E.F. (Ed.), How to Detect and Handle Outliers, vol. 16. ASQC Quality Press, Milwaukee.

Johnson, R., Wichern, D., 1992. Applied Multivariate Statistical Analysis, third ed. Prentice-Hall, New Jersey.

Kim, K.-H., Kabir, E., Kabir, S., 2015. A review on the human health impact of airborne particulate matter. Environ. Int. 74, 136–143.

Křůmal, K., Mikuška, P., Večeřa, Z., 2017. Characterization of organic compounds in winter PM1 aerosols in a small industrial town. Atmos. Pollut. Res. 8 (5), 930–939.

Lourenço, V.M., Pires, A.M., 2014. M-regression, false discovery rates and outlier detection with application to genetic association studies. Comput. Stat. Data Anal. 78, 33–42.

McCullagh, P., Nelder, J., 1989. Generalised Linear Models, second ed. Chapman and Hall, New York.

McLachlan, G.J., Krishnan, T., 2008. The EM Algorithm and Extensions, second ed. Wiley, New York.

Mikuška, P., Kubátková, N., Křůmal, K., Večeřa, Z., 2017. Seasonal variability of monosaccharide anhydrides, resin acids, methoxyphenols and saccharides in PM2.5 in Brno, the Czech Republic. Atmos. Pollut. Res. 8 (3), 576–586.

Miller, L., Lemke, L.D., Xu, X., et al., 2010. Intra-urban correlation and spatial variability of air toxics across an international airshed in Detroit, Michigan (USA) and Windsor, Ontario (Canada). Atmos. Environ. 44 (9), 1162–1174.

O'Leary, B., Lemke, L.D., 2014. Modeling spatiotemporal variability of intra-urban air pollutants in Detroit: a pragmatic approach. Atmos. Environ. 94, 417–427.

O'Leary, B., Reiners Jr., J.J., Xu, X., Lemke, L.D., 2016. Identification and influence of spatio-temporal outliers in urban air quality measurements. Sci. Total Environ. 573, 55–65.

Pope, C., Dockery, D., 2006. Health effects of fine particulate air pollution: lines that connect. J. Air Waste Manag. Assoc. 56, 709–742.

Pope, C., Dockery, D., Schwartz, J., 1995. Review of epidemiological evidence of health effects of particulate air pollution. Inhal. Toxicol. 7, 1–18.

Rahman, S.M.A.K., Sathik, M.M., Kannan, K.S., 2012. Multiple linear regression models in outlier detection. Int. J. Res. Comput. Sci. 2 (2), 23–28.

Restrepo, C., Simonoff, J., Thurston, G., Zimmerman, R., 2012. Asthma hospital admissions and ambient air pollutant concentrations in New York City. J. Environ. Protect. 3, 1102–1116.

Rice, K., Spiegelhalter, D., 2006. A simple diagnostic plot connecting robust estimation, outlier detection, and false discovery rates. J. Appl. Stat. 33 (10), 1131–1147.

Ripley, B.D., Venables, W.N., 1999. Modern Applied Statistics with S-PLUS, Statistics and Computing, third ed. Springer, New York.

Shaadan, N., Jemain, A., Latif, M., Deni, S., 2015. Anomaly detection and assessment of PM10 functional data at several locations in the Klang Valley, Malaysia. Atmos. Pollut. Res. 6, 365–375.

She, Y., Owen, A.B., 2011. Outlier detection using nonconvex penalized regression. J. Am. Stat. Assoc. 106, 626–639.

Silva, A.T., Naghettini, M., Portela, M.M., 2016. On some aspects of peaks-over-threshold modeling of floods under nonstationarity using climate covariates. Stoch. Environ. Res. Risk Assess. 30, 207–224.

Stadlober, E., Hörmann, S., Pfeiler, B., 2008. Quality and performance of a PM10 daily forecasting model. Atmos. Environ. 42, 1098–1109.

Stadlober, E., Hübnerová, Z., Michálek, J., 2012. Prediction and forecast of daily PM10 concentrations in Brno and Graz by different regression approaches. Aust. J. Stat. 41 (4), 287–310.

WHO, 2005. Air Quality Guidelines for Particulate Matter, Ozone, Nitrogen Dioxide and Sulfur Dioxide, Global Update 2005. World Health Organization. Available online at: http://www.euro.who.int/__data/assets/pdf_file/0005/


Atmospheric Pollution Research
  • ISSN: 1309-1042
  • Publication frequency: 12 issues per year
  • First published: 2010

