Thesis

[Ph.D.] 이영수 (2023.2) Source apportionment and spatiotemporal analysis of PM2.5 using machine learning and receptor models

WML | Views: 650 | 2023-03-10 11:40:26

http://waste.snu.ac.kr/bbs/board2_3/291891

Particulate matter smaller than 2.5 micrometers (PM2.5) has been a pollutant of global concern for decades, owing to its adverse health effects. To develop effective PM2.5 management strategies, it is crucial to identify its sources and quantify how much each contributes to ambient PM2.5 concentrations in time and space. Source apportionment is the key to characterizing PM2.5. Receptor modeling, a statistical method of source apportionment, is widely used to identify PM2.5 sources, with the chemical constituents of PM2.5 serving as input data.

This study therefore aimed to characterize PM2.5 using source apportionment models and spatiotemporal analysis, in support of effective management strategies. Two types of modeling were performed for source apportionment. The first is positive matrix factorization modeling, which identifies specific source types and their contributions to PM2.5 at a single site. The second is Bayesian spatial multivariate receptor modeling, which derives major sources and their contributions to PM2.5 from multiple monitoring sites. In addition, machine learning models were used to predict the concentrations of PM2.5 constituents, which are important input data for receptor modeling. Machine learning models that can increase the integrity and applicability of PM2.5 data were assessed.
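For orientation, positive matrix factorization is commonly written as the bilinear receptor model below. This is a sketch of the standard formulation, not an equation taken from the thesis; the symbols (x, g, f, e, u, Q, p) are notation assumed here for illustration.

```latex
% x_{ij}: measured concentration of constituent j in sample i
% g_{ik}: contribution of source k to sample i
% f_{kj}: fraction of constituent j in the profile of source k
% e_{ij}: residual;  u_{ij}: measurement uncertainty;  p: number of sources
\begin{aligned}
x_{ij} &= \sum_{k=1}^{p} g_{ik}\, f_{kj} + e_{ij},
\qquad g_{ik} \ge 0,\quad f_{kj} \ge 0, \\
Q &= \sum_{i=1}^{n} \sum_{j=1}^{m} \left( \frac{e_{ij}}{u_{ij}} \right)^{2}
\quad \text{(minimized over } g_{ik} \text{ and } f_{kj}\text{)}.
\end{aligned}
```

The non-negativity constraints are what make the factors interpretable as physical source profiles and contributions.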

The sources of PM2.5 and their contributions in Siheung, South Korea, were identified using positive matrix factorization modeling. The 10 sources were secondary nitrate (24.3 %), secondary sulfate (18.8 %), traffic (18.8 %), combustion for heating (12.6 %), biomass burning (11.8 %), coal combustion (3.6 %), heavy oil industry (1.8 %), smelting industry (4.0 %), sea salt (2.7 %), and soil (1.7 %). Based on the derived sources, the carcinogenic and non-carcinogenic health risks from PM2.5 inhalation were estimated. Coal combustion, heavy oil industry, and traffic sources made low contributions to PM2.5 mass concentration, yet their carcinogenic health risks exceeded the benchmark value (1E-06). Therefore, countermeasures for PM2.5 emission sources should be based on both PM2.5 mass concentration and health risks.
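The abstract does not state how the health risks were computed. As a minimal sketch, one common formulation for inhalation exposure multiplies a time-weighted exposure concentration by an inhalation unit risk and compares the result with the 1E-06 benchmark. The function names, default exposure parameters, and the unit risk value used below are all illustrative assumptions, not values from the thesis.

```python
# Sketch of an inhalation cancer-risk screen. All defaults are assumptions
# chosen for illustration, not parameters reported in the thesis.

BENCHMARK_RISK = 1e-6  # benchmark carcinogenic risk cited in the abstract


def exposure_concentration(ca_ug_m3, et_h_per_day=24.0, ef_days_per_year=350.0,
                           ed_years=70.0, at_hours=70.0 * 365.0 * 24.0):
    """Time-weighted exposure concentration in ug/m3.

    ca: air concentration, et: exposure time, ef: exposure frequency,
    ed: exposure duration, at: averaging time in hours.
    """
    return ca_ug_m3 * et_h_per_day * ef_days_per_year * ed_years / at_hours


def excess_lifetime_cancer_risk(ca_ug_m3, iur_per_ug_m3, **exposure):
    """Risk = inhalation unit risk x exposure concentration."""
    return iur_per_ug_m3 * exposure_concentration(ca_ug_m3, **exposure)


def exceeds_benchmark(risk, benchmark=BENCHMARK_RISK):
    """True when the estimated risk is above the benchmark."""
    return risk > benchmark


if __name__ == "__main__":
    # Hypothetical constituent concentration and unit risk, for illustration.
    risk = excess_lifetime_cancer_risk(ca_ug_m3=0.001, iur_per_ug_m3=4.3e-3)
    print(f"risk = {risk:.2e}, exceeds benchmark: {exceeds_benchmark(risk)}")
```

A screen like this shows why a source with a small mass contribution can still matter: the risk depends on the toxicity of the constituents it emits, not only on how much mass it adds.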

The feature extraction capabilities of four machine learning models for predicting the chemical constituents of PM2.5 were assessed by comparing prediction accuracy across input variables, target constituents, available data period, missing ratios of input data, and study sites. The concentrations of PM2.5 constituents were predicted at three sites in South Korea (Seoul, Ulsan, and Baengnyeong) between 2016 and 2018 using four machine learning models: generative adversarial imputation network (GAIN), fully connected deep neural network (FCDNN), random forest (RF), and k-nearest neighbor (kNN). Prediction accuracy, measured by the coefficient of determination (R^2) between predictions and observations, was highest for GAIN, followed by FCDNN, RF, and kNN. As the missing ratio of the input data increased (20, 40, 60, and 80 %), prediction accuracy decreased in all four models; the decrease was more noticeable in GAIN and kNN, which are unsupervised models. As the input data period increased, the two deep learning models, GAIN and FCDNN, showed better applicability than RF and kNN. Study sites with more emission sources exhibited lower prediction accuracy, so the highest R^2 was obtained at the site with the fewest emission sources. This study demonstrated that machine learning models can be extended to further air pollution studies depending on model features, required performance, and experimental conditions, such as data availability and time constraints.
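The evaluation design described above (hide a fraction of the data, predict it, score with R^2) can be sketched as follows. This is a simplified stand-in, not the thesis pipeline: it uses a plain k-nearest-neighbor average on synthetic data, defines R^2 as 1 - SS_res/SS_tot (the abstract does not say which definition was used), and all function names are hypothetical.

```python
import math
import random


def r_squared(observed, predicted):
    """Coefficient of determination, here taken as 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def choose_hidden(n_rows, missing_ratio, seed=0):
    """Pick the row indices whose target value is treated as missing."""
    rng = random.Random(seed)
    return set(rng.sample(range(n_rows), round(n_rows * missing_ratio)))


def knn_predict(features, targets, hidden, k=3):
    """Predict each hidden target as the mean target of its k nearest known rows."""
    known = [i for i in range(len(features)) if i not in hidden]
    predictions = {}
    for i in hidden:
        nearest = sorted(known, key=lambda j: math.dist(features[i], features[j]))[:k]
        predictions[i] = sum(targets[j] for j in nearest) / len(nearest)
    return predictions


if __name__ == "__main__":
    # Synthetic data: one input variable and a linearly related target.
    features = [(float(x),) for x in range(50)]
    targets = [2.0 * x + 1.0 for x in range(50)]
    for ratio in (0.2, 0.4, 0.6, 0.8):  # missing ratios named in the abstract
        hidden = sorted(choose_hidden(len(features), ratio))
        preds = knn_predict(features, targets, set(hidden))
        score = r_squared([targets[i] for i in hidden], [preds[i] for i in hidden])
        print(f"missing ratio {ratio:.0%}: R^2 = {score:.3f}")
```

Scoring only the hidden entries keeps the comparison fair across models, since every model is judged on values it never saw.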

The spatial distribution of five PM2.5 sources in South Korea was estimated using Bayesian spatial multivariate receptor modeling (BSMRM). Secondary nitrate, secondary sulfate, motor vehicle emissions, industry, and sea salt were determined to be significant contributors to ambient PM2.5 concentrations in South Korea. The spatial surface of the daily average contribution of each source was derived from measurement data from eight monitoring sites. The source contributions predicted by the BSMRM were also validated using held-out data from test sites (such as Ansan, Daejeon, and Gwangju). These predicted source contributions can aid in developing effective PM2.5 control strategies in cities where no speciated PM2.5 monitoring stations are available. They can also be used as source-specific exposures in health effect studies, even in cities where no monitoring stations are available.

Keywords: PM2.5; Source apportionment; Positive matrix factorization; Machine learning modeling; PM2.5 chemical constituents; Bayesian receptor modeling
