How to validate the model performance of the `rmweather' pac

Post by mo_scht » Fri 5 Feb 2021, 16:44

For my master's thesis I would like to study the effects of the COVID-19 containment measures on air quality (NO2, PM10). I have meteorological data for 4 years (2017-2020). The dataset "data_prepared" was split into 80 % training and 20 % test data before training. I trained the model with data from 2017-2019 and would like to perform a weather normalisation for 2020.
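A minimal base R sketch of how such a split could look, assuming the 80/20 split is applied to the 2017-2019 rows only and 2020 is kept aside for the normalisation. The toy data, the seed, and the `set` labels are illustrative assumptions, not my real data or the package's own helper.

```r
# Toy hourly data standing in for the real measurements (assumption)
set.seed(42)
dates <- seq(as.POSIXct("2017-01-01 00:00", tz = "UTC"),
             as.POSIXct("2020-12-31 23:00", tz = "UTC"), by = "hour")
data_all <- data.frame(date = dates,
                       value = rnorm(length(dates), mean = 20, sd = 5))

# Separate the model-building years from 2020
year <- as.integer(format(data_all$date, "%Y"))
data_model <- data_all[year <= 2019, ]
data_2020 <- data_all[year == 2020, ]

# Random 80/20 split of the 2017-2019 rows into training and testing
n <- nrow(data_model)
train_rows <- sample(n, size = floor(0.8 * n))
data_model$set <- "testing"
data_model$set[train_rows] <- "training"

# Number of rows in each set
table(data_model$set)
```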

First I used the following function from the R package 'rmweather':

Code:
library(rmweather)

# Train the random forest and normalise in one step.
# The call must not be wrapped in quotes, otherwise R only stores a string.
RF_pm2.5_model <- rmw_do_all(
  data_prepared,
  variables = c("date_unix", "day_julian", "weekday", "hour", "temp", "RH", "wd", "ws", "pressure", "u.", "L", "MLH"),
  variables_sample = c("hour", "temp", "RH", "wd", "ws", "pressure", "u.", "L", "MLH"),
  n_trees = 500,
  n_samples = 500,
  verbose = TRUE
)


I used 500 decision trees and passed the meteorological variables as variables_sample. The function "rmw_do_all" first trains a random forest model to predict pollutant concentrations from meteorological and time variables and then immediately normalises the pollutant for "average" meteorological conditions. Because this function also normalised the time variables, I added a few steps to normalise only the meteorological variables.
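To illustrate what I mean by normalising only the meteorological variables, here is a conceptual base R sketch. It is not the rmweather implementation: a linear model stands in for the random forest, and the toy data, the helper name `normalise_met`, and its arguments are assumptions made for illustration. The meteorological columns are resampled many times while the time variable is left untouched, and the predictions are averaged.

```r
set.seed(1)
n <- 500

# Toy data: one time variable and two meteorological variables (assumption)
toy <- data.frame(
  date_unix = seq_len(n),
  temp = rnorm(n, mean = 10, sd = 5),
  ws = runif(n, min = 0, max = 10)
)
toy$value <- 30 + 0.01 * toy$date_unix - 0.5 * toy$temp -
  1.5 * toy$ws + rnorm(n)

# Stand-in model; the real analysis uses a random forest instead
fit <- lm(value ~ date_unix + temp + ws, data = toy)

# Hypothetical helper: average predictions over resampled meteorology
normalise_met <- function(model, df, met_vars, n_samples = 100) {
  total <- numeric(nrow(df))
  for (i in seq_len(n_samples)) {
    resampled <- df
    # Draw whole rows so the meteorological variables stay jointly consistent
    idx <- sample(nrow(df), replace = TRUE)
    resampled[met_vars] <- df[idx, met_vars, drop = FALSE]
    total <- total + predict(model, newdata = resampled)
  }
  total / n_samples
}

toy$value_normalised <- normalise_met(fit, toy, met_vars = c("temp", "ws"))
```

In this sketch the normalised series keeps the trend in `date_unix` but loses most of the variability caused by `temp` and `ws`, because only those two columns are shuffled.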

Although I understand how to test whether the model has suffered from overfitting, I am unsure how to validate the model. I have read about cross-validation with the training and test sets. If I understand this correctly, I have to check whether the test results are similar to the training results.

The option I used to check if the model has suffered from overfitting:

Code:
library(openair)  # provides modStats()

testing_model <- rmw_predict_the_test_set(model = RF_pm2.5_model$model,
                                          df = RF_pm2.5_model$observations)

# mod is the modelled column (value_predict), obs is the observed column (value)
model_performance <- modStats(testing_model, mod = "value_predict", obs = "value",
                              statistic = c("n", "FAC2", "MB", "MGE", "NMB", "NMGE", "RMSE", "COE", "IOA", "r"), type = "default", rank.name = NULL)


But I am not sure how to compare this with the performance on the training dataset. I have read many papers, but most of them mention only briefly that the model was checked against the test dataset.
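The comparison I have in mind could be sketched roughly as follows, in base R and with toy data. It assumes a table with the columns `set` ("training" or "testing"), `value` (observed) and `value_predict` (predicted). The helper `error_stats` and the toy numbers are hypothetical, and how to get predictions for the training rows out of the package is not shown here.

```r
# Hypothetical helper: basic error statistics for observed vs. predicted values
error_stats <- function(obs, pred) {
  keep <- is.finite(obs) & is.finite(pred)
  obs <- obs[keep]
  pred <- pred[keep]
  data.frame(
    n = length(obs),
    MB = mean(pred - obs),              # mean bias
    RMSE = sqrt(mean((pred - obs)^2)),  # root mean squared error
    r = cor(pred, obs)                  # Pearson correlation
  )
}

# Toy predictions where the training fit is tighter than the test fit (assumption)
set.seed(7)
predictions <- data.frame(
  set = rep(c("training", "testing"), times = c(800, 200)),
  value = rnorm(1000, mean = 20, sd = 5)
)
noise_sd <- ifelse(predictions$set == "training", 1, 3)
predictions$value_predict <- predictions$value + rnorm(1000, sd = noise_sd)

# One row of statistics per set, side by side
by_set <- do.call(rbind, lapply(split(predictions, predictions$set), function(d) {
  cbind(set = d$set[1], error_stats(d$value, d$value_predict))
}))
by_set
```

If the statistics for the two sets are close, that would suggest the model generalises; a test RMSE much larger than the training RMSE would point towards overfitting.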