Does sbf() use the metric argument to optimize the model?
Passing ROC as the metric argument value to caretSBF
Our objective is to use the ROC summary metric for model selection while running the selection-by-filter function sbf() for feature selection.
The BreastCancer dataset from the mlbench package was used as a reproducible example to run train() and sbf() with metric = "Accuracy" and with metric = "ROC".
We want to confirm whether sbf() takes the metric argument into account the way train() and rfe() do when optimizing a model. To this end, we planned to use train() together with sbf(): the caretSBF$fit function makes a call to train(), and caretSBF is passed to sbfControl().
From the output, it seems the metric argument is used only for the inner resampling and not for the sbf part. That is, for the outer resampling shown in the output, the metric argument was not applied the way train() and rfe() apply it.
Since we used caretSBF, which relies on train(), it appears that the scope of the metric argument is limited to train() and that it is therefore not passed on to sbf.
We would appreciate clarification on whether sbf() uses the metric argument for optimizing the model, that is, for the outer resampling.
Here is our work on the reproducible example. It shows that train() uses the metric argument with both Accuracy and ROC, but for sbf we are not sure.
## Loading required packages
library(mlbench)
library(caret)
## Loading `BreastCancer` Dataset from *mlbench* package
data(BreastCancer)
## Data cleaning for missing values
# Remove rows/observation with NA Values in any of the columns
BrC1 <- BreastCancer[complete.cases(BreastCancer),]
# Removing Class and Id Column and keeping just Numeric Predictors
Num_Pred <- BrC1[,2:10]
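Defining the fiveStats summary function
# Combine ROC/Sens/Spec (twoClassSummary) with Accuracy/Kappa (defaultSummary)
fiveStats <- function(...) c(twoClassSummary(...), defaultSummary(...))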
Defining trControl
trCtrl <- trainControl(method="repeatedcv", number=10,
repeats=1, classProbs = TRUE, summaryFunction = fiveStats)
TRAIN + METRIC = "Accuracy"
TR_acc <- train(Num_Pred,BrC1$Class, method="rf",metric="Accuracy",
trControl = trCtrl,tuneGrid=expand.grid(.mtry=c(2,3,4,5)))
# Random Forest
# 683 samples
# 9 predictor
# 2 classes: 'benign', 'malignant'
# No pre-processing
# Resampling: Cross-Validated (10 fold, repeated 1 times)
# Summary of sample sizes: 615, 615, 614, 614, 614, 615, ...
# Resampling results across tuning parameters:
# mtry ROC Sens Spec Accuracy Kappa
# 2 0.9936532 0.9729798 0.9833333 0.9765772 0.9490311
# 3 0.9936544 0.9729293 0.9791667 0.9750853 0.9457534
# 4 0.9929957 0.9684343 0.9750000 0.9706948 0.9361373
# 5 0.9922907 0.9684343 0.9666667 0.9677536 0.9295782
# Accuracy was used to select the optimal model using the largest value.
# The final value used for the model was mtry = 2.
TRAIN + METRIC = "ROC"
TR_roc <- train(Num_Pred,BrC1$Class, method="rf",metric="ROC",
trControl = trCtrl,tuneGrid=expand.grid(.mtry=c(2,3,4,5)))
# Random Forest
# 683 samples
# 9 predictor
# 2 classes: 'benign', 'malignant'
# No pre-processing
# Resampling: Cross-Validated (10 fold, repeated 1 times)
# Summary of sample sizes: 615, 615, 614, 614, 614, 615, ...
# Resampling results across tuning parameters:
# mtry ROC Sens Spec Accuracy Kappa
# 2 0.9936532 0.9729798 0.9833333 0.9765772 0.9490311
# 3 0.9936544 0.9729293 0.9791667 0.9750853 0.9457534
# 4 0.9929957 0.9684343 0.9750000 0.9706948 0.9361373
# 5 0.9922907 0.9684343 0.9666667 0.9677536 0.9295782
# ROC was used to select the optimal model using the largest value.
# The final value used for the model was mtry = 3.
Editing the caretSBF summary function
caretSBF$summary <- fiveStats
Defining sbfControl
sbfCtrl <- sbfControl(functions=caretSBF,
method="repeatedcv", number=10, repeats=1,
verbose=TRUE, saveDetails = TRUE)
SBF + METRIC = "Accuracy"
sbf_acc <- sbf(Num_Pred, BrC1$Class,
sbfControl = sbfCtrl,
trControl = trCtrl, method="rf", metric="Accuracy")
## sbf_acc
# Selection By Filter
# Outer resampling method: Cross-Validated (10 fold, repeated 1 times)
# Resampling performance:
# ROC Sens Spec Accuracy Kappa ROCSD SensSD SpecSD AccuracySD KappaSD
# 0.9931 0.973 0.9833 0.9766 0.949 0.006272 0.0231 0.02913 0.01226 0.02646
# Using the training set, 9 variables were selected:
# Cl.thickness, Cell.size, Cell.shape, Marg.adhesion, Epith.c.size...
# During resampling, the top 5 selected variables (out of a possible 9):
# Bare.nuclei (100%), Bl.cromatin (100%), Cell.shape (100%), Cell.size (100%), Cl.thickness (100%)
# On average, 9 variables were selected (min = 9, max = 9)
## Class of sbf_acc
# [1] "sbf"
## Names of elements of sbf_acc
# [1] "pred" "variables" "results" "fit" "optVariables"
# [6] "call" "control" "resample" "metrics" "times"
# [11] "resampledCM" "obsLevels" "dots"
## sbf_acc fit element
# Random Forest
# 683 samples
# 9 predictor
# 2 classes: 'benign', 'malignant'
# No pre-processing
# Resampling: Cross-Validated (10 fold, repeated 1 times)
# Summary of sample sizes: 615, 614, 614, 615, 615, 615, ...
# Resampling results across tuning parameters:
# mtry ROC Sens Spec Accuracy Kappa
# 2 0.9933176 0.9706566 0.9833333 0.9751492 0.9460717
# 5 0.9920034 0.9662121 0.9791667 0.9707801 0.9363708
# 9 0.9914825 0.9684343 0.9708333 0.9693308 0.9327662
# Accuracy was used to select the optimal model using the largest value.
# The final value used for the model was mtry = 2.
## Elements of sbf_acc fit
# [1] "method" "modelInfo" "modelType" "results" "pred"
# [6] "bestTune" "call" "dots" "metric" "control"
# [11] "finalModel" "preProcess" "trainingData" "resample" "resampledCM"
# [16] "perfNames" "maximize" "yLimits" "times" "levels"
## sbf_acc fit final Model
# Call:
# randomForest(x = x, y = y, mtry = param$mtry)
# Type of random forest: classification
# Number of trees: 500
# No. of variables tried at each split: 2
# OOB estimate of error rate: 2.34%
# Confusion matrix:
# benign malignant class.error
# benign 431 13 0.02927928
# malignant 3 236 0.01255230
## sbf_acc metric
# [1] "Accuracy"
## sbf_acc fit best Tune
# mtry
# 1 2
SBF + METRIC = "ROC"
sbf_roc <- sbf(Num_Pred, BrC1$Class,
sbfControl = sbfCtrl,
trControl = trCtrl, method="rf", metric="ROC")
## sbf_roc
# Selection By Filter
# Outer resampling method: Cross-Validated (10 fold, repeated 1 times)
# Resampling performance:
# ROC Sens Spec Accuracy Kappa ROCSD SensSD SpecSD AccuracySD KappaSD
# 0.9931 0.973 0.9833 0.9766 0.949 0.006272 0.0231 0.02913 0.01226 0.02646
# Using the training set, 9 variables were selected:
# Cl.thickness, Cell.size, Cell.shape, Marg.adhesion, Epith.c.size...
# During resampling, the top 5 selected variables (out of a possible 9):
# Bare.nuclei (100%), Bl.cromatin (100%), Cell.shape (100%), Cell.size (100%), Cl.thickness (100%)
# On average, 9 variables were selected (min = 9, max = 9)
## Class of sbf_roc
# [1] "sbf"
## Names of elements of sbf_roc
# [1] "pred" "variables" "results" "fit" "optVariables"
# [6] "call" "control" "resample" "metrics" "times"
# [11] "resampledCM" "obsLevels" "dots"
## sbf_roc fit element
# Random Forest
# 683 samples
# 9 predictor
# 2 classes: 'benign', 'malignant'
# No pre-processing
# Resampling: Cross-Validated (10 fold, repeated 1 times)
# Summary of sample sizes: 615, 614, 614, 615, 615, 615, ...
# Resampling results across tuning parameters:
# mtry ROC Sens Spec Accuracy Kappa
# 2 0.9933176 0.9706566 0.9833333 0.9751492 0.9460717
# 5 0.9920034 0.9662121 0.9791667 0.9707801 0.9363708
# 9 0.9914825 0.9684343 0.9708333 0.9693308 0.9327662
# ROC was used to select the optimal model using the largest value.
# The final value used for the model was mtry = 2.
## Elements of sbf_roc fit
# [1] "method" "modelInfo" "modelType" "results" "pred"
# [6] "bestTune" "call" "dots" "metric" "control"
# [11] "finalModel" "preProcess" "trainingData" "resample" "resampledCM"
# [16] "perfNames" "maximize" "yLimits" "times" "levels"
## sbf_roc fit final Model
# Call:
# randomForest(x = x, y = y, mtry = param$mtry)
# Type of random forest: classification
# Number of trees: 500
# No. of variables tried at each split: 2
# OOB estimate of error rate: 2.34%
# Confusion matrix:
# benign malignant class.error
# benign 431 13 0.02927928
# malignant 3 236 0.01255230
## sbf_roc metric
# [1] "ROC"
## sbf_roc fit best Tune
# mtry
# 1 2
Does sbf() use the metric argument to optimize the model? If yes, which metric does sbf() use by default? And if sbf() does use the metric argument, how do we set it to ROC?
1 Answer
sbf doesn't use the metric to optimize anything (unlike rfe); all sbf does is run a feature selection step before calling the model. Of course, you define the filters, but there is no way to tune the filter using sbf, so no metric is needed to guide that step.
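To make the "filter first, then model" idea concrete, here is a minimal base-R sketch. It is not caret code: the simulated data, the t-test filter, and the 0.05 cutoff are all assumptions chosen for illustration. The point is only that the filter applies a fixed rule, so nothing in that step is compared or tuned against a performance metric.

```r
# Minimal base-R sketch of selection by filter (illustrative, not caret internals).
set.seed(1)
n <- 100
y <- factor(rep(c("benign", "malignant"), each = n / 2))
x <- data.frame(
  signal = rnorm(n, mean = ifelse(y == "malignant", 2, 0)),  # informative predictor
  noise1 = rnorm(n),                                         # unrelated to y
  noise2 = rnorm(n)                                          # unrelated to y
)

# Filter step: score each predictor on its own with a two-sample t-test.
filter_scores <- vapply(x, function(col) t.test(col ~ y)$p.value, numeric(1))

# Fixed rule (assumed cutoff): keep predictors with p < 0.05.
# Nothing is tuned here, so no performance metric is involved.
selected <- names(filter_scores)[filter_scores < 0.05]

# Model step: fit only on the predictors that survived the filter.
fit <- glm(y ~ ., data = cbind(x[, selected, drop = FALSE], y = y),
           family = binomial)
```

Any metric-driven choice would have to happen inside the model step, which is where train() comes in when caretSBF is used.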
Using sbf(x, y, metric = "ROC") will pass metric = "ROC" on to whatever modeling function you are using (sbf is designed to work with train when caretSBF is used). This happens because sbf has no metric argument of its own:
> names(formals(caret:::sbf.default))
[1] "x" "y" "sbfControl" "..."
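The forwarding mechanism can be reproduced with two small stand-in functions. This is a sketch under stated assumptions: outer_filter and inner_model are hypothetical names that only mimic the shape of the argument lists discussed here, not caret's actual implementation.

```r
# Hypothetical stand-ins for illustration; these are not caret functions.
inner_model <- function(x, y, metric = "Accuracy", ...) {
  # Stands in for a tuning function that has its own `metric` argument.
  list(metric = metric)
}

outer_filter <- function(x, y, control = list(), ...) {
  # There is no `metric` formal here, so metric = "ROC" is captured by `...`
  dots <- list(...)
  # and forwarded untouched to the modeling function.
  fit <- inner_model(x, y, ...)
  list(fit = fit, dots = dots)
}

res <- outer_filter(1:3, 1:3, metric = "ROC")

names(formals(outer_filter))  # "x" "y" "control" "..."
res$fit$metric                # "ROC": only the inner function ever sees it
```

This matches what the question's output shows: the fit element reports the metric that was passed, while the outer resampling summary is identical for both runs.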