Почему функция Предиката () не принимает работу "новые данные" для моей модели lmer?
Когда я пытаюсь использовать Предикат (), или ПредикатИнтервал (), или bootMer(), предоставляя ему новые данные (даже если это те же данные, с которыми я построил модель), я получаю следующую ошибку:
Error in [.data.frame(fr, vars) : undefined columns selected
Я использовал пакет tidyverse, поэтому подумал, что это может быть связано с тибблом, но преобразование в data.frame (с использованием as.data.frame) не решает проблему.
Я также попытался изменить имена столбцов, чтобы они не содержали пробелов, и избавился от NA в данных.
Вот пример:
library(tidyverse)
library(lme4)
library(lattice)
library(merTools)
df = read_csv(file = "~/Documents/Experiments/Myles Study/mylesdata.csv")
names(df)[names(df) == "h3"] = "hr3"
spms = df %>%
dplyr::select(
-c(
mets1, mets2, mets3, mets4, mets5, mets6
, rpe1, rpe2, rpe3, rpe4, rpe5, rpe6
, hr1, hr2, hr3, hr4, hr5, hr6
, sl1, sl2, sl3, sl4, sl5, sl6
)
) %>%
gather(speed_lev, spm, spm1, spm2, spm3, spm4, spm5, spm6)
spms$speed_lev = factor(spms$speed_lev)
levels(spms$speed_lev) = c("1.5", "2", "2.5", "3.5", "4.0", "4.5")
mets = df %>%
dplyr::select(
-c(
spm1, spm2, spm3, spm4, spm5, spm6
, rpe1, rpe2, rpe3, rpe4, rpe5, rpe6
, hr1, hr2, hr3, hr4, hr5, hr6
, sl1, sl2, sl3, sl4, sl5, sl6
)
) %>%
gather(speed_lev, mets, mets1, mets2, mets3, mets4, mets5, mets6)
mets$speed_lev = factor(mets$speed_lev)
levels(mets$speed_lev) = c("1.5", "2", "2.5", "3.5", "4.0", "4.5")
rpe = df %>%
dplyr::select(
-c(
spm1, spm2, spm3, spm4, spm5, spm6
, mets1, mets2, mets3, mets4, mets5, mets6
, hr1, hr2, hr3, hr4, hr5, hr6
, sl1, sl2, sl3, sl4, sl5, sl6
)
) %>%
gather(speed_lev, rpe, rpe1, rpe2, rpe3, rpe4, rpe5, rpe6)
rpe$speed_lev = factor(rpe$speed_lev)
levels(rpe$speed_lev) = c("1.5", "2", "2.5", "3.5", "4.0", "4.5")
hr = df %>%
dplyr::select(
-c(
spm1, spm2, spm3, spm4, spm5, spm6
, mets1, mets2, mets3, mets4, mets5, mets6
, rpe1, rpe2, rpe3, rpe4, rpe5, rpe6
, sl1, sl2, sl3, sl4, sl5, sl6
)
) %>%
gather(speed_lev, hr, hr1, hr2, hr3, hr4, hr5, hr6)
hr$speed_lev = factor(hr$speed_lev)
levels(hr$speed_lev) = c("1.5", "2", "2.5", "3.5", "4.0", "4.5")
sl = df %>%
dplyr::select(
-c(
spm1, spm2, spm3, spm4, spm5, spm6
, mets1, mets2, mets3, mets4, mets5, mets6
, rpe1, rpe2, rpe3, rpe4, rpe5, rpe6
, hr1, hr2, hr3, hr4, hr5, hr6
)
) %>%
gather(speed_lev, sl, sl1, sl2, sl3, sl4, sl5, sl6)
sl$speed_lev = factor(sl$speed_lev)
levels(sl$speed_lev) = c("1.5", "2", "2.5", "3.5", "4.0", "4.5")
dat = left_join(spms, mets)
dat = left_join(dat, rpe)
dat = left_join(dat, hr)
dat = left_join(dat, sl)
names(dat)[names(dat) == "speed_lev"] = "speed (m/h)"
dat$`speed (m/h)` = as.numeric(as.character(dat$`speed (m/h)`))
dat$mets_sq = dat$mets^2
dat$mets_cubed = dat$mets^3
m3 = lmer(spm ~ `leg length (cm)` + mets_cubed + mets_sq + mets + (1 | subject), data = dat)
summary(m3)
Резюме:
Linear mixed model fit by REML ['lmerMod']
Formula: spm ~ `leg length (cm)` + mets_cubed + mets_sq + mets + (1 | subject)
Data: dat
REML criterion at convergence: 1562.6
Scaled residuals:
Min 1Q Median 3Q Max
-2.8094 -0.5213 0.1120 0.5638 3.0933
Random effects:
Groups Name Variance Std.Dev.
subject (Intercept) 49.41 7.029
Residual 26.90 5.186
Number of obs: 238, groups: subject, 43
Fixed effects:
Estimate Std. Error t value
(Intercept) 41.30017 18.13968 2.277
`leg length (cm)` -0.58471 0.16535 -3.536
mets_cubed 0.44777 0.09343 4.793
mets_sq -8.67702 1.37655 -6.303
mets 60.66036 6.36756 9.526
Correlation of Fixed Effects:
(Intr) `l(c)` mts_cb mts_sq
`lglng(cm)` -0.863
mets_cubed -0.434 -0.045
mets_sq 0.449 0.045 -0.994
mets -0.459 -0.045 0.974 -0.993
Затем я пытаюсь делать прогнозы:
predict(m3, dat)
predictInterval(
m3
, dat
, level = 0.95
, n.sims = 1000
, stat = "median"
, type="linear.prediction"
, include.resid.var = TRUE
)
И получите ошибки:
Error in `[.data.frame`(fr, vars) : undefined columns selected
Error in `[.data.frame`(fr, vars) : undefined columns selected
In addition: Warning message:
In predictInterval(m3, dat, level = 0.95, n.sims = 1000, stat = "median", :
newdata is tbl_df or tbl object from dplyr package and has been
coerced to a data.frame
Информация о сессии:
sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.3
locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] merTools_0.3.0 arm_1.9-3 MASS_7.3-45 lattice_0.20-34 lme4_1.1-12 Matrix_1.2-7.1 dplyr_0.5.0
[8] purrr_0.2.2 readr_1.0.0 tidyr_0.6.0 tibble_1.2 ggplot2_2.1.0 tidyverse_1.0.0
loaded via a namespace (and not attached):
[1] Rcpp_0.12.6 nloptr_1.0.4 plyr_1.8.4 base64enc_0.1-3 tools_3.3.2 digest_0.6.10
[7] jsonlite_1.1 evaluate_0.10 blme_1.0-4 nlme_3.1-128 gtable_0.2.0 psych_1.6.9
[13] shiny_1.0.0 DBI_0.5 yaml_2.1.14 parallel_3.3.2 mvtnorm_1.0-5 coda_0.18-1
[19] stringr_1.1.0 knitr_1.15 htmlwidgets_0.8 grid_3.3.2 DT_0.2 R6_2.1.3
[25] rmarkdown_1.1 foreign_0.8-67 minqa_1.2.4 reshape2_1.4.1 magrittr_1.5 scales_0.4.0
[31] htmltools_0.3.5 splines_3.3.2 assertthat_0.1 abind_1.4-5 mnormt_1.5-5 xtable_1.8-2
[37] mime_0.5 colorspace_1.2-6 httpuv_1.3.3 labeling_0.3 stringi_1.1.1 lazyeval_0.2.0
[43] munsell_0.4.3 broom_0.4.1
1 ответ
Подозрение Роланда было верным. Пакет Tidyverse (экосистема) маскирует интересующие функции предсказания ().