Error in gam function in names(x) <- value: 'names' attribute must be the same length as the vector


Problem description

I am using the mgcv package to model ozone pollution concentration as a function of some environmental covariates. The model takes the form:

model1 <- gam(O3 ~ s(X, Y, bs = "tp", k = 10) + wd + s(date, bs = "cc", k = 100) + district,
              data = mydata, family = gaussian(link ="log"),
              na.action = "na.omit", method = "REML")

Here is the structure of the covariates:

> str(mydata)
'data.frame': 7100 obs. of  286 variables:
 $ date            : Date, format: "2016-01-01" "2016-01-01" "2016-01-01" ...
 $ O3              : num  0.0141 0.0149 0.0102 0.0159 0.0186 ...
 $ district        : Factor w/ 10 levels "bc","bh","dl",..: 1 8 7 8 2 6 4 4 10 2 ...
 $ wd              : Factor w/ 16 levels "E","ENE","ESE",..: 13 13 13 13 13 2 9 9 11 13 ...
 $ X               : num  0.389 0.365 1 0.44 0.892 ...
 $ Y               : num  0.311 0.204 0.426 0.223 0.162 ...

I am stuck on the following error:

error in R: 'names' attribute [1] must be the same length as the vector [0].

I tried to locate the problem by deleting the term s(date, bs = "cc", k = 100) from the formula, and the model then fits without error. It seems that something is wrong with the date field.

I'm not exactly sure how to fix this problem. Any advice would be greatly appreciated!

Recommended answer

The date variable won't be automatically converted to a numeric variable; you need to do this yourself. I normally process such information as follows:

mydata <- transform(mydata, ndate = as.numeric(date),
                    nyear  = as.numeric(format(date, '%Y')),
                    nmonth = as.numeric(format(date, '%m')),
                    doy    = as.numeric(format(date, '%j')))
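
As a quick check of what these derived columns contain, here is a small sketch on a made-up three-row data frame. The name `toy` and its dates are assumptions for illustration only, not part of the data in the question.

```r
# Hypothetical toy data with a Date column, mirroring the 'date' field above
toy <- data.frame(date = as.Date(c("2016-01-01", "2016-12-31", "2017-03-01")))

toy <- transform(toy,
                 ndate  = as.numeric(date),                # days since 1970-01-01
                 nyear  = as.numeric(format(date, "%Y")),  # calendar year
                 nmonth = as.numeric(format(date, "%m")),  # month, 1 to 12
                 doy    = as.numeric(format(date, "%j")))  # day of year, 1 to 366

str(toy)  # every derived column is now plain numeric
```

Because all four derived columns are ordinary numerics, any of them can be passed to s() in place of the raw Date column.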

Then I can choose to model the time component in a number of ways:


  1. a trend based on ndate or nyear with a non-cyclic spline, or
  2. a cyclic pattern based on nmonth or doy (for day of year), or
  3. a combination of trend and cyclic pattern

It's unclear from your question whether your data are restricted to a single year. If the data span multiple years, then you can't just use a cyclic spline on the ndate variable. You will need either a very complex standard spline (option 1) or two splines, one for the between-year part and one for the within-year part (option 3).

If your data span multiple years, then I would set the model up as

O3 ~ s(X, Y, bs = "tp", k = 10) + wd + s(doy, bs = 'cc', k = 20) +
     s(ndate, bs = "tp", k = 50) + district

or perhaps s(nyear, ....) would be sufficient instead of s(ndate, ....).
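
A minimal, self-contained sketch of this two-smooth setup on simulated data is given here. The data frame `sim`, the shapes of its seasonal cycle and trend, the response units, and the smaller `k` for the trend are all assumptions made for illustration. The spatial smooth and the `wd` and `district` terms are left out, and the default identity link is used to keep the example short.

```r
library(mgcv)

set.seed(1)
# Simulated two-year daily series (hypothetical; not the data from the question)
sim <- data.frame(date = seq(as.Date("2016-01-01"), as.Date("2017-12-31"), by = "day"))
sim <- transform(sim,
                 ndate = as.numeric(date),
                 doy   = as.numeric(format(date, "%j")))

# Response in arbitrary units: seasonal cycle + slow linear trend + noise
sim$O3 <- 20 + 5 * sin(2 * pi * sim$doy / 365.25) +
  0.005 * (sim$ndate - min(sim$ndate)) + rnorm(nrow(sim), sd = 2)

# Cyclic smooth for the within-year part, thin plate smooth for the trend
m_add <- gam(O3 ~ s(doy, bs = "cc", k = 20) + s(ndate, bs = "tp", k = 10),
             data = sim, method = "REML")
summary(m_add)
```

Calling plot(m_add, pages = 1) afterwards draws the two estimated smooths side by side, which is a convenient way to inspect the within-year and between-year components separately.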

This kind of decomposition of the time component is useful because you can often fit the series better with two simple, well-estimated smooths than with a single, more complex smooth. It also allows you to test for within-year and between-year effects.

If you need the seasonal cycle to vary with the trend, then a tensor product is helpful:

O3 ~ s(X, Y, bs = "tp", k = 10) + wd +
     te(doy, ndate, bs = c('cc','tp'), k = c(20,50)) + district
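
For a feel of how such a tensor product behaves, here is a hedged sketch on simulated data in which the seasonal amplitude grows from year to year. The data frame `sim`, the helper variable `tt`, the response units, and the small basis sizes are assumptions for illustration; the other model terms are omitted.

```r
library(mgcv)

set.seed(2)
# Simulated three-year daily series (hypothetical data)
sim <- data.frame(date = seq(as.Date("2016-01-01"), as.Date("2018-12-31"), by = "day"))
sim <- transform(sim,
                 ndate = as.numeric(date),
                 doy   = as.numeric(format(date, "%j")))

# Seasonal amplitude grows over time, so the cycle differs between years
tt <- (sim$ndate - min(sim$ndate)) / 365.25   # elapsed time in years
sim$O3 <- 20 + (3 + 2 * tt) * sin(2 * pi * sim$doy / 365.25) +
  rnorm(nrow(sim), sd = 2)

# Tensor product of a cyclic marginal (doy) and a thin plate marginal (ndate)
m_te <- gam(O3 ~ te(doy, ndate, bs = c("cc", "tp"), k = c(10, 5)),
            data = sim, method = "REML")
summary(m_te)
```

An additive model with separate s(doy) and s(ndate) terms would force the same seasonal shape on every year, whereas the single te() term lets the shape change smoothly along ndate.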

For cyclic splines you may also want to set the knots argument, especially if your data don't quite span the full range of days of the year, months, etc. For doy I would use knots = list(doy = c(0.5, 366.5)), as this allows Dec 31st and Jan 1st to have slightly different estimated values. For nmonth this is more important, as otherwise Dec and Jan would get the same fitted value. I use: knots = list(nmonth = c(0.5, 12.5)).

The idea here is that 1 and 12 represent the middle of January and December respectively, while 0.5 and 12.5 mark the start of January and the end of December, the two points at which we might expect the fitted values to be the same.
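
The effect of the end knots can be seen in a small simulated comparison. This is a sketch under stated assumptions: the data frame `sim`, the response `y`, and the choice of `k = 10` are made up for illustration.

```r
library(mgcv)

set.seed(3)
# Simulated monthly data (hypothetical): 100 observations per month
sim <- data.frame(nmonth = as.numeric(rep(1:12, each = 100)))
# True January and December means differ (about +0.26 versus -0.26)
sim$y <- sin(2 * pi * (sim$nmonth - 0.5) / 12) + rnorm(nrow(sim), sd = 0.1)

# Default knots: the cycle closes at the data range, i.e. at months 1 and 12
m_default <- gam(y ~ s(nmonth, bs = "cc", k = 10), data = sim, method = "REML")

# Explicit end knots: the cycle closes at 0.5 and 12.5 instead
m_knots <- gam(y ~ s(nmonth, bs = "cc", k = 10), data = sim, method = "REML",
               knots = list(nmonth = c(0.5, 12.5)))

jan_dec   <- data.frame(nmonth = c(1, 12))
p_default <- as.numeric(predict(m_default, newdata = jan_dec))
p_knots   <- as.numeric(predict(m_knots,   newdata = jan_dec))

p_default  # January and December predictions coincide
p_knots    # January and December predictions are free to differ
```

With the default knots the smooth is constrained to take the same value at months 1 and 12, so the two predictions match even though the simulated means differ; moving the end knots to 0.5 and 12.5 removes that constraint between January and December.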
