R:在nlsLM()语句中进行汇总 [英] R: Summarize inside an nlsLM() statement

查看:234
本文介绍了R:在nlsLM()语句中进行汇总的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用 nlsLM()创建幂函数的模型,但是我需要在函数调用中汇总我的数据以找到合适的系数和指数。更具体地说,这是我的模型代码:

I'm using nlsLM() to make a model of a power function, but I need to summarize my data within the function call to find the appropriate coefficient and exponent. More specifically, here is what my model code looks like:

Jmod = nlsLM(value~(a)*summarise(funs(mean), (MW)^b),
                 start = list(a=100000, b = 1/3), data = mod_data,
                 upper = c(Inf,1), lower = c(0,1/5))

其中 MW 是我要用来预测的数据。 MW 的数据已经根据名为 datetime 的变量按月分组,所以我想采用每月 MW ^ b 的平均值,其中 b nlsLM()语句。

where MW is the data I am trying to use to predict value. The data for MW is already grouped by month based off a variable called datetime, so I would like to take the monthly average of MW^b where b is found by the nlsLM() statement.

我会事先取平均值,但是您可能会意识到,这在数学上并不等效[即((a + c)/ 2)^ b不等于(a ^ b + c ^ b)/ 2]。

I would take the average beforehand, but as you may realize, this is not mathematically equivalent [ie. ((a+c)/2)^b is not equal to (a^b + c^b)/2].

如果有人对此有任何了解,我将非常感激!

If anyone has any info on how to do this, I would be greatly appreciative!

编辑:

下面是用于创建我要使用的样本数据集的代码:

Below is the code to make a sample data set of what I'm trying to work with:

structure(list(datetime = structure(c(1514782800, 1480568400,1504242000, 1509512400, 1509512400, 1485925200, 1517461200, 1485925200, 1501563600, 1467349200, 1472706000, 1454302800, 1483246800, 1498885200, 1506834000, 1477976400, 1483246800, 1477976400, 1509512400, 1496293200, 1451624400, 1454302800, 1454302800, 1464757200, 1498885200, 1517461200, 1462078800, 1506834000, 1522558800, 1483246800, 1501563600, 1451624400, 1485925200, 1501563600, 1451624400, 1517461200, 1475298000, 1480568400, 1512104400, 1456808400, 1477976400, 1475298000, 1517461200, 1459486800, 1501563600, 1477976400, 1506834000, 1506834000, 1451624400, 1483246800), class = c("POSIXct", "POSIXt"), tzone = ""), value = c(2863.27837518519, 2878.40382333333, 1236.74444444444, 3522.48888888889, 3522.48888888889, 2033.55555555556, 3305.5, 2033.55555555556, 2094.7037037037, 3052.91875740741, 2960.52222222222, 1733.7918262963, 2850.28673851852, 2841.40740740741, 3310.77538814815, 2266.26172851852, 2850.28673851852, 2266.26172851852, 3522.48888888889, 2802.55555555556, 2196.82556740741, 1733.7918262963, 1733.7918262963, 3001.43703703704, 2841.40740740741, 3305.5, 2061.4826762963, 3310.77538814815, 3107.01851851852, 2850.28673851852, 2094.7037037037, 2196.82556740741, 2033.55555555556, 2094.7037037037, 2196.82556740741, 3305.5, 2848.90322592593, 2878.40382333333, 2873.73476703704, 2208.64755074074, 2266.2172851852, 2848.90322592593, 3305.5, 2021.68765444444, 2094.7037037037, 2266.26172851852, 3310.77538814815, 3310.77538814815, 2196.82556740741, 2850.28673851852), mon = structure(c(2018, 2016.91666666667, 2017.66666666667, 2017.83333333333, 2017.83333333333, 2017.08333333333, 2018.08333333333, 2017.08333333333, 2017.58333333333, 2016.5, 2016.66666666667, 2016.08333333333, 2017, 2017.5, 2017.75, 2016.83333333333, 2017, 2016.83333333333, 2017.83333333333, 2017.41666666667, 2016, 2016.08333333333, 2016.08333333333, 2016.41666666667, 2017.5, 2018.08333333333, 2016.33333333333, 2017.75, 2018.25, 2017, 2017.58333333333, 2016, 2017.08333333333, 2017.58333333333, 2016, 2018.08333333333, 2016.75, 2016.91666666667, 2017.91666666667, 2016.16666666667, 2016.83333333333, 2016.75, 2018.08333333333, 2016.25,2017.58333333333,2016.83333333333, 2017.75, 2017.75, 2016, 2017), class = "yearmon"), 
MW = c(2.6142700774997, 4.02670249993547, 0.666666666666667, 
0.724114015571947, 4.07197668868287, 3.74122386862433, 3.30097429092907, 
3.84858110028323, 0.666666666666667, 0.666666666666667, 4.35000446878457, 
0.666666666666667, 0.666666666666667, 3.8371824280444, 0.825077317374, 
0.666666666666667, 4.028058457579, 0.666666666666667, 4.3378032532779, 
3.84270845997837, 1.40955788986009, 0.666666666666667, 0.666666666666667, 
4.05845600900597, 4.00664052392117, 4.0295346724872, 0.666666666666667, 
4.14159923664523, 4.231951299842, 3.9562222817766, 0.666666666666667, 
3.61602795165213, 0.666666666666667, 3.58079262746603, 4.12197770915903, 
4.2610646492437, 4.02152528469467, 1.0117763092792, 2.03648922832252, 
0.666666666666667, 0.666666666666667, 3.8042476910097, 3.91787334748133, 
0.666666666666667, 0.666666666666667, 0.89571472289964, 4.1530002677697, 
3.93733212731873, 0.710671314318797, 0.666666666666667)), .Names = c("datetime", "value", "mon", "MW"), row.names = c(39113L, 12946L, 4365L, 37505L, 36601L, 31055L, 39814L, 31433L, 32105L, 20668L, 18191L, 8328L, 10232L, 25689L, 35528L, 4577L, 10302L, 5146L, 37975L, 29670L, 28429L, 7932L, 8468L, 23120L, 25111L, 39699L, 24312L, 36246L, 1556L, 11068L, 33269L, 29163L, 31685L, 32419L, 29059L, 40618L, 16751L, 11737L, 34371L, 6001L, 4864L, 16413L, 40304L, 8716L, 33190L, 5399L, 35610L, 36462L, 28338L, 10371L), class = "data.frame")

这将创建我用来制作模型的 mod_data 。重申一下,我在这里所做的是按月分组数据,在我的数据的 mon 列中找到,现在我想按月汇总数据,但是如上面的代码所示,均值中包含指数。再次感谢!

This will create the mod_data that I am using to make the model. And just to reiterate, what I have done here is group data by month, found in the mon column of my data, and now I want to summarize the data by month, but with the exponent included in the mean, as seen in my above code. Thanks again!

推荐答案

问题中不清楚您想要什么。您是否要为数据中的每个唯一月份拟合一个单独的模型?还是要为所有数据拟合一个模型,然后取 MW ^ b 的月平均值?

What you want isn't clear from the question. Do you want to fit a separate model for every unique month in your data? Or do you want to fit one model for all of the data and then take monthly averages of the value of MW^b ?

这是处理后一种情况的一种方法。

Here's one way on how to do the latter case.

require(minpack.lm)
require(tidyverse)
require(broom)

dat <- structure(...) # provided in the question

predictions <- 
    dat %>% 
    ungroup %>%
    mutate(row = row_number()) %>%
    do(augment(nlsLM(
                formula = value ~ a * MW^b + 0*row, 
                data = .,
                start = list(a = 100000, b=1/3),
                upper = c(Inf, 1), 
                lower = c(0, 1/5)
               )
           )
       )

joined <- 
    dat %>%
    mutate(row = row_number()) %>%
    left_join(predictions, by=c('MW', 'value', 'row')) %>%
    select(-row)

joined %>%
    group_by(mon) %>%
    mutate(monthly_avg_prediction = mean(.fitted))

注意:


  1. Th使用扫帚,这种东西要容易得多包。这是因为broom会转换模型查找功能的结果,例如 lm nls nlsLM 等转换为数据帧。因此,您不必记住或重新查找模型对象的特定于函数的结构(例如 model $ params [['estimate']] [[1]])或类似的东西;模型结果已经是R标准数据框格式。

  2. 我的解决方案使用了此StackOverflow答案中的想法,有关如何加入由扫帚生成的内容带有原始数据的预测数据框。这就是为什么有 row_number() left_join()的东西在那里的原因。否则,在一般情况下, augment 会丢弃模型预测中未使用的原始数据帧中的数据,并且如果其中重复的值将无法正常工作

  3. .fitted 列是由扫帚的 augment 功能。这是在指定数据点的模型预测。

  4. 结果(我认为您可能想要)在 monthly_avg_prediction 列中已加入数据框。但这代表了一个单一的全局模型,适用于所有数据,并且该模型的预测按月平均。

  1. This kind of stuff is much easier with the broom package. This is because broom converts the results of model-finding functions like lm, nls, or nlsLM etc. into dataframes. So you don't have to memorize or re-lookup the function-specific structure of the model object (e.g. model$params[['estimate']][[1]]) or similar stuff; the model results are already in R-standard dataframe format.
  2. My solution uses ideas from this StackOverflow answer on how to join broom-generated dataframes of predictions with original data. That's why the stuff with row_number() and left_join() are in there. Otherwise, in the general case, augment will throw away data from the original dataframe that is not used in the model prediction, and it will not work well if there are repeated values in the data that is used.
  3. The .fitted column is generated by broom's augment function. It is the model prediction at the indicated datapoint.
  4. The result (that I think you might want) is in the monthly_avg_prediction column of the joined dataframe. But that represents a single, global model, fit on all the data, and predictions from that model are averaged by month.

这篇关于R:在nlsLM()语句中进行汇总的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆