How to manually set coefficients for variables in a linear model?


Question

In R, how can I set weights for particular variables, rather than for observations, in the lm() function?

The context is as follows. I'm trying to build a personal ranking system for a particular kind of product, say, phones. I can build a linear model with price as the dependent variable and other features such as screen size, memory, OS and so on as independent variables. I can then use it to predict a phone's real cost (as opposed to its declared price), and so find the best price/goodness ratio. This is what I have already done.

Now I want to "highlight" some features that are important to me only. For example, I may need a phone with a large memory, so I want to give memory a higher weight so that the linear model is optimized for the memory variable.

The lm() function in R has a weights parameter, but these are weights for observations and not for variables (correct me if this is wrong). I also tried to play around with the formula, but only got interpreter errors. Is there a way to incorporate weights for variables in lm()?

Of course, lm() is not the only option. If you know how to do this with other similar tools (e.g. glm()), that is fine too.

UPD. After a few comments I understood that the way I was thinking about the problem was wrong. A linear model obtained by a call to lm() gives the optimal coefficients for the training examples, and there is no way (and no need) to change the weights of variables; sorry for the confusion. What I'm actually looking for is a way to change coefficients in an existing linear model, to manually make some parameters more important than others. Continuing the previous example, let's say we've got the following formula for price:

price = 300 + 30 * memory + 56 * screen_size + 12 * os_android + 9 * os_win8

This formula describes the best possible linear model for the dependence between price and phone parameters. However, now I want to manually change the number 30 in front of the memory variable to, say, 60, so it becomes:

price = 300 + 60 * memory + 56 * screen_size + 12 * os_android + 9 * os_win8

Of course, this formula no longer reflects the optimal relationship between price and phone parameters. The dependent variable also no longer shows the actual price, just some value of goodness, taking into account that memory is twice as important to me as to the average person (based on the coefficients from the first formula). But this value of goodness (or, more precisely, the value of the fraction goodness/price) is just what I need: with it I can find the best phone (in my opinion) at the best price.
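
As a rough illustration of the scoring idea, here is a minimal sketch that applies the adjusted formula to three made-up phones and ranks them by goodness/price. The phone names, prices and specs are hypothetical and exist only for this example; the coefficients are the ones from the second formula above.

```r
# Hypothetical phones; none of these values come from real data
phones <- data.frame(
  name        = c("A", "B", "C"),
  price       = c(700, 650, 900),
  memory      = c(4, 8, 6),
  screen_size = c(5.0, 4.5, 6.0),
  os_android  = c(1, 0, 1),
  os_win8     = c(0, 1, 0)
)

# Coefficients of the adjusted formula (memory raised from 30 to 60)
b <- c(intercept = 300, memory = 60, screen_size = 56,
       os_android = 12, os_win8 = 9)

# Goodness score: the adjusted linear combination of the features
phones$goodness <- with(phones,
  b[["intercept"]] + b[["memory"]] * memory +
    b[["screen_size"]] * screen_size +
    b[["os_android"]] * os_android + b[["os_win8"]] * os_win8)

# Rank by goodness per unit of price, best first
phones$ratio <- phones$goodness / phones$price
ranked <- phones[order(-phones$ratio), ]
```

The highest ratio marks the phone that offers the most personal "goodness" for its declared price.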

I hope all of this makes sense. Now I have one (probably very simple) question. How can I manually set coefficients in an existing linear model obtained with lm()? That is, I'm looking for something like:

coef(model)[2] <- 60

This code doesn't work, of course, but you should get the idea. Note: it is obviously possible to just double the values in the memory column of the data frame, but I'm looking for a more elegant solution that affects the model, not the data.

Recommended answer

The following code is a bit complicated because lm() minimizes the residual sum of squares, and with a fixed, non-optimal coefficient that sum is no longer minimal. Fixing one coefficient therefore goes against what lm() is trying to do, and the only way to keep the other coefficients at their original values is to fix them as well.

To do that, we first have to know the coefficients of the unrestricted model. All the adjustments have to be made by changing the formula of the model. For example, we have price ~ memory + screen_size, and of course there is a hidden intercept. Neither changing the data directly nor using I(c*memory) is a good idea. I(c*memory) amounts to a temporary change of the data, and changing only one coefficient by transforming variables would be much more difficult.
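
One way to see why rescaling a variable is not enough, assuming the model is refitted after the transformation: lm() simply compensates by shrinking the coefficient, so the fitted values do not move. The sketch below uses simulated data with placeholder names (x1, x2, y) and is only an illustration.

```r
set.seed(1)
df <- data.frame(x1 = rnorm(10), x2 = rnorm(10, sd = 5),
                 y = rnorm(10, mean = 3, sd = 10))

# Original fit and a fit with x1 doubled inside the formula
m1 <- lm(y ~ x1 + x2, data = df)
m2 <- lm(y ~ I(2 * x1) + x2, data = df)

# The coefficient on the doubled variable is half the original one ...
coef(m1)[2]
coef(m2)[2]

# ... so the two models produce the same fitted values
all.equal(fitted(m1), fitted(m2))
```

The same compensation would happen if the column were doubled in the data frame before refitting.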

So first we change price ~ memory + screen_size to price ~ offset(c1*memory) + offset(c2*screen_size). But we haven't yet fixed the intercept, which would now be re-estimated to minimize the residual sum of squares and could end up different from the one in the original model. The final step is to remove the intercept and add a new, fake variable in its place: the original intercept c0 repeated once per observation, wrapped in offset() so that it is not re-estimated either:

price ~ offset(c1*memory) + offset(c2*screen_size) + offset(rep(c0, length(memory))) - 1

# Function to fix coefficients
setCoeffs <- function(frml, weights, len){
  el <- paste0("offset(", weights[-1], "*", 
               unlist(strsplit(as.character(frml)[-(1:2)], " +\\+ +")), ")")
  el <- c(paste0("offset(rep(", weights[1], ",", len, "))"), el)                                 
  as.formula(paste(as.character(frml)[2], "~", 
                   paste(el, collapse = " + "), " + -1"))
}
# Example data
df <- data.frame(x1 = rnorm(10), x2 = rnorm(10, sd = 5), 
                 y = rnorm(10, mean = 3, sd = 10))
# Writing formula explicitly 
frml <- y ~ x1 + x2
# Basic model
mod <- lm(frml, data = df)
# Original coefficients, to be modified below. Note that "weights" contains
# the intercept value too
weights <- coef(mod)
# Set the coefficient of x1; all the rest remain the same
weights[2] <- 3
# Final model
mod2 <- update(mod, setCoeffs(frml, weights, nrow(df)))
# It is fine that mod2 reports "No coefficients": every term is an offset,
# so there is nothing left to estimate
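
A quick sanity check of the offset idea, written as a self-contained sketch rather than as part of the answer's code: when every term sits inside offset() and the intercept is removed, the fitted values should equal the hand-computed linear combination. For brevity this sketch puts all fixed terms into a single offset() instead of calling setCoeffs; the names b, mod_fixed and manual are introduced only for this example.

```r
set.seed(42)
df <- data.frame(x1 = rnorm(10), x2 = rnorm(10, sd = 5),
                 y = rnorm(10, mean = 3, sd = 10))
mod <- lm(y ~ x1 + x2, data = df)

# Take the fitted coefficients and override the one for x1
b <- unname(coef(mod))
b[2] <- 3

# All terms are fixed through a single offset; "- 1" drops the free intercept
mod_fixed <- lm(y ~ offset(b[1] + b[2] * x1 + b[3] * x2) - 1, data = df)

# The same linear combination computed by hand
manual <- b[1] + b[2] * df$x1 + b[3] * df$x2

length(coef(mod_fixed))                        # nothing was estimated
all.equal(unname(fitted(mod_fixed)), manual)   # fitted values match
```

Because nothing is estimated, the residuals of such a model are simply y minus the fixed linear combination.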

Also, you are probably going to use mod2 only for forecasting (I don't actually see where else it could be used now), and that can be done in a simpler way, without setCoeffs:

# Data for forecasting, e.g. with price unknown
df2 <- data.frame(x1 = rpois(10, 10), x2 = rpois(10, 5), y = NA)
mat <- model.matrix(frml, model.frame(frml, df2, na.action = NULL))
# Forecasts: multiply each column of mat by its coefficient, then sum each row
rowSums(t(t(mat) * weights))
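
Another possibility, which is not part of the answer above and is offered only as a hedged sketch: since predict() for an lm object reads the stored coefficients component, overwriting that component on a copy of the model changes the predictions directly. The names mod_adj and new_data are made up for this example.

```r
set.seed(1)
df <- data.frame(x1 = rnorm(10), x2 = rnorm(10, sd = 5),
                 y = rnorm(10, mean = 3, sd = 10))
mod <- lm(y ~ x1 + x2, data = df)

# Work on a copy so the original fit stays intact
mod_adj <- mod
mod_adj$coefficients["x1"] <- 3

# Hypothetical new observations to score
new_data <- data.frame(x1 = c(8, 11, 9), x2 = c(4, 6, 5))
pred <- predict(mod_adj, newdata = new_data)

# Hand-computed check: intercept column plus features, times coefficients
manual <- drop(cbind(1, as.matrix(new_data)) %*% coef(mod_adj))
all.equal(unname(pred), unname(manual))
```

The stored residuals, fitted values and anything reported by summary() still belong to the original fit, so a model altered this way is best treated as a prediction device only.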
