How to get a normalised slope of a trend
Question
I am analysing the distances of users to userx over 6 weeks in a social network.
Note: 'No path' means the two users are not connected yet (at least by friends of friends).
        week1     week2     week3     week4     week5   week6
user1   No path   No path   No path   No path   3       1
user2   No path   No path   No path   5         3       1
user3   5         4         4         4         4       3
userN   ...
I want to see how well the users connect with userx.
For that, I initially thought of using the value of the regression slope for the interpretation (i.e. the lower the regression slope, the better).
For example, consider user1 and user2. Their regression slopes are calculated as follows.
user1:
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
X = [[5], [6]]  # distance available only for week5 and week6
y = [3, 1]
regressor.fit(X, y)
print(regressor.coef_)
The output is -2.
user2:
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
X = [[4], [5], [6]]  # distance available only for week4, week5 and week6
y = [5, 3, 1]
regressor.fit(X, y)
print(regressor.coef_)
The output is -2.
As you can see, both users get the same slope value. However, user2 connected with userx a week earlier than user1. Hence, user1 should be rewarded in some way.
Therefore, I am wondering if there is a better way to calculate this.
I am happy to provide more details if needed.
Recommended answer
Well, if you want to reward the duration of the connection, you probably need to take time into account in the calculation. The easiest, most straightforward way is to multiply the coefficient by the number of weeks for which a distance is available:
outcome_measure = regressor.coef_ * len(y)
If you divide that by 2, it is conceptually the same as the area under the curve (AUC):
outcome_measure = (regressor.coef_ * len(y)) / 2
So for user1 and user2 you would get -4 and -6 with the first method, or -2 and -3 with the second.
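The figures above can be reproduced for every row of the question's table at once. The following is only a minimal sketch: it swaps scikit-learn for a hand-written least-squares slope so that it runs with the standard library alone, it represents 'No path' as None, and the names distances, ols_slope and duration_weighted_slope are hypothetical, not part of the original post.

```python
# Distances from the question's table; None stands in for 'No path' (an assumption).
distances = {
    "user1": [None, None, None, None, 3, 1],
    "user2": [None, None, None, 5, 3, 1],
    "user3": [5, 4, 4, 4, 4, 3],
}


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def duration_weighted_slope(weekly):
    """Slope over the connected weeks times the number of connected weeks."""
    points = [(week, d) for week, d in enumerate(weekly, start=1) if d is not None]
    if len(points) < 2:
        return None  # a slope needs at least two observations
    xs, ys = zip(*points)
    return ols_slope(xs, ys) * len(points)


scores = {user: duration_weighted_slope(weekly) for user, weekly in distances.items()}
for user, score in scores.items():
    # First method (slope * weeks) and second method (the same value halved).
    print(user, score, score / 2)
```

With this measure user2 scores lower (better) than user1 even though both have a raw slope of -2, because user2 has one more connected week.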
Slightly off-topic, but if you use linear regression for statistical analysis (not just to get the coefficient), I would probably add some kind of check to confirm that its assumptions hold.
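The answer does not say which check to use. As one possible starting point, and not a full test of the regression assumptions, the residuals and R² of a fitted line can be inspected by hand. The sketch below uses user3's row from the question; fit_line is a hypothetical helper, and with only six points any conclusion would be tentative.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


weeks = [1, 2, 3, 4, 5, 6]
user3 = [5, 4, 4, 4, 4, 3]  # user3's distances from the question

slope, intercept = fit_line(weeks, user3)

# Residual = observed distance minus the distance predicted by the line.
residuals = [y - (intercept + slope * x) for x, y in zip(weeks, user3)]

# R^2: share of the variance in the distances explained by the line.
mean_y = sum(user3) / len(user3)
ss_res = sum(r ** 2 for r in residuals)
ss_tot = sum((y - mean_y) ** 2 for y in user3)
r_squared = 1 - ss_res / ss_tot

print("residuals:", [round(r, 3) for r in residuals])
print("R^2:", round(r_squared, 3))
```

A visible pattern in the residuals (for example a run of same-signed values) would hint that a straight line is a poor description of the trend.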