Apply formula across pandas rows/regression line

Problem description

I'm trying to apply a formula across the rows of a data frame to get the trend of the numbers in each row.

The example below works up to the point where .apply is used.

import numpy as np
import pandas as pd
import scipy.stats

df = pd.DataFrame(np.random.randn(10, 4), columns=list('ABCD'))
axisvalues = list(range(1, len(df.columns) + 1))

def calc_slope(row):
    # row is used as an integer position here
    return scipy.stats.linregress(df.iloc[row, :], y=axisvalues)

calc_slope(1)  # this works

df["New"] = df.apply(calc_slope, axis=1)  # this fails: "too many values to unpack"

Thanks for your help.

Recommended answer

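With axis=1, apply passes each row to the function as a Series, so use row directly instead of indexing df with it. If you need only one attribute of the result, return it: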

def calc_slope(row):
    a = scipy.stats.linregress(row, y=axisvalues)
    return a.slope 

df["slope"]=df.apply(calc_slope,axis=1)
print (df)
          A         B         C         D     slope
0  0.444640  0.024624 -0.016216  0.228935 -2.553465
1  1.226611  1.962481  1.103834  0.645562 -1.455239
2 -0.259415  0.971097  0.124538 -0.704115 -0.718621
3  1.938422  1.787310 -0.619745 -2.560187 -0.575519
4 -0.986231 -1.942930  2.677379 -1.813071  0.075679
5  0.611214 -0.258453  0.053452  1.223544  0.841865
6  0.685435  0.962880 -1.517077 -0.101108 -0.652503
7  0.368278  1.314202  0.748189  2.116189  1.350132
8 -0.322053 -1.135443 -0.161071 -1.836761 -0.987341
9  0.798461  0.461736 -0.665127 -0.247887 -1.610447
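
One point worth checking is the argument order. In the call above the row values are passed as x and axisvalues as y, so the slope describes how the column position changes per unit of value. If the intended trend is the change in value per column step, the two arguments would be swapped. The following is a minimal sketch under that assumption; the `demo` frame and both function names are made up for illustration and do not come from the original post.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical data: row 0 rises by 1 per column, row 1 rises by 2 per column.
demo = pd.DataFrame([[1.0, 2.0, 3.0, 4.0],
                     [2.0, 4.0, 6.0, 8.0]], columns=list("ABCD"))
positions = np.arange(1, len(demo.columns) + 1)  # 1, 2, 3, 4

def slope_as_in_answer(row):
    # Same call shape as the answer: row values are x, positions are y.
    return stats.linregress(row, y=positions).slope

def slope_per_column(row):
    # Swapped: positions are x, row values are y.
    return stats.linregress(positions, row).slope

as_in_answer = demo.apply(slope_as_in_answer, axis=1)
per_column = demo.apply(slope_per_column, axis=1)
print(as_in_answer.tolist())
print(per_column.tolist())
```

For the second row, the first form gives a slope of about 0.5 and the swapped form about 2.0, which matches the "rises by 2 per column" reading. Which of the two is wanted depends on what "trend" should mean for the data.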

To get all attributes, convert the named tuple to a dict and then to a Series. The output is a new DataFrame, so join it to the original if necessary:

np.random.seed(1997)

df = pd.DataFrame(np.random.randn(10, 4), columns=list('ABCD'))
axisvalues=list(range(1,len(df.columns)+1))

def calc_slope(row):
    a = scipy.stats.linregress(row, y=axisvalues)
    return pd.Series(a._asdict())

print (df.apply(calc_slope,axis=1))
      slope  intercept    rvalue    pvalue    stderr
0 -2.553465   2.935355 -0.419126  0.580874  3.911302
1 -1.455239   4.296670 -0.615324  0.384676  1.318236
2 -0.718621   2.523733 -0.395862  0.604138  1.178774
3 -0.575519   2.578530 -0.956682  0.043318  0.123843
4  0.075679   2.539066  0.127254  0.872746  0.417101
5  0.841865   2.156991  0.425333  0.574667  1.266674
6 -0.652503   2.504915 -0.561947  0.438053  0.679154
7  1.350132   0.965285  0.794704  0.205296  0.729193
8 -0.987341   1.647104 -0.593680  0.406320  0.946311
9 -1.610447   2.639780 -0.828856  0.171144  0.768641


df = df.join(df.apply(calc_slope,axis=1))
print (df)
          A         B         C         D     slope  intercept    rvalue  \
0  0.444640  0.024624 -0.016216  0.228935 -2.553465   2.935355 -0.419126   
1  1.226611  1.962481  1.103834  0.645562 -1.455239   4.296670 -0.615324   
2 -0.259415  0.971097  0.124538 -0.704115 -0.718621   2.523733 -0.395862   
3  1.938422  1.787310 -0.619745 -2.560187 -0.575519   2.578530 -0.956682   
4 -0.986231 -1.942930  2.677379 -1.813071  0.075679   2.539066  0.127254   
5  0.611214 -0.258453  0.053452  1.223544  0.841865   2.156991  0.425333   
6  0.685435  0.962880 -1.517077 -0.101108 -0.652503   2.504915 -0.561947   
7  0.368278  1.314202  0.748189  2.116189  1.350132   0.965285  0.794704   
8 -0.322053 -1.135443 -0.161071 -1.836761 -0.987341   1.647104 -0.593680   
9  0.798461  0.461736 -0.665127 -0.247887 -1.610447   2.639780 -0.828856   

     pvalue    stderr  
0  0.580874  3.911302  
1  0.384676  1.318236  
2  0.604138  1.178774  
3  0.043318  0.123843  
4  0.872746  0.417101  
5  0.574667  1.266674  
6  0.438053  0.679154  
7  0.205296  0.729193  
8  0.406320  0.946311  
9  0.171144  0.768641 
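
As a variation on the approach above, the fields can be picked by name instead of going through `_asdict()`, which keeps the set of output columns fixed even if the result object carries extra attributes in some SciPy versions. This is only a sketch: the `FIELDS` list, the `calc_stats` name, and the seed are assumptions made for the example, not part of the original answer.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed, chosen only for this example
df = pd.DataFrame(rng.standard_normal((10, 4)), columns=list("ABCD"))
axisvalues = list(range(1, len(df.columns) + 1))

# The five fields shown in the output above.
FIELDS = ["slope", "intercept", "rvalue", "pvalue", "stderr"]

def calc_stats(row):
    res = stats.linregress(row, y=axisvalues)
    # Select the fields by name so the resulting columns are fixed.
    return pd.Series({name: getattr(res, name) for name in FIELDS})

stats_df = df.apply(calc_stats, axis=1)  # one row of statistics per input row
result = df.join(stats_df)               # original columns plus the statistics
print(result.columns.tolist())
```

Because `calc_stats` returns a Series, `apply` expands it into one column per field, and `join` aligns the new columns with the original frame on the index.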
