使用Sklearn指标的 pandas 相关误差 [英] Pandas Correlation Error Using Sklearn Metrics
问题描述
plant_name月份年wind_speed_obs wind_speed_ms0大喇叭I 1 2018 5.143830 6.0124361大喇叭I 1 2019 4.556545 5.2318552大喇叭I 1 2020 6.582890 7.8665323大喇叭I 2 2018 7.904438 9.2488104大喇叭I 2 2019 4.353567 5.1156255大喇叭I 2 2020 7.376739 8.4080466大喇叭I 3 2018 6.138197 6.9220437大喇叭I 3 2019 3.881804 4.4842748大喇叭I 3 2020 7.071029 7.3471779大喇叭I 4 2018 7.106936 7.69986110大喇叭I 4 2019 6.874942 7.57527811大喇叭I 4 2020 6.855979 7.10625012大喇叭I 5 2018 5.366054 6.51075313大喇叭I 5 2019 5.657342 6.59758114大喇叭I 5 2020 7.010745 7.24704315大喇叭I 6 2018 6.399417 7.07652816大喇叭I 6 2019 6.578241 7.55611117大喇叭I 6 2020 7.120105 7.54819418大喇叭I 7 2018 5.615110 6.12392519大喇叭I 7 2019 6.212104 6.96344120大喇叭I 7 2020 6.663250 6.97231221大喇叭I 8 2018 5.303967 5.94731222大喇叭I 8 2019 5.176691 6.20927423大喇叭I 8 2020 6.093748 6.33763424大喇叭I 9 2018 5.375531 5.87847225大喇叭I 9 2019 6.126961 6.79250026大喇叭I 9 2020 5.608530 6.02805627大喇叭I 10 2018 4.466079 5.05497328大喇叭I 10 2019 5.492795 6.32607529大喇叭I 10 2020 7.103278 7.49207030大喇叭I 11 2018 5.341987 5.88902831大喇叭I 11 2019 4.887397 5.14402832大喇叭I 11 2020 6.718649 7.15000033大喇叭I 12 2018 5.099386 5.86693534大喇叭I 12 2019 3.925717 4.23414035大喇叭I 12 2020 5.589325 5.943145
这是我正在使用的代码:
从sklearn.metrics中的 导入r2_scoredef r2_rmse2(g):r2 = r2_score(g ['wind_speed_obs'],g ['wind_speed_ms'])#rmse = np.sqrt(mean_squared_error(g ['wind_speed_obs'],g ['wind_speed_ms'])))返回pd.Series(dict(r2 = r2))data1.groupby(['plant_name','month']).apply(r2_rmse2).reset_index()
在上面应用r2_rmse2函数时,我得到了以下结果:
plant_name月r20大喇叭我1 -0.3147711大喇叭I 2 0.5298902大喇叭I 3 0.8040663大喇叭I 4 -22.1647204大喇叭I 5 -0.4606905大喇叭I 6 -4.6733596大喇叭I 7 -0.6621667大喇叭I 8 -2.1188158大喇叭我9 -1.9465669大喇叭I 10 0.66263610大喇叭I 11 0.69689611大喇叭I 12 0.446235
当我在Excel中测试应使用该功能的功能时,正确的结果是:
plant_name月r2大喇叭I 1 0.999975202大喇叭I 2 0.998459857大喇叭I 3 0.988712352大喇叭我4 0.711649414大喇叭I 5 0.998282523大喇叭I 6 0.681460011大喇叭我7 0.907152074大喇叭我8 0.66212225大喇叭我9 0.98807953大喇叭I 10 0.988469127大喇叭I 11 0.990836283大喇叭I 12 0.968629237
我不明白为什么使用功能的申请不正确.谢谢您的帮助.
以下是R平方,RMSE和 I am trying to calculate r2 or r-squared over a large dataset with pandas and grouping the data by plant_name and month in a dataframe like "data1" shown below. The problem is that when I use sklearn metrics and a defined function, I obtain a result that is not consistent with a result that I obtain using the same data in "data1" in Excel. Here is the data in "data1" Here is the code i'm using: I obtain this result when applying the r2_rmse2 function above: the correct result when i test the function in Excel that I should obtain applying the function is: I cannot understand why the apply using the fuction is incorrect. Thank you for your help. Here is the computation of R squared, RMSE and Pearson correlation coefficient (as used in Excel) on your data:
这篇关于使用Sklearn指标的 pandas 相关误差的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!导入r2_score,mean_squared_error从scipy.stats导入pearsonrdef r2_rmse2(g):r2 = r2_score(g ['wind_speed_obs'],g ['wind_speed_ms'])rmse = mean_squared_error(g ['wind_speed_obs'],g ['wind_speed_ms'],squared = False)correl = pearsonr(g ['wind_speed_obs'],g ['wind_speed_ms'])[0]返回pd.Series(dict(r2 = r2,rmse = rmse,correl = correl))data1.groupby(['plant_name','month']).apply(r2_rmse2).reset_index()
plant_name month r2 rmse correl0大喇叭I 1 -0.314771 0.976090 0.9999751大喇叭I 2 0.529890 1.072639 0.9984602大喇叭I 3 0.804066 0.592633 0.9887123大喇叭I 4 -22.164844 0.549141 0.7116494大喇叭I 5 -0.460691 0.866068 0.9982835大喇叭I 6 -4.673359 0.729833 0.6814606大喇叭I 7 -0.662167 0.553450 0.9071527大喇叭I 8 -2.118817 0.716380 0.6621228大喇叭I 9 -1.946562 0.539102 0.9880809大喇叭I 10 0.662637 0.630426 0.98846910大喇叭I 11 0.696896 0.428632 0.99083611大喇叭I 12 0.446234 0.519437 0.968629
plant_name month year wind_speed_obs wind_speed_ms
0 BIG HORN I 1 2018 5.143830 6.012436
1 BIG HORN I 1 2019 4.556545 5.231855
2 BIG HORN I 1 2020 6.582890 7.866532
3 BIG HORN I 2 2018 7.904438 9.248810
4 BIG HORN I 2 2019 4.353567 5.115625
5 BIG HORN I 2 2020 7.376739 8.408046
6 BIG HORN I 3 2018 6.138197 6.922043
7 BIG HORN I 3 2019 3.881804 4.484274
8 BIG HORN I 3 2020 7.071029 7.347177
9 BIG HORN I 4 2018 7.106936 7.699861
10 BIG HORN I 4 2019 6.874942 7.575278
11 BIG HORN I 4 2020 6.855979 7.106250
12 BIG HORN I 5 2018 5.366054 6.510753
13 BIG HORN I 5 2019 5.657342 6.597581
14 BIG HORN I 5 2020 7.010745 7.247043
15 BIG HORN I 6 2018 6.399417 7.076528
16 BIG HORN I 6 2019 6.578241 7.556111
17 BIG HORN I 6 2020 7.120105 7.548194
18 BIG HORN I 7 2018 5.615110 6.123925
19 BIG HORN I 7 2019 6.212104 6.963441
20 BIG HORN I 7 2020 6.663250 6.972312
21 BIG HORN I 8 2018 5.303967 5.947312
22 BIG HORN I 8 2019 5.176691 6.209274
23 BIG HORN I 8 2020 6.093748 6.337634
24 BIG HORN I 9 2018 5.375531 5.878472
25 BIG HORN I 9 2019 6.126961 6.792500
26 BIG HORN I 9 2020 5.608530 6.028056
27 BIG HORN I 10 2018 4.466079 5.054973
28 BIG HORN I 10 2019 5.492795 6.326075
29 BIG HORN I 10 2020 7.103278 7.492070
30 BIG HORN I 11 2018 5.341987 5.889028
31 BIG HORN I 11 2019 4.887397 5.144028
32 BIG HORN I 11 2020 6.718649 7.150000
33 BIG HORN I 12 2018 5.099386 5.866935
34 BIG HORN I 12 2019 3.925717 4.234140
35 BIG HORN I 12 2020 5.589325 5.943145
from sklearn.metrics import r2_score
def r2_rmse2( g ):
r2 = r2_score( g['wind_speed_obs'], g['wind_speed_ms'] )
#rmse = np.sqrt( mean_squared_error( g['wind_speed_obs'], g['wind_speed_ms'] ) )
return pd.Series( dict( r2 = r2 ) )
data1.groupby( ['plant_name','month'] ).apply( r2_rmse2 ).reset_index()
plant_name month r2
0 BIG HORN I 1 -0.314771
1 BIG HORN I 2 0.529890
2 BIG HORN I 3 0.804066
3 BIG HORN I 4 -22.164720
4 BIG HORN I 5 -0.460690
5 BIG HORN I 6 -4.673359
6 BIG HORN I 7 -0.662166
7 BIG HORN I 8 -2.118815
8 BIG HORN I 9 -1.946566
9 BIG HORN I 10 0.662636
10 BIG HORN I 11 0.696896
11 BIG HORN I 12 0.446235
plant_name month r2
BIG HORN I 1 0.999975202
BIG HORN I 2 0.998459857
BIG HORN I 3 0.988712352
BIG HORN I 4 0.711649414
BIG HORN I 5 0.998282523
BIG HORN I 6 0.681460011
BIG HORN I 7 0.907152074
BIG HORN I 8 0.66212225
BIG HORN I 9 0.98807953
BIG HORN I 10 0.988469127
BIG HORN I 11 0.990836283
BIG HORN I 12 0.968629237
from sklearn.metrics import r2_score, mean_squared_error
from scipy.stats import pearsonr
def r2_rmse2(g):
r2 = r2_score(g['wind_speed_obs'], g['wind_speed_ms'])
rmse = mean_squared_error(g['wind_speed_obs'], g['wind_speed_ms'], squared=False)
correl = pearsonr(g['wind_speed_obs'], g['wind_speed_ms'])[0]
return pd.Series( dict( r2 = r2, rmse=rmse, correl=correl ) )
data1.groupby( ['plant_name','month'] ).apply( r2_rmse2 ).reset_index()
plant_name month r2 rmse correl
0 BIG HORN I 1 -0.314771 0.976090 0.999975
1 BIG HORN I 2 0.529890 1.072639 0.998460
2 BIG HORN I 3 0.804066 0.592633 0.988712
3 BIG HORN I 4 -22.164844 0.549141 0.711649
4 BIG HORN I 5 -0.460691 0.866068 0.998283
5 BIG HORN I 6 -4.673359 0.729833 0.681460
6 BIG HORN I 7 -0.662167 0.553450 0.907152
7 BIG HORN I 8 -2.118817 0.716380 0.662122
8 BIG HORN I 9 -1.946562 0.539102 0.988080
9 BIG HORN I 10 0.662637 0.630426 0.988469
10 BIG HORN I 11 0.696896 0.428632 0.990836
11 BIG HORN I 12 0.446234 0.519437 0.968629