Confidence Interval in a Python DataFrame
Question
I am trying to calculate the mean and the 95% confidence interval of the column "Force" in a large dataset. I need the results per class, using groupby to group by the "Class" column.
When I calculate the mean and put it into a new dataframe, I get NaN values for all rows. I'm not sure I'm going about this the right way. Is there an easier way to do this?
Here is a sample dataframe:
import pandas as pd

df = pd.DataFrame({'Class': ['A1', 'A1', 'A1', 'A2', 'A3', 'A3'],
                   'Force': [50, 150, 100, 120, 140, 160]},
                  columns=['Class', 'Force'])
To calculate the confidence interval, my first step was to calculate the mean. This is what I used:
F1_Mean = df.groupby(['Class'])['Force'].mean()
This gave me NaN values for all rows.
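The question does not show how the means were put into the new dataframe, so what follows is an assumption about the cause: a frequent source of all-NaN results is index alignment. `groupby(...).mean()` returns a Series indexed by the class labels (A1, A2, A3), and assigning it as a column of a frame whose index is 0..5 aligns on the index, matches no labels, and fills every row with NaN. The sketch below reproduces that situation and shows two common fixes: `transform('mean')` for a per-row column, and `reset_index` for one row per class. The names `new_df`, `per_row` and `per_class` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Class': ['A1', 'A1', 'A1', 'A2', 'A3', 'A3'],
                   'Force': [50, 150, 100, 120, 140, 160]})

# groupby(...).mean() returns a Series indexed by the class labels A1, A2, A3
class_means = df.groupby('Class')['Force'].mean()

# Hypothetical reproduction: assigning that Series to a frame indexed 0..5
# aligns on the index; no labels match, so every row becomes NaN
new_df = pd.DataFrame(index=df.index)
new_df['Force_mean'] = class_means
print(new_df)

# Fix 1: transform broadcasts each group's mean back onto the original rows
per_row = df.assign(Force_mean=df.groupby('Class')['Force'].transform('mean'))
print(per_row)

# Fix 2: keep one row per class and turn the Class index back into a column
per_class = class_means.reset_index(name='Force_mean')
print(per_class)
```

Use `transform` when each original row needs its group's statistic next to it; use the `reset_index` form when a compact per-class summary table is wanted.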
Recommended answer
import math

import pandas as pd

df = pd.DataFrame({'Class': ['A1', 'A1', 'A1', 'A2', 'A3', 'A3'],
                   'Force': [50, 150, 100, 120, 140, 160]},
                  columns=['Class', 'Force'])
print(df)
print('-'*30)

# Mean, number of observations and sample standard deviation per class
stats = df.groupby(['Class'])['Force'].agg(['mean', 'count', 'std'])
print(stats)
print('-'*30)

# 95% CI with the normal approximation: mean +/- 1.96 * std / sqrt(count)
ci95_hi = []
ci95_lo = []
for i in stats.index:
    m, c, s = stats.loc[i]
    ci95_hi.append(m + 1.96*s/math.sqrt(c))
    ci95_lo.append(m - 1.96*s/math.sqrt(c))

stats['ci95_hi'] = ci95_hi
stats['ci95_lo'] = ci95_lo
print(stats)
Output:
  Class  Force
0    A1     50
1    A1    150
2    A1    100
3    A2    120
4    A3    140
5    A3    160
------------------------------
       mean  count        std
Class
A1      100      3  50.000000
A2      120      1        NaN
A3      150      2  14.142136
------------------------------
       mean  count        std     ci95_hi     ci95_lo
Class
A1      100      3  50.000000  156.580326   43.419674
A2      120      1        NaN         NaN         NaN
A3      150      2  14.142136  169.600000  130.400000
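The explicit loop over `stats.index` is not strictly needed: pandas arithmetic works column-wise, so each bound can be computed in a single expression. The sketch below keeps the answer's normal approximation (z = 1.96) and its column names; the helper name `mean_ci95` and its parameters are hypothetical, chosen for illustration.

```python
import numpy as np
import pandas as pd

def mean_ci95(frame, group_col, value_col, z=1.96):
    """Per-group mean, count, std and a normal-approximation 95% CI."""
    stats = frame.groupby(group_col)[value_col].agg(['mean', 'count', 'std'])
    # Standard error of the mean; std is NaN for single-observation groups,
    # so their bounds come out as NaN as well
    sem = stats['std'] / np.sqrt(stats['count'])
    stats['ci95_hi'] = stats['mean'] + z * sem
    stats['ci95_lo'] = stats['mean'] - z * sem
    return stats

df = pd.DataFrame({'Class': ['A1', 'A1', 'A1', 'A2', 'A3', 'A3'],
                   'Force': [50, 150, 100, 120, 140, 160]})
res = mean_ci95(df, 'Class', 'Force')
print(res)
```

pandas' `std` is the sample standard deviation (ddof=1) by default, which is what this formula expects. With only two or three observations per class, 1.96 understates the uncertainty; substituting a Student's t critical value such as `scipy.stats.t.ppf(0.975, count - 1)` for `z` gives wider, more appropriate intervals for small groups.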