Python数据框的置信区间 [英] Confidence Interval in Python dataframe

查看:95
本文介绍了Python数据框的置信区间的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试在大型数据集中计算力"列的平均值和置信区间(95%).我需要通过分组不同的类"来使用groupby函数的结果.

I am trying to calculate the mean and confidence interval(95%) of a column "Force" in a large dataset. I need the result by using the groupby function by grouping different "Classes".

当我计算平均值并将其放入新的数据框中时,它为我提供了所有行的NaN值.我不确定我是否要使用正确的方法.有没有更简单的方法可以做到这一点?

When I calculate the mean and put it in the new dataframe, it gives me NaN values for all rows. I'm not sure if I'm going the correct way. Is there any easier way to do this?

这是示例数据框:

df=pd.DataFrame({ 'Class': ['A1','A1','A1','A2','A3','A3'], 
                  'Force': [50,150,100,120,140,160] },
                   columns=['Class', 'Force'])

要计算置信区间,我要做的第一步是计算均值.这就是我用的:

To calculate the confidence interval, the first step I did was to calculate the mean. This is what I used:

F1_Mean = df.groupby(['Class'])['Force'].mean()

这为我所有行提供了NaN个值.

This gave me NaN values for all rows.

推荐答案

import pandas as pd
import numpy as np
import math

df=pd.DataFrame({'Class': ['A1','A1','A1','A2','A3','A3'], 
                 'Force': [50,150,100,120,140,160] },
                 columns=['Class', 'Force'])
print(df)
print('-'*30)

stats = df.groupby(['Class'])['Force'].agg(['mean', 'count', 'std'])
print(stats)
print('-'*30)

ci95_hi = []
ci95_lo = []

for i in stats.index:
    m, c, s = stats.loc[i]
    ci95_hi.append(m + 1.96*s/math.sqrt(c))
    ci95_lo.append(m - 1.96*s/math.sqrt(c))

stats['ci95_hi'] = ci95_hi
stats['ci95_lo'] = ci95_lo
print(stats)

输出为

  Class  Force
0    A1     50
1    A1    150
2    A1    100
3    A2    120
4    A3    140
5    A3    160
------------------------------
       mean  count        std
Class                        
A1      100      3  50.000000
A2      120      1        NaN
A3      150      2  14.142136
------------------------------
       mean  count        std     ci95_hi     ci95_lo
Class                                                
A1      100      3  50.000000  156.580326   43.419674
A2      120      1        NaN         NaN         NaN
A3      150      2  14.142136  169.600000  130.400000

这篇关于Python数据框的置信区间的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆