Standard error ignoring NaN in pandas groupby groups


Question

I have data loaded into a DataFrame that has a MultiIndex for the column headers. Currently I group the data by the column index levels to take the mean of each group and calculate the 95% confidence intervals like this:

import numpy as np
import pandas as pd
from scipy import stats as st

# Normalize to the starting point (first row), then convert
normalized = (data - data.iloc[0]) * 11.11111
# Group normalized data based on the SLOPE and DEPTH column levels
grouped = normalized.groupby(level=['SLOPE','DEPTH'], axis=1)
# Obtain the mean of each group
means = grouped.mean()
# Calculate the 95% confidence interval for each group
ci = grouped.aggregate(lambda x: st.sem(x) * 1.96)

The problem is that the mean function used on the groups ignores NaN values, while the scipy function st.sem returns NaN if there is a NaN in the group. I need to calculate the standard error while ignoring NaNs, as the mean function does.
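The mismatch can be reproduced on a small made-up group. This is only a minimal sketch: the values are invented for illustration, and the `nan_policy="omit"` call assumes a SciPy version that supports that argument for `sem`.

```python
import numpy as np
import pandas as pd
from scipy import stats as st

# Hypothetical sample group containing one missing value.
group = pd.Series([1.0, 2.0, np.nan, 3.0])

# pandas skips NaN by default when averaging.
mean_skipping_nan = group.mean()

# scipy's sem propagates NaN by default (nan_policy='propagate').
sem_default = st.sem(group.to_numpy())

# Asking scipy to drop NaN before computing the standard error.
sem_omitting_nan = st.sem(group.to_numpy(), nan_policy="omit")

print(mean_skipping_nan)
print(sem_default)
print(sem_omitting_nan)
```

The first value is computed from the three valid entries only, while the default `st.sem` call is poisoned by the single NaN.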

I've tried calculating the 95% confidence interval like this:

#Calculate 95% confidence interval for each group
ci = grouped.aggregate(lambda x: np.std(x) / ??? * 1.96)

np.std gives me the standard deviation ignoring NaN values here, but I need to divide it by the square root of the group size, also ignoring NaNs, in order to get the standard error.

What is the easiest way to calculate the standard error while ignoring NaNs?

Recommended answer

The count() method of a Series returns the number of non-NaN values:

import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan, 3])
print(s.count())

Output:

3
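To see why count() rather than the plain length is the right denominator, the two can be compared on the same series. A small sketch, reusing the example values:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan, 3])

total_entries = len(s)          # counts every entry, NaN included
valid_entries = s.count()       # counts only non-NaN entries
missing_entries = s.isna().sum()  # counts only NaN entries

print(total_entries, valid_entries, missing_entries)
```

Dividing by the length would understate the standard error whenever a group has missing values, because the NaN entries contribute nothing to the standard deviation.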

So, try:

# standard error = std / sqrt(n), where n counts only the non-NaN values
ci = grouped.aggregate(lambda x: np.std(x) / np.sqrt(x.count()) * 1.96)
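The idea can be checked end to end on a toy frame. The sketch below makes several assumptions that are not in the original post: the data values and the extra REP column level are invented, and the frame is transposed before grouping so that no axis=1 groupby is needed. It also relies on pandas' own GroupBy.sem(), which skips NaN. Note that NumPy's std defaults to ddof=0, whereas scipy.stats.sem and the pandas std/sem methods default to ddof=1, so the pandas methods are used here to stay consistent with st.sem.

```python
import numpy as np
import pandas as pd

# Hypothetical data: three replicate columns (REP) per (SLOPE, DEPTH) pair.
columns = pd.MultiIndex.from_product(
    [[10, 20], [1], [1, 2, 3]], names=["SLOPE", "DEPTH", "REP"]
)
data = pd.DataFrame(
    [
        [1.0, 2.0, 3.0, 4.0, 6.0, 8.0],
        [2.0, np.nan, 4.0, 1.0, 1.0, 1.0],
        [5.0, 7.0, 9.0, 2.0, 4.0, np.nan],
    ],
    columns=columns,
)

# Transpose so the column levels become row levels, then group on them.
grouped = data.T.groupby(level=["SLOPE", "DEPTH"])

# Built-in standard error: NaN values are skipped, ddof=1 by default.
sem = grouped.sem().T

# Manual equivalent: sample std divided by sqrt of the non-NaN count.
manual = (grouped.std() / np.sqrt(grouped.count())).T

# Half-width of the 95% confidence interval (normal approximation).
ci = 1.96 * sem

print(ci)
```

Both routes give the same numbers, so the manual formula is mainly useful when a different ddof or a custom denominator is wanted.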
