T-Test in Scipy with NaN values

Question

I have a problem with doing a t-test in scipy that's driving me slowly crazy. It should be simple to resolve, but nothing I do works and there's no solution I can find through extensive searching. I'm using Spyder on the latest distribution of Anaconda.

Specifically: I want to compare the means of two columns, 'Trait_A' and 'Trait_B', in a pandas dataframe that I've imported from a csv file. Some of the values in one of the columns are 'NaN' ('Not a Number'). The default setting of the independent-samples scipy t-test function doesn't accommodate 'NaN' values. However, setting the 'nan_policy' parameter to 'omit' should deal with this. Nevertheless, when I do, the test statistic and p-value come back as 'NaN'. When I restrict the range of values covered to actual numbers, the test works fine. My data and code are below; can anyone suggest what I'm doing wrong? Thanks!

Data:

     Trait_A   Trait_B
0   1.714286  0.000000
1   4.275862  4.000000
2   0.500000  4.625000
3   1.000000  0.000000
4   1.000000  4.000000
5   1.142857  1.000000
6   2.000000  1.000000
7   9.416667  1.956522
8   2.052632  0.571429
9   2.100000  0.166667
10  0.666667  0.000000
11  2.333333  1.705882
12  2.768145       NaN
13  0.000000       NaN
14  6.333333       NaN
15  0.928571       NaN

My code:

import pandas as pd
import scipy.stats as sp

data = pd.read_csv("filepath/Data2.csv")
# scipy.stats is already imported as sp, so call ttest_ind on it directly
print(sp.ttest_ind(data['Trait_A'], data['Trait_B'], nan_policy='omit'))

My result:

Ttest_indResult(statistic=nan, pvalue=nan)

Recommended answer

It seems like a bug. You can drop the NaNs before passing the columns to the t-test:

sp.ttest_ind(data.dropna()['Trait_A'], data.dropna()['Trait_B'])
Ttest_indResult(statistic=0.88752464718609214, pvalue=0.38439692093551037)
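
Note that data.dropna() removes every row that contains a NaN, so the call above also discards the four valid Trait_A values in rows 12 to 15. For an independent-samples test you may prefer to drop NaNs from each column separately. The sketch below is only an illustration: it rebuilds the question's table inline instead of reading the csv file, and the variable names (a, b, rows, per_column, row_wise, omit) are made up for the example. Whether nan_policy='omit' behaves correctly on its own may depend on the installed scipy version, so the last call passes plain NumPy arrays as a cautious assumption rather than a confirmed fix.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Data copied from the question (rebuilt inline; the original came from a csv file)
data = pd.DataFrame({
    "Trait_A": [1.714286, 4.275862, 0.500000, 1.000000, 1.000000, 1.142857,
                2.000000, 9.416667, 2.052632, 2.100000, 0.666667, 2.333333,
                2.768145, 0.000000, 6.333333, 0.928571],
    "Trait_B": [0.000000, 4.000000, 4.625000, 0.000000, 4.000000, 1.000000,
                1.000000, 1.956522, 0.571429, 0.166667, 0.000000, 1.705882,
                np.nan, np.nan, np.nan, np.nan],
})

# Drop NaNs per column: all 16 Trait_A values are kept, 12 remain in Trait_B
a = data["Trait_A"].dropna()
b = data["Trait_B"].dropna()
per_column = stats.ttest_ind(a, b)

# Row-wise dropna, as in the answer: only the 12 complete rows are compared
rows = data.dropna()
row_wise = stats.ttest_ind(rows["Trait_A"], rows["Trait_B"])

# nan_policy='omit' on plain NumPy arrays (assumption: sidesteps pandas input)
omit = stats.ttest_ind(data["Trait_A"].to_numpy(),
                       data["Trait_B"].to_numpy(),
                       nan_policy="omit")

print(len(a), len(b), len(rows))
print("per column:", per_column)
print("row-wise:  ", row_wise)
print("omit:      ", omit)
```

The row-wise call reproduces the statistic quoted in the answer; the per-column call uses more of the Trait_A data, so its statistic and p-value will differ.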
