T-Test in Scipy with NaN values
Question
I have a problem doing a t-test in SciPy that's slowly driving me crazy. It should be simple to resolve, but nothing I try works, and I can't find a solution despite extensive searching. I'm using Spyder on the latest distribution of Anaconda.
Specifically: I want to compare the means of two columns, 'Trait_A' and 'Trait_B', in a pandas DataFrame that I've imported from a CSV file. Some of the values in one of the columns are NaN ('Not a Number'). By default, SciPy's independent-samples t-test (ttest_ind) doesn't accommodate NaN values: with its default nan_policy='propagate', any NaN makes the result NaN. However, setting the 'nan_policy' parameter to 'omit' should deal with this. Nevertheless, when I do, the test statistic and p-value still come back as NaN. When I restrict the input to the rows that contain actual numbers, the test works fine. My data and code are below; can anyone suggest what I'm doing wrong? Thanks!
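As a minimal sketch of what the three 'nan_policy' settings are documented to do, assuming a recent SciPy release (the result reported in this question suggests the asker's version did not handle 'omit' as documented), here is a comparison on small made-up arrays a and b, which are hypothetical and not the asker's data:

```python
import numpy as np
from scipy import stats

# Small illustrative samples (hypothetical data); b contains one NaN.
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
b = np.array([2.0, 4.0, np.nan, 6.0, 8.0])

# 'propagate' (the default): any NaN in the input makes the result NaN.
res_propagate = stats.ttest_ind(a, b, nan_policy="propagate")

# 'omit': NaNs are removed from each sample before the test is computed.
res_omit = stats.ttest_ind(a, b, nan_policy="omit")

# The same test with the NaN removed from b by hand, for comparison.
res_manual = stats.ttest_ind(a, b[~np.isnan(b)])

# 'raise': a ValueError is raised when the input contains NaN.
try:
    stats.ttest_ind(a, b, nan_policy="raise")
    raised = False
except ValueError:
    raised = True

print(res_propagate)
print(res_omit)
print(res_manual)
print("raise policy raised ValueError:", raised)
```

If 'omit' works on a given installation, res_omit and res_manual should be identical; a NaN from res_omit would point to the same problem described here.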
Data:
Trait_A Trait_B
0 1.714286 0.000000
1 4.275862 4.000000
2 0.500000 4.625000
3 1.000000 0.000000
4 1.000000 4.000000
5 1.142857 1.000000
6 2.000000 1.000000
7 9.416667 1.956522
8 2.052632 0.571429
9 2.100000 0.166667
10 0.666667 0.000000
11 2.333333 1.705882
12 2.768145 NaN
13 0.000000 NaN
14 6.333333 NaN
15 0.928571 NaN
My code:
import pandas as pd
import scipy.stats as sp

data = pd.read_csv("filepath/Data2.csv")
# sp is already scipy.stats, so the function is sp.ttest_ind (not sp.stats.ttest_ind)
print(sp.ttest_ind(data['Trait_A'], data['Trait_B'], nan_policy='omit'))
My result:
Ttest_indResult(statistic=nan, pvalue=nan)
Answer
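It seems like a bug. You can drop the NaNs before passing the data to the t-test. Note that data.dropna() removes every row that has a NaN in either column, so rows 12 to 15 of Trait_A are discarded as well and each sample ends up with 12 values:
clean = data.dropna()  # drops whole rows, i.e. from both columns
sp.ttest_ind(clean['Trait_A'], clean['Trait_B'])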
Ttest_indResult(statistic=0.88752464718609214, pvalue=0.38439692093551037)
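Because the two traits are treated as independent samples, another option is to drop NaNs from each column separately, so that all 16 Trait_A values are kept. The sketch below rebuilds the table from the question inline (rather than reading the elided CSV path, and using only the 6 displayed decimals) and, assuming a recent SciPy release where nan_policy='omit' behaves as documented, checks that per-column dropping and 'omit' agree. Its numbers will not match the row-wise dropna() result above, because Trait_A keeps rows 12 to 15.

```python
import numpy as np
import pandas as pd
from scipy import stats

# The table from the question, typed in by hand instead of read from the CSV.
data = pd.DataFrame({
    "Trait_A": [1.714286, 4.275862, 0.500000, 1.000000, 1.000000, 1.142857,
                2.000000, 9.416667, 2.052632, 2.100000, 0.666667, 2.333333,
                2.768145, 0.000000, 6.333333, 0.928571],
    "Trait_B": [0.000000, 4.000000, 4.625000, 0.000000, 4.000000, 1.000000,
                1.000000, 1.956522, 0.571429, 0.166667, 0.000000, 1.705882,
                np.nan, np.nan, np.nan, np.nan],
})

# Drop NaNs per column, so each sample keeps all of its own valid values.
a = data["Trait_A"].dropna().to_numpy()
b = data["Trait_B"].dropna().to_numpy()
per_column = stats.ttest_ind(a, b)

# On a SciPy version where nan_policy='omit' works, this should match per_column.
omitted = stats.ttest_ind(data["Trait_A"].to_numpy(),
                          data["Trait_B"].to_numpy(),
                          nan_policy="omit")

print(len(a), len(b))
print(per_column)
print(omitted)
```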