Python: Rank order correlation for categorical data

Views: 73 | Published: 2020/10/10 1:41:04 | Tags: python, statistics, scipy, correlation

Question

I am somewhat new to programming and statistics, so please help me improve this question if it is not formally correct.

I have many parameters and a couple of result vectors that I produced in a Monte Carlo simulation. Now I want to test the influence of each parameter on the result. I already have a script working with Kendall's tau, and I would like to compare it with Spearman's rho and Pearson's r. An example:

from scipy.stats import spearmanr, kendalltau, pearsonr
result = [106, 86, 100, 101, 99, 103, 97, 113, 112, 110]
parameter = ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B']
kendalltau(parameter, result)

>> (0.14907119849998596, 0.54850624613917143)

However, if I try the same with spearmanr or pearsonr, I get errors. Apparently this feature is not implemented in SciPy. Do you know of a simple way to obtain correlation coefficients for categorical data?

Recommended answer

Actually, spearmanr works. pearsonr does not, because it needs to compute the mean of each array, which is not defined for a string dtype. See below:

from scipy.stats import spearmanr, kendalltau, pearsonr

result = [106, 86, 100, 101, 99, 103, 97, 113, 112, 110]
parameter = ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B']
spearmanr(result, parameter)

(0.17407765595569782, 0.63053607555697644)

help(pearsonr)

Help on function pearsonr in module scipy.stats.stats:

pearsonr(x, y)
    Calculates a Pearson correlation coefficient and the p-value for testing
    non-correlation.

    The Pearson correlation coefficient measures the linear relationship
    between two datasets. Strictly speaking, Pearson's correlation requires
    that each dataset be normally distributed. Like other correlation
    coefficients, this one varies between -1 and +1 with 0 implying no
    correlation. Correlations of -1 or +1 imply an exact linear
    relationship. Positive correlations imply that as x increases, so does
    y. Negative correlations imply that as x increases, y decreases.

    The p-value roughly indicates the probability of an uncorrelated system
    producing datasets that have a Pearson correlation at least as extreme
    as the one computed from these datasets. The p-values are not entirely
    reliable but are probably reasonable for datasets larger than 500 or so.

    Parameters
    ----------
    x : 1D array
    y : 1D array the same length as x

    Returns
    -------
    (Pearson's correlation coefficient,
     2-tailed p-value)

    References
    ----------
    http://www.statsoft.com/textbook/glosp.html#Pearson%20Correlation

Convert 'A' to 1 and 'B' to 2, for example:

params = [1 if el == 'A' else 2 for el in parameter]
print(params)

[1, 2, 1, 2, 1, 2, 1, 2, 1, 2]

pearsonr(params, result)

(-0.012995783552244984, 0.97157652425566488)
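The same conversion can be written so that it does not hard-code 'A' and 'B'. The sketch below is only an illustration: `encode_labels` is a hypothetical helper (not part of SciPy) that numbers the labels 1, 2, 3, ... in sorted order, and the loop then passes the numeric codes to all three functions. Note the assumption that the sorted order of the labels is meaningful. With two categories it only determines the sign of the coefficients, but with three or more unordered categories the integer codes impose an arbitrary ranking.

```python
from scipy.stats import kendalltau, pearsonr, spearmanr

result = [106, 86, 100, 101, 99, 103, 97, 113, 112, 110]
parameter = ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B']


def encode_labels(labels):
    """Hypothetical helper: map each distinct label to 1, 2, 3, ... in sorted order."""
    codes = {label: i for i, label in enumerate(sorted(set(labels)), start=1)}
    return [codes[label] for label in labels]


params = encode_labels(parameter)  # 'A' -> 1, 'B' -> 2 for this data

# Each function returns a (coefficient, p-value) pair.
coefficients = {}
for name, func in [("kendall", kendalltau),
                   ("spearman", spearmanr),
                   ("pearson", pearsonr)]:
    stat, p_value = func(params, result)
    coefficients[name] = (float(stat), float(p_value))
    print(name, stat, p_value)
```

For this data the coefficients should agree with the values quoted in the thread (about 0.149 for Kendall, 0.174 for Spearman, and -0.013 for Pearson). The p-values may differ slightly between SciPy versions.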

Hope this helps.