Use .corr to get the correlation between two columns

Question

I have the following pandas dataframe Top15:

I create a column that estimates the number of citable documents per person:

Top15['PopEst'] = Top15['Energy Supply'] / Top15['Energy Supply per Capita']
Top15['Citable docs per Capita'] = Top15['Citable documents'] / Top15['PopEst']

I want to know the correlation between the number of citable documents per capita and the energy supply per capita. So I use the .corr() method (Pearson's correlation):

data = Top15[['Citable docs per Capita','Energy Supply per Capita']]
correlation = data.corr(method='pearson')

I want to return a single number, but the result is:

Recommended answer

Without actual data it is hard to answer the question but I guess you are looking for something like this:

Top15['Citable docs per Capita'].corr(Top15['Energy Supply per Capita'])

This calculates the correlation between the two columns 'Citable docs per Capita' and 'Energy Supply per Capita'.
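
To make the difference concrete, here is a minimal sketch that contrasts the two calls. The `top15` frame is a hypothetical stand-in for the asker's Top15, and its numbers are invented purely for illustration.

```python
import pandas as pd

# Hypothetical stand-in for Top15; the values are made up for illustration.
top15 = pd.DataFrame({
    "Citable docs per Capita": [0.00010, 0.00025, 0.00040, 0.00055],
    "Energy Supply per Capita": [90.0, 150.0, 160.0, 230.0],
})

# Series.corr(other) returns one scalar for the pair of columns.
single = top15["Citable docs per Capita"].corr(top15["Energy Supply per Capita"])

# DataFrame.corr() returns the full pairwise matrix (here 2 x 2).
matrix = top15[["Citable docs per Capita", "Energy Supply per Capita"]].corr(method="pearson")

print(single)        # one number
print(matrix.shape)  # (2, 2)
```

The off-diagonal entries of `matrix` hold the same value as `single`.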

To give an example:

import pandas as pd

df = pd.DataFrame({'A': range(4), 'B': [2*i for i in range(4)]})

   A  B
0  0  0
1  1  2
2  2  4
3  3  6

Then

df['A'].corr(df['B'])

gives 1 as expected.

Now, if you change a value, e.g.

df.loc[2, 'B'] = 4.5

   A    B
0  0  0.0
1  1  2.0
2  2  4.5
3  3  6.0

the command

df['A'].corr(df['B'])

returns

0.99586

which is still close to 1.
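
As a sanity check, the same coefficient can be reproduced by hand from the definition of Pearson's r (the sum of products of the deviations from the mean, divided by the square root of the product of the summed squared deviations). This is only a sketch with NumPy; it builds column B as floats from the start, since assigning 4.5 into an integer column may trigger a dtype warning in recent pandas versions.

```python
import numpy as np
import pandas as pd

# Same values as the modified example frame, with B stored as floats.
df = pd.DataFrame({"A": [0, 1, 2, 3], "B": [0.0, 2.0, 4.5, 6.0]})

a = df["A"].to_numpy(dtype=float)
b = df["B"].to_numpy(dtype=float)

# Deviations from the column means.
da = a - a.mean()
db = b - b.mean()

# Pearson's r from its definition.
r_manual = (da * db).sum() / np.sqrt((da ** 2).sum() * (db ** 2).sum())
r_pandas = df["A"].corr(df["B"])

print(round(r_manual, 6), round(r_pandas, 6))  # both 0.995862
```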

If you apply .corr directly to your dataframe, it will return all pairwise correlations between your columns; that's why you then observe 1s at the diagonal of your matrix (each column is perfectly correlated with itself).

df.corr()

will therefore return

          A         B
A  1.000000  0.995862
B  0.995862  1.000000
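
If the matrix has already been computed, the single coefficient can also be read out of it by label. A short sketch, assuming the same example frame as above (with B stored as floats):

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 1, 2, 3], "B": [0.0, 2.0, 4.5, 6.0]})

corr_matrix = df.corr()

# Row label "A", column label "B": the off-diagonal entry of the matrix.
r_from_matrix = corr_matrix.loc["A", "B"]

# Equivalent to calling Series.corr on the two columns directly.
r_direct = df["A"].corr(df["B"])

print(r_from_matrix, r_direct)
```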

In the graphic you show, only the upper left corner of the correlation matrix is represented (I assume).

There can be cases where you get NaNs in your result; check this post for an example.
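
One way NaNs can arise (not necessarily the case the linked post covers) is a column with zero variance: Pearson's r divides by the standard deviations, so a constant column has no defined correlation. A minimal sketch with a made-up constant column B:

```python
import pandas as pd

# Column B is constant, so its variance is zero.
df = pd.DataFrame({"A": [0, 1, 2, 3], "B": [5, 5, 5, 5]})

corr_matrix = df.corr()

# The A-B entry is NaN because the denominator of Pearson's r is zero.
print(corr_matrix)
print(pd.isna(corr_matrix.loc["A", "B"]))
```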

If you want to filter entries above/below a certain threshold, you can check this question. If you want to plot a heatmap of the correlation coefficients, you can check this answer and if you then run into the issue with overlapping axis-labels check the following post.
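
For the threshold case, one possible approach (a sketch, not necessarily what the linked question uses) is to walk the upper triangle of the matrix and keep the pairs whose absolute correlation exceeds a cutoff. Column C and the value of `threshold` are assumptions chosen for illustration.

```python
import pandas as pd

# Example frame with a third, made-up column C.
df = pd.DataFrame({
    "A": [0, 1, 2, 3],
    "B": [0.0, 2.0, 4.5, 6.0],
    "C": [3, 1, 2, 0],
})

corr = df.corr()
threshold = 0.9  # assumed cutoff, pick whatever suits your data
cols = corr.columns

# Visit each unordered pair once (upper triangle, diagonal excluded).
pairs = [
    (cols[i], cols[j], corr.iloc[i, j])
    for i in range(len(cols))
    for j in range(i + 1, len(cols))
    if abs(corr.iloc[i, j]) > threshold
]

print(pairs)  # only the (A, B) pair passes the 0.9 cutoff
```

Skipping the diagonal avoids reporting each column's trivial correlation of 1 with itself.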
