How to perform three variable correlation with Python Pandas

Question

The pandas corr() function only computes pairwise correlations. How do you calculate the correlation among three variables, using SALARY as the dependent variable, in the data frame below?

    GPA    IQ    SALARY
0   3.2    100    45000
1   4.0    140   150000
2   2.9     90    30000
3   2.5     85    25000
4   3.6    120    75000
5   3.4    110    60000
6   3.0     95    38000


Recommended answer

You can calculate the correlation of a dependent variable with two independent variables by first getting the pairwise correlation coefficients with pandas. You can then apply the multiple correlation coefficient formula to obtain R-squared. R-squared is slightly biased upward, however, so you may prefer the more accurate adjusted R-squared value. The formula can also be extended to take more independent variables into account. The following is a Python adaptation of an excellent article by Mr. Charles Zaiontz: http://www.real-statistics.com/correlation/multiple-correlation/
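Written out, the two quantities the answer computes look roughly as follows. The symbols are chosen for this note and may differ from the linked article: x and y are the independent variables, z is the dependent variable, r_{xz}, r_{yz} and r_{xy} are the pairwise Pearson correlations, n is the number of observations and k the number of independent variables.

```latex
R_{z,xy} = \sqrt{\frac{r_{xz}^{2} + r_{yz}^{2} - 2\, r_{xz}\, r_{yz}\, r_{xy}}{1 - r_{xy}^{2}}},
\qquad
R^{2}_{\mathrm{adj}} = 1 - \frac{(1 - R_{z,xy}^{2})\,(n - 1)}{n - k - 1}
```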

import math
import pandas as pd

df = pd.DataFrame({
    'IQ':[100,140,90,85,120,110,95], 
    'GPA':[3.2,4.0,2.9,2.5,3.6,3.4,3.0],
    'SALARY':[45e3,150e3,30e3,25e3,75e3,60e3,38e3]
    })

# Get pairwise correlation coefficients
cor = df.corr()

# Independent variables
x = 'IQ'
y = 'GPA'

# Dependent variable
z = 'SALARY'

# Pairings
xz = cor.loc[ x, z ]
yz = cor.loc[ y, z ]
xy = cor.loc[ x, y ]

# Multiple correlation coefficient of z on x and y (squares are already non-negative, so abs() is not needed)
Rxyz = math.sqrt((xz**2 + yz**2 - 2*xz*yz*xy) / (1 - xy**2))
R2 = Rxyz**2

# Calculate adjusted R-squared
n = len(df) # Number of rows
k = 2       # Number of independent variables
R2_adj = 1 - ( ((1-R2)*(n-1)) / (n-k-1) )

R2, R2_adj = 0.958, 0.937
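As a sanity check, the R-squared from the correlation formula should match the R-squared of an ordinary least-squares fit of SALARY on IQ and GPA. The sketch below is not part of the original answer; it assumes NumPy is available and uses `np.linalg.lstsq` for the fit. The names `r2_ols` and `r2_adj_ols` are made up for this illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'IQ': [100, 140, 90, 85, 120, 110, 95],
    'GPA': [3.2, 4.0, 2.9, 2.5, 3.6, 3.4, 3.0],
    'SALARY': [45e3, 150e3, 30e3, 25e3, 75e3, 60e3, 38e3],
})

# Design matrix: intercept column plus the two independent variables
X = np.column_stack([
    np.ones(len(df)),
    df['IQ'].to_numpy(),
    df['GPA'].to_numpy(),
])
y = df['SALARY'].to_numpy()

# Least-squares fit of SALARY on IQ and GPA
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# R-squared = 1 - (residual sum of squares / total sum of squares)
residuals = y - X @ coef
ss_res = float(residuals @ residuals)
ss_tot = float(((y - y.mean()) ** 2).sum())
r2_ols = 1 - ss_res / ss_tot

# Same adjustment as in the answer: n rows, k independent variables
n, k = len(df), 2
r2_adj_ols = 1 - (1 - r2_ols) * (n - 1) / (n - k - 1)

print(round(r2_ols, 3), round(r2_adj_ols, 3))
```

If the two approaches disagree beyond rounding, the usual suspects are a missing intercept column or a non-Pearson method passed to corr().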

The results show that IQ and GPA together account for almost 96% of the variance in salary in this sample (an R-squared of about 0.958).
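The answer notes that the equation can be adjusted for more independent variables. One way to do that, sketched here under the assumption that NumPy is available, is the matrix form of the same formula: R-squared equals c' Rxx^-1 c, where Rxx is the correlation matrix of the independent variables and c is the vector of their correlations with the dependent variable. `multiple_r_squared` is a hypothetical helper name, not a pandas function.

```python
import numpy as np
import pandas as pd

def multiple_r_squared(df, dependent, independents):
    """Return (R2, adjusted R2) of `dependent` on a list of `independents`."""
    cor = df[[*independents, dependent]].corr()
    # Correlations among the independent variables (k x k matrix)
    r_xx = cor.loc[independents, independents].to_numpy()
    # Correlations of each independent variable with the dependent one
    r_xz = cor.loc[independents, dependent].to_numpy()
    # R2 = c' * inv(Rxx) * c, computed with a linear solve instead of an explicit inverse
    r2 = float(r_xz @ np.linalg.solve(r_xx, r_xz))
    n, k = len(df), len(independents)
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return r2, r2_adj

df = pd.DataFrame({
    'IQ': [100, 140, 90, 85, 120, 110, 95],
    'GPA': [3.2, 4.0, 2.9, 2.5, 3.6, 3.4, 3.0],
    'SALARY': [45e3, 150e3, 30e3, 25e3, 75e3, 60e3, 38e3],
})

r2, r2_adj = multiple_r_squared(df, 'SALARY', ['IQ', 'GPA'])
print(round(r2, 3), round(r2_adj, 3))
```

With two independent variables this reduces to the closed-form expression in the answer. With a single independent variable it reduces to the squared pairwise correlation.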
