How to perform three-variable correlation with Python Pandas
Problem description
The Pandas corr() function is limited to pairwise calculations. How can I calculate the correlation of three variables, using SALARY as the dependent variable, in the data frame below?
   GPA   IQ  SALARY
0  3.2  100   45000
1  4.0  140  150000
2  2.9   90   30000
3  2.5   85   25000
4  3.6  120   75000
5  3.4  110   60000
6  3.0   95   38000
Recommended answer
You can calculate the correlation of a dependent variable with two independent variables by first getting the pairwise correlation coefficients with pandas. You can then plug them into the multiple correlation coefficient formula to obtain R-squared. That value is slightly biased, so you may prefer the adjusted R-squared, which corrects for the sample size and the number of independent variables. The equation can also be extended to take more independent variables into account. The code below is a Python adaptation of an excellent article by Mr. Charles Zaiontz: http://www.real-statistics.com/correlation/multiple-correlation/
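For reference, the two quantities this answer computes can be written out as follows. The notation is an assumption chosen to match the variable names in the code: $r_{xz}$, $r_{yz}$ and $r_{xy}$ are the pairwise Pearson correlations, $z$ is the dependent variable, $n$ is the number of rows and $k$ is the number of independent variables.

```latex
R_{z,xy} = \sqrt{\frac{r_{xz}^{2} + r_{yz}^{2} - 2\, r_{xz}\, r_{yz}\, r_{xy}}{1 - r_{xy}^{2}}}
\qquad
R^{2}_{\mathrm{adj}} = 1 - \frac{(1 - R_{z,xy}^{2})\,(n - 1)}{n - k - 1}
```

Note that the first formula is undefined when $r_{xy} = \pm 1$, that is, when the two independent variables are perfectly correlated with each other.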
import math
import pandas as pd

df = pd.DataFrame({
    'IQ': [100, 140, 90, 85, 120, 110, 95],
    'GPA': [3.2, 4.0, 2.9, 2.5, 3.6, 3.4, 3.0],
    'SALARY': [45e3, 150e3, 30e3, 25e3, 75e3, 60e3, 38e3]
})

# Get pairwise correlation coefficients
cor = df.corr()

# Independent variables
x = 'IQ'
y = 'GPA'

# Dependent variable
z = 'SALARY'

# Pairwise correlations
xz = cor.loc[x, z]
yz = cor.loc[y, z]
xy = cor.loc[x, y]

# Multiple correlation coefficient of z on x and y
Rxyz = math.sqrt((xz**2 + yz**2 - 2*xz*yz*xy) / (1 - xy**2))
R2 = Rxyz**2

# Calculate adjusted R-squared
n = len(df)  # Number of rows
k = 2        # Number of independent variables
R2_adj = 1 - ((1 - R2) * (n - 1)) / (n - k - 1)
R2, R2_adj = 0.958, 0.937
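The answer mentions that the equation can be adjusted for more independent variables. One way to do that, sketched below, is the matrix form of the same formula: R-squared equals c' Rxx^-1 c, where Rxx holds the correlations among the independent variables and c holds their correlations with the dependent variable. The helper names `multiple_r_squared` and `adjusted_r_squared` are hypothetical, introduced only for this illustration; they are not part of pandas.

```python
import numpy as np
import pandas as pd


def multiple_r_squared(df, dependent, independents):
    """R-squared of `dependent` on `independents`, using only pairwise correlations."""
    cor = df.corr()
    # Correlations among the independent variables (k x k matrix)
    r_xx = cor.loc[independents, independents].to_numpy()
    # Correlation of each independent variable with the dependent one (length k)
    r_xz = cor.loc[independents, dependent].to_numpy()
    # R^2 = r_xz' * inv(r_xx) * r_xz; solve() avoids forming the inverse explicitly
    return float(r_xz @ np.linalg.solve(r_xx, r_xz))


def adjusted_r_squared(r2, n, k):
    """Adjusted R-squared for n rows and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


df = pd.DataFrame({
    'IQ': [100, 140, 90, 85, 120, 110, 95],
    'GPA': [3.2, 4.0, 2.9, 2.5, 3.6, 3.4, 3.0],
    'SALARY': [45e3, 150e3, 30e3, 25e3, 75e3, 60e3, 38e3],
})

independents = ['IQ', 'GPA']
r2 = multiple_r_squared(df, 'SALARY', independents)
r2_adj = adjusted_r_squared(r2, n=len(df), k=len(independents))
print(round(r2, 3), round(r2_adj, 3))
```

With two independent variables this reduces to the closed-form expression used above, so it should reproduce the same R-squared; adding a column name to `independents` extends it to three or more.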
The results show that IQ and GPA together account for almost 96% of the variance in salary.
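As a sanity check on that reading, R-squared can also be obtained directly from a least-squares fit of SALARY on IQ and GPA, as the fraction of the total variance that the fit explains. This is a minimal sketch using NumPy's `np.linalg.lstsq`, not part of the original answer; the variable names are chosen for illustration.

```python
import numpy as np

iq = np.array([100, 140, 90, 85, 120, 110, 95], dtype=float)
gpa = np.array([3.2, 4.0, 2.9, 2.5, 3.6, 3.4, 3.0])
salary = np.array([45e3, 150e3, 30e3, 25e3, 75e3, 60e3, 38e3])

# Design matrix: intercept column plus the two independent variables
X = np.column_stack([np.ones_like(iq), iq, gpa])

# Ordinary least-squares fit of salary on IQ and GPA
coef, *_ = np.linalg.lstsq(X, salary, rcond=None)
fitted = X @ coef

# R-squared = 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((salary - fitted) ** 2)
ss_tot = np.sum((salary - salary.mean()) ** 2)
r2_ols = 1 - ss_res / ss_tot
print(round(r2_ols, 3))
```

Because the fit includes an intercept, this value should agree with the squared multiple correlation coefficient computed from the pairwise correlations.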