pandas.DataFrame corrwith() method
Problem description
I recently started working with pandas. Can anyone explain the difference in behaviour of .corrwith() when it is given a Series and when it is given a DataFrame?
Suppose I have a DataFrame:
frame = pd.DataFrame(data={'a':[1,2,3], 'b':[-1,-2,-3], 'c':[10, -10, 10]})
I want to calculate the correlation between feature 'a' and all the other features. I can do it in the following way:
frame.drop(labels='a', axis=1).corrwith(frame['a'])
The result will be:
b -1.0
c 0.0
But very similar code:
frame.drop(labels='a', axis=1).corrwith(frame[['a']])
generates a completely different and unacceptable table:
a NaN
b NaN
c NaN
So, my question is: why do we get such strange output when the second argument is a DataFrame?
What I think you're looking for:
Let's say your frame is:
frame = pd.DataFrame(np.random.rand(10, 6), columns=['cost', 'amount', 'day', 'month', 'is_sale', 'hour'])
You want the 'cost' and 'amount' columns to be correlated with all other columns in every combination.
focus_cols = ['cost', 'amount']
frame.corr().filter(focus_cols).drop(focus_cols)
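A minimal sketch of the snippet above, assuming seeded random data in place of real values (the seed and the name result are illustrative only). The output should have one row for each non-focus column and one column for each focus column:

```python
import numpy as np
import pandas as pd

# Seeded random data so the example is repeatable (the seed is arbitrary)
rng = np.random.default_rng(0)
columns = ['cost', 'amount', 'day', 'month', 'is_sale', 'hour']
frame = pd.DataFrame(rng.random((10, 6)), columns=columns)

focus_cols = ['cost', 'amount']

# Full correlation matrix -> keep only the focus columns -> drop the focus rows
result = frame.corr().filter(focus_cols).drop(focus_cols)
print(result)
```

Here filter selects columns by label and drop removes rows by label, so the focus columns are never correlated with themselves in the result.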
Answering what you asked:
Compute pairwise correlation between rows or columns of two DataFrame objects.
Parameters:
other : DataFrame
axis : {0 or 'index', 1 or 'columns'}, default 0
    0 or 'index' to compute column-wise, 1 or 'columns' for row-wise
drop : boolean, default False
    Drop missing indices from result, default returns union of all
Returns:
correls : Series
corrwith behaves similarly to add, sub, mul and div in that it accepts either a DataFrame or a Series as other, despite the documentation saying just DataFrame.
When other is a Series, corrwith broadcasts that Series and matches it along the axis specified by axis (default 0), so every column of the frame is correlated with it. This is why the following worked:
frame.drop(labels='a', axis=1).corrwith(frame.a)
b -1.0
c 0.0
dtype: float64
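A small sketch of the Series case, using the frame from the question. The names features, by_series, shuffled and by_shuffled are illustrative only. It assumes the Series is lined up with the frame by index label rather than by position, so reordering the Series should not change the result:

```python
import pandas as pd

frame = pd.DataFrame({'a': [1, 2, 3], 'b': [-1, -2, -3], 'c': [10, -10, 10]})
features = frame.drop(labels='a', axis=1)

# Each column of `features` is correlated with the Series frame['a']
by_series = features.corrwith(frame['a'])
print(by_series)

# Same values, rows listed in a different order; the index labels still match
shuffled = frame['a'].loc[[2, 0, 1]]
by_shuffled = features.corrwith(shuffled)
print(by_shuffled)
```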
When other is a DataFrame, corrwith matches the two frames along the axis specified by axis and correlates each pair of columns that share a label on the other axis. If we did:
frame.drop('a', axis=1).corrwith(frame.drop('b', axis=1))
a NaN
b NaN
c 1.0
dtype: float64
Only c was in common, so only c had its correlation calculated.
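The label matching can be checked with a short sketch. It also uses the drop parameter from the docstring quoted earlier to discard the labels that have no partner. The names left, right, with_union and common_only are illustrative only:

```python
import pandas as pd

frame = pd.DataFrame({'a': [1, 2, 3], 'b': [-1, -2, -3], 'c': [10, -10, 10]})

left = frame.drop('a', axis=1)   # columns: b, c
right = frame.drop('b', axis=1)  # columns: a, c

# Default drop=False: the result covers the union of labels, NaN where unmatched
with_union = left.corrwith(right)
print(with_union)

# drop=True: only labels present in both frames are kept
common_only = left.corrwith(right, drop=True)
print(common_only)
```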
In the case you specified:
frame.drop(labels='a', axis=1).corrwith(frame[['a']])
frame[['a']] is a DataFrame because of the [['a']], so it now plays by the DataFrame rules, in which its columns must match up with the columns it is being correlated with. But you explicitly drop a from the first frame and then correlate with a DataFrame that contains nothing but a. No column labels are shared, so the result is NaN for every column.
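If a one-column DataFrame is what you already have, one possible workaround (a sketch, not the only option) is to turn it back into a Series before calling corrwith. The names features, one_col, as_frame and as_series are illustrative only:

```python
import pandas as pd

frame = pd.DataFrame({'a': [1, 2, 3], 'b': [-1, -2, -3], 'c': [10, -10, 10]})
features = frame.drop(labels='a', axis=1)
one_col = frame[['a']]  # a DataFrame with the single column 'a'

# DataFrame rules: no shared column labels, so every entry is NaN
as_frame = features.corrwith(one_col)
print(as_frame)

# Squeezing the single column into a Series restores the broadcasting behaviour
as_series = features.corrwith(one_col.squeeze('columns'))
print(as_series)
```

Selecting the column directly with one_col['a'] would work just as well. squeeze is used here only because it does not require knowing the column name.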