pandas.DataFrame corrwith() method

Views: 5050 Published: 2017/3/26 3:07:21 Tags: python pandas dataframe

Problem description

I recently started working with pandas. Can anyone explain the difference in the behaviour of .corrwith() when it is given a Series versus a DataFrame?

Suppose I have a DataFrame:

import pandas as pd

frame = pd.DataFrame(data={'a': [1, 2, 3], 'b': [-1, -2, -3], 'c': [10, -10, 10]})

I want to calculate the correlation between feature 'a' and all the other features. I can do it in the following way:

frame.drop(labels='a', axis=1).corrwith(frame['a'])

The result is:

b   -1.0
c    0.0

But this very similar code:

frame.drop(labels='a', axis=1).corrwith(frame[['a']])

produces a completely different and unusable result:

a   NaN
b   NaN
c   NaN

So my question is: why do we get such strange output when the second argument is a DataFrame?

Solution

What I think you're looking for:

Let's say your frame is:

import numpy as np
import pandas as pd

frame = pd.DataFrame(np.random.rand(10, 6), columns=['cost', 'amount', 'day', 'month', 'is_sale', 'hour'])

You want the 'cost' and 'amount' columns correlated with every other column, in every combination.

focus_cols = ['cost', 'amount']
frame.corr().filter(focus_cols).drop(focus_cols)
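As a rough sketch of what that one-liner produces, the following builds the same kind of frame with a seeded generator (the seed and the names via_corr, via_corrwith, and others are illustrative assumptions, not from the original answer) and checks the result against a column-by-column corrwith call:

```python
import numpy as np
import pandas as pd

# Seeded generator so the sketch is reproducible (the seed is an arbitrary choice).
rng = np.random.default_rng(0)
cols = ['cost', 'amount', 'day', 'month', 'is_sale', 'hour']
frame = pd.DataFrame(rng.random((10, 6)), columns=cols)

focus_cols = ['cost', 'amount']

# Full correlation matrix, keep only the focus columns, drop the focus rows.
via_corr = frame.corr().filter(items=focus_cols).drop(index=focus_cols)

# Same table built one focus column at a time with corrwith and a Series.
others = frame.drop(columns=focus_cols)
via_corrwith = pd.DataFrame({col: others.corrwith(frame[col]) for col in focus_cols})

print(via_corr)
print(np.allclose(via_corr.to_numpy(), via_corrwith.to_numpy()))
```

The result has one row per non-focus column and one column per focus column, so each cell is the correlation of a single pair.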

Answering what you asked:

Compute pairwise correlation between rows or columns of two DataFrame objects.

Parameters:

other : DataFrame

axis : {0 or 'index', 1 or 'columns'}, default 0
    0 or 'index' to compute column-wise, 1 or 'columns' for row-wise

drop : boolean, default False
    Drop missing indices from result, default returns union of all

Returns:

correls : Series

corrwith behaves similarly to add, sub, mul, and div in that it accepts either a DataFrame or a Series as other, even though the documentation mentions only DataFrame.

When other is a Series, corrwith broadcasts that Series and matches it along the axis specified by axis (default 0). This is why the following worked:

frame.drop(labels='a', axis=1).corrwith(frame.a)

b   -1.0
c    0.0
dtype: float64
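One way to picture the Series case is as one Series.corr call per column. The sketch below uses the question's frame; the names others, with_series, and by_hand are illustrative, and the loop is only a mental model rather than a claim about how pandas implements it:

```python
import pandas as pd

frame = pd.DataFrame({'a': [1, 2, 3], 'b': [-1, -2, -3], 'c': [10, -10, 10]})
others = frame.drop(columns='a')

# Passing a Series: every remaining column is correlated with 'a'.
with_series = others.corrwith(frame['a'])

# Hand-written equivalent: one Series.corr call per column.
by_hand = pd.Series({col: others[col].corr(frame['a']) for col in others.columns})

print(with_series)
print(by_hand)
```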

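When other is a DataFrame, corrwith aligns the two frames along the axis specified by axis and correlates each pair identified by the other axis. With the default axis=0, that means rows are aligned on the index and only columns with the same label are paired. If we did: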

frame.drop('a', axis=1).corrwith(frame.drop('b', axis=1))

a    NaN
b    NaN
c    1.0
dtype: float64

Only c was common to both frames, so only c had its correlation calculated.
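The label matching can be seen more directly by comparing the default call with drop=True, the parameter listed in the docstring quoted earlier. This is a small sketch on the question's frame; the names left, right, union_result, and matched_only are illustrative:

```python
import pandas as pd

frame = pd.DataFrame({'a': [1, 2, 3], 'b': [-1, -2, -3], 'c': [10, -10, 10]})
left = frame.drop(columns='a')    # columns: b, c
right = frame.drop(columns='b')   # columns: a, c

# Default drop=False: the result covers the union of labels, NaN where unmatched.
union_result = left.corrwith(right)

# drop=True keeps only the labels present in both frames.
matched_only = left.corrwith(right, drop=True)

print(union_result)
print(matched_only)
```

The order of the labels in the default result may differ between pandas versions, so it is safer to look values up by label than by position.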

In the case you specified:

frame.drop(labels='a', axis=1).corrwith(frame[['a']])

frame[['a']] is a DataFrame because of the double brackets [['a']], so it now plays by the DataFrame rules, under which its columns must match the columns of the frame it is being correlated with. But you explicitly dropped a from the first frame and then correlated it with a DataFrame that contains nothing but a. No column labels match, so the result is NaN for every column.
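If you end up holding a one-column DataFrame anyway, one possible workaround (a suggestion, not part of the original answer) is to squeeze it down to a Series before calling corrwith. The names single_col, all_nan, and fixed are illustrative:

```python
import pandas as pd

frame = pd.DataFrame({'a': [1, 2, 3], 'b': [-1, -2, -3], 'c': [10, -10, 10]})
others = frame.drop(columns='a')
single_col = frame[['a']]          # one-column DataFrame, not a Series

# No column labels in common, so nothing can be paired up.
all_nan = others.corrwith(single_col)

# Squeezing the single column down to a Series restores the broadcast behaviour.
fixed = others.corrwith(single_col.squeeze(axis=1))

print(all_nan)
print(fixed)
```

Selecting the column with single brackets, frame['a'], avoids the issue in the first place.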
