Comparing two DataFrames, specific questions


Problem description


I was reading Andy's answer to the question "Outputting difference in two Pandas dataframes side by side - highlighting the difference".

I have two questions about the code. Unfortunately I don't yet have 50 rep to comment on the answer, so I hope I can get some help here.

  1. What does In [24]: changed = ne_stacked[ne_stacked] do? I'm not sure what df1 = df[df] does, and I can't seem to find an answer in the pandas docs. Could someone explain this to me, please?

  2. Is np.where(df1 != df2) the same as pd.df.where(df1 != df2)? If not, what is the difference?

Solution

Question 1

ne_stacked is a pd.Series that consists of True and False values that indicate where df1 and df2 are not equal.

ne_stacked[boolean_array] is a way to filter the series ne_stacked by eliminating the rows of ne_stacked where boolean_array is False and keeping the rows of ne_stacked where boolean_array is True.

It so happens that ne_stacked is itself a boolean array, so it can be used to filter itself. Why would we want to do this? So that we can see which index values remain after filtering.

So ne_stacked[ne_stacked] is a subset of ne_stacked with only True values.
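The self-filtering step can be sketched as follows. The two small frames are hypothetical sample data (the question does not show its own), and ne_stacked is assumed to be built by stacking the element-wise comparison, as in the answer being discussed.

```python
import pandas as pd

# Hypothetical sample frames; they differ at (row 1, column "a") and (row 2, column "b").
df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df2 = pd.DataFrame({"a": [1, 9, 3], "b": [4, 5, 0]})

# Element-wise comparison, stacked into a boolean Series indexed by (row, column).
ne_stacked = (df1 != df2).stack()

# Indexing the boolean Series with itself keeps only the entries that are True.
changed = ne_stacked[ne_stacked]

# The index that survives lists the (row, column) labels where the frames differ.
changed_positions = list(changed.index)
print(changed_positions)
```

The values of changed are all True, so the useful part is its index.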

Question 2

np.where

np.where does two things. If you pass only a condition, as in np.where(df1 != df2), you get a tuple of arrays: the first holds the row indices and the second holds the matching column indices of every cell where the condition is True. I usually use it like this

i, j = np.where(df1 != df2)

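Now I can get at all the elements of df1 or df2 where they differ, for example

df1.values[i, j]

Or I can assign to those cells (this writes back to the DataFrame only if .values returns a view, which pandas does not guarantee, for example for mixed-dtype frames)

df1.values[i, j] = -99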

Or lots of other useful things.
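A minimal sketch of the one-argument form, using hypothetical sample frames. Because assigning through .values may not reach the DataFrame, this version makes the assignment on an explicit copy and rebuilds a frame from it; that is one possible approach, not the only one.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frames; they differ at positions (1, 0) and (2, 1).
df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df2 = pd.DataFrame({"a": [1, 9, 3], "b": [4, 5, 0]})

# One-argument np.where returns (row_positions, column_positions) of the True cells.
i, j = np.where((df1 != df2).to_numpy())

# Fancy indexing with both arrays pulls out the differing values from each frame.
diff_in_df1 = df1.to_numpy()[i, j]
diff_in_df2 = df2.to_numpy()[i, j]

# Assign on an explicit copy so the result does not depend on .values being a view.
arr = df1.to_numpy(copy=True)
arr[i, j] = -99
patched = pd.DataFrame(arr, index=df1.index, columns=df1.columns)
print(patched)
```

Note that i and j are integer positions, not index labels, so they line up with the NumPy array rather than with df1.loc.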

You can also use np.where as an if-then-else for arrays

np.where(df1 != df2, -99, 99)

This produces an array the same size as df1 or df2, with -99 wherever df1 != df2 and 99 everywhere else.
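As a small illustration of the three-argument form, again with hypothetical sample frames:

```python
import numpy as np
import pandas as pd

# Hypothetical sample frames; they differ at positions (1, 0) and (2, 1).
df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df2 = pd.DataFrame({"a": [1, 9, 3], "b": [4, 5, 0]})

# Three-argument np.where: pick -99 where the condition is True, 99 elsewhere.
flags = np.where((df1 != df2).to_numpy(), -99, 99)

# The result is a plain NumPy array with the same shape as the inputs,
# so the DataFrame index and column labels are not carried over.
print(flags.shape)
print(flags)
```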

df.where

df.where, on the other hand, takes a boolean condition as its first argument and returns an object the same size as df. Cells where the condition is True are kept; the rest become np.nan, or the value passed as the second argument of df.where

df1.where(df1 != df2)

Or

df1.where(df1 != df2, -99)
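Both calls can be tried side by side on hypothetical sample frames; the sketch below only shows the shape of the result and is not taken from the original answer.

```python
import pandas as pd

# Hypothetical sample frames; they differ at (row 1, column "a") and (row 2, column "b").
df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df2 = pd.DataFrame({"a": [1, 9, 3], "b": [4, 5, 0]})

# Without a second argument, cells where the condition is False become NaN.
masked = df1.where(df1 != df2)

# With a second argument, those cells are filled with that value instead.
filled = df1.where(df1 != df2, -99)

# Unlike np.where, the result is still a DataFrame with df1's index and columns.
print(masked)
print(filled)
```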

Are they the same?
Clearly they are not the "same", but you can use them similarly

np.where(df1 != df2, df1, -99)

should give the same values as

df1.where(df1 != df2, -99).values
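The claimed equivalence can be checked directly. This is a sketch on hypothetical sample frames, so it confirms the behaviour for this data only, not in general.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frames; they differ at positions (1, 0) and (2, 1).
df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df2 = pd.DataFrame({"a": [1, 9, 3], "b": [4, 5, 0]})

# NumPy route: keep df1's value where the frames differ, otherwise -99.
via_numpy = np.where((df1 != df2).to_numpy(), df1.to_numpy(), -99)

# pandas route: same selection, then drop down to the underlying array.
via_pandas = df1.where(df1 != df2, -99).values

# Both routes should hold the same values, cell for cell.
same_values = np.array_equal(via_numpy, via_pandas)
print(same_values)
```

The practical difference is the return type: np.where hands back a bare array, while df.where keeps the labels until .values is taken.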
