Comparing two DataFrames, specific questions
I was reading Andy's answer to the question "Outputting difference in two Pandas dataframes side by side - highlighting the difference".
I have two questions regarding the code. Unfortunately I don't yet have 50 rep to comment on the answer, so I hope I can get some help here.
First, what does
In [24]: changed = ne_stacked[ne_stacked]
do? I'm not sure what df1 = df[df] does, and I can't seem to find an answer in the pandas docs. Could someone explain this to me, please?
Second, is
np.where(df1 != df2)
the same as pd.df.where(df1 != df2)? If not, what is the difference?
Question 1
ne_stacked is a pd.Series that consists of True and False values that indicate where df1 and df2 are not equal.
ne_stacked[boolean_array] is a way to filter the series ne_stacked by eliminating the rows of ne_stacked where boolean_array is False and keeping the rows of ne_stacked where boolean_array is True.
It so happens that ne_stacked is also a boolean array, so it can be used to filter itself. Why would we want to do this? So we can see what the values of the index are after we've filtered.
So ne_stacked[ne_stacked] is a subset of ne_stacked with only True values.
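As a minimal sketch of the self-filtering described above: df1 and df2 here are small made-up frames (the original answer does not show its data), and ne_stacked is built as the stacked element-wise inequality.

```python
import pandas as pd

# Hypothetical example frames; the values are assumptions for illustration.
df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df2 = pd.DataFrame({"a": [1, 9, 3], "b": [4, 5, 0]})

# Element-wise inequality, then move the columns into the index,
# giving a boolean Series indexed by (row label, column name).
ne_stacked = (df1 != df2).stack()

# Use the boolean Series as a mask on itself: only the True rows remain.
changed = ne_stacked[ne_stacked]

# The surviving index lists the (row label, column name) pairs that differ.
changed_locations = list(changed.index)
```

The values of changed are all True, so the interesting part is its index, which tells you where the two frames disagree.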
Question 2
np.where
np.where does two things. If you only pass a condition, as in np.where(df1 != df2), you get a tuple of arrays: the first holds the row positions and the second holds the matching column positions of the cells where the condition is True. I usually use it like this:
i, j = np.where(df1 != df2)
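Now I can get at all elements of df1 or df2 where there are differences, like
df1.values[i, j]
Or I can assign to those cells
df1.values[i, j] = -99
Note that this assignment only reaches df1 when .values is a view of the underlying data (typically a single-dtype frame without Copy-on-Write); otherwise it modifies a copy or fails because the array is read-only.
Or lots of other useful things.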
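The single-argument form can be sketched as follows, again on made-up frames. For the assignment step this sketch swaps in DataFrame.mask on a new object instead of writing into .values. That is a substitution for safety, not the approach used in the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical example frames; the values are assumptions for illustration.
df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df2 = pd.DataFrame({"a": [1, 9, 3], "b": [4, 5, 0]})

# Positional row and column indices of every differing cell.
i, j = np.where(df1 != df2)

# Read the differing values from each frame with paired integer indexing.
old_values = df1.to_numpy()[i, j]
new_values = df2.to_numpy()[i, j]

# Replace the differing cells with -99 in a new DataFrame,
# leaving df1 itself untouched.
patched = df1.mask(df1 != df2, -99)
```

Because i and j are positions rather than labels, they line up with the NumPy array and not necessarily with a non-default index.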
You can also use np.where as an if, then, else for arrays
np.where(df1 != df2, -99, 99)
to produce an array the same size as df1 or df2, with -99 in all the places where df1 != df2 and 99 in the rest.
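A small sketch of the three-argument form, using the same kind of made-up frames. Note that the result is a plain NumPy array, not a DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical example frames; the values are assumptions for illustration.
df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df2 = pd.DataFrame({"a": [1, 9, 3], "b": [4, 5, 0]})

# -99 where the frames differ, 99 everywhere else.
flags = np.where(df1 != df2, -99, 99)
```

If you need the labels back, you can wrap the result yourself, for example with pd.DataFrame(flags, index=df1.index, columns=df1.columns).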
df.where
On the other hand, df.where takes a boolean condition as its first argument and returns an object of the same size as df, where the cells for which the condition is True are kept and the rest are replaced with either np.nan or the value passed as the second argument of df.where:
df1.where(df1 != df2)
Or
df1.where(df1 != df2, -99)
Are they the same?
Clearly they are not the "same", but you can use them similarly:
np.where(df1 != df2, df1, -99)
should give the same result as
df1.where(df1 != df2, -99).values
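To check the comparison above on a concrete case, here is a minimal sketch on made-up numeric frames. It only covers this simple all-integer example and says nothing about mixed dtypes.

```python
import numpy as np
import pandas as pd

# Hypothetical example frames; the values are assumptions for illustration.
df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df2 = pd.DataFrame({"a": [1, 9, 3], "b": [4, 5, 0]})

# Keep cells that differ; everything else becomes NaN.
kept = df1.where(df1 != df2)

# Keep cells that differ; everything else becomes -99.
filled = df1.where(df1 != df2, -99)

# The NumPy version of the same selection returns a bare ndarray.
via_numpy = np.where(df1 != df2, df1, -99)

# Compare the underlying values of the two results.
same = np.array_equal(via_numpy, filled.to_numpy())
```

The practical difference is the return type: df.where keeps the index and column labels, while np.where drops them.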