pandas DataFrame combine_first and update methods have strange behavior
Problem description
I'm running into a strange issue (or is it intended?) where combine_first or update causes values stored as bool to be upcast to float64 if the argument supplied does not include the boolean columns.

Example workflow in IPython:
    In [144]: test = pd.DataFrame([[1,2,False,True],[4,5,True,False]], columns=['a','b','isBool', 'isBool2'])

    In [145]: test
    Out[145]:
       a  b  isBool  isBool2
    0  1  2   False     True
    1  4  5    True    False

    In [147]: b = pd.DataFrame([[45,45]], index=[0], columns=['a','b'])

    In [148]: b
    Out[148]:
        a   b
    0  45  45

    In [149]: test.update(b)

    In [150]: test
    Out[150]:
        a   b  isBool  isBool2
    0  45  45       0        1
    1   4   5       1        0
Was this meant to be the behavior of the update function? I would think that update wouldn't touch the other columns if nothing was specified for them.
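One way to check whether a particular pandas installation still shows this is to compare the dtypes of the columns that are absent from b before and after the call. The following is only a minimal sketch built from the frames in the session above; the names dtypes_before, untouched and changed are introduced here for illustration and are not part of pandas.

```python
import pandas as pd

# Same frames as in the IPython session above.
test = pd.DataFrame(
    [[1, 2, False, True], [4, 5, True, False]],
    columns=["a", "b", "isBool", "isBool2"],
)
b = pd.DataFrame([[45, 45]], index=[0], columns=["a", "b"])

dtypes_before = test.dtypes.copy()
test.update(b)  # modifies test in place and returns None

# Columns that b does not contain, and the subset whose dtype still changed.
untouched = [c for c in test.columns if c not in b.columns]
changed = [c for c in untouched if test[c].dtype != dtypes_before[c]]
print("pandas", pd.__version__, "columns with changed dtype:", changed)
```

An empty list means the boolean columns kept their dtype; on a version with the behavior described here, isBool and isBool2 would be listed.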
EDIT: I started tinkering around a little more, and the plot thickens. If I insert one more command, test.update([]), before running test.update(b), the booleans are preserved, at the cost of the numbers being upcast to object. This also applies to DSM's simplified example.

Based on the pandas source code, it looks like the reindex_like method creates a DataFrame of dtype object in that case, while reindex_like with b creates a DataFrame of dtype float64. Since object is more general, subsequent operations work with bools. Unfortunately, running np.log on the numerical columns will then fail with an AttributeError.

Solution
This is a bug: update shouldn't touch unspecified columns. It is fixed here: https://github.com/pydata/pandas/pull/3021
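For an installation that predates the fix, one possible stopgap is to remember the dtypes of the columns the argument does not contain and cast them back after the update. This is only a sketch under that assumption: update_keep_dtypes is a hypothetical helper written for this example, not a pandas function, and it is not what the linked pull request does.

```python
import pandas as pd


def update_keep_dtypes(df, other):
    """Hypothetical helper: run df.update(other) in place, then restore the
    dtypes of the columns that `other` does not contain."""
    untouched = [c for c in df.columns if c not in other.columns]
    saved = df.dtypes.loc[untouched]
    df.update(other)  # in place, returns None
    for col in untouched:
        # Cast back only where the dtype actually changed.
        if df[col].dtype != saved.loc[col]:
            df[col] = df[col].astype(saved.loc[col])


test = pd.DataFrame(
    [[1, 2, False, True], [4, 5, True, False]],
    columns=["a", "b", "isBool", "isBool2"],
)
b = pd.DataFrame([[45, 45]], index=[0], columns=["a", "b"])

update_keep_dtypes(test, b)
print(test.dtypes)
```

Casting back is only safe when it loses nothing, as with 0.0 and 1.0 going back to bool. On a version that includes the fix, the loop finds nothing to restore and the helper behaves like a plain update call.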