pandas DataFrame combine_first and update methods have strange behavior


Problem description

I'm running into a strange issue (or is it intended?) where combine_first or update causes values stored as bool to be upcast to float64 if the argument supplied does not include the boolean columns.

Example workflow in ipython:

In [144]: test = pd.DataFrame([[1,2,False,True],[4,5,True,False]], columns=['a','b','isBool', 'isBool2'])

In [145]: test
Out[145]:
   a  b isBool isBool2
0  1  2  False    True
1  4  5   True   False


In [147]: b = pd.DataFrame([[45,45]], index=[0], columns=['a','b'])

In [148]: b
Out[148]:
    a   b
0  45  45

In [149]: test.update(b)

In [150]: test
Out[150]:
    a   b  isBool  isBool2
0  45  45       0        1
1   4   5       1        0

Is this meant to be the behavior of the update function? I would expect update to leave the other columns alone when the argument doesn't specify them.
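To check for the problem without reading the printed frame by eye, the dtypes can be compared before and after the call. This is only a sketch built from the frames in the question; the names `dtypes_before`, `dtypes_after` and `bool_cols_changed` are mine, not part of pandas. On a pandas release that includes the fix, the list of changed boolean columns should come back empty.

```python
import pandas as pd

# Same frames as in the ipython session above.
test = pd.DataFrame(
    [[1, 2, False, True], [4, 5, True, False]],
    columns=["a", "b", "isBool", "isBool2"],
)
b = pd.DataFrame([[45, 45]], index=[0], columns=["a", "b"])

# Record the dtypes, run the in-place update, then record them again.
dtypes_before = test.dtypes.copy()
test.update(b)
dtypes_after = test.dtypes

# Side-by-side view of the dtype of every column.
print(pd.DataFrame({"before": dtypes_before, "after": dtypes_after}))

# Boolean columns that update() was not asked to touch but changed anyway.
bool_cols_changed = [
    col for col in ["isBool", "isBool2"]
    if dtypes_before[col] != dtypes_after[col]
]
print("bool columns changed:", bool_cols_changed)
```

If `bool_cols_changed` is not empty, the installed version still shows the behavior described here.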


EDIT: I started tinkering around a little more. The plot thickens. If I insert one more command, test.update([]), before running test.update(b), the booleans are preserved, but at the cost of the numbers being upcast to object. This also applies to DSM's simplified example.

Based on the pandas source code, it looks like reindex_like on the empty argument creates a DataFrame of dtype object, while reindex_like on b creates a DataFrame of dtype float64. Since object is more general, subsequent operations work with bools. Unfortunately, running np.log on the numerical columns then fails with an AttributeError.
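A rough way to see both effects in isolation, assuming that update aligns its argument to the caller roughly the way `reindex_like` does (that is my reading of the source, not something the docs promise). The variable names are illustrative. Newer NumPy versions raise TypeError rather than AttributeError for the object-dtype case, so the sketch catches both.

```python
import numpy as np
import pandas as pd

test = pd.DataFrame(
    [[1, 2, False, True], [4, 5, True, False]],
    columns=["a", "b", "isBool", "isBool2"],
)
b = pd.DataFrame([[45, 45]], index=[0], columns=["a", "b"])

# Align b to test's index and columns. Cells that b does not cover
# are filled with NaN, which forces the integer columns to float64.
aligned = b.reindex_like(test)
print(aligned)
print(aligned.dtypes)

# Numbers stored with dtype object break NumPy ufuncs such as np.log.
as_object = test["a"].astype(object)
try:
    np.log(as_object)
    log_failed = False
except (AttributeError, TypeError) as exc:
    log_failed = True
    print(type(exc).__name__, exc)
```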

Solution

This is a bug: update shouldn't touch unspecified columns. It was fixed here: https://github.com/pydata/pandas/pull/3021
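For anyone stuck on a version without the fix, one possible workaround is to restore the dtypes of the columns the argument did not mention. The helper below is hypothetical (`update_preserving_dtypes` is not a pandas function) and is only a sketch; it assumes the untouched columns can be cast back to their original dtype without loss, which holds for the bool columns in the question.

```python
import pandas as pd


def update_preserving_dtypes(target, other):
    """Hypothetical helper: update target in place, then cast any column
    that other did not mention back to the dtype it had before."""
    original_dtypes = target.dtypes.copy()
    target.update(other)
    for col in target.columns:
        if col in other.columns:
            continue  # columns named in other may legitimately change
        if target[col].dtype != original_dtypes[col]:
            target[col] = target[col].astype(original_dtypes[col])


test = pd.DataFrame(
    [[1, 2, False, True], [4, 5, True, False]],
    columns=["a", "b", "isBool", "isBool2"],
)
b = pd.DataFrame([[45, 45]], index=[0], columns=["a", "b"])

update_preserving_dtypes(test, b)
print(test)
print(test.dtypes)
```

On a fixed version the cast never triggers, so the helper is harmless to leave in place.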
