Unwanted type conversion in pandas.DataFrame.update
Problem description
Is there any reason why pandas changes the dtype of a column from int to float in update, and can I prevent it? Here is some example code that reproduces the problem:
import pandas as pd
import numpy as np

df = pd.DataFrame({'int': [1, 2], 'float': [np.nan, np.nan]})
print('Integer column:')
print(df['int'])

for _, df_sub in df.groupby('int'):
    # Fill the float column of each group from its int column
    df_sub['float'] = df_sub['int'].astype(float)
    df.update(df_sub)

print('NO integer column:')
print(df['int'])
Recommended answer
Here's the reason: update effectively masks certain values in a column and replaces them with your updates, so some values could become NaN along the way.
An integer array cannot hold NaN, so numeric dtypes are converted to float a priori (for efficiency), because checking first would be more expensive than simply converting.
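The constraint can be seen without update at all. The sketch below is only an illustration of the general rule, not a trace of what update does internally: reindexing an integer Series onto a label it does not have introduces a NaN, and the dtype is upcast to float to hold it. The variable names are made up for the example.

```python
import pandas as pd

# An integer Series with labels 0 and 1
s = pd.Series([1, 2], index=[0, 1])

# Aligning to an index with an extra label introduces a missing value,
# which a plain integer array cannot represent
aligned = s.reindex([0, 1, 2])

print(s.dtype)        # integer dtype
print(aligned.dtype)  # float64, because of the NaN at label 2
```

In the question's loop each df_sub covers only one row of df, so aligning it to the full index leaves missing entries in the same way.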
Casting the dtype back afterwards is possible; it just isn't in the code right now, so this is a bug (though somewhat non-trivial to fix): github.com/pydata/pandas/issues/4094
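Until that is handled inside pandas, one possible workaround is to remember the dtypes before updating and cast back afterwards. This is a minimal sketch, not the fix tracked in the issue; it assumes the updated integer column ends up with no missing values (otherwise the cast back to int would fail), and whether update upcasts at all may depend on the pandas version. The name original_dtypes is introduced only for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'int': [1, 2], 'float': [np.nan, np.nan]})

# Remember the dtypes before update has a chance to change them
original_dtypes = df.dtypes.copy()

for _, df_sub in df.groupby('int'):
    df_sub = df_sub.copy()  # work on an explicit copy of the group
    df_sub['float'] = df_sub['int'].astype(float)
    df.update(df_sub)

# Cast every column back to its original dtype
# (assumption: no NaN remains in the integer column)
df = df.astype(original_dtypes.to_dict())

print(df.dtypes)
```

Casting back is a no-op for columns whose dtype was preserved, so the same code should be safe whether or not the installed version performs the upcast.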