Preserve DataFrame column data types after an outer merge


Question

When you merge two indexed DataFrames on certain values using an 'outer' merge, pandas automatically fills the fields it could not match with null (NaN) values. This is expected behaviour, but it changes the column data types, and you then have to re-declare the data type each column should have.
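A minimal sketch of the behaviour described above, using two small hypothetical frames (`left`, `right` and their columns are made up for illustration). Key 1 exists only in `left` and key 4 only in `right`, so the outer merge has to fill the gaps with NaN:

```python
import pandas as pd

# Hypothetical example frames; 'key' is the merge column.
left = pd.DataFrame({'key': [1, 2, 3], 'count': [10, 20, 30], 'flag': [True, False, True]})
right = pd.DataFrame({'key': [2, 3, 4], 'score': [5, 6, 7]})

print(left.dtypes)  # count is int64, flag is bool

merged = pd.merge(left, right, on='key', how='outer')
print(merged)
# NaN has been inserted where a key had no match, so the integer
# columns are upcast to float64 and the Boolean column to object.
print(merged.dtypes)
```

The integer columns `count` and `score` come back as float64 and `flag` as object, which is the dtype change the question is about.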

fillna() or dropna() do not seem to restore the original data types right after the merge. Do I need to have a table structure (schema) in place beforehand?

Typically I would run NumPy's np.where(field.isnull(), ...) on a column, but that means repeating it for every column.
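One way to avoid a per-column np.where (a swapped-in technique, not something from the original question) is to record the dtypes before merging and restore them with a single fillna() call and a single astype() call, both of which accept a column-to-value dict. The frames and the fill values below are hypothetical placeholders; choose sentinels that make sense for your data:

```python
import pandas as pd

left = pd.DataFrame({'key': [1, 2, 3], 'count': [10, 20, 30], 'flag': [True, False, True]})
right = pd.DataFrame({'key': [2, 3, 4], 'score': [5, 6, 7]})

# Record the dtype each column had before the merge.
original_dtypes = {**left.dtypes.to_dict(), **right.dtypes.to_dict()}

merged = pd.merge(left, right, on='key', how='outer')

# Hypothetical fill values: one sentinel per column that can contain NaN.
fill_values = {'count': 0, 'flag': False, 'score': 0}

# One fillna + one astype for all columns, instead of np.where per column.
restored = merged.fillna(fill_values).astype(original_dtypes)
print(restored.dtypes)
```

This only works if every NaN is filled first, because plain int64 and bool columns cannot hold a missing value. Depending on the pandas version, fillna() on the object column may also emit a FutureWarning about downcasting; the final astype() still produces the recorded dtypes.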

Is there a workaround?

Recommended answer

This should really only be an issue with bool or int dtypes. float, object and datetime64[ns] columns can already hold NaN or NaT, so they keep their dtype when the merge introduces missing values.
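To check that float, object (string) and datetime columns are unaffected, here is a small sketch with hypothetical columns `price`, `name` and `when`; only the integer column `qty` should change dtype after the outer merge:

```python
import pandas as pd

left = pd.DataFrame({
    'key': [1, 2],
    'price': [1.5, 2.5],                                   # floats
    'name': ['a', 'b'],                                    # strings
    'when': pd.to_datetime(['2020-01-01', '2020-01-02']),  # timestamps
})
right = pd.DataFrame({'key': [2, 3], 'qty': [7, 8]})       # integers

merged = pd.merge(left, right, on='key', how='outer')
# Key 3 has no row in `left`: price/name get NaN and when gets NaT,
# but those columns keep their dtype. Key 1 has no row in `right`,
# so the integer column qty is upcast to float64.
print(merged.dtypes)
```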

Because of this, I'd recommend using the newer nullable dtypes: Int64 for your integer columns and 'boolean' for your Boolean columns. Both support missing values, which are displayed as <NA> (the pd.NA singleton, of type pandas._libs.missing.NAType).
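If the frames already exist with NumPy dtypes, they can be converted before the merge. A sketch with a hypothetical frame, showing an explicit astype() mapping and, as an alternative, DataFrame.convert_dtypes() (available since pandas 1.0), which picks a nullable dtype for every column it can:

```python
import pandas as pd

left = pd.DataFrame({'key': [1, 2, 3], 'count': [10, 20, 30], 'flag': [True, False, True]})

# Option 1: cast only the columns you care about.
left_nullable = left.astype({'count': 'Int64', 'flag': 'boolean'})

# Option 2: let pandas choose nullable dtypes for all columns.
left_auto = left.convert_dtypes()

print(left_nullable.dtypes)
print(left_auto.dtypes)
```

convert_dtypes() is convenient but less targeted: it may also convert string columns to the string dtype and integral-valued float columns to Int64, so the explicit mapping is the more predictable choice.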

import pandas as pd

df = pd.DataFrame({'a': [1]*6, 'b': [1, 2]*3, 'c': range(6)})
df2 = pd.DataFrame({'d': [1, 2], 'e': [True, False]})

# Cast to the nullable extension dtypes before joining
df2['d'] = df2['d'].astype('Int64')
df2['e'] = df2['e'].astype('boolean')
df2.dtypes
#d      Int64
#e    boolean
#dtype: object

# join() aligns on the index (left join by default); rows 2-5 of df have no match in df2
df.join(df2)
#   a  b  c     d      e
#0  1  1  0     1   True
#1  1  2  1     2  False
#2  1  1  2  <NA>   <NA>
#3  1  2  3  <NA>   <NA>
#4  1  1  4  <NA>   <NA>
#5  1  2  5  <NA>   <NA>

df.join(df2).dtypes
#a      int64
#b      int64
#c      int64
#d      Int64    <- dtype preserved
#e    boolean    <- dtype preserved
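The code above uses join(), which aligns on the index and defaults to a left join. For the situation in the question, an outer merge on a column, a comparable sketch with hypothetical frames follows; the nullable dtypes should survive even though key 1 has no counterpart in `right` and key 4 none in `left`:

```python
import pandas as pd

left = pd.DataFrame({'key': [1, 2, 3], 'count': [10, 20, 30], 'flag': [True, False, True]})
right = pd.DataFrame({'key': [2, 3, 4], 'score': [5, 6, 7]})

# Cast to nullable dtypes *before* merging.
left = left.astype({'count': 'Int64', 'flag': 'boolean'})
right = right.astype({'score': 'Int64'})

merged = pd.merge(left, right, on='key', how='outer')
print(merged)         # unmatched cells show <NA>
print(merged.dtypes)  # count/score stay Int64, flag stays boolean
```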


With Int64 and 'boolean', a fill value stays exactly what you specify, and the column is only upcast if you fill with a value that cannot be represented in the current dtype.
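For example, filling the missing values of nullable columns with values that fit their dtype keeps that dtype (a small sketch with made-up data):

```python
import pandas as pd

s_int = pd.Series([1, None, 3], dtype='Int64')
s_bool = pd.Series([True, None, False], dtype='boolean')

# Both fill values fit the existing dtype, so no upcast happens.
filled_int = s_int.fillna(0)
filled_bool = s_bool.fillna(False)
print(filled_int.dtype, filled_bool.dtype)
```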
