Insert rows into a pandas DataFrame while maintaining column data types

Question

What's the best way to insert new rows into an existing pandas DataFrame while maintaining column data types and, at the same time, giving user-defined fill values for columns that aren't specified? Here's an example:

import pandas as pd

df = pd.DataFrame({
    'name': ['Bob', 'Sue', 'Tom'],
    'age': [45, 40, 10],
    'weight': [143.2, 130.2, 34.9],
    'has_children': [True, True, False]
})

Assume that I want to add a new record passing just name and age. To maintain data types, I can copy a row from df, modify its values and then append df to the copy, for example:

columns = ['name', 'age']
copy_df = df.loc[0:0, columns].copy()
copy_df.loc[0, columns] = 'Cindy', 42
# pd.concat replaces DataFrame.append, which was removed in pandas 2.0
new_df = pd.concat([copy_df, df], sort=False).reset_index(drop=True)

But that converts the bool column to object dtype.
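A minimal reproduction of the upcasting, written with pd.concat because DataFrame.append no longer exists in pandas 2.x (the frame named partial is only for illustration). After running it, has_children should show up as object and its first entry as NaN, while age stays int64 because the new row does set it.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Bob', 'Sue', 'Tom'],
    'age': [45, 40, 10],
    'weight': [143.2, 130.2, 34.9],
    'has_children': [True, True, False],
})

# A one-row frame that only sets some of the columns
partial = pd.DataFrame({'name': ['Cindy'], 'age': [42]})

# Columns absent from `partial` are filled with NaN when the frames are combined,
# so the bool column can no longer stay bool
combined = pd.concat([partial, df], ignore_index=True)

print(combined.dtypes)
print(combined['has_children'].tolist())
```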

Here's a really hacky solution that doesn't feel like the "right way" to do this:

columns = ['name', 'age']
copy_df = df.loc[0:0].copy()

# Fill value for each dtype, used for the columns the new record does not set
missing_remap = {
    'int64': 0,
    'float64': 0.0,
    'bool': False,
    'object': ''
}
for c in set(copy_df.columns).difference(columns):
    copy_df.loc[:, c] = missing_remap[str(copy_df[c].dtype)]

# pd.concat replaces DataFrame.append, which was removed in pandas 2.0
new_df = pd.concat([copy_df, df], sort=False).reset_index(drop=True)
new_df.loc[0, columns] = 'Cindy', 42

I know I must be missing something.

Recommended answer

As you found, appending a row that lacks some columns fills those columns with NaN. Since NaN is a float, this can upcast an integer column to float or turn a bool column into object. You are right that this is not a desirable outcome.

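There is no straightforward built-in approach. My suggestion is to store your input row data in a dictionary and merge it with a dictionary of defaults before appending. This works because pd.DataFrame.append accepts a dict argument. Note that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0; on current versions, wrap the merged dict in a one-row DataFrame and pass it to pd.concat instead.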
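If writing the defaults dictionary by hand is tedious, one option, assuming a single neutral fill value per dtype is acceptable, is to derive it from df.dtypes. FILL_BY_KIND and defaults_from_dtypes are hypothetical names for illustration; dtype.kind is the standard NumPy one-letter dtype code. This is a sketch, not a replacement for user-chosen defaults.

```python
import pandas as pd

# Hypothetical mapping from NumPy dtype "kind" codes to a neutral fill value:
# i = signed int, u = unsigned int, f = float, b = bool, O = object
FILL_BY_KIND = {'i': 0, 'u': 0, 'f': 0.0, 'b': False, 'O': ''}


def defaults_from_dtypes(df):
    """Hypothetical helper: build {column: fill value} from each column's dtype.

    Columns whose kind is not in FILL_BY_KIND (e.g. datetimes) map to None.
    """
    return {col: FILL_BY_KIND.get(dtype.kind) for col, dtype in df.dtypes.items()}


df = pd.DataFrame({
    'name': ['Bob', 'Sue', 'Tom'],
    'age': [45, 40, 10],
    'weight': [143.2, 130.2, 34.9],
    'has_children': [True, True, False],
})

default = defaults_from_dtypes(df)
print(default)
```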

In Python 3.5 and later, you can use the syntax {**d1, **d2} to combine two dictionaries; when a key appears in both, the value from the second one wins.
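A small illustration of the merge semantics: values come from row where it has them and from default otherwise, while key order follows the first dictionary in which each key appears. The | operator shown on the last line is the Python 3.9+ equivalent (PEP 584).

```python
default = {'name': '', 'age': 0, 'weight': 0.0, 'has_children': False}
row = {'name': 'Cindy', 'age': 42}

# Later mappings win: 'name' and 'age' come from row, the rest from default
merged = {**default, **row}
print(merged)  # {'name': 'Cindy', 'age': 42, 'weight': 0.0, 'has_children': False}

# Python 3.9+: dict union gives the same result
merged_pipe = default | row
```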

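# Fill values for every column
default = {'name': '', 'age': 0, 'weight': 0.0, 'has_children': False}

# Only the columns the new record actually sets
row = {'name': 'Cindy', 'age': 42}

# Keys in row override the defaults; DataFrame.append requires pandas < 2.0
df = df.append({**default, **row}, ignore_index=True)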

print(df)

   age  has_children   name  weight
0   45          True    Bob   143.2
1   40          True    Sue   130.2
2   10         False    Tom    34.9
3   42         False  Cindy     0.0

print(df.dtypes)

age               int64
has_children       bool
name             object
weight          float64
dtype: object
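On pandas 2.x, where DataFrame.append is gone, the same idea can be sketched with pd.concat. insert_row and its at_top flag are hypothetical names, not pandas API; at_top=True puts the new record first, matching the layout in the question. The astype call is an extra safeguard (an assumption of this sketch) that casts the one-row frame to the original dtypes before concatenating.

```python
import pandas as pd


def insert_row(df, row, defaults, at_top=False):
    """Hypothetical helper: add one row, filling unspecified columns from `defaults`.

    Keys in `row` override keys in `defaults`; a new DataFrame is returned.
    """
    full_row = {**defaults, **row}
    # One-row frame with the same column order, cast to the original dtypes
    new = pd.DataFrame([full_row], columns=df.columns).astype(df.dtypes.to_dict())
    parts = [new, df] if at_top else [df, new]
    return pd.concat(parts, ignore_index=True)


df = pd.DataFrame({
    'name': ['Bob', 'Sue', 'Tom'],
    'age': [45, 40, 10],
    'weight': [143.2, 130.2, 34.9],
    'has_children': [True, True, False],
})
default = {'name': '', 'age': 0, 'weight': 0.0, 'has_children': False}

appended = insert_row(df, {'name': 'Cindy', 'age': 42}, default)
prepended = insert_row(df, {'name': 'Cindy', 'age': 42}, default, at_top=True)

print(appended)
print(appended.dtypes)
```

The defaults dictionary should cover every column: a column with no value in either dict ends up as NaN, which the astype cast may reject (for integer columns) or coerce into a misleading value.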
