How do I preserve datatype when using apply row-wise in pandas dataframe?


Question

I'm running into a weird problem where using the apply function row-wise on a dataframe doesn't preserve the datatypes of the values in the dataframe. Is there a way to apply a function row-wise on a dataframe that preserves the original datatypes?

The code below demonstrates this problem. Without the int(...) conversion within the format function below, there would be an error because the int from the dataframe was converted to a float when passed into func.

import pandas as pd

df = pd.DataFrame({'int_col': [1, 2], 'float_col': [1.23, 4.56]})
print(df)
print(df.dtypes)

def func(int_and_float):
    int_val, float_val = int_and_float
    print('int_val type:', type(int_val))
    print('float_val type:', type(float_val))
    return 'int-{:03d}_float-{:5.3f}'.format(int(int_val), float_val)

df['string_col'] = df[['int_col', 'float_col']].apply(func, axis=1)
print(df)

Here is the output from running the above code:

   float_col  int_col
0       1.23        1
1       4.56        2
float_col    float64
int_col        int64
dtype: object
int_val type: <class 'numpy.float64'>
float_val type: <class 'numpy.float64'>
int_val type: <class 'numpy.float64'>
float_val type: <class 'numpy.float64'>
   float_col  int_col           string_col
0       1.23        1  int-001_float-1.230
1       4.56        2  int-002_float-4.560

请注意,即使 dfint_col 列具有 dtype int64,当来自该列的值传递给函数 func 时,他们突然有dtype numpy.float64,我必须在函数的最后一行使用int(...)来转换回来,否则该行会出错.

Notice that even though the int_col column of df has dtype int64, when values from that column get passed into function func, they suddenly have dtype numpy.float64, and I have to use int(...) in the last line of the function to convert back, otherwise that line would give an error.
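
The same conversion shows up without apply at all. Here is a minimal sketch (assuming a recent pandas and NumPy) that pulls a single row out with df.iloc: the row comes back as one Series, and a Series holds a single dtype, so the int64 value is converted along with the float64 one.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'int_col': [1, 2], 'float_col': [1.23, 4.56]})

# Extracting one row yields a Series, which can only have one dtype.
row = df.iloc[0]
print(row.dtype)               # float64
print(type(row['int_col']))    # <class 'numpy.float64'>

# The column itself is untouched; only the extracted row is upcast.
print(df['int_col'].dtype)     # int64
```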

I can deal with this problem the way I have here if necessary, but I'd really like to understand why I'm seeing this unexpected behavior.

Recommended answer

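Your ints are getting upcast to floats. With axis=1, apply passes each row to func as a Series, and pandas (like NumPy for an ndarray) will try to give a Series a single data type if possible, so the int64 and float64 values in a row are converted to their common type, float64. As far as I know, the exact upcasting rules pandas follows are not documented in one place, but you can see how NumPy promotes different types by using numpy.find_common_type (deprecated since NumPy 1.25; numpy.result_type and numpy.promote_types serve the same purpose).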
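
A short sketch of NumPy's promotion using numpy.result_type and numpy.promote_types, the non-deprecated counterparts of numpy.find_common_type. It only illustrates NumPy's rules; the assumption is that pandas agrees with them for plain numeric columns like the ones in this question.

```python
import numpy as np

# int64 and float64 promote to float64: this is why the ints become floats.
print(np.result_type(np.int64, np.float64))   # float64

# bool and int64 promote to int64.
print(np.promote_types(np.bool_, np.int64))   # int64

# Anything combined with object stays object, so no values are converted.
print(np.promote_types(np.int64, object))     # object
```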

You can trick pandas and NumPy into keeping the original data types by casting the DataFrame to dtype object before calling apply, like this:

df['string_col'] = df[['int_col', 'float_col']].astype('O').apply(func, axis=1)
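
A self-contained version of this workaround, as a sketch: func is the question's function without the int(...) conversion, and seen_types is a hypothetical list added only to record which types func actually receives.

```python
import numbers
import pandas as pd

df = pd.DataFrame({'int_col': [1, 2], 'float_col': [1.23, 4.56]})
seen_types = []  # hypothetical helper: records the types passed into func

def func(int_and_float):
    int_val, float_val = int_and_float
    seen_types.append((type(int_val), type(float_val)))
    # No int(...) needed here: int_val arrives as an integer, not a float.
    return 'int-{:03d}_float-{:5.3f}'.format(int_val, float_val)

# Cast to object first so each row Series keeps the original scalar types.
df['string_col'] = df[['int_col', 'float_col']].astype('O').apply(func, axis=1)
print(df['string_col'].tolist())  # ['int-001_float-1.230', 'int-002_float-4.560']
print(all(issubclass(i, numbers.Integral) for i, _ in seen_types))  # True
```

The cost of this trick is that object columns give up NumPy's vectorized storage, so it is best kept to the row-wise step rather than applied to the whole DataFrame permanently.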

Let's break down what is happening here. First, what happens to df after we do .astype('O')?

as_object = df[['int_col', 'float_col']].astype('O')
print(as_object.dtypes)

This gives:

int_col      object
float_col    object
dtype: object

Okay, so now both columns have the same dtype: object. We know from before that apply() (or anything else that extracts one row from a DataFrame) converts the row's values to a single common dtype, but here it sees that the columns already share one, so there is nothing to convert.
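
Extracting a single row from as_object shows this directly. A minimal sketch, assuming a recent pandas: the row is an object Series, and its elements keep their integer and float types.

```python
import numbers
import pandas as pd

df = pd.DataFrame({'int_col': [1, 2], 'float_col': [1.23, 4.56]})
as_object = df[['int_col', 'float_col']].astype('O')

# The row is an object Series, so no common numeric dtype is forced on it.
row = as_object.iloc[0]
print(row.dtype)                                   # object
print(isinstance(row['int_col'], numbers.Integral))  # True
print(isinstance(row['float_col'], float))           # True
```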

However, we still get the original ints and floats back, because dtype('O') behaves as a kind of container type: each element is a reference to an arbitrary Python object. It is typically used when a Series contains types that have no common native dtype (like strings and ints) or Python objects that NumPy doesn't understand.
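
For example, a Series built from a mix of Python values gets dtype object, and each element keeps its own type:

```python
import pandas as pd

mixed = pd.Series([1, 'a', 2.5])
print(mixed.dtype)                        # object
print([type(v).__name__ for v in mixed])  # ['int', 'str', 'float']
```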
