Convert pandas.Series from dtype object to float, and errors to NaNs

Published: 2021/12/3 8:42:58 · Tags: python, pandas, nan


Problem description

Consider the following:

In [2]: a = pd.Series([1,2,3,4,'.'])

In [3]: a
Out[3]: 
0    1
1    2
2    3
3    4
4    .
dtype: object

In [8]: a.astype('float64', raise_on_error = False)
Out[8]: 
0    1
1    2
2    3
3    4
4    .
dtype: object

I would have expected an option that allows conversion while turning erroneous values (such as that `.`) into NaNs. Is there a way to achieve this?

Recommended answer

Use `pd.to_numeric` with `errors='coerce'`

# Setup
import pandas as pd
s = pd.Series(['1', '2', '3', '4', '.'])
s

0    1
1    2
2    3
3    4
4    .
dtype: object

pd.to_numeric(s, errors='coerce')

0    1.0
1    2.0
2    3.0
3    4.0
4    NaN
dtype: float64
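One side effect of `errors='coerce'` is that parse failures and values that were already missing both end up as NaN. A minimal sketch for telling them apart, using a hypothetical input that also contains a `None`:

```python
import pandas as pd

# Hypothetical input: one genuinely missing value (None) and one bad value ('.').
s = pd.Series(['1', '2', None, '4', '.'])

# errors='coerce' maps anything unparseable to NaN; None also becomes NaN.
converted = pd.to_numeric(s, errors='coerce')

# A value failed to parse if it was present before conversion but is NaN after.
failed = converted.isna() & s.notna()
print(s[failed])
```

Only the `'.'` entry is flagged; the `None` is treated as an ordinary missing value.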

If you need the NaNs filled in, use `Series.fillna`.

pd.to_numeric(s, errors='coerce').fillna(0, downcast='infer')

0    1
1    2
2    3
3    4
4    0
dtype: int64

Note: `downcast='infer'` will attempt to downcast floats to integers where possible. Remove the argument if you don't want that. (The `downcast` keyword of `fillna` has been deprecated since pandas 2.1.)
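An alternative that does not depend on the `downcast` keyword is to fill first and then cast explicitly. This is a sketch under the assumption that every surviving value is a whole number, since `astype('int64')` truncates fractional values rather than rejecting them:

```python
import pandas as pd

s = pd.Series(['1', '2', '3', '4', '.'])

# Coerce bad values to NaN, then replace NaN with 0 (result is still float64).
filled = pd.to_numeric(s, errors='coerce').fillna(0)

# Cast explicitly instead of passing downcast='infer' to fillna.
# Assumption: all values are whole numbers; astype('int64') would
# silently truncate something like 1.5 to 1.
as_int = filled.astype('int64')
print(as_int.dtype)
```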

Since v0.24, pandas provides a nullable integer type, which allows integers to coexist with missing values. If your column contains integers, you can use:

pd.__version__
# '0.24.1'

pd.to_numeric(s, errors='coerce').astype('Int32')

0      1
1      2
2      3
3      4
4    NaN
dtype: Int32
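On pandas 1.0 and later, the missing entry in a nullable integer Series is the `pd.NA` sentinel and prints as `<NA>` rather than `NaN`. A short sketch using the 64-bit variant `Int64` (the other nullable integer dtypes should behave the same way here):

```python
import pandas as pd

s = pd.Series(['1', '2', '3', '4', '.'])

# Coerce to float64 first, then move to the nullable integer dtype.
result = pd.to_numeric(s, errors='coerce').astype('Int64')

print(result)
print(result.isna().tolist())
```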

There are other options to choose from as well; see the docs for more.


Extension for `DataFrames`

If you need to extend this to DataFrames, you will need to apply it to each column. You can do this using `DataFrame.apply`.
# Setup.
import numpy as np
import pandas as pd
np.random.seed(0)
df = pd.DataFrame({
    'A' : np.random.choice(10, 5), 
    'C' : np.random.choice(10, 5), 
    'B' : ['1', '###', '...', 50, '234'], 
    'D' : ['23', '1', '...', '268', '$$']}
)[list('ABCD')]
df

   A    B  C    D
0  5    1  9   23
1  0  ###  3    1
2  3  ...  5  ...
3  3   50  2  268
4  7  234  4   $$

df.dtypes

A     int64
B    object
C     int64
D    object
dtype: object


df2 = df.apply(pd.to_numeric, errors='coerce')
df2

   A      B  C      D
0  5    1.0  9   23.0
1  0    NaN  3    1.0
2  3    NaN  5    NaN
3  3   50.0  2  268.0
4  7  234.0  4    NaN

df2.dtypes

A      int64
B    float64
C      int64
D    float64
dtype: object
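To see how many entries `errors='coerce'` actually discarded in each column, one option is to compare missingness before and after conversion. A sketch that rebuilds the example frame:

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({
    'A': np.random.choice(10, 5),
    'C': np.random.choice(10, 5),
    'B': ['1', '###', '...', 50, '234'],
    'D': ['23', '1', '...', '268', '$$'],
})[list('ABCD')]

df2 = df.apply(pd.to_numeric, errors='coerce')

# A cell was coerced if it held a value before conversion but is NaN after.
coerced = df2.isna() & df.notna()
print(coerced.sum())
```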

You can also do this with `DataFrame.transform`, although my tests indicate it is marginally slower:
df.transform(pd.to_numeric, errors='coerce')

   A      B  C      D
0  5    1.0  9   23.0
1  0    NaN  3    1.0
2  3    NaN  5    NaN
3  3   50.0  2  268.0
4  7  234.0  4    NaN
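The claim that `transform` is marginally slower is easy to re-check on your own data. A rough sketch using the standard-library `timeit`, which also confirms that both routes return the same frame; timings will vary with machine, data size, and pandas version:

```python
import timeit

import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({
    'A': np.random.choice(10, 5),
    'C': np.random.choice(10, 5),
    'B': ['1', '###', '...', 50, '234'],
    'D': ['23', '1', '...', '268', '$$'],
})[list('ABCD')]

via_apply = df.apply(pd.to_numeric, errors='coerce')
via_transform = df.transform(pd.to_numeric, errors='coerce')

# Both routes should produce identical values and dtypes.
pd.testing.assert_frame_equal(via_apply, via_transform)

# Rough timing only; results depend on machine, data size and pandas version.
t_apply = timeit.timeit(lambda: df.apply(pd.to_numeric, errors='coerce'), number=100)
t_transform = timeit.timeit(lambda: df.transform(pd.to_numeric, errors='coerce'), number=100)
print(f'apply: {t_apply:.4f}s  transform: {t_transform:.4f}s')
```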

If you have many columns (a mix of numeric and non-numeric), you can make this a little more performant by applying `pd.to_numeric` to the non-numeric columns only.
df.dtypes.eq(object)

A    False
B     True
C    False
D     True
dtype: bool

cols = df.columns[df.dtypes.eq(object)]
# Actually, `cols` can be any list of columns you need to convert.
cols
# Index(['B', 'D'], dtype='object')

df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')
# Alternatively,
# for c in cols:
#     df[c] = pd.to_numeric(df[c], errors='coerce')

df

   A      B  C      D
0  5    1.0  9   23.0
1  0    NaN  3    1.0
2  3    NaN  5    NaN
3  3   50.0  2  268.0
4  7  234.0  4    NaN
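A variant for picking the columns, assuming you want every column that is not already numeric: `select_dtypes(exclude='number')` also catches columns stored with a dedicated string dtype, which `df.dtypes.eq(object)` would skip. A sketch on the same example frame:

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({
    'A': np.random.choice(10, 5),
    'C': np.random.choice(10, 5),
    'B': ['1', '###', '...', 50, '234'],
    'D': ['23', '1', '...', '268', '$$'],
})[list('ABCD')]

# Select every column that is not already numeric.
cols = df.select_dtypes(exclude='number').columns
df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')

print(list(cols))
print(df.dtypes)
```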

Applying `pd.to_numeric` along the columns (i.e., `axis=0`, the default) should be slightly faster for long DataFrames.
