When to apply(pd.to_numeric) and when to astype(np.float64) in Python?


Question

I have a pandas DataFrame object named xiv which has a column of int64 Volume measurements.

In[]: xiv['Volume'].head(5)
Out[]: 

0    252000
1    484000
2     62000
3    168000
4    232000
Name: Volume, dtype: int64

I have read other posts (like this and this) that suggest the following solutions. But when I use either approach, it doesn't appear to change the dtype of the underlying data:

In[]: xiv['Volume'] = pd.to_numeric(xiv['Volume'])

In[]: xiv['Volume'].dtypes
Out[]: 
dtype('int64')

Or...

In[]: xiv['Volume'] = pd.to_numeric(xiv['Volume'])
Out[]: ###omitted for brevity###

In[]: xiv['Volume'].dtypes
Out[]: 
dtype('int64')

In[]: xiv['Volume'] = xiv['Volume'].apply(pd.to_numeric)

In[]: xiv['Volume'].dtypes
Out[]: 
dtype('int64')

I've also tried making a separate pandas Series, using the methods listed above on that Series, and reassigning the result to the xiv['Volume'] object, which is a pandas.core.series.Series object.

I have, however, found a solution to this problem using the numpy package's float64 type. This works, but I don't know why it's different.

In[]: xiv['Volume'] = xiv['Volume'].astype(np.float64)

In[]: xiv['Volume'].dtypes
Out[]: 
dtype('float64') 

Can someone explain how to accomplish with the pandas library what the numpy library seems to do easily with its float64 class? That is, convert the column in the xiv DataFrame to float64 in place.

Recommended answer

If you already have numeric dtypes (int8|16|32|64, float64, boolean), you can convert them to another "numeric" dtype using the pandas .astype() method.

Demo:

In [90]: df = pd.DataFrame(np.random.randint(10**5,10**7,(5,3)),columns=list('abc'), dtype=np.int64)

In [91]: df
Out[91]:
         a        b        c
0  9059440  9590567  2076918
1  5861102  4566089  1947323
2  6636568   162770  2487991
3  6794572  5236903  5628779
4   470121  4044395  4546794

In [92]: df.dtypes
Out[92]:
a    int64
b    int64
c    int64
dtype: object

In [93]: df['a'] = df['a'].astype(float)

In [94]: df.dtypes
Out[94]:
a    float64
b      int64
c      int64
dtype: object
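The behaviour the question ran into can be reproduced in a few lines. The sketch below is only an illustration: the `volume` Series is a hypothetical stand-in for the question's `xiv['Volume']` column, built from the five values shown there. It suggests why pd.to_numeric() appeared to do nothing: the column was already numeric, so there was nothing to parse, whereas .astype() is asked for a specific target dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's xiv['Volume'] column.
volume = pd.Series([252000, 484000, 62000, 168000, 232000],
                   name="Volume", dtype="int64")

# pd.to_numeric parses values into *some* numeric dtype. int64 is already
# numeric, so the dtype comes back unchanged.
after_to_numeric = pd.to_numeric(volume)
print(after_to_numeric.dtype)  # int64

# astype is given an explicit target dtype, so the result is float64.
after_astype = volume.astype(np.float64)
print(after_astype.dtype)  # float64
```

Note that .astype() returns a new Series rather than modifying the column in place, which is why the result is assigned back in the demo above.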

.astype() won't work for object (string) dtypes that contain values which can't be converted to numbers:

In [95]: df.loc[1, 'b'] = 'XXXXXX'

In [96]: df
Out[96]:
           a        b        c
0  9059440.0  9590567  2076918
1  5861102.0   XXXXXX  1947323
2  6636568.0   162770  2487991
3  6794572.0  5236903  5628779
4   470121.0  4044395  4546794

In [97]: df.dtypes
Out[97]:
a    float64
b     object
c      int64
dtype: object

In [98]: df['b'].astype(float)
...
skipped
...
ValueError: could not convert string to float: 'XXXXXX'

So here we want to use the pd.to_numeric() method:

In [99]: df['b'] = pd.to_numeric(df['b'], errors='coerce')

In [100]: df
Out[100]:
           a          b        c
0  9059440.0  9590567.0  2076918
1  5861102.0        NaN  1947323
2  6636568.0   162770.0  2487991
3  6794572.0  5236903.0  5628779
4   470121.0  4044395.0  4546794

In [101]: df.dtypes
Out[101]:
a    float64
b    float64
c      int64
dtype: object
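As a self-contained recap of the contrast above, and of the apply(pd.to_numeric) form mentioned in the title, here is a minimal sketch. The frame and its values are hypothetical (loosely modelled on the demo), and the columns are forced to object dtype so that both need parsing.

```python
import pandas as pd

# Hypothetical frame: column 'b' holds one value that is not a number,
# column 'c' holds numbers stored as strings.
df = pd.DataFrame({"b": [9590567, "XXXXXX", 162770],
                   "c": ["2076918", "1947323", "2487991"]},
                  dtype=object)

# astype(float) raises as soon as it meets a value it cannot parse.
try:
    df["b"].astype(float)
    astype_failed = False
except ValueError:
    astype_failed = True

# errors='coerce' replaces unparseable values with NaN instead of raising.
b_numeric = pd.to_numeric(df["b"], errors="coerce")

# DataFrame.apply runs pd.to_numeric column by column; extra keyword
# arguments are forwarded to the function.
converted = df.apply(pd.to_numeric, errors="coerce")
print(converted.dtypes)
```

With this input, column 'b' ends up as float64 (NaN forces a float dtype) while column 'c' becomes int64, because every string in it parses as an integer.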
